Showing entries 1 to 10 of 43
10 Older Entries »
Displaying posts with tag: data warehousing (reset)
On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov

“Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.”–Nikita Ivanov.

I have interviewed Nikita Ivanov, CTO of GridGain.
The main topics of the interview are Apache Ignite, Apache Spark and MySQL, and how well they perform on big data analytics.

RVZ

Q1. What are the main technical challenges of SaaS development projects?

Nikita Ivanov: SaaS requires that the applications be highly responsive, reliable and web-scale. SaaS development projects face many of the same challenges as …

[Read more]
The stealth success of PostgreSQL

One of the more notable success stories of the open source world is in the field of databases. A company with a strong commitment to open source has seen tremendous growth and success in the enterprise while contributing to a hugely respected open source code base. Who is that? Maybe your first thought was MySQL, now owned by Oracle. But unlike MySQL, this company is actually taking business away from Oracle so effectively that it's seen an 80 percent revenue growth in the last year.

[Read more]

Advantages of weighted lists in RDBMS processing

A list is simply a collection of items. It has no structure beyond, in some cases, a known length, and it may contain duplicate items. In the following example, the number 1 appears twice.

Example list:

1
2
3
1


A set is similar to a list, but has the following differences:

  1. The size of the set is always known
  2. A set may not contain duplicates

You can convert a list to a set by creating a 'weighted list'. The weighted list includes a count column so that you can determine when an item in the list appears more than once:

1,2
2,1
3,1

Notice that the item 1 has a count of 2 in the weighted list, because it appears twice in the original list. In order to make insertions into such a list scalable, consider using partitioning to avoid large indexes.
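
The list-to-weighted-list conversion described above can be sketched in a few lines of Python. This is only an illustration, not code from the original post: the function names `to_weighted_list` and `expand` are hypothetical, and in an RDBMS the same result would typically come from a `GROUP BY` with `COUNT(*)`.

```python
from collections import Counter


def to_weighted_list(items):
    """Collapse a list into (item, count) pairs, in order of first appearance.

    Hypothetical helper for illustration; each distinct item appears once,
    so the item column behaves like a set.
    """
    counts = Counter(items)  # Counter keeps insertion order (Python 3.7+)
    return list(counts.items())


def expand(weighted):
    """Rebuild a list from (item, count) pairs.

    The original ordering of the list is not preserved, only the multiplicities.
    """
    return [item for item, count in weighted for _ in range(count)]


example = [1, 2, 3, 1]
weighted = to_weighted_list(example)
print(weighted)          # [(1, 2), (2, 1), (3, 1)]
print(expand(weighted))  # [1, 1, 2, 3]
```

The weighted form stores each distinct item once, so no information about duplicates is lost even though the item column itself contains no repeats.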

[Read more]
Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to significantly improve the performance of queries by spreading the work over all available compute resources. In this test, Shard-Query averages a nearly 6x (max over 10x) improvement over the baseline, as shown in the following graph:

One significant advantage of Shard-Query over Hive is that it works with existing MySQL data sets and queries. Another advantage is that it works with all MySQL …

[Read more]
Using Flexviews – part one, introduction to materialized views

If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I’m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.

You might be asking yourself “what is an incrementally refreshable materialized view?”. If so, then keep reading. This is the first in a multi-part series describing Flexviews.

edit:
You can find part 2 of the series here: http://www.mysqlperformanceblog.com/2011/03/25/using-flexviews-part-two-change-data-capture/


The output of …

[Read more]
YAPCEU 2010 – Day Two…

After enjoying the excellent hospitality of our host (6 courses), we were ready for our second day at YAPCEU 2010 here in sunny Pisa.

Larry’s new catch phrase “My Language is a four letter word” was the ‘Buzz word’ for today. We settled down to some very interesting talks, the highlight for me being Tim Bunce’s talk on using Devel::NYTProf to optimize your code. Tim first gave us a quick and dirty overview of optimization, which covered the basics of where to start and what to look for. He followed up with real examples of profiler output and then wrapped up with a few before and after results from an optimization effort.

The rest of the day was dedicated, in my opinion, to the future of DBs, with Nelson Ferraz giving an excellent presentation of his concepts for using Perl as the glue for a Data Warehouse application. Next on my agenda, Martin Berends reports on the present state of Perl 6 and interfaces …

[Read more]
Data Warehousing Best Practices: Comparing Oracle to MySQL pt 2

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and the discussion of power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will reach the level of Oracle’s partitioning, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).

Partition – …

[Read more]
Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from …

[Read more]
CAOS Theory Podcast 2010.02.05

Topics for this podcast:

* Matt Asay moves from Alfresco to Canonical
* GPL fade fuels heated discussion
* Apple’s iPad and its enterprise and open source impact
* Open source in data warehousing and storage
* Our perspective on Oracle’s plans for Sun open source

iTunes or direct download (32:50, 9.2 MB)
