Planet MySQL

Displaying posts with tag: data warehousing

Jun

2017

On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov

Posted by Roberto V. Zicari on Fri 30 Jun 2017 13:40 UTC
Tags:

Uncategorized, sql, memcached, data warehousing, analytics, hadoop, mysq, Gridgain, SaaS, big data, vertica, redis, internet of things, machine learning, Tableau, Apache Ignite, Nikita Ivanov, proxysql, Apache Spark, vitess, ClickHouse, Apache Ignite In-Memory SQL Grid, Apache Kafka, ETL processes, in-memory computing, in-memory data grids, Spark Streaming

“Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.”–Nikita Ivanov.

I have interviewed Nikita Ivanov, CTO of GridGain.
The main topics of the interview are Apache Ignite, Apache Spark and MySQL, and how well they perform on big data analytics.

RVZ

Q1. What are the main technical challenges of SaaS development projects?

Nikita Ivanov: SaaS requires that the applications be highly responsive, reliable and web-scale. SaaS development projects face many of the same challenges as …

[Read more]

Jul

2012

The stealth success of PostgreSQL

Posted by InfoWorld on Fri 13 Jul 2012 10:00 UTC
Tags:

sql, open source software, data warehousing, cloud computing, Enterprise Architecture

One of the more notable success stories of the open source world is in the field of databases. A company with a strong commitment to open source has seen tremendous growth and success in the enterprise while contributing to a hugely respected open source code base. Who is that? Maybe your first thought was MySQL, now owned by Oracle. But unlike MySQL, this company is actually taking business away from Oracle so effectively that it's seen an 80 percent revenue growth in the last year.

Jun

2011

Advantages of weighted lists in RDBMS processing

Posted by Justin Swanhart on Fri 17 Jun 2011 21:17 UTC
Tags:

database, data warehousing, computing, dataset, aggregation, MySQL

A list is simply a list of things. The list has no structure, except in some cases, the length of the list may be known. The list may contain duplicate items. In the following example the number 1 is included twice.

Example list:

A set is similar to a list, but has the following differences:

The size of the set is always known
A set may not contain duplicates

You can convert a list to a set by creating a 'weighted list'. The weighted list includes a count column so that you can determine when an item in the list appears more than once:

1,2
2,1
3,1

Notice that the weighted list records a count of 2 for the value 1. To make insertions into such a list scalable, consider using partitioning to avoid large indexes.
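The list-to-weighted-list conversion described above can be sketched in a few lines of Python. This is only an illustration: the input list `[1, 1, 2, 3]` is an assumed example chosen to match the weighted rows shown above, and the function names are hypothetical rather than taken from the original post.

```python
from collections import Counter


def to_weighted_list(items):
    """Collapse a list into (value, count) rows, one row per distinct value."""
    counts = Counter(items)
    # Sorting gives a stable, set-like result with exactly one row per value.
    return sorted(counts.items())


def from_weighted_list(rows):
    """Expand (value, count) rows back into a list (original order is lost)."""
    return [value for value, count in rows for _ in range(count)]


# Assumed input: the value 1 appears twice, as in the text above.
example = [1, 1, 2, 3]
weighted = to_weighted_list(example)
print(weighted)  # [(1, 2), (2, 1), (3, 1)]
```

Because each distinct value appears in exactly one row, the weighted list behaves like a set keyed on the value, while the count column preserves how many times each item occurred.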

…

[Read more]

May

2011

Shard-Query turbo charges Infobright community edition (ICE)

Posted by Justin Swanhart of MySQL Performance Blog on Fri 06 May 2011 22:19 UTC
Tags:

Tools, benchmark, storage engine, data warehousing, Benchmarks, sharding, MySQL, Performance

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to significantly improve the performance of queries by spreading the work over all available compute resources. In this test, Shard-Query averages a nearly 6x (max over 10x) improvement over the baseline, as shown in the following graph:

One significant advantage of Shard-Query over Hive is that it works with existing MySQL data sets and queries. Another advantage is that it works with all MySQL …

[Read more]
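The divide-and-conquer idea in the Shard-Query excerpt above (run the same work on each slice of the data, then combine the partial results) can be sketched roughly as follows. This is not Shard-Query's implementation or API: the function names and the in-memory "shards" are assumptions made for illustration, and the thread pool merely stands in for separate machines or cores.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Compute a partial (sum, count) for the rows held by one shard."""
    return sum(rows), len(rows)


def sharded_average(shards, max_workers=4):
    """Aggregate each shard independently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # An average cannot be combined directly across shards, so each shard
    # returns a sum and a count and the division happens once at the end.
    return total / count if count else None


# Hypothetical data split across three shards.
shards = [[10, 20, 30], [40, 50], [60]]
print(sharded_average(shards))  # 35.0
```

The key design point is that each per-shard result must be combinable: sums and counts merge cleanly, whereas per-shard averages would not.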

Mar

2011

Using Flexviews – part one, introduction to materialized views

Posted by Justin Swanhart of MySQL Performance Blog on Thu 24 Mar 2011 04:37 UTC
Tags:

tips, data warehousing, olap, flexviews, Insight for DBAs, Insight for Developers, MySQL

If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I’m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.

You might be asking yourself “what is an incrementally refreshable materialized view?”. If so, then keep reading. This is the first in a multi-part series describing Flexviews.

edit:
You can find part 2 of the series here: http://www.mysqlperformanceblog.com/2011/03/25/using-flexviews-part-two-change-data-capture/

The output of …

[Read more]
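As a rough illustration of what "incrementally refreshable" means in the Flexviews excerpt above, the sketch below keeps a per-key SUM up to date by applying only the logged changes instead of recomputing from the base data. It is a toy model under stated assumptions: the class name, the change-log format `(sign, key, amount)`, and the methods are hypothetical and are not the Flexviews API.

```python
from collections import defaultdict


class IncrementalSumView:
    """Toy materialized view of SUM(amount) GROUP BY key."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply_changes(self, change_log):
        """Apply (sign, key, amount) deltas: +1 for an insert, -1 for a delete."""
        for sign, key, amount in change_log:
            self.totals[key] += sign * amount
            self.counts[key] += sign
            # Drop the group once no base rows contribute to it any more.
            if self.counts[key] == 0:
                del self.totals[key]
                del self.counts[key]

    def rows(self):
        """Return the current contents of the view as a plain dict."""
        return dict(self.totals)


view = IncrementalSumView()
view.apply_changes([(+1, "a", 10), (+1, "b", 5), (+1, "a", 7)])
view.apply_changes([(-1, "a", 10)])  # only this delta is processed on refresh
print(view.rows())  # {'a': 7, 'b': 5}
```

The cost of a refresh is proportional to the number of changes, not to the size of the underlying table, which is the appeal of incremental maintenance.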

Aug

2010

YAPCEU 2010 – Day Two…

Posted by John Scoles on Fri 06 Aug 2010 18:01 UTC
Tags:

perl, Group Blog Posts, data warehousing, Migration, Not on Homepage, DBD::Oracle, Pythian Appearances, Technical Blog, YAPCEU, MySQL

After enjoying the excellent hospitality of our host here in Pisa (6 courses) we were ready for our second day at YAPCEU 2010 here in sunny Pisa.

Larry’s new catch phrase “My Language is a four letter word” was the ‘Buzz word’ for today. We settled down to some very interesting talks, the highlight for me being Tim Bunce’s talk on using Devel::NYTProf to optimize your code. Tim first gave us a quick and dirty overview of optimization, which covered the basics of where to start and what to look for. He followed up with real examples of Optimizer output and then wrapped up with a few before-and-after results from an optimization effort.

The rest of the day was dedicated, in my opinion, to the future of DBs, with Nelson Ferraz giving an excellent presentation of his concepts for using Perl as the glue for a Data Warehouse application. Next on my agenda, Martin Berends reports on the present state of Perl 6 and interfaces …

[Read more]

Jul

2010

Data Warehousing Best Practices: Comparing Oracle to MySQL pt 2

Posted by Sheeri K. Cabral on Thu 29 Jul 2010 21:00 UTC
Tags:

Oracle, Conferences, Pythian, partitioning, data warehousing, partition, hash, data warehouse, range, dw, Technical Blog, Kaleidoscope, odtug, kscope, linear hash partitioning, subpartition, MySQL

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and a discussion of power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level that Oracle partitioning has, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).

Partition – …

[Read more]

Jul

2010

Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

Posted by Sheeri K. Cabral on Thu 29 Jul 2010 20:53 UTC
Tags:

Oracle, Conferences, Pythian, data warehousing, Normalization, schema, data warehouse, throughput, san, dw, parallelism, Orion, Technical Blog, Kaleidoscope, odtug, kscope, 3nf, disk array, disk speed, HBA, LUN, normalize, snowflake schema, star schema, MySQL

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from …

[Read more]

Feb

2010

CAOS Theory Podcast 2010.02.05

Posted by The 451 Group on Fri 05 Feb 2010 20:23 UTC
Tags:

Oracle, gpl, software, Linux, Apache, OpenOffice, ubuntu, lgpl, eclipse, Google, podcast, opensource, Apple, storage, glassfish, netbeans, data warehousing, Canonical, caostheory, matt aslett, open-source, The 451 Group, the451group, Matt Asay, iphone, developers, OpenOffice.org, Sun Microsystems, alfresco, jay lyman, android, caos theory, chrome, calpont, ipad, Chris Hazelton, Coraid, smartphones, tablets, MySQL

Topics for this podcast:

*Matt Asay moves from Alfresco to Canonical
*GPL fade fuels heated discussion
*Apple’s iPad and its enterprise and open source impact
*Open source in data warehousing and storage
*Our perspective on Oracle’s plans for Sun open source

iTunes or direct download (32:50, 9.2 MB)
