Showing entries 1 to 30 of 41 Next 11 Older Entries

Displaying posts with tag: data warehousing

The stealth success of PostgreSQL

One of the more notable success stories of the open source world is in the field of databases. A company with a strong commitment to open source has seen tremendous growth and success in the enterprise while contributing to a hugely respected open source code base. Who is that? Maybe your first thought was MySQL, now owned by Oracle. But unlike MySQL, this company is actually taking business away from Oracle so effectively that it has seen 80 percent revenue growth in the last year.


Advantages of weighted lists in RDBMS processing
A list is simply a list of things. A list has no structure, although in some cases its length may be known. The list may contain duplicate items. In the following example the number 1 is included twice.

Example list:
1
2
3
1

A set is similar to a list, but has the following differences:
  • The size of the set is always known

  • A set may not contain duplicates

You can convert a list to a set by creating a 'weighted list'. The weighted list includes a count column so that you can determine when an item in the list appears more than once:
    1,2
    2,1
    3,1

Notice that the value 1 has a count of 2 in the weighted list, because it appears twice in the original list. To keep insertions into such a list scalable, consider using partitioning to avoid large indexes.
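As a rough illustration of the list-to-weighted-list conversion described above, here is a minimal Python sketch. The function names `to_weighted_list` and `to_list` are hypothetical and chosen only for this example; in an RDBMS the same idea would normally be expressed as a table with a value column and a count column.

```python
from collections import Counter


def to_weighted_list(items):
    """Collapse a list with duplicates into (value, count) pairs.

    Counter keeps values in first-seen order on Python 3.7+.
    """
    counts = Counter(items)
    return list(counts.items())


def to_list(weighted):
    """Expand (value, count) pairs back into a flat list.

    Duplicates come back grouped together; the original ordering is not restored.
    """
    return [value for value, count in weighted for _ in range(count)]


example = [1, 2, 3, 1]
weighted = to_weighted_list(example)
print(weighted)  # [(1, 2), (2, 1), (3, 1)]
```

Each distinct value appears once in the weighted form, so it behaves like a set while still recording how many times each item occurred.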








    Shard-Query turbo charges Infobright community edition (ICE)

    Shard-Query is an open source toolkit that helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide-and-conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism that allows it to significantly improve query performance by spreading the work over all available compute resources. In this test, Shard-Query averages a nearly 6x (max over 10x) improvement over the baseline, as shown in the following graph:

    One
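The divide-and-conquer idea mentioned in this excerpt can be sketched in a few lines of Python. This is not Shard-Query's code or API; the shard layout and the function names `partial_aggregate` and `sharded_average` are assumptions made for illustration. Each shard computes a partial result, and the partials are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Per-shard work: return (sum, count) so partial results can be merged later."""
    return sum(rows), len(rows)


def sharded_average(shards, max_workers=4):
    """Run the same aggregate on every shard concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # An average cannot be merged directly, so it is rebuilt from sum and count.
    return total / count if count else None


# Hypothetical data: three shards of one logical column.
shards = [[1, 2, 3], [4, 5], [6]]
result = sharded_average(shards)
print(result)  # 3.5
```

Threads are used here only to keep the sketch short. In a real deployment the per-shard work would run on separate database servers or processes, which is where the speedup would come from.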



    Using Flexviews – part one, introduction to materialized views

    If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I’m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.

    You might be asking yourself “what is an incrementally refreshable materialized view?”. If so, then keep reading. This is the first in a multi-part series describing Flexviews.
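To give a feel for what "incrementally refreshable" might mean, here is a small Python sketch. It is not Flexviews' API; the data layout and the names `full_refresh` and `incremental_refresh` are assumptions for illustration only. The view stores a per-customer total, and a refresh applies only the logged changes instead of rescanning the base table.

```python
def full_refresh(base_rows):
    """Build the view from scratch: customer -> [total, row_count]."""
    view = {}
    for customer, amount in base_rows:
        entry = view.setdefault(customer, [0, 0])
        entry[0] += amount
        entry[1] += 1
    return view


def incremental_refresh(view, change_log):
    """Apply only the logged changes; sign is +1 for an insert, -1 for a delete."""
    for sign, customer, amount in change_log:
        entry = view.setdefault(customer, [0, 0])
        entry[0] += sign * amount
        entry[1] += sign
        if entry[1] == 0:
            # No base rows remain for this group, so drop it from the view.
            del view[customer]
    return view


base = [("ann", 10), ("bob", 5), ("ann", 7)]
view = full_refresh(base)

# Hypothetical change log: one insert for bob, one delete for ann, one new customer.
log = [(+1, "bob", 3), (-1, "ann", 10), (+1, "cat", 4)]
incremental_refresh(view, log)
print(view)
```

The row count is kept alongside the total so that a group can be removed when its last base row is deleted, even if the amounts happen to sum to zero.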

    edit:
    You can find part 2 of the series here:

    YAPCEU 2010 – Day Two…

    After enjoying the excellent hospitality of our host here in Pisa (6 courses), we were ready for our second day at YAPCEU 2010 here in sunny Pisa.

    Larry’s new catch phrase “My Language is a four letter word” was the ‘Buzz word’ for today. We settled down to some very interesting talks, the highlight for me being Tim Bunce’s talk on using Devel::NYTProf to optimize your code. Tim first gave us a quick and dirty overview of optimization, which covered the basics of where to start and what to look for. He followed up with real examples of Optimizer output and then wrapped up with a few before and after results from an optimization effort.

    The rest of the day was dedicated, in my opinion, to the future of DBs, with Nelson Ferraz giving an excellent presentation of his concepts for using Perl as the glue for a Data Warehouse application. Next on my agenda,

    Data Warehousing Best Practices: Comparing Oracle to MySQL pt 2

    At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

    See part 1 for the introduction and talking about power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and

    Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

    At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

    These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than

    CAOS Theory Podcast 2010.02.05

    Topics for this podcast:

    *Matt Asay moves from Alfresco to Canonical
    *GPL fade fuels heated discussion
    *Apple’s iPad and its enterprise and open source impact
    *Open source in data warehousing and storage
    *Our perspective on Oracle’s plans for Sun open source

    iTunes or direct download (32:50, 9.2 MB)

    Some scaling observations on Infobright

    A couple of days ago, Baron Schwartz posted some simple load and select benchmarking of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures where MonetDB beat Infobright in most queries.

    Being used to the parallel IEE loader, I was surprised by the apparent slow loading speed of Baron's benchmark and decided to try and replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple Perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because

    A peek under the hood in Infobright 3.2 storage engine

    I've been meaning to post some real-world data on the performance of the Infobright 3.2 release which happened a few weeks ago after an extended release candidate period. We're just preparing our upgrades now, so I don't have any performance notes over significant data sets or complicated queries to post quite yet.

    To make up for that, I decided to address a particular annoyance of mine in the community edition, first because it hadn't been addressed in the 3.2 release (and really, I'm hoping that doing this will get it included in 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to being a pretty amazing piece of performance for such a simple,

    Scalable Star Schema Benchmark (SSB) Join Metrics

    We ran a quick scalability test of Calpont join behavior using a Star Schema Benchmark data set at a scale factor of 1000. The Star Schema Benchmark transforms TPC-H / DBT-3 data into a more standardized data warehouse star schema data model, and the 1000 scale factor includes 6 billion rows in the primary fact table. Information on the Star Schema Benchmark (SSB) can be found at http://www.cs.umb.edu/~xuedchen/research/publications/DataWarehousePerformanceDissertationProposal.pdf .

    -----------------------------------------------------------------------------------------------------------------------------------------
    -- Note that these queries are run without any tuning or indices created for these joins or filters.
    -- Basically, this is just 1) Create tables



    Free Kimball Group Data Warehousing Educational Webinar

    We’re sponsoring an important webinar series along with Sun/MySQL starting this week on June 25th – The Kimball Group Data Warehousing Educational Webinar Series. This webinar series will introduce the audience to data warehousing concepts and best practices, and will cover the history and evolution of data warehousing, provide an overview of dimensional modeling, and review the full life cycle of designing and implementing a data warehouse. Part 1, on June 25th at 1:00 PM PDT, is on Data Warehousing Fundamentals.

    There are two key reasons why we think this webinar series is important:

    • First, we believe this webinar further advances data warehousing in the MySQL world. There is a whole new generation of database developers in the MySQL community that are at various stages of understanding data warehousing -

    451 CAOS Links 2009.06.09

    Vyatta raises series C funding. Greenplum launches data cloud initiative. Fedora 11. And more.

    Follow 451 CAOS Links live @caostheory

    # Vyatta raised $10m in series C round, led by Citrix.

    # Carlo Daffara published Horses, carriages and cars, an assessment of the shifting OSS business models and a proposal of what the optimal model is.

    # Greenplum delivered version 3.3 of its analytical database and launched its Enterprise Data Cloud initiative.

    # Daniel Abadi asked whether betting on the MySQL mass market for data warehousing is a good idea.

    # Roberto Galoppini reported on open source adoption in Italian

    What we're looking for in a data integration tool

    As our data warehousing process grows and the workflows get more complex, we've revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices in efficiency that it would entail meant we're better off using something else unless a distributed processing platform is the only thing that can get the job done. I'm also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM's

    Three domains of data

    My MySQL Conference presentation on Tuesday discussed my practical findings on how Infobright's technology works in developing a MySQL-based data warehouse. I also touched on a more high-level question of how to select a technology for different kinds of data-related problem areas, and this article expands on that discussion.

    As pointed out by several other speakers at the conference, the balance of CPU, memory and storage has changed significantly in the last 10 years. Two important throughput factors on a per-thread basis have flattened out: CPU cycles per second are in fact dropping as power and cooling have become limiting factors, and the number of IO operations per device has only been increasing linearly, though Flash technologies have leaped on the latter front. However, two

    Kickfire Launches MySQL Appliance for Data Warehousing Mass Market

    The Kickfire MySQL Appliance is officially launched!

    We announced it today, along with a new customer and strategic partnerships with ten leading service companies, including Percona, the MySQL performance experts.

    Look for more news next week from Kickfire as we head into the MySQL conference. Kickfire will also give a keynote on the first day of the conference and will make a surprise announcement! Stay tuned …

    Real Time Data Warehousing Presentation and Video

    At the March Boston MySQL User Group meeting, Jacob Nikom of MIT’s Lincoln Laboratory presented “Optimizing Concurrent Storage and Retrieval Operations for Real-Time Surveillance Applications.” In the middle of the talk, Jacob said he sometimes calls what he did in this application “real-time data warehousing”, which was so accurate I decided to give that title to this blog post.

    The slides can be downloaded in PDF format (1.3 MB) at http://www.technocation.org/files/doc/Concurrent_database_performance_02.pdf. The 54-minute video can be downloaded (644 MB) at http://technocation.org/node/693/download or streamed directly in your browser at

    On the need for an agile approach to data warehousing

    I’d like to take a step back from technical issues to distill some of my thoughts on the challenges of data warehousing in the 21st century.

    Having worked on a number of warehouse projects in different industries over the years, I’ve encountered many challenges, some failures, some successes. One thing is certain: all organizations that have a reasonable amount of data should be building a data warehouse if they don’t already have one. In 2009, given the economic atmosphere, no one wants to wait as long, or pay as much, as they did in 1999 to get one.

    While this is a huge opportunity for open-source competitors like MySQL, it comes with big challenges for an organization that thinks it will get a $10MM warehouse (in 1999 dollars) for $300,000 (2009 dollars).

    My contention is that in a web-connected, high-traffic and high-speed world, a monolithic approach with a rigid set of

    Kickfire Ships to First Web 2.0 Customer

    We just shipped and installed the Kickfire appliance in the data center of our first web 2.0 customer this week. We’re very excited about this new customer. With already over a million active members, this company continues to grow in spite of a challenging economic environment because it has a clearly defined audience and a business model which adds value to its members while adding money to its coffers. Part of the value add to their member base comes from well-targeted discount and coupon offers. In order to achieve this, the company runs complex analytics to understand members’ behaviors and responses and uses this data to help its advertising customers better target their offers.

    As with many web 2.0 companies, this customer has built its application on MySQL. MySQL has helped them scale their web application well but was presenting performance and scalability challenges for their

    Looking for a ETL engineer for our BI team

    So, I mentioned earlier that I was looking at Infobright's Brighthouse technology as a storage backend for heaps and heaps of traffic and user data from Habbo. Turns out it works fine (now that it's in V3 and supports more of the SQL semantics), and we took it into use. Been pretty happy with that, and I expect to talk more about the challenge and our solution at the next MySQL Conference in April 2009.

    However, our DWH team needs extra help. If you're interested in solving business analytics problems by processing lots of data and the idea of working in a company that leads the virtual worlds industry excites you, let us know by sending us an application. Thanks for reading!

    New, New, New … News at Kickfire

    It’s been a crazy month here at Kickfire which is why I have fallen a bit behind on my postings – a new product, new customers, a new CEO, a new relationship with Sun/MySQL, a new website … and a new baby girl! Here’s a quick summary of all that has been going on:

    New Product
    We quietly came out of beta a month ago. After nearly two and a half years in development, this is a great achievement for the company. The team took on a hugely ambitious project: to re-design how SQL is processed today to be able to deliver an order of magnitude improvement in price/performance relative to any other data warehousing solution on the market. This project involved bringing together over 50 of the industry’s smartest database and hardware engineers to build a new type of database machine that includes the world’s first SQL chip, an ultra-modern database kernel,

    Infobright Review – Part 2

    First, a retraction: it turns out that the performance problem with datetimes in the previous article wasn’t due to high cardinality (I speculated too much here), but due to a type conversion issue. According to a helpful comment from Victoria Eastwood of Infobright (a good sign for a startup), the Infobright engine considered ‘2001-01-01’ to be a date, not a datetime, and it couldn’t do a conversion to a datetime. Instead it pushed the date filtering logic from the Infobright engine to MySQL. Effectively, the slow queries were a table scan. The solution is to add the 00:00:00 to the dates to make them datetimes.
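The fix described above, appending 00:00:00 so that a date literal becomes a datetime literal, could be applied in application code before the query is built. The following Python helper is a sketch only; the name `as_datetime_literal` is hypothetical and it assumes literals arrive as `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS` strings.

```python
from datetime import date, datetime


def as_datetime_literal(value):
    """Return a 'YYYY-MM-DD HH:MM:SS' string; date-only input gets midnight appended."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Not a full datetime, so treat it as a plain date at 00:00:00.
        parsed = datetime.combine(date.fromisoformat(value), datetime.min.time())
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(as_datetime_literal("2001-01-01"))           # 2001-01-01 00:00:00
print(as_datetime_literal("2001-01-01 12:30:00"))  # 2001-01-01 12:30:00
```

Passing the normalized string as a query parameter means the comparison value already has the datetime form that the excerpt says the engine needs.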

    With that in mind, here are some much better numbers for Infobright.   For Infobright this query took 0.05 seconds. 

    1) Select sum(unit) from Sale where

    More Good News for Data Warehousing on MySQL

    Last week, Infobright announced it had open sourced its data warehousing software code. This is good news for the growing number of organizations looking to use MySQL as a data warehousing platform. According to IDC, MySQL is already the third-most deployed database for data warehousing and Infobright’s move will give users yet another reason to seriously consider MySQL for this application.

    For those of you not familiar with the Infobright offering, it is essentially a column-oriented data store for data warehousing. While the column-oriented approach is not exclusive to Infobright (Kickfire’s MySQL storage engine is also column-oriented, as are some other non-MySQL data warehousing solutions on the market) Infobright does have some unique technology that Lou Agosta recently described as follows in his post on Trends in Data

    An Infobright Review

    With open source software I can install reasonably complete software and try it with my data. This way I get to see how it works in a realistic setting without having to rely on benchmarks and hoping they are a good match for my environment. And I get to do this without having to deal with commercial software sales people.

    So I was glad to hear that Infobright had gone open source, as I have been wanting to test a column-based database for a while. I was even happier that it was a MySQL-based engine, as I would already know many of the commands. I decided to run some of the same tests I had run when comparing InnoDB and MyISAM for reporting (http://dbscience.blogspot.com/2008/08/innodb-suitability-for-reporting.html ). InnoDB performed better than MyISAM in my reporting

    A New Business Model for Open Source?

    Kickfire was recently selected by Network World as one of 10 Open Source Companies to Watch. First of all, the disclaimer: we are not an open source company. As any of you reading this blog know, Kickfire is an appliance company. So, why then did we appear on the list? The link of course is MySQL.

    The Kickfire appliance was built to run MySQL for high-performance business intelligence and data warehousing workloads. So, while we are not an open source company, we are very much what I would term as an “open source-based business”. Now, for those who track the data warehousing market, it might seem that a lot of vendors could claim that mantle as a large proportion have code that is derived from PostgreSQL. However, that’s not what I mean by an open

    InnoDB's Suitability for Reporting

    I started using Oracle, an MVCC database, to develop reporting (data warehousing, BI, take your pick) systems years ago. I’ve come to appreciate the scalability improvements that MVCC provides, particularly for pseudo real-time reporting applications, the ones where loads are occurring at the same time as report generation. So when people say InnoDB, partly due to MVCC, isn’t as good as MyISAM for reporting I had to look into this in more detail.

    What I found is InnoDB is a good engine for reporting. In some ways, such as performance, it is at times better than MyISAM, and one of the downsides, a larger disk requirement, can be mitigated. The trick is for the primary key to be the one predominant access path. In this example, the InnoDB clustered index is purchaseDate and another column,

    When VLSI meets DBMS: The Story behind the World’s First SQL Chip

    In April this year, Kickfire announced the first high-performance appliance for MySQL. As part of the announcement, the company released data warehouse benchmark results that broke prior records in terms of price/performance and performance in a non-clustered environment. While the creation of a new appliance built exclusively for MySQL along with the benchmark records was noteworthy, perhaps the bigger story lies in what we believe to be the beginning of a paradigm shift in the database world - one marked by the advent of the first SQL chip.

    To give some context to this story I have included a graph below which depicts the evolution of VLSI (Very-Large-Scale Integration) semiconductor technology and its growing impact on a broadening range of industries.

    Why $20 million for Kickfire?

    As Matt Asay recently mentioned in his post about Kickfire, the company just closed a Series B for $20 million. In today’s credit-scarce market where VC funding is flat/declining, $20 million is a lot of money, especially for a company whose product is still in beta. What’s more, there seems to be an investment bubble in the broader data warehousing space in which Kickfire participates (at last count, there were over two dozen vendors, the majority of which are relatively new entrants) and that bubble looks like it is starting to burst as witnessed by Microsoft’s recent acquisition of DATAllegro. So, are the Kickfire investors

    A New Hardware-Based Approach to Data Warehousing

    My name is Ravi Krishnamurthy - I am the Chief Software Architect here at Kickfire. I’ll be blogging about our thoughts on database technologies for data warehousing. More specifically I’ll be talking about current challenges, directions going forward, and the simplifications for wider market deployments and other ideas.

    Data Warehouse (DW) queries are known to be more complex, more demanding, and longer running than OLTP queries. Some of the distinctive features of these DW queries that produce these characteristics are:

    1) Table scan: Most OLTP queries are point queries that update or insert a few rows of transactional data. Most DW queries, on the other hand, are reporting or business intelligence (BI) queries which typically touch large numbers of rows of data, often computed by sequential table scans over the large data sets.

    2) Many/complex joins:


    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.