“For analytical things, eventual consistency is ok (as long as you can know after you have run them if they were consistent or not). For real world involving money or resources it’s not necessarily the case.” — Michael “Monty” Widenius. In a recent interview, I asked Justin Sheehy, Chief Technology Officer at Basho Technologies, maker [...]
“While I believe that one size fits most, claims that RDBMS can no longer keep up with modern workloads come in from all directions. When people talk about performance of databases on large systems, the root cause of their concerns is often the performance of the underlying B-tree index”– Martín Farach-Colton. Scaling MySQL and MariaDB [...]
Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"I know GigaOM like to provoke scandals sometimes, we all remember some other unforgettable piece, but there is something behind it...
Hadoop today (after SOA not so long ago) is one of the worst case of an abused buzzword ever known to men. It's everything, everywhere, can cure illnesses and do "big-data" at the same time! Wow! Actually Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.
My take from the article is this: Hadoop is a foundation, low-level platform. I used the word …
In my previous post,http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I reviewed the differences between OLTP and Analytics databases.
Scale challenges are different between those 2 worlds of databases.
Scale challenges in the Analytics world are with the growing amounts of data. Most solutions have been leveraging those 3 main aspects: Columnar storage, RAM and parallelism.
Columnar storage makes scans and data filtering more precise and focused. After that – it all goes down to the I/O - the faster the I/O is, the faster the query will finish and bring results. Faster disks and also SSD can play good role, but above all: RAM! …
So I’ve been doing a fair number of automated load tests these past six months. Primarily with Sysbench, which is a fine, fine tool. First I started using some simple bash based loop controls to automate my overnight testing, but as usually happens with shell scripts they grew unwieldy and I rewrote them in python. Now I have some flexible and easily configurable code for sysbench based MySQL benchmarking to offer the community. I’ve always been a fan of giving back to such a helpful group of people – you’ll never hear me complain about “my time isn’t free”. So, let me know what you want in an ideal testing environment (from a load testing framework automation standpoint) and I’ll integrate it into my existing framework and then release it via the BSD license. The main goal here is to have a standardized modular framework, based on sysbench, that allows anyone to compare their server performance via repeatable tests. It’s fun to see …[Read more]
Letting data speak for itself through analysis of entire data sets is eclipsing modeling from subsets. In the past, all too often what were once disregarded as "outliers" on the far edges of a data model turned out to be the telltale signs of a micro-trend that became a major event. To enable this advanced analytics and integrate in real-time with operational processes, companies and public sector organizations are evolving their enterprise architectures to incorporate new tools and approaches.
Whether you prefer "big," "very large," "extremely large," "extreme," "total," or another adjective for the "X" in the "X Data" umbrella term, what's important is accelerated growth in three dimensions: volume, complexity and speed.
Big data is not without its limitations. Many organizations need to revisit business processes, solve data silo challenges, and invest in visualization and collaboration tools to make big data understandable and …[Read more]
This is a follow up to my previous post titled “MySQL analytics: information_schema polling for table engine percentages”. Here’s an updated query with more output and quicker execution time. What you get: innodb table space utilization percentage, data+index usage total and per innodb/myisam engine, innodb data/index/percentage, myisam data/index/percentages, and overall percentage values. Rather useful for profiling your table engine usage.
I think most people will agree that one of the biggest advantages of MySQL Community Server is that it’s free. Being free doesn’t get you a multi-million user community though; MySQL offers a great array of transactional engines, advanced high-availability features, robust I/O performance, and it powers many of the top-500 internet sites. When it […]
If you’ve ever needed to know how the data and index percentages per table engine were laid out on your MySQL server, but didn’t have the time to write out a query… here it is!
select (select (sum(DATA_LENGTH)+sum(INDEX_LENGTH))/(POW(1024,3)) as total_size from tables) as total_size_gb, (select sum(INDEX_LENGTH)/(POW(1024,3)) as index_size from tables) as total_index_gb, (select sum(DATA_LENGTH)/(POW(1024,3)) as data_size from tables) as total_data_gb, (select ((sum(INDEX_LENGTH) / ( sum(DATA_LENGTH) + sum(INDEX_LENGTH)))*100) as perc_index from tables) as perc_index, (select ((sum(DATA_LENGTH) / ( sum(DATA_LENGTH) + sum(INDEX_LENGTH)))*100) as perc_data from tables) as perc_data, (select ((sum(INDEX_LENGTH) / ( sum(DATA_LENGTH) + sum(INDEX_LENGTH)))*100) as perc_index from tables where ENGINE='innodb') as innodb_perc_index, (select ((sum(DATA_LENGTH) / ( sum(DATA_LENGTH) + sum(INDEX_LENGTH)))*100) as perc_data from tables where ENGINE='innodb') as …[Read more]
A new version of Kontrollbase – the enterprise monitoring, analytics, reporting, and historical analysis webapp for MySQL database administrators and advanced users of MySQL databases – is available for download. There are several upgrades to the reporting code with improved alert algorithms as well as a new script for auto-archiving of the statistics table based […]