Showing entries 1 to 10 of 22
10 Older Entries »
Displaying posts with tag: Statistics (reset)
Big Dataset: All Reddit Comments – Analyzing with ClickHouse

In this blog, I’ll use ClickHouse and Tabix to look at a new very large dataset for research.

It is hard to come across interesting datasets, especially a big one (and by big I mean one billion rows or more). Before, I’ve used on-time airline performance available from BUREAU OF TRANSPORTATION STATISTICS. Another recent example is NYC Taxi and Uber Trips data, with over one billion records.

However, today I wanted to mention an interesting dataset I found recently that has been available since 2015. This is Reddit’s comments and submissions dataset, made possible thanks to Reddit’s generous API. The …

[Read more]
Updating InnoDB Table Statistics Manually

In this post, we will discuss how to fix cardinality for InnoDB tables manually.

As a support engineer, I often see situations when the cardinality of a table is not correct. When InnoDB calculates the cardinality of an index, it does not scan the full table by default. Instead it looks at random pages, as determined by options innodb_stats_sample_pages, innodb_stats_transient_sample_pages and innodb_stats_persistent_sample_pages, or …

[Read more]
Multi-Threaded Slave Statistics

In this blog post, I’ll talk about multi-threaded slave statistics printed in MySQL error log file.

MySQL version 5.6 and later allows you to execute replicated events using parallel threads. This feature is called Multi-Threaded Slave (MTS), and to enable it you need to modify the

slave_parallel_workers

 variable to a value greater than 1.

Recently, a few customers asked about the meaning of some new statistics printed in their error log files when they enable MTS. These error messages look similar to the example stated below:

[Note] Multi-threaded slave statistics for channel '': seconds elapsed = 123; events assigned = 57345; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 0 waited …
[Read more]
Some Notes on Index Statistics in InnoDB

In MySQL 5.6 we introduced a huge improvement in the way that index and table statistics are gathered by InnoDB and subsequently used by the Optimizer during query optimization: Persistent Statistics. Some aspects of the way that Persistent Statistics work could be improved further though, and we’d really like your input on that.

How much to sample?

The statistics are gathered by picking some pages semi-randomly, analyzing them, and deriving some conclusions about the entire table and/or index from those analyzed pages. The number of pages sampled can be specified on a per-table basis with the STATS_SAMPLE_PAGES clause. For example:

ALTER TABLE t STATS_SAMPLE_PAGES=500;


This …

[Read more]
What the Mean Really Means

When analyzing response time, or latency, you need much more information than an average provides. The average, commonly the arithmetic mean, shows the index of central tendency. But, as I found in earlier posts, the tendency is often not central, but may be skewed by outliers, or split by multiple modes. How often these factors occur was determined quantitatively, using tests and a survey of hundreds of production servers and different types of latency: over 95% had six-sigma outliers, and at least 20% had multiple modes. While these numerical results are useful, nothing beats a visualization, such as a histogram, …

[Read more]
Modes and Modality

It is a truth universally acknowledged that the average is the index of central tendency. But what if the tendency isn’t central?

I’ve worked many performance issues where the latency or response time was multimodal, and higher-latency modes turned out to be the cause of the problem. Their existence isn’t shown by the average – the arithmetic mean; it could only be seen by examining the distribution as a histogram, density plot, heat map, or frequency trail. Once you know that more than one mode is present, it’s often straightforward to determine what causes the slower mode, by seeing what parameters of …

[Read more]
Fun with Bugs #14 - InnoDB in MySQL 5.6

InnoDB improvements in MySQL 5.6 are well known. One of the key reasons to upgrade to MySQL 5.6 for most users is to get the benefits of improved performance, scalability, new monitoring features and fulltext indexes support in InnoDB.

Is there anything to double check before assuming that InnoDB in MySQL 5.6 is just better than any older version for any practical purposes? Let's review known public InnoDB-specific bug reports. Here is my "Top 10" list, as of MySQL 5.6.12, starting with most recent reports:

  1. Bug #69424  - maybe I miss something (I am not the only one though), but I see no way to continue using raw devices (on Linux at least) to store InnoDB data. You had working raw device in 5.5.32, then you upgrade to 5.6.12 and just can not start MySQL any more. Check this bug for the details and maybe you'll find …
[Read more]
Statistical functions in MySQL

Even in times of a growing market of specialized NoSQL databases, the relevance of traditional RDBMS doesn't decline. Especially when it comes to the calculation of aggregates based on complex data sets that can not be processed as a batch like Map&Reduce. MySQL is already bringing in a handful of aggregate functions that can be useful for a statistical analysis. The best known of this type are certainly:

Read More »

Statistical functions in MySQL

Even in times of a growing market of specialized NoSQL databases, the relevance of traditional RDBMS doesn't decline. Especially when it comes to the calculation of aggregates based on complex data sets that can not be processed as a batch like Map&Reduce. MySQL is already bringing in a handful of aggregate functions that can be useful for a statistical analysis. The best known of this type are certainly:

Read More »

Saving time and increasing service availability with MySQL Enterprise Monitor 2.3

Nowadays MySQL Databases are encapsulated into many mission critical software solutions. Lots of companies host one or many MySQL databases in their data center, sometimes even without knowing it except when the MySQL service is no more available. In order to increase this service availability it is mandatory to have a monitoring solution. Regardles of you are using MySQL Server, MySQL replication or cluster the Oracle/MySQL monitoring solution is called MySQL Enterprise Monitor.

Showing entries 1 to 10 of 22
10 Older Entries »