And the results are in:
InnoDB (no compression/original) - 660GB
RocksDB - 209GB
TokuDB (snappy) - 144GB
TokuDB (LZMA) - 67GB
Benchmark performance with mysqlslap on production sample queries:
(8-9 very quick SELECTs + 1-2 medium SELECTs)
Avg: 0.100 seconds
Min: 0.091 seconds
Max: 0.182 seconds
Avg: 0.100 seconds
Min: 0.089 seconds
Max: 0.183 seconds
Avg: 0.113 seconds
Min: 0.104 seconds
Max: 0.164 seconds
Avg: 0.099 seconds
Min: 0.090 seconds
Max: 0.155 seconds
Things I look for when optimising or debugging a Linux OS:
- I/O scheduler (noop or deadline)
- Linux kernel > 3.18 (multi-queue block layer)
- irqbalance > 1.0.8
- File system: noatime, nobarrier
- ext4: data=ordered
- xfs: 64k
- Log files on a different partition (if possible)
- Swappiness (0 or 1, depending)
- jemalloc (if needed)
- Transparent hugepages - disabled
- ulimit (open files) > 1000
- PAM security
- Raid Controller/Smart HBA
- write-cache enabled
- battery backed
- For HP servers: hpssacli controller all show (detail|status)
- Tweak cache-ratio to 50/50 or 75/25 (and test)
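A few of the settings in the checklist above can be audited quickly on a running box. The sketch below is a minimal, Linux-only example; the exact /sys paths can vary by kernel and distro, and it only reads settings, never changes them.

```python
# Quick audit of a few of the settings above (Linux-only; paths may vary).
import glob
import resource
from pathlib import Path

def read_setting(path):
    """Read a /proc or /sys setting, or return 'n/a' if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "n/a"

print("swappiness:", read_setting("/proc/sys/vm/swappiness"))
print("transparent hugepages:",
      read_setting("/sys/kernel/mm/transparent_hugepage/enabled"))

# The active I/O scheduler is shown in [brackets] per block device.
for sched in sorted(glob.glob("/sys/block/*/queue/scheduler")):
    print(sched, "->", read_setting(sched))

soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files (soft limit):", soft_limit)
```

The same values can of course be checked by hand with `sysctl` and `cat`; a script like this is just handy when you audit many hosts.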
Update: I included the results for when PCID is disabled, for comparison, as a worst-case scenario.
After learning about Meltdown and Spectre, I waited patiently for a fix from my OS vendor. However, there were several reports of performance impact due to the kernel mitigations: on the PostgreSQL developers mailing list, for example, there were reports of up to 23% throughput loss; Red Hat engineers reported a regression range of 1-20%, identifying OLTP systems as the worst-affected workload. As the impact will be highly dependent on the hardware and workload, I decided to do some tests myself for the …[Read more]
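One hardware detail that matters here is whether the CPU supports PCID, which reduces the TLB-flush overhead of the KPTI (Meltdown) mitigation. A minimal, Linux-only check, reading the flags line of /proc/cpuinfo:

```python
# Check whether the CPU advertises PCID, which lowers the cost of the
# KPTI/Meltdown mitigation. Linux-only: parses /proc/cpuinfo.
def cpu_has_pcid():
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return "pcid" in line.split()
    except OSError:
        pass  # not Linux, or /proc unavailable
    return False

print("PCID supported" if cpu_has_pcid() else "PCID not supported")
```

The one-liner equivalent is `grep -w pcid /proc/cpuinfo`.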
Last year, I read 'The Phoenix Project'. I liked it, and having
been an IT manager in the past, I did experience high blood
pressure during the SEV1 scenarios in the book.
I also liked the way DevOps methodology helped solve issues with IT as well as help the company succeed overall.
As a DBA, however, some things about this story didn't make sense to me. Bear in mind that the two major incidents in the book were database related. So in this post, I would like to jot down some things I have noticed and how they could have been solved when looked at through a different lens.
Caution, Spoiler Alert
Incident No.1 - Tokenisation
In the first incident, a 3rd party supplier ran a script …
When I consult at a company, I aim to identify issues with their
database and give options on how to solve them.
However, sometimes implementing those solutions may be a more lengthy process than it needs to be and sometimes they may not be implemented at all. During my career, I have observed some reasons as to why that might happen within organizations.
Obviously, the following observations will never happen at your company. I am just writing about them so that you might notice them in other places.
1. Legacy code
People don't like to have anything to do with legacy code. It's painful. It's difficult. It's risky to change. It runs business-critical functions. Worst of all, they didn't write it. This can be a problem, as the most crippling database issues often require changes to legacy code.
2. New Technologies or Methods
People don’t like you to introduce any …
Setting up databases in development environments can be
challenging.
What I usually see is some automated process for setting up empty databases with up-to-date data structures. This is helpful for integration testing, but is tricky for actual development as well as for performance testing.
- It is difficult to conceptually get your head around writing a query when you cannot see any data in your tables
- You cannot possibly know whether your query is slow before deploying it to production without running it against 'some' data.
Relevant Post: How to Not be the One that Deploys that Slow Query
In addition, there can be a strict requirement to not let sensitive customer data be available outside certain secure environments and …
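One practical way around the empty-database problem is to seed development tables with synthetic rows, so queries can at least be written and timed against "some" data without touching sensitive customer records. A minimal sketch, using sqlite3 purely for illustration; the table and column names are invented for this example:

```python
# Seed a dev table with synthetic rows so queries run against "some" data.
# sqlite3 stands in for the real database here; schema is illustrative.
import random
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created TEXT)"
)

# Generate 1000 rows of random-but-plausible data (no real customer data).
rows = [
    (i, "".join(random.choices(string.ascii_lowercase, k=8)), "2017-06-01")
    for i in range(1000)
]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # prints 1000
```

For performance testing you would scale the row counts toward production cardinalities, since optimizer behaviour changes with table size.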
A counter table is a table that keeps counts of particular items
or for certain keys. This can range from page count on your blog
to keep track of a limit the user is allowed to have from a
particular item or service.
Usually, a counter table would be better kept in something like Memcached or Redis as frequent increment updates would be better suited to those in-memory systems.
MySQL, and InnoDB in particular, has many stringent mechanisms to make sure that your data is reliably written to disk. The overhead of those mechanisms alone can make a counter table unsuitable, even before considering the speed of updating the actual table.
However, sometimes there is a need for certain assurances from failure scenarios where in-memory …
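For those cases, the usual shape of a counter table is a single upsert per increment. A minimal sketch, using sqlite3 purely for illustration; in MySQL the equivalent single statement is `INSERT INTO page_counts (page, hits) VALUES (?, 1) ON DUPLICATE KEY UPDATE hits = hits + 1`:

```python
# Minimal counter-table sketch (sqlite3 stands in for MySQL here; the
# table name and schema are invented for this example).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_counts (page TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
)

def increment(page):
    # One round trip: insert the row, or bump the counter if it exists.
    conn.execute(
        "INSERT INTO page_counts (page, hits) VALUES (?, 1) "
        "ON CONFLICT(page) DO UPDATE SET hits = hits + 1",
        (page,),
    )

for _ in range(3):
    increment("/home")

hits = conn.execute(
    "SELECT hits FROM page_counts WHERE page = '/home'"
).fetchone()[0]
print(hits)  # prints 3
```

Every increment is a durable write, which is exactly the fsync-bound cost the paragraph above describes; Redis's `INCR` avoids it by staying in memory.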
There is an idea that data is sacred and needs to be stored
forever. However, if you keep your data forever, you will, sooner
or later, have a very large database.
In order to keep operations running smoothly, it is useful to keep data that is used more frequently in certain tables and data that is used less frequently in archive tables.
You have a large table that stores transactions, and its size is 200GB. It got that way because your company has been around for 5 years, but in the last year it has been successful in acquiring new users, and your data has doubled.
In your database, you now have a table that has 5 years worth of data, but your application usually only needs about the last 1-3 months. There may be a use case where someone might require data about a customer for a period starting a year ago and there may …
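The archiving itself is usually a copy-then-delete of rows older than a cutoff, done in one transaction so the row is never in both places or neither. A minimal sketch, with sqlite3 standing in for MySQL and an invented schema:

```python
# Move rows older than a cutoff from the hot table to an archive table.
# sqlite3 and this schema are illustrative, not from the original post.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, created TEXT, amount REAL);
CREATE TABLE transactions_archive (id INTEGER PRIMARY KEY, created TEXT, amount REAL);
INSERT INTO transactions VALUES
  (1, '2013-01-10', 9.99),
  (2, '2017-11-02', 5.00),
  (3, '2017-12-20', 7.50);
""")

cutoff = "2017-01-01"
with conn:  # copy then delete as a single transaction
    conn.execute(
        "INSERT INTO transactions_archive "
        "SELECT * FROM transactions WHERE created < ?",
        (cutoff,),
    )
    conn.execute("DELETE FROM transactions WHERE created < ?", (cutoff,))

hot = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
cold = conn.execute("SELECT COUNT(*) FROM transactions_archive").fetchone()[0]
print(hot, cold)  # prints 2 1
```

On a real 200GB table you would do this in small batches to avoid long locks and huge transactions, for example with a tool like pt-archiver.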
Have you ever deployed code to production that slowed the
database and the entire site down?
Did you get a feeling of anxiety when you just read that?
Well, it could be to do with your test environment.
Most places nowadays have test suites that will check for integration issues, but very few will check for performance.
The issue lies with how developers are set up to write code. They have their code on their local computer, along with a local copy of an empty database that they develop against. That setup will not give you any useful feedback about how your code will perform once it runs against the production database.
How do you get Performance Feedback for your Queries?
Whenever you log into your database, let's say MySQL, you get 3 types of feedback: