This is the first tutorial of a series dedicated to testing
techniques.
Soon all this material will find a home in a more appropriate
place. In the meantime, enjoy the read!
When testing large tables, it is useful to check whether two
tables have the same contents. For example, if you want to
compare performance between two tables using different storage
engines, or tables created on different filesystems, you must be
sure that both tables have the same content. Having loaded both
from the same source is no guarantee that the contents are the
same: a mistake, or different SQL modes used during the load, may
result in substantial differences.
General concepts
So, you need to compare two possibly very large tables. There
are several methods available. One is to run a query with a LEFT
OUTER JOIN. However, this method is likely to take a very long
time, or even exhaust your system resources, if your tables are
really large.
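The LEFT OUTER JOIN approach can be sketched as follows. This is an illustrative toy only, using SQLite in place of MySQL; the table and column names (`t1`, `t2`, `id`, `val`) are invented for the sketch:

```python
import sqlite3

# Two small tables standing in for the large ones being compared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'a'), (2, 'B'), (3, 'c');  -- one row differs
""")

# Rows present in t1 but missing (or different) in t2: join on every
# column, then keep the rows for which no match was found.
diff = conn.execute("""
    SELECT t1.id, t1.val
    FROM t1 LEFT OUTER JOIN t2
      ON t1.id = t2.id AND t1.val = t2.val
    WHERE t2.id IS NULL
""").fetchall()
print(diff)  # → [(2, 'b')]
```

For a complete check you would run the join in both directions (t1 against t2, then t2 against t1); on really large tables, each join has to match every row, which is exactly why this method gets expensive.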
…
My new Google family, since I’ve now moved to the US: Ronald, James, Michael, Shona and Kynan (front). James has literally been here in the US only a few days more than me, and he has an amazing Day 1 story.
Apparently there are a reasonable number of Australians now working for Google. Another interesting fact is that the male/female ratio is 6:1, which is extremely high in comparison to the IT industry and IT-related university studies.
So, mid morning, especially after a heavy and late night of
drinking with new friends in Palo Alto, I was seeking a
high-caffeine pick-me-up drink at Day #3 of the MySQL Camp.
Yesterday I had a Bawls, and after enjoying it I was a little
concerned that when I returned to New York I would not be able to
buy it. You can get it at Think Geek, but that’s more complicated
than a local supermarket.
So after getting a Googler to take me to the cafe fridge, we found out that there weren’t any there. No problem, let’s just go this way, I’m told. So we started a quick tour of the larger cafe area and another set of fridges, but no Bawls; we kept walking, still no more. At this point the recommendation was that I should try a Rockstar, but so …
[Read more]
Today I noticed that one of the servers used for logging web
request profiling stats is taking about 2GB per day for logs,
which are written to a MyISAM table without indexes. So I thought
it would be a great chance to see how much the ARCHIVE storage
engine could help me in this case.
Original Table:
PLAIN TEXT SQL:
*************************** 1. row ***************************
           Name: requests_061111
         Engine: MyISAM
        Version: 9
     Row_format: Dynamic
           Rows: 8048913
 Avg_row_length: 252
    Data_length: 2030206088
Max_data_length: 4294967295
   Index_length: 1024
      Data_free: 0 …
As you probably know, the PHP "mysql" extension supported persistent connections, but they were disabled in the new "mysqli" extension, which is probably one of the reasons some people delay migration to this extension.
The reason for using persistent connections is, of course, to
reduce the number of connects, which are rather expensive, even
though they are much faster with MySQL than with most other
databases.
Not only are connects expensive, but you may also run into trouble establishing the number of connections you need. The problem is that there can be only so many active connections between host "Apache" and host "MySQL" port 3306, as a connection in the TCP/IP protocol is identified by a pair of IP addresses and a pair of ports (local port and remote port). Yes, if you're establishing thousands of connections per second you normally do not keep them open for a long time, but the operating system does. According to the TCP/IP protocol, ports can't be recycled …
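The limit described above can be estimated with back-of-the-envelope arithmetic. The numbers below are typical Linux defaults, not measurements from the article: an ephemeral port range of 32768-61000 and a TIME_WAIT of 60 seconds.

```python
# One client IP talking to one server IP:port can use at most this
# many distinct local (ephemeral) ports at once.
ephemeral_ports = 61000 - 32768 + 1   # 28233 with default Linux settings

# A closed socket lingers in TIME_WAIT before its port can be reused.
time_wait_seconds = 60

# So a single Apache host -> MySQL host:3306 pair can sustain only
# roughly this many new connections per second before ports run out.
max_connects_per_second = ephemeral_ports // time_wait_seconds
print(max_connects_per_second)  # → 470
```

A few hundred connects per second sounds like a lot until an unpersistent-connection PHP farm hits it, which is the scenario the paragraph above is warning about.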
[Read more]
Sitting here listening to a talk on Tags at MySQL Camp I keep
rolling back to the idea that a lot of website problems right now
are highly similar.
Let's take the case of application partitioning. If you want to
build a large application you need to come up with a way to
partition it... how to do this? Do it by user? Do it by time? Do
it by some other object?
The problem is dependent on the application.
What surprises me is that no one has glued together a single
solution to these types of problems. I know how to take apart
most applications as an engineer in this problem space, but what
I don't see is a framework that simplifies this in a common way
(idea for Rails?).
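The "partition by user" option from the list above is often just a stable hash. A minimal sketch, where the shard names and user ids are invented for illustration (a real deployment would also need a directory service or consistent hashing so shards can be added later):

```python
import hashlib

# Hypothetical shard list; in practice these would be database hosts.
SHARDS = ["db1", "db2", "db3", "db4"]

def shard_for_user(user_id: str) -> str:
    """Map a user id to a shard with a stable hash.

    Uses md5 rather than Python's built-in hash(), because hash()
    is randomized between interpreter runs and would move users
    to different shards on every restart.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always lands on the same shard.
print(shard_for_user("alice") == shard_for_user("alice"))  # → True
```

Partition by time or by some other object is the same shape of function with a different key, which is part of why a common framework for this feels plausible.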
The other surprise? Yesterday I made an offhand comment about an
"image server", aka creating a farm of machines to just serve
images / static content. I've solved this problem enough times
that I thought that the solution …
Early on Sunday Day #3, I dragged a few willing participants out
for a “different photo” based on the umbrellas in each Lobby. It
worked out well. Special thanks to Kynan who ran around to other
lobbies to find additional umbrellas. (He is the one holding the
white one, and yes, that’s a utility kilt).
I’ll be uploading more in this series to My Flickr Photos MySQL Camp 01 soon.
You can get a larger copy of the image here.
MySQL Replibeertion was the last scheduled session on Day 2, but notwithstanding the free beer (a lot of it), there was a serious side with a Replication Discussion.
One of the first questions by Jeremy was “Are there any big replication users?”, to which Sheeri quickly replied “Are you calling me fat again?”.
This was a highly interactive session, here are some of the points from the audience.
Some Uses of Replication
- Backup
- Hot standby
- Scaling
- Data Warehousing
- Slaves are larger than your database
- For no special reason
- Consolidation of multiple sources
- Support for multiple indexes
Issues
- Can break
- replication lag
- bi-directional …
Continuing on from my Day 1 - Memorable Quotes from the MySQL Camp.
“Are there any big replication users?” — Jeremy. “Are you calling me fat again?” — Sheeri
“Only some of us have problems with interruptions.” — Jeremy to Jay
“It really really sucks in production systems.” — Jeremy About Slave management by Master.
“So there are like 12 people here, it must be the CEO’s turn to talk.” — Marten Mickos MySQL CEO
“Kegs and Eggs” — Joel S. Regarding all beer that will still be available at breakfast tomorrow.
“You can fight to the death for it”, Jeremy to his two employees Joel and Justin about who gets to be called employee #1.
“Patches go to employee #1” — Ronald, directed to Joel when a replication patch was coined by Jeremy and …
[Read more]
One thing I can see with people using EXPLAIN is trusting it too much, i.e. assuming that if the number of rows reported by EXPLAIN is large, the query must be inefficient. That may not be the case.
It is not only a question of statistics, which may be wrong, and which is why you may want to profile your queries if you have any doubts about EXPLAIN's accuracy.
The other problem, however, is that EXPLAIN does not take LIMIT into account while estimating the number of rows. Basically, it gives you the estimate for producing the whole result set, while execution may stop much sooner when LIMIT is used.
Take a look at this simple example:
PLAIN TEXT SQL:
mysql> EXPLAIN SELECT * FROM rnd ORDER BY r DESC LIMIT 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
…
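The early-exit effect that EXPLAIN's row estimate misses can be shown with a toy outside MySQL. This sketch assumes the rows arrive already in ORDER BY order (as they would from a suitable index), so execution can stop as soon as LIMIT rows have been produced; the row count is invented for illustration:

```python
# Count how many rows execution actually touches under LIMIT 1,
# versus the 100,000-row "whole result set" a planner would estimate.
examined = 0

def scan_in_index_order(rows):
    """Yield rows one at a time, as an index scan would, counting them."""
    global examined
    for row in rows:
        examined += 1
        yield row

rows = sorted(range(100_000), reverse=True)  # stands in for ORDER BY r DESC

limit, result = 1, []
for row in scan_in_index_order(rows):
    result.append(row)
    if len(result) >= limit:  # LIMIT reached: stop scanning immediately
        break

print(examined)  # → 1 row examined, not 100,000
```

If instead the server had to sort the whole table first (a filesort, with no usable index), all rows really would be touched, which is why the same LIMIT query can be cheap or expensive depending on the plan.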