- Apertus -- open source cinema camera. (via joshua on Delicious)
- A Survey of Collaborative Filtering Techniques -- From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area. (via bos on Delicious)
- Drizzle Replication using RabbitMQ as Transport -- we're watching the growing use of message queues in web software, and here's an interesting application. (via …
- ChipHacker -- collaborative FAQ site for electronics hacking. Based on the same StackExchange software as RedMonk's FOSS FAQ for open source software.
- Democracy Live -- BBC launch searchable coverage of parliamentary discussion, using speech-to-text. One aspect we're particularly proud of is that we've managed to deliver good results for speech-to-text in Welsh, which, we're told, is unique. I think of this as the start of a They Work For You for video coverage. I'd love to be able to scale this to local government coverage, which is disappearing as local newspapers turn into …
People often ask “what’s the best hardware to run a database on?” And the answer, of course, is “it depends”. With MySQL, though, you can get good performance out of almost any hardware.
If you need *great* performance, and you have active databases with a large data set, here are some statistics on real life databases — feel free to add your own.
We define “large data set” as over 100 GB, mostly because smaller data sets fit more comfortably in the available memory on a machine (even if it’s only 8 GB), and backups are less intrusive. InnoDB Hot Backup and Xtrabackup are not really “hot” backups; they are “warm” backups, because copying the data files puts load on the machine, and on large, active servers we have found that this load impacts query performance. As for how active a database is, we’ve found that equates to a peak production load of over 3,000 queries per second on a transactional …[Read more]
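To see where a server sits relative to that 3,000 queries/second mark, you can sample MySQL's cumulative `Questions` counter twice and divide the delta by the elapsed time. A minimal sketch (the counter values below are made up for illustration; in practice you'd read them from `SHOW GLOBAL STATUS`):

```python
import time

def qps(sample_a, sample_b):
    """Estimate queries/sec from two (questions_counter, timestamp) samples.

    MySQL's 'Questions' status variable is a cumulative counter, so the
    rate over an interval is simply the delta divided by elapsed seconds.
    """
    (q_a, t_a), (q_b, t_b) = sample_a, sample_b
    return (q_b - q_a) / (t_b - t_a)

# Hypothetical counter readings taken 10 seconds apart:
t0 = time.time()
rate = qps((1_000_000, t0), (1_032_000, t0 + 10))
print(rate)  # 3200.0 -- above the 3,000 q/s "active" threshold
```

Sampling at peak load, not off-hours, is what matters for the "active database" definition above.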
After the previous post in this caching series I received many questions about the hardware and software configuration of our servers, so in this post I’ll describe our servers’ configs and the motivation behind them.
Since in our setup the Squid server uses a one-process model (with asynchronous request processing), there was no point in ordering multi-core CPUs for our boxes, and since we have a lot of pages on the site and the cache is pretty huge, all the servers ended up being highly I/O bound. Considering these facts, we decided to use the following hardware specs for the servers:
- CPU: one pretty cheap dual-core Intel Xeon 5148 (no need for multiple cores or really high frequencies; even these CPUs have ~1% avg load)
- RAM: 8 GB (basically …
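The reasoning above (tiny CPU load, busy disks, so spend on RAM and disks rather than cores) can be captured as a simple rule of thumb. A toy classifier of my own; the thresholds are illustrative assumptions, not from the original post:

```python
def bottleneck(cpu_util, disk_util, cpu_thresh=0.25, disk_thresh=0.70):
    """Classify a box as CPU-bound or I/O-bound from utilization fractions.

    The ~1% average CPU load quoted above, paired with busy disks,
    lands squarely in the 'io-bound' bucket -- hence a cheap dual-core
    CPU and the budget going to RAM and storage instead.
    """
    if disk_util >= disk_thresh and cpu_util < cpu_thresh:
        return "io-bound"
    if cpu_util >= cpu_thresh and disk_util < disk_thresh:
        return "cpu-bound"
    return "mixed"

print(bottleneck(0.01, 0.90))  # io-bound, like the Squid boxes described
```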
So let’s test some different configurations and try to build some best practices around multiple SSDs:
Which is better? RAID 5 or RAID 10?
As with regular disks, RAID 10 seems to perform better (except for pure reads). I did see a lot of movement from test to test, as with the 67% read test vs. the 75% or 80% tests. But all in all, RAID 10 seemed to be the optimal config.
Should you enable the controller cache? One of the things I have found in my single-drive tests is that “dumb” controllers tend to give better performance numbers than “smart” controllers. Really expensive controllers tend to have extra logic to compensate for the limitations of traditional disks. So I decided to play with some of the controller options. The obvious one is the cache on the controller.
Some tests showed substantially better performance when the disk cache was disabled (both read & write).
If better …[Read more]
So in my previous post I showed some benchmarks with a large drop-off in performance when you fill the X-25E. I wanted to follow up and say this: even if you do everything correctly (i.e., leave 50%+ space free, disable controller cache, etc.) you may still see a drop in performance if your workload is heavily write skewed. To show this I ran a 100% random read sysbench fileio test over a 12GB dataset (37.5% full); the tests were run back-to-back over several hours. Here is what we see:
*Note: the scale is a little skewed here (I start at 2,500 reqs).
Each data point represents 2 million IOs, so somewhere after about 6 million IOs we start to drop. At the end it looks like we stabilize around 2,900-3,000 requests per second, an overall drop of about 25%.
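The numbers above are easy to sanity-check: a 12GB dataset is 37.5% of a 32GB drive, and a ~25% drop that bottoms out near 2,950 req/s implies a starting rate of roughly 3,900 req/s. A quick check (the 32GB capacity and the starting rate are inferred from the figures quoted, not stated outright in the post):

```python
# Fill fraction: 12GB dataset on a 32GB drive
fill = 12 / 32
print(f"{fill:.1%}")  # 37.5%, matching the "37.5% full" quoted above

# If throughput stabilizes around 2,950 req/s after a ~25% drop,
# the implied starting rate is:
stable, drop = 2950, 0.25
initial = stable / (1 - drop)
print(round(initial))  # ~3933 req/s before the drop
```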
The plan was only to do two quick posts on RAID performance on the X-25E, but this was compelling enough to post on its own. So, in part I, Mark Callaghan asked: hey, what gives with the SLC Intel’s single-drive random write performance? It’s lower than the MLC drive. To be completely honest with you, I had overlooked it; after all, I was focusing on RAID performance. This was my mistake, because it is actually caused by one of the Achilles’ heels of most flash on the market today: crappy performance as you fill more of the drive. I don’t really know what the official title for it is, but I will call it “Drive Overeating”.
Let me try and put this simply: a quick trick most vendors use to push better random write numbers and help wear leveling is to not …[Read more]
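Although the post is truncated here, the effect it describes is commonly explained in terms of spare area: the less of the drive you fill, the more blocks the controller has free for wear leveling and garbage collection. A simplified illustration of my own (real drives also reserve hidden over-provisioned space, which this toy model ignores):

```python
def spare_fraction(used_gb, capacity_gb):
    """Fraction of user-visible capacity the controller can treat as spare.

    Simplified model: free user space serves the same role as factory
    over-provisioning -- room for garbage collection and wear leveling.
    """
    return 1 - used_gb / capacity_gb

# The 'leave 50%+ free' rule of thumb keeps spare area at >= 0.5;
# filling the drive pushes it toward zero, and write performance with it.
print(spare_fraction(12, 32))  # 0.625 -- the 37.5%-full test case above
print(spare_fraction(30, 32))  # 0.0625 -- nearly full, little room for GC
```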
Everyone loves SSD. It’s a hot topic all around the MySQL community, with vendors lining up all kinds of new solutions to attack the “disk IO” problem that has plagued us all for years and years. At this year’s user conference I talked about SSDs and MySQL. Those who follow my blog know I love IO and I love to benchmark anything that can help overcome IO issues. One of the most exciting things out there at this point are the Intel X-25E drives. These bad boys are not only fast but relatively inexpensive. How fast are they? Let’s just do a quick bit of review here and peek at the single-drive numbers from sysbench. Here you can see that a single X-25E outperforms all my other single-drive tests.
Yep, you have probably seen this type of chart on other sites… The great thing about the Intel drives is their performance on writes; this difference gives …[Read more]
Pre-UC I put out a teaser on some dbt2 scores in the 50K range. I mentioned and showed the graphs during my SSD session, but I thought I would show them here for those who skipped the UC or did not attend my session. Basically, what most people consider to be a classic “CPU-bound” workload, where all of your data easily fits into memory, can also see benefits from moving to SSDs. Remember, just because everything fits into memory doesn’t mean you’re not going to be doing some operations to disk (logging, flushes, etc.). Take a look:
| Configuration | NOTPM | Improvement |
|---|---|---|
| Regular Disk BBU (5.1.33) | 46106.44 | NA |
| SSD WO/Drive Cache (5.1.33) | 50606.82 | 9.76% |
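The improvement column is just the relative gain in NOTPM; verifying it from the two scores:

```python
regular, ssd = 46106.44, 50606.82
gain = (ssd - regular) / regular
print(f"{gain:.2%}")  # 9.76% -- matching the table
```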
Take a look here:
```
                       Response Time (s)
 Transaction      %    Average : 90th %      Total    Rollbacks      %
------------  -----  -------------------  ---------  -----------  -----
    Delivery   3.98      0.211 : 0.266       274829            0   0.00
   New Order  44.78      0.157 : 0.187      3090951        30925   1.00
Order Status   3.99      0.149 : 0.179       275357            0   0.00
     Payment  42.76      0.150 : 0.180      2951361            0   0.00
 Stock Level   3.99      0.152 : 0.182       275564        92070  33.41

50606.82 new-order transactions per minute (NOTPM)
60.5 minute duration
0 total unknown errors
31 second(s) ramping up
```
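The rollback percentage columns in this output are just rollbacks divided by totals; checking the arithmetic for the two rows that roll back:

```python
# Rollback rates from the dbt2 summary above
stock_level = 92070 / 275564
new_order = 30925 / 3090951
print(f"{stock_level:.2%}")  # 33.41% (Stock Level row)
print(f"{new_order:.2%}")    # 1.00%  (New Order row)
```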
If you know what this output is from, and you know what 50K TPM means… you’re probably curious about these numbers. I am probably tantalizing you right now, in fact. But I am not going to tell you more, not yet. So go ahead and guess. Better yet …[Read more]