- Defragging the Stimulus -- each [recovery] site has its own silo of data, and no site is complete. What we need is a unified point of access to all sources of information: firsthand reports from Recovery.gov and state portals, commentary from StimulusWatch and MetaCarta, and more. The post suggests that Recovery.gov should be the hub for this presently decentralised pile of recovery data.
- Memetracker -- the site accompanying the research written up by the New York Times as "Researchers at Cornell, using powerful computers and clever algorithms, studied the news cycle by looking for repeated phrases and tracking their appearances on 1.6 million …"
While preparing the Percona-XtraDB template to run in the RightScale environment, I noticed that IO performance on EBS volumes in the EC2 cloud is not quite perfect, so I have spent some time benchmarking volumes. The interesting part with EBS volumes is that each one shows up as a device in your OS, so you can easily make a software RAID from several volumes.
So I created 4 volumes (I used an m.large instance), and made:
RAID0 on 2 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 2 -l 0 /dev/sdj /dev/sdk
RAID0 on 4 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 4 -l 0 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
RAID5 on 3 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 3 -l 5 /dev/sdj /dev/sdk /dev/sdl
RAID10 on 4 volumes in two steps:
mdadm -v --create /dev/md0 --chunk=256 --level=raid1 --raid-devices=2 …
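As a hedged sketch only (the filesystem and the benchmark tool below are my assumptions and are not named in the post), one way to put a filesystem on the resulting array and run a quick random-IO test against it:
mkfs.ext3 /dev/md0                    # create a filesystem on the software RAID device
mkdir -p /mnt/ebsraid && mount /dev/md0 /mnt/ebsraid
cd /mnt/ebsraid
sysbench --test=fileio --file-total-size=16G prepare                                     # lay out the test files
sysbench --test=fileio --file-total-size=16G --file-test-mode=rndrw --max-time=300 run   # mixed random read/write for 5 minutes
sysbench --test=fileio --file-total-size=16G cleanup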
On our road to a MySQL Proxy GA release, there are quite a few things that need attention, one of them being multithreading.
We are actively working on release 0.8, which brings the first stage of multithreading to the table: Enabling worker threads to handle network events in parallel. The code has been available for quite some time now and we’ve started to implement several performance benchmarks to track our progress, catch regressions and deficiencies early on.
Benchmarking is an interesting field, especially since you can screw up so easily :)
To avoid making mistakes and to spend less time reinventing the wheel, we are doing the same as we are doing with the code: …
[Read more]
[Read more]
Only the other day I was talking with someone who does a lot of work on the shell command line but hadn't used the GNU screen tool, so I'd better scribble a post about it, as I regard it as an absolute must-have for any remote work, for multiple reasons.
First of all, what screen does. You start screen inside a terminal session (local or SSH remote), and then you can create additional screens through Ctrl-A C. The initial screen is number 0, the next one 1, and so on. You can switch between screens with Ctrl-A # where # is the screen number. This way, you can have multiple things going on within a single ssh connection, which is very handy. But that's not all!
If you get disconnected (it happens) and you reconnect, your screen sessions will still be there, and running too. You can reattach with screen -r. To do a nice disconnect, you can do Ctrl-A D (detach) before …
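As a minimal sketch of that workflow (the session name "work" is just an example, not from the post):
screen -S work      # start a new, named screen session
# Ctrl-A C          create an additional screen
# Ctrl-A 0 / 1 / 2  switch between screens by number
# Ctrl-A D          detach, leaving everything running on the server
screen -ls          # list sessions after reconnecting over SSH
screen -r work      # reattach to the named session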
[Read more]
Tokutek® announces the release of the TokuDB storage engine for MySQL®, version 2.1.0. This release offers the following improvements over our previous release:
- Faster indexing of sequential keys.
- Faster bulk loads on tables with auto-increment fields.
- Faster range queries in some circumstances.
- Added support for InnoDB.
- Upgraded from MySQL 5.1.30 to 5.1.36.
- Fixed all known bugs.
About TokuDB
TokuDB for MySQL is a storage engine built with Tokutek’s Fractal Tree technology. TokuDB provides near-seamless compatibility for MySQL applications. Tables can be individually defined to use TokuDB, MyISAM, InnoDB® or other MySQL-compliant storage engines. Data is loaded, inserted, and queried using standard MySQL commands, with no restrictions or special requirements. …
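As a hedged illustration of that per-table choice (the table name and columns below are made up for this example, not taken from the announcement):
mysql -e "CREATE TABLE example_events (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, payload VARCHAR(255)) ENGINE=TokuDB;" test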
[Read more]
Note: This blog post is part 1 of 4 on building our training workshop.
The Percona training workshop will not cover sharding. If you follow our blog, you'll notice we don't talk much about the subject; in some cases it makes sense, but in many cases we've seen it complicate architectures prematurely.
So let me state it: You don't want to shard.
Optimize everything else first, and then if performance still isn't good enough, it's time to take a very bitter medicine. The reason you need to shard basically comes down to one of …
[Read more]
As well as contributing to the CAOS research practice here at The 451 Group, I am also part of the information management team, with a focus on databases, data caching, CEP, and - from the start of this year - data warehousing.
I’ve covered data warehousing before, but taking a fresh look at this space in recent months, it’s been fascinating to see the variety of technologies and strategies that vendors are applying to the data warehousing problem. It’s also been interesting to compare the role that open source has played in the data warehousing market with the role it has played in the database market.
I’m preparing a major report on the data warehousing sector, for publication in the next couple of months. What follows is a rough outline of the role open source has played in the sector. Any comments or corrections much appreciated:
Unlike other …
[Read more]
I've been doing a little more playing with Cassandra, an open source distributed database. It has several features which make it very compelling for storing large data sets with a lot of writes:
- Write-scaling - adding more nodes increases write capacity
- No single point of failure
- Configurable redundancy
And the most important:
- Key range scans
Key range scans are really important because they allow applications to do what users normally want to do:
- What emails did I receive this week?
- Give me all the transactions for customer X in time range Y.
Answering these questions without range scans is extremely difficult; with efficient range scans they become fairly easy (provided you pick your keys right).
…
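For illustration only (the key layout below is hypothetical, not from the post), one way to "pick your keys right" for the email question is to build each key from the user plus a sortable timestamp, so one week's messages form a single contiguous key range:
user42:2009-07-13T08:15:00:msg-0001
user42:2009-07-15T11:02:00:msg-0002
user42:2009-07-18T22:47:00:msg-0003
A range scan from user42:2009-07-13 to user42:2009-07-20 then returns exactly that user's messages for the week, without touching any other keys.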