Just a quick heads-up: I just committed a change to the Planet MySQL code that slightly modifies how the RSS feed is created: the author's name is now put in front of each posting's title, similar to how many other feed aggregators mark the different articles. I hope you find this change useful; let me know if you experience any problems or have any other suggestions for improvement. Thanks!
What does working with large data sets in MySQL teach you? Of course you have to learn a lot about query optimization, the art of building summary tables, and tricks for executing queries exactly as you want. I have already written about the development and configuration side of the problem, so I will not go into details again.
Two great things you've got to learn when working with large data sets in MySQL are patience and careful planning. Both relate to a single property of large data sets: they can take a hell of a lot of time to deal with. This may sound obvious if you have some large-data-set experience, but it is not the case for many people; I constantly run into customers who assume it will be quick to rearrange their database or even restore from backup.
You need …
[Read more]
Currently I need to move a bit of data around. I like to use Kettle for this type of work rather than writing custom scripts, for a number of reasons (which I won't discuss here).
Anyway, here is a quick tip I want to share with whomever it may concern. It is not rocket science, and many people may go "duh!", but I hope it will still be useful to others.
Quite often, you need a batch task, like truncating a set of tables, deleting data, dropping constraints, etc. In Kettle, you might model this as a job. In this case, each separate action can be modelled as a step of the job. The following screenshot illustrates this approach:

So here, each step is just an SQL statement that performs exactly the task you want, and the steps are connected in order to …
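Outside of Kettle's GUI, the same "each step is one SQL statement, run strictly in order" idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Kettle itself; the table name and statements are hypothetical, and SQLite stands in for whatever database the job would actually target:

```python
import sqlite3

# Hypothetical batch "job": each list entry corresponds to one job step,
# executed strictly in order, like the connected steps in the Kettle job.
steps = [
    "CREATE TABLE staging_orders (id INTEGER, amount REAL)",
    "INSERT INTO staging_orders VALUES (1, 9.99), (2, 19.99)",
    "DELETE FROM staging_orders WHERE amount < 10",
]

conn = sqlite3.connect(":memory:")
for sql in steps:
    conn.execute(sql)  # each step is just an SQL statement
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM staging_orders").fetchone()[0]
print(remaining)  # → 1
```

The point, in either form, is that ordering is explicit: a step only runs after the previous one has finished.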
I just wanted to point everybody to a recent blog post by Konstantin. In the post he discusses a solution for dealing with cache-invalidation issues of very large caches under heavy load. He points out that cache invalidation can severely bog down the system. The general solution he proposes is to simply deactivate the query cache entirely during invalidation. I think this is an important caveat to be aware of, and he is actually asking for feedback on whether this "solution" is acceptable. I think it's awesome that MySQL engineers are giving us the opportunity to provide feedback on such changes. Maybe there should be a dedicated "pipeline" where such requests could be found?
We next go "In the Trenches" with Kevin Henrikson of Zimbra. Zimbra wasn't the first to build a slick email system with a strong AJAX feel, but it has clearly taken the lead among its peers. The backbone of that position is its engineering team, with Kevin at the heart of the organization.
As it turns out, regardless of all the "sex appeal" that Zimbra has in the market (and it has plenty), Kevin's comments reveal that it's community feedback that makes the company tick. Community feedback and an active engineering team that solicits and acts on that feedback, often in real-time. This is the heart of a successful open source business, and Kevin shows us how it's done.
Name, company, title, and what you actually do
Kevin Henrikson, director of Engineering, Zimbra. I currently manage our client engineering team which develops the Zimbra Advanced Client (AJAX based) and Standard Client (JSP/HTML based), the latter …
[Read more]
It has become obvious that there are just too many people to meet up with, and too many locations to travel to, with so little time to do it all. So setting up a temporary office seems to make the most sense! Those who have emailed me have also received the following in their email.
Where?
Lobby Lounge Restaurant/Cafe
Grand Copthorne Waterfront Hotel
392, Havelock Rd
Singapore
When?
Thursday, July 5 2.30pm - 6pm
Friday, July 6 8am - 11am
What to do if I’m not there?
Just drop me an SMS or a quick call to +6-012-204-3201.
This is in addition to the meetup we’re having. Depending on how my meetings on Friday go, there might be yet another afternoon session available.
Technorati Tags: …
[Read more]
MySQL Toolkit distribution 620 updates documentation and test suites, includes some major bug fixes and functionality changes, and adds one new tool to the toolkit. This article is mostly a changelog, with some added notes. Many of the tools have matured and I just needed to make the documentation top-notch, but there's still a lot to be done on the crucial checksumming and syncing tools. Time is in short supply for me right now, though.
I stumbled across this article in the International Herald Tribune today and was shocked by how far off the mark such an otherwise reputable publication could be. The general tone of the article was that open source is struggling to grow. I'm not sure how 100 percent year-over-year growth for the prominent commercial open-source start-ups connotes "struggling," but....
On one hand, open-source developers are continuing to struggle to find ways to make money from open-source software, most of which is given away.
But the only way to do so is to work closely with their biggest rivals--proprietary software makers like International Business Machines, Microsoft, SAP, Cisco and Oracle--which also have an interest in limiting erosion to their own sales.
Since when? We have a host of open-source companies jockeying to be first out the …
[Read more]
The only man I know who behaves sensibly is my tailor; he takes my measurements anew each time he sees me. The rest go on with their old measurements and expect me to fit them. --George Bernard Shaw
In the ideal world, the operational source would provide a mechanism for identifying changes made to the data since the last extract, also known as change data capture (CDC). The source may offer an update date, a database online-log-scrubbing mechanism, audit logs, etc. for this purpose. In real life, many sources will dump the complete data into a file, and the responsibility for identifying the changes will fall on the data warehouse processes. And even when one of the CDC mechanisms is provided by the source, it may not be reliable enough.

The CDC process is straightforward for transactional data, for example sales transactions. Since the transactions always come with effective dates, the new transactions are …
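When the source only provides full dumps, the warehouse-side fallback described above amounts to diffing two snapshots keyed by primary key. Here is a minimal sketch of that comparison; the function name, keys, and row values are all hypothetical, and real extracts would of course be streamed rather than held in dicts:

```python
def diff_snapshots(old, new):
    """Classify rows as inserts, updates, or deletes by comparing the
    previous full extract with the new one, keyed by primary key."""
    inserts = [pk for pk in new if pk not in old]
    deletes = [pk for pk in old if pk not in new]
    updates = [pk for pk in new if pk in old and new[pk] != old[pk]]
    return inserts, updates, deletes

# Previous and current dumps, keyed by primary key (hypothetical data).
old = {1: ("alice", "NY"), 2: ("bob", "SF")}
new = {1: ("alice", "LA"), 3: ("carol", "TX")}

ins, upd, dele = diff_snapshots(old, new)
print(ins, upd, dele)  # → [3] [1] [2]
```

The comparison itself is cheap; the expensive part in practice is sorting or hashing the full dumps so the row-by-row match-up is feasible at scale.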
As mentioned before, since FooCamp I've been having ideas around queue services:
http://krow.livejournal.com/531369.html
http://krow.livejournal.com/530752.html
I've been thinking about this a bit more, and instead of working on the concept of a straight queue mechanism (like what Oracle has), I've been thinking more about how web services handle this, in particular services like Amazon's.

Instead of a flat queue structure, shoot for a temporal queue. A range select should force rows to go away for a set period of time, until the timer runs out. This gives the processing application time to deal with the row, and if it doesn't make it back in time, the row should reappear to go back in the …
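The "rows go away for a set period, then reappear" behaviour described here is what queue services like Amazon's call a visibility timeout. A toy in-memory sketch (my own illustration, not the proposed MySQL mechanism; names and the 30-second window are assumptions) might look like this:

```python
import time

class TemporalQueue:
    """Toy sketch of a temporal queue: fetching a row hides it for
    `visibility` seconds; if it isn't deleted in time, it reappears."""

    def __init__(self, visibility=30.0):
        self.visibility = visibility
        self.rows = {}          # row id -> payload
        self.hidden_until = {}  # row id -> deadline after which it reappears

    def put(self, row_id, payload):
        self.rows[row_id] = payload

    def fetch(self, now=None):
        """Return the first visible row and hide it for the timeout window."""
        now = time.monotonic() if now is None else now
        for row_id, payload in self.rows.items():
            if self.hidden_until.get(row_id, 0.0) <= now:
                self.hidden_until[row_id] = now + self.visibility
                return (row_id, payload)
        return None

    def done(self, row_id):
        """Processing finished in time: remove the row for good."""
        self.rows.pop(row_id, None)
        self.hidden_until.pop(row_id, None)

q = TemporalQueue(visibility=30.0)
q.put(1, "job-1")
first = q.fetch(now=0.0)   # row handed out, hidden until t=30
again = q.fetch(now=10.0)  # still within the window: nothing visible
back = q.fetch(now=31.0)   # timer ran out, the row reappears
print(first, again, back)  # → (1, 'job-1') None (1, 'job-1')
```

A caller that finishes in time would call `done(row_id)` before the deadline; the interesting design question is exactly the one raised above, namely what happens when it doesn't.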