i'm back to doing some work on connector/odbc, fixing bugs in the
"stable" version (otherwise known as 3.51.x). we have another
project going that is a ground-up rewrite of the driver, but i'm
not really involved with that.
the state of the stable version of the driver is pretty sad. i
keep running into pockets of code that are truly frightening. my
first big foray was a bug i reported when i was trying to get the test
suite to run, and it basically entailed throwing out a
function that made no sense and replacing it with code that has
such niceties as comments.
as i've started into the catalog functions, starting from
this ancient
bug, i'm finding even more frightening (and untested)
code.
my general goal is to leave things cleaner than i've found them,
doing things as incrementally as i …
... or "How to solve the same problem in 10 different ways".
One of the common problems to solve in SQL is "Get row with the group-wise maximum". Getting just the maximum for each group is simple; getting the full row that the maximum belongs to is the interesting part.
SELECT MAX(population), continent FROM Country GROUP BY continent;

+-----------------+---------------+
| MAX(population) | continent     |
+-----------------+---------------+
|      1277558000 | Asia          |
|       146934000 | Europe        |
|       278357000 | North America |
|       111506000 | Africa        |
|        18886000 | Oceania      |
|               0 | Antarctica    |
|       170115000 | South America |
+-----------------+---------------+
We use the 'world' database from the MySQL manual for the examples.
The next step is to find the countries that match the population and continent pairs we just gathered.
SELECT …
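One common way to get the full rows, sketched here only as an illustration (the post's own follow-up query is cut off above), is to join the per-continent maxima back against the Country table:

-- Join each country against the maximum population of its continent;
-- only the rows that actually hold the maximum survive the join.
SELECT co.Name, co.Population, co.Continent
FROM Country AS co
JOIN (
    SELECT Continent, MAX(Population) AS Population
    FROM Country
    GROUP BY Continent
) AS m
  ON m.Continent  = co.Continent
 AND m.Population = co.Population;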
If you have read the MySQL manual, you might have seen ORDER BY RAND() used to randomize the rows, with LIMIT 1 to take just one of them.
SELECT name FROM random ORDER BY RAND() LIMIT 1;
This example works fine and is fast if you only have, let's say, 1000 rows. As soon as you have 10000 rows, the overhead of sorting the rows becomes noticeable. Don't forget: we only sort to throw nearly all of the rows away.
I never liked it. And there are better ways to do it. Without sorting. As long as we have a numeric primary key.
For the first examples we assume that the ID starts at 1 and that there are no holes between 1 and the maximum value of the ID.
move the work into the application
First idea: We can simplify the whole job if we calculate the ID beforehand in the application.
SELECT MAX(id) FROM random;
## generate random id in application
SELECT name FROM random WHERE id = …
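The same idea can also be written entirely in SQL, shown here only as a sketch of the technique under the dense, 1-based ID assumption above (not necessarily the post's next step):

-- Pick a random id between 1 and MAX(id) and fetch the matching row;
-- no sorting of the whole table is needed.
SELECT name
FROM random
JOIN (
    SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM random)) AS rid
) AS r
  ON random.id = r.rid;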
In the next milestone build of Pentaho Data Integration (2.4.1-M1) we will be introducing advanced error
handling features. (2.4.1-M1 is expected around February
19th)
We looked hard to find the easiest and most flexible way to
implement this, and I think we have found a good solution.
Here is an example:
The transformation above works as follows: it generates a sequence between -1000 and 1000. The table is a MySQL table with a single “id” column defined as TINYINT. As you all know, that data type only accepts values between -128 and 127.
So what this transformation does is insert the 256 rows that fit into the table and divert all the others to a text file, our “error bucket”.
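To make the arithmetic concrete, here is a hypothetical recreation of the target table (the table name is invented; the post only says it is a MySQL table with a single “id” column defined as TINYINT):

-- TINYINT holds -128..127, so of the 2001 generated values (-1000..1000)
-- only the 256 in-range rows insert cleanly; the rest error out at the
-- table-output step and are diverted to the text file.
CREATE TABLE tinyint_target (
    id TINYINT
);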
How can we configure this new feature? Simple: right-click on the step where you want the error handling to take place, in this case, …
At last night's New York MySQL Meetup we had a great presentation on the new stuff in RedHat Enterprise Linux 5 from Goutham Kandiar (from RH's NY office down the block). The highlight was when he moved a running MySQL server from one laptop to another.
Now, I'm a Sales Engineer, so I can appreciate good dog & pony shows. I can really appreciate the difficulty of making a dog or pony of a database. So this was quite a treat.
It turns out all you need is some reasonably modern hardware (they had Lenovo T60s with on-chip virtualization) and Xen. You can use older hardware, but you'll have to modify your kernel (which limits you to OSes where that is a possibility). Goutham had a gigabit switch, making the 256M image move over in ~4 seconds, but he told me that an ad-hoc 802.11g network would still do the trick in about half a minute.
During the last days and weeks new releases of PHP (4.4.5 and
5.2.1), a new version of MySQL (5.0.33), and a new Apache (2.2.4)
saw the light of day. We're currently working on the new
XAMPP version and invite you to take a closer look at the beta
versions of the next XAMPP release:
http://www.apachefriends.org/en/xampp-beta.html
XAMPP BETA versions are always for testing purposes only. There
will be no upgrade packages from and to beta versions. To all
testers: Many thanks in advance!!
At a customer site recently I came across the following table definition for an Innodb table: 140 columns, a 3-part primary key totaling 44 bytes, and 2 million rows giving over 900MB in data size. And it had 15 indexes, totaling over 2.3GB in size. Add into the mix a Windows Server 2003 OS, a 640MB innodb_buffer_pool_size setting, and table scans out the wazoo. This all leads to very poor performance.
It is generally considered best practice for Innodb to use the shortest primary key possible, and there is a clear reason for this: Innodb stores the full primary key with every secondary index. So, for example, if an indexed column was 4 bytes in length, each row in that index would be 48 bytes here (before overhead). Fortunately an easy solution presented itself: because of this index storage requirement, Innodb will create an internal 6-byte primary key if none exists for a table. I of course had known about this but had never tried it in a production …
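A sketch of what that idea looks like in practice (the table name is invented, and this is only an illustration, not the customer's actual change): with the explicit primary key gone, each of the 15 secondary indexes carries Innodb's internal 6-byte row id instead of the 44-byte key, which by the numbers in the post is roughly 180MB instead of about 1.3GB of key copies across 2 million rows, before per-index overhead.

-- Drop the wide composite primary key so Innodb falls back to its
-- internal 6-byte row id for clustering and for secondary index entries.
ALTER TABLE wide_fact_table DROP PRIMARY KEY;

The trade-off is that the internal row id is not visible to queries, so the old key columns still need a regular secondary index if they are used for lookups.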
USA Today has come out with a new survey - apparently, three out
of every four people make up 75% of the population --David
Letterman
The GBush Jeans Co, introduced in the post on the trend reports, wants to analyze sales
trends. The various stores, including the company stores as well
as the big department stores, have agreed to provide the daily
sales data from the respective transaction processing systems at
the end of each day. "Data profiling" helps us understand this
data before the pretty business performance trend reports can be
created for the data warehouse user. For data profiling we create
quick and dirty reports:
- that are a diluted version of the final trend reports that
will ultimately be produced; and
- with simple SQL aggregation queries, using SUM and GROUP BY, on the data loaded into the …
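A minimal sketch of such a profiling query, against a hypothetical daily_sales staging table (table and column names are invented for illustration):

-- Quick-and-dirty profile: how many rows and how much revenue each store
-- delivered per day, straight off the loaded data.
SELECT store_id,
       sale_date,
       COUNT(*)         AS rows_loaded,
       SUM(sale_amount) AS total_sales
FROM daily_sales
GROUP BY store_id, sale_date
ORDER BY store_id, sale_date;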
I've suggested before that open source companies need not migrate to Silicon Valley. It's not really an open source question, though, so much as it is a persistent belief that to be big/important you must live in a big/important city.
The WSJ took on this topic in yesterday's Journal in an op-ed piece called "The Myth of 'Superstar' Cities." It begins:
These seem the best of times for America's elite cities. Wall Street's 2006 megabonuses created thousands of instant millionaires, and, with their venture-fund soulmates in places like San Francisco, Boston and Greenwich, the best people are prowling for Ferraris, planes, multimillion-dollar condos, the newest $200 lunch place and the latest in high fashion. In some markets, office …
Yet another industry alliance has formed (Open Solutions Alliance), in the tradition of OSDL. This one, however, promises not to fail where OSDL did (i.e., it will not focus on just one technology) because it will provide
"...business level advocacy. We need to talk to sets of customers about open source applications, make them aware of what's out there."Thus spake Barry Klawans to Dana.
It's not a bad goal. My problem with the premise is that I haven't met any enterprises that need help buying into open source. Quite the opposite. The problem we're having is managing growth, not sparking it. If the problem to be solved is getting customers to buy into open source applications, then there is …