groupwise max

... or "How to solve the same problem in 10 different ways".

One of the common problems to solve in SQL is "Get the row with the group-wise maximum". Getting just the maximum for each group is simple; getting the full row that belongs to that maximum is the interesting step.

SELECT MAX(population), continent
  FROM Country
 GROUP BY continent;

+-----------------+---------------+
| MAX(population) | continent     |
+-----------------+---------------+
|      1277558000 | Asia          |
|       146934000 | Europe        |
|       278357000 | North America |
|       111506000 | Africa        |
|        18886000 | Oceania       |
|               0 | Antarctica    |
|       170115000 | South America |
+-----------------+---------------+

We use the 'world' database from the MySQL manual for the examples.

The next step is to find the countries that match the population and continent pairs we have just gathered.

SELECT …
[Read more]
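One common way to finish the job (a sketch of just one of the post's ten variants) is to join the aggregated result back against the base table:

SELECT co.name, co.continent, co.population
  FROM Country AS co
  JOIN (SELECT continent, MAX(population) AS population
          FROM Country
         GROUP BY continent) AS m
    ON co.continent = m.continent
   AND co.population = m.population;

The derived table m holds exactly the (continent, maximum population) pairs from the result above, so the join keeps only the country rows that produced those maximums.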
ORDER BY RAND()

If you have read the MySQL manual, you might have seen ORDER BY RAND() used to randomize the rows, combined with LIMIT 1 to take just one of them.

SELECT name
  FROM random
 ORDER BY RAND()
 LIMIT 1;

This example works fine and is fast if you only have, let's say, 1000 rows. As soon as you have 10000 rows, the overhead of sorting the rows becomes important. Don't forget: we only sort to throw nearly all of the rows away.

I never liked it, and there are better ways to do it, without any sorting, as long as we have a numeric primary key.

For the first examples we assume that the ID starts at 1 and that there are no holes between 1 and the maximum value of the ID.

move the work into the application

First idea: We can simplify the whole job if we calculate the ID beforehand in the application.

SELECT MAX(id) FROM random;
## generate random id in application
SELECT name FROM random WHERE id = …
[Read more]
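If you would rather stay in SQL entirely, a well-known variant (a sketch under the same no-holes assumption, against the same random table) folds the random ID generation into the query itself:

SELECT r1.name
  FROM random AS r1
  JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM random)) AS id) AS r2
 WHERE r1.id >= r2.id   -- >= instead of = tolerates an unlucky RAND() of 0
 ORDER BY r1.id
 LIMIT 1;

The ORDER BY here is cheap: MySQL walks the primary key index from the random starting point and stops after one row, so no full sort of the table takes place.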
Handling errors

In the next milestone build of Pentaho Data Integration (2.4.1-M1) we will be introducing advanced error handling features. (2.4.1-M1 is expected around February 19th)
We looked hard to find the easiest and most flexible way to implement this, and I think we have found a good solution.

Here is an example:

The transformation above works as follows: it generates a sequence between -1000 and 1000.  The table is a MySQL table with a single “id” column defined as TINYINT.  As you all know, that data type only accepts values between -128 and 127.

So what this transformation does is insert the 256 rows that fit (the values from -128 to 127) into the table and divert all the others, the remaining 1745 rows, to a text file, our “error bucket”.
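For reference, a minimal sketch (with a hypothetical table name) of why most of the generated rows fail:

CREATE TABLE tiny_demo (id TINYINT);  -- hypothetical stand-in for the post's table

INSERT INTO tiny_demo VALUES (127);   -- fits: TINYINT holds -128..127
INSERT INTO tiny_demo VALUES (128);   -- out of range: an error in strict SQL mode,
                                      -- otherwise clamped to 127 with a warning

It is exactly these out-of-range failures that the new error handling catches and reroutes instead of aborting the transformation.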

How can we configure this new feature?  Simple: right-click on the step where you want the error handling to take place, in this case, …

[Read more]
RHEL 5 and xen demo at MySQL User group

At last night's New York MySQL Meetup we had a great presentation on the new stuff in RedHat Enterprise Linux 5 from Goutham Kandiar (from RH's NY office down the block). The highlight was when he moved a running MySQL server from one laptop to another.

Now, I'm a Sales Engineer, so I can appreciate good dog & pony shows. I can really appreciate the difficulty of making a dog or pony of a database. So this was quite a treat.

It turns out all you need is some reasonably modern hardware (they had Lenovo T60s with on-chip virtualization) and xen. You can use older hardware, but you'll have to modify your kernel (which limits you to OSes where that is a possibility). Goutham had a gigabit switch, making the 256M image move over in ~4 seconds, but he told me that an ad-hoc 802.11g network would still do the trick in about half a minute.

[Read more]

Beta versions of the next XAMPP for Linux and Windows

Over the last days and weeks, new releases of PHP (4.4.5 and 5.2.1), a new version of MySQL (5.0.33), and a new Apache (2.2.4) saw the light of day. We're currently working on the new XAMPP version and invite you to take a closer look at the beta versions of the next XAMPP release:

http://www.apachefriends.org/en/xampp-beta.html

XAMPP BETA versions are always for testing purposes only. There will be no upgrade packages from and to beta versions. To all testers: Many thanks in advance!!

Using Innodb Primary Keys wisely

At a customer site recently I came across the following table definition for an InnoDB table: 140 columns, a 3-part primary key totaling 44 bytes, and 2 million rows giving over 900MB in data size. It also had 15 indexes, totaling over 2.3GB in size. Add into the mix a Windows Server 2003 OS, a 640MB innodb_buffer_pool_size setting, and table scans out the wazoo. This all leads to very poor performance.

It is generally considered best practice for InnoDB to use as short a primary key as possible, and there is a clear reason for this: InnoDB stores the full primary key with every secondary index entry. So, for example, if an indexed column was 4 bytes in length, in this example each index row would be 48 bytes (before overhead). Fortunately an easy solution presented itself: because of this index storage requirement, InnoDB will create an internal 6-byte primary key if none exists for a table. I had of course known about this but had never tried it in a production …

[Read more]
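A minimal sketch of the arithmetic, with hypothetical column names rather than the customer's actual schema:

CREATE TABLE wide_pk (
  a CHAR(20) NOT NULL,
  b CHAR(20) NOT NULL,
  c INT NOT NULL,        -- 20 + 20 + 4 = 44-byte primary key
  d INT NOT NULL,
  PRIMARY KEY (a, b, c),
  KEY idx_d (d)          -- each entry carries 4 + 44 = 48 bytes before overhead
) ENGINE=InnoDB;

-- Dropping the explicit primary key makes InnoDB fall back to its
-- internal 6-byte row ID, shrinking every secondary index entry:
ALTER TABLE wide_pk DROP PRIMARY KEY;

With 15 secondary indexes on 2 million rows, trading 44 bytes for 6 in every index entry adds up quickly.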
Data profiling

USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. -- David Letterman

The GBush Jeans Co, introduced in the post on trend reports, wants to analyze sales trends. The various stores, including the company stores as well as the big department stores, have agreed to provide the daily sales data from their respective transaction processing systems at the end of each day. "Data profiling" helps us understand this data before the pretty business performance trend reports can be created for the data warehouse user. For data profiling we create quick and dirty reports:

  1. that are a diluted version of the final trend reports that will ultimately be produced; and
  2. with simple SQL aggregation queries, using SUM and GROUP BY, on the data loaded into the …
[Read more]
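A quick-and-dirty profiling query of this kind might look like the following sketch (the staging table and column names are hypothetical):

SELECT store_id,
       sale_date,
       COUNT(*)    AS transactions,
       SUM(amount) AS daily_sales
  FROM daily_sales_load   -- hypothetical staging table
 GROUP BY store_id, sale_date
 ORDER BY store_id, sale_date;

A report like this quickly surfaces missing days, empty stores, or implausible totals before the real trend reports are built.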
Doing open source in the 'burbs

I've suggested before that open source companies need not migrate to Silicon Valley. It's not really an open source question, though, so much as it is a persistent belief that to be big/important you must live in a big/important city.

The WSJ took on this topic in yesterday's Journal in an op-ed piece called "The Myth of 'Superstar' Cities." It begins:

These seem the best of times for America's elite cities. Wall Street's 2006 megabonuses created thousands of instant millionaires, and, with their venture-fund soulmates in places like San Francisco, Boston and Greenwich, the best people are prowling for Ferraris, planes, multimillion-dollar condos, the newest $200 lunch place and the latest in high fashion. In some markets, office …

[Read more]
Yet another alliance (OSA)

Yet another industry alliance has formed (Open Solutions Alliance), in the tradition of OSDL. This one, however, promises not to fail where OSDL did (i.e., it will not focus on just one technology) because it will provide

"...business level advocacy. We need to talk to sets of customers about open source applications, make them aware of what's out there."Thus spake Barry Klawans to Dana.

It's not a bad goal. My problem with the premise is that I haven't met any enterprises that need help buying into open source. Quite the opposite. The problem we're having is managing growth, not sparking it. If the problem to be solved is customers buying into open source applications, then there is …

[Read more]
Subqueries: the new strategy for "NULL IN (SELECT …)"

I hope this is my last post about this topic. It seems we've resolved all of the issues and I'll now describe the user-visible consequences.

To recall, we're talking about subquery predicates of the form

(oe1, ..., oeN) [NOT] IN (SELECT ie1, ..., ieN FROM ... )

that are located in a context where it matters whether the predicate's result is NULL or FALSE. The name "oe" stands for "outer expression", and "ie" stands for "inner expression".

MySQL evaluates queries "from outside to inside", i.e. we first get the values of (oe1, ..., oeN) and then we run the subquery and capture the rows it produces. An obvious and very useful optimization is to "inform" the subquery that we're looking only for rows that have "ie1=oe1", "ie2=oe2", and so on. This is done by injecting appropriate equalities into the subquery's WHERE (or HAVING) clause. That is,

(oe1, ..., oeN) [NOT] IN (SELECT ie1, ..., ieN FROM ... )

becomes …

[Read more]
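As a conceptual illustration (not the exact rewrite from the post), the injected equalities turn the IN predicate into a correlated existence check, roughly:

-- the predicate, in the post's notation:
(oe1, oe2) IN (SELECT ie1, ie2 FROM t WHERE cond)

-- is conceptually evaluated as:
EXISTS (SELECT 1 FROM t WHERE cond AND ie1 = oe1 AND ie2 = oe2)

The subtlety is that when some oe or ie values are NULL, the injected equalities evaluate to NULL rather than TRUE or FALSE, so the naive rewrite can no longer tell a NULL result apart from FALSE, which is exactly the case the new strategy has to handle.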