Showing entries 1 to 10 of 18
8 Older Entries »
Displaying posts with tag: data types (reset)
The hex datatype?

Anders posted a blog post about adding a new JSON data type to MariaDB. This made me remember my very first MySQL User conference, well, that and the next couple conferences actually. Why? Because at every conference (for awhile) I remember Brian Aker getting everybody into a room to discuss how to add a data type to MySQL. It was considered pretty much the quintessential example of how to start developing for MySQL because it touches a lot of the basic guts of the server.

The example that Brian always gave was a "hex" data type, so that you can store MD5 numbers using 128 bits of storage instead of 256. A human readable MD5 is a 256 bit string (32 characters) representing 128 bits of actual data. So storing MD5 values in the database (a very common thing) requires twice as much space as it should.

Now, …

[Read more]
Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Reading Sheeri's MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.

However, I was also concerned about the final action taken: using "--ignore-columns" to avoid comparing the FLOAT/DOUBLE types.

The --float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec) …
[Read more]
CHAR and spaces

I know about it, I knew about it all along, but... it's so easy to fall for it; there's just so much absurdity!

A CHAR type has a known number of characters. For example, the column:

CountryCode CHAR(3) CHARSET ascii NOT NULL

- is known to have exactly three characters. These could be 'USA', 'FRA', etc.

What happens with spaces? A SQL CHAR type ignores any trailing spaces; thus, the code 'GB ' (the characters 'G', 'B', and the space ' ') is interpreted as 'GB'. Trailing spaces are not regarded as part of the text. Want to see some absurdity?

CREATE TABLE `c_test` (
  `c` char(1) DEFAULT NULL
INSERT INTO c_test VALUES ('a');
INSERT INTO c_test VALUES ('b');
INSERT INTO c_test VALUES (' ');

SELECT c, LENGTH(c), c = 'a', c = 'b', c = ' ', LENGTH(' ') FROM …
[Read more]
23:59:59 is not the end of the day. No, really!

How would you check whether some TIMESTAMP column falls within a given day, say July 26th, 2011?

This is a question I tend to ask students, and usually I get the same range of answers. Some are wrong; some cannot utilize an index, some are correct, and some are temporarily correct. I wish to take a closer look at the last. Such an answer looks like this:

SELECT * FROM some_table WHERE timstamp_column BETWEEN '2011-07-26 00:00:00' AND '2011-07-26 23:59:59'

Yikes! I get my allergies when I see this one.

Technically this seems correct. And it seems to work so far for people. There are two things that disturb me:

  1. '23:59:59' refers to the beginning of the last second of the day. This means a full second is missing in concept. No one would write such a code on a C, Java or PHP application: that would be a newbie's mistake. But people …
[Read more]
Announcing common_schema: common views & routines for MySQL

Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals... There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

Here are a few highlights:

[Read more]
TIMESTAMP vs. DATETIME, which should I be using?

They both look the same, have roughly the same use. Nevertheless, they differ in many respects. I’ll note the difference, and note a few pitfalls and peculiarities.


TIMESTAMP starts with the epoch, ’1970-01-01 00:00:01′ UTC and ends with ’2038-01-19 03:14:07′ UTC. This is all very nice today, and may actually hold up till our retirement, but the younger readers may yet have to deal with the Bug2K+38™, certain to arrive.

DATETIME starts with ’1000-01-01 00:00:00′ and lasts up to ’9999-12-31 23:59:59′. More on that later.

In respect of range, your current event logs may well use a TIMESTAMP value, while your grandfather’s and granddaughter’s birth dates may require …

[Read more]
Implicit casting you don’t want to see around

In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:

Number-String comparisons

Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:

mysql> SELECT 3 = '3.0';
| 3 = '3.0' |
|         1 |
1 row in set (0.00 sec)

mysql> SELECT '3' = '3.0';
| '3' = '3.0' |
|           0 |

The second query consists of pure strings comparison. It has no way to determine that number comparison should be made.

Direct DATE arithmetics

The first query seems to work, but is completely incorrect. The second explains why. The third is a total mess.

mysql> SELECT …
[Read more]
Choosing MySQL boolean data types

How do you implement True/False columns?

There are many ways to do it, each with its own pros and cons.


Create you column as ENUM(‘F’, ‘T’), or ENUM(‘N’,'Y’) or ENUM(’0′, ’1′).

This is the method used in the mysql tables (e.g. mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves well. It’s compact (just one byte).

A couple disadvantages to this method:

  1. Enums are represented by numerical values (which is good) and start with 1 instead of 0. This means ‘F’ is 1, and ‘T’ is 2, and they both translate to True when directly used in a booleanic …
[Read more]
The actual range and storage size of an INT

What’s the difference between INT(2) and INT(20) ? Not a lot. It’s about output formatting, which you’ll never encounter when talking with the server through an API (like you do from most app languages).

The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.

At Open Query we tend to pick on things like INT(2) when reviewing a client’s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere – even if not in the exact spot where we pick on it. So it’s a case of pattern recognition.

A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit …

[Read more]
But I DO want MySQL to say “ERROR”!

MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? …
[Read more]
Showing entries 1 to 10 of 18
8 Older Entries »