|Showing entries 1 to 18|
Reading Sheeri's MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.
However, I was also concerned about the final action taken: using "--ignore-columns" to avoid comparing the FLOAT/DOUBLE types.
The --float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:[Read more...]
I know about it, I knew about it all along, but... it's so easy to fall for it; there's just so much absurdity!
A CHAR type has a known number of characters. For example, the column:
CountryCode CHAR(3) CHARSET ascii NOT NULL
- is known to have exactly three characters. These could be 'USA', 'FRA', etc.
What happens with spaces? A SQL CHAR type ignores any trailing spaces; thus, the code 'GB ' (the characters 'G', 'B', and the space ' ') is interpreted as 'GB'. Trailing spaces are not regarded as part of the text. Want to see some absurdity?
[Read more...]CREATE TABLE `c_test` ( `c` char(1) DEFAULT NULL ); INSERT INTO c_test VALUES ('a'); INSERT INTO c_test VALUES ('b');
How would you check whether some TIMESTAMP column falls within a given day, say July 26th, 2011?
This is a question I tend to ask students, and usually I get the same range of answers. Some are wrong; some cannot utilize an index, some are correct, and some are temporarily correct. I wish to take a closer look at the last. Such an answer looks like this:
SELECT * FROM some_table WHERE timstamp_column BETWEEN '2011-07-26 00:00:00' AND '2011-07-26 23:59:59'
Yikes! I get my allergies when I see this one.
Technically this seems correct. And it seems to work so far for people. There are two things that disturb me:
Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.
There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals... There are basic functions answering for common needs.
Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.
Here are a few highlights:
They both look the same, have roughly the same use. Nevertheless, they differ in many respects. I’ll note the difference, and note a few pitfalls and peculiarities.
TIMESTAMP starts with the epoch, ’1970-01-01 00:00:01′ UTC and ends with ’2038-01-19 03:14:07′ UTC. This is all very nice today, and may actually hold up till our retirement, but the younger readers may yet have to deal with the Bug2K+38™, certain to arrive.
DATETIME starts with ’1000-01-01 00:00:00′ and lasts up to ’9999-12-31 23:59:59′. More on that later.
In respect of range, your current event logs may well use a TIMESTAMP[Read more...]
In Beware of implicit casting, I have outlined the dangers of implicit casting. Here’s a few more real-world examples I have tackled:
Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:
mysql> SELECT 3 = '3.0'; +-----------+ | 3 = '3.0' | +-----------+ | 1 | +-----------+ 1 row in set (0.00 sec) mysql> SELECT '3' = '3.0'; +-------------+ | '3' = '3.0' | +-------------+ | 0 | +-------------+
The second query consists of pure strings comparison. It has no way to determine that number comparison should be made.
The first query seems to work,[Read more...]
How do you implement True/False columns?
There are many ways to do it, each with its own pros and cons.
Create you column as ENUM(‘F’, ‘T’), or ENUM(‘N’,'Y’) or ENUM(’0′, ’1′).
This is the method used in the mysql tables (e.g. mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves well. It’s compact (just one byte).
A couple disadvantages to this method:
What’s the difference between INT(2) and INT(20) ? Not a lot. It’s about output formatting, which you’ll never encounter when talking with the server through an API (like you do from most app languages).
The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.
At Open Query we tend to pick on things like INT(2) when reviewing a client’s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere – even if not in the exact spot where we pick on it. So it’s a case of pattern recognition.
A very practical example of this comes from a client I worked with last[Read more...]
MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.
Here’s a complication of some common and useful time & date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations.
There are many ways to solve such problems. I’ll present my favorites.
Given two timestamps: ts1 (older) and ts2 (newer), how much time has passed between them?
One can use TIMEDIFF() & DATEDIFF(), or compare two UNIX_TIMESTAMP() values. My personal favorite is to use TIMESTAMPDIFF(). Reason being that I’m usually interested in a specific metric, like the number of hours which have passed, or the number of days, disregarding the smaller minute/second resolution. Which allows one to:
Unless specified otherwise, latin1 is the default character set in MySQL.
What I usually find in schemes are columns which are either utf8 or latin1. The utf8 columns being those which need to contain multilingual characters (user names, addresses, articles etc.), and latin1 column being all the rest (passwords, digests, email addresses, hard-coded values etc.)
I find latin1 to be improper for such purposes and suggest that ascii be used instead. The reason being that latin1 implies a European text (with swedish collation). It is unclear for an outsider, when finding a latin1 column, whether it should actually contain West European characters, or is it just being used for ascii text, utilizing the fact that a character[Read more...]
InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has internal hash index).
In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.
Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operation) for an index based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.
What affects the depth of an index?
There are quite a few structural issues, but it boils down to two important factors:
MySQL’s character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.
This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets.
A thorough discussion of how the character sets have evolved through history is beyond the scope of this post. While the Unicode standard is gaining recognition, the “older” character sets are still around. Understanding the difference between Unicode and local character sets is crucial.
Consider, for example, MySQL’s
latin1 character set. In this character set there are 256 different characters, represented by one byte. The first 128 characters map to ASCII, the standard “ABCabc012 dot
During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Of the most common errors are incorrect data types definition. Many times the database is designed by programmers or otherwise non-expert DBAs. Some companies do not have the time and cannot spare the effort of redesigning and refactoring their databases, and eventually face poor performance issues.
Here’s a compilation of “the right and the wrong” data types.
INT(1)is not one byte long.
INT(10)is no bigger than
INT(2). The number in parenthesis is misleading, and only describes the text alignment of the number, when displayed in an interactive shell. All mentioned types are the same INT, have the same storage capacity, and the same range. If you want a
In short, it really doesn’t matter. After watching a MySQL web conference by Jay Pipes, I was gutted when I found out that they are actually exactly the same. I know im not alone in thinking that it affected the size of the data field. An unsigned int has the max value of 4294967295 no matter if its INT(1) or int(10) and will use 4 bytes of data.
So, what does the number in the brackets mean? It pretty much comes down to display, its called the display-width. The display width is a number from 1 to 255. You can set the display width if you want all of your integer values to “appear”. If you enable zerofill on the row, the field will have a default value of 0 for int(1) and 0000000000 for int(10).
There are 5 main numeric[Read more...]
Want to know the type of an arbitrary expression in MySQL? Someday in the far far future in version 7.1, you might be able to with the TYPEOF() function.
For now you can try this:
CREATE TEMPORARY TABLE typeof AS SELECT [expression] AS col;
For example, let’s see what the type of CRC32 is.
mysql> CREATE TEMPORARY TABLE typeof AS SELECT CRC32('hello world') AS col; mysql> DESCRIBE typeof; +-------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------------------+------+-----+---------+-------+ | col | int(10) unsigned | NO | | 0 | | +-------+------------------+------+-----+---------+-------+
This is one possible way to programmatically[Read more...]
Here is the video of “Database Basics”, which I presented at the February 2008 Boston MySQL User Group meeting. The presentation goes over the basics of relations, data, the Entity-Relationship Model, how to choose data types, and how to do basic
You can download:
Or just watch the video:
|Showing entries 1 to 18|