Over the past three weeks, I've been refactoring the way that Drizzle handles temporal, or time-related, values. It's been a fun little adventure into an area of the server which hasn't changed much in a long time. Today, I wanted to share a couple things I've learned about calendars, MySQL, and dealing with date-related calculations.
To start the conversation, imagine a SQL request like the following:
SELECT TO_DAYS("2008-08-11");
My guess is that many people don't really understand what is going on behind the scenes in the SQL server when they issue a request like the above. Believe it or not, there is quite a bit of calculation going on for a simple request like the above. Furthermore, there are a number of caveats that revolve around the ranges of dates that the SQL server operates on.
To demonstrate, without looking at the MySQL manual, can you answer the following questions?
- What is "day number zero" according to the TO_DAYS() function?
- Does TO_DAYS() take into account leap years?
- Can you use the results of the TO_DAYS() function to calculate intervals between dates?
The above questions, and their answers, expose a number of issues with temporal values. Let's take each question in order.
What is "day number zero" according to the TO_DAYS() function?
OK, so if you answered 0000-01-01 for question #1, you would be correct. The TO_DAYS(date) function returns the number of days from the "date" 0000-01-01 up until date. Now, why would I put the term "date" in quotes? Well, it's because 0000-01-01 isn't actually an accurate date.
"Wait a minute, Jay, of course 0000-01-01 is a real date! It's the first date of the first year in the 'modern' era...you know, A.D.!", you are probably saying. But, the fact is that there isn't such as date in existence. Or, more precisely, there is no way to accurately know what date was the first day of the year of the 'modern era'. In fact, there isn't a reliable way of telling what a "date" is if that date occurred before October 15th, 1582.
October 15th, 1582, is the start of the Gregorian calendar, which is the calendar system used by the "modern world". This calendar takes into account adjustments for leap years and leap seconds (yes, there is something called a leap second). Before this date, it's not possible to use a Gregorian date such as 0301-03-23 to reliably determine calendar information for that date. The reason is that the dates October 5 - 14, 1582 do not exist. These dates were removed from the Gregorian calendar to "make up for" leap years and seconds up until the start of the Gregorian calendar.
Given the above information, check out the results of the below statement.
mysql> select to_days("1582-10-08");
+-----------------------+
| to_days("1582-10-08") |
+-----------------------+
| 578094 |
+-----------------------+
1 row in set (0.00 sec)
If the dates 5-14 of October, 1582 don't exist, how can the above function return a value? Weird, huh? Well, kind of. The thing is, the TO_DAYS() function and it's sister FROM_DAYS() operate on a pseudo-calendar that represents the SQL range of dates, which is year 0 to year 9999. This pseudo-calendar simply assumes that up until October 15th, 1582, each year was comprised of 365.25 days.
Which leads us nicely to our second question...
Does TO_DAYS() take into account leap years?
Here, the answer depends. If the date supplied to TO_DAYS() occurs in the Gregorian calendar, TO_DAYS() will calculate the number of days using leap years. If the date occurs prior to October 15th, 1582, then TO_DAYS() will not. This, of course, has ramifications for the answer to our final question.
Can you use the results of the TO_DAYS() function to calculate intervals between dates?
The answer is both yes and no. For reliable calculations, the dates must be in the Gregorian calendar. If one date is not in the Gregorian calendar and one date is, the results may not be accurate.
Speaking of accuracy...
The astute among you may have noticed something peculiar about the result of the SQL statement above. I just got finished saying that TO_DAYS() considers all years before 1582 as containing 365.25 days. If that is so, we can calculate the TO_DAYS("1582-10-15") using the following calculation:
>>> days_in_years= (1582 * 365.25) >>> days_in_months= (31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30) >>> days_in_october= 15 >>> to_days= days_in_years + days_in_months + days_in_october >>> print to_days 578113.5
Strangely, this is what I get via MySQL:
mysql> select to_days("1582-10-15");
+-----------------------+
| to_days("1582-10-15") |
+-----------------------+
| 578101 |
+-----------------------+
1 row in set (0.00 sec)
That's a difference of 12.5. Is this a bug? Is it an adjustment for the missing days? Probably, but it's also not a bug that is likely to surface in an application. The reason is that FROM_DAYS() and TO_DAYS() are generally used in combination and the bug in the code exists in both functions, therefore they usually offset each other. The second reason is that TO_DAYS() returns an abstract number of days. It's not used to calculate intervals between dates unless that interval is in the modern era, like this:
SELECT * FROM Orders WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) >= 7;
In this case, both abstract numbers returned by TO_DAYS() will be accurate relative to each other since both dates are in the Gregorian calendar and it is the difference between the two dates which is used in the range calculation.
I'll leave you today with a couple oddities of the FROM_DAYS() and TO_DAYS() functions in MySQL:
mysql> select to_days("0000-01-01");
+-----------------------+
| to_days("0000-01-01") |
+-----------------------+
| -32212253 |
+-----------------------+
1 row in set (0.00 sec)
mysql> select to_days("0000-10-01");
+-----------------------+
| to_days("0000-10-01") |
+-----------------------+
| 274 |
+-----------------------+
1 row in set (0.00 sec)
mysql> select from_days(to_days("0000-10-01"));
+----------------------------------+
| from_days(to_days("0000-10-01")) |
+----------------------------------+
| 0000-00-00 |
+----------------------------------+
1 row in set (0.00 sec)