Fairly often I hear customers say that they plan a table that
accumulates millions of rows per day, and they want to keep
around, say, the last 30 days worth of data. (For the sake of
examples, I'm going to make it the last 3 days.) So this is a
kind of round-robin table, with rolling addition of new data and
removal of the oldest data.
With a high volume of data, this sounds like a table partitioned
on day boundaries (in MySQL 5.1). See Sarah's blog and her links
for a quick ramp-up on time-based table partitioning
(http://everythingmysql.ning.com/profiles/blogs/partitioning-by-dates-the).
One great benefit of table partitioning is that you can drop a
partition to lose millions of rows in one quick statement, much
faster than deleting millions of rows. Sort of like a partial
TRUNCATE TABLE.
First create the table with 4 partitions, and then, once a day,
drop the oldest partition and add another partition to store the
next day's rows. (The table will really have exactly 3 days worth
data at the moment of this transformation and will accumulate one
more day's worth until the next such transformation.)
CREATE DATABASE IF NOT EXISTS demotimeparts;
USE demotimeparts;
DROP TABLE IF EXISTS pagehits;
CREATE TABLE pagehits (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
urlviewed VARCHAR(255),
whodone VARCHAR(40) DEFAULT NULL,
whendone DATETIME NOT NULL DEFAULT '0001-01-01 00:00:00',
PRIMARY KEY (id, whendone)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (to_days(whendone))
(PARTITION p20090930 VALUES LESS THAN (TO_DAYS('2009-09-30')),
PARTITION p20091001 VALUES LESS THAN (TO_DAYS('2009-10-01')),
PARTITION p20091002 VALUES LESS THAN (TO_DAYS('2009-10-02')),
PARTITION p20091003 VALUES LESS THAN (TO_DAYS('2009-10-03')));
A few notes about this table:
- The date boundary of a partition is in the partition's name. We'll use this. (It's also convenient for metadata reports.)
- The column whendone is included in the primary key because of the rule that your partitioning column must participate in every unique index.
- For this time-based table to benefit from partition pruning, in which the optimizer eliminates some of the partitions from query execution, your partitioned column must be of type DATE or DATETIME, not TIMESTAMP.
- The default value on the column whendone is to accommodate a SQL_MODE including NO_ZERO_DATE, NO_ZERO_IN_DATE.
So far, so good: The table is set up to have 4 partitions, one
for each of 4 consecutive days. Now, how to accomplish the
"rolling" part? Here's one way. The procedure below takes a
DATETIME argument and "rolls" the table to accept rows up to the
limit of that date, not inclusive. It uses prepared statements to
drop the oldest partition in the table, and add a new partition
using the DATE limit you've given.
USE demotimeparts;
DROP PROCEDURE IF EXISTS RotateTimePartition;
DELIMITER ;;
CREATE PROCEDURE RotateTimePartition (newPartValue DATETIME)
BEGIN
-- Setup
DECLARE keepStmt VARCHAR(2000) DEFAULT @stmt;
DECLARE partitionToDrop VARCHAR(64);
-- Find and drop the first partition in the table.
SELECT partition_name
INTO partitionToDrop
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE table_schema='demotimeparts'
AND table_name='pagehits'
AND partition_ordinal_position=1;
SET @stmt = CONCAT('ALTER TABLE pagehits DROP PARTITION ',
partitionToDrop);
PREPARE pStmt FROM @stmt;
EXECUTE pStmt;
DEALLOCATE PREPARE pStmt;
-- Add a new partition using the input date for a value limit.
SET @stmt = CONCAT('ALTER TABLE pagehits ADD PARTITION (PARTITION p',
DATE_FORMAT(newPartValue, '%Y%m%d'),
' VALUES LESS THAN (TO_DAYS(\'',
DATE_FORMAT(newPartValue, '%Y-%m-%d'),
'\')))');
PREPARE pStmt FROM @stmt;
EXECUTE pStmt;
DEALLOCATE PREPARE pStmt;
-- Cleanup
SET @stmt = keepStmt;
END;;
DELIMITER ;
So, before calling RotateTimePartition, the pagehits table
definition includes:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20090930 VALUES LESS THAN (734045) ENGINE = MyISAM,
PARTITION p20091001 VALUES LESS THAN (734046) ENGINE = MyISAM,
PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM) */
Then, after:
CALL RotateTimePartition('2009-10-04');
, the table includes:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20091001 VALUES LESS THAN (734046) ENGINE = MyISAM,
PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM,
PARTITION p20091004 VALUES LESS THAN (734049) ENGINE = MyISAM) */
Notice that the generated partition names let you easily see the
date boundaries.
Then you can issue:
CALL RotateTimePartition(NOW() + INTERVAL 1 DAY);
, to get:
/*!50100 PARTITION BY RANGE (to_days(whendone))
(PARTITION p20091002 VALUES LESS THAN (734047) ENGINE = MyISAM,
PARTITION p20091003 VALUES LESS THAN (734048) ENGINE = MyISAM,
PARTITION p20091004 VALUES LESS THAN (734049) ENGINE = MyISAM,
PARTITION p20091005 VALUES LESS THAN (734050) ENGINE = MyISAM) */
There: Lose a partition, add a partition. Now you can easily
write an event to call this procedure daily, to automatically
maintain your storage for the table.
This table design and procedure work just as well for a table
with 31 day-sized partitions, 5 week-sized partitions, 25
month-sized partitions, or whatever. The procedure takes a date
input, so it doesn't care whether you're using day, week, month,
or any other intervals.
Enjoy!