EAV multi-value fields

In the article ‘An alternative way of EAV modelling’, I discussed how to do EAV modelling by casting all values (except text) to integers. I’ll continue on that and talk about more advanced topics like multi-value fields.

As binary set
Not all questions have only a single option. Some fields we want to represent by radio-buttons, allow the user to select any number of options. For this we can mimic the behaviour of the SET field type of MySQL. A SET is almost similar to an ENUM, except that each bit represents an option. The value can have multiple bits enabled to represent multiple options.

Example: field options for field ‘programming language’

+-------+-------------+
| value | description |
+-------+-------------+
| 1     | C/C++       |
| 2     | PHP         |
| 4     | Java        |
| 8     | Python      |
| 16    | Ruby        |
+-------+-------------+

Choosing ‘C/C++’, ‘PHP’ and ‘Python’ would result in value 11.

There are 2 major disadvantages to this approach. First, with an integer, the number of options is limited to 4*8 = 32. Second, retrieving the description of options would cause joining field on field_option to be done like:

SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND (IF(`field`.`field_type`='SET, `value`.`value` & `field_option`.`value`, `value`.`value` = `field_option`.`value`)) WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

Because of this, the index on the value isn’t used. Instead, a function has to be performed on each option of the field. This should still give decent enough performance, because this scales linearly and not exponential.

Multiple values
Another way is to store multiple values for the same property. This would mean replacing the primary key from the value table by a normal index.

CREATE TABLE `value` (
  `item_id` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `fid` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `value` int(11) NOT NULL DEFAULT '0',
  `text` text,
  KEY `item_field`  (`item_id`,`fid`),
  KEY `value` (`fid`,`value`),
  KEY `text` (`text`(255))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

The disadvantage of this, is that structural integrity is no longer enforced by the database. Only for a field with the type ‘SET’, multiple values should be allowed. However the database will allow multiple values for other field types (like numeric fields) as well. It is up to the application to replace existing values.

Ranges with a single field
A completely different type of multi-value is ranges. A range has a top and bottom value. We can solve this by saving 2 values in the database. Of the values, the the highest one is always the upper limit and the lowest is the bottom limit.

delimiter |
CREATE FUNCTION `value_display` (`type` enum('NUMBER', 'ENUM', 'DATE', 'TIME', 'TEXT'), `value` INT, <strong>`max_value` INT</strong>, `option` VARCHAR(255), `text` TEXT, `precision` INT, `date_format` VARCHAR(50)) RETURNS VARCHAR(255) CHARACTER SET latin1 NO SQL
BEGIN
  // ...
END

SELECT value_display(`field_name`, value_display(`field_type`, MIN(`value`.`value`), IF(COUNT(*)>1, MAX(`value`.`value`, NULL), GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

Range by 2 fields
Another was of looking at ranges is that the are simply 2 individual fields, eg `min weight` and `max weight`. When filtering you might want to get all items that have 70 kg in their weight range. You would translate this to `min weight` <= 70 and `max weight` >= 70. (In the example query `min weight` has fid=2 and `max weight` fid=3.)

SELECT `item`.* FROM `item` INNER JOIN `value` ON `item`.`id`=`value`.`item_id` AND `value`.`fid` IN (2, 3) WHERE (`value`.`fid`=2 AND `value`.`value` <= 70) OR (`value`.`fid`=3 AND `value`.`value` >= 70) HAVING count(*) =2 GROUP BY `item`.`id`;

Conclusion
We see that using multi-value fields forces to group on fid and generally complicates the queries. If you only need ranges it might be a good idea to use 2 fields instead to keep the queries simpler.

The code in this article has not been tested.