In the article ‘An alternative way of EAV modelling’,
I discussed how to do EAV modelling by casting all values (except
text) to integers. I’ll continue on that and talk about more
advanced topics like multi-value fields.
As binary set
Not all questions have only a single option. Some fields we want
to represent by radio-buttons, allow the user to select any
number of options. For this we can mimic the behaviour of the SET
field type of MySQL. A SET is almost similar to an ENUM, except
that each bit represents an option. The value can have multiple
bits enabled to represent multiple options.
Example: field options for field ‘programming language’
+-------+-------------+ | value | description | +-------+-------------+ | 1 | C/C++ | | 2 | PHP | | 4 | Java | | 8 | Python | | 16 | Ruby | +-------+-------------+
Choosing ‘C/C++’, ‘PHP’ and ‘Python’ would result in value 11.
There are 2 major disadvantages to this approach. First, with an integer, the number of options is limited to 4*8 = 32. Second, retrieving the description of options would cause joining field on field_option to be done like:
SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND (IF(`field`.`field_type`='SET, `value`.`value` & `field_option`.`value`, `value`.`value` = `field_option`.`value`)) WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;
Because of this, the index on the value isn’t used. Instead, a function has to be performed on each option of the field. This should still give decent enough performance, because this scales linearly and not exponential.
Multiple values
Another way is to store multiple values for the same property.
This would mean replacing the primary key from the value table by
a normal index.
CREATE TABLE `value` ( `item_id` int(10) UNSIGNED NOT NULL DEFAULT '0', `fid` int(10) UNSIGNED NOT NULL DEFAULT '0', `value` int(11) NOT NULL DEFAULT '0', `text` text, KEY `item_field` (`item_id`,`fid`), KEY `value` (`fid`,`value`), KEY `text` (`text`(255)) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;
The disadvantage of this, is that structural integrity is no longer enforced by the database. Only for a field with the type ‘SET’, multiple values should be allowed. However the database will allow multiple values for other field types (like numeric fields) as well. It is up to the application to replace existing values.
Ranges with a single field
A completely different type of multi-value is ranges. A range has
a top and bottom value. We can solve this by saving 2 values in
the database. Of the values, the the highest one is always the
upper limit and the lowest is the bottom limit.
delimiter | CREATE FUNCTION `value_display` (`type` enum('NUMBER', 'ENUM', 'DATE', 'TIME', 'TEXT'), `value` INT, <strong>`max_value` INT</strong>, `option` VARCHAR(255), `text` TEXT, `precision` INT, `date_format` VARCHAR(50)) RETURNS VARCHAR(255) CHARACTER SET latin1 NO SQL BEGIN // ... END SELECT value_display(`field_name`, value_display(`field_type`, MIN(`value`.`value`), IF(COUNT(*)>1, MAX(`value`.`value`, NULL), GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;
Range by 2 fields
Another was of looking at ranges is that the are simply 2
individual fields, eg `min weight` and `max weight`. When
filtering you might want to get all items that have 70 kg in
their weight range. You would translate this to `min weight`
<= 70 and `max weight` >= 70. (In the example query `min
weight` has fid=2 and `max weight` fid=3.)
SELECT `item`.* FROM `item` INNER JOIN `value` ON `item`.`id`=`value`.`item_id` AND `value`.`fid` IN (2, 3) WHERE (`value`.`fid`=2 AND `value`.`value` <= 70) OR (`value`.`fid`=3 AND `value`.`value` >= 70) HAVING count(*) =2 GROUP BY `item`.`id`;
Conclusion
We see that using multi-value fields forces to group on fid and
generally complicates the queries. If you only need ranges it
might be a good idea to use 2 fields instead to keep the queries
simpler.
The code in this article has not been tested.