InnoDB supports plugin parser in fulltext index

InnoDB full-text search supports parser plugins as of the MySQL 5.7.3 release. The feature is compatible with MyISAM full-text search, so the syntax and usage remain largely the same.

A parser plugin can operate in either of two roles:

a) The plugin can replace the built-in parser. In this role, the plugin reads the input to be parsed, splits it up into words, and passes the words to the server (either for indexing or for word accumulation).

b) The plugin can act in conjunction with the built-in parser by serving as a front end for it. In this role, the plugin extracts text from the input and passes the text to the parser, which splits up the text into words using its normal parsing rules.

If you want to write your own full text plugin, please refer to http://dev.mysql.com/doc/refman/5.7/en/writing-full-text-plugins.html.

If you have an existing plugin parser for MyISAM, it needs some minor modifications to run on InnoDB, even if it compiles without any warnings or errors.

1. Set word position in MYSQL_FTPARSER_BOOLEAN_INFO. [MUST]
We added a new member named ‘position’ to MYSQL_FTPARSER_BOOLEAN_INFO, as you can see below.

typedef struct st_mysql_ftparser_boolean_info
{
  enum enum_ft_token_type type;
  int yesno;
  int weight_adjust;
  char wasign;
  char trunc;
  int position;
  /* These are parser state and must be removed. */
  char prev;
  char *quot;
} MYSQL_FTPARSER_BOOLEAN_INFO;

‘position’ is a word’s offset in the document being parsed, and it is used to support phrase search in full-text queries. In InnoDB, when we check whether a document matches a phrase, we parse the document directly from the word’s positions to see if the phrase is matched. In MyISAM, we need to parse the whole document from the beginning.
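As a rough illustration of filling in the position, here is a minimal sketch of a whitespace-splitting parser. It does not include the real MySQL plugin header: the `demo_` types are reduced stand-ins for MYSQL_FTPARSER_BOOLEAN_INFO and the parser parameter structure, `demo_record` is a hypothetical callback that plays the role of the server, and treating the position as a byte offset from the start of the document is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-ins for the declarations in the MySQL plugin header, reduced to
   what this sketch needs. A real plugin includes the header instead. */
enum demo_token_type { DEMO_TOKEN_EOF = 0, DEMO_TOKEN_WORD = 1 };

typedef struct demo_boolean_info {
    enum demo_token_type type;
    int yesno;
    int weight_adjust;
    char wasign;
    char trunc;
    int position;   /* the member InnoDB needs */
    char prev;
    char *quot;
} demo_boolean_info;

typedef struct demo_param {
    int (*mysql_add_word)(struct demo_param *param, char *word, int len,
                          demo_boolean_info *info);
    char *doc;      /* text to parse */
    int length;     /* length of doc in bytes */
} demo_param;

/* Hand one word to the server, recording where it starts in the document. */
static int demo_add_word(demo_param *param, char *word, int len)
{
    demo_boolean_info info = { DEMO_TOKEN_WORD, 0, 0, 0, 0, 0, ' ', NULL };

    info.position = (int)(word - param->doc);  /* assumed: byte offset */
    return param->mysql_add_word(param, word, len, &info);
}

/* Split the document on whitespace and emit each word with its position. */
static int demo_parse(demo_param *param)
{
    char *start = param->doc;
    char *docend = param->doc + param->length;
    char *end;

    assert(param->mysql_add_word != NULL);
    for (end = start; ; end++) {
        if (end == docend || isspace((unsigned char)*end)) {
            if (end > start) {
                int rc = demo_add_word(param, start, (int)(end - start));
                if (rc != 0)
                    return rc;
            }
            if (end == docend)
                break;
            start = end + 1;
        }
    }
    return 0;
}

/* Hypothetical stand-in for the server: remembers the positions it saw. */
static int demo_positions[16];
static int demo_count = 0;

static int demo_record(demo_param *param, char *word, int len,
                       demo_boolean_info *info)
{
    (void)param; (void)word; (void)len;
    if (demo_count < 16)
        demo_positions[demo_count++] = info->position;
    return 0;
}
```

Running demo_parse over the text "full text  search" with demo_record as the callback records the positions 0, 5 and 11, one per word, regardless of how much whitespace separates the words.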

2. Check the ‘mysql_add_word’ return value. [Strongly Recommended]
The plugin parser should check the return value every time it calls ‘mysql_add_word’, and stop if an error occurs inside ‘mysql_add_word’.
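The pattern might look like the sketch below. The callback type here is a hypothetical, reduced one (the real ‘mysql_add_word’ also receives the parser parameter and a MYSQL_FTPARSER_BOOLEAN_INFO), and the sketch assumes that a non-zero return value signals an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced callback type used only for this illustration. */
typedef int (*demo_add_word_fn)(void *server_ctx, const char *word, int len);

/* Emit words one by one. Stop at the first non-zero return value and
   hand it back to the caller. Returns 0 if every word was accepted. */
static int demo_emit_all(demo_add_word_fn add_word, void *server_ctx,
                         const char *const *words, size_t n_words,
                         size_t *n_emitted)
{
    size_t i;

    assert(add_word != NULL);
    *n_emitted = 0;
    for (i = 0; i < n_words; i++) {
        int rc = add_word(server_ctx, words[i], (int)strlen(words[i]));
        if (rc != 0)
            return rc;  /* do not keep parsing after a failure */
        (*n_emitted)++;
    }
    return 0;
}

/* Hypothetical stand-in for the server: accepts *budget words,
   then reports an error. */
static int demo_failing_add_word(void *server_ctx, const char *word, int len)
{
    int *budget = (int *)server_ctx;

    (void)word; (void)len;
    if (*budget == 0)
        return 1;
    (*budget)--;
    return 0;
}
```

With a budget of two words and four words to emit, demo_emit_all stops after the second word and returns the error code instead of pushing the remaining words.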

3. Skip the stopword check. [Strongly Recommended]
Plugin parsers don’t need to check stopwords, innodb_fts_min_token_size, or innodb_fts_max_token_size.
Just return every single word to InnoDB and let InnoDB do the checks.

However, if the plugin parser needs to handle stopwords itself, it may do so only in ‘MYSQL_FTPARSER_SIMPLE_MODE’, which is used for full-text index builds and natural language search. In ‘MYSQL_FTPARSER_WITH_STOPWORDS’ and ‘MYSQL_FTPARSER_FULL_BOOLEAN_INFO’ modes, the plugin should return every single word, including stopwords, to InnoDB in case of a phrase search; otherwise we may get unexpected results.
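The mode rule can be sketched as a small decision function. The `demo_` enum only mirrors the three modes named above (a real plugin takes them from the MySQL plugin header), and the stopword list is a made-up one for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors MYSQL_FTPARSER_SIMPLE_MODE, MYSQL_FTPARSER_WITH_STOPWORDS and
   MYSQL_FTPARSER_FULL_BOOLEAN_INFO for the purposes of this sketch. */
enum demo_parser_mode {
    DEMO_SIMPLE_MODE = 0,
    DEMO_WITH_STOPWORDS = 1,
    DEMO_FULL_BOOLEAN_INFO = 2
};

/* Hypothetical plugin-private stopword list. */
static const char *const demo_stopwords[] = { "the", "a", "of" };

static int demo_is_stopword(const char *word, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(demo_stopwords) / sizeof(demo_stopwords[0]); i++) {
        if (strlen(demo_stopwords[i]) == len &&
            memcmp(demo_stopwords[i], word, len) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if the word must be passed to InnoDB, 0 if the plugin may
   drop it. Filtering is allowed only in simple mode. */
static int demo_should_emit(enum demo_parser_mode mode,
                            const char *word, size_t len)
{
    assert(word != NULL);
    if (mode != DEMO_SIMPLE_MODE)
        return 1;
    return !demo_is_stopword(word, len);
}
```

In this sketch the word "the" is dropped only in simple mode; in the other two modes it is always passed through, so phrase matching still sees every word.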

Note that proximity search is not currently supported with plugin parsers, so ‘@’ is ignored and the digits after it are treated as a normal token.
For example, ('("mysql database")@3' IN BOOLEAN MODE) is equivalent to ('"mysql database" 3' IN BOOLEAN MODE).

For more information, please refer to the MySQL documentation, which lists the above points and limitations as well.