/  Technology   /  What are Full-Text Searches in MySQL?

What are Full-Text Searches in MySQL?

 

A full-text search is a technique that allows you to search for records that might not exactly match your search criteria in order to find them. When certain indexes are in use in MySQL, full-text searches are performed; however, those indexes have many unique nuances, and some of them include the following:

  • As a full-text index, the index must be of the FULLTEXT type in order for it to be considered as a full-text index.
  • Currently, FULLTEXT indexes can only be used on tables that run the InnoDB or MyISAM storage engines.
  • In order to create FULLTEXT indexes for CHAR, VARCHAR, or TEXT columns, the column type must be CHAR, VARCHAR, or TEXT.
  • It is only when MATCH() AGAINST() clause is used that FULLTEXT indexes are used.
  • In full-text searches, there are three modes that can be used: the natural language mode, boolean mode, and query expansion mode.

FULLTEXT indexes find keywords in the text instead of comparing the text with the index. Despite the fact that FULLTEXT searching is different from other types of matching, it is important to note that you can have both a BTREE index and a FULLTEXT index on the same column at the same time as they are suited to different purposes.

Full-Text Search Types

When running a full-text search in MySQL, you will need to keep in mind that there are three types of searches that you can select from:

  1. An example of a natural language search type would be one that interprets the search string as a literal phrase. When no modifier is specified or IN NATURAL LANGUAGE MODE is specified, it is enabled by default;
  2. In order to perform a query expansion search, you must perform the search twice – this is referred to as a double search mode. After a second attempt at searching, a few of the most relevant documents from the first attempt at searching are included in the result set of the second attempt at searching. The WITH QUERY EXPANSION modifier can be used to enable this feature;
  3. A boolean search type – such a search mode enables searching for complex queries that can include boolean operators such as less than (“<”) and more than (“>”) operators, subexpressions (“(” and “)”), the plus (+) sign, the minus (-) sign, double quotes (“”), an operator that lowers the value’s contribution to the results (~) and the wildcard operator (*) – the wildcard operator allows searching with fuzzy matching (for example, “demo*” would also match “demonstration”). Enabled using the IN BOOLEAN MODE modifier.

Full-Text Searches with the Natural Language Search Mode

As discussed above, the natural language search mode is activated by default when IN NATURAL LANGUAGE MODE is specified. As a result of this mode, a natural language search can be performed against a text collection (one or more columns) which can be provided. Generally speaking, the basic query format for a full-text search in MySQL should be something like this:

#start
SELECT * FROM table WHERE MATCH(column) AGAINST(“string” IN NATURAL LANGUAGE MODE);
#end

Using MATCH() with a WHERE clause will automatically sort the rows by highest relevance first when used with a WHERE clause. Enclosing a string in double quotes allows you to search for it.

Full-Text Searches with the Query Expansion Mode

As well as full-text searches, the query expansion mode is also available for full-text searches. It is common for users to use such a search mode when they are relying on implied knowledge – for example, when they search for “DBMS” in the hope that they will see both “MongoDB” and “MySQL” in the search results. A full-text search using query expansion mode involves performing the search twice: the first search phrase is concatenated with a few most relevant entries from the first search, so the user might be able to rely on implied knowledge. If a row in the first search contained both “DBMS” and “MySQL”, the second search would find entries containing “MySQL” even if they did not contain “DBMS”. To use query expansion mode, the query format would be as follows:

#start
SELECT * FROM table WHERE MATCH(column) AGAINST(“string” WITH QUERY EXPANSION); 
#end

Full-text Searches Using the Boolean Mode

The boolean mode is one of the most interesting features of MySQL’s full-text search engine. Because boolean operators can be used to increase the search capabilities of this mode, it has a number of unique caveats that make it unique. Certain characters can be assigned special meanings in the boolean mode when they appear at the beginning or end of words. As an example:

  • “+” means AND;
  • “-” means NOT;
  • The “(“ and “)” operators allows to create subexpressions;
  • “<” and “>” operators change the rank of the search value lower or higher;
  • “~” lowers the value’s contribution to the search results;
  • Double quotes (“”) only match literal values;
  • “*” is a wildcard operator (refer to the explanation above).

If you wanted to retrieve all rows that contain the word “Demo”, but not “Demo2”, you could use a query like this: For example, if you wanted to retrieve all rows that contain the word “Demo”, but not “Demo2”, then you could use a query like this:

#start
SELECT * FROM table WHERE MATCH(column) AGAINST (“+Demo -Demo2” IN BOOLEAN MODE);
#end

Double quotes can be used with single quotes as follows:

#start
SELECT * FROM table WHERE MATCH(column) AGAINST(‘“search string”’ IN BOOLEAN MODE);
#end

Full-Text Search Gotchas

Full-text search in MySQL does have a few “gotchas ”:

  • Both InnoDB and MyISAM storage engines have their own stopword lists. You can find the list of InnoDB stopwords here, and the list of MyISAM stopwords here.
    • Create a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, insert stopwords there, and set the innodb_ft_server_stopword_table option to db_name/table_name.
    • Set the ft_stopword_file variable to the path name of the file containing your own stopword list if you wish to define your own stopword list for MyISAM. Stopwords can be separated by any nonalphanumeric character other than “_” and “‘”. The default stopword file is located in storage/myisam/ft_static.c. Stopwords can be disabled by setting the variable to an empty string.
  • There is no possibility of performing a full-text search on partitioned tables.
  • It is necessary for all columns in a FULLTEXT index to have the same character set and collation.
  • As far as I can tell, the % string does not appear to be treated as a wildcard in the context of a full-text search.

The built-in FULLTEXT parser determines where words begin and end based on specific characters, such as spaces, commas, and periods. If you include one or more of those characters in your search string, your results may not be accurate. The query “test.demo” will return more (10, 15 etc.) results if there are five rows with the string “test.demo”. Since it searches for “demo” instead of “test.demo”, you might get a lot of irrelevant matches. 

 

Leave a comment