Solr – Analyzers:

In Solr, an analyzer controls how the text of a field is broken down at index time and at query time. When a document is added or updated, the analyzer processes the text in its fields and produces a token stream. The analysis applied while indexing need not be the same as the analysis applied while querying.

Let’s understand with an example:

While indexing, we often want to normalize words by converting letters to lowercase, ignoring punctuation and quotation marks, and so on, so that “abc”, “Abc”, “ABC” and “aBc” are all indexed as abc and match a query for abc. While querying, we may want results that match precisely what we expect. To narrow the results we can skip a step such as lowercasing; a query term then matches only if it is already in the form stored in the index. This is why the analysis needed for indexing can differ from the analysis needed for querying.

An analyzer is declared as a child of the <fieldType> element in the schema.xml configuration file.
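As a rough sketch of where this sits, the fragment below shows a <fieldType> with an <analyzer> child inside schema.xml, together with a field that uses it. The schema name, the version value and the field name title are placeholders chosen for illustration; they are assumptions, not values from any particular Solr install.

```xml
<schema name="example" version="1.6">
  <!-- The field type owns the analyzer definition -->
  <fieldType name="nametext" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <!-- Hypothetical field: any field declared with type="nametext"
       is analyzed by the analyzer above -->
  <field name="title" type="nametext" indexed="true" stored="true"/>
</schema>
```

The analyzer is attached to the type rather than to the field, so every field of that type shares the same analysis.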

Example-1:

This example analyzes the text with a single class, WhitespaceAnalyzer.

<fieldType name="nametext" class="solr.TextField">
 <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
</fieldType>

Example-2:

This is a sample with a tokenizer and filters.

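<fieldType name="nametext" class="solr.TextField">
 <analyzer>
  <!-- Split the text into word tokens -->
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- solr.StandardFilterFactory was removed in Solr 8, so it is left out here -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory"/>
  <!-- solr.EnglishPorterFilterFactory no longer exists in current Solr;
       solr.PorterStemFilterFactory is the usual replacement -->
  <filter class="solr.PorterStemFilterFactory"/>
 </analyzer>
</fieldType>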
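To make the order of the steps concrete, here is a minimal sketch of a similar chain with the token stream traced in comments. The type name nametext_traced, the file name stopwords.txt and the sample input are assumptions made for illustration, and the trace assumes stopwords.txt lists the word "the".

```xml
<!-- Hypothetical field type; sample input: "The Dogs Running" -->
<fieldType name="nametext_traced" class="solr.TextField">
  <analyzer>
    <!-- Tokens after this step: The | Dogs | Running -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Tokens after this step: the | dogs | running -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Tokens after this step: dogs | running
         (assuming stopwords.txt contains "the") -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Tokens after this step: dog | run -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Filters run in the order they are listed, each one consuming the output of the step before it, so placing the lowercase filter ahead of the stop filter means the stop word list only needs lowercase entries.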

Example-3:

The analyzers in the examples above are used for both indexing and querying. This example shows how the analysis can differ between indexing and querying.

<fieldType name="nametext" class="solr.TextField">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeepWordFilterFactory" words="words.txt"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synons.txt"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>
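The split can also go the other way. The sketch below is one possible variant, not a recommendation from this tutorial: it keeps the index analyzer small and expands synonyms only at query time. The type name nametext_qsyn is hypothetical, synons.txt is reused from the example above, and solr.SynonymGraphFilterFactory is assumed to be available (it is in newer Solr releases).

```xml
<!-- Hypothetical variant: synonyms expanded at query time only -->
<fieldType name="nametext_qsyn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Each query term is expanded to the synonyms listed in synons.txt -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synons.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With this layout the synonym file can be edited without reindexing, at the cost of doing the expansion on every query.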