Elasticsearch Analyzers – Basic Analyzers

In this tutorial, we look at the basic built-in analyzers that Elasticsearch supports.

1. Keyword Analyzer

The keyword analyzer returns the entire input string as a single token, without any changes.
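As a minimal sketch of this behaviour, the Python snippet below mimics the keyword analyzer locally and also builds a JSON body that could be sent to Elasticsearch's _analyze endpoint. The function name and the sample text are assumptions made for illustration; they are not part of Elasticsearch.

```python
import json
from typing import List

def keyword_analyze(text: str) -> List[str]:
    """Local approximation: the whole input becomes one token."""
    return [text]

# Sample sentence chosen for this sketch (an assumption, not from the tutorial).
sample = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."

# Body for a POST to the _analyze endpoint; actually sending it is left out.
request_body = json.dumps({"analyzer": "keyword", "text": sample})

print(keyword_analyze(sample))
```

The same request body can be reused for the other built-in analyzers by changing the value of "analyzer".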

Terms:

2. Whitespace Analyzer

The whitespace analyzer breaks text into terms whenever it encounters a whitespace character. It does not lowercase the terms or strip punctuation.
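A rough local approximation in Python, assuming that Python's notion of whitespace is close enough to the one Elasticsearch uses for typical text (the function name and sample sentence are illustrative):

```python
from typing import List

def whitespace_analyze(text: str) -> List[str]:
    # str.split() with no arguments splits on runs of whitespace and
    # drops empty strings; case and punctuation are left untouched.
    return text.split()

print(whitespace_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Note that "Brown-Foxes" and "bone." stay intact, because a hyphen and a full stop are not whitespace.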

Terms:

3. Simple Analyzer

The simple analyzer breaks text into lowercase terms whenever it encounters a character that is not a letter. Digits and punctuation therefore act as separators and are discarded.
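This rule can be sketched in a few lines of Python. The version below is an approximation that treats str.isalpha() as the definition of "letter"; the function name and sample sentence are assumptions for illustration.

```python
from itertools import groupby
from typing import List

def simple_analyze(text: str) -> List[str]:
    # Group consecutive characters by "is a letter" and keep only
    # the letter runs, lowercased. Everything else is a separator.
    return [
        "".join(run).lower()
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]

print(simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

In this sketch the digit "2" disappears and "dog's" becomes the two terms "dog" and "s", since the apostrophe is not a letter.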

Terms:

4. Stop Analyzer

The stop analyzer works like the simple analyzer, but it also removes stop words (the _english_ stop word list by default).
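A hedged Python sketch of the idea: split on non-letters, lowercase, then filter out stop words. The STOP_WORDS set below is only a small illustrative subset, not the full _english_ list that Elasticsearch ships, and the function name is an assumption.

```python
from itertools import groupby
from typing import List

# Illustrative subset only; the real _english_ list is longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

def stop_analyze(text: str) -> List[str]:
    # Same tokenization as the simple analyzer: lowercase letter runs.
    terms = [
        "".join(run).lower()
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]
    # Drop every term that appears in the stop word set.
    return [term for term in terms if term not in STOP_WORDS]

print(stop_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Both occurrences of "the" are removed, regardless of their original case, because lowercasing happens before the stop word filter.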

Terms:

Configuration

stopwords: a pre-defined stop word list (such as _english_) or an array of stop words (["the", "over"] for example). Defaults to _english_.
stopwords_path: path to a file containing stop words (relative to the Elasticsearch config directory).

For example, we can configure the analyzer with an array of stop words ("the", "over"):
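One way such a configuration might look, written as a Python dictionary that serializes to the index settings JSON. The analyzer name my_stop_analyzer is a hypothetical name chosen for this sketch, and the emulate function is only a rough local stand-in for what Elasticsearch would do with these settings.

```python
import json
from itertools import groupby
from typing import List

# Index settings defining a custom analyzer of type "stop".
# "my_stop_analyzer" is a hypothetical name chosen for this sketch.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_stop_analyzer": {
                    "type": "stop",
                    "stopwords": ["the", "over"],
                }
            }
        }
    }
}

def emulate(text: str, stopwords: List[str]) -> List[str]:
    """Rough local emulation: lowercase letter runs minus the stop words."""
    terms = [
        "".join(run).lower()
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]
    return [term for term in terms if term not in stopwords]

print(json.dumps(index_settings, indent=2))
print(emulate("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.", ["the", "over"]))
```

With this custom list, only "the" and "over" are removed; other common English words are kept because the _english_ list is no longer in effect.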

Terms:

Elasticsearch provides predefined stop word lists for the following languages:
_arabic_, _armenian_, _basque_, _brazilian_, _bulgarian_, _catalan_, _czech_, _danish_, _dutch_, _english_, _finnish_, _french_, _galician_, _german_, _greek_, _hindi_, _hungarian_, _indonesian_, _irish_, _italian_, _latvian_, _norwegian_, _persian_, _portuguese_, _romanian_, _russian_, _sorani_, _spanish_, _swedish_, _thai_, _turkish_.

To disable stop words, use _none_.

5. Standard Analyzer

The standard analyzer is the default analyzer. It provides grammar-based tokenization (following the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29), lowercases the terms, and works well for most languages.
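The real segmentation rules are considerably more detailed than a single regular expression can capture. Purely as a crude approximation for simple English text, the sketch below keeps runs of word characters together, allows an apostrophe inside a word, and lowercases the result. The pattern and the function name are assumptions made for this illustration; they are not how Elasticsearch implements the analyzer.

```python
import re
from typing import List

# Word characters, optionally joined by an inner apostrophe (dog's).
# This is a simplification of the Unicode word-boundary rules.
TOKEN_PATTERN = re.compile(r"\w+(?:['’]\w+)*")

def standard_analyze_approx(text: str) -> List[str]:
    return TOKEN_PATTERN.findall(text.lower())

print(standard_analyze_approx("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Unlike the simple analyzer, this keeps the digit "2" and the term "dog's" intact, while the hyphen in "Brown-Foxes" still splits the word.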

Terms:

Configuration

stopwords: a pre-defined stop word list (such as _english_) or an array of stop words (["the", "over"] for example). Defaults to _none_.
stopwords_path: path to a file containing stop words (relative to the Elasticsearch config directory).
max_token_length: maximum token length. If a token exceeds this length, it is split at max_token_length intervals. Defaults to 255.

For example:
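A sketch of such a configuration, with max_token_length set to 5 and the _english_ stop word list enabled, expressed as a Python dictionary. The analyzer name my_english_analyzer is hypothetical, and split_long_token is only a local illustration of the splitting rule described for max_token_length.

```python
import json
from typing import List

# Index settings for a configured standard analyzer.
# "my_english_analyzer" is a hypothetical name chosen for this sketch.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_english_analyzer": {
                    "type": "standard",
                    "max_token_length": 5,
                    "stopwords": "_english_",
                }
            }
        }
    }
}

def split_long_token(token: str, max_len: int) -> List[str]:
    # Cut the token into consecutive pieces of at most max_len characters.
    return [token[i:i + max_len] for i in range(0, len(token), max_len)]

print(json.dumps(index_settings, indent=2))
print(split_long_token("jumped", 5))
```

With a limit of 5, a six-letter token such as "jumped" is emitted as "jumpe" and "d", while tokens of five characters or fewer pass through unchanged.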

Terms:

By JavaSampleApproach | November 13, 2017.
