Elasticsearch Multi Match Query – More Practice

We have already covered some basic Elasticsearch Multi Match Queries. This tutorial offers more practice: how the operator parameter affects the Best Fields/Most Fields/Cross Fields types, how to use Tie Breaker with the Cross Fields type, and how Fuzziness works in a Multi Match Query.

I. Operator
1. Best Fields/ Most Fields

best_fields and most_fields types are field-centric - they generate a match query per field.
>> operator parameter is applied to each field individually.

For example, a best_fields query for “spring integration” on the title and tags fields, with operator set to and, is executed as: (+title:spring +title:integration) | (+tags:spring +tags:integration)

>> All terms must be present in a single field (title or tags) for a document to match.
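
The field-centric rule can be sketched in a few lines of Python, using only the standard library. The request body follows the multi_match syntax; the sample documents, the naive whitespace tokenizer and the helper name field_centric_match are assumptions made for illustration and are not part of Elasticsearch.

```python
import json

# Request body for a best_fields query with operator "and".
query = {
    "query": {
        "multi_match": {
            "query": "spring integration",
            "type": "best_fields",
            "fields": ["title", "tags"],
            "operator": "and",
        }
    }
}


def field_centric_match(doc, terms, fields):
    """Return True if at least one single field contains every term."""
    for field in fields:
        # Naive analysis: lowercase and split on whitespace.
        tokens = set(doc.get(field, "").lower().split())
        if all(term in tokens for term in terms):
            return True
    return False


# Hypothetical documents, used only to illustrate the matching rule.
doc_a = {"title": "Spring Integration Tutorial", "tags": "java"}
doc_b = {"title": "Spring Boot", "tags": "integration"}

terms = query["query"]["multi_match"]["query"].split()
fields = query["query"]["multi_match"]["fields"]

print(json.dumps(query, indent=2))
print(field_centric_match(doc_a, terms, fields))  # True: both terms are in title
print(field_centric_match(doc_b, terms, fields))  # False: terms are split across fields
```

The second document does not match because no single field holds both terms, even though each term appears somewhere in the document.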

2. Cross Fields

cross_fields type is particularly useful with structured documents where multiple fields should match. For example, when querying the title and tags fields for “spring integration”, the best match is likely to have “spring” in one field and “integration” in the other.

This approach is different from best_fields and most_fields. Instead of working per field, the cross_fields type works per term. It’s a term-centric approach.

Firstly, it analyses the query string into individual terms.
Then it looks for each term in any of the fields, as though they were one big field.

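For example, with operator set to and, a cross_fields query for “spring integration” on the title and tags fields is executed as: +(title:spring tags:spring) +(title:integration tags:integration)

>> All terms must be present in at least one field for a document to match.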
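
A minimal sketch of the term-centric rule, again in plain Python. The request body follows the multi_match syntax; the sample documents, the whitespace tokenizer and the helper name term_centric_match are hypothetical and only illustrate the idea of treating the fields as one big field.

```python
import json

# Request body for a cross_fields query with operator "and".
query = {
    "query": {
        "multi_match": {
            "query": "spring integration",
            "type": "cross_fields",
            "fields": ["title", "tags"],
            "operator": "and",
        }
    }
}


def term_centric_match(doc, terms, fields):
    """Return True if every term appears in at least one of the fields."""
    tokens = set()
    for field in fields:
        # Pool the tokens of all fields, as though they were one big field.
        tokens.update(doc.get(field, "").lower().split())
    return all(term in tokens for term in terms)


# Hypothetical documents, used only to illustrate the matching rule.
doc_b = {"title": "Spring Boot", "tags": "integration"}
doc_c = {"title": "Spring Boot", "tags": "java"}

terms = query["query"]["multi_match"]["query"].split()
fields = query["query"]["multi_match"]["fields"]

print(json.dumps(query, indent=2))
print(term_centric_match(doc_b, terms, fields))  # True: one term per field is enough
print(term_centric_match(doc_c, terms, fields))  # False: "integration" is missing everywhere
```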

II. Cross Fields and Tie Breaker

By default, each per-term query uses the best score returned by any of the fields. The tie_breaker parameter can change this default behaviour:
– 0.0 (default): take the single best _score out of title:integration and tags:integration
– 1.0: add together the _scores for title:integration and tags:integration
– 0.0 < tie_breaker < 1.0: take the single best score (as with tie_breaker = 0.0) and add tie_breaker multiplied by each of the scores from the other matching fields.
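
The arithmetic behind these three cases can be sketched as below. This is only the combination rule described above, not a reproduction of real _score values; the function name blended_score and the sample scores are assumptions made for illustration.

```python
def blended_score(field_scores, tie_breaker=0.0):
    """Combine the per-field scores of one term.

    Takes the best score and adds tie_breaker times the sum of the
    scores from the other matching fields.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical scores for title:integration and tags:integration.
scores = [2.0, 0.5]

print(blended_score(scores, 0.0))  # 2.0  -> single best score
print(blended_score(scores, 1.0))  # 2.5  -> all scores added together
print(blended_score(scores, 0.5))  # 2.25 -> best + 0.5 * other scores
```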

For example, the same cross_fields query can be run three times, with tie_breaker = 0, with tie_breaker = 1, and with 0.0 < tie_breaker < 1.0, to compare the _score of each hit.

III. Fuzziness

For Multi Match Query with Best Fields and Most Fields, Elasticsearch also accepts:

– fuzziness: maximum edit distance allowed for a match (0, 1, 2 or AUTO). Fuzzy matching is applied only when this parameter is set.
– prefix_length: number of initial characters which will not be “fuzzified”. Defaults to 0.
– max_expansions: maximum number of terms that the fuzzy query will expand to. Defaults to 50.
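
A small sketch of how these parameters fit into a request body, together with a helper that shows what “edit distance” means. The misspelled query text and the helper name edit_distance are assumptions for illustration; the helper computes a plain Levenshtein distance, which is a simplification of what Elasticsearch does internally.

```python
import json

# Request body for a fuzzy best_fields query.
query = {
    "query": {
        "multi_match": {
            "query": "sprin integation",  # hypothetical misspelled input
            "type": "best_fields",
            "fields": ["title", "tags"],
            "fuzziness": "AUTO",
            "prefix_length": 1,
            "max_expansions": 50,
        }
    }
}
body = json.dumps(query, indent=2)
print(body)


def edit_distance(a, b):
    """Plain Levenshtein distance: inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


print(edit_distance("sprin", "spring"))            # 1
print(edit_distance("integation", "integration"))  # 1
```

Both misspelled terms are one edit away from the intended word and keep their first character, so they stay within prefix_length = 1 and within any fuzziness of 1 or more.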

For more details about fuzzy query and fuzziness, please visit:
Elasticsearch Term Level Queries – Fuzzy Query

IV. More…

We can also add analyzer, boost, minimum_should_match, lenient, zero_terms_query to the query.
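
As a rough sketch, a request body that combines several of these parameters could look like this. The concrete values (the standard analyzer, the ^2 boost on title, 75%) are arbitrary choices for illustration, not recommendations.

```python
import json

# Request body combining several optional multi_match parameters.
query = {
    "query": {
        "multi_match": {
            "query": "spring integration",
            "type": "best_fields",
            "fields": ["title^2", "tags"],   # ^2 boosts matches on title
            "analyzer": "standard",          # analyzer applied to the query text
            "minimum_should_match": "75%",   # share of terms that must match
            "lenient": True,                 # ignore data-type mismatch errors
            "zero_terms_query": "none",      # behaviour when analysis removes all terms
        }
    }
}

body = json.dumps(query, indent=2)
print(body)
```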

For more details about analyzers, see:
Elasticsearch Analyzers – Basic Analyzers
Elasticsearch Analyzers – Custom Analyzer

Just like the match query, multi_match also supports cutoff_frequency, which specifies either:
– an absolute document frequency (greater than or equal to 1.0), or
– a relative document frequency (in the range [0..1) ).

Terms with a frequency above the cutoff are moved into an optional subquery and are only scored if:
+ one of the low frequency (below the cutoff) terms matches (or operator), or
+ all of the low frequency terms match (and operator).

cutoff_frequency allows handling stopwords dynamically at runtime without a stopword file.
– It prevents scoring/iterating high frequency terms and only takes the terms into account if a more significant/lower frequency term matches a document.
– If all of the query terms are above the given cutoff_frequency, the query is automatically transformed into a pure and query for fast execution.
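
The split into low and high frequency terms can be sketched as follows. This only models the idea described above; the function name split_by_cutoff, the document frequencies and the index size are hypothetical, and the exact comparison used inside Elasticsearch may differ. Newer Elasticsearch releases deprecate cutoff_frequency, so it is worth checking the documentation for your version.

```python
def split_by_cutoff(doc_freqs, total_docs, cutoff_frequency):
    """Split terms into (low_frequency, high_frequency) lists.

    A cutoff below 1.0 is read as a fraction of total_docs;
    a cutoff of 1.0 or more is read as an absolute document count.
    """
    if cutoff_frequency >= 1.0:
        max_doc_freq = cutoff_frequency
    else:
        max_doc_freq = cutoff_frequency * total_docs

    low, high = [], []
    for term, freq in doc_freqs.items():
        if freq > max_doc_freq:
            high.append(term)   # treated like a stopword: optional subquery
        else:
            low.append(term)    # significant term: drives the matching
    return low, high


# Hypothetical document frequencies in an index of 1000 documents.
doc_freqs = {"the": 950, "spring": 40, "integration": 12}

low, high = split_by_cutoff(doc_freqs, total_docs=1000, cutoff_frequency=0.1)
print(low)   # ['spring', 'integration']
print(high)  # ['the']
```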

By JavaSampleApproach | November 17, 2017.

