Elasticsearch Term Level Queries – Regexp Query

The regexp query lets us run term-level queries with regular expressions. Elasticsearch applies the regular expression to the terms produced by the tokenizer for that field, not to the original text.

I. Regexp Query

The following example uses a regexp query to match documents whose fullname field contains a term that starts with “j”:
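A minimal sketch of such a request body, built as a Python dictionary and printed as JSON. The pattern j.* and the helper name regexp_query are assumptions made for illustration, and sending the body to a cluster is left out.

```python
import json


def regexp_query(field, pattern):
    """Build a search body that runs a regexp query on a single field.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    return {"query": {"regexp": {field: pattern}}}


# Assumed pattern: "j" followed by any characters, so the term starts with "j".
body = regexp_query("fullname", "j.*")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a _search request against the index that holds the fullname field.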

Response:

*Note: The performance of this query depends heavily on the chosen regular expression.
Where possible, use a long literal prefix before the regular expression starts. Wildcard matchers such as .*?+ will usually lower performance.

Boost

The boost parameter can give the regexp query a higher relevance score than another query.
The default boost value is 1.
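As a sketch, a boosted regexp query can be written in the expanded form, where the pattern moves under "value" so that "boost" can sit next to it. The helper name, the pattern j.* and the boost value 1.2 are assumptions chosen for illustration.

```python
import json


def regexp_query_with_boost(field, pattern, boost=1.0):
    """Build a regexp query in expanded form with an explicit boost.

    Hypothetical helper; boost defaults to 1.0, mirroring the default above.
    """
    return {"query": {"regexp": {field: {"value": pattern, "boost": boost}}}}


body = regexp_query_with_boost("fullname", "j.*", boost=1.2)
print(json.dumps(body, indent=2))
```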

Special Flags

Different flag combinations can be used to enable/disable specific operators:

For more detail, see section 2, Optional Operator with Special Flags, below.
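A possible way to combine flags is to join their names with "|" in the flags parameter. The sketch below builds such a body in Python; the helper name and the chosen combination are assumptions, and only flag names mentioned in this article are accepted.

```python
import json

# Flag names taken from this article; the real list may contain more.
KNOWN_FLAGS = {"ALL", "ANYSTRING", "COMPLEMENT", "INTERSECTION", "INTERVAL", "NONE"}


def regexp_query_with_flags(field, pattern, flags):
    """Build a regexp query whose optional operators are limited by flags.

    Hypothetical helper: rejects names outside KNOWN_FLAGS.
    """
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return {"query": {"regexp": {field: {"value": pattern, "flags": "|".join(flags)}}}}


body = regexp_query_with_flags("fullname", "j.*", ["INTERSECTION", "COMPLEMENT"])
print(json.dumps(body, indent=2))
```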

II. Regex Syntax

Lucene's regular expression engine supports only a small range of operators:

1. Standard Operator
1.1 Match any character

Use "." to represent any character (just one character):

For string “jsa”:
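The cases below are an illustrative sketch, not output from Elasticsearch. Python's re module stands in for Lucene's engine, and re.fullmatch is used because Lucene patterns are always anchored and must match the whole term. The patterns themselves are assumptions.

```python
import re

term = "jsa"
patterns = ["j.a", "j..", "...", "j."]

# Each "." consumes exactly one character; the pattern must cover the whole term.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:6} {'match' if matched else 'no match'}")
```

Only "j." fails, because two characters cannot cover a three-character term.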

1.2 Zero-or-one

Use "?" to match the preceding shortest pattern zero or one time.

For string “jsa”:
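A short sketch of the "?" operator, with Python's re.fullmatch as an approximation of Lucene's whole-term matching. The patterns are assumed examples rather than the article's original ones.

```python
import re

term = "jsa"
patterns = ["jsa?", "jsab?", "j?s?a?", "js?"]

# "?" makes only the single preceding character optional here.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:8} {'match' if matched else 'no match'}")
```

"js?" does not match: it accepts "j" or "js", and neither covers the whole term.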

1.3 Zero-or-more

Use "*" to match the preceding shortest pattern zero or more times.

For string “jsa”:
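To illustrate "*", here is a sketch that checks a few assumed patterns with Python's re.fullmatch, which approximates Lucene's anchored matching for this operator.

```python
import re

term = "jsa"
patterns = ["j.*", "js*a", "jsax*", "j*"]

# "*" repeats the single preceding character (or ".") zero or more times.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:8} {'match' if matched else 'no match'}")
```

"j*" fails because it only accepts terms made of the letter "j".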

1.4 One-or-more

Use "+" to match the preceding shortest pattern one or more times.
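A sketch of "+" under the same approximation (Python's re.fullmatch in place of Lucene's engine). The sample term "jsa" and the patterns are assumptions carried over from the previous subsections.

```python
import re

term = "jsa"  # assumed sample term
patterns = ["j+sa", "js+a+", "j.+", "jsax+"]

# "+" needs at least one occurrence of the preceding character.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:8} {'match' if matched else 'no match'}")
```

Unlike "jsax*", the pattern "jsax+" does not match, since "x" must appear at least once.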

1.5 Min-to-max

Use "{}" to specify a minimum (and optionally a maximum) number of times the preceding shortest pattern can repeat:
{n} # repeat exactly n times
{a,b} # repeat at least a times and at most b times
{a,} # repeat at least a times

For string “aaabbb”:
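The three forms can be tried out with the sketch below. It relies on Python's re.fullmatch as a stand-in for Lucene's anchored matching, and the patterns are assumed examples.

```python
import re

term = "aaabbb"
patterns = [
    "a{3}b{3}",      # exactly three of each
    "a{2,4}b{2,4}",  # between two and four of each
    "a{2,}b{2,}",    # at least two of each
    ".{3}.{3}",      # any six characters
    "a{4}b{4}",      # four of each: too many
    "a{4,6}b{4,6}",  # at least four of each: too many
]

results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:14} {'match' if matched else 'no match'}")
```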

1.6 Grouping

Use "()" to group characters into a sub-pattern, so that an operator applies to the whole group instead of the shortest pattern.

For string “javajavajava”:
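A sketch contrasting grouped and ungrouped repetition, again with Python's re.fullmatch approximating Lucene's whole-term matching. The patterns are assumptions picked to show the difference.

```python
import re

term = "javajavajava"
patterns = ["(java)+", "(java){3}", "ja(vaja)*va", "java{3}", "(java){2}"]

# Without parentheses, {3} repeats only the last "a"; with them, the whole group.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:12} {'match' if matched else 'no match'}")
```

"java{3}" only accepts "javaaa", and "(java){2}" stops one repetition short of the term.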

1.7 Alternation

"|" acts as an OR operator. It applies to the longest pattern, not the shortest.

For string “aabb”:
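The reach of "|" can be seen in the following sketch. Python's re.fullmatch is used as an approximation of Lucene's anchored matching, and the patterns are assumed examples.

```python
import re

term = "aabb"
patterns = ["aabb|bbaa", "aacc|bb", "aa(cc|bb)", "a+|b+", "a+b+|b+a+", "a+(b|c)+"]

# "|" splits the whole pattern unless parentheses limit its reach.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:12} {'match' if matched else 'no match'}")
```

"aacc|bb" means "aacc" or "bb", not "aa" followed by "cc" or "bb", so it does not match.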

1.8 Character classes

Enclose a set or range of candidate characters in square brackets “[]” to form a character class. A leading ^ negates the class.

For string “abcd”:
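A brief sketch of character classes, using Python's re.fullmatch in place of Lucene's engine (the two agree on these simple classes). The patterns are assumptions chosen for illustration.

```python
import re

term = "abcd"
patterns = ["ab[cd]+", "[a-d]+", "[^a-d]+", "[abc]+"]

# [cd] lists characters, [a-d] is a range, [^a-d] is the negated range.
results = {p: re.fullmatch(p, term) is not None for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:10} {'match' if matched else 'no match'}")
```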

2. Optional Operator with Special Flags

The flags parameter defaults to ALL, which enables all optional operators.

2.1 COMPLEMENT

The shortest pattern that follows "~" is negated.

For example, “ab~cd” means:
Starts with a
Followed by b
Followed by a string of any length that is anything but c
Ends with d

For the string “abcdef”:
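Python's re module has no "~" operator, so the sketch below emulates it for the simplest shape only: a literal prefix, a negated literal, and a literal suffix. The function name and this restriction are assumptions; it is not how Lucene implements complement.

```python
def matches_with_complement(term, prefix, negated, suffix):
    """Emulate prefix~(negated)suffix when all three pieces are literals.

    The term matches when it starts with prefix, ends with suffix, and the
    part in between is any string other than negated (it may be empty).
    """
    if len(term) < len(prefix) + len(suffix):
        return False
    if not (term.startswith(prefix) and term.endswith(suffix)):
        return False
    middle = term[len(prefix):len(term) - len(suffix)]
    return middle != negated


term = "abcdef"
cases = {
    "ab~df": ("ab", "d", "f"),
    "ab~cf": ("ab", "c", "f"),
    "ab~cdef": ("ab", "c", "def"),
    "a~(cb)def": ("a", "cb", "def"),
    "a~(bc)def": ("a", "bc", "def"),
}
results = {name: matches_with_complement(term, *parts) for name, parts in cases.items()}
for name, matched in results.items():
    print(f"{name:10} {'match' if matched else 'no match'}")
```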

2.2 INTERVAL

The INTERVAL flag enables numeric ranges, enclosed in angle brackets "<>".

For string “java90”:
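Numeric intervals have no direct equivalent in Python's re module. The sketch below emulates a pattern of the shape prefix<low-high> with a hypothetical helper; it ignores how Lucene treats leading zeros, so it is only a rough approximation.

```python
import re


def matches_prefix_interval(term, prefix, low, high):
    """Emulate prefix<low-high>: a literal prefix followed by a number in range.

    Hypothetical helper; leading-zero rules are deliberately not modelled.
    """
    found = re.fullmatch(re.escape(prefix) + r"([0-9]+)", term)
    return found is not None and low <= int(found.group(1)) <= high


term = "java90"
print(matches_prefix_interval(term, "java", 1, 100))  # stands for java<1-100>
print(matches_prefix_interval(term, "java", 1, 50))   # stands for java<1-50>
```

The first call reports a match because 90 lies within 1 to 100; the second does not because 90 exceeds 50.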

2.3 INTERSECTION

"&" joins two patterns in a way that both of them have to match.

For string “javasample”:
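Since Python's re module lacks "&", the following sketch emulates intersection by requiring every sub-pattern to match the whole term. The helper name and the patterns are assumptions.

```python
import re


def matches_all(term, *patterns):
    """Emulate p1&p2&...: every pattern must match the entire term."""
    return all(re.fullmatch(p, term) is not None for p in patterns)


term = "javasample"
print(matches_all(term, "java.+", ".+sample"))  # stands for java.+&.+sample
print(matches_all(term, "java", "sample"))      # stands for java&sample
```

The second case fails because neither "java" nor "sample" covers the whole term on its own.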

2.4 ANYSTRING

"@" matches any string. It could be combined with intersection and complement to express “everything except”.

For example:
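One common shape is @&~(foo.+), read as anything except a string beginning with "foo". Because "@" accepts every string, the sketch below emulates that shape in Python by negating a whole-term match; the helper name is an assumption.

```python
import re


def anything_except(term, excluded_pattern):
    """Emulate @&~(excluded_pattern): accept any term the pattern does not match."""
    return re.fullmatch(excluded_pattern, term) is None


for term in ["java", "foobar", "foo"]:
    verdict = "match" if anything_except(term, "foo.+") else "no match"
    print(f"{term:8} {verdict}")
```

"foo" itself is accepted, since "foo.+" needs at least one more character after "foo".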

2.5 NONE

NONE disables all optional regexp syntax.

By JavaSampleApproach | November 7, 2017.
