Article summary
Term queries are the standard tool for exact matching in search engines. They seem straightforward, but one quirk needs careful attention: term queries behave differently on text and keyword fields in OpenSearch and Elasticsearch.
TL;DR
The common expectation for term queries is that they will not modify your input value. For a term query on a text field, this holds true. However, a term query on a keyword field applies the field's normalizer (the keyword-field counterpart of an analyzer), if one is configured, to your input value. As a result, the same term query behaves differently on text and keyword fields.
Why it Matters
When you are trying to find something specific, say “Waldo”, it may seem obvious to reach for a term query such as:
{
  "query": {
    "term": {
      "people_found_here": {
        "value": "Waldo"
      }
    }
  }
}
If the index you are hitting defines people_found_here as a text field, you are unlikely to find a match, whereas if people_found_here is a keyword field, you will.
So now you’re asking, “Why won’t you find anything for text?” The answer lies in the fact that, by default (and in most configurations), text fields are analyzed at index time with the standard analyzer, which lowercases every token. The term query, by contrast, leaves your input untouched. As a result, the term “Waldo” will not exist in the index, whereas “waldo” might.
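One way to see what a text field actually stores is to run the value through the _analyze API, which both Elasticsearch and OpenSearch expose. The request body below is a minimal sketch that assumes the field uses the default standard analyzer; sent to the _analyze endpoint, it should return a single token, waldo, which is the only form a term query could match.

```json
{
  "analyzer": "standard",
  "text": "Waldo"
}
```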
Now you’re asking, “How in the world will this be any different for keyword?” This is where the behavior shifts: keyword fields do not take full analyzers, but they can define a normalizer, and term queries on keyword fields apply that normalizer to the query value. As a result, if your keyword field applies a lowercase normalizer, you just might find “Waldo” no matter where he is (or how he was stored).
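For keyword fields, this kind of lowercasing is configured as a normalizer in the index settings and then referenced from the field mapping. The index-creation body below is a sketch of one such setup, not the configuration of any particular index; the normalizer name lowercase_normalizer is a hypothetical choice, and the field name is borrowed from the query above.

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "people_found_here": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}
```

With a mapping like this, the value is lowercased both when it is indexed and when it arrives in a term query, so a search for “Waldo” and a search for “waldo” should land on the same documents.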
How We Got Here
This issue was discussed in GitHub thread 25487, where the community deliberated on the unique behavior of keywords. They ultimately decided they’d retain this divergent behavior for keyword term queries.
The behavior originated within Elasticsearch and later found its way into OpenSearch. What’s interesting is the subtle difference in how the two projects document it. Elasticsearch explicitly notes this trait in its documentation, which alerts developers beforehand and gives them a chance to adapt their practices accordingly.
In contrast, the OpenSearch documentation does not mention this twist. Its guidance on term queries advises that they are suitable for keyword use only, but it does not explicitly mention that the input value may be normalized before matching.
Text vs. Keyword
This incongruity can lead to confusion among developers and users who move between the two systems or rely on the documentation for clarification. Remember, term queries on keyword fields in both Elasticsearch and OpenSearch apply the field's normalizer to the input value, which you may not expect based on the standard definition of a term query.
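If you are unsure what a keyword field will do to your query value, the _analyze API can also be pointed at a normalizer. The body below is a sketch that assumes the index defines a custom normalizer under the hypothetical name lowercase_normalizer; it would be sent to that index's _analyze endpoint rather than the cluster-wide one.

```json
{
  "normalizer": "lowercase_normalizer",
  "text": "Waldo"
}
```

The single token in the response is the value a term query would actually be compared against, so it is a quick check before assuming the input goes through unchanged.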
As we delve deeper into these advanced data analytics tools, it’s essential to understand such distinct behaviors. This will not only help improve your queries but also give you better control over the data retrieved, enhancing the overall search experience.
As they say, the devil is in the details. In this case, understanding this detailed difference can save you from unexpected outcomes and lead to a more streamlined, accurate data search process.