Why Term Queries Aren’t the Same for Text and Keyword in OpenSearch and Elasticsearch

Term queries are the standard tool for exact matching in OpenSearch and Elasticsearch. They look straightforward, but they behave differently on text and keyword fields, and that difference is easy to miss.

TL;DR

The common expectation for term queries is that they will not modify your input value. On a text field, this holds true: your input is compared as typed against the tokens stored in the index. On a keyword field that defines a normalizer, however, the term query runs your input through that normalizer before matching. As a result, the same term query can behave differently on text and keyword fields.

Why it Matters

When you are trying to find something specific, say “Waldo”, it may seem obvious to reach for a term query such as:


{
    "query": {
        "term": {
            "people_found_here": {
                "value": "Waldo"
            }
        }
    }
}

If the index you are hitting defines people_found_here as a text field, you are unlikely to find a match, whereas if people_found_here is a keyword field that stores the value “Waldo”, you will.

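So now you’re asking, “Why won’t you find anything for text?” The answer is that, by default (and in most configurations), text fields are analyzed with the standard analyzer, which lowercases every token at index time. As a result, the term “Waldo” will not exist in the index, whereas “waldo” might. The term query does not analyze your input, so “Waldo” is compared as typed and misses.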
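To see what a text field actually stores, one option is to ask the cluster to analyze the value. The body below is a sketch of a request to the _analyze API, assuming the field uses the default standard analyzer; if your mapping names a different analyzer, substitute it here.

```json
{
    "analyzer": "standard",
    "text": "Waldo"
}
```

The response should list a single token, waldo, which is the value a term query on that text field has to match exactly.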

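Now you’re asking, “How in the world will this be any different for keyword?” This is where the behavior shifts: a term query on a keyword field applies the normalizer configured for that field in the index mapping to your input. Keyword fields use normalizers rather than full analyzers, and a normalizer runs both when a value is indexed and when it is searched. As a result, if your keyword field uses a lowercase normalizer, you just might find “Waldo” no matter how he was capitalized in the query or in the original document.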
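For illustration, here is a minimal sketch of index settings and mappings in which the keyword case above applies. It would be sent as the body of an index creation request. The normalizer name lowercase_normalizer is a hypothetical name chosen for this example, and the field name is reused from the query above.

```json
{
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "people_found_here": {
                "type": "keyword",
                "normalizer": "lowercase_normalizer"
            }
        }
    }
}
```

With a mapping along these lines, the stored value and the term query input are both lowercased, so a term query for “Waldo”, “waldo”, or “WALDO” should match the same documents. Without the normalizer entry, the keyword field stores values as-is and the term query is case-sensitive.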

How We Got Here

This issue was discussed in GitHub thread 25487 where the community deliberated on the unique behavior of keywords. They ultimately decided they’d retain this divergent behavior for keyword term queries.

The change originated in Elasticsearch and later carried over into OpenSearch. What’s interesting is the subtle difference in how the two projects document this behavior. Elasticsearch explicitly notes it in its documentation, which alerts developers up front and gives them a chance to adapt their practices accordingly.

By contrast, the OpenSearch documentation does not mention this twist. The guidance on term queries advises that they are suitable for keyword use only, but it does not explicitly mention that a normalizer is applied to the query input.

Text vs. Keyword

This incongruity can confuse developers and users who move between the two systems or rely on the documentation for clarification. Remember, term queries on keyword fields in both Elasticsearch and OpenSearch apply the field’s normalizer to your input, which may not be what you expect from the standard definition of a term query.

Understanding this behavior helps you write queries that match what you intend and gives you better control over the data retrieved.

As they say, the devil is in the details. In this case, understanding this detailed difference can save you from unexpected outcomes and lead to a more streamlined, accurate data search process.
