Search Query Processing: Spell Correct

Note: Spell Correct was formerly known as AutoCorrect.

Spell Correct and Did You Mean

A search response includes the autoCorrectQuery field when Bloomreach automatically corrects your customer's query. Spell Correct is triggered when the original query returns zero results and a similarly spelled query does return results. You can display an indicator on the search results page when a query has been spell-corrected.

You can also use the did_you_mean field to ask customers whether they meant a different query from the one they entered. The did_you_mean field is similar to autoCorrectQuery, except that it returns a list of other related queries (excluding the autoCorrectQuery result) rather than automatically changing the query.

Example request and response

GET https://core.dxpapi.com/api/v1/core/?
account_id=<Bloomreach Provided Account ID>
&auth_key=jazzhands
&domain_key=example_com
&request_id=8830241055782
&ref_url=http://www.example.com/home
&url=http://www.example.com/index.html?q=blkca
&request_type=search
&rows=20
&start=0
&fl=pid,title,brand,price,sale_price,colors,sizes,thumb_image,price_range,sale_price_range
&q=blkca
&search_type=keyword

{
  "response": {
    "numFound": 23141,
    "start": 0,
    "docs": [
     ]
  },
  "facet_counts": {
  },
  "autoCorrectQuery": "black", // Query "blkca" was autocorrected to "black"
  "did_you_mean": [ // Similar queries
    "block",
    "blanca"
  ],
  "category_map": {
  },
  "sort_fields": [
     ]
}
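
The sketch below shows one way a storefront could read these two fields from a parsed response and decide what to show on the results page. It is a minimal illustration, not Bloomreach client code: the function name spell_correct_notice, the shape of the returned dictionary, and the message wording are all assumptions made for this example. Only the field names autoCorrectQuery and did_you_mean come from the response above.

```python
from typing import Any, Dict


def spell_correct_notice(response: Dict[str, Any], original_query: str) -> Dict[str, Any]:
    """Summarize spell-correct information from a parsed search response.

    Hypothetical helper: the return shape and message text are illustrative.
    """
    corrected = response.get("autoCorrectQuery")
    # did_you_mean may be absent when there are no related queries.
    suggestions = list(response.get("did_you_mean") or [])

    notice = {"corrected_to": None, "message": None, "suggestions": suggestions}
    if corrected and corrected != original_query:
        notice["corrected_to"] = corrected
        notice["message"] = (
            f'Showing results for "{corrected}" instead of "{original_query}"'
        )
    return notice


# Trimmed copy of the example response, with the inline comments removed
# so that it is a plain Python dictionary.
sample_response = {
    "response": {"numFound": 23141, "start": 0, "docs": []},
    "autoCorrectQuery": "black",
    "did_you_mean": ["block", "blanca"],
}

notice = spell_correct_notice(sample_response, "blkca")
print(notice["message"])
print("Did you mean:", ", ".join(notice["suggestions"]))
```

Returning a small dictionary rather than rendered markup keeps the decision (was the query corrected, and are there alternatives to offer) separate from how the page displays it.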

Spell Correct Algorithms


Spell Correct has the following two algorithms:

  • Term Frequency
  • Closest Match

Term Frequency (default mode)


This algorithm uses term frequency to rank the different candidates for a correction. In other words, the algorithm will consider a term that appears more frequently in your catalog as the likely candidate for spell correction.

For example, suppose your customer enters the query "shrts". The candidate corrections might be "shirts", "shorts", and "shoes". If the term "shorts" appears more frequently in your catalog than the other candidates, the query "shrts" is corrected to "shorts".

In case of a tie, Edit Distance is used to select the spell-correct candidate. Edit Distance is the number of edits it takes to get from one term to another, in this case from the user's query to the candidate.
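
The ranking rule described above can be sketched as follows. This is only an illustration of the idea (highest catalog frequency first, smallest edit distance as the tie-breaker), not Bloomreach's implementation. The function names and the frequency counts are hypothetical, and the edit distance used here is the standard Levenshtein distance, which is an assumption about how edits are counted.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pick_by_term_frequency(query: str, frequencies: Dict[str, int]) -> Optional[str]:
    """Pick the most frequent candidate; break ties by smaller edit distance."""
    if not frequencies:
        return None
    return min(
        frequencies,
        key=lambda term: (-frequencies[term], edit_distance(query, term)),
    )


# Hypothetical catalog term counts for the "shrts" example.
catalog_counts = {"shirts": 120, "shorts": 340, "shoes": 200}
print(pick_by_term_frequency("shrts", catalog_counts))  # shorts

# With equal frequencies, the candidate closer to the query wins.
tied_counts = {"shoes": 100, "shirts": 100}
print(pick_by_term_frequency("shrts", tied_counts))  # shirts
```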

Closest Match


This algorithm uses Edit Distance to select the Spell Correct suggestion. The Edit Distance between two sets of letters or numbers is the minimum number of edits it takes to get from one term to another.

The smaller the edit distance of a spell-correct candidate is to the original query, the higher it will rank. It helps determine which term is closest to what the user entered.

Consider the example in the table below:

| Misspelled Query | Corrected Candidate | Edit Distance |
|---|---|---|
| Shooes | Shoes | 1 (removes one "o" from the original query) |
| Shooes | Shorts | 2 (replaces one "o" with an "r" and the "e" with a "t") |

"Shoes" will rank higher than "Shorts" since its Edit Distance is comparatively smaller. Hence, "Shoes" will be considered as the closest match for the given misspelled query.
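
The example above can be reproduced with a short sketch. It assumes the standard Levenshtein definition of edit distance (single-character insertions, deletions, and substitutions), which matches the counts in the table. The function names are hypothetical and this is not Bloomreach's implementation.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pick_closest_match(query: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate with the smallest edit distance to the query."""
    candidates = list(candidates)
    if not candidates:
        return None
    return min(candidates, key=lambda term: levenshtein(query, term))


for candidate in ("Shoes", "Shorts"):
    print(candidate, levenshtein("Shooes", candidate))
# Shoes 1
# Shorts 2

print(pick_closest_match("Shooes", ["Shorts", "Shoes"]))  # Shoes
```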

Note: Clean data is important for selecting the closest match for the misspelled term.

The algorithm assumes that the product data you provide is always correct. If your catalog contains spelling mistakes, the Closest Match algorithm may surface the incorrect terms. For instance, the query "orangic" can be autocorrected to "oragnic" instead of "organic" if your catalog contains the misspelled term "oragnic".

Spell Correct Algorithms Comparison


The following table compares how the two modes prioritize candidates when autocorrecting a mistyped query:

| Misspelled Query | Term Frequency (picks the term that appears most frequently in your product data) | Closest Match (picks the term that is closest to the original query) |
|---|---|---|
| goldem goose | gold goose | golden goose |
| strapy shoes | stripe shoes | strappy shoes |
| nugets | nuts | nuggets |

Spell Correct Algorithm Enablement


You can enable either mode to suit your use case. See the Algorithm Controls guide for the enablement process.