Build the MCP server
MCP server overview
Your MCP server bridges ChatGPT with Bloomreach Discovery. It transforms queries, calls APIs, and formats responses.
This guide walks through a demo Loomi Connect MCP server built in Python.
Important files
The following files contain the core logic of the MCP server:
- main.py: Orchestrates Discovery API calls, fetches and formats results, returns structured JSON.
- search_utils.py: Utility functions that build the request URL, extract price ranges, and enrich facets to build filters for the request.
- search_client.py: Makes resilient HTTP GET requests with retries and timing logs.
How the server works
Here is a high-level overview of how the MCP server works; a simplified end-to-end sketch follows the list:
- Listen for query: Get a free-text natural language query from the ChatGPT app.
- LLM extraction: Parse query to extract search terms and filters.
- JSON fallback: Check for JSON-formatted query.
- Extract price range: Turn phrases such as “under,” “over,” and “between” into numeric price filters.
- Enrich facets: Derive filters from the user’s query, and map to allowed facet fields for the Discovery Search API.
- Build request URL: Combine the search query and facet list for the Discovery API request URL.
- Fetch results: Make an HTTP request to get search results.
- Review results (optional): An LLM verifies results match user intent.
- Return output: Send structured object to ChatGPT for rendering.
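The toy sketch below strings these steps together end to end. It is runnable but heavily simplified: the LLM extraction, facet enrichment, and fetch/review steps are stubbed with comments, and the host name is a placeholder, not the real Discovery endpoint.

import re
from urllib.parse import urlencode

def handle_query(query: str) -> dict:
    # LLM extraction / JSON fallback would populate q_text and facet filters here.
    q_text, fq = query, []
    # Extract price range: price phrases become range filters.
    m = re.search(r"under\s*\$?([0-9,]+(?:\.[0-9]+)?)", q_text, flags=re.I)
    if m:
        fq.append(f"price:[* TO {m.group(1).replace(',', '')}]")
        q_text = (q_text[:m.start()] + q_text[m.end():]).strip()
    # Enrich facets would extend fq with term filters.
    # Build request URL (placeholder host).
    url = "https://discovery.example.com/search?" + urlencode({"q": q_text, "fq": fq}, doseq=True)
    # Fetch results / review results / return output would follow from here.
    return {"request_url": url, "applied": {"fq": fq}}

print(handle_query("running shoes under $50"))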
LLM extraction
Implement AI-powered parameter translation, which transforms natural language into structured parameters for product search.
Best practices:
- Use a compact model to minimize latency.
- Enforce strict JSON output format.
discovery_server_python/search_utils.py
# JSON mode ("response_format": json_object) forces the model to emit parseable JSON.
resp = _requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    data=json.dumps({
        "model": "gpt-4o-mini",  # compact model keeps extraction latency low
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": "You are a strict JSON extractor. Output ONLY JSON."},
            {"role": "user", "content": prompt},
        ],
    }),
    timeout=20,
)
data_llm = resp.json()
# Defensive navigation: missing keys degrade to an empty string instead of raising.
content = data_llm.get("choices", [{}])[0].get("message", {}).get("content", "")
parsed_llm = json.loads(content) if content else {}
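JSON mode makes malformed output unlikely but not impossible, so it is worth guarding the final json.loads. A minimal sketch that hardens the last line above (the empty-dict fallback is an assumption, not part of the demo):

try:
    parsed_llm = json.loads(content) if content else {}
except json.JSONDecodeError:
    parsed_llm = {}  # assumed fallback: treat unparseable output as "no extraction"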
JSON fallback
Handle the edge case where the user’s query is already JSON formatted, and filter its facets down to the allowed fields.
discovery_server_python/search_utils.py
try:
    maybe = json.loads(q_text)
except (TypeError, ValueError):
    maybe = None  # guard assumed: most queries are plain text, not JSON
if isinstance(maybe, dict) and "q" in maybe:
    q_text = str(maybe.get("q") or "").strip() or q_text
    facets_json = maybe.get("facets")
    if isinstance(facets_json, list):
        for f in facets_json:
            try:
                field = f.get("field")
                if not field:
                    continue
                ...  # keep the facet only if field is in _ALLOWED_TERM_FIELDS (elided)
            except AttributeError:
                continue  # assumed guard: skip facet entries that are not dicts
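For illustration, a query that already arrives as JSON could look like the following. The schema is inferred from the keys the fallback reads; the values key is hypothetical:

q_text = '{"q": "running shoes", "facets": [{"field": "brand", "values": ["Nike"]}]}'
# After the fallback runs, q_text == "running shoes" and the brand facet survives,
# because "brand" is in _ALLOWED_TERM_FIELDS.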
Extract price range
Convert plain-English phrases such as “under,” “over,” and “less than” into numeric range filters for the API request, and strip them from the original query.
discovery_server_python/search_utils.py
m = re.search(r"under\s*\$?([0-9,]+(?:\.[0-9]+)?)", q_text, flags=re.I)
...
range_filters.append(f"price:[* TO {val}]")
...
m = re.search(r"over\s*\$?([0-9,]+(?:\.[0-9]+)?)", q_text, flags=re.I)
...
range_filters.append(f"price:[{val} TO *]")
...
m = re.search(r"between\s*([0-9,]+(?:\.[0-9]+)?)\s*(?:and|to)\s*([0-9,]+(?:\.[0-9]+)?)", q_text, flags=re.I)
...
range_filters.append(f"price:[{lo} TO {hi}]")
Enrich facets
Fetch facet buckets (facet=true) for the current query, then tokenize the facet values and the user text (both are lowercased, so matching is case-insensitive). Include the values whose tokens or bigrams match the user’s text.
Best practices:
- Add a cache TTL (Time-To-Live) of 60 seconds to avoid fetching too often.
- For each facet field, look through up to 200 values.
discovery_server_python/search_utils.py
_FACET_TTL_SECONDS = 60.0   # cache facet probes for 60s to avoid refetching
_FACET_MAX_ENTRIES = 200    # scan at most 200 values per facet field
_ALLOWED_TERM_FIELDS = {"sfccClass", "brand"}
def _get_facet_fields(base_url: str, q_text: str) -> Dict[str, Any]:
    if os.getenv("DISABLE_FACET_PROBE") == "1":
        return {}
...
facet_fields = _get_facet_fields(base_url, q_text)
original_text = (query or "").lower()
def _tokens(s: str) -> List[str]:
    s2 = re.sub(r"([a-z])([A-Z])", r"\1 \2", s)  # split camelCase: "AirMax" -> "Air Max"
    return re.findall(r"[a-z0-9]+", s2.lower())
toks = _tokens(original_text)
tokset = set(toks)
bigrams = set(" ".join(x) for x in zip(toks, toks[1:]))  # adjacent-token pairs
for fname, raw in facet_fields.items():
    values_list = []
    ...
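To see why the camelCase split matters, run the tokenizer on a hypothetical facet value:

import re
from typing import List

def _tokens(s: str) -> List[str]:
    s2 = re.sub(r"([a-z])([A-Z])", r"\1 \2", s)
    return re.findall(r"[a-z0-9]+", s2.lower())

toks = _tokens("TrailRunning Shoes")
print(toks)                                           # ['trail', 'running', 'shoes']
print(set(" ".join(x) for x in zip(toks, toks[1:])))  # {'trail running', 'running shoes'}
# A user query containing "trail running" now matches this facet value via its bigram.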
Build request URL
Preserve the existing fq parameter entries and build the final request URL. Also return an applied object that is useful for debugging.
discovery_server_python/search_utils.py
qp_out = dict(query_params)
qp_out["q"] = q_text
if fq_list:
    qp_out["fq"] = fq_list  # a list value becomes repeated fq= parameters below
request_url = urlunparse(parsed._replace(query=urlencode(qp_out, doseq=True)))
applied = {
    "range_filters": list(range_filters),
    "term_filters": {k: list(v) for k, v in term_filters.items()},
    "allowed_term_fields": sorted(list(_ALLOWED_TERM_FIELDS)),
    "fq": list(fq_list),
}
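doseq=True is what turns the fq list into repeated fq= parameters, which is how several filters travel in one URL. A quick illustration with example values:

from urllib.parse import urlencode

print(urlencode({"q": "shoes", "fq": ['brand:"Nike"', "price:[* TO 50]"]}, doseq=True))
# q=shoes&fq=brand%3A%22Nike%22&fq=price%3A%5B%2A+TO+50%5D  (two fq parameters)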
Fetch results resiliently
Make the HTTP request, and implement retry logic with exponential backoff.
Create structured logs that include the attempt number, status, transferred bytes, and duration of the request.
discovery_server_python/search_client.py
def http_get_json(url: str, timeout: int = 10, retries: int = 2, backoff_seconds: float = 0.5) -> Dict[str, Any]:
    last_exc: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            start = time.time()
            if attempt == 0:
                logger.info("HTTP GET %s", url)
            else:
                logger.info("HTTP GET retry=%d %s", attempt, url)
            ...
        except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError, json.JSONDecodeError) as exc:
            last_exc = exc
            ...
            if attempt < retries:
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff: 0.5s, 1s, ...
            else:
                break
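Usage from the search path is a single call; with the defaults above, a failed request is retried twice, sleeping 0.5s and then 1.0s between attempts:

data = http_get_json(request_url, timeout=10, retries=2, backoff_seconds=0.5)
docs = data.get("response", {}).get("docs", [])  # response/docs shape assumed here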
Review results (optional)
Condense the items, and ask an LLM to return the indices of the results to keep. Set the environment variable DISABLE_LLM_RESULT_FILTER=1 to turn this verification step off.
discovery_server_python/search_utils.py
request_body = {
    "model": "gpt-4o-mini",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "You are a precise search results judge. Output ONLY JSON."},
        {"role": "user", "content": json.dumps(user_payload)},
    ],
}
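Assuming request_body is POSTed the same way as in the extraction step and the judge answers with a shape like {"keep": [0, 2]} (both the key name and the fallback below are assumptions), the verdict can be applied like this:

content = resp.json().get("choices", [{}])[0].get("message", {}).get("content", "")
keep = set(json.loads(content).get("keep", [])) if content else set()
items = [it for i, it in enumerate(items) if i in keep] or items  # assumed: keep all if judge returns nothing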
