Format your Data - Bloomreach Experience - Open Source CMS

Format your Data

DataConnect is the tool you use to send and manage your Content Search data. Your site's content data, including its metadata, must comply with the feed format prescribed by Bloomreach so that Bloomreach can ingest it easily and make it searchable.

Let’s first define three DataConnect terms: items, catalogs, and collections. Each piece of content is called an item, such as "Awesome Omelette Recipe" or "How to Prepare a Lasagna". The data from your content feed(s) forms catalogs for specific types of content, such as Recipes, Blogs, or Videos. Collections group the catalogs of a specific content type.

If you are an existing customer using the earlier, site-crawl-based version of Content Search, we recommend upgrading to the newer, feed-based version. To do so, you must prepare your data according to the specification on this page and integrate the pixel and the API. For details, see the Integration Steps.

Feed Format

Bloomreach supports the JSON Patch format for Content Search data. JSON Patch is a format for describing changes to a JSON document, so you can avoid sending a whole document when only part of it has changed. Combined with the HTTP PATCH method, it allows partial updates to HTTP APIs in a standards-compliant way. Patch documents are themselves JSON documents. For more information about JSON Patch, refer to IETF RFC 6902. You can upload the data via the DataConnect API.
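As a minimal sketch, the following Python serializes JSON Patch operations into JSON Lines text, assuming (as the samples on this page suggest) that each line of a feed holds one operation object, rather than the single JSON array that RFC 6902 uses for a complete patch document. The `to_jsonl` name is hypothetical, and uploading the result through the DataConnect API is not shown.

```python
import json
from typing import Iterable


def to_jsonl(operations: Iterable[dict]) -> str:
    """Serialize JSON Patch operations as JSON Lines: one compact object per line."""
    # ensure_ascii=False keeps characters such as "é" readable in the feed file.
    lines = [json.dumps(op, ensure_ascii=False, separators=(",", ":")) for op in operations]
    return "\n".join(lines) + "\n" if lines else ""


ops = [
    {"op": "add", "path": "/items/awesome_omelette",
     "value": {"attributes": {"title": "Awesome Omelette", "rating": 4.7}}},
    {"op": "remove", "path": "/items/awesome_omelette_video"},
]
feed_text = to_jsonl(ops)
```

Writing `feed_text` to a file named after the convention below produces a feed in which every line parses on its own, which also makes a malformed operation easy to locate.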

  • File type: .jsonl (JSON Lines)
  • File name: {Date in YYYYMMDD}-{Time in HHMMSS}-{CatalogName}-{full/patch}.jsonl
    • Example: 20191128-235959-recipes_en-full.jsonl
    • Following this file name format is optional, but we recommend it because it helps with debugging
  • Number of feeds: create one feed per catalog
  • Frequency: No requirements
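The recommended file name can be generated rather than typed by hand. This is a small sketch with a hypothetical helper; which clock (local time or UTC) to use is not specified here, so it simply formats whatever datetime it is given.

```python
from datetime import datetime


def feed_file_name(timestamp: datetime, catalog_name: str, mode: str) -> str:
    """Build {YYYYMMDD}-{HHMMSS}-{CatalogName}-{full|patch}.jsonl."""
    if mode not in ("full", "patch"):
        raise ValueError("mode must be 'full' or 'patch'")
    # datetime supports strftime-style format specs directly inside f-strings.
    return f"{timestamp:%Y%m%d}-{timestamp:%H%M%S}-{catalog_name}-{mode}.jsonl"


print(feed_file_name(datetime(2019, 11, 28, 23, 59, 59), "recipes_en", "full"))
# prints 20191128-235959-recipes_en-full.jsonl
```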

Items 

An item is any piece of content or page on your live site that you want Bloomreach to index and return in search results. While you can define the attributes of your items as you wish, make sure each item has the following components:

  • Path as a unique identifier of an item. 
  • Attributes that depend on the content, such as title, description, and tags. Attributes may be defined differently for each item. For example, a video could have a “duration” attribute defining the total length of the video, but you would not have to include “duration” for blog posts.
  • Views (optional) if you wish to display only a certain version of the content to specific user groups. 

Catalog

A catalog is simply a grouping of items, such as blog posts, news articles, or videos. Bloomreach understands and tracks your items through catalogs. Each catalog has a preconfigured name that is unique within a domain (if you have multiple sites). A catalog also has a unique identifier that DataConnect generates automatically. Bloomreach preconfigures catalogs in the DataConnect UI; however, you can change a catalog's display name.

Example: 

Homeoasis.com is a lifestyle, food, and fashion site with blog posts on many types of cooking recipes. Homeoasis is provided with a preconfigured catalog for “Best Potluck Recipes” with the unique identifier “best_potluck_recipes_1”. This catalog contains many recipes (items), such as “Potato Salad” and “Cheese Chicken Fritters”, which have unique identifiers such as "Potato_Salad" and "cheese_chicken_fritters".

Sample Feed 

Sample recipe: "Awesome Omelette"

{
   "op":"add",
   "path":"/items/awesome_omelette",
   "value":{
      "attributes":{
         "title":"Awesome Omelette",
         "url":"https://www.homeoasis.com/recipe/awesome-omelette.html",
         "description":"Omelettes can be a little intimidating. This omelette also falls on the healthier end of the spectrum, whereas some omelettes are oozing with cheese. The pan-roasted tomatoes are one of our favorite additions, but they can be skipped if you’re in a hurry or substituted with another juicy vegetable of your choosing (sautéed mushrooms, zucchini or eggplant, for example). Serve it up with a side of fruit and a steaming cup of coffee or tea and you’re all set!",
         "medium_image_url":"https://www.homeoasis.com/images/recipe/201851/img1.jpg",
         "rating":4.7,
         "reviews":22,
         "prep_time_mins":10,
         "cook_time_mins":10,
         "servings":10,
         "ingredients":[
            "10 Eggs",
            "240g of grape or cherry tomatoes, halved",
            "1 tablespoon ghee or olive oil",
            "2-3 tablespoons pistachio pesto, or other",
            "100g Olives",
            "Salt: white & fine"
         ],
         "category":[
            "Breakfast",
            "Brunch"
         ],
         "directions":"Melt about 1 teaspoon of ghee/oil in an 8-inch cast iron [or non-stick] pan over medium heat. Once hot, add the tomatoes to the pan and sprinkle with salt. Let cook for about 12-18 minutes, flipping every few minutes until the liquid has mostly cooked off and they look caramelized [refer photo 1]. Reduce heat to medium-low and let the pan cool down for a few minutes. Add in remaining ghee/oil. Whisk the eggs briskly for about 30 seconds. Pour eggs into the pan and swirl around to evenly distribute. It should sizzle a bit but not go crazy. You want the eggs to cook slowly. Let the eggs cook without stirring for about 2 minutes until the edges and bottom start to set. Once the omelet starts to set, gently lift up the edges with a spatula and tilt the pan towards that edge to help some of the uncooked egg run beneath. Dollop the pesto on one half of the omelet and sprinkle the same half with roasted tomatoes. Loosen the edges of the side with no toppings and carefully fold it over to cover the toppings. Let cook 1 more minute then slice in half and serve immediately. Top with salt + pepper as desired."
      }
   }
}

Sample video: “How to Make Our Awesome Omelette”

{
   "op":"add",
   "path":"/items/awesome_omelette_video",
   "value":{
      "attributes":{
         "title":"How to Make Our Awesome Omelette",
         "url":"https://www.homeoasis.com/video/awesome-omelette-video.html",
         "description":"Follow along with our Awesome Omelette recipe in this companion video.",
         "medium_image_url":"https://www.homeoasis.com/images/recipe/201851/img1.jpg",
         "rating":4.7,
         "video_id":"HDRS2748",
         "video_duration":5,
         "category":[
            "Videos",
            "Breakfast"
         ]
      }
   }
}

Sample PDF: “Awesome Omelette”

{
   "op":"add",
   "path":"/items/awesome_omelette_pdf",
   "value":{
      "attributes":{
         "title":"Awesome Omelette",
         "url":"https://www.homeoasis.com/pdf/awesome-omelette.pdf",
         "medium_image_url":"https://www.homeoasis.com/images/recipe/201851/img1.jpg",
         "rating":4.7,
         "category":[
            "PDF",
            "Breakfast"
         ]
      },
      "@import":{
         "path":"/pdfs/awesome_omelette.pdf"
      }
   }
}

Item Attributes

Every item requires the op and path fields:

  • op: Defines the type of operation to perform on the data. The supported operations are “add” (which can also replace) and “remove”. Example: add, remove
  • path: Unique identifier linked to a specific piece of content, defined as “/items/{item_id}”. The item_id must be unique across all items in the dataset. For example, for the item_id “awesome_omelette”, the path is “/items/awesome_omelette”. Use the same item_id in both your feed and your pixel. Example: /items/awesome_omelette

The path value is a JSON Pointer: a string format for identifying a specific value (here, an item) within a JSON document. All JSON Patch operations use a JSON Pointer to specify the part of the document to operate on. A JSON Pointer is a string of tokens separated by / characters; each token specifies either a key in an object or an index into an array. For more information, refer to IETF RFC 6901.
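To illustrate how a path such as /items/awesome_omelette addresses a value, here is a minimal Python resolver following RFC 6901. The nested document it walks (an "items" object keyed by item_id) is only a conceptual model of a catalog, an assumption rather than how DataConnect stores data. The `~1`/`~0` unescaping is included for completeness even though valid item_ids cannot contain / or ~, and some RFC checks (such as rejecting leading zeros in array indexes) are omitted.

```python
def resolve_pointer(document, pointer: str):
    """Return the value that an RFC 6901 JSON Pointer identifies in document."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" -> "/" first, then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # tokens index into arrays
        else:
            current = current[token]  # tokens name keys in objects
    return current


catalog = {"items": {"awesome_omelette": {"attributes": {
    "title": "Awesome Omelette", "category": ["Breakfast", "Brunch"]}}}}
```

For example, `resolve_pointer(catalog, "/items/awesome_omelette/attributes/category/1")` walks two object keys, then an attribute name, then an array index.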

If your catalog is configured for document search (PDFs), you must also include the following field:

  • @import: Extracts the contents of the PDF. Within this field, you must include path, which is the relative FTP path to the PDF. The “title” and “body” are extracted from the PDF as item attributes. If you have already provided values for “title” or “body” as item attributes, those take precedence over the extracted values. Example: "path":"/pdfs/awesome_omelette.pdf"
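The precedence rule for extracted PDF values can be modeled in a few lines. This is a hypothetical illustration of the behavior described above, which happens on the Bloomreach side; you do not need to run anything like it yourself.

```python
def effective_pdf_attributes(provided: dict, extracted: dict) -> dict:
    """Combine PDF-extracted "title"/"body" with attributes supplied in the feed."""
    # Only title and body come from extraction.
    merged = {key: value for key, value in extracted.items() if key in ("title", "body")}
    # Attributes supplied in the feed take precedence over extracted values.
    merged.update(provided)
    return merged


provided = {"title": "Awesome Omelette", "rating": 4.7}
extracted = {"title": "awesome-omelette.pdf", "body": "Whisk the eggs briskly..."}
result = effective_pdf_attributes(provided, extracted)
```

Here the feed's "title" wins, while "body" comes from the PDF because the feed does not supply one.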

Sample Attributes (optional)

The following fields are sample attributes, all of which are optional. Each attribute consists of a name and a value; for example, an attribute named “title” with the value “Awesome Omelette”.

  • title: The title of the content that you are making searchable. Example: Awesome Omelette
  • url: The URL of the HTML page on which this content appears. Example: https://www.homeoasis.com/recipe/awesome-omelette.html
  • description: The body of your content. You can also use it to capture a summary of the content piece.
  • publication_date: The date when the content was published online. Example: 1556803380000
  • author_name: The name of the author who wrote the content piece. Example: John Smith
  • category: The category or categories that the content belongs to, provided as an array. Example: Recipes
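The publication_date example, 1556803380000, reads as a Unix timestamp in milliseconds (2019-05-02 13:23:00 UTC). Assuming that is the expected format, here is a sketch of converting to and from it; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    # timedelta // timedelta yields an int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Inverse of to_epoch_millis, returned in UTC."""
    return EPOCH + timedelta(milliseconds=millis)


print(to_epoch_millis(datetime(2019, 5, 2, 13, 23, tzinfo=timezone.utc)))  # 1556803380000
```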

Content Search does not require items to have any specific attributes, but we recommend providing at least a title so that the item is searchable.

However, we recommend excluding attributes that are irrelevant to search. Sending attributes that are not relevant to your desired search experience increases both search request latency and index generation time.

Rules for naming item_id and attributes

  • item_id values and attribute names may only use letters (A to Z, a to z), digits (0 to 9), or underscores ( _ )
  • Attribute names must not start with a digit
  • item_id and category are reserved names
  • Attribute values can be one of the following types:
    • string
    • integer
    • float
    • boolean
    • A homogeneous array of any of the types above
  • Objects are not currently supported; if you have nested objects of any depth, flatten them first
  • The maximum length of any attribute value is 32 KB. For arrays, the maximum length of each individual value is 32 KB

Examples:

  • “awesome_omelette_123” is a valid item_id
  • “awesome:omelette_123” is not a valid item_id because “:” is not a valid character
  • “prep_time_mins” is a valid attribute name
  • “1st_prep_time_mins” is not a valid attribute name because it starts with “1”
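To catch rule violations before uploading, you could check items locally. The following is a sketch with hypothetical helpers that encode the rules above; it assumes that "32 KB" means 32,768 bytes of UTF-8 text and treats arrays as homogeneous only when every element has exactly the same Python type (so [1, 2.0] is rejected). Flattening joins nested keys with an underscore, which keeps the resulting names within the allowed character set.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_VALUE_BYTES = 32 * 1024  # assumption: 32 KB = 32,768 bytes of UTF-8 text
SCALAR_TYPES = (str, int, float, bool)


def is_valid_item_id(item_id: str) -> bool:
    """Letters, digits, and underscores only."""
    return NAME_PATTERN.fullmatch(item_id) is not None


def is_valid_attribute_name(name: str) -> bool:
    """Same characters as item_id, and the first character must not be a digit."""
    return NAME_PATTERN.fullmatch(name) is not None and not name[0].isdigit()


def _fits(value) -> bool:
    # Size of the value's text form in UTF-8 bytes (assumed interpretation of the limit).
    return len(str(value).encode("utf-8")) <= MAX_VALUE_BYTES


def is_valid_value(value) -> bool:
    """A scalar, or a homogeneous array of scalars, each within the size limit."""
    if isinstance(value, list):
        if not all(isinstance(v, SCALAR_TYPES) for v in value):
            return False  # nested arrays and objects are not allowed
        if len({type(v) for v in value}) > 1:
            return False  # mixed element types, e.g. [1, "a"]
        return all(_fits(v) for v in value)
    return isinstance(value, SCALAR_TYPES) and _fits(value)


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into a single level, joining keys with "_"."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat
```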
     

Delta feed

You can send a delta feed to modify an existing catalog. A delta feed can modify an entire item or individual attributes using the following patch operations:

  • Add or replace an item: op “add”; path /items/{item_id}; value schema: Item
  • Remove an item: op “remove”; path /items/{item_id}; value schema: n/a
  • Replace all attributes of an item: op “add”; path /items/{item_id}/attributes; value schema: Attributes
  • Add or replace an attribute of an item: op “add”; path /items/{item_id}/attributes/{name}; value schema: Attribute value
  • Remove an attribute of an item: op “remove”; path /items/{item_id}/attributes/{name}; value schema: n/a
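The item-level operations map directly onto small builder functions. This is a hypothetical sketch that produces operation dicts following the paths listed above; removals are emitted before additions so the output order is predictable.

```python
def remove_item(item_id: str) -> dict:
    """Delete a whole item."""
    return {"op": "remove", "path": f"/items/{item_id}"}


def attribute_delta(item_id: str, updates: dict, removals: list) -> list:
    """Build delta operations that change individual attributes of one item.

    updates maps attribute names to new values ("add" adds or replaces them);
    removals lists attribute names to delete ("remove" carries no value).
    """
    base = f"/items/{item_id}/attributes"
    ops = [{"op": "remove", "path": f"{base}/{name}"} for name in removals]
    ops += [
        {"op": "add", "path": f"{base}/{name}", "value": value}
        for name, value in updates.items()
    ]
    return ops
```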

Sample delta feed

Remove the entire Awesome Omelette item

{
   "op": "remove",
   "path": "/items/awesome_omelette"
}

Replace the Awesome Omelette rating value with 4.8 and the reviews value with 24

{
   "op": "add",
   "path": "/items/awesome_omelette/attributes/rating",
   "value": 4.8
}
{
   "op": "add",
   "path": "/items/awesome_omelette/attributes/reviews",
   "value": 24
}

Remove the Awesome Omelette reviews attribute, and replace the rating value with 4.9

{
   "op": "remove",
   "path": "/items/awesome_omelette/attributes/reviews"
}
{
   "op": "add",
   "path": "/items/awesome_omelette/attributes/rating",
   "value": 4.9
}
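To preview what a delta feed will do, you can apply it to a local copy of your data. The sketch below is a simplified model of the add/remove semantics in the table above, not DataConnect's implementation. It assumes the catalog is modeled as nested dicts under "items", that every parent in a path already exists, and it skips JSON Pointer unescaping because valid item_ids and attribute names cannot contain / or ~.

```python
import copy


def apply_delta(catalog: dict, operations: list) -> dict:
    """Apply add/remove operations to a model catalog {"items": {...}}.

    Returns a new dict; the input catalog is left unchanged.
    """
    result = copy.deepcopy(catalog)
    for op in operations:
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for key in parents:
            target = target[key]  # every parent must already exist
        if op["op"] == "add":
            target[last] = copy.deepcopy(op["value"])  # adds or replaces
        elif op["op"] == "remove":
            del target[last]  # raises KeyError if the target is missing
        else:
            raise ValueError(f"unsupported op: {op['op']!r}")
    return result


catalog = {"items": {"awesome_omelette": {"attributes": {
    "title": "Awesome Omelette", "rating": 4.7, "reviews": 22}}}}
ops = [
    {"op": "remove", "path": "/items/awesome_omelette/attributes/reviews"},
    {"op": "add", "path": "/items/awesome_omelette/attributes/rating", "value": 4.9},
]
updated = apply_delta(catalog, ops)
```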

Views (optional)

You can specify views to show different versions of the same content item to different viewers. Views require a multi-view catalog setup and are typically used in cases such as contracts, price lists, and entitlements. For example, you could use views to show specific content only to logged-in or premium users, or to show different content for different regions. For sites in different languages, where each language has its own feed, you can use domain keys to distinguish between the sites.

To integrate content search with views, you will have to modify your pixel and feed.

Adding Views to Pixel

The steps to add views to your pixel will vary, depending on how your site is set up. This is to ensure that pixel events are tracked on the correct catalog, view, and/or domain. 

Here are some high level scenarios to help:

  • If you only have a single site and display different versions of content depending on the user (such as Logged in vs. Not logged in), then you will only add the View to catalogs.view_ids
  • If you have multiple sites (such as Florida site, New England site, California site, etc.), and display different versions of content depending on the site, then you will add Views to br_data.view_id and catalogs.view_ids
  • If you have multiple sites in different languages (such as French site and English site) and each site has a different feed, then you will use domain_key to highlight the different sites

You can find more details and examples in our Content Search Pixel Integration Scenarios.

Adding Views to Feed

To specify views for an item, you must include the following fields:

  • View ID: Unique identifier linked to a specific view. The view ID must be unique across all views. Examples: “Basic”, “Premium”
  • View attributes: Item attributes defined for the view specified by the view ID. Attributes nested inside a view ID apply only to that view, while attributes outside “views” apply to all views.

{
   "op":"add",
   "path":"/items/awesome_omelette",
   "value":{
      "attributes":{
         "rating":4.7,
         "category":[
            "Breakfast",
            "Brunch"
         ]
      },
      "views":{
         "Basic":{
            "attributes":{
               "title":"...",
               "url":"...",
               "description":"..."
            }
         },
         "Premium":{
            "attributes":{
               "title":"...",
               "url":"...",
               "description":"..."
            }
         }
      }
   }
}
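One way to reason about what a given viewer sees is to combine the shared attributes with those of the viewer's view. This hypothetical sketch assumes that a view-level attribute overrides a shared attribute with the same name, and that an unknown view ID falls back to the shared attributes only; neither behavior is stated on this page.

```python
def attributes_for_view(item: dict, view_id: str) -> dict:
    """Combine an item's shared attributes with those of one view."""
    merged = dict(item.get("attributes", {}))  # shared across all views
    view = item.get("views", {}).get(view_id, {})
    # Assumption: the view's value wins when both levels define an attribute.
    merged.update(view.get("attributes", {}))
    return merged


item = {
    "attributes": {"rating": 4.7},
    "views": {
        "Basic": {"attributes": {"title": "Basic title"}},
        "Premium": {"attributes": {"title": "Premium title"}},
    },
}
```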


Adding Views to Delta Feed

You can use the following patch operations to modify values within a view.

  • Add or replace all views of an item: op “add”; path /items/{item_id}/views; value schema: Views
  • Add or replace a view of an item: op “add”; path /items/{item_id}/views/{view_id}; value schema: View
  • Remove a view from an item: op “remove”; path /items/{item_id}/views/{view_id}; value schema: n/a
  • Replace all attributes of a view: op “add”; path /items/{item_id}/views/{view_id}/attributes; value schema: Attributes
  • Add or replace an attribute of a view of an item: op “add”; path /items/{item_id}/views/{view_id}/attributes/{name}; value schema: Attribute value
  • Remove an attribute from a view of an item: op “remove”; path /items/{item_id}/views/{view_id}/attributes/{name}; value schema: n/a

Operations on the /items/{item_id}/attributes path only apply to attributes nested outside of views. 
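Because view-level changes use their own paths, it can help to build them in one place so they are never confused with /items/{item_id}/attributes. The helpers below are hypothetical and simply follow the view paths listed above.

```python
def view_path(item_id: str, view_id: str, attribute: str = "") -> str:
    """Path to a view, or to one attribute inside that view."""
    path = f"/items/{item_id}/views/{view_id}"
    return f"{path}/attributes/{attribute}" if attribute else path


def set_view_attribute(item_id: str, view_id: str, name: str, value) -> dict:
    """Add or replace one attribute inside a single view."""
    return {"op": "add", "path": view_path(item_id, view_id, name), "value": value}


def remove_view(item_id: str, view_id: str) -> dict:
    """Delete a whole view from an item."""
    # This path does not reach the attributes stored outside "views".
    return {"op": "remove", "path": view_path(item_id, view_id)}
```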

Example Feed Files

To get started, here are some example feed files that you can download. 
