It is possible to import data to your project from sources outside of Bloomreach Engagement. These can be CSV or XML files (uploaded through the import wizard or through a URL feed), integrated SQL databases, imports from file storage integration (Google Cloud Storage, SFTP, Amazon S3 and Azure Storage), or data from your analyses (reports, etc.) in Bloomreach Engagement.”

Bloomreach Engagement works with 4 types of data:

13521352

Watch this short introductory video about this feature:

Before you start, check the technical reference section of this guide and best practices to understand the size limits, etc.

To import new data, go to Data & Assets > Imports > + Create new.

Next, you will be asked to choose which type of data you want to import. This article will now explain the basics that you need to know about imports while guiding you through the import process for each type of data.

Importing customers

In customer imports, columns represent the customer attributes while each row stands for a single customer. The first row in the data will be taken to identify the attribute. You can change the naming, discard the whole column, or add a new column with a static value later in the data mapping step.

16581658

In the row below, you can see the four steps that you will have to go throw when making data imports.

Data source

In this step, you will have to choose your data source. You can import your data from the following sources:

Copy & paste

You can directly paste your data into the application.

Upload a file

The file must be in the CSV or XML format. You can select a File format or drag&drop the file onto the canvas.

Database

You can feed the data from one of your integrated databases using the SELECT statements. The most commonly used databases are supported.

URL

If your data is saved online, you can create a feed to its URL, e.g.: https://www.hostname.com/exports/file.csv. It is also possible to input username and password for HTTP Basic Authentication if required in the form. Password will be securely stored in Bloomreach Engagement and it will not be visible for other users.
When feeding a CSV or XML file from a database through URL, Bloomreach Engagement uses the following headers:

  • X-Exponea-Import-Phase "sample" to encoding detection sample request
  • X-Exponea-Import-Phase "preview" to request a preview which is shown when a user clicks "preview data" in the Bloomreach Engagement import wizard
  • X-Exponea-Import-Phase "download" to request the download of the full file

File storage

Import files from your own SFTP server or Google Cloud Storage bucket. Once you click on this option, click on Select a file storage integration and choose your SFTP, GCS (Google Cloud Storage for Imports and Exports), Amazon S3 or Azure integration from the list. If you don't have any, click on the + button that will take you to set up a new SFTP or GCS integration.

Bloomreach Engagement

You can also import data from analyses that you created in Bloomreach Engagement in the current or a different project.

🚧

After selecting your data source, always click on "Preview data" to see if the format looks as expected. If there is an issue, check that correct delimiter and encoding is selected.

Imports from file storage

Imports from file storage support all functional requirements to import data for any Use Case. Importing data from files located in different storage is one of the most secure & convenient ways to automatically transfer data to Bloomreach Engagement. This allows:

  • imports to filter files that should be imported
  • listen to new file uploads
  • ability to not import the same file twice

Further functions:

  • remembers the state of the folder
  • Bloomreach Engagement remembers all files already uploaded per active import. When you delete a whole import, all remembered files are also wiped.
  • general regex match for any file storage to filter files that should be imported. The enclosing slashes ( / ) are mandatory to identify a regular expression! Example: /filename[0-9]+/
  • general "new file" trigger for any file storage (Bloomreach Engagement checks file storage for new files every 5 minutes and imports those that were not imported yet)

File storage limitations

  • Scheduling file imports with selection rule All and execution rule New file uploaded is not allowed.
  • Scheduling file imports with the selection rule All or Not imported yet has a hard limit of 200 files. Imports attempting to schedule over 200 files will not be allowed to be saved and started. Already imported files also count towards this limit, so we recommend regular cleaning of the files in the file storage to avoid hitting the limit,

📘

Example

Automatic imports - you have a DWH that exports data to file storage on a daily basis (e.g. purchases). The exported file name will be different every day (regex match) and can be finished at a different time due to the complexity of the export (new file trigger).

Selection rule

Once you fill in the pre-mapping step for file storage integrations (Amazon S3, SFTP & Google Storage), you can choose the Selection rule.
It applies if there are more files on the storage matching file search:

  • All- all matched files will be imported (if already imported files are not removed, it will cause repeated imports)
  • Last modified - only 1 last modified file will be imported even when there are multiple files matching the filter
  • Last alphabetically - only 1 last file when sorted alphabetically will be imported even when there are multiple files matching the filter
  • Not imported yet - all newly discovered files will be imported, Bloomreach Engagement remembers all files already uploaded per active import. When you delete a whole import, all remembered files are also wiped.

📘

The selection rule is not applied when using the Preview function during the import configuration.

🚧

When selecting Bloomreach Engagement as a data source, be aware that analyses' time filter uses the timezone defined in user settings, while import uses a time filter in UTC. This can create discrepancies between the data shown in analysis and the data imported.

Data mapping

In the dropdown menu for each column name, you can map that attribute to already existing attributes. When mapped to an existing attribute, the name of the column changes the font to blue. Be sure that your imported data have the same structure and formatting as the existing data. You can also create a new attribute altogether. Just type the name and then click on the + sign.

10281028

During every import, you need to pair at least one of your columns with one of the identifiers in the yellow boxes. For example, for customers, this is the ID registered to identify each row with a customer, so Bloomreach Engagement knows where to import those values. However, you do not want to import such a column as a new attribute, so you need to untick the checkbox for that column.

20902090

Make sure that the correct data type is set for each column (text, number, date, etc.). Click on a column's name and select one of the existing attributes to match it with. The data type affects how Bloomreach Engagement treats those values. When filtering data, for example, the data type will decide what type of operators will be shown by default to match that type. It also affects how the values are shown in the customer profile.

Each data type is explained in the Data Manager article. You can also change the data type for each attribute later in the Data Manager. However, be cautious with list values as these are processed differently when imported under an incorrect data type.

12921292
992992

Value "28.9.1996 13:45" imported under different data types and shown in a customer profile.

You can also add a new column with a static value in the import wizard by clicking on the + sign on the very right of the preview table. We recommend always adding a column with the name source and a value describing the source of the data, such as "Import_11 June 2019". This can help later with troubleshooting or orientating your data.

26922692

🚧

If you want to send emails through Bloomreach Engagement, your customers must have an attribute called email filled with their email addresses. Similarly, if you want to send text messages, the attribute for this must be called phone.

Set repeated import (optional)

If you're uploading from a dynamic source (File Storage, Database, URL, or Bloomreach Engagement), you can schedule a repeated import. Automatically repeated imports are available if you import from a database, URL, or Bloomreach Engagement analysis.

To set a repeated import, get to the last fourth step of the import process and click on "Schedule repeated import" in the top right corner. You can set daily imports to run several times per day, weekly imports to run at one time on several days and monthly imports to run at one time on several days each month. You can also specify the time range, until when you want to keep the repeated imports running.

26882688

:white-check-mark: Run the import

You can check the status in the list of imports when you hit "Refresh". Large files will take some time to get imported.

You can use this CSV to try importing yourself before importing a big file. The screenshot below shows how the correct import with this data should look like.

registered;first_name;last_name;email;birthday
[email protected];John;Smith;[email protected];28.9.1996
26942694

"registered" is unchecked to use it only for pairing the data with the correct customer. The "birthday" attribute is defined in the date format. As it doesn't have the blue color, it means that such attribute doesn't exist in our data yet and will be created.

🚧

Customer ID error

"No valid customer ID" error occurs either when the customer is missing ID or when it is longer than 256 bytes. To resolve this, check customer IDs in each row in the document from which you are importing the customers.

🚧

Reimporting customers

If you reimport a customer but leave some of their properties empty, those properties will be overwritten with blank fields regardless of whether they had values before the import.

📘

Importing lists

If you have multiple values that you wish to import at the same time, the proper way to import such a list is as follows: ["value1", "value2"]. Use square brackets in which type in the individual values in double quotes, each separated by a comma.

Importing events

You can name your events as you like but we recommend following our naming standards for the basic events.

14021402

Data source

Choose your data source. This step works the same way as data source selection in Customer Imports. Always preview your data before proceeding to make sure it is in the expected correct format.

Data mapping

Here, you can map columns, assign customer ID tags and timestamp tags and set correct data types

This is a crucial step explained in detail in the customers´ data mapping section above.

There are 2 columns required for event imports: One must contain customer IDs to pair each event with the correct customer. You need to assign the appropriate yellow tag to this column. The second required column is timestamp of each event with the blue tag assigned to it. This must be in the datetime (eg: "2020-08-05T10:22:00") or numeric (eg: number of seconds since 1970 would be written as "1596615705". Note that this works only in seconds) format. If only a date is provided, then Bloomreach Engagement will add midnight as the time to the event's timestamp.

Remember that the columns with IDs should not be imported and hence should be left unchecked.

📘

You can have more columns with different IDs (registered, cookie, etc.). You can map all of them and Bloomreach Engagement will use the one available for a particular customer to pair the event with. registered hard ID is always prioritized.

Make sure that the correct data type is set for each column (text, number, date, etc.). Click on a column's name and select one of the existing attributes to match it with.

Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL, or Bloomreach Engagement), you can schedule a repeated import.

:white-check-mark: Run the import

You can check the status in the list of imports when you hit "Refresh". Big files will take some time to get imported.

You can use this CSV to try to import yourself before importing a big file. The screenshot below shows how the correct import with this data should look like.

registered;timestamp;purchase_id;total_price;total_quantity
[email protected];1474886373;100001;53.90;3
registered;timestamp;purchase_id;price;product_id;title;quantity;total_price
[email protected];1474886373;100001;35.30;29;Colorful bracelet;1;35.30
[email protected];1474886373;100001;9.30;47;Fancy necklace;2;18.60
27022702

"registered" is unchecked to use it only for pairing the data with the correct customer. Notice the data types set for other columns.

🚧

Imported events cannot be selectively altered or deleted.

Importing catalogs

You can work with 2 types of catalogs in Bloomreach Engagement: general and product catalogs.

General catalog in Bloomreach Engagement is basically a lookup table with one primary key (item_id). It can store any table with fixed columns and can be used for various advanced use cases.

Product catalog is designed to store the database of your products. It has some additional settings and IDs like product_id, title, size, etc. These are to ensure that the attributes are properly identified by Bloomreach Engagement and can be used for recommendations.

You can select one of your existing catalogs to import additional/new data or create a new catalog by typing its name and clicking on the arrow.

13961396

🚧

Naming your catalog

Your selected catalog name should be without spaces to avoid an error.

Data source

Choose your data source. This step works the same way as data source selection in Customer Imports. Always preview your data before proceeding to make sure it is in the expected correct format.

📘

Recommended data format for item_id

It is recommended to use item_id's that do not include /. Due to the way the endpoint is decomposed, future editing using our Catalog API will not work with item_id's containing /.

Data mapping

Firstly, check "Is product catalog" at the top if you want to import a product catalog. This will display additional tags that you need to assign to your columns (if they are present in your data).

Secondly, at least one ID tag has to be assigned. There are two IDs for products: item_id and product_id.

The difference is that one product with a single product_id can have multiple variants with different item_ids. For example, you can have a product "iPhone X Case" with a product ID "123". Next, you will also have iPhone X Case in black, white, green, etc. Each of the color variants will have its own item_id but the same product_id.

For general catalogs, item_id is used as the identifier.

See the Catalogs article for more information about each column.

Thirdly, set the correct data type for each column. Read more about navigating the customer data mapping editor in the section above in this article.

26702670

Map predefined IDs to you data structure. If you are importing a general catalog only "item_id" is available and mandatory.

Set searchable fields

If you are importing a catalog for the first time, you have to choose which fields are "searchable". Searchable fields are available in Bloomreach Engagement search and also recommendations are based only on these columns (this means that if you want to make product recommendations based on brand, for example, this column has to be set as a "searchable field").

🚧

  • Searchable fields can only be defined during the first import. If you want to add more columns as searchable later, you will have to re-import the data to a new catalog.
  • Maximum of 20 columns can be set as searchable and available for recommendations.
16821682

Set repeated import (optional)

If you're uploading from a dynamic source (Database, URL, or Bloomreach Engagement), you can schedule a repeated import.

:white-check-mark: Run the import

🚧

Maximum length of catalog item value

The maximum length of the catalog item value is 28672 bytes. If you exceed this length you need to either shorten the value. It is possible to split the longer property into two short ones.

Importing vouchers

The import process for vouchers is explained in the Vouchers article.

Additional resources

Working with CSV

To import historical data into Bloomreach Engagement it is best to use CSV import. Each event should be imported in an individual CSV file. Each event i.e. each row in the CSV needs to contain a customer identifier and a timestamp in the UNIX format.

Customers may be imported to Bloomreach Engagement via CSV

512512

In the first row of the CSV file, we need to specify the identifiers and attributes of customers. These names do not need to match naming in Bloomreach Engagement (mapping of the columns onto customer identifiers/attributes in Bloomreach Engagement will be shown later).

Each cell can only contain a single value - e.g. a customer record often has several cookies. However, it is possible to only import 1 cookie in 1 row. To import multiple cookies we would need to have some other identifier for the customer. The situation would look as follows.

560560

Now customer record with registered ID equal to 1, would have [cookie1, cookie2].

In the same way, you can also update individual properties of existing customer profiles i.e. you can only import the relevant columns with the values you wish to update.

Column names in your CSV import do not need to match the names of customer attributes in Bloomreach Engagement but can be mapped. The following image illustrates the situation - to generate this window in Bloomreach Engagement go to: Under Data&Assets - imports - new import - Customers.

16001600

We can see that one of the columns in the CSV import was called ‘fn’ (first_name). We can click on the field highlighted in yellow and then select the appropriate attribute. In this case, we are assuming that ‘fn’ stands for ‘first_name’. This column will now be imported as a customer attribute first name.

📘

Technical limitations

Data should be imported in UTF-8 encoding. File size must not be greater than 1 GB. The file cannot contain more than 260 columns. It is recommended not to import more than approximately 1 million rows in a single file.

CSV data source requirements:

  • Provide HTTPS endpoint with SSL certificate signed by one of Trusted CAs (trusted CA list)
  • Authenticates requests with HTTP authorization header (HTTP Basic Auth described in RFC 2617)
  • Output CSV follows specification of CSV format (as described in RFC 4180)
  • Encoding: Allowed encodings are 'utf-8', 'utf-16', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'latin1', 'iso-8859-2'
  • Delimiter: Allowed delimiters are ' ', '\t', ',', '.', ';', ':', '-', '|', '~'
  • Escape: double quote "
  • New line: CR LF (\r\n)
  • File size limit 1 GB
  • Transfer speed allowing Bloomreach Engagement to download the file in less than 1 hour

Working with XML

XML format supports all import types. The XML format is supported as a copy-paste, manual file upload, import from URL or in file storage (SFTP, Google Cloud Platform, S3) integration. XML is one of the most used data formats, usually when it comes to product catalogs for e-commerce. Adding XML format support will speed up the integration phase and make it cheaper to run and more secure at the same time.

Brief introduction to XML

XML stands for eXtensible Markup Language (similar to HTML). XML serves as a way to store or transport data in separate XML files. Using XML makes it easier for both producers and consumers to produce, receive, and archive any kind of information or data across different hardware or software.

XML documents are formed as element trees. An XML tree starts at a root element and branches from the root to child elements. XML documents must contain one root element that is the parent of all other elements. Moreover, XML elements must have a closing tag "/".
For more information about XML syntax and its usage visit this online guide.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

For instance, this XML example would be used to store data for a bookstore company.

<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Below you can see the way Bloomreach Engagement would read this XML code.

10981098

XML imports copy the existing functionality of CSV imports. XML does not have a flat structure. However, Bloomreach Engagement supports only flat structures in import (e.g. CSV, database connections), so the structure transformation (flattening) is needed.

19191919

Amazon S3 integration

Amazon S3 is one of the most used cloud file storage. You can connect Amazon S3 file storage with Bloomreach Engagement directly and with the support for data imports only. Importing from Amazon S3 works Amazon S3 REST API, which we call using the HTTP protocol

Enablement

Users can specify the exact region and bucket where Bloomreach Engagement will look for files. In imports, it includes all features - regex file lookup, new file uploaded trigger, and selection rules for imports.

To set up the integration,
Press Data & Assets and select Integrations (if you do not see the button integrations contact your project manager to enable this function). You will be directed to a page where you can press Add new integration in the upper right corner. At this point, you should be able to + add this integration from the list of available integrations.

675675

To complete your integration, you need to generate Access Tokens through Amazon's AWS. Each user will have its access level. This is a configuration within AWS itself. To set up the integration, you need to select either "list objects", "read-only", or "read all" access level. You have to connect those rights to the Access Token, which you use in the AWS integration to ensure that imports from Amazon S3 work correctly.

To find the already set up integration when importing your files:
Simply import your data through the Imports and you can access the Amazon S3 integration by clicking on Select a File storage integration. You can choose Amazon S3 and by clicking on the + sign you will be directed to the same image as the one above this text.

Save your Access Key ID and your Secret Access Key. When you will be importing your data, you can use the already set up Amazon S3 integration.

Use cases

  • One-time imports - Importing historical data, one-time campaign segments
  • Regular imports - Synchronization with internal tools (e.g. CRM)
  • Adform - Exporting customers to custom audiences in Adform (handled automatically by Bloomreach Engagement)

Best practices

When you import events, there are certain things to look out for so that you avoid errors/duplications and also keep your data organized.

🚧

Errors' influence on the import

Only the rows with an error will not be imported. The rest of the file is.

General

  • Split large files into multiple smaller ones and schedule large imports for off-peak times to achieve the best performance and stability. Start importing the oldest events first and in case of an error in the middle of the process, you can remove the newly added information by simply using “events expiration” in the Data manager.
  • It is a good practice to add a static attribute "source" that will tell you what import the data comes from. This will make troubleshooting and orienting your data much easier should there be any discrepancies in the future.
  • To be able to distinguish between old and newly imported events add a new attribute (e.g. import_date=20190715). The total number of events with this attribute should match the number of lines in the imported CSV file.
  • Double-check whether all headers have a unique name, especially when importing the same event from multiple .csv files. Import wizard would treat two headers with the same name as one column and subsequently, within Bloomreach Engagement, as one attribute.
  • Make sure that the customer IDs in the import files are consistent with the IDs of existing customers. Some lists might use automatically-generated IDs while others may use emails.
  • Use test projects/events before doing the actual imports so that you avoid accidental deletion/duplication and the subsequent need for reimporting.
  • Ensure that all purchase_item events have a purchase_id attribute, that is tracked in another purchase event so that more purchase_item events coming from a single purchase can be treated as such.

Timestamp

  • Check the timezone of the timestamp.
  • Transform all the time formats into UNIX timestamp which stands for the number of seconds since 1.1.1970 00:00:00 UTC. Using a UNIX timestamp avoids problems with different time formats where, for example, 02/04/2017 could be both April and February.

Values

  • Check whether commas, dots, and spaces in the values are used correctly. When working with .csv files, one comma can move values to the wrong column. When locating and changing the values, beware that the Excel dropdown filter only lists the first 10 000 rows.
  • Be aware of regional differences with various Excel settings. Excel can read the value "6.1" as price in one language setting while reading it as a date in another. When using the “Text to Columns” function you can you can specify the formatting of each column and set its decimal point to avoid this problem.
  • Lowercase and trim the most important values (like IDs) to avoid wrong identification. To easily achieve this result you can use the TRIM(LOWER(xx)) function in Excel.
  • When combining two sets of events, be sure that the price attribute within the purchase event is in both sets indexed with the same VAT, shipping, and discount setting. Also, check whether both sets include refunded purchases.

Re-import

  • After deleting events, do not start reimporting right away. Events and customers might look deleted in the Bloomreach Engagement app but they might be deleted only from the frontend while the deletion process on the backend might still be in process.
  • Deletion process has low priority on the backend, so it takes a longer time than import.

Technical reference

Maximum size and number of rows

Any import should not have more than 1 million rows. If your import file is bigger, we recommend splitting it into several smaller files.

The size of the file needs to be under 1GB.

Date format and time format supported

The supported date formats and time formats are explained in the Data Manager article.

Deduplication of events

Every imported or tracked event is automatically checked with all events imported/tracked in the last 3 months based on the time of given import/tracking.
Here are the criteria for detecting identical events:

  • events have the same event type
  • events have the same timestamp (to the last decimal place)
  • events have the same event attribute values together with the same attribute value types (e.g. if one attribute is a string, the other has to be a string, too)

Such new identical events will be discarded. Hence, only events with explicitly set event timestamps can be deduplicated. If possible, we recommend to always import/track events with event timestamps because this will allow you to safely re-import or retry tracking of such events in case it is needed (e.g. if the import is partially successful or there was a network problem).

📘

In case you import the same events again (including the same timestamp), but this time with a different trigger option for a scenario, deduplication does not allow the event to actually trigger that scenario.

🚧

Please bear in mind that consent events cannot be deduplicated.

Order of events with identical timestamps

In case some events have identical event timestamps, but different attributes (no deduplication) and events are assigned to the same customer profile, the Bloomreach Engagement application does not guarantee the order of such events.

Therefore, in case the order of events is crucial (e.g. for funnel analytics) for your use case, make sure that event timestamps differ by at least one decimal place. Bloomreach Engagement API should accept timestamp precision up to 6 decimal places.

Automatic trimming and lowercasing of IDs

This feature will trim and lowercase your IDs when they are imported or tracked to Bloomreach Engagement. You must enable it first in settings.

Duplicate identifiers in the import source

Import sources are processed from top to bottom and in the case of customers and catalogs, each row is an update operation for a record found by a given ID.

  • In the case of catalog imports, the order of rows is respected. In other words, if you import a source, which contains lines with duplicate item IDs, but different columns, in general, the last occurrence of the row will execute the final catalog update and therefore that will be the final value.

  • In the case of customer imports: the order of rows is guaranteed as long as they have exactly the same values in all ID columns. Rows using different IDs are processed in an arbitrary order, even if they belong to the same customer.

Delays in the visibility of imported data

Our import system was designed to cope with imports of large data sets in sequential order.
We optimized it for throughput instead of reactivity. For this reason, users can sometimes see delays between seeing a finished import in UI and seeing actually imported data.

Here is a simplified description of how our import system works:

Imports from all projects use the same queue. This means that many large imports in one project can delay imports in other projects. When you create an import, it is first scheduled into the queue and it will start once there is an available worker. In general, this scheduled phase should take only a few seconds, but if there are many parallel imports (e.g. dozens) in other projects, a new import has to wait until the previous ones finish. Once the import starts, in UI you will see an import progress bar, which indicates the loading of your source into our system. Loading phase time depends on the size of the import and in the case of external sources (e.g. databases) also on connectivity. After the import loading phase finishes, it still needs to be further processed by our system. The processing phase is not yet visible in UI. Smaller imports are processed in minutes, but the processing of your import depends on how many other imports are being processed at the same time. After the processing phase finishes, your data is available in the system.

Timeouts when using HTTP(S) as a source

In order to prevent the situation, when the server doesn't respond, but the import is still in progress and therefore blocks the workers' slots, there is a set timeout.
The first part of the timeout is connect timeout and it is set to 30 seconds. It is time to open the socket to the server (TCP/SSL). Therefore, if the server is not available, we drop the import.
The second part of the timeout is read timeout. If the socket was opened, this is the time to get the first bytes of the HTTP response to an HTTP request from a remote web server. It is set to 3600 seconds. If that doesn't happen in the set time, the import is canceled.


Did this page help you?