Databricks
This guide introduces the Databricks integration with Bloomreach.
Integration overview
The Databricks native integration connects your Databricks data lakehouse directly to Bloomreach, enabling automated data imports without manual file transfers or third-party middleware.
This integration supports importing customers, events, and catalogs with automatic synchronization as often as every 15 minutes.
Why use Databricks integration
- Automated data imports: Set up scheduled, near real-time data transfers from Databricks to Bloomreach. Your campaigns and analytics always use up-to-date information without manual intervention.
- Reduced complexity: Eliminate time-consuming and error-prone SFTP file transfers or custom connectors. The native integration handles data movement automatically.
- Lower integration costs: Avoid extra expenses for middleware or manual processes. Reduce setup time and ongoing maintenance requirements.
- Better support for AI features: Ensure Bloomreach's AI-powered tools, such as Loomi AI, have access to the latest customer and product data for more accurate insights and automation.
- Enterprise-ready: The integration supports organizations already using Databricks or migrating to it, accommodating advanced analytics and marketing use cases.
How Databricks integration works
The integration connects to Databricks and requires a one-time connection setup. Once configured, it can import from tables, views, and user-defined query results.
When importing from tables with change tracking enabled, the system automatically imports subsequent changes from Databricks as often as every 15 minutes to keep your data current.
Prerequisites
Access requirements
- A Databricks service principal (Client ID and Client Secret) with access to the relevant catalog, schema, and tables. Account admin privileges may be needed to create the service principal.
- A Bloomreach account with the Data hub module enabled. Contact your Customer Success Manager to enable Databricks support for catalog imports in your project.
Data format requirements
Bloomreach has flexible data format requirements. During import setup, you map the source data format to the relevant data structure.
The following columns must be present in source tables or views:
| Data category | Required columns | Optional columns |
|---|---|---|
| Customers | ID to be mapped to the customer ID in Bloomreach (typically registered) | Timestamp to be mapped to update_timestamp |
| Events | ID to be mapped to the customer ID in Bloomreach (typically registered); Timestamp | |
| Catalog | ID to be mapped to the catalog item ID in Bloomreach (typically item_id) | |
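To make the column requirements above concrete, here is a hedged sketch of a Delta DDL for a customers source table. The catalog, schema, table, and non-ID column names (`main.crm.customers`, `email`, `loyalty_tier`) are hypothetical; only the `registered` ID and `update_timestamp` columns reflect the mapping targets described in the table above.

```python
# Hypothetical source-table DDL: one column to map to the Bloomreach customer
# ID ("registered") and one timestamp column to map to "update_timestamp".
CUSTOMERS_DDL = """
CREATE TABLE IF NOT EXISTS main.crm.customers (
  registered STRING,          -- maps to the Bloomreach customer ID
  email STRING,
  loyalty_tier STRING,
  update_timestamp TIMESTAMP  -- maps to update_timestamp (optional but recommended)
)
USING DELTA
"""
print(CUSTOMERS_DDL)
```

Any columns beyond the required ID can be mapped to customer attributes during import setup.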
Supported attribute data types
- Text
- Long text
- Number
- Boolean
- Date
- Datetime
- List
- URL
- JSON
Delta update requirements
Delta updates are supported for tables only. They aren't supported for views.
This feature requires delta.enableChangeDataFeed to be enabled on the source table in Databricks.
Enable the Change Data Feed using the following command:

```sql
ALTER TABLE myDeltaTable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
```
For more details, see the Change Data Feed documentation by Databricks.
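If you sync several source tables, you can generate the same ALTER TABLE statement for each of them. This is a small convenience sketch, not part of the integration itself; the table names in the loop are hypothetical.

```python
def enable_cdf_sql(table: str) -> str:
    """Return the ALTER TABLE statement that enables the Change Data Feed
    on a Delta table, as required for Bloomreach delta updates."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )

# Generate the statement for each source table you plan to sync.
for table in ["main.crm.customers", "main.clickstream.view_events"]:
    print(enable_cdf_sql(table))
```

Run the generated statements in a Databricks notebook or SQL warehouse before configuring Sync updates.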
Set up Databricks integration
1. Go to Data & Assets > Integrations and click + Add new integration in the top right corner.
2. In the Available integrations dialog, enter "Databricks" in the search box.
3. Configure your Databricks connection by filling in the following fields (tooltips guide you with the correct format):
   - Server hostname
   - Port number
   - HTTP path
   - Client ID
   - Client secret
   - Catalog (optional)
   - Schema (required if you specify a catalog)
4. Once you fill in all mandatory fields, click Save integration.
Important
Removing the Databricks data source integration from a project won't delete any data already imported from Databricks. However, this cancels any future delta updates using the integration.
Import data from Databricks
The import process follows the same general steps for all data types, with specific configuration differences outlined below.
Import process
1. Navigate to Data & Assets > Imports and click + New import.
2. Select your data type (Customers, Events, or Catalog) and complete any type-specific selections.
3. Enter a name for the import (for example, Databricks customers import).
4. On the Database tab, in the SQL Integration dropdown, select the Databricks integration.
5. In the Source Table dropdown, select Table to import from one of the available tables or views, or select Query to write a custom SQL query.
6. Click Preview data to verify the data source is working, then click Next.
7. Map your ID to the matching column in the Databricks table using drag and drop, then click Next.
8. Configure your schedule execution and click Finish to start the import.
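For the Query source option in the steps above, a custom SQL query lets you expose only the columns you intend to map, under stable names. The following is a hedged sketch; the table and column names (`main.crm.customers`, `customer_id`, `email`, `loyalty_tier`, `updated_at`) are hypothetical placeholders for your own schema.

```python
# Hypothetical query for the "Query" source option: alias source columns to
# the names you will map in Bloomreach (registered, update_timestamp).
CUSTOMER_IMPORT_QUERY = """
SELECT customer_id AS registered,
       email,
       loyalty_tier,
       updated_at AS update_timestamp
FROM main.crm.customers
WHERE updated_at >= date_sub(current_date(), 30)
"""
print(CUSTOMER_IMPORT_QUERY)
```

Keeping the aliases stable means later schema changes in Databricks don't silently break the field mapping in Bloomreach.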
Data-specific configurations
Customers
- Select Customers.
- Map your customer ID (typically ID registered) to the matching column in the Databricks table.
- Customer update timestamp: As a best practice, add an extra column to your customer table with a timestamp indicating when customer properties were last updated. Set the timestamp to the current time on every change of customer data that gets imported to Bloomreach. Map this column to update_timestamp when setting up the import. This prevents the delta update from overwriting customer property values that were tracked in Bloomreach since the previous import.
- Schedule options:
  - Single import: A one-off import of all records.
  - Repeated import: A scheduled, recurring import of all records.
  - Sync updates: A scheduled, recurring delta import of changes since the previous import.
- Schedule frequency: Every 15, 30, or 45 minutes, 1 hour, or 2 hours, with an optional time range (start and end dates).
Important
Deleting imported customer profiles in a project won't trigger the recreation of those customer profiles during the next delta update unless the source record has changed.
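The update_timestamp best practice above can be sketched as a Delta MERGE that refreshes the timestamp on every upsert. This is a hedged illustration, not a required pattern; the table and column names (`main.crm.customers`, `staged_customer_changes`, `email`, `loyalty_tier`) are hypothetical, while `registered` and `update_timestamp` match the mapping described above.

```python
# Hypothetical upsert into the customers source table. Setting
# update_timestamp = current_timestamp() on every change lets the mapped
# update_timestamp in Bloomreach reflect when each row was last modified.
MERGE_SQL = """
MERGE INTO main.crm.customers AS target
USING staged_customer_changes AS source
ON target.registered = source.registered
WHEN MATCHED THEN UPDATE SET
  target.email = source.email,
  target.loyalty_tier = source.loyalty_tier,
  target.update_timestamp = current_timestamp()
WHEN NOT MATCHED THEN INSERT (registered, email, loyalty_tier, update_timestamp)
VALUES (source.registered, source.email, source.loyalty_tier, current_timestamp())
"""
print(MERGE_SQL)
```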
Events
- Select Events and select or enter the event type to import (for example, view), then click Next.
- Map your customer ID (typically ID registered) to the matching column in the Databricks table.
- Schedule options: Same as customers (Single import, Repeated import, Sync updates).
- Schedule frequency: Same as customers.

Use Sync updates to keep both platforms synchronized. This approach generates computation costs on your side, so use it only for the most critical data. Use Single imports for fixed data that doesn't change and doesn't need regular imports.
Important
- You can import one event type per import (for example, view). To import multiple event types, set up a separate import for each type.
- Events in Bloomreach are immutable. Delta updates import new events, but won't update previously imported events.
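Because each import handles exactly one event type, multiple event types mean multiple import configurations. The sketch below just illustrates that one-import-per-type structure; the import names, event types, and source tables are hypothetical.

```python
# One import per event type: "view" and "purchase" each need their own
# import configuration in Bloomreach (names and sources are made up).
event_imports = [
    {"name": "Databricks view events", "event_type": "view",
     "source": "main.clickstream.view_events"},
    {"name": "Databricks purchase events", "event_type": "purchase",
     "source": "main.sales.purchase_events"},
]

# Sanity check: no two imports target the same event type.
assert len({imp["event_type"] for imp in event_imports}) == len(event_imports)
```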
Catalogs
Configure Databricks imports of product data via Data hub imports. For general catalogs, follow the steps in Create and manage Data hub catalogs.
Important
If you delete imported catalog items within a project, they aren’t automatically recreated during the subsequent delta update unless you modify the corresponding source record.
Export data to Databricks
Exports can be done in scheduled mode only.
1. Export your data from Bloomreach using the Exports to Google Cloud Storage (GCS) option.
2. Keep your data in a GCS-based data lake.
3. If you need to import data from files into Databricks database tables, use LOAD. For more details, see the overview article by Databricks.
4. Trigger automatic loading of files from Engagement by calling the public REST API endpoints. For more details, or to bulk load data from Google Cloud Storage, see this article by Databricks.
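One way to trigger the load described above is Databricks' SQL Statement Execution API (`POST /api/2.0/sql/statements/`) running a COPY INTO statement. The sketch below only builds the request body; the warehouse ID, table, and GCS path are hypothetical, sending the request additionally requires an authorization header, and you should verify the exact API behavior against the Databricks documentation for your workspace.

```python
import json

def build_copy_into_request(warehouse_id: str, table: str, gcs_path: str) -> dict:
    """Build the JSON body for Databricks' SQL Statement Execution API
    that loads exported files from GCS into a Delta table via COPY INTO."""
    statement = (
        f"COPY INTO {table} "
        f"FROM '{gcs_path}' "
        "FILEFORMAT = CSV "
        "FORMAT_OPTIONS ('header' = 'true')"
    )
    return {"warehouse_id": warehouse_id, "statement": statement, "wait_timeout": "30s"}

# Hypothetical values; POST this body to
# https://<workspace-host>/api/2.0/sql/statements/ with a bearer token.
body = build_copy_into_request(
    "abc123", "main.crm.bloomreach_export", "gs://my-data-lake/exports/"
)
print(json.dumps(body, indent=2))
```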
Delete data in Bloomreach
Using the API, you can anonymize customers individually or in bulk. To delete customers, mark them with an attribute, filter on that attribute, and delete them manually in the UI.
Events can be deleted only by filtering in the UI. Catalog items can be deleted using the Delete catalog item API endpoint.
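The anonymization call can be sketched as below. This is an assumption-heavy illustration: the base URL, endpoint path (`/data/v2/projects/{projectToken}/customers/anonymize`), and payload shape are taken from the Bloomreach Engagement customer API as I understand it, and the ID values are hypothetical; verify the exact path and required authentication against your project's API documentation before use.

```python
import json
import urllib.request

def anonymize_request(base_url: str, project_token: str,
                      registered_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that anonymizes one customer.
    Endpoint path and payload shape are assumptions; check your API docs."""
    url = f"{base_url}/data/v2/projects/{project_token}/customers/anonymize"
    payload = {"customer_ids": {"registered": registered_id}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},  # add your auth header here
        method="POST",
    )

# Hypothetical values; send with urllib.request.urlopen(req) once authenticated.
req = anonymize_request("https://api.exponea.com", "PROJECT_TOKEN", "jane@example.com")
```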
Example use cases
- One-time imports: Importing purchase history for historical analysis and segmentation.
- Regular delta imports: One-way synchronization of customer attributes to ensure marketing campaigns use the most current customer data.
Limitations
The Databricks integration's delta updates don't support "delete" database operations. If you delete a record in Databricks that was previously imported into Bloomreach, that record won't be deleted in Bloomreach on the next sync.
Support resources
For additional guidance on importing data to Bloomreach, refer to the related import documentation.
