Copy Content

Introduction

Goal

Copy content from a production environment (where Content Editors manage content for a live site) to a test environment (where developers develop and test new components, page layouts, etc.).

Background

The ability to copy content from production to test environments allows developers to test and preview new features using exactly the same content these features will encounter when deployed to production.

This process can be implemented using the Content Batch Import and Export APIs. It's advisable, but not required, to read the relevant reference documentation before going through this tutorial.

This is a sync process intended to support development and testing. It should therefore be fast enough to run several times a day, it should be possible to fully automate it without human intervention, and it should support exceptions so that content that deliberately differs in the test environment is not overwritten.

Currently, this process excludes images and assets from our built-in file management features. This will be addressed once APIs are available for those data types. Therefore, this version of the process is targeted towards customers that use an external DAM integration.

In this tutorial, we'll walk you through the process step by step with examples showing how to export and import content between environments in Bloomreach Content.

Prerequisites

Before starting:

The API requests in this tutorial are provided as curl commands.

The commands use the channel ID my-channel; replace it with your own channel ID if it is different. You will also need to substitute your own API token in the x-auth-token header, and replace the host developers.bloomreach.io with the host of your own environment.

You are free to use any other tool to interact with the APIs. An alternative that we maintain and recommend is Postman; see our Postman collections for the Import and Export APIs.

Step by Step Tutorial

1. Export content

As described in the Content Batch Export API documentation, an export is requested for a particular path (the sourcePath attribute), branch (the branch attribute, which defaults to core), and set of content types (the dataTypes attribute). The request is executed against the source environment and authenticated with an API token.

Execute the following POST request to the Request an export endpoint:

curl --request POST \
     --url https://developers.bloomreach.io/management/content-export/v1/ \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'x-auth-token: 242f2d6b-65a2-4b13-ba13-e0f14a74b2ab' \
     --data '
{
     "sourcePath": "/content/documents/my-channel"
}
'

You should receive a 201 response with a body similar to the following:

{
   "operationId":"06ba4cda-3b7e-41d1-8b1f-3ee2ad06c5f5",
   "status":"STARTING",
   "exitMessage":"",
   "readCount":0,
   "writeCount":0,
   "skipCount":0,
   "startTime":null,
   "endTime":null,
   "errorLog":[],
   "fileSize":0,
   "sourcePath":"/content/documents/my-channel",
   "branch":"core",
   "modifiedAfter":null,
   "dataTypes":["document","page","resourcebundle","folder"]
}
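When the export is scripted, it can help to send the optional attributes explicitly and capture the operationId for the next step. The following is a minimal POSIX shell sketch, not an official client: BRX_BASE_URL and BRX_TOKEN are hypothetical environment variables holding the source environment's host (for example https://developers.bloomreach.io) and API token, the dataTypes list simply repeats the values echoed in the example response, and the operationId is extracted with sed, which assumes the response keeps the shape shown above (a JSON parser such as jq is more robust).

```shell
# Hypothetical variables: BRX_BASE_URL (e.g. https://developers.bloomreach.io)
# and BRX_TOKEN (the API token sent in the x-auth-token header).

# Print the export request body. branch defaults to core, as in the API;
# the dataTypes list mirrors the example 201 response.
export_payload() {
  source_path="$1"
  branch="${2:-core}"
  printf '{"sourcePath":"%s","branch":"%s","dataTypes":["document","page","resourcebundle","folder"]}\n' \
    "$source_path" "$branch"
}

# Request an export and print the operationId from the 201 response.
# Usage: request_export /content/documents/my-channel [branch]
request_export() {
  export_payload "$1" "${2:-}" |
    curl --silent --show-error --fail --request POST \
      --url "$BRX_BASE_URL/management/content-export/v1/" \
      --header 'accept: application/json' \
      --header 'content-type: application/json' \
      --header "x-auth-token: $BRX_TOKEN" \
      --data @- |
    tr -d '\n' |
    sed -n 's/.*"operationId" *: *"\([^"]*\)".*/\1/p'
}
```

For example, `op_id=$(request_export /content/documents/my-channel)` stores the ID used by the status and download requests.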

Use the operationId to request the status of this export. The export is an asynchronous process, because exporting large content trees may take a considerable amount of time. To obtain the operation status, send the following request to the Get operation details endpoint:

curl --request GET \
     --url https://developers.bloomreach.io/management/content-export/v1/operations/06ba4cda-3b7e-41d1-8b1f-3ee2ad06c5f5 \
     --header 'accept: application/json' \
     --header 'x-auth-token: 242f2d6b-65a2-4b13-ba13-e0f14a74b2ab'

A 200 response containing the operation status is expected, for example:

{
    "operationId":"06ba4cda-3b7e-41d1-8b1f-3ee2ad06c5f5",
    "status":"COMPLETED",
    "exitMessage":"",
    "readCount":64,
    "writeCount":63,
    "skipCount":1,
    "startTime":"2023-02-27@08:58:57.790+0000",
    "endTime":"2023-02-27@08:58:57.937+0000",
    "errorLog":[
        {
            "path":"/content/documents/my-channel/pages/someproblematicpage",
            "error":"A detailed explanation of why this page couldn't be exported"
        }
    ],
    "fileSize":24163,
    "sourcePath":"/content/documents/my-channel",
    "branch":"core",
    "modifiedAfter":null,
    "dataTypes":["document","page","resourcebundle","folder"]
}
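Because the export is asynchronous, an automated sync has to poll this endpoint until the operation finishes. A minimal sketch, assuming the same hypothetical BRX_BASE_URL and BRX_TOKEN variables plus a hypothetical POLL_INTERVAL: only STARTING and COMPLETED appear in this tutorial, so treating FAILED, STOPPED, and ABANDONED as terminal failure states is an assumption to verify against the API reference, as are the default interval and attempt limit.

```shell
# Hypothetical variables: BRX_BASE_URL, BRX_TOKEN, POLL_INTERVAL (seconds, default 5).

# Print the "status" value of a JSON document read from stdin.
# Coarse text match: assumes "status" occurs once, as in the example responses.
json_status() {
  tr -d '\n' | sed -n 's/.*"status" *: *"\([A-Z_]*\)".*/\1/p'
}

# Fetch the operation details for an export operationId.
get_export_operation() {
  curl --silent --show-error --fail --request GET \
    --url "$BRX_BASE_URL/management/content-export/v1/operations/$1" \
    --header 'accept: application/json' \
    --header "x-auth-token: $BRX_TOKEN"
}

# Return 0 once the export is COMPLETED; return 1 on an assumed failure
# state or after max_attempts polls.
# Usage: wait_for_export <operationId> [max_attempts]
wait_for_export() {
  op_id="$1"
  attempts_left="${2:-60}"
  while [ "$attempts_left" -gt 0 ]; do
    op_status=$(get_export_operation "$op_id" | json_status)
    case "$op_status" in
      COMPLETED) return 0 ;;
      # Assumed terminal failure states; not listed in this tutorial.
      FAILED|STOPPED|ABANDONED)
        echo "export $op_id ended with status $op_status" >&2
        return 1 ;;
    esac
    # Anything else (STARTING, or an empty status after a failed request) means poll again.
    attempts_left=$((attempts_left - 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  echo "gave up waiting for export $op_id" >&2
  return 1
}
```

Note that COMPLETED does not mean every item was exported: as the example shows, a completed operation can still report a non-zero skipCount and entries in errorLog.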

When the status is COMPLETED, the operationId can be used to download the exported content as a zipped ndjson file. The following request to the Download the files endpoint accomplishes that:

curl --request GET \
     --url https://developers.bloomreach.io/management/content-export/v1/operations/06ba4cda-3b7e-41d1-8b1f-3ee2ad06c5f5/files \
     --header 'accept: application/octet-stream' \
     --header 'x-auth-token: 242f2d6b-65a2-4b13-ba13-e0f14a74b2ab' \
     --output my_exported_content.zip

A 200 response is expected, returning a zip file containing the exported content, in JSON lines format.
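The download and unzip steps can be combined so that a script ends up with the path of the ndjson file to import. A minimal sketch, assuming the same hypothetical BRX_BASE_URL and BRX_TOKEN variables and the standard unzip tool. The file name and folder layout inside the archive are not specified in this tutorial, so the function searches the extracted files for any *.ndjson file.

```shell
# Download the export archive for an operationId into out_dir, unzip it,
# and print the path of every extracted ndjson file.
# Usage: download_export <operationId> <out_dir>
download_export() {
  op_id="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  curl --silent --show-error --fail --request GET \
    --url "$BRX_BASE_URL/management/content-export/v1/operations/$op_id/files" \
    --header 'accept: application/octet-stream' \
    --header "x-auth-token: $BRX_TOKEN" \
    --output "$out_dir/export.zip" || return 1
  # -o overwrites files left by a previous run; -q suppresses the file listing.
  unzip -o -q "$out_dir/export.zip" -d "$out_dir" || return 1
  find "$out_dir" -name '*.ndjson' -type f
}
```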

A few prerequisites exist for a successful export, specifically for exporting page folders and pages:

  • the sourcePath must be, or be under, a content root folder that is associated with a channel
  • this channel must have been added to the project we're exporting for (if that project is core, no action is required as all channels are by default added to core)

Unzip the downloaded file and examine the ndjson file. The exported payload is project-agnostic: the same payload can be used to import this content into different projects. However, it is not channel-agnostic, because it contains references to a channel. In any environment you import this payload into, you must make sure this dependency is met, i.e. a channel with this name exists (further prerequisites are covered in the next step).

Our export request did not use the modifiedAfter attribute. This attribute is a timestamp: if it is specified, only documents modified after that date are exported. This makes it possible to set up an end-to-end, automated, scheduled process for migrating content from one environment to another, for example:

  • Do an initial migration (export and import) from the source to the target environment, and store the date of this migration as the "latest migration date".
  • Schedule recurring migrations that use the "latest migration date" as the value of the modifiedAfter attribute.
  • After each successful migration, update the "latest migration date", so that the next scheduled run only exports content changed since then.
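The scheduled process can be sketched in shell. Several parts are assumptions: the state file location (STATE_FILE) and the run_migration_steps function are hypothetical placeholders for your own export, download, and import steps, and the ISO 8601 UTC format used for modifiedAfter should be checked against the Export API reference, since this tutorial does not show the expected format.

```shell
# Hypothetical file holding the "latest migration date" between runs.
STATE_FILE="${STATE_FILE:-./latest-migration-date}"

# Print the export body: a full export when no date is stored yet,
# otherwise only content modified after the stored date.
# Assumption: modifiedAfter accepts an ISO 8601 UTC timestamp.
incremental_payload() {
  source_path="$1"
  if [ -s "$STATE_FILE" ]; then
    printf '{"sourcePath":"%s","modifiedAfter":"%s"}\n' "$source_path" "$(cat "$STATE_FILE")"
  else
    printf '{"sourcePath":"%s"}\n' "$source_path"
  fi
}

# Run one migration for source_path. run_migration_steps is a hypothetical
# function you supply: it receives the export body, performs the export,
# download, and import, and returns non-zero on failure.
migrate() {
  # Take the timestamp before exporting, so edits made during this run
  # are picked up by the next run instead of being skipped.
  started_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  if run_migration_steps "$(incremental_payload "$1")"; then
    printf '%s\n' "$started_at" > "$STATE_FILE"
  else
    echo "migration failed; keeping the previous migration date" >&2
    return 1
  fi
}
```

Recording the start time rather than the completion time trades a few items possibly being copied twice for never missing content that was edited while a migration was running. A failed run leaves the stored date unchanged, so the next run retries the same window.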

2. Import content

The ndjson file exported in the previous step will now be imported into a target environment. For this tutorial, the target environment is the same environment we exported from; in a real-life scenario, it would be a different one.

Additionally, while we exported from core in the previous step, the import will use a different project, because imports into core are not allowed. In a real-life scenario, the project ID in the target environment (the project we're importing to) is always different from the project ID in the source environment, even if the project names are the same, because project IDs are auto-generated in each environment.

Because import payloads are project-agnostic, the target project is specified in the URL of the import request. To run the import, send the following POST request to the Create content endpoint, uploading the previously saved ndjson file (replace the project ID vTnTj and the filename in the command below with your own):

curl --request POST \
     --url https://developers.bloomreach.io/management/content-import/v1/project/vTnTj \
     --header 'accept: application/json' \
     --header 'content-type: multipart/form-data' \
     --header 'x-auth-token: 242f2d6b-65a2-4b13-ba13-e0f14a74b2ab' \
     --form file=@my_exported_content.ndjson

A 201 response is expected, with a body similar to the following:

{
    "operationId": "3496f615-c489-4351-a358-26cec3f0f94b",
    "projectId": "vTnTj",
    "status": "STARTING",
    "exitMessage": "",
    "readCount": 0,
    "writeCount": 0,
    "skipCount": 0,
    "startTime": null,
    "endTime": null,
    "errorLog": []
}
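For automation, the upload can be wrapped in a function that refuses to target core (imports into core are not allowed) and prints the operationId. A minimal sketch, assuming hypothetical BRX_BASE_URL and BRX_TOKEN variables for the target environment. It lets curl generate the multipart Content-Type header from the --form option, since curl then also adds the boundary parameter that multipart bodies need.

```shell
# Upload an ndjson export into a development project and print the operationId.
# Usage: request_import <file.ndjson> <projectId>
request_import() {
  ndjson_file="$1"
  project_id="${2:-}"
  if [ -z "$project_id" ] || [ "$project_id" = "core" ]; then
    echo "imports need a development project ID, not core" >&2
    return 1
  fi
  if [ ! -f "$ndjson_file" ]; then
    echo "no such file: $ndjson_file" >&2
    return 1
  fi
  # --form builds the multipart/form-data body and its Content-Type header.
  curl --silent --show-error --fail --request POST \
    --url "$BRX_BASE_URL/management/content-import/v1/project/$project_id" \
    --header 'accept: application/json' \
    --header "x-auth-token: $BRX_TOKEN" \
    --form "file=@$ndjson_file" |
    tr -d '\n' |
    sed -n 's/.*"operationId" *: *"\([^"]*\)".*/\1/p'
}
```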

Use the operationId with a request to the Get operation details endpoint, substituting in your import operation's ID:

curl --request GET \
     --url https://developers.bloomreach.io/management/content-import/v1/operations/3496f615-c489-4351-a358-26cec3f0f94b \
     --header 'accept: application/json' \
     --header 'x-auth-token: 242f2d6b-65a2-4b13-ba13-e0f14a74b2ab'

You should receive a response similar to the one below. If there were any errors, they are listed in the errorLog array. Please review the error messages in detail, since they may indicate individual content items that were not successfully imported, even if the overall status of the job is COMPLETED.

Error messages typically include specific context about the problem. For example, the error in the response below indicates a reference to a component definition that does not exist in the target environment, channel, and project.

{
  "operationId": "3496f615-c489-4351-a358-26cec3f0f94b",
  "projectId": "vTnTj",
  "status": "COMPLETED",
  "readCount": 63,
  "writeCount": 62,
  "skipCount": 1,
  "startTime": "2022-03-21@10:47:00.448+0000",
  "endTime": "2022-03-21@10:47:08.632+0000",
  "errorLog": [
    {
      "path": "pages/home",
      "error": "Item not found; nested exception is javax.jcr.ItemNotFoundException: Component definition 'hst:components/sample/single-banner-carouselZ' not found"
    }
  ]
}
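Because a COMPLETED status alone does not guarantee that every item was imported, an automated run can also check the errorLog before continuing. A minimal sketch that reads operation details like the response above on stdin, prints the path of each errorLog entry to stderr, and succeeds only when the status is COMPLETED and the errorLog is empty. It is a coarse text match on the response shape shown in this tutorial, not a JSON parser.

```shell
# Read operation details (JSON) on stdin. Print the path of every errorLog
# entry to stderr; return 0 only if status is COMPLETED and errorLog is empty.
check_operation() {
  json=$(cat)
  # Whitespace-free copy for the coarse checks below.
  compact=$(printf '%s' "$json" | tr -d ' \n\t\r')
  printf '%s\n' "$json" |
    grep -o '"path" *: *"[^"]*"' |
    sed 's/"path" *: *"\([^"]*\)"/\1/' >&2
  case "$compact" in
    *'"errorLog":[]'*) ;;
    *) return 1 ;;
  esac
  case "$compact" in
    *'"status":"COMPLETED"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, pipe the output of the Get operation details curl command into check_operation and skip the approve-and-merge step when it returns non-zero.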

You can check the imported content in the UI using the Projects app: preview the imported pages and documents in the development project, and approve them so that the project can be merged. Merging a project requires a user with the Site Admin role, because merging the project results in changes to the published version of the content in that environment.

By using an API token created by a user with the Site Admin role, you can also use the Projects API to approve all changes and merge a project in a single API request. This is most useful for cases where you wish to fully automate a scheduled process to update content in a test environment, where specific approvals for individual content items are not required.

A number of prerequisites exist for a successful import:

  • The project to import to must be a development project, see Appendix I on how to create one.
  • All content types used in the payload must exist in the target environment (in core or in the project we’re importing to). Consequently, any entities a content type depends on must also exist, for example taxonomies (via taxonomy fields), resource bundles (via SelectableString or Boolean fields), and Field extensions (via FieldExtension fields). Preferably, use the Content Type Management API to create any missing content types.
  • None of the content types referenced in the payload may be disabled.
  • The channel referenced in the payload must be associated with the project we’re importing to.
  • An Experience manager channel preview of the channel referenced in the payload must exist. If the channel was created from a channel template, this is always the case. Otherwise, the easiest way to initialize the preview is by opening the channel in the Experience manager at least once.
  • All referenced folders/documents (via link or richtext fields) must either be present in the target environment or be contained within the payload being imported. Otherwise, warnings (but not errors) are issued in the job status report.
  • If an editor is editing a document in Content SaaS and at the same time an import is attempted for that document, the import will not succeed (for this item) and an error will be recorded in the operation status report.

Appendix I. Create a Development Project

The content must be imported into a specific development project. This guarantees that no operation can affect the live website, since all write operations are performed on the unpublished variant of a page or document associated with the development project.

Therefore the first step is to create a development project and add your channel to the project. You can do this using the Projects app in the UI or using the Management APIs.

To create a new development project, use a POST request to the Project Management API's Project endpoint:

POST https://<your-content-environment>.bloomreach.io/management/projects/v1/

Use the following JSON payload:

{
    "name": "content-import-project",
    "includeContentTypes": false,
    "description": "Content Import Project"
}

You should receive a 201 response with a body similar to the following:

{
    "id": "vTnTj",
    "name": "content-import-project",
    "includeContentTypes": false,
    "description": "Content Import Project",
    "state": {
        "status": "IN_PROGRESS",
        "message": "In progress",
        "errors": null,
        "availableActions": []
    },
    "items": null,
    "system": {
        "createdBy": "[email protected]",
        "createdAt": "2023-02-13T19:32:44.968+01:00",
        "updatedBy": "[email protected]",
        "updatedAt": "2023-02-13T19:32:44.968+01:00",
        "mergedBy": null,
        "mergedAt": null
    }
}

Note the project ID (vTnTj in the example above); you'll need it in the next steps.
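The same request can be sent with curl, in the style of the other commands in this tutorial. A minimal sketch, assuming hypothetical BRX_BASE_URL (for example https://<your-content-environment>.bloomreach.io) and BRX_TOKEN variables, and assuming the Project Management API accepts the same x-auth-token header as the import and export requests. It prints the id of the new project so it can be passed to later steps. The name and description are inserted into the JSON unescaped, so keep them free of quotes and backslashes.

```shell
# Create a development project and print its id (for example vTnTj).
# Usage: create_project <name> [description]
create_project() {
  name="$1"
  description="${2:-$1}"
  curl --silent --show-error --fail --request POST \
    --url "$BRX_BASE_URL/management/projects/v1/" \
    --header 'accept: application/json' \
    --header 'content-type: application/json' \
    --header "x-auth-token: $BRX_TOKEN" \
    --data "{\"name\":\"$name\",\"includeContentTypes\":false,\"description\":\"$description\"}" |
    grep -o '"id" *: *"[^"]*"' |
    head -n 1 |
    sed 's/"id" *: *"\([^"]*\)"/\1/'
}
```

For example, `project_id=$(create_project content-import-project 'Content Import Project')`.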

Now add your channel to the project using the Site Management API's Channels endpoint:

POST https://<your-content-environment>.bloomreach.io/management/site/v1/channels/

Use the following JSON payload, substituting in the IDs of your project and your channel:

{
    "branch": "vTnTj",
    "branchOf": "my-channel"
}
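With curl, adding the channel might look like the following minimal sketch. It uses the same hypothetical BRX_BASE_URL and BRX_TOKEN variables (again assuming the x-auth-token header) and prints the response body unchanged.

```shell
# Add a channel to a development project. The payload names the project
# (branch) and the channel it branches from (branchOf).
# Usage: add_channel_to_project <projectId> <channelId>
add_channel_to_project() {
  project_id="$1"
  channel_id="$2"
  curl --silent --show-error --fail --request POST \
    --url "$BRX_BASE_URL/management/site/v1/channels/" \
    --header 'accept: application/json' \
    --header 'content-type: application/json' \
    --header "x-auth-token: $BRX_TOKEN" \
    --data "{\"branch\":\"$project_id\",\"branchOf\":\"$channel_id\"}"
}
```

For example, `add_channel_to_project vTnTj my-channel`.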

You'll need the project ID for the actual content import. If you used the UI to create the project and add your channel, you can retrieve the project ID (for example vTnTj) in the Projects app, the Site development app, or the Site Management API.

In the latter case, use the Get all channels endpoint; the relevant part of the response looks like this:

{
  ...,
  "branch": "vTnTj",
  ...
}

The channel's branch property contains the project ID.

Appendix II. Aborting an Import Attempt

If something goes wrong during the batch import process, you can delete the project before it is merged and start over in a new development project. Imported data related to pages, documents, and resource bundles is contained within the development project, so deleting it leaves all such data in a "clean" state, as if the import had never been attempted. This can be useful when testing an automated process or an integration with a Continuous Integration server.

One exception to the above rule is that changes to folders take effect immediately, so an import attempt that is aborted may still result in changes to, for example, the allowed document types or the locale of folders included in the import payload.