Reindex Data Task
Learn more about how to reindex entity data in a tenant and control the scope, execution mode, and follow-up processing for this background task.
The Reindex Data task refreshes the Elasticsearch index for a tenant's entity data. Use it when existing tenant data must be rebuilt into search structures after specific configuration changes or other maintenance scenarios.
When to run this task
Reindexing should not be a scheduled or routine task. Reltio automatically maintains Elasticsearch indexes during normal operations, such as:
- Loading or updating entities
- Creating relationships or interactions
Reindex your data only when one of the following changes occurs, and set enableSeparateIndexing=true to ensure a complete index refresh:
- Metadata Configuration/L3 changes such as:
  - Existing attribute changes (type, label)
  - Existing attribute is removed
  - Existing source is removed
  - Survivorship rules are modified
  - Cleanse configuration is modified
  - Lookups (RDM) mappings are modified
  - MatchFieldUris are modified
  - Surrogate crosswalk settings are modified
  - Reference attribute settings are modified
  - Sub-nested attributes are added or removed
- Tenant physical configuration changes in:
  - survivorshipAdvancedBehavior
  - indexOvStrategy
HTTP method and endpoint
Use the following HTTP method and endpoint to start the task:
Administrator role is required:
POST {ApplicationURL}/reindex?tenantId={tenantId}
Tenant admin role is required:
POST {ApplicationURL}/api/{tenantId}/reindex
Request headers
Include the following headers when you run the task in file mode.
| Name | Required | Description |
|---|---|---|
| awsAccessKey | No | S3 access key for the Amazon bucket. |
| awsSecretKey | No | S3 secret key for the Amazon bucket. |
| googleCredentials | No | Google credentials used for file-based input. |
Query parameters
The following table describes the supported query parameters.
| Parameter | Required | Description |
|---|---|---|
| tenantId | Yes | ID of the tenant whose entities are reindexed. |
| updateEntities | No | If set to true (default), in addition to refreshing the Elasticsearch index, the task also updates history, match tables, and the analytics layer (RI). If set to false, the task updates Elasticsearch data only. It does not perform rematching or update history or analytics. |
| entityType | No | If provided, restricts the reindexing scope to the specified entity type or entity types. Supports multiple values as comma-separated items, for example entityType=Person,Organization. |
| skipEntitiesCount | No | If provided, sets the number of entities skipped during reindexing. Default value: 0. |
| entitiesLimit | No | If provided, sets the maximum number of entities reindexed. Default value: infinity. |
| updatedSince | No | Timestamp in Unix format. If this parameter is provided, only entities with a greater or equal timestamp are reindexed. Note: The task must still scan the entire database even when this parameter limits the number of indexed entities. As a result, the task execution time will generally remain similar to running it without this parameter. |
| checkCrosswalksConsistency | No | Specify true to reindex each entity, whether it has changed or not. Reltio does not recommend this option because it can decrease performance dramatically and overload the server. Default value: false. |
| distributed | No | If set to true, the task runs in distributed mode. Default value: false. For more information, see Distributed mode. Note: For large tenants, set distributed=true and configure taskPartsCount to match the number of available API nodes. This enables parallel execution across nodes and can improve task performance. |
| taskPartsCount | No | Specifies the maximum number of sub-tasks for distributed execution. The platform determines the optimal number based on performance limits. Default value: 2. Note: This parameter applies only when distributed=true. Otherwise, it is ignored. |
| forceIgnoreInStreaming | No | If set to true, only events produced by the task are ignored in streaming. Default value: false. Note: When you set this parameter to true, events are generated but not streamed to external queues. The generated events are still used by the internal queue to rebuild the index. For more information about internal and external queues, see Queues at a glance. |
| enableSeparateIndexing | No | If set to true, the task builds a new index, and the tenant switches to it after the reindex completes. Note: When enableSeparateIndexing=true, PotentialMatchesReindexTask is automatically triggered upon completion of the entity reindex because the new index must contain potential matches data before the tenant switches to the new index. |
| bucket | No | Bucket name. |
| s3Region | No | AWS S3 region. |
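The parameters in the table combine into an ordinary query string. The following Python sketch shows one way to assemble it; the helper name build_reindex_query is hypothetical and not part of any Reltio client library. It assumes that booleans are sent as lowercase true/false and that multiple entity types are comma-separated, as in the entityType example in the table. Raising an error when taskPartsCount is passed without distributed=true is a choice made in this sketch; the platform itself ignores the parameter in that case.

```python
from urllib.parse import urlencode


def build_reindex_query(**params):
    """Build a query string for the reindex endpoint (hypothetical helper)."""
    # taskPartsCount only has an effect in distributed mode; the sketch
    # rejects the combination early instead of letting it be ignored.
    if "taskPartsCount" in params and not params.get("distributed"):
        raise ValueError("taskPartsCount only applies when distributed=True")

    encoded = {}
    for name, value in params.items():
        if value is None:
            continue  # omit unset optional parameters
        if isinstance(value, bool):
            encoded[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            encoded[name] = ",".join(value)  # e.g. entityType=Person,Organization
        else:
            encoded[name] = str(value)
    # safe="," keeps the comma separator readable instead of encoding it as %2C
    return urlencode(encoded, safe=",")


query = build_reindex_query(
    entityType=["Person", "Organization"],
    updateEntities=False,
    distributed=True,
    taskPartsCount=4,
)
print(query)
```

The resulting string is appended after the ? in the endpoint URL.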
Request body
The request body is optional.
Use a JSON array of entity URIs to reindex only the specified entities, or use a JSON array of file names that contain entity URIs when you provide awsAccessKey and awsSecretKey or googleCredentials.
The following example shows the file format for a JSON array of entity URIs:
["entities/aaaaaaa", "entities/bbbbbbb", "entities/ccccccc"]
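As an illustration of file mode, the following Python sketch splits a list of entity URIs into several file payloads in the format shown above, and builds the matching request body of file names. The function name, the chunk size, and the path/fileNN.json naming are assumptions made for the example. Uploading the files to the bucket is not shown.

```python
import json


def entity_uri_files(uris, chunk_size, prefix="path/file"):
    """Map hypothetical file names to JSON arrays of entity URIs."""
    files = {}
    for start in range(0, len(uris), chunk_size):
        # Number the files 01, 02, ... in the order the chunks are cut.
        name = f"{prefix}{start // chunk_size + 1:02d}.json"
        files[name] = json.dumps(uris[start:start + chunk_size])
    return files


uris = ["entities/aaaaaaa", "entities/bbbbbbb", "entities/ccccccc"]
files = entity_uri_files(uris, chunk_size=2)

# The request body in file mode is a JSON array of the file names.
request_body = json.dumps(list(files))
print(request_body)
```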
Example request
The following example shows how to run the task in file mode:
POST {ApplicationURL}/api/{tenantId}/reindex?bucket={bucket}
Headers:
Authorization: Bearer {accessToken}
awsAccessKey: {awsAccessKey}
awsSecretKey: {awsSecretKey}
Body:
["path/file01.json", "path/file02.json", "path/file03.json"]
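The same request can be sketched in Python with the standard library. The function name and the example values below are placeholders, and the Content-Type header is an assumption that does not appear in the example above. The sketch only constructs the request object; sending it (for example with urllib.request.urlopen) is left to the caller.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def file_mode_request(application_url, tenant_id, bucket, access_token,
                      aws_access_key, aws_secret_key, file_names):
    """Build (but do not send) a file-mode reindex request."""
    url = f"{application_url}/api/{tenant_id}/reindex?bucket={quote(bucket)}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",  # assumption, not in the source example
        "awsAccessKey": aws_access_key,
        "awsSecretKey": aws_secret_key,
    }
    # The body is a JSON array of file names that contain entity URIs.
    body = json.dumps(file_names).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
request = file_mode_request(
    "https://example.invalid", "myTenant", "my-bucket", "token123",
    "ACCESS_KEY", "SECRET_KEY",
    ["path/file01.json", "path/file02.json", "path/file03.json"],
)
print(request.get_method(), request.full_url)
```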
Reindexing and streaming
The Reindex Data task rebuilds the Elasticsearch index for tenant entities and, depending on your configuration, updates history, match tables, and analytics data.
Because this task can process a large volume of data, system load and event volume can increase significantly. Plan the task to minimize disruption to search, matching, analytics processing, and streaming.
To control task scope and reduce processing overhead, you can:
- Reindex only Elasticsearch data by setting updateEntities=false. In this mode, the task does not update history, match tables, or analytics data.
- Restrict the task to specific entity types with entityType.
- Skip a number of entities with skipEntitiesCount.
- Limit the number of processed entities with entitiesLimit.
- Run the task in distributed mode with distributed=true and taskPartsCount.
- Run the task only for selected entities by passing entity URIs in the request body.
- Prevent task-generated events from being sent to external queues by setting forceIgnoreInStreaming=true.
PotentialMatchesReindexTask is triggered automatically only when the Reindex Data task creates a new index, such as when enableSeparateIndexing=true is specified. For all other reindex scenarios, run PotentialMatchesReindexTask separately if needed. See Potential Matches Reindex Task.
Do not update the tenant business configuration while the Reindex Data task is running.
The reported numberOfProcessedObjects can vary between task executions because entities may be added to or deleted from the tenant during processing. When the task runs in distributed mode, add the numberOfProcessedObjects values from all child tasks to calculate the total.
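The distributed-mode total can be computed with a short helper such as the Python sketch below. It assumes that each child task's status is available as a dictionary with a numberOfProcessedObjects key. The shape of the task status response is not described here, so treat the input structure as hypothetical.

```python
def total_processed(child_tasks):
    """Sum numberOfProcessedObjects across child task records."""
    # A child task without a reported count contributes 0 to the total.
    return sum(int(task.get("numberOfProcessedObjects", 0)) for task in child_tasks)


# Hypothetical child task records for illustration.
children = [
    {"id": "part-1", "numberOfProcessedObjects": 1200},
    {"id": "part-2", "numberOfProcessedObjects": 950},
]
print(total_processed(children))
```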
To reindex analytics attributes, run the Reindex Analytics Attributes Task.