Configure the GBQ compaction scheduler

Learn how to set compaction thresholds, timing, and enablement flags for automatic deduplication in BigQuery.

You must have ROLE_ADMIN_TENANT or equivalent permission to edit tenant configuration and environment properties.
Use this task to configure when and how the GBQ compaction scheduler evaluates tenant data for duplicate removal.
To configure the GBQ compaction scheduler:
  1. Access your tenant's configuration (L3).
  2. In the dataPipelineConfig.bigqueryAdapterConfig section, set the compactionThreshold and hoursToCompaction parameters.
    
    "dataPipelineConfig": {
      "bigqueryAdapterConfig": {
        "compactionThreshold": 0.8,
        "hoursToCompaction": 168
      }
    }
            
    Lower values make compaction more aggressive. The default compactionThreshold is 0.8, and the default interval (hoursToCompaction) is 168 hours (7 days).
  3. In your environment configuration, define the scheduler cron expression.
    
    datapipeline.gbq.compaction.cron.expression=0 0 1 * * *

    The expression has six fields (second, minute, hour, day of month, month, and day of week). In this example, the scheduler runs daily at 01:00 UTC.
  4. To disable the scheduler for specific tenants, set the following environment property:
    
    com.reltio.datapipeline.gbq.compaction.enabled=false
            
    If the property is missing or set to true, the scheduler stays enabled and runs on its configured schedule.
  5. Save and deploy your changes.
The scheduler will apply your configuration changes during the next scheduled run.
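The following Python sketch shows how the compaction settings from step 2 resolve against the documented defaults (threshold 0.8, interval 168 hours) when a key is left out of the tenant configuration. It assumes the configuration is available as parsed JSON and that an omitted key falls back to its default; the helper name `effective_compaction_settings` is hypothetical and is not part of any Reltio API.

```python
import json

# Documented defaults for the GBQ compaction scheduler.
DEFAULT_COMPACTION_THRESHOLD = 0.8
DEFAULT_HOURS_TO_COMPACTION = 168  # 7 days


def effective_compaction_settings(tenant_config: dict) -> dict:
    """Return compaction settings with the documented defaults applied.

    Hypothetical helper: reads dataPipelineConfig.bigqueryAdapterConfig and
    assumes any key that is not set falls back to its default value.
    """
    adapter = (
        tenant_config.get("dataPipelineConfig", {})
        .get("bigqueryAdapterConfig", {})
    )
    return {
        "compactionThreshold": adapter.get(
            "compactionThreshold", DEFAULT_COMPACTION_THRESHOLD
        ),
        "hoursToCompaction": adapter.get(
            "hoursToCompaction", DEFAULT_HOURS_TO_COMPACTION
        ),
    }


if __name__ == "__main__":
    # Tenant configuration that overrides only the interval (3 days).
    config = json.loads(
        '{"dataPipelineConfig": {"bigqueryAdapterConfig": {"hoursToCompaction": 72}}}'
    )
    print(effective_compaction_settings(config))
    # {'compactionThreshold': 0.8, 'hoursToCompaction': 72}
```

Keeping both defaults in one place makes it easy to see which values take effect when the JSON sets only one of the two parameters.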
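To check what a six-field cron expression like the one in step 3 actually schedules, the following Python sketch lists the times it fires within one day. It assumes the scheduler reads Spring-style six-field expressions with seconds first (second, minute, hour, day of month, month, day of week); the steps above do not name the cron library. The helpers `matches` and `fire_times_on` are hypothetical and support only `*` and single numbers, not ranges, lists, steps, or names.

```python
from datetime import date, datetime, timedelta
from typing import List

# Assumed field order for six-field, Spring-style cron expressions.
FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def _field_matches(spec: str, value: int, is_day_of_week: bool = False) -> bool:
    """Match one cron field; this sketch supports only '*' and one integer."""
    if spec == "*":
        return True
    if not spec.isdigit():
        raise ValueError(f"field not supported by this sketch: {spec!r}")
    wanted = int(spec)
    if is_day_of_week and wanted == 7:
        wanted = 0  # 0 and 7 both mean Sunday
    return wanted == value


def matches(expression: str, moment: datetime) -> bool:
    """Return True if the expression fires at the given second.

    Simplification: day of month and day of week are combined with AND.
    """
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected six fields: " + ", ".join(FIELDS))
    second, minute, hour, day_of_month, month, day_of_week = parts
    # Python: Monday=0 ... Sunday=6; cron: Sunday=0, Monday=1 ... Saturday=6.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        _field_matches(second, moment.second)
        and _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(day_of_month, moment.day)
        and _field_matches(month, moment.month)
        and _field_matches(day_of_week, cron_weekday, is_day_of_week=True)
    )


def fire_times_on(expression: str, day: date) -> List[datetime]:
    """List every second of the given day at which the expression fires.

    Times are naive; they are in whatever time zone the scheduler uses.
    """
    start = datetime(day.year, day.month, day.day)
    times = []
    for offset in range(24 * 60 * 60):
        moment = start + timedelta(seconds=offset)
        if matches(expression, moment):
            times.append(moment)
    return times


if __name__ == "__main__":
    day = date(2024, 1, 15)
    daily = fire_times_on("0 0 1 * * *", day)
    hourly = fire_times_on("0 1 * * * *", day)
    print(len(daily), daily[0].time())    # 1 01:00:00
    print(len(hourly), hourly[0].time())  # 24 00:01:00
```

With seconds in the first position, `0 0 1 * * *` fires once a day at 01:00:00, while `0 1 * * * *` fires at one minute past every hour, 24 times a day. Checking an expression this way before deploying helps catch a field shifted by one position.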
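The enablement rule in step 4 (a missing property or `true` keeps the scheduler running, `false` turns it off) can be expressed as a small predicate. This Python sketch is an illustration only: `parse_properties` handles a simplified subset of the key=value properties format, the case-insensitive comparison is an assumption, and the steps above do not say how values other than `true` and `false` are treated, so this sketch leaves the scheduler enabled for any value other than `false`. Both helper names are hypothetical.

```python
from typing import Dict, Mapping

ENABLED_PROPERTY = "com.reltio.datapipeline.gbq.compaction.enabled"


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines (a subset of the .properties format).

    Blank lines and lines starting with '#' or '!' are skipped; escapes,
    ':' separators, and line continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def compaction_enabled(props: Mapping[str, str]) -> bool:
    """Return True unless the enablement property is explicitly false.

    Assumption: the value is compared case-insensitively, and any value
    other than "false" leaves the scheduler enabled.
    """
    value = props.get(ENABLED_PROPERTY)
    if value is None:
        return True  # missing property: the scheduler runs as usual
    return value.strip().lower() != "false"


if __name__ == "__main__":
    env = parse_properties(
        "# Disable GBQ compaction in this environment\n"
        "com.reltio.datapipeline.gbq.compaction.enabled=false\n"
    )
    print(compaction_enabled(env))  # False
    print(compaction_enabled({}))   # True
```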