Set up automated pipelines

Learn how to create a document pipeline so AgentFlow Unstructured can process documents from a configured source by using a published extraction template.

Prerequisites:
  • You need an active Reltio tenant with AgentFlow Unstructured enabled.
  • You need permissions to use AgentFlow Unstructured.
  • You must have already created and published the template and configured the source that the pipeline will use.

A pipeline uses a selected source and a published template to process documents. It also lets you define the following:
  • the crosswalk source, which defaults to the document source, such as AWS S3
  • whether a data change request is created before extracted data is published to Reltio
  • a schedule for pipeline runs
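
Before you open the UI, it can help to write down the values you plan to enter. The following Python sketch is only a planning aid: the class and field names are hypothetical and do not correspond to any Reltio API, and the source and template names are made up. It shows how the settings described above relate to each other, including the crosswalk default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineSettings:
    """Hypothetical planning record; the field names are not a Reltio API."""

    source_type: str                        # for example "AWS S3" or "GCS"
    source_name: str                        # a source that is already configured
    template: str                           # a published extraction template
    folder_path: str = ""                   # empty: documents sit directly in the bucket
    crosswalk_source: Optional[str] = None  # None: fall back to the document source
    create_dcr: bool = False                # create a data change request before publishing
    cron_expression: Optional[str] = None   # None: the pipeline is not scheduled

    def effective_crosswalk(self) -> str:
        """Return the crosswalk source, defaulting to the document source type."""
        return self.crosswalk_source or self.source_type

    def is_scheduled(self) -> bool:
        """A pipeline counts as scheduled only when a cron expression is set."""
        return bool(self.cron_expression and self.cron_expression.strip())


settings = PipelineSettings(
    source_type="AWS S3",
    source_name="invoices-bucket",   # hypothetical source name
    template="invoice-template-v1",  # hypothetical template name
    folder_path="incoming/2024",
)
print(settings.effective_crosswalk())
print(settings.is_scheduled())
```

Because no crosswalk or cron expression is set in this example, the crosswalk falls back to the source type and the pipeline is treated as unscheduled.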

To set up a pipeline:
  1. In AgentFlow, select AgentFlow Unstructured.
  2. In AgentFlow Unstructured, select Document AI.
  3. Go to the Setup automated pipelines section and select Let's get started. The Create New Pipeline page is displayed.
  4. In Select Source, open the dropdown for the source type you want to use, such as AWS S3 or Google Cloud Storage (GCS).
  5. Search for and select the source from the list of configured sources.
  6. In the Folder path to be included field, enter the folder path for the pipeline. You can leave this field empty if the documents are stored directly in the bucket.
  7. Select Continue to open the Select template page.
    1. In the Template field, select the published template you want to use for the pipeline.
  8. Select Continue to open the Save & Schedule page.
    1. In the Pipeline Name field, enter a name that identifies the pipeline.
    2. In the Description field, enter a short summary of what the pipeline does.
    3. In the Crosswalk field, select the Reltio crosswalk source to use when extracted data is pushed to Reltio. By default, this value is set to the document source, such as AWS S3 or Google Cloud Storage. Change it only if the extracted data must be pushed under a different crosswalk.
    4. In Do you want to make this a DCR?, select Yes or No to indicate whether the pipeline should create a Data Change Request (DCR) during processing.
    5. In Do you want to schedule this workflow?, select Yes to run the pipeline on a schedule, or select No to leave it unscheduled.
      • If you select Yes, enter the schedule in the Cron expression field.
  9. Select Save Pipeline.
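
If you schedule the pipeline, you may want to sanity-check the cron expression before pasting it into the Cron expression field. This topic does not state which cron dialect the field accepts, so the Python sketch below assumes the standard five-field Unix format (minute, hour, day of month, month, day of week) with numeric values only. It would reject six- or seven-field Quartz-style expressions and month or weekday names, which the product may or may not accept. The function name `looks_like_cron` is illustrative.

```python
import re

# (name, min, max) for the five fields of a standard Unix cron expression.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),
]

# One list item: "*", "n", or "a-b", optionally followed by "/step".
ITEM = re.compile(r"^(\*|\d+(?:-(\d+))?)(?:/(\d+))?$")


def _item_ok(item: str, low: int, high: int) -> bool:
    """Check a single comma-separated item against the field's numeric range."""
    match = ITEM.match(item)
    if not match:
        return False
    base, range_end, step = match.groups()
    if step is not None and int(step) == 0:
        return False  # a step of zero never advances
    if base == "*":
        return True
    start = int(base.split("-")[0])
    end = int(range_end) if range_end is not None else start
    return low <= start <= end <= high


def looks_like_cron(expression: str) -> bool:
    """Return True if the text is a plausible five-field cron expression."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        return False
    return all(
        _item_ok(item, low, high)
        for part, (_, low, high) in zip(parts, FIELDS)
        for item in part.split(",")
    )


print(looks_like_cron("0 2 * * *"))          # daily at 02:00
print(looks_like_cron("*/15 9-17 * * 1-5"))  # every 15 minutes, 09:00-17:59, Monday to Friday
print(looks_like_cron("0 25 * * *"))         # hour value is out of range
```

This check covers syntax only. It does not confirm the time zone the scheduler uses or how the product interprets the expression.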

Result

The pipeline is saved and available for batch processing of documents. For more information, see Review pipeline executions.