Understanding workflows in the Unstructured Data Studio

Learn how the Workflows tab helps you configure, manage, and execute document processing jobs in Unstructured Data Studio.

The Workflows tab is where you define how unstructured documents are ingested, processed, and routed to a target system. It provides a visual designer to configure processing logic, a file selection area to preview available documents, and a list of all saved workflows.

You can think of each workflow as a reusable automation pipeline that connects to a source (such as Google Drive or S3), applies a prompt or configuration, and delivers the results to a target system (for example, Reltio) as profiles. Here's a screenshot of all the available workflows:

You can view a list of saved workflows in the above screenshot. You can load an existing workflow or create a new one.

Configure workflows and workflow settings

The Workflow page includes a visual builder that lets you define how documents move through the processing pipeline in the Unstructured Data Studio. It includes three key components — Source, Processing, and Destination — that you can configure individually. Each step controls where the documents come from, how they're analyzed using prompts, and where the extracted information is sent.

Each time you execute a workflow, the system creates one job for every document processed. If multiple documents match the criteria, the same workflow is executed once per document, and each execution is tracked using its own job ID.
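The one-job-per-document behavior can be pictured with a small sketch. This is an illustration only, not the Studio's implementation: the `Job` class, the `execute_workflow` function, and the use of random UUIDs as job IDs are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical record of one workflow execution for one document."""
    workflow_name: str
    document: str
    # Assumption: job IDs are modelled here as random UUID hex strings.
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def execute_workflow(workflow_name, matching_documents):
    """Create one job per matching document, each with its own job ID."""
    return [Job(workflow_name, doc) for doc in matching_documents]


jobs = execute_workflow("invoice-extraction", ["a.pdf", "b.pdf", "c.docx"])
for job in jobs:
    print(job.job_id, job.document)
```

Three matching documents produce three jobs, so the outcome of each document can be tracked separately by its job ID.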

Note: Each workflow can be mapped to only one Reltio entity type. If a single document contains data for multiple entity types (e.g., both Person and Organization), you must define a separate workflow for each entity type.
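As a rough sketch of what this constraint means in practice, the snippet below plans one workflow per entity type found in a document. The `WorkflowDefinition` class, the `plan_workflows` function, and the naming scheme are hypothetical and exist only for this example; in the Studio you create each workflow through the visual builder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowDefinition:
    """Hypothetical workflow summary: exactly one entity type per workflow."""
    name: str
    entity_type: str


def plan_workflows(base_name, entity_types):
    """Return one workflow definition per distinct entity type, in order."""
    unique_types = list(dict.fromkeys(entity_types))  # de-duplicate, keep order
    return [
        WorkflowDefinition(name=f"{base_name}-{t.lower()}", entity_type=t)
        for t in unique_types
    ]


plans = plan_workflows("contracts", ["Person", "Organization"])
for plan in plans:
    print(plan.name, "->", plan.entity_type)
```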

To configure a workflow:

  1. Go to the Workflows tab and select the Create New Workflow option.

  2. In the Workflow settings section, enter a Name for your workflow.

  3. In the Description field, add a description to clarify the purpose of the workflow (optional).

  4. For the Do you want to schedule this workflow? option, select Yes to run the workflow at scheduled intervals, or No to execute it manually.

    1. In the Cron Expression field, choose a schedule option, such as Daily or Weekly. Alternatively, select the Create cron option to define a custom cron expression.

    2. In the File Mask field, specify a file name pattern to include only matching files during processing.

    3. In the File Extensions field, enter the file extensions you want to include in this workflow (e.g., .pdf, .docx). This controls which file types are picked from the source.

  5. In the Workflow Map section, select Configured Source.

    1. From the Document Source dropdown, select your source system, for example, AWS or Google Drive.

    2. Click Done to save the source configuration.

  6. In the Workflow Map, select the middle box to configure processing logic.

    1. In the LLM Provider dropdown, select a model provider (e.g., Bedrock).

    2. Under Choose a prompt, select a Category, then pick a specific prompt from the list or search for one. For more information about prompts, see the topic Prompt library in the Unstructured Data Studio.

    3. Optionally, add Processing Instructions to further guide the model.

    4. Click Done to confirm your processing configuration.

  7. In the Workflow Map, click Destination.

    1. In the Destination dropdown, Reltio is selected by default because it is currently the only supported destination.

    2. In the Reltio configuration dropdown, select your preconfigured tenant.

    3. Select a Crosswalk Type to define how the entity will be uniquely identified, for example, Reltio or one of the external sources configured in the Reltio tenant.

    4. In the Entity type dropdown, select the required entity type.

    5. Select an attribute in the optional Source path attribute field to store a reference to the source document.

    6. Based on the entity type selected, attributes are populated in the Simple attributes and Nested attributes fields. Select or deselect attributes in these fields to control which ones are populated from the extracted content.

    7. Select the Enable Reference attributes checkbox if your entity includes this type of attribute, and then select which reference attributes to populate in the Reference attributes field.

    8. Enter Additional instructions, if required, for example, to override the default crosswalk key.

    9. Select the Create as DCR checkbox if this workflow requires data stewardship approval before the data is sent to Reltio.

    10. Select the LLM Provider for post-processing, if applicable.

    11. Select Save to confirm the output destination.

The workflow is now fully configured with a source, processing logic, and destination. You can save the workflow and test it.
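To make the scheduling and file-selection settings more concrete, here is a minimal sketch of how a file mask and an extension list could narrow down the files picked from a source. It assumes shell-style wildcards for the file mask and a case-insensitive extension match; the Studio's actual matching rules may differ. The `select_files` function and the sample file names are hypothetical. The schedule strings use standard five-field cron syntax, which is also an assumption, since the Studio may expect a different cron dialect.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative schedules in standard five-field cron syntax
# (minute hour day-of-month month day-of-week). Assumption: the
# Studio may use a different cron dialect.
SCHEDULE_EXAMPLES = {
    "daily at 02:00": "0 2 * * *",
    "weekly on Monday at 02:00": "0 2 * * 1",
}


def select_files(file_names, file_mask="*", extensions=(".pdf", ".docx")):
    """Keep files whose extension is allowed and whose name matches the mask."""
    allowed = {ext.lower() for ext in extensions}
    selected = []
    for name in file_names:
        path = PurePosixPath(name)
        # Extension check is case-insensitive; the mask is applied to the
        # file name only, not to any folder prefix.
        if path.suffix.lower() in allowed and fnmatchcase(path.name, file_mask):
            selected.append(name)
    return selected


available = [
    "invoice_2024.pdf",
    "invoice_notes.txt",
    "contract.docx",
    "reports/invoice_q1.PDF",
]
picked = select_files(available, file_mask="invoice_*", extensions=[".pdf", ".docx"])
print(picked)
```

In this sketch, `invoice_notes.txt` is skipped because of its extension and `contract.docx` is skipped because it does not match the mask.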

Load a saved workflow

The Workflow page lists your saved workflows. You can select and reopen any of them for further configuration or execution. The Workflow table shows each workflow's name, description, and timestamps, along with an action button to delete the workflow. Loading a workflow populates its configuration into the visual designer and workflow settings.

Before you can load a workflow, create one as described in the Configure workflows and workflow settings section.

  1. On the Workflow page, select a saved workflow to load it. The Edit Workflow page opens with the details of the workflow auto-populated.

  2. In the Workflow Map section, click the Show files option in Source.

    The View Files page displays a list of available documents from your configured source, such as Google Drive or AWS S3. For each file, you can either simulate and view results, which tests entity extraction without saving to Reltio, or process and save results, which executes the workflow and pushes the extracted data directly to your Reltio tenant.

  3. Close the View Files page.
  4. Make any other required changes on the Edit Workflow page.
  5. Select Save workflow to save the changes.
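The difference between the two file actions on the View Files page (simulating versus processing and saving) resembles a dry-run switch. The sketch below illustrates that idea only: `extract_entities` is a stand-in for the prompt-driven extraction step, `run_on_file` and the in-memory `tenant` list are hypothetical, and none of these names come from the Studio itself.

```python
def extract_entities(document_text):
    """Stand-in for prompt-driven extraction: one Person per non-empty line."""
    return [
        {"type": "Person", "name": line.strip()}
        for line in document_text.splitlines()
        if line.strip()
    ]


def run_on_file(document_text, tenant_store, simulate=True):
    """Extract entities; write them to the tenant only when not simulating."""
    entities = extract_entities(document_text)
    if not simulate:
        tenant_store.extend(entities)  # "process and save results"
    return entities                    # always returned for review


tenant = []  # hypothetical in-memory stand-in for a Reltio tenant
text = "Ada Lovelace\nAlan Turing"

preview = run_on_file(text, tenant, simulate=True)
print(len(preview), len(tenant))   # entities extracted, nothing saved

saved = run_on_file(text, tenant, simulate=False)
print(len(saved), len(tenant))     # entities extracted and saved
```

Simulating first lets you check the extraction results for a file before any data reaches the tenant.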