Configure document sources for AgentFlow Unstructured

Configure a document source so that AgentFlow Unstructured can access the files you want to process from a supported storage location.

Prerequisites:
  • You must have access to AgentFlow Unstructured.

  • A published extraction template must be available.

  • You must have the required authentication details for the source you want to configure.

  • For AWS S3, you must have the Role ARN provided by Reltio, the bucket name, and the bucket region. Your AWS administrator must also update the bucket policy to allow that role to access the selected bucket.

  • For Google Cloud Storage, you must have the service account JSON credentials, bucket name, and location details.
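
Before you paste Google Cloud Storage credentials into the source form, it can help to confirm that the text is a complete service account key. The following Python sketch is only an illustration and is not part of AgentFlow Unstructured. The function name check_service_account_json is hypothetical, and the field list is an assumption based on the fields that a standard Google Cloud service account key file normally contains.

```python
import json

# Assumption: fields normally present in a Google Cloud service account key file.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
)


def check_service_account_json(raw):
    """Return a list of problems found in the credentials text (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + exc.msg]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    # Report every required field that is absent or empty.
    problems = ["missing field: " + name
                for name in REQUIRED_KEY_FIELDS if not data.get(name)]

    # A key of another credential type (for example, a user credential) will not work.
    if data.get("type") and data["type"] != "service_account":
        problems.append('field "type" should be "service_account"')
    return problems
```

An empty list means the text looks like a service account key. It does not prove that the key is still active or that the account can read the bucket, so Test Connection in the procedure remains the real check.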

To configure a document source
  1. From AgentFlow, select AgentFlow Unstructured.
  2. In AgentFlow Unstructured, select Document AI.
  3. Go to Configure sources and select Let's get started. The Sources page is displayed.
  4. Select + Configure Source to add a new source.
    1. Select a source type: AWS S3 or Google Cloud Storage.
      If you plan to configure an AWS S3 source, update the bucket policy before you continue.
      In the Amazon S3 console, go to Buckets > your bucket > Permissions > Bucket policy. Update the bucket policy to allow the role provided by Reltio to access the bucket. In the following policy, replace <role-arn> with the Role ARN provided by Reltio and <bucket-name> with your S3 bucket name.
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AllowAFUS3Role",
            "Effect": "Allow",
            "Principal": {
              "AWS": [
                "<role-arn>"
              ]
            },
            "Action": [
              "s3:GetObject",
              "s3:PutObject",
              "s3:ListBucket",
              "s3:DeleteObject"
            ],
            "Resource": [
              "arn:aws:s3:::<bucket-name>",
              "arn:aws:s3:::<bucket-name>/*"
            ]
          }
        ]
      }
    2. In the Source Configuration Name field, enter a unique name for the source.
    3. For the AWS S3 source type, confirm the Authentication Type and enter the Role ARN, Bucket Region, and Bucket Name.
      Note: The IAM role you enter in Role ARN must have access to the selected AWS S3 bucket.
    4. For the Google Cloud Storage source type, paste the service account JSON credentials into Service Account JSON, and then enter the Bucket Name and Location.
    5. Select Test Connection to verify that the source is accessible.
    6. Select Save.
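
If you configure several AWS S3 buckets, filling in the bucket policy placeholders by hand is easy to get wrong. The following Python sketch shows one way to generate the policy from the procedure with the two values substituted. The function name build_bucket_policy and the example ARN and bucket name are hypothetical, and the ARN check is a loose sanity test rather than full validation.

```python
import json

# Actions copied from the bucket policy shown in the procedure.
ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject"]


def build_bucket_policy(role_arn, bucket_name):
    """Return the bucket policy as JSON text for the given role and bucket."""
    # Loose sanity check only: an IAM role ARN starts with "arn:" and contains ":role/".
    if not role_arn.startswith("arn:") or ":role/" not in role_arn:
        raise ValueError("expected an IAM role ARN, got: " + role_arn)
    if not bucket_name or "/" in bucket_name:
        raise ValueError("expected a bucket name without a path")

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAFUS3Role",
                "Effect": "Allow",
                "Principal": {"AWS": [role_arn]},
                "Action": ACTIONS,
                # The bucket ARN covers s3:ListBucket; the /* ARN covers the object actions.
                "Resource": [
                    "arn:aws:s3:::" + bucket_name,
                    "arn:aws:s3:::" + bucket_name + "/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical values for illustration; use the Role ARN provided by Reltio.
    print(build_bucket_policy("arn:aws:iam::123456789012:role/example-role", "example-bucket"))
```

The printed JSON can be pasted into the Bucket policy editor. If the bucket already has a policy, add the statement to the existing Statement list instead of replacing the whole policy.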

Result

The document source is configured. You can now use the source when you create or configure a document pipeline. For more information, see Set up automated pipelines.