Prompt samples for Profiler
Learn how to interact with the Data Profiler agent using effective prompts.
What is it?
The Profiler agent is a pre-ingestion data quality assistant for Reltio. It automatically scans and validates incoming source data from cloud file sources such as AWS S3, Google Cloud Storage, Azure Blob Storage, and SFTP, then aligns it with your Reltio tenant schema before ingestion.
This agent computes metrics such as completeness, uniqueness, and validity, identifies mismatches, missing fields, and pattern deviations, and uses tenant metadata to map source columns to entity attributes. By validating data against your schema and preparing load-ready mappings for Reltio Data Loader, it helps prevent ingestion failures, reduces rework, and accelerates data onboarding across clouds. The agent runs within the AgentFlow framework in your Reltio environment and is detection-only: it analyzes and reports data quality issues but does not modify source data.
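The three metrics named above can be sketched in a few lines. This is an illustrative approximation, not Profiler's actual implementation; the email regex and the treatment of empty values are assumptions.

```python
import re

def column_metrics(values, pattern=None):
    """Compute completeness, uniqueness, and validity for one column."""
    total = len(values)
    non_empty = [v for v in values if v not in (None, "")]
    # Completeness: share of rows that have a value at all.
    completeness = len(non_empty) / total if total else 0.0
    # Uniqueness: share of distinct values among the populated rows.
    uniqueness = len(set(non_empty)) / len(non_empty) if non_empty else 0.0
    # Validity: share of populated rows matching the validation pattern.
    if pattern:
        valid = [v for v in non_empty if re.fullmatch(pattern, v)]
        validity = len(valid) / len(non_empty) if non_empty else 0.0
    else:
        validity = 1.0  # no validation rule configured for this column
    return {"completeness": completeness, "uniqueness": uniqueness, "validity": validity}

emails = ["a@x.com", "b@x.com", "", "not-an-email", "a@x.com"]
print(column_metrics(emails, pattern=r"[^@\s]+@[^@\s]+\.[^@\s]+"))
# → {'completeness': 0.8, 'uniqueness': 0.75, 'validity': 0.75}
```

Profiler reports these metrics per column; mismatches and pattern deviations surface as validity scores below 1.0.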
For more information, see Profiler.
Start a data quality check
✅ Prompt: Check data quality for "credentials": {"role": "arn:aws:iam::<account-id>:role/reltio.client.<customer-role-suffix>", "externalId": "<customer defined value at the role>", "region": "us-east-1"} s3://<your-bucket-name>/<file-name>.csv
Why it works: The phrase Check data quality matches one of the Profiler Agent’s core verbs (profile, validate, check) and triggers the Profile stage in the Connect → Profile → Map & Validate → Load & Monitor lifecycle. Including both credentials and the explicit S3 file path provides all required metadata for the agent’s connector logic.
✅ Prompt: I need to analyze a customer file in our S3 bucket called dataloader-test. The file is people-1000.csv. Check its data quality before we load it.
Why it works: Even in more conversational language, the prompt still provides the bucket name, file name, and intent (quality check before load). Profiler can resolve or construct the full S3 URI and start a profiling workspace.
✅ Prompt: We are onboarding a new vendor data feed from Azure. The file is at https://mystorageaccount.blob.core.windows.net/container/vendor-data.csv. Validate its quality before we load it into Reltio.
Why it works: This combines business context (new vendor feed), a concrete Azure Blob URL, and a clear pre-ingestion validation goal. Profiler treats this as a new workspace: connect, profile, then prepare mapping and load options.
⚠️ Prompt: Check the quality of my file.
What did not work: Profiler cannot start a quality check without knowing where the file is. It needs a file path such as s3://bucket/path/file.csv (or equivalent for Azure, GCP, or SFTP) to access and analyze it.
Provide credentials in a structured way
The Connect step in the Profiler Agent lifecycle depends on valid cloud credentials. Whether you use AWS, Azure, GCP, or SFTP, provide credentials in a structured, key–value format so the agent can authenticate and start a profiling workspace without guesswork.
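The structured format above can be pictured as a simple key-value validation step. This sketch uses the key names from the AWS examples in this article ("role", "externalId", "region"); the real agent's parsing logic may differ.

```python
# Required fields for the AWS connector, per the examples in this article.
REQUIRED_KEYS = {"role", "externalId", "region"}

def parse_credentials(raw: dict) -> dict:
    """Validate a structured credential payload before attempting a connection."""
    missing = REQUIRED_KEYS - raw.keys()
    if missing:
        raise ValueError(f"Missing credential fields: {sorted(missing)}")
    if not raw["role"].startswith("arn:aws:iam::"):
        raise ValueError("role must be an IAM role ARN")
    # Only pass the expected keys through to the connector.
    return {k: raw[k] for k in REQUIRED_KEYS}

creds = parse_credentials({
    "role": "arn:aws:iam::123456789012:role/reltio.client.example",
    "externalId": "dataquality",
    "region": "us-east-1",
})
print(sorted(creds))
# → ['externalId', 'region', 'role']
```

Providing all three fields up front lets the agent skip the follow-up questions it would otherwise have to ask.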
✅ Prompt: Here are my AWS credentials: Role ARN: arn:aws:iam::<account-id>:role/<role-name>, External ID: dataquality, Region: us-east-1.
Why it works: The credentials are complete, organized, and use the expected key names. The agent can map these values directly to its connector parameters and attempt a secure connection.
✅ Prompt:
Role: arn:aws:iam::<account-id>:role/reltio.client.<role-name>,
External ID: <external-id>,
Region: <aws-region>.
Use these credentials to connect to s3://<your-bucket-name>/<file-name>.csv and profile the file.
Why it works: A simple key–value list plus a concrete bucket and file name is still easy for the agent to parse. It can use the credentials to authenticate and then start profiling the specified file.
⚠️ Prompt: I have AWS access.
What did not work: This does not provide role, external ID, or region. Profiler must ask follow-up questions before it can read the file or start a workspace.
Review schema and validation rules
✅ Prompt: Yes, the schema looks correct. Please proceed with the analysis.
Why it works: After Profiler previews the file and proposes column types, this gives a clear approval signal to move from structure detection into full quality checks.
✅ Prompt: Change column 3 from STRING to DATE with format yyyy-MM-dd, and add email pattern validation to column 5.
Why it works: The prompt references specific columns and desired changes: a concrete type change plus pattern validation. Profiler can adjust its validation rules before running the quality check so that final metrics reflect the updated expectations.
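The two rule changes in that prompt can be sketched as validator functions. Note that the Java-style format yyyy-MM-dd corresponds to Python's %Y-%m-%d; the email regex here is an illustrative assumption, not Profiler's actual pattern.

```python
import re
from datetime import datetime

# Assumed email pattern for illustration.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Check a value against a DATE rule with format yyyy-MM-dd."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_valid_email(value: str) -> bool:
    """Check a value against the added email pattern rule."""
    return EMAIL_RE.fullmatch(value) is not None

print(is_valid_date("2024-02-29"), is_valid_date("2024-13-01"))
# → True False
print(is_valid_email("user@example.com"), is_valid_email("user@"))
# → True False
```

Updating rules like these before the quality check is what makes the final validity metrics reflect your expectations rather than the defaults.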
⚠️ Prompt: Some of those types do not look right.
What did not work: The agent knows there is a concern but does not know which columns or what types you expect. It must ask for more detail before it can update schema or validation rules.
Explore invalid values and quality metrics
After Profiler completes a quality analysis, you can query the cached profiling results to understand column quality and invalid values.
✅ Prompt: Show me invalid values in the Email column.
Why it works: The agent has already built a structured metadata table with column indices and quality metrics in the first profiling run. It does not re-scan the file but queries cached profiling results. Beyond listing invalid values, it applies pre-built email regex patterns and groups failures such as missing @, truncated domains, typos, and corrupted values.
✅ Prompt: Show me invalid values for both the Phone and Email columns.
Why it works: The prompt explicitly lists multiple columns, so the agent can retrieve invalid values for both columns from the same profiling workspace instead of guessing which fields you care about.
✅ Prompt: Can you explain why the Phone column has low quality?
Why it works: The agent retrieves the Phone column’s quality score and decomposes it using its metric model (for example, fill rate, uniqueness, and validation rate). Because it already knows these components from the earlier quality summary, it can explain the numeric result and the formula behind it.
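A composite score like the one described can be sketched as a weighted sum of its components. The weights and formula here are assumptions for illustration; Profiler's actual metric model is not published in this article.

```python
def quality_score(fill_rate, uniqueness, validation_rate,
                  weights=(0.4, 0.2, 0.4)):
    """Combine component metrics into one column quality score (assumed weights)."""
    w_fill, w_uniq, w_valid = weights
    return round(w_fill * fill_rate + w_uniq * uniqueness + w_valid * validation_rate, 3)

# A Phone column with perfect fill but many pattern failures: the low
# validation rate drags the score down despite 100% completeness.
print(quality_score(fill_rate=1.0, uniqueness=0.9, validation_rate=0.35))
# → 0.72
```

Decomposing the score this way is what lets the agent explain *why* a column scored low, not just report the number.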
✅ Prompt: Show me the first 100 invalid phone numbers.
Why it works: The agent already has profiling results with invalid-value counts per column. It can query the cached dataset for the Phone column and return a sample of the first 100 invalid values instead of rescanning the file.
✅ Prompt: What are the invalid values in column 4?
Why it works: Column indices are part of the profiling metadata. When you reference column 4, Profiler can map that index to the correct column in the cached results and return the invalid values for that position.
✅ Prompt: View invalid Email column values to understand those entries.
Why it works: The agent uses the schema and invalid-value cache to pull only the corrupted Email entries, not the full dataset. It can group similar patterns (such as character insertions or substitutions) to infer likely root causes such as keyboard mashing, encoding problems, or ETL errors.
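The pattern grouping described above can be sketched as a small classifier over invalid values. The failure categories are illustrative assumptions based on the groupings mentioned in this article.

```python
def classify_invalid_email(value: str) -> str:
    """Bucket an invalid email value into a likely failure category."""
    if "@" not in value:
        return "missing @"
    local, _, domain = value.partition("@")
    if not domain or "." not in domain:
        return "truncated domain"
    if not local:
        return "missing local part"
    return "other"

samples = ["jdoe.example.com", "jdoe@", "jdoe@example", "@example.com"]
print([classify_invalid_email(s) for s in samples])
# → ['missing @', 'truncated domain', 'truncated domain', 'missing local part']
```

Grouping failures this way is what lets the agent suggest root causes (encoding problems, ETL truncation) instead of just listing bad rows.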
⚠️ Prompt: Show me invalid values for the Email column.
What did not work: If no quality analysis has been performed yet, there is no profiling workspace or cached invalid values. The get_column_invalid_values tool cannot function standalone; it only works after a successful quality analysis produces results.
Adjust validation rules and re-profile
✅ Prompt: Re-run quality check with a more flexible phone pattern, or consider adjusting Phone validation to accept common formats with extensions and punctuation.
Why it works: This prompt triggers a schema update for the Phone validation rule. The agent runs a new profiling job in a new workspace and retrieves the updated results. It reruns the scoring function and shows the percentage improvement in the Phone column's quality score, demonstrating how more flexible validation rules directly affect data quality metrics.
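The effect of loosening a validation rule can be sketched by comparing validation rates under a strict and a flexible pattern. Both regexes here are illustrative, not Profiler's built-in patterns.

```python
import re

# Strict rule: exactly ten digits, nothing else.
STRICT = re.compile(r"\d{10}")
# Flexible rule: allows leading +, parentheses, spaces, dashes, dots,
# and an optional "x<digits>" extension.
FLEXIBLE = re.compile(r"[+\d(][\d\s().-]*(?:x\d+)?")

def validation_rate(values, pattern):
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)

phones = ["5551234567", "(555) 123-4567", "555-123-4567 x42", "+1 555 123 4567"]
strict_rate = validation_rate(phones, STRICT)
flexible_rate = validation_rate(phones, FLEXIBLE)
print(strict_rate, flexible_rate)
# → 0.25 1.0
```

Comparing the two rates in the re-profiled workspace is what the percentage-improvement report boils down to: the data did not change, only the rule did.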
⚠️ Prompt: Use pattern matching to fix the corrupted values in Email column.
What did not work: Profiler is designed purely for analysis and reporting, not data modification. Its tools (such as reading file content, starting workspace jobs, and retrieving invalid values) only read or analyze datasets. It does not fix or write corrected values back to the source.
Plan and run data loads
After profiling and mapping verification, Profiler can help you set up and execute data loads into Reltio while reusing tenant metadata and mapping context.
✅ Prompt: Proceed with Reltio data load with transformation rules.
Why it works: This prompt falls within the Profiler allowed operations: setting up a data load configuration, retrieving tenant metadata (entity types, relation types, sources), and preparing mapping context. Once you confirm which Reltio object type the data represents, the agent fetches the right schema and generates a mapping plan and transformation logic.
⚠️ Prompt: Load my data into Reltio.
What did not work: The agent cannot initiate a load without knowing the data source location and the type of object it represents. It needs at least a file path and a target entity or object type to create a valid data load job.
✅ Prompt: Load data without checking quality first: "credentials": {"role": "arn:aws:iam::<account-id>:role/reltio.client.<role-name>", "externalId": "reltio-profiler", "region": "us-east-1"} s3://<your-bucket-name>/<file-name>.csv
Why it works: The prompt includes both key prerequisites: a valid AWS S3 file path and IAM role credentials, so the agent can securely access the source file without requesting additional details. The workflow follows Reltio’s full ingestion sequence, where the agent verifies file access and previews columns, retrieves tenant metadata and entity types, maps CSV columns to Reltio attributes, and creates and executes a valid data load job.
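The column-to-attribute mapping step described above can be sketched as a normalized-name match between CSV headers and tenant attributes. The attribute names here are illustrative, not a real Reltio tenant schema, and the real agent's matching is likely more sophisticated.

```python
def normalize(name: str) -> str:
    """Lowercase and strip non-alphanumerics so 'First Name' matches 'FirstName'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def propose_mapping(csv_columns, tenant_attributes):
    """Propose a column-to-attribute mapping; unmatched columns map to None."""
    attrs_by_key = {normalize(a): a for a in tenant_attributes}
    return {col: attrs_by_key.get(normalize(col)) for col in csv_columns}

columns = ["First Name", "last_name", "E-mail", "Zip"]
attributes = ["FirstName", "LastName", "Email", "PostalCode"]
print(propose_mapping(columns, attributes))
# → {'First Name': 'FirstName', 'last_name': 'LastName', 'E-mail': 'Email', 'Zip': None}
```

Columns that come back unmapped (like Zip here) are exactly the ones the agent asks you to confirm before it builds the load job.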
Advanced prompting patterns
Advanced prompts typically span the full Connect → Profile → Map & Validate → Load & Monitor lifecycle. Use conditional instructions, batch queries, and references to earlier reports to steer the agent without restarting from scratch.
✅ Prompt: Show me invalid values for these columns: Email, Phone, PostalCode.
Why it works: Grouping multiple columns into one request reduces back-and-forth. The agent can pull invalid values for each named column from the same profiling workspace.
✅ Prompt: First, check the quality. Then, if quality is acceptable, show me invalid values for the top three worst-performing columns. Finally, if the remaining issues are manageable, proceed with the load.
Why it works: You define a clear sequence: profile, triage, then load. Profiler can execute the steps in order, ranking columns by quality metrics to identify the worst performers. The agent still does not fix values; you use its findings to decide when a load is acceptable.
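The triage step in that sequence, ranking columns by quality and taking the worst performers, can be sketched in a couple of lines. The scores below are sample data, not real profiling output.

```python
def worst_columns(scores: dict, n: int = 3):
    """Return the n column names with the lowest quality scores."""
    return [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1])[:n]]

scores = {"Email": 0.62, "Phone": 0.48, "PostalCode": 0.71,
          "FirstName": 0.98, "LastName": 0.97}
print(worst_columns(scores))
# → ['Phone', 'Email', 'PostalCode']
```

Naming the triage criterion ("top three worst-performing columns") in the prompt gives the agent an unambiguous ranking rule to apply.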
✅ Prompt: Based on the quality report you just showed, focus on the Email column, which had 45 invalid values, and show me examples.
Why it works: Referencing the previous quality report makes it clear which workspace and column to reuse. Profiler can drill down into the existing profiling results for that column instead of starting a new analysis.
Best practices
- Always provide a concrete file location (for example, s3://bucket/path/file.csv) and, when required, cloud credentials so Profiler can connect and start a profiling workspace.
- Run a quality check before asking for invalid values or detailed quality explanations. Prompts such as “Show me invalid values in the Email column” only work after a completed profiling run.
- Use exact column names (such as Email, Phone) or explicit indices when requesting invalid values or quality explanations so the agent can map requests to its metadata table.
- Treat Profiler as a pre-ingestion, detection-only agent. It analyzes and reports issues but does not correct source data; apply fixes in upstream systems or during transformation.
- When you need to refine validation rules (for example, phone number formats), ask the agent to re-run profiling with adjusted patterns and compare the quality scores to see the impact.
- For data loading, always pair your load request with a clear source location and target object type so the agent can retrieve tenant metadata and build a valid job definition.