Profiler
Learn how the Profiler agent helps you assess source data quality before loading it into Reltio and, optionally, guides you through loading validated data using Data Loader.
The Profiler agent is a conversational assistant that runs in AgentFlow™, Reltio's agent execution environment. It evaluates structured source files for schema alignment and data quality before loading, minimizing failed loads and manual preparation.
The agent connects to structured files in remote locations, such as cloud storage or Secure File Transfer Protocol (SFTP) servers. It previews the file structure, computes column-level quality scores, and identifies issues with ranked suggestions. You review and confirm the schema and mappings before continuing. If the data meets your standards, you can use the same workflow to load it into Reltio using Data Loader.
The Profiler agent is designed for the following business and technical users who assess and prepare source data before loading it into Reltio:
- Data Steward
- Reltio Configurator
For more information about each of these user roles, see About roles.
Capabilities
- Connect to structured data files stored in remote locations, including AWS S3, Azure Blob Storage, Google Cloud Storage, and Secure File Transfer Protocol (SFTP) servers.
- Validate access credentials and confirm file availability before profiling begins.
- Preview the file to detect delimiters, headers, and column types before schema confirmation.
- Infer column-level data types such as string, number, date, and boolean.
- Compute data quality metrics per column, including completeness, uniqueness, validity, and structural consistency.
- Identify missing values, invalid formats, and pattern deviations grouped by severity.
- Cache profiling results to support follow-up queries on specific columns, values, or issues without rerunning analysis.
- Generate and present sample invalid values (up to 100 per column) to support root-cause investigation.
- Align source fields to tenant schema using similarity-based mapping and attribute metadata from the Reltio tenant configuration.
- Create and validate Data Loader mappings based on profiling output and user-confirmed schema alignment.
- Generate Data Loader jobs to ingest validated data into Reltio, with support for load monitoring and error reporting.
- Run all profiling and loading steps asynchronously with job tracking and workspace IDs for context.
- Process CSV and XLSX files only. XML, nested JSON, and streaming data are not supported.
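The column-level metrics listed above can be illustrated with a minimal sketch. This is not the agent's implementation: the function name `profile_column`, the email pattern, and the exact metric formulas are assumptions chosen for illustration. Only the cap of 100 invalid samples per column comes from the capability list.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical validity rule; the agent's real validation rules are not documented here.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Mirrors the documented cap of 100 invalid samples per column.
MAX_INVALID_SAMPLES = 100


def profile_column(values: Sequence[Optional[str]], is_valid: Callable[[str], bool]) -> dict:
    """Compute illustrative completeness, uniqueness, and validity for one column."""
    total = len(values)
    # A value counts as present when it is neither None nor blank.
    present = [v for v in values if v is not None and v.strip() != ""]
    invalid = [v for v in present if not is_valid(v)]
    return {
        # Share of rows that carry a value.
        "completeness": len(present) / total if total else 0.0,
        # Share of present values that are distinct.
        "uniqueness": len(set(present)) / len(present) if present else 0.0,
        # Share of present values that pass the validity rule.
        "validity": (len(present) - len(invalid)) / len(present) if present else 0.0,
        "invalid_samples": invalid[:MAX_INVALID_SAMPLES],
    }


emails = ["a@example.com", "b@example.com", "", None, "not-an-email", "a@example.com"]
report = profile_column(emails, lambda v: bool(EMAIL_PATTERN.match(v)))
print(report)
```

In this sample, four of six values are present, three of those four are distinct, and one fails the assumed email rule, so the sketch reports roughly 67% completeness and 75% for both uniqueness and validity.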
Inputs and outputs
| Inputs | Outputs |
|---|---|
| File Location (S3, GCS, Azure, or SFTP path) | File Preview (header row, delimiter, initial type detection) |
| Access Credentials (per cloud provider or protocol) | Quality Scores (dataset and column-level, 0–100%) |
| Schema Confirmation (approve or adjust inferred column types) | Issue Breakdown (grouped by severity with cleanup suggestions) |
| Mapping Decisions (optional column-to-attribute configuration) | Invalid Samples (up to 100 invalid values per column) |
| Tenant Selection (target tenant for Data Loader operations) | Data Loader Mapping (suggested or confirmed mappings) |
| Load Type and Mode (entities/relations, full or partial update) | Load Job Results (record counts, status, and error logs) |
| | Job Identifiers (workspace ID and load project ID) |
| | Processing Summary (data quality and load results in one view) |
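To make the File Preview output more concrete, the following sketch shows one way a preview could detect the delimiter, guess whether a header row exists, and infer initial column types. It uses Python's standard `csv.Sniffer` as a stand-in; the agent's actual detection logic is not documented here, and the `preview` and `infer_type` helpers, the candidate delimiter set, and the single `YYYY-MM-DD` date format are all assumptions.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Return 'boolean', 'number', 'date', or 'string' for a list of non-empty strings."""
    if not values:
        return "string"

    def all_parse(parse):
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if all(v.lower() in {"true", "false"} for v in values):
        return "boolean"
    if all_parse(float):
        return "number"
    # Assumption: only ISO-style dates are recognized in this sketch.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def preview(sample: str) -> dict:
    """Detect delimiter and header, then infer a type per column."""
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)  # heuristic; a user should still confirm
    rows = list(csv.reader(io.StringIO(sample), dialect))
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        header, data = [f"col_{i}" for i in range(len(rows[0]))], rows
    columns = {
        name: infer_type([r[i] for r in data if i < len(r) and r[i] != ""])
        for i, name in enumerate(header)
    }
    return {"delimiter": dialect.delimiter, "has_header": has_header, "columns": columns}


SAMPLE = (
    "id;name;signup_date;active\n"
    "1;Ada;2024-01-05;true\n"
    "2;Grace;2024-02-11;false\n"
    "3;Linus;2024-03-20;true\n"
)
result = preview(SAMPLE)
print(result)
```

Because both the delimiter and header detection are heuristics, a preview like this is only a proposal, which is consistent with the Schema Confirmation input: you approve or adjust the inferred types before anything else runs.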
How it works
- Connect: Provide the file location (S3, Azure, GCS, or SFTP) and access credentials. The agent checks access and confirms file availability.
- Preview: The agent scans the file to detect delimiters, infer column headers and types, and identify potential structural issues. You must confirm the inferred schema.
- Profile: The agent runs a profiling job to compute quality metrics, such as completeness, uniqueness, and validity, per column. It identifies anomalies, missing values, and invalid formats.
- Review: You receive a summary of quality scores, issue severity, and invalid value samples. You can ask follow-up questions (for example, “Why is email quality low?” or “Show invalid phone numbers”).
- Map and validate: If profiling results are acceptable, you can ask the agent to generate a mapping to the Reltio tenant schema. The agent uses tenant metadata to align fields.
- Load: Once the mapping is confirmed, the agent creates a job for Reltio Data Loader and monitors its execution. The final output includes load status, job ID, and any load errors.
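The Map and validate step can be pictured with a small sketch of similarity-based field alignment. The agent's actual similarity method is not described in this topic, so this example substitutes Python's standard `difflib` string matching. The `suggest_mapping` function, the 0.6 cutoff, and the attribute names are hypothetical.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase a name and drop separators so 'first_name' and 'FirstName' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_mapping(source_columns, tenant_attributes, cutoff=0.6):
    """Suggest the closest tenant attribute per source column, or None when nothing is close."""
    normalized = {normalize(attr): attr for attr in tenant_attributes}
    mapping = {}
    for column in source_columns:
        # get_close_matches returns candidates at or above the cutoff, best match first.
        matches = difflib.get_close_matches(
            normalize(column), list(normalized), n=1, cutoff=cutoff
        )
        mapping[column] = normalized[matches[0]] if matches else None
    return mapping


# Hypothetical source columns and tenant attribute names.
suggestions = suggest_mapping(
    ["first_name", "LastName", "email_addr", "zzz"],
    ["FirstName", "LastName", "Email", "Phone"],
)
print(suggestions)
```

Columns without a sufficiently close match map to `None` rather than to a weak guess, which leaves the decision to the person confirming the mapping before the Load step.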
Watch how it works
Watch how the Profiler Agent helps you evaluate the quality of a source file and load high-quality data into Reltio. This short demo walks through file preview, schema confirmation, profiling results, and optional loading using Data Loader. You'll see how AgentFlow guides the process step-by-step through a conversational interface.
Profiler prompt samples: what works and what to avoid
To get comfortable using the Profiler agent, review concrete prompt examples before relying on its profiling results or load preparation in production. The linked topic shows how to specify source files and credentials, and how to run follow-up queries with a valid workspace context.
For more information, see Prompt samples for Data Profiler Agent.
Safeguards, permissions, and governance
- Credentials are provided by the user and used only for the requested operation.
- The agent does not generate or assume credentials.
- Schema confirmation is required before profiling, and mapping confirmation is required before loading.
- Data loading requires explicit user approval.
- Job identifiers and timestamps provide traceability for operations.
Limitations and edge cases
- Only CSV and XLSX file formats are supported. XML, nested JSON, and binary formats are not supported.
- The agent does not support streaming or real-time ingestion workflows.
- Cross-column validations and referential integrity checks are not performed.
- Large files may require batching or staged execution to complete profiling.
- Follow-up queries, such as requesting invalid values, require a completed profiling job and a valid workspace ID.
- The agent does not modify, correct, or write changes to the source file.
- Profiling jobs fail if the file path or access credentials are missing or misconfigured.
- Mapping and schema alignment are limited to flat attribute structures defined in the Reltio tenant configuration.
- Match rules and survivorship logic are not configured or applied by this agent.
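For the large-file limitation above, one possible form of batching is to read the file in fixed-size chunks and accumulate per-column counts, so that the whole file never sits in memory at once. This is a sketch under stated assumptions, not a description of how the agent stages execution: the function name `completeness_in_batches` and the batch size are illustrative.

```python
import csv
import io
from itertools import islice


def completeness_in_batches(lines, batch_size=1000):
    """Accumulate per-column completeness over fixed-size batches of CSV rows."""
    reader = csv.DictReader(lines)
    filled = {column: 0 for column in (reader.fieldnames or [])}
    total = 0
    while True:
        # Pull at most batch_size rows; an empty batch means the input is exhausted.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        total += len(batch)
        for row in batch:
            for column in filled:
                value = row.get(column)
                if value is not None and value.strip():
                    filled[column] += 1
    return {column: count / total for column, count in filled.items()} if total else {}


data = io.StringIO("id,email\n1,a@example.com\n2,\n3,c@example.com\n4,\n")
scores = completeness_in_batches(data, batch_size=2)
print(scores)
```

Counts such as filled values combine cleanly across batches. Metrics like uniqueness would need extra state carried between batches, which this sketch does not attempt.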