System prompt security scan

Learn more about how Agent Builder scans system prompts for security violations on every publish request and test run, and what the scan result includes.

Agent Builder automatically scans the system prompt on every publish request and every test run. The scan evaluates the prompt for security violations and produces a result of either pass or blocked.

When scans run

The scan runs automatically in two contexts:

  • Publish request: The scan runs when an author submits a publish request. The request does not proceed to a reviewer until the scan completes. If a violation is detected, the request is auto-blocked before it reaches the reviewer queue.
  • Test run: The scan runs when an author tests a draft agent in preview mode. If a violation is detected, the test run is blocked. This is a build-time check: runtime conversation guardrails do not re-check the system prompt on each conversation turn.

The scan evaluates the full system prompt text, including template variables, tool references, and constraint sections.
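The gating behavior described above can be sketched in a few lines of Python. This is an illustration only, not Agent Builder's implementation: the names gate, Context, ScanStatus, toy_scan, and the returned step labels are all hypothetical, and the placeholder rule inside toy_scan stands in for scanner logic that this page does not describe.

```python
from enum import Enum
from typing import Callable


class ScanStatus(Enum):
    PASS = "pass"
    BLOCKED = "blocked"


class Context(Enum):
    PUBLISH_REQUEST = "publish_request"
    TEST_RUN = "test_run"


# Hypothetical scanner signature: full system prompt text in, status out.
Scanner = Callable[[str], ScanStatus]


def gate(context: Context, system_prompt: str, scan: Scanner) -> str:
    """Return the next step for a publish request or a test run.

    The scan runs first in both contexts; nothing proceeds until it completes.
    The returned labels are made up for this sketch.
    """
    status = scan(system_prompt)
    if status is ScanStatus.BLOCKED:
        if context is Context.PUBLISH_REQUEST:
            return "auto_blocked"      # never reaches the reviewer queue
        return "test_run_blocked"
    if context is Context.PUBLISH_REQUEST:
        return "reviewer_queue"        # a reviewer sees it only after a pass
    return "test_run_started"


def toy_scan(prompt: str) -> ScanStatus:
    # Placeholder rule for illustration only; not the real scanner's logic.
    if "ignore all safety rules" in prompt.lower():
        return ScanStatus.BLOCKED
    return ScanStatus.PASS
```

Passing the scanner in as a callable keeps the sketch independent of how the real scan is performed; only the ordering (scan first, then route by status and context) reflects the behavior described on this page.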

Scan result statuses

Status    Meaning
pass      No violations detected. The publish request proceeds to the reviewer queue.
blocked   A critical violation was detected. The publish request is auto-blocked and does not reach the reviewer queue. The author is notified with the scan result.

What scan results include

When the scan result is blocked, the result includes a policy_category value identifying the type of violation detected. If the scanner cannot classify the violation, the value is UNKNOWN. Scan results are stored alongside the publish request and are visible to the author in their request history.
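As a rough sketch of how a client might interpret a stored scan result, the Python below models a result with a status and an optional policy_category. Only the status values (pass, blocked), the policy_category field name, and the UNKNOWN value come from this page. The ScanResult class, the describe function, the message wording, and the EXAMPLE_CATEGORY value are hypothetical, and the sketch assumes policy_category is absent on a pass result, which this page does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanResult:
    """Hypothetical shape of a scan result; not a documented schema."""
    status: str                            # "pass" or "blocked"
    policy_category: Optional[str] = None  # assumed to be set only when blocked


def describe(result: ScanResult) -> str:
    """Turn a scan result into a short message for the author."""
    if result.status == "pass":
        return "Scan passed: no violations detected."
    if result.status == "blocked":
        # Treat a missing category the same as UNKNOWN (an assumption).
        category = result.policy_category or "UNKNOWN"
        if category == "UNKNOWN":
            return "Scan blocked: the scanner could not classify the violation."
        return f"Scan blocked: policy category {category}."
    raise ValueError(f"unexpected scan status: {result.status!r}")
```

For example, describe(ScanResult("blocked", "UNKNOWN")) reports that the violation could not be classified, while any other category value is passed through to the message unchanged.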