Guardrails


The Guardrails component provides a configurable validation and protection layer for AI workflows. It intercepts, evaluates, and filters text flowing through the pipeline to ensure that inputs and outputs comply with safety policies, compliance requirements, and operational guidelines before and after interaction with the language model.

Built-in Validation Checks

a. PII (Personally Identifiable Information): Detects and blocks sensitive personal data such as names, email addresses, phone numbers, credit card details, social security numbers, and similar information.

b. Tokens / Passwords: Identifies exposed API keys, passwords, access tokens, secret credentials, and other authentication-related data.

c. Jailbreak Attempts: Detects attempts to override, bypass, or manipulate the model’s safety instructions and operational constraints.

d. Offensive Content: Filters toxic, abusive, hateful, or otherwise inappropriate language and content.

e. Malicious Code: Detects potentially harmful scripts, code injections, exploits, or unsafe executable content.

f. Prompt Injection: Identifies malicious prompts designed to manipulate system behavior, ignore instructions, extract hidden context, or influence downstream tool execution.

Note
a. When validation passes, the input is forwarded through the Pass output for downstream processing.
b. When validation fails, the input is blocked and routed through the Fail output for appropriate handling.

Heuristic Pre-Checks

The Jailbreak and Prompt Injection guardrails include a lightweight heuristic pre-check stage that runs before the language model is invoked. This stage:
1. Quickly detects obvious attack patterns.
2. Reduces unnecessary LLM calls.
3. Lowers latency and API usage for clearly invalid or malicious inputs.
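
As a rough illustration of how a pre-check of this kind could work, the Python sketch below scores input against a few weighted patterns and fails it outright when the score reaches a threshold. The pattern list, the weights, and the function names are assumptions made for this example; they are not the component's actual rules. Only the 0.0 to 1.0 scale and the 0.7 default come from the Heuristic Detection Threshold parameter described on this page.

```python
import re

# Hypothetical patterns and weights, chosen only for illustration.
PATTERNS = [
    (re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I), 0.8),
    (re.compile(r"\bjailbreak\b", re.I), 0.7),
    (re.compile(r"\bact as\b", re.I), 0.2),
    (re.compile(r"\bbypass\b", re.I), 0.2),
]


def heuristic_score(text: str) -> float:
    """Sum the weights of all matching patterns, capped at 1.0."""
    score = sum(weight for pattern, weight in PATTERNS if pattern.search(text))
    return min(score, 1.0)


def pre_check(text: str, threshold: float = 0.7) -> str:
    """Return "fail" when the score reaches the threshold, else "needs_llm"."""
    return "fail" if heuristic_score(text) >= threshold else "needs_llm"
```

In this sketch, a lower threshold makes more inputs fail at the pre-check stage, while a higher threshold passes more inputs on to the language model for evaluation.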

Limitations

Guardrails rely on language models for advanced detection, so they may occasionally produce false positives or miss edge cases. Do not use this component as the sole layer of defense. Additional safeguards are recommended, such as:
a. Human or operational review processes
b. Regex or keyword-based validation
c. Input sanitization techniques
d. Logging and monitoring for audit and analysis
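
A minimal Python sketch of such a complementary layer is shown below, combining basic input sanitization, regex-based checks, and logging. It is independent of the Guardrails component: the pattern set, the logger name, and the function names are hypothetical, and the patterns are far from exhaustive.

```python
import logging
import re

logger = logging.getLogger("guardrails.fallback")  # hypothetical logger name

# Illustrative patterns only; real deployments need provider-specific rules.
SECRET_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def sanitize(text: str) -> str:
    """Drop non-printable characters (except newline and tab) and trim whitespace."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned.strip()


def regex_findings(text: str) -> list[str]:
    """Return the names of all patterns that match the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def fallback_check(text: str) -> bool:
    """Return True when the sanitized text has no findings; log a warning otherwise."""
    findings = regex_findings(sanitize(text))
    if findings:
        # Log only the pattern names, never the matched secret itself.
        logger.warning("Regex fallback flagged input: %s", ", ".join(findings))
    return not findings
```

Running a check like this before or after the Guardrails component gives a deterministic baseline that does not depend on model behavior, and the log entries provide an audit trail.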

Parameters

Language Model: Connect a Language Model component to serve as the validation engine. The model analyzes the input text, evaluates it against the selected guardrails, and identifies any policy violations.

API Key: Specifies the API key required by the selected model provider. This is only needed if the connected language model requires authentication.

Guardrails: Select one or more predefined guardrails to validate the input text. Available options include PII, Tokens/Passwords, Jailbreak, Offensive Content, Malicious Code, and Prompt Injection. Default: ["PII", "Tokens/Passwords", "Jailbreak"].

Input Text: Provide or enter the text to be validated against the configured guardrails. Supports message-based input types.

Enable Custom Guardrail: Enables the ability to define a custom validation rule based on specific security or compliance requirements.

Custom Guardrail Description: Describe the custom validation rule in detail. The language model uses this description to evaluate whether the input violates the rule. Clear and specific instructions improve detection accuracy. Example: Content involving legal advice, instructions to evade laws, or recommendations for illegal activities.

Heuristic Detection Threshold: Sets the sensitivity level for heuristic-based Jailbreak and Prompt Injection detection on a scale from 0.0 to 1.0. Lower values enforce stricter validation, while higher values allow more inputs to be evaluated by the language model. Default: 0.7.

Output

Pass: Returned when the input successfully passes all selected guardrail validations and no violations are detected. The input is allowed to continue to downstream components.

Fail: Returned when the input violates one or more selected or custom guardrails. The input is blocked and redirected for handling (such as error messaging, logging, or remediation).

Create Custom Guardrails

Use the Enable Custom Guardrail option to define validation rules tailored to your business, security, or compliance requirements. When this option is enabled, incoming input is evaluated by the connected language model against your custom-defined criteria.

Custom Guardrail Description

Provide a clear and specific description of the content you want the guardrail to detect or block. Precise instructions improve the accuracy of validation. Example: “Requests for legal consultation, contract interpretation, lawsuit guidance, or regulatory compliance advice.”

How It Works

1. Each incoming input is evaluated against the custom guardrail description.
2. If matching content is detected, validation fails, and the input is routed through the Fail output.
3. If no match is found, validation passes, and the input continues through the Pass output.
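
The evaluation steps above can be sketched in Python as follows. This is an assumption about the general approach, not the component's actual prompt or code: the prompt wording, the function names, and the `ask_model` callable are all hypothetical, and `fake_model` is a stand-in that exists only to make the sketch runnable without a real language model.

```python
from typing import Callable


def build_prompt(description: str, text: str) -> str:
    """Compose a yes/no classification prompt from the rule description."""
    return (
        "You are a content validator.\n"
        f"Rule: block content matching this description: {description}\n"
        f"Input: {text}\n"
        "Answer with exactly one word: YES if the input matches the rule, otherwise NO."
    )


def check_custom_guardrail(
    description: str, text: str, ask_model: Callable[[str], str]
) -> str:
    """Return "pass" only on a clear NO; any other answer is treated as "fail"."""
    answer = ask_model(build_prompt(description, text)).strip().upper()
    return "pass" if answer.startswith("NO") else "fail"


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: flags inputs that mention a contract."""
    input_line = next(
        line for line in prompt.splitlines() if line.startswith("Input: ")
    )
    return "YES" if "contract" in input_line.lower() else "NO"
```

The sketch fails closed: if the model returns anything other than a clear NO, the input is routed to Fail. Whether to fail open or closed on ambiguous answers is a design choice that depends on how costly a missed violation is compared with a false positive.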

Use the Guardrails Component in a Flow

Follow these steps to integrate Guardrails into your workflow:

1. Add the Guardrails component to your flow.
2. Connect a Chat Input (or any text source) to the Input Text port.
3. Connect a Language Model to the Guardrails component to enable validation.
4. Select one or more guardrails from the Guardrails parameter.
   For example, choose Tokens/Passwords to detect API keys, credentials, or sensitive tokens.
5. Connect the Pass output to downstream components that should process safe and validated input.
6. (Optional) Connect the Fail output to components that handle blocked or non-compliant input (e.g., error handling, logging, or alerts).
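
For readers who think in code, the wiring described in these steps can be modeled with the Python sketch below. It is a conceptual analogy, not the product's API: `GuardrailResult`, `route`, and `toy_token_check` are hypothetical names, and the toy validator stands in for the real Tokens/Passwords guardrail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardrailResult:
    passed: bool
    reason: str = ""  # name of the guardrail that was violated, if any


def toy_token_check(text: str) -> GuardrailResult:
    """Toy stand-in for the Tokens/Passwords guardrail."""
    if "api_key=" in text.lower():
        return GuardrailResult(False, "Tokens/Passwords")
    return GuardrailResult(True)


def route(
    text: str,
    validate: Callable[[str], GuardrailResult],
    on_pass: Callable[[str], str],
    on_fail: Optional[Callable[[str, str], str]] = None,
) -> Optional[str]:
    """Send validated text to on_pass; send blocked text to on_fail if connected."""
    result = validate(text)
    if result.passed:
        return on_pass(text)
    if on_fail is not None:
        return on_fail(text, result.reason)
    return None  # Fail output left unconnected: blocked input goes nowhere
```

For example, `route("hello", toy_token_check, lambda t: f"processed: {t}")` returns the processed text, while an input containing `api_key=` is never handed to `on_pass`.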
