Guardrails Configuration
Configuring safety checks and filtration policies
The Filtration Gateway enforces safety policies through a pipeline of guardrails. These checks happen before the prompt reaches the LLM (Input Guardrails) and after the LLM generates a response (Output Guardrails).
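At a high level, the request flow looks like the following sketch (a minimal illustration with stubbed helpers; the names here are hypothetical, not the Gateway's actual internals):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class GuardrailResult:
    is_valid: bool
    sanitized_text: str = ""

async def run_guardrails(text: str, stage: str) -> GuardrailResult:
    # Stub: the real Gateway dispatches to the configured providers here.
    return GuardrailResult(is_valid=True, sanitized_text=text)

async def call_llm(prompt: str) -> str:
    # Stub: the real Gateway forwards the prompt to the upstream LLM.
    return f"echo: {prompt}"

async def handle_request(prompt: str) -> str:
    # Input guardrails run before the prompt reaches the model.
    pre = await run_guardrails(prompt, stage="input")
    if not pre.is_valid:
        raise PermissionError("Blocked by input guardrails")
    response = await call_llm(pre.sanitized_text)
    # Output guardrails run on the model's response before it is returned.
    post = await run_guardrails(response, stage="output")
    if not post.is_valid:
        raise PermissionError("Blocked by output guardrails")
    return post.sanitized_text

print(asyncio.run(handle_request("Hello")))
```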
Available Guardrails
1. PII Redaction
Automatically detects and sanitizes Personally Identifiable Information (PII) such as emails, phone numbers, and addresses.
- Provider: Microsoft Presidio (Local)
- Config: Enabled by default.
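For intuition, here is a minimal sketch of Presidio-based redaction, assuming the presidio-analyzer and presidio-anonymizer packages are installed (illustrative only, not the Gateway's exact implementation):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact me at jane.doe@example.com or +1-202-555-0175."

# Detect PII entities in the input text.
analyzer = AnalyzerEngine()
findings = analyzer.analyze(
    text=text,
    entities=["EMAIL_ADDRESS", "PHONE_NUMBER"],
    language="en",
)

# Replace each detected entity with a placeholder such as <EMAIL_ADDRESS>.
anonymizer = AnonymizerEngine()
redacted = anonymizer.anonymize(text=text, analyzer_results=findings)
print(redacted.text)
```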
2. Prompt Injection (Lakera)
Detects attempts to bypass system instructions or "jailbreak" the model.
- Provider: Lakera Guard
- Env Variable: GUARDRAIL_LAKERA_API_KEY
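A rough sketch of what a Lakera Guard check can look like over HTTP. The endpoint path and response shape below are assumptions; consult Lakera's API documentation for the current contract:

```python
import os

import httpx

async def check_prompt_injection(text: str) -> bool:
    """Return True if Lakera flags the text as a prompt-injection attempt."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.lakera.ai/v2/guard",  # assumed endpoint; verify in Lakera docs
            headers={"Authorization": f"Bearer {os.environ['GUARDRAIL_LAKERA_API_KEY']}"},
            json={"messages": [{"role": "user", "content": text}]},
        )
        resp.raise_for_status()
        # Assumed response shape: a boolean "flagged" field.
        return bool(resp.json().get("flagged", False))
```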
3. Toxicity (Llama Guard)
Evaluates content for hate speech, harassment, and explicit material.
- Provider: Llama Guard (via Groq)
- Env Variable: GUARDRAIL_GROQ_API_KEY
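Groq exposes an OpenAI-compatible API, so a Llama Guard check can be sketched with the standard OpenAI client. The model ID below is an assumption; check Groq's model catalog for the current Llama Guard identifier:

```python
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.environ["GUARDRAIL_GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

async def check_toxicity(text: str) -> bool:
    resp = await client.chat.completions.create(
        model="llama-guard-3-8b",  # assumed model ID
        messages=[{"role": "user", "content": text}],
    )
    # Llama Guard replies with "safe" or "unsafe" (plus the violated categories).
    verdict = resp.choices[0].message.content.strip().lower()
    return verdict.startswith("unsafe")
```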
Performance Optimization
The Filtration Gateway utilizes Parallel Execution to minimize latency.
- Concurrent Scanning: PII redaction and Input Guardrails (e.g., Prompt Injection) run in parallel using asyncio.gather (see the sketch below).
- Latency Impact: This reduces the overhead of safety checks by approximately 40%, ensuring that robust security doesn't compromise user experience.
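A minimal sketch of this pattern with stubbed checks (the helper names are illustrative):

```python
import asyncio

async def redact_pii(text: str) -> str:
    # Stub standing in for the Presidio-based PII scan.
    return text

async def detect_injection(text: str) -> bool:
    # Stub standing in for the Lakera prompt-injection check.
    return False

async def scan_input(text: str) -> str:
    # Both checks are awaited concurrently rather than sequentially,
    # so total latency is roughly the max of the two, not the sum.
    sanitized, flagged = await asyncio.gather(
        redact_pii(text),
        detect_injection(text),
    )
    if flagged:
        raise PermissionError("Prompt injection detected")
    return sanitized

print(asyncio.run(scan_input("Hello")))
```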
Configuration
Guardrails are configured via the policy engine. A typical policy definition looks like this:
```json
{
  "policy_id": "strict_safety",
  "input_guardrails": ["pii", "jailbreak", "toxicity"],
  "output_guardrails": ["toxicity"],
  "threshold": 0.8
}
```

Policies can be assigned per-API key in the Dashboard.
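If you validate policies in application code, a minimal sketch with a plain dataclass may help (field names mirror the JSON above; the Gateway's actual policy model may differ):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GuardrailPolicy:
    policy_id: str
    input_guardrails: List[str] = field(default_factory=list)
    output_guardrails: List[str] = field(default_factory=list)
    threshold: float = 0.8

    def __post_init__(self):
        # Scores from providers are normalized to [0, 1], so the
        # blocking threshold must fall in the same range.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")

policy = GuardrailPolicy(
    policy_id="strict_safety",
    input_guardrails=["pii", "jailbreak", "toxicity"],
    output_guardrails=["toxicity"],
)
```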
Adding Custom Guardrails
To add a custom guardrail, you must implement the GuardrailProvider interface and register it with the engine.
1. Create the Provider
Create a new file in services/filtration/guardrail/providers/ (e.g., regex_provider.py).
```python
from typing import Any, Dict, Optional

from ..models import GuardrailResult, Violation
from .base import GuardrailProvider


class RegexProvider(GuardrailProvider):
    @property
    def name(self) -> str:
        return "regex-guard"

    async def scan_input(
        self,
        text: str,
        user_id: str,
        config: Dict[str, Any],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> GuardrailResult:
        # Example: block a specific pattern in the incoming prompt.
        if "forbidden_param" in text:
            return GuardrailResult(
                is_valid=False,
                violations=[
                    Violation(type="custom", score=1.0, message="Forbidden pattern detected")
                ],
            )
        return GuardrailResult(is_valid=True, sanitized_text=text)

    async def scan_output(
        self,
        text: str,       # original user prompt
        output: str,     # model response to be scanned
        user_id: str,
        config: Dict[str, Any],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> GuardrailResult:
        # This example does not filter model output; pass it through unchanged.
        return GuardrailResult(is_valid=True, sanitized_text=output)
```

2. Register the Provider
Update services/filtration/guardrail/engine.py to initialize your new provider.
```python
from .providers.regex_provider import RegexProvider


class GuardrailEngine:
    def _load_providers(self):
        # ... existing providers ...

        # Register the new provider under its name.
        regex = RegexProvider()
        self.providers[regex.name] = regex
```
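To sanity-check the provider in isolation, something like the following works (a hypothetical test; run it from within the package so the relative imports resolve):

```python
import asyncio

async def main():
    provider = RegexProvider()
    result = await provider.scan_input(
        text="please use forbidden_param here",
        user_id="user-123",
        config={},
    )
    assert not result.is_valid
    print(result.violations[0].message)  # -> "Forbidden pattern detected"

asyncio.run(main())
```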