Developer Guide
System Architecture
Deep dive into the Inferia LLM architecture
Inferia LLM utilizes a microservices architecture to decouple inference serving from orchestration and security. This design allows for independent scaling and robust failure isolation.
High-Level Diagram
Component Details
Inference Gateway
The Inference Gateway is a stateless proxy that implements the OpenAI API specification.
- Responsibilities: Request validation, response streaming.
- Optimization: Uses In-Memory Context Caching (TTL 60s) to minimize network calls for routing configuration.
- Scaling: Horizontally scalable behind a load balancer.
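The In-Memory Context Caching mentioned above can be sketched as a simple TTL map. This is an illustrative sketch, not the actual Inferia implementation; the class and method names are assumptions.

```python
import time


class ContextCache:
    """Minimal in-memory TTL cache, mirroring the 60 s context cache
    the Inference Gateway uses to avoid repeated routing lookups.
    (Illustrative only; not the real Inferia code.)"""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired: evict, so the caller falls back to querying
            # the Filtration Gateway over the network.
            del self._store[key]
            return None
        return value

    def put(self, key: str, value: object):
        self._store[key] = (time.monotonic() + self.ttl, value)
```

On a cache miss the gateway would fetch the routing context remotely and `put` it back, so at most one network call is made per key per TTL window.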
Filtration Gateway
The Filtration Gateway acts as a centralized security and policy checkpoint.
- Responsibilities:
- Context Resolution: Resolves routing and config for the Inference Gateway (cached locally in-memory).
- Parallel Guardrails: Executes PII redaction and Input Scanning concurrently.
- Quota Management: High-speed Redis-based quota enforcement.
- Audit Logging: Asynchronous logging of all interactions.
- Providers: Integrates with external providers like Llama Guard, Lakera, and local PII models.
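The Parallel Guardrails step can be illustrated with `asyncio.gather`: PII redaction and input scanning are independent, so they run concurrently rather than back-to-back. The function bodies here are stand-ins (assumptions), not real provider integrations.

```python
import asyncio


async def redact_pii(payload: str) -> str:
    # Stand-in for a call to a local PII model; a trivial substitution here.
    return payload.replace("555-0100", "[REDACTED]")


async def scan_input(payload: str) -> bool:
    # Stand-in for an external provider call (e.g. Llama Guard, Lakera);
    # always passes in this sketch.
    return True


async def run_guardrails(payload: str) -> tuple[str, bool]:
    # The two checks share no state, so they execute concurrently;
    # total latency is max(redact, scan) rather than their sum.
    redacted, safe = await asyncio.gather(redact_pii(payload), scan_input(payload))
    return redacted, safe
```

In the real gateway each coroutine would wrap a network call, which is where concurrent execution pays off.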
Orchestration Gateway
The Orchestration Gateway manages the physical and logical infrastructure. It exposes both a REST API (Deployment Management) and a gRPC Interface (Internal Service Communication).
- Responsibilities:
- Inventory Management: Tracking active nodes and their health.
- Job Dispatch: Using the Adapter Pattern to provision resources on Kubernetes, SkyPilot, or Nosana.
- Model Registry: Storing configuration for supported models.
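The Adapter Pattern used for job dispatch can be sketched as a common provisioning interface with one implementation per backend. Class and method names below are illustrative assumptions; the real adapters would call the Kubernetes, SkyPilot, or Nosana APIs.

```python
from abc import ABC, abstractmethod


class ComputeAdapter(ABC):
    """Common interface; each backend provisions resources its own way."""

    @abstractmethod
    def provision(self, model: str, gpus: int) -> str:
        """Provision a worker for `model` and return a handle/URI."""


class KubernetesAdapter(ComputeAdapter):
    def provision(self, model: str, gpus: int) -> str:
        # Would create a Deployment spec via the Kubernetes API.
        return f"k8s://deployments/{model}"


class SkyPilotAdapter(ComputeAdapter):
    def provision(self, model: str, gpus: int) -> str:
        # Would launch a SkyPilot task on whichever cloud has capacity.
        return f"sky://tasks/{model}"


def dispatch(adapter: ComputeAdapter, model: str, gpus: int) -> str:
    # The dispatcher is backend-agnostic; adding Nosana means adding
    # one more adapter class, not changing dispatch logic.
    return adapter.provision(model, gpus)
```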
Data Flow
- Auth: Client authenticates with the Inference Gateway.
- Resolve: IG checks its local in-memory cache or queries the Filtration Gateway for context/routing.
- Scan: Request payload is sent to the Filtration Gateway for parallel safety scanning.
- Execute: Validated request is forwarded to the worker (e.g., a vLLM container).
- Stream: Tokens are streamed back to the client.
- Log: Inference metadata is asynchronously logged to the Filtration Gateway (fire-and-forget).