Introduction
Welcome to the Inferia LLM documentation.
What is Inferia LLM?
Inferia LLM is a comprehensive orchestration platform designed to manage the lifecycle of Large Language Model (LLM) inference. It bridges the gap between raw compute providers (Cloud, On-Prem, DePIN) and end-user applications.
At its core, Inferia provides a unified API to access models running across a distributed network of compute nodes, ensuring high availability, security, and performance.
Key Features
- Unified Interface: OpenAI-compatible endpoints for seamless integration with existing tools and libraries (see the sketch after this list).
- Security First: Built-in Filtration Gateway provides PII redaction, prompt injection protection, and content safety scanning.
- Orchestration: Intelligent routing and job management across diverse compute pools (Kubernetes, SkyPilot, Nosana).
- Observability: Comprehensive logging and metrics via the Dashboard for full visibility into usage and performance.
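Because the Inference Gateway exposes OpenAI-compatible endpoints, existing SDKs can typically be pointed at it directly. The sketch below uses the official `openai` Python client; the base URL, API key, and model name are placeholders for illustration, not values defined by this documentation.

```python
# Minimal sketch: calling an Inferia-hosted model through the
# OpenAI-compatible endpoint. Base URL, API key, and model name
# are hypothetical -- substitute the values for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder Inference Gateway address
    api_key="YOUR_INFERIA_API_KEY",       # placeholder credential
)

response = client.chat.completions.create(
    model="your-deployed-model",          # placeholder model identifier
    messages=[{"role": "user", "content": "Hello, Inferia!"}],
)
print(response.choices[0].message.content)
```

The same pattern applies to any tool or library that lets you override the OpenAI base URL, which is what makes the unified interface drop-in for existing integrations.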
System Components
Inferia consists of three main modular gateways:
- Inference Gateway: The high-performance entry point for API requests.
- Filtration Gateway: The security engine responsible for guardrails and policy enforcement.
- Orchestration Gateway: The control plane that manages compute resources and model deployments.
Getting Started
Ready to dive in?
- Quickstart: Run the full stack locally with Docker.
- Installation: Manual setup guide for individual services.
- Architecture: Deep dive into the system design.