Introduction
Welcome to the Inferia LLM documentation.
What is Inferia LLM?
Inferia LLM is a comprehensive orchestration platform designed to manage the lifecycle of Large Language Model (LLM) inference. It bridges the gap between raw compute providers (Cloud, On-Prem, DePIN) and end-user applications.
At its core, Inferia provides a unified API to access models running across a distributed network of compute nodes, ensuring high availability, security, and performance.
Key Features
- Unified Interface: OpenAI-compatible endpoints for seamless integration with existing tools and libraries (see the sketch after this list).
- Security First: Built-in Filtration Gateway provides PII redaction, prompt injection protection, and content safety scanning.
- Orchestration: Intelligent routing and job management across diverse compute pools (Kubernetes, SkyPilot, Nosana).
- Observability: Comprehensive logging and metrics via the Dashboard for full visibility into usage and performance.
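Because the Inference Gateway exposes OpenAI-compatible endpoints, existing SDKs can typically be pointed at it directly. The sketch below uses the official `openai` Python client; the base URL, API key, and model name are placeholders for illustration, not values defined by this documentation.

```python
# Minimal sketch: calling an Inferia-hosted model through the
# OpenAI-compatible endpoint. Base URL, API key, and model name
# are hypothetical -- substitute the values for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder Inference Gateway address
    api_key="YOUR_INFERIA_API_KEY",       # placeholder credential
)

response = client.chat.completions.create(
    model="your-deployed-model",          # placeholder model identifier
    messages=[{"role": "user", "content": "Hello, Inferia!"}],
)
print(response.choices[0].message.content)
```

The same pattern applies to any tool or library that lets you override the OpenAI base URL, which is what makes the unified interface drop-in for existing integrations.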
System Components
Inferia consists of three main modular gateways:
- Inference Gateway: The high-performance entry point for API requests.
- Filtration Gateway: The security engine responsible for guardrails and policy enforcement.
- Orchestration Gateway: The control plane that manages compute resources and model deployments.
Getting Started
Ready to dive in?
- Quickstart: Run the full stack locally with Docker.
- Installation: Manual setup guide for individual services.
- Architecture: Deep dive into the system design.