InferiaLLM
API Reference

Inference Gateway

OpenAI-compatible inference endpoints

The Inference Gateway provides high-performance, streaming-compatible endpoints for LLM interaction. It is fully compatible with the OpenAI API specification.

Base URL

http://localhost:8001/v1

Endpoints

Chat Completions

POST /chat/completions

model

Body · string · Required

The name of the deployment to use (e.g., llama-3-8b).

messages

Body · array · Required

A list of messages comprising the conversation so far.

stream

Body · boolean

If true, partial message deltas will be sent via Server-Sent Events.
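When streaming, each event arrives as a `data:` line carrying an OpenAI-style chunk object, terminated by a `data: [DONE]` sentinel. A minimal sketch of parsing such lines, assuming the gateway mirrors the OpenAI chunk shape (`choices[0].delta.content`):

```python
import json

def parse_sse_line(line: str):
    """Parse one Server-Sent Events line from a streamed response.

    Returns the text delta, or None for blank keep-alive lines,
    comments, and the [DONE] end-of-stream sentinel.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # keep-alives and SSE comments carry no payload
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")
```

Feeding it a sample chunk line such as `data: {"choices":[{"delta":{"content":"Hel"}}]}` yields the delta text `"Hel"`.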

curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
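The same request can be built from the standard library without any SDK. A sketch using `urllib.request` (the `sk-local` key and the deployment name are placeholders; the function only constructs the request, so sending it still requires a running gateway):

```python
import json
import urllib.request

def build_chat_request(messages, model="llama-3-8b",
                       api_key="sk-local",
                       base_url="http://localhost:8001/v1"):
    """Build a POST request for /chat/completions (no network I/O here)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])
# Send with urllib.request.urlopen(req) once the gateway is up.
```

Because the gateway is OpenAI-compatible, official OpenAI client libraries should also work when pointed at the base URL above.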
