API Reference
Inference Gateway
OpenAI-compatible inference endpoints
The Inference Gateway provides high-performance, streaming-compatible endpoints for LLM interaction. It is fully compatible with the OpenAI API specification.
Base URL
http://localhost:8001/v1
Endpoints
Chat Completions
POST /chat/completions
model
Body · string · Required
The name of the deployment to use (e.g., llama-3-8b).
messages
Body · array · Required
A list of messages comprising the conversation so far.
stream
Body · boolean
If true, partial message deltas will be sent via Server-Sent Events.
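With streaming enabled, the parameters above combine into a request body like the following (the model name and message content are illustrative):

```json
{
  "model": "llama-3-8b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}
```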
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
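When stream is true, the response arrives as Server-Sent Events: in the OpenAI-compatible format the gateway advertises, each event is a `data: {json}` line whose `choices[0].delta` carries a partial message, and the stream ends with `data: [DONE]`. A minimal sketch of reassembling the streamed text, using a hypothetical sample stream (the chunk payloads below are abbreviated for illustration):

```python
import json

def parse_sse_deltas(raw: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE stream body.

    Each event line looks like 'data: {json}'; 'data: [DONE]' marks
    the end of the stream.
    """
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Hypothetical abbreviated stream, as the gateway might emit it.
sample = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":", world!"}}]}\n\n'
    'data: [DONE]\n\n'
)
print(parse_sse_deltas(sample))  # → Hello, world!
```

In a real client you would read the HTTP response body incrementally and feed complete `data:` lines to the same logic rather than buffering the whole stream.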