API Reference
Inference Gateway
OpenAI-compatible inference endpoints
The Inference Gateway provides high-performance, streaming-compatible endpoints for LLM interaction. It is fully compatible with the OpenAI API specification.
Base URL
http://localhost:8001/v1
Endpoints
Chat Completions
POST /chat/completions
model
Body · string · Required
The name of the deployment to use (e.g., llama-3-8b).
messages
Body · array · Required
A list of messages comprising the conversation so far.
stream
Body · boolean
If true, partial message deltas will be sent via Server-Sent Events.
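With streaming enabled, the parameters above combine into a request body like the following (the model name and message content are illustrative):

```json
{
  "model": "llama-3-8b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}
```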
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "llama-3-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
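When stream is true, the response arrives as Server-Sent Events: in the OpenAI-compatible format the gateway advertises, each event is a `data: {json}` line whose `choices[0].delta` carries a partial message, and the stream ends with `data: [DONE]`. A minimal sketch of reassembling the streamed text, using a hypothetical sample stream (the chunk payloads below are abbreviated for illustration):

```python
import json

def parse_sse_deltas(raw: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE stream body.

    Each event line looks like 'data: {json}'; 'data: [DONE]' marks
    the end of the stream.
    """
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Hypothetical abbreviated stream, as the gateway might emit it.
sample = (
    'data: {"choices":[{"delta":{"role":"assistant"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"Hello"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":", world!"}}]}\n\n'
    'data: [DONE]\n\n'
)
print(parse_sse_deltas(sample))  # → Hello, world!
```

In a real client you would read the HTTP response body incrementally and feed complete `data:` lines to the same logic rather than buffering the whole stream.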