Production-Ready · v4.2 · 4 Providers

AI Infrastructure
for Modern
Builders.

Claude, GPT‑5.5, Gemini 3.5, and Groq behind one unified API. 9 models. 4 providers. Real-time streaming. Zero re-implementation.

⚡ Try in Docs 🔺 System Status

✓

FastAPI + Docker

✓

Deployed on Railway

✓

API Key Auth

✓

Rate Limited

axiom-ai-production-aaec.up.railway.app

Illustrative request

POST /ask X-API-Key: •••••••• { "question": "What is RAG?", "provider": "claude" }

Illustrative response

✓ 200 OK example

{ "answer": "RAG combines retrieval with generation to ground AI in real data...", "tokens_used": 187 }

🟨 Claude · Sonnet 4.6

💎 Gemini 3.5 Flash

✓ Unified contract

AI Providers

Rate Limit

Chat Memory

AI Models

30 local tests · provider-free

When upstream breaks, Axiom tells the truth.

A cinematic replay of four real contracts enforced in CI. This panel never calls a provider: it makes the tested failure semantics visible without spending inference credits or staging a fake success.

After partial output, the stream ends with one named error event and never emits a false terminal done.
Raw provider exception details stay behind the boundary; clients receive stable codes and retry hints.
A failed chat turn leaves the existing session unchanged—no half-written conversation state.
OpenAI rate limits get one Axiom application attempt; retry ownership stays inside the provider SDK.

Inspect the contracts ↗ Open API reference →

Partial stream failure

Contract replay — no provider call

messagetoken: "partial"

eventerror

codeupstream_failure

retryablefalse

doneblocked

Result: partial content is preserved, the failure is explicit, and success is never fabricated.

Endpoints

Everything you need to build

One consistent interface for health, inference, sessions, usage, and reliability benchmarking.

GET/status

→

🔺

Health Check

Live status endpoint. Returns version, timestamp, provider status. No auth required.

POST/ask

→

🧠

Single-Turn Q&A

Send a question and optional context. Get a precise AI answer back in one round trip.

POST/chat

→

💬

Multi-Turn Chat

Stateful conversations with 20-message rolling memory. Pass session_id to continue.

GET/models

→

📡

Model Registry

Lists available providers, model IDs, and live availability for each configured provider.

GET/session/{id}

→

📄

Session History

Retrieve the full conversation history for any active session. Auth required.

DELETE/session/{id}

→

🔀

Clear Session

Wipe a conversation from memory. Reset context without changing the session ID.

How It Works

One request shape. Four providers.

Provider switching is one explicit JSON field, while auth and response structure stay consistent.

Get Your API Key

Request access. Add X-API-Key to your header. Done in 10 seconds.

Pick Your Provider

Pass "provider":"claude", "openai", "gemini", or "groq" — switch any time, per request.

Ship Your Product

Hit /ask or /chat. Structured AI responses in milliseconds.

Providers

Four providers. One API.

Switch between Claude, GPT-5.5, Gemini 3.5, and Groq with a single field — no re-implementation, no SDK swaps.

🟨

Anthropic Claude

Haiku 4.5 · Sonnet 4.6 · Opus 4.7

The default provider. Exceptional reasoning, structured outputs, and RAG pipelines. 200K context window with industry-leading consistency at low latency.

Default Provider RAG-Ready 200K–1M Context 3 Models

🟩

OpenAI GPT-5

GPT-5.4 Mini · GPT-5.5

Cost-efficient and industry-standard. Ideal for code generation, JSON mode, and when clients prefer the OpenAI ecosystem.

Drop-in Switch JSON Mode Code Generation 2 Models

💎

Google Gemini

Gemini 3.5 Flash · Gemini 3.5 Pro

Google's multimodal flagship. Industry-leading 1M token context window — perfect for huge documents, long conversations, and multimodal workloads.

1M Context Multimodal Long Documents 2 Models

⚡

Groq LPU

Llama 3.3 70B · Llama 3.1 8B

Latency-focused inference on Groq's custom LPUs, exposing the same Axiom request and failure contracts as every other provider adapter.

LPU Accelerated Open Source Ultra-Fast 2 Models

Quick Start

Hit the API in seconds

Production-ready examples. Copy. Paste. Ship.

cURL

Python

JavaScript

# Single-turn Q&A ── Claude ──────────────────────────────────────
curl -X POST https://axiom-ai-production-aaec.up.railway.app/ask \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"question": "What is RAG?", "provider": "claude"}'

# Multi-turn chat ── start a new session ─────────────────────────
curl -X POST https://axiom-ai-production-aaec.up.railway.app/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"message": "Hello!", "provider": "openai"}'

# Custom system prompt ───────────────────────────────────────────
curl -X POST https://axiom-ai-production-aaec.up.railway.app/ask \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"question": "Review my code", "system": "You are a senior Python engineer. Be blunt.", "provider": "claude"}'

# pip install requests
import requests

BASE = "https://axiom-ai-production-aaec.up.railway.app"
HEADERS = {
    "X-API-Key": "YOUR_KEY",
    "Content-Type": "application/json"
}

# Single-turn Q&A
r = requests.post(f"{BASE}/ask", headers=HEADERS, json={
    "question": "What is RAG?",
    "provider": "claude"
})
print(r.json()["answer"])

# Multi-turn chat
r = requests.post(f"{BASE}/chat", headers=HEADERS, json={
    "message": "Hello!", "provider": "openai"
})
session_id = r.json()["session_id"]  # save for next message

// Single-turn Q&A
const BASE = 'https://axiom-ai-production-aaec.up.railway.app';

const r = await fetch(`${BASE}/ask`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'YOUR_KEY'
  },
  body: JSON.stringify({
    question: 'What is RAG?',
    provider: 'claude'
  })
});

const data = await r.json();
console.log(data.answer);  // done!

Ready to build?

Explore every endpoint, fire live requests, and integrate in minutes.

⚡ Open API Reference View Available Models

AI Infrastructure for Modern Builders.