Production-Ready  ·  v4.0  ·  4 Providers

AI Infrastructure for Modern Builders.

Claude, GPT, Gemini, and Groq behind one unified API. 9 models. 4 providers. Real-time streaming. Zero re-implementation.

FastAPI + Docker
Deployed on Railway
API Key Auth
Rate Limited
axiom-ai-production-aaec.up.railway.app
Request
POST /ask
X-API-Key: ••••••••
{ "question": "What is RAG?", "provider": "claude" }
Response
✓ 200 OK · 318ms
{ "answer": "RAG combines retrieval with generation to ground AI in real data...", "tokens_used": 187 }
🟨 Claude · Sonnet 4.6
💎 Gemini 3.5 Flash
4 AI Providers
20-Message Chat Memory
9 AI Models
Endpoints

Everything you need to build

Six focused endpoints. No bloat. No setup friction. Swap providers per request.

GET /status
🔺
Health Check
Live status endpoint. Returns version, timestamp, provider status. No auth required.
POST /ask
🧠
Single-Turn Q&A
Send a question and optional context. Get a precise AI answer back in one round trip.
POST /chat
💬
Multi-Turn Chat
Stateful conversations with a 20-message rolling memory. Pass session_id to continue a conversation.
GET /models
📡
Model Registry
Lists available providers, model IDs, and live availability for each configured provider.
GET /session/{id}
📄
Session History
Retrieve the full conversation history for any active session. Auth required.
DELETE /session/{id}
🔀
Clear Session
Wipe a conversation from memory. Reset context without changing the session ID.
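
The 20-message rolling memory behind POST /chat can be pictured with a short sketch. This is an illustration only: it assumes the server keeps the most recent 20 messages per session and drops the oldest first. The class name SessionMemory and its methods are hypothetical and are not part of the API.

```python
from collections import deque

MAX_MESSAGES = 20  # rolling window size stated for /chat


class SessionMemory:
    """Hypothetical in-memory store: one bounded history per session ID."""

    def __init__(self, max_messages=MAX_MESSAGES):
        self.max_messages = max_messages
        self._sessions = {}  # session_id -> deque of message dicts

    def append(self, session_id, role, content):
        # deque(maxlen=N) silently discards the oldest item once N is reached
        history = self._sessions.setdefault(
            session_id, deque(maxlen=self.max_messages)
        )
        history.append({"role": role, "content": content})

    def history(self, session_id):
        # Comparable in spirit to GET /session/{id}; unknown IDs yield []
        return list(self._sessions.get(session_id, ()))

    def clear(self, session_id):
        # Comparable in spirit to DELETE /session/{id}: the ID stays valid,
        # only the stored messages are dropped
        if session_id in self._sessions:
            self._sessions[session_id].clear()
```

Under this assumption, a session that receives 25 messages retains only the last 20, so earlier context is no longer available to the model.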
How It Works

First response in under 60 seconds

From zero to live AI in the time it takes to brew coffee.

1. Get Your API Key

Request access, then add the X-API-Key header to your requests. Done in 10 seconds.

2. Pick Your Provider

Set "provider" to "claude", "openai", "gemini", or "groq". Switch any time, per request.

3. Ship Your Product

Hit /ask or /chat. Structured AI responses in milliseconds.
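
The three steps above can be sketched as one helper that attaches the key, checks the provider name, and targets /ask. This is a sketch under assumptions: the helper name build_ask_request is hypothetical, the field names follow the Quick Start examples, and the function only builds the request with the Python standard library rather than sending it.

```python
import json
import urllib.request

BASE = "https://axiom-ai-production-aaec.up.railway.app"
PROVIDERS = ("claude", "openai", "gemini", "groq")  # names listed in step 2


def build_ask_request(api_key, question, provider="claude"):
    """Return a ready-to-send POST /ask request (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}; expected one of {PROVIDERS}")
    body = json.dumps({"question": question, "provider": provider}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/ask",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and parse the response body with json.load. Validating the provider name on the client catches typos before a request is spent against the rate limit.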

Providers

Four providers. One API.

Switch between Claude, GPT, Gemini, and Groq with a single field: no re-implementation, no SDK swaps.

Anthropic Claude
Haiku 4.5 · Sonnet 4.6 · Opus 4.7

The default provider. Excels at reasoning, structured outputs, and RAG pipelines. 200K-token context window with industry-leading consistency at low latency.

Default Provider · RAG-Ready · 200K–1M Context · 3 Models
OpenAI GPT-5
GPT-5.4 Mini · GPT-5.5

Cost-efficient and industry-standard. Ideal for code generation, JSON mode, and when clients prefer the OpenAI ecosystem.

Drop-in Switch · JSON Mode · Code Generation · 2 Models
Google Gemini
Gemini 3.5 Flash · Gemini 2.5 Pro

Google's multimodal flagship. Industry-leading 1M-token context window, well suited to huge documents, long conversations, and multimodal workloads.

1M Context · Multimodal · Long Documents · 2 Models
Groq LPU
Llama 3.3 70B · Llama 3.1 8B

Hardware-accelerated inference on Groq's custom LPUs. Open-source models running up to 10× faster than GPU providers. Best-in-class token throughput.

LPU Accelerated · Open Source · Ultra-Fast · 2 Models
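
Because the provider is just one field in the request, a client can retry the same question against another provider when one fails. The sketch below shows that pattern on the client side; it is not a documented feature of the API. The function name ask_with_fallback is hypothetical, and send stands for any caller-supplied function that posts the payload to /ask and raises on failure.

```python
PROVIDERS = ("claude", "openai", "gemini", "groq")


def ask_with_fallback(question, send, providers=PROVIDERS):
    """Try each provider in order; return (provider, response) for the first success.

    `send` takes the JSON payload as a dict and returns the parsed response,
    raising an exception if the request fails.
    """
    errors = {}
    for provider in providers:
        # Only the "provider" field changes between attempts
        payload = {"question": question, "provider": provider}
        try:
            return provider, send(payload)
        except Exception as exc:  # broad on purpose: any failure moves on
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

Keeping the transport behind send leaves the HTTP client, timeouts, and authentication up to the caller, and makes the fallback order easy to exercise without network access.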
Quick Start

Hit the API in seconds

Production-ready examples. Copy. Paste. Ship.

cURL

# Single-turn Q&A (Claude)
curl -X POST https://axiom-ai-production-aaec.up.railway.app/ask \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"question": "What is RAG?", "provider": "claude"}'

# Multi-turn chat: start a new session
curl -X POST https://axiom-ai-production-aaec.up.railway.app/chat \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"message": "Hello!", "provider": "openai"}'

# Custom system prompt
curl -X POST https://axiom-ai-production-aaec.up.railway.app/ask \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"question": "Review my code", "system": "You are a senior Python engineer. Be blunt.", "provider": "claude"}'

Python

# pip install requests
import requests

BASE = "https://axiom-ai-production-aaec.up.railway.app"
HEADERS = {
    "X-API-Key": "YOUR_KEY",
    "Content-Type": "application/json"
}

# Single-turn Q&A
r = requests.post(f"{BASE}/ask", headers=HEADERS, json={
    "question": "What is RAG?",
    "provider": "claude"
}, timeout=30)
r.raise_for_status()  # stop on 4xx/5xx (bad key, rate limit, server error)
print(r.json()["answer"])

# Multi-turn chat
r = requests.post(f"{BASE}/chat", headers=HEADERS, json={
    "message": "Hello!", "provider": "openai"
}, timeout=30)
r.raise_for_status()
session_id = r.json()["session_id"]  # save for next message

JavaScript

// Single-turn Q&A (top-level await requires an ES module or an async function)
const BASE = 'https://axiom-ai-production-aaec.up.railway.app';

const r = await fetch(`${BASE}/ask`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'YOUR_KEY'
  },
  body: JSON.stringify({
    question: 'What is RAG?',
    provider: 'claude'
  })
});

// fetch does not reject on HTTP errors, so check the status explicitly
if (!r.ok) throw new Error(`Request failed: ${r.status}`);

const data = await r.json();
console.log(data.answer);  // done!
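
The examples above save the session_id returned by /chat but do not show the follow-up calls. The sketch below builds those requests with the Python standard library. It rests on assumptions: the request body field is taken to be named session_id, matching the response field, and the helper names chat_request and session_request are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE = "https://axiom-ai-production-aaec.up.railway.app"


def chat_request(api_key, message, session_id=None, provider="claude"):
    """Build POST /chat; include session_id to continue a conversation."""
    payload = {"message": message, "provider": provider}
    if session_id is not None:
        payload["session_id"] = session_id  # assumed request field name
    return urllib.request.Request(
        f"{BASE}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def session_request(api_key, session_id, clear=False):
    """Build GET /session/{id} (history) or DELETE /session/{id} (clear)."""
    return urllib.request.Request(
        f"{BASE}/session/{quote(session_id, safe='')}",  # escape the path segment
        headers={"X-API-Key": api_key},
        method="DELETE" if clear else "GET",
    )
```

Each returned object can be sent with urllib.request.urlopen and the response parsed with json.load. A typical flow is one chat_request without a session ID, further chat_request calls that pass the returned ID, and a final session_request with clear=True to reset the context.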

Ready to build?

Explore every endpoint, fire live requests, and integrate in minutes.