Open Source LLM Gateway

Unified API Gateway for 100+ LLMs

One API, all providers. A blazing-fast Rust backend with cost optimization, caching, and enterprise features.

Open Source (AGPL-3.0)
Built with Rust
Self-Hosted
  • <5ms latency overhead: minimal processing delay
  • 10K+ requests/second: high-throughput capacity
  • 60-90% faster with cache: Redis-powered responses
  • 100+ LLM models: across all providers

Everything you need for LLM infrastructure

InferXgate handles the complexity so you can focus on building great AI products.

OpenAI-Compatible API

Drop-in replacement for OpenAI SDK. Use your existing code with any provider.

Multi-Provider Support

Anthropic, OpenAI, Google Gemini, Azure OpenAI - all through one unified API.
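
Switching providers is just a matter of changing the model string in the request. A minimal sketch, assuming both providers are configured on the gateway (the model identifiers below are illustrative):

from openai import OpenAI

# One client pointed at the gateway; the provider is chosen per request
client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")

prompt = [{"role": "user", "content": "Summarize LLM gateways in one sentence."}]

# Same call shape, two providers: only the model string changes (names illustrative)
claude_reply = client.chat.completions.create(model="claude-3-5-sonnet-20241022", messages=prompt)
gpt_reply = client.chat.completions.create(model="gpt-4-turbo", messages=prompt)

print(claude_reply.choices[0].message.content)
print(gpt_reply.choices[0].message.content)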

Intelligent Caching

Redis-powered caching delivers 60-90% faster responses for repeated queries.
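
A quick way to see the effect is to time the same request twice. A rough sketch, assuming caching is enabled on your gateway instance (the model name and timings are illustrative):

import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")
messages = [{"role": "user", "content": "What does an LLM gateway do?"}]

def timed_request():
    start = time.perf_counter()
    client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    return time.perf_counter() - start

cold = timed_request()  # first call is forwarded to the provider
warm = timed_request()  # identical call can be served from the Redis cache
print(f"cold: {cold:.2f}s, warm: {warm:.2f}s")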

Cost Optimization

Real-time cost tracking, budget alerts, and smart routing to reduce spending.
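
If you want spend data in your own tooling, you could poll the gateway's usage API. The endpoint path and response fields below are hypothetical placeholders used only to illustrate the idea; check the InferXgate docs for the real names:

import requests

# Hypothetical endpoint and field names -- for illustration only.
resp = requests.get(
    "http://localhost:3000/admin/usage",  # placeholder path
    headers={"Authorization": "Bearer your-admin-key"},
    timeout=10,
)
resp.raise_for_status()
usage = resp.json()
print("total spend (USD):", usage.get("total_cost_usd"))  # placeholder field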

Real-time Analytics

Built-in dashboard with usage stats, latency metrics, and Prometheus integration.
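
For Prometheus, the usual pattern is to scrape a metrics endpoint exposed by the gateway. A minimal check, assuming the conventional /metrics path (confirm the exact path and port in the InferXgate docs):

import requests

# Assumed scrape path -- the conventional Prometheus exposition endpoint
metrics = requests.get("http://localhost:3000/metrics", timeout=5).text

# Show the first few exported metric lines (names depend on the gateway build)
samples = [line for line in metrics.splitlines() if line and not line.startswith("#")]
for line in samples[:10]:
    print(line)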

Enterprise Security

JWT auth, virtual API keys, rate limiting, and domain whitelisting built-in.
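
Virtual keys plug straight into the same client: callers get gateway-issued keys instead of raw provider credentials, and the gateway enforces limits per key. A sketch, assuming a virtual key has already been created (the key value is a placeholder), with a basic handler for rate-limit responses:

from openai import OpenAI, RateLimitError

# Gateway-issued virtual key (placeholder value); the real provider
# credentials stay on the gateway and are never handed to callers.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="ixg-virtual-key-team-a")

try:
    reply = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(reply.choices[0].message.content)
except RateLimitError:
    # The gateway returned 429 because this key exceeded its rate limit
    print("Rate limit reached for this virtual key; retry later.")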

One API for all your LLM providers

Switch between providers seamlessly. No code changes required.

Anthropic

Claude 4, 3.5 Sonnet, Haiku

OpenAI

GPT-5, GPT-4.1, GPT-4 Turbo

Google Gemini

Gemini 2.5 Pro, Flash, 1.5 Pro

Azure OpenAI

All Azure-deployed models

AWS Bedrock (Coming Soon)

Multiple foundation models

Groq (Coming Soon)

Llama, Mixtral (Ultra-fast)

More providers coming soon: Cohere, VertexAI, Ollama, and others.

Drop-in replacement for OpenAI SDK

Use your existing OpenAI SDK code with any provider. Just change the base URL and you're ready to go.

  • Works with Python, TypeScript, Go, and any OpenAI-compatible client
  • Full streaming support with Server-Sent Events (SSE)
  • Smart model routing based on provider prefixes
View Quick Start Guide

from openai import OpenAI

# Point the OpenAI SDK at your InferXgate deployment
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    # Some streamed chunks carry no text (role-only or final chunks), so guard against None
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Ready to simplify your LLM infrastructure?

Get started in under 5 minutes with Docker. InferXgate is free, open-source, and self-hosted.

Quick Start
docker-compose up -d
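
Once the containers are up, you can sanity-check the gateway from Python. A quick sketch, assuming the gateway listens on port 3000 (as in the example above) and serves the OpenAI-compatible model listing endpoint:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")

# List the models the gateway is configured to route to
for model in client.models.list():
    print(model.id)
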
Free & Open Source
Self-Hosted
Built with Rust