Open Source LLM Gateway

Unified API Gateway for 100+ LLMs

One API, all providers. A blazing-fast Rust backend with cost optimization, caching, and enterprise features.

Open Source (AGPL-3.0)
Built with Rust
Self-Hosted
  • <5ms latency overhead: minimal processing delay
  • 10K+ requests/second: high-throughput capacity
  • 60-90% faster with cache: Redis-powered responses
  • 100+ LLM models: across all providers

Everything you need for LLM infrastructure

InferXgate handles the complexity so you can focus on building great AI products.

OpenAI-Compatible API

Drop-in replacement for OpenAI SDK. Use your existing code with any provider.

Multi-Provider Support

Anthropic, OpenAI, Google Gemini, Azure OpenAI - all through one unified API.
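
Switching providers is just a matter of changing the model string in the request. A minimal sketch, assuming both providers are configured on the gateway (the model identifiers below are illustrative):

from openai import OpenAI

# One client pointed at the gateway; the provider is chosen per request
client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")

prompt = [{"role": "user", "content": "Summarize LLM gateways in one sentence."}]

# Same call shape, two providers: only the model string changes (names illustrative)
claude_reply = client.chat.completions.create(model="claude-3-5-sonnet-20241022", messages=prompt)
gpt_reply = client.chat.completions.create(model="gpt-4-turbo", messages=prompt)

print(claude_reply.choices[0].message.content)
print(gpt_reply.choices[0].message.content)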

Intelligent Caching

Redis-powered caching delivers 60-90% faster responses for repeated queries.
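
A quick way to see the effect is to time the same request twice. A rough sketch, assuming caching is enabled on your gateway instance (the model name and timings are illustrative):

import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")
messages = [{"role": "user", "content": "What does an LLM gateway do?"}]

def timed_request():
    start = time.perf_counter()
    client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    return time.perf_counter() - start

cold = timed_request()  # first call is forwarded to the provider
warm = timed_request()  # identical call can be served from the Redis cache
print(f"cold: {cold:.2f}s, warm: {warm:.2f}s")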

Cost Optimization

Real-time cost tracking, budget alerts, and smart routing to reduce spending.
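
If you want spend data in your own tooling, you could poll the gateway's usage API. The endpoint path and response fields below are hypothetical placeholders used only to illustrate the idea; check the InferXgate docs for the real names:

import requests

# Hypothetical endpoint and field names -- for illustration only.
resp = requests.get(
    "http://localhost:3000/admin/usage",  # placeholder path
    headers={"Authorization": "Bearer your-admin-key"},
    timeout=10,
)
resp.raise_for_status()
usage = resp.json()
print("total spend (USD):", usage.get("total_cost_usd"))  # placeholder field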

Real-time Analytics

Built-in dashboard with usage stats, latency metrics, and Prometheus integration.
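
For Prometheus, the usual pattern is to scrape a metrics endpoint exposed by the gateway. A minimal check, assuming the conventional /metrics path (confirm the exact path and port in the InferXgate docs):

import requests

# Assumed scrape path -- the conventional Prometheus exposition endpoint
metrics = requests.get("http://localhost:3000/metrics", timeout=5).text

# Show the first few exported metric lines (names depend on the gateway build)
samples = [line for line in metrics.splitlines() if line and not line.startswith("#")]
for line in samples[:10]:
    print(line)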

Enterprise Security

JWT auth, virtual API keys, rate limiting, and domain whitelisting built-in.
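
Virtual keys plug straight into the same client: callers get gateway-issued keys instead of raw provider credentials, and the gateway enforces limits per key. A sketch, assuming a virtual key has already been created (the key value is a placeholder), with a basic handler for rate-limit responses:

from openai import OpenAI, RateLimitError

# Gateway-issued virtual key (placeholder value); the real provider
# credentials stay on the gateway and are never handed to callers.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="ixg-virtual-key-team-a")

try:
    reply = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(reply.choices[0].message.content)
except RateLimitError:
    # The gateway returned 429 because this key exceeded its rate limit
    print("Rate limit reached for this virtual key; retry later.")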

One API for all your LLM providers

Switch between providers seamlessly. No code changes required.

Anthropic

Claude 4, 3.5 Sonnet, Haiku

OpenAI

GPT-5, GPT-4.1, GPT-4 Turbo

Google Gemini

Gemini 2.5 Pro, Flash, 1.5 Pro

Azure OpenAI

All Azure-deployed models

AWS Bedrock (Coming Soon)

Multiple foundation models

Groq (Coming Soon)

Llama, Mixtral (Ultra-fast)

More providers coming soon: Cohere, VertexAI, Ollama, and others.

Drop-in replacement for OpenAI SDK

Use your existing OpenAI SDK code with any provider. Just change the base URL and you're ready to go.

  • Works with Python, TypeScript, Go, and any OpenAI-compatible client
  • Full streaming support with Server-Sent Events (SSE)
  • Smart model routing based on provider prefixes
View Quick Start Guide

from openai import OpenAI

# Point the OpenAI SDK at your InferXgate deployment
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    # Some streamed chunks carry no text (role-only or final chunks), so guard against None
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Ready to simplify your LLM infrastructure?

Get started in under 5 minutes with Docker. InferXgate is free, open-source, and self-hosted.

Quick Start
docker-compose up -d
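
Once the containers are up, you can sanity-check the gateway from Python. A quick sketch, assuming the gateway listens on port 3000 (as in the example above) and serves the OpenAI-compatible model listing endpoint:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")

# List the models the gateway is configured to route to
for model in client.models.list():
    print(model.id)
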
Free & Open Source
Self-Hosted
Built with Rust