Available Endpoints
The Heroku Managed Inference and Agents API exposes endpoints for chat, agents, embeddings, images, and tooling:Chat completionsOpenAI-compatible endpoint for multi-turn conversations.View docs →Messages APINative Anthropic SDK support for Claude models.View docs →AgentsRun tool-enabled agent loops with MCP integration.Explore →EmbeddingsGenerate vector embeddings for search and retrieval.Learn more →Image generationCreate visuals with Stable Image Ultra using text prompts.View API →MCP serversRegister and query Model Context Protocol servers via API.Read docs →CLI automationRun Heroku CLI commands to manage models and keys.Open guide →
Supported Models
Chat Models
The following chat models support the Chat Completions and Agents endpoints:- Claude 4 Sonnet - Latest flagship model with extended thinking
- Claude 3.7 Sonnet - High intelligence with extended thinking
- Claude 3.5 Sonnet - Balance of intelligence and speed
- Claude 3.5 Haiku - Fast and cost-effective
- Claude 3.0 Haiku - Ultra-fast responses
- Amazon Nova Pro - Amazon’s advanced model
- Amazon Nova Lite - Amazon’s efficient model
- OpenAI GPT OSS 120B - Open-source compatible model
Embedding Models
- Cohere Embed Multilingual - Multilingual text embeddings
Image Models
- Stable Image Ultra - High-quality image generation
View our Model Cards for detailed information about each model’s capabilities, pricing, and features.
Quick Start
Base URL
All API requests use the following base URL:Authentication
All requests must include anAuthorization header with your Heroku Inference API key:
Rate Limits & Quotas
Rate limits and quotas vary by plan and model. See your Heroku Dashboard for current usage and limits.Additional Resources
AI Studio
Visual interface for testing models
Heroku Tools
Built-in tools for agents
Working with MCP
Deploy custom MCP tools
AI Integrations
Framework integrations