API Overview

Available Endpoints

The Heroku Managed Inference and Agents API exposes endpoints for chat, agents, embeddings, images, and tooling:

Chat completionsOpenAI-compatible endpoint for multi-turn conversations.View docs →Messages APINative Anthropic SDK support for Claude models.View docs →AgentsRun tool-enabled agent loops with MCP integration.Explore →EmbeddingsGenerate vector embeddings for search and retrieval.Learn more →Image generationCreate visuals with Stable Image Ultra using text prompts.View API →MCP serversRegister and query Model Context Protocol servers via API.Read docs →CLI automationRun Heroku CLI commands to manage models and keys.Open guide →

Supported Models

Chat Models

The following chat models support the Chat Completions and Agents endpoints:

Claude 4 Sonnet - Latest flagship model with extended thinking
Claude 3.7 Sonnet - High intelligence with extended thinking
Claude 3.5 Sonnet - Balance of intelligence and speed
Claude 3.5 Haiku - Fast and cost-effective
Claude 3.0 Haiku - Ultra-fast responses
Amazon Nova Pro - Amazon’s advanced model
Amazon Nova Lite - Amazon’s efficient model
OpenAI GPT OSS 120B - Open-source compatible model

Embedding Models

Cohere Embed Multilingual - Multilingual text embeddings

Image Models

Stable Image Ultra - High-quality image generation

View our Model Cards for detailed information about each model’s capabilities, pricing, and features.

Quick Start

Create a Heroku app

heroku create my-ai-app

Provision a model

heroku ai:models:create claude-4-sonnet --app my-ai-app

Attach the model to the app and generate an alias you can reuse.

Get your credentials

heroku config --app my-ai-app

Copy the INFERENCE_KEY, INFERENCE_URL, and INFERENCE_MODEL_ID.

Call the API

Use your preferred language or cURL to hit the Chat Completions endpoint.

Need end-to-end examples in cURL, Python, TypeScript, and Java? Follow the Quickstart to copy ready-to-run snippets.

Base URL

All API requests use the following base URL:

https://us.inference.heroku.com

Authentication

All requests must include an Authorization header with your Heroku Inference API key:

Authorization: Bearer YOUR_INFERENCE_KEY

Rate Limits & Quotas

Rate limits and quotas vary by plan and model. See your Heroku Dashboard for current usage and limits.

Additional Resources

AI Studio

Visual interface for testing models

Heroku Tools

Built-in tools for agents

Working with MCP

Deploy custom MCP tools

AI Integrations

Framework integrations

Get started

Core concepts

Agents

Tools

Evaluation

Integrations

Reference

Cookbook

Available Endpoints

Supported Models

Chat Models

Embedding Models

Image Models

Quick Start

Base URL

Authentication

Rate Limits & Quotas

Additional Resources

AI Studio

Heroku Tools

Working with MCP

AI Integrations

Get started

Core concepts

Agents

Tools

Evaluation

Integrations

Reference

Cookbook

​Available Endpoints

​Supported Models

​Chat Models

​Embedding Models

​Image Models

​Quick Start

​Base URL

​Authentication

​Rate Limits & Quotas

​Additional Resources

AI Studio

Heroku Tools

Working with MCP

AI Integrations

Available Endpoints

Supported Models

Chat Models

Embedding Models

Image Models

Quick Start

Base URL

Authentication

Rate Limits & Quotas

Additional Resources