Skip to main content

Available Endpoints

The Heroku Managed Inference and Agents API exposes endpoints for chat, agents, embeddings, images, and tooling:

Supported Models

Chat Models

The following chat models support the Chat Completions and Agents endpoints:
  • Claude 4 Sonnet - Latest flagship model with extended thinking
  • Claude 3.7 Sonnet - High intelligence with extended thinking
  • Claude 3.5 Sonnet - Balance of intelligence and speed
  • Claude 3.5 Haiku - Fast and cost-effective
  • Claude 3.0 Haiku - Ultra-fast responses
  • Amazon Nova Pro - Amazon’s advanced model
  • Amazon Nova Lite - Amazon’s efficient model
  • OpenAI GPT OSS 120B - Open-source compatible model

Embedding Models

  • Cohere Embed Multilingual - Multilingual text embeddings

Image Models

  • Stable Image Ultra - High-quality image generation
View our Model Cards for detailed information about each model’s capabilities, pricing, and features.

Quick Start

1

Create a Heroku app

heroku create my-ai-app
2

Provision a model

heroku ai:models:create claude-4-sonnet --app my-ai-app
Attach the model to the app and generate an alias you can reuse.
3

Get your credentials

heroku config --app my-ai-app
Copy the INFERENCE_KEY, INFERENCE_URL, and INFERENCE_MODEL_ID.
4

Call the API

Use your preferred language or cURL to hit the Chat Completions endpoint.
Need end-to-end examples in cURL, Python, TypeScript, and Java? Follow the Quickstart to copy ready-to-run snippets.

Base URL

All API requests use the following base URL:
https://us.inference.heroku.com

Authentication

All requests must include an Authorization header with your Heroku Inference API key:
Authorization: Bearer YOUR_INFERENCE_KEY

Rate Limits & Quotas

Rate limits and quotas vary by plan and model. See your Heroku Dashboard for current usage and limits.

Additional Resources

AI Studio

Visual interface for testing models

Heroku Tools

Built-in tools for agents

Working with MCP

Deploy custom MCP tools

AI Integrations

Framework integrations