Migrating from OpenAI to Heroku AI is straightforward because Heroku AI provides full OpenAI SDK compatibility. In most cases, you only need to change your base URL and API key—your existing code, prompts, and integrations continue to work. This guide covers everything you need to know to migrate: configuration changes, model selection, feature differences, and testing strategies.
Already using Anthropic? If you’re migrating from Anthropic’s API (not OpenAI), you can use the Messages API instead. It provides native Anthropic SDK compatibility—just change your base URL and API key.

Why Migrate to Heroku AI

Teams migrate to Heroku AI for several reasons:
  • Unified platform: Manage your AI models alongside your Heroku applications, databases, and add-ons
  • Model variety: Access Claude (Anthropic), Nova (Amazon), and other models through a single API
  • Simplified billing: One invoice for compute, add-ons, and AI usage
  • Enterprise features: Heroku’s security, compliance, and support infrastructure

Quick Migration (5 Minutes)

For applications using the OpenAI Python or Node.js SDK, migration requires only two configuration changes:
  1. Change the base URL from api.openai.com to us.inference.heroku.com
  2. Replace your OpenAI API key with your Heroku INFERENCE_KEY

Python

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://us.inference.heroku.com/v1",  # was: https://api.openai.com/v1
    api_key=os.environ["INFERENCE_KEY"]             # was: OpenAI API key (sk-...)
)

response = client.chat.completions.create(
    model="claude-4-5-sonnet",  # was: gpt-4
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Heroku?"}
    ]
)

print(response.choices[0].message.content)

TypeScript / Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://us.inference.heroku.com/v1',  // was: https://api.openai.com/v1
  apiKey: process.env.INFERENCE_KEY,              // was: OpenAI API key (sk-...)
});

const response = await client.chat.completions.create({
  model: 'claude-4-5-sonnet',  // was: gpt-4
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is Heroku?' }
  ],
});

console.log(response.choices[0].message.content);

Environment Setup

Set up your environment variables for local development:
# .env file for local development
INFERENCE_URL=https://us.inference.heroku.com
INFERENCE_KEY=inf-your-api-key-here
INFERENCE_MODEL_ID=claude-4-5-sonnet
For Heroku deployment, these variables are set automatically when you provision a model:
# Provision a model (sets environment variables automatically)
heroku ai:models:create claude-4-5-sonnet -a your-app-name

# Verify the variables are set
heroku config -a your-app-name | grep INFERENCE

Endpoint Mapping

Heroku AI implements the same endpoints as OpenAI:
| OpenAI Endpoint | Heroku AI Endpoint | Status |
|---|---|---|
| POST /v1/chat/completions | POST /v1/chat/completions | Fully compatible |
| POST /v1/embeddings | POST /v1/embeddings | Fully compatible |
| POST /v1/images/generations | POST /v1/images/generations | Fully compatible |
| POST /v1/moderations | — | Not available |
| POST /v1/audio/transcriptions | — | Not available |
| POST /v1/audio/speech | — | Not available |
| GET /v1/models | — | Not available |
| POST /v1/files | — | Not available |
| POST /v1/fine-tuning/jobs | — | Not available |
Note: Heroku AI also offers unique endpoints not available on OpenAI:
| Heroku AI Endpoint | Description |
|---|---|
| POST /v1/messages | Native Anthropic Messages API for Claude (use Anthropic SDK) |
| POST /v1/agents/heroku | Agentic loop with autonomous tool execution |
| POST /v1/rerank | Document reranking for improved RAG |
| GET /v1/mcp/servers | MCP server management |

Model Selection

Choose a Heroku AI model based on your current OpenAI model and use case:
| OpenAI Model | Recommended Heroku AI Model | Notes |
|---|---|---|
| gpt-4o | claude-4-5-sonnet | Best general-purpose replacement |
| gpt-4-turbo | claude-4-5-sonnet | Comparable capabilities |
| gpt-4 | claude-4-sonnet | Strong reasoning |
| gpt-3.5-turbo | claude-4-5-haiku | Fast, cost-effective |
| text-embedding-3-small | cohere-embed-multilingual | 1024 dimensions |
| text-embedding-3-large | cohere-embed-multilingual | Use for multilingual |
| dall-e-3 | stable-image-ultra | High-quality images |
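For a gradual migration, the mapping above can be captured in code so call sites don't all have to change at once. A minimal sketch — the `heroku_model` helper and its fallback default are illustrative choices, not Heroku recommendations:

```python
# Mapping taken from the table above; extend it as your usage requires.
MODEL_MAP = {
    "gpt-4o": "claude-4-5-sonnet",
    "gpt-4-turbo": "claude-4-5-sonnet",
    "gpt-4": "claude-4-sonnet",
    "gpt-3.5-turbo": "claude-4-5-haiku",
    "text-embedding-3-small": "cohere-embed-multilingual",
    "text-embedding-3-large": "cohere-embed-multilingual",
    "dall-e-3": "stable-image-ultra",
}

def heroku_model(openai_model: str) -> str:
    """Translate an OpenAI model name to its Heroku AI counterpart.

    Falls back to claude-4-5-sonnet for unrecognized names (an assumed
    default -- pick whatever suits your workload).
    """
    return MODEL_MAP.get(openai_model, "claude-4-5-sonnet")
```

This lets existing code keep passing OpenAI model names while a single lookup decides the Heroku AI target.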

Model-Specific Considerations

Claude models (Anthropic)

Claude models handle prompts slightly differently than GPT models:
# GPT-4 style prompt
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to sort a list."}
]

# Works identically with Claude - no changes needed
response = client.chat.completions.create(
    model="claude-4-5-sonnet",
    messages=messages
)
Claude tends to be more verbose by default. To get concise responses, explicitly request them:
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Keep responses brief and direct."
    },
    {"role": "user", "content": "What is Heroku?"}
]
Nova models (Amazon)

Nova models follow the same API but may interpret instructions differently. Test your prompts after migration:
# Works with both GPT and Nova
response = client.chat.completions.create(
    model="nova-pro",  # or nova-lite for faster responses
    messages=[{"role": "user", "content": "Summarize this article..."}]
)

Choosing the Right Model

| Priority | Recommended Model | Reason |
|---|---|---|
| Lowest latency | claude-4-5-haiku | Fastest response times |
| Complex reasoning | claude-4-5-sonnet | Best for analysis, coding |
| Cost optimization | nova-lite | Good quality, lower cost |
| Extended context | claude-4-5-sonnet | 200K token context |
| Image generation | stable-image-ultra | High-quality outputs |

Feature Parity

Fully Supported Features

These OpenAI features work identically on Heroku AI:
| Feature | Support Level | Notes |
|---|---|---|
| Chat completions | Full | Identical API |
| Streaming | Full | SSE format identical |
| Function calling / Tools | Full | Same schema format |
| JSON mode | Full | response_format: {"type": "json_object"} |
| System messages | Full | Identical behavior |
| Multi-turn conversations | Full | Same message format |
| Token usage reporting | Full | Same usage object |
| Temperature / top_p | Full | Same parameters |
| Stop sequences | Full | Same behavior |
| Embeddings | Full | Different model names |

Features with Differences

Some features work differently or have limitations:
| Feature | Difference | Workaround |
|---|---|---|
| seed parameter | Ignored | Deterministic outputs not guaranteed |
| n parameter | Ignored | Generate multiple completions with separate calls |
| logprobs | Not available | — |
| presence_penalty | Ignored | Use prompt instructions for variety |
| frequency_penalty | Ignored | Use prompt instructions for variety |
| Vision / image input | Model-dependent | Use Claude 4.5 Sonnet for vision |
Example handling the n parameter:
# OpenAI: Generate 3 completions at once
# response = client.chat.completions.create(model="gpt-4", n=3, ...)

# Heroku AI: Generate 3 completions with separate calls
completions = []
for _ in range(3):
    response = client.chat.completions.create(
        model="claude-4-5-sonnet",
        messages=[{"role": "user", "content": "Write a haiku"}],
        temperature=0.9  # Higher temperature for variety
    )
    completions.append(response.choices[0].message.content)

Features Not Available

These OpenAI features are not currently available on Heroku AI:
| Feature | Alternative |
|---|---|
| Fine-tuning | Use detailed system prompts and few-shot examples |
| Assistants API | Use the Agents endpoint with MCP tools |
| Audio transcription | Use a dedicated transcription service |
| Text-to-speech | Use a dedicated TTS service |
| Moderation | Implement custom content filtering |
| Batch API | Process requests sequentially with rate limiting |
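In place of the Batch API, requests can be processed sequentially with client-side pacing. A minimal sketch — `process_batch` and the 30-requests-per-minute default are illustrative, not part of any Heroku API; check your plan's actual limits:

```python
import time

def process_batch(items, call, requests_per_minute=30):
    """Process items one at a time, spacing calls to stay under a rate limit.

    `call` is any function that takes one item and returns a result, e.g.
    a lambda wrapping client.chat.completions.create. The rate limit here
    is an assumed placeholder, not a documented Heroku quota.
    """
    interval = 60.0 / requests_per_minute
    results = []
    for item in items:
        start = time.monotonic()
        results.append(call(item))
        # Sleep off the remainder of the per-request interval, if any.
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return results
```

For example: `process_batch(prompts, lambda p: client.chat.completions.create(model=model, messages=[{"role": "user", "content": p}]))`.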

Code Migration Examples

Basic Chat Completion

The simplest migration—just change configuration:
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["INFERENCE_URL"] + "/v1",
    api_key=os.environ["INFERENCE_KEY"]  # was: OpenAI() using OPENAI_API_KEY
)

response = client.chat.completions.create(
    model=os.environ["INFERENCE_MODEL_ID"],  # was: gpt-4o
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Streaming Responses

Streaming works identically:
stream = client.chat.completions.create(
    model="claude-4-5-sonnet",  # was: gpt-4o
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Function Calling / Tools

Tool definitions work identically:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="claude-4-5-sonnet",  # was: gpt-4o
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools
)

Embeddings

Embeddings use a different model name but the same API:
response = client.embeddings.create(
    model="cohere-embed-multilingual",  # was: text-embedding-3-small
    input=["Hello world", "Goodbye world"]
)

for i, embedding in enumerate(response.data):
    print(f"Text {i}: {len(embedding.embedding)} dimensions")
Note: Cohere embeddings require an input_type parameter. Use "search_document" for content you’re indexing and "search_query" for search queries.
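The OpenAI SDK has no first-class input_type parameter, but it can be forwarded through `extra_body`, which the SDK merges into the request payload. A sketch assuming the Heroku endpoint accepts the field alongside the standard ones — the `embed` helper itself is illustrative:

```python
def embed(client, texts, input_type):
    """Embed texts with a Cohere model on Heroku AI.

    input_type should be "search_document" when indexing content and
    "search_query" when embedding a user's search query.
    """
    return client.embeddings.create(
        model="cohere-embed-multilingual",
        input=texts,
        # The OpenAI SDK forwards fields passed via extra_body verbatim.
        extra_body={"input_type": input_type},
    )
```

Index documents with `embed(client, docs, "search_document")` and embed queries with `embed(client, [query], "search_query")` so both sides of the search use matching representations.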

Environment Configuration

Local Development

Create a .env file for local development:
# .env
INFERENCE_URL=https://us.inference.heroku.com
INFERENCE_KEY=inf-your-key-here
INFERENCE_MODEL_ID=claude-4-5-sonnet
Load it in your application:
from dotenv import load_dotenv
load_dotenv()

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("INFERENCE_URL") + "/v1",
    api_key=os.getenv("INFERENCE_KEY")
)

Heroku Deployment

Environment variables are set automatically when you provision a model:
# Provision the model
heroku ai:models:create claude-4-5-sonnet -a your-app-name

# Variables are now set:
# INFERENCE_KEY=inf-...
# INFERENCE_URL=https://us.inference.heroku.com
# INFERENCE_MODEL_ID=claude-4-5-sonnet
For multiple models, each add-on gets a unique prefix:
# List all inference-related config vars
heroku config -a your-app-name | grep -E "INFERENCE|HEROKU_INFERENCE"

Multiple Environments

For staging/production parity:
# Production
heroku config:set NODE_ENV=production -a your-app-production

# Staging (uses same model)
heroku addons:attach your-app-production::inference -a your-app-staging

Testing Your Migration

Verification Checklist

After migrating, verify each functionality:
  • Basic chat completions work
  • Streaming responses render correctly
  • Function/tool calls execute properly
  • Error handling catches API errors
  • Rate limiting is handled gracefully
  • Token usage is tracked correctly
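For the rate-limiting item above, a small retry wrapper with exponential backoff usually suffices. A sketch — `with_backoff` and its defaults are illustrative, not Heroku-documented values; pass the SDK's `openai.RateLimitError` as `retry_on`:

```python
import random
import time

def with_backoff(call, retry_on, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable when it raises a retryable error.

    Sleeps base_delay * 2**attempt seconds (plus a little jitter) between
    attempts, and re-raises after max_retries failures.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```

For example: `with_backoff(lambda: client.chat.completions.create(model=model, messages=messages), retry_on=openai.RateLimitError)`.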

Test Script

Run this script to verify your migration:
import os
from openai import OpenAI

def test_migration():
    client = OpenAI(
        base_url=os.getenv("INFERENCE_URL") + "/v1",
        api_key=os.getenv("INFERENCE_KEY")
    )
    model = os.getenv("INFERENCE_MODEL_ID")

    print(f"Testing with model: {model}")
    print("-" * 40)

    # Test 1: Basic completion
    print("Test 1: Basic completion...")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say 'Migration successful!' and nothing else."}],
        max_tokens=20
    )
    assert "successful" in response.choices[0].message.content.lower()
    print(f"  ✓ Response: {response.choices[0].message.content}")

    # Test 2: Streaming
    print("Test 2: Streaming...")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Count from 1 to 3."}],
        stream=True,
        max_tokens=20
    )
    chunks = list(stream)
    assert len(chunks) > 1
    print(f"  ✓ Received {len(chunks)} chunks")

    # Test 3: Token usage
    print("Test 3: Token usage tracking...")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=10
    )
    assert response.usage.prompt_tokens > 0
    assert response.usage.completion_tokens > 0
    print(f"  ✓ Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")

    print("-" * 40)
    print("All tests passed! Migration successful.")

if __name__ == "__main__":
    test_migration()

Common Migration Issues

| Issue | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Wrong API key | Use INFERENCE_KEY, not OpenAI key |
| 403 Forbidden | Wrong model | Use provisioned model from INFERENCE_MODEL_ID |
| Different output style | Model behavior | Adjust prompts for Claude/Nova specifics |
| Missing n parameter | Not supported | Make multiple calls for multiple completions |

Additional Resources