Heroku Managed Inference and Agents provides OpenAI SDK compatibility, allowing you to use familiar OpenAI client libraries to interact with Claude, Amazon Nova, and other models available on the platform.

Prerequisites

Before you begin, ensure you have:
  • Heroku Managed Inference and Agents add-on provisioned
  • An API key from your Heroku dashboard
  • The OpenAI SDK installed for your language

Why use OpenAI SDK compatibility?

OpenAI SDK compatibility offers several advantages:
  • Familiar interface: Use the same code patterns you already know from OpenAI
  • Easy migration: Switch between OpenAI and Heroku models with minimal code changes
  • Broad ecosystem: Leverage tools and frameworks built for OpenAI SDK
  • Multi-language support: Available in Python, Node.js, and other languages

Quick start

Python

Install the OpenAI Python SDK:
pip install openai
Configure the client to use Heroku’s inference endpoint:
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("INFERENCE_KEY"),
    base_url=os.getenv("INFERENCE_URL") + "/v1/"
)

response = client.chat.completions.create(
    model="claude-4-5-sonnet",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

Node.js

Install the OpenAI Node.js SDK:
npm install openai
Configure the client:
import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: process.env.INFERENCE_KEY,
    baseURL: process.env.INFERENCE_URL + '/v1/'
});

const response = await client.chat.completions.create({
    model: 'claude-4-5-sonnet',
    messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Explain quantum computing in simple terms.' }
    ]
});

console.log(response.choices[0].message.content);

Environment variables

Set up your environment with these variables:
export INFERENCE_KEY="your-api-key"
export INFERENCE_URL="https://inference.heroku.com"
export INFERENCE_MODEL_ID="claude-4-5-sonnet"
View your API credentials in the Heroku Dashboard.
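Note that the string concatenation shown in the quick start produces a double slash if INFERENCE_URL happens to end with /. A small helper (hypothetical, not part of the SDK) makes the join robust:

```python
def inference_base_url(inference_url: str) -> str:
    """Join INFERENCE_URL with the /v1/ path, tolerating a trailing slash."""
    return inference_url.rstrip("/") + "/v1/"

# Usage with the OpenAI client:
# client = OpenAI(
#     api_key=os.environ["INFERENCE_KEY"],
#     base_url=inference_base_url(os.environ["INFERENCE_URL"]),
# )
```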

Supported parameters

Heroku’s OpenAI-compatible API supports most standard chat completion parameters.

Request parameters

Parameter           Support level   Description
model               Full            Model identifier (e.g., claude-4-5-sonnet)
messages            Full            Array of conversation messages
max_tokens          Full            Maximum tokens to generate
temperature         Full            Sampling temperature (0-1)
top_p               Full            Nucleus sampling parameter
stream              Full            Enable streaming responses
stop                Full            Stop sequences
tools               Full            Function calling tools
tool_choice         Full            Control tool selection
response_format     Partial         Structured output format
seed                Ignored         Deterministic sampling seed
n                   Ignored         Number of completions
presence_penalty    Ignored         Presence penalty
frequency_penalty   Ignored         Frequency penalty
logit_bias          Ignored         Token probability bias
user                Full            End-user identifier
metadata            Partial         Request metadata
Ignored parameters: Parameters marked “Ignored” are accepted by the API but have no effect on the response. Set allow_ignored_params: false in your request to receive an error instead.
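If you want ignored parameters to fail loudly, one way to wire the flag in is through extra_body. The helper below is a hypothetical convenience, and the assumption that allow_ignored_params travels in the request body should be checked against the Chat Completions API reference:

```python
def strict_request_kwargs(**kwargs):
    """Merge allow_ignored_params=False into a chat-completions request."""
    extra = dict(kwargs.pop("extra_body", {}) or {})
    extra.setdefault("allow_ignored_params", False)
    return {**kwargs, "extra_body": extra}

# Usage sketch:
# client.chat.completions.create(**strict_request_kwargs(
#     model="claude-4-5-sonnet",
#     messages=[{"role": "user", "content": "Hello"}],
#     seed=42,  # would now raise an error instead of being silently ignored
# ))
```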

Response structure

The response follows OpenAI’s standard format:
{
  "id": "msg_abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "claude-4-5-sonnet",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
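The usage object maps directly to billing, so it is worth logging. As a small illustration, a helper (hypothetical) that formats the token accounting from the example above:

```python
def summarize_usage(usage: dict) -> str:
    """Format prompt/completion/total token counts for logging."""
    return (f"{usage['prompt_tokens']} prompt + "
            f"{usage['completion_tokens']} completion = "
            f"{usage['total_tokens']} tokens")

print(summarize_usage({"prompt_tokens": 20, "completion_tokens": 50, "total_tokens": 70}))
# prints: 20 prompt + 50 completion = 70 tokens
```

With an SDK response object, the same fields are available as attributes, e.g. response.usage.total_tokens.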

Extended thinking

Claude 4.5 Sonnet, Claude 4 Sonnet, and Claude 3.7 Sonnet support extended thinking for complex reasoning tasks.

Enabling extended thinking

Use the extra_body parameter to configure extended thinking:
response = client.chat.completions.create(
    model="claude-4-5-sonnet",
    messages=[
        {"role": "user", "content": "Analyze the pros and cons of different database architectures."}
    ],
    extra_body={
        "extended_thinking": {
            "enabled": True,
            "budget_tokens": 5000
        }
    }
)

Extended thinking options

Option              Type      Description
enabled             boolean   Enable extended thinking mode
budget_tokens       integer   Token budget for reasoning (max varies by model)
include_reasoning   boolean   Include reasoning in response (default: false)
Extended thinking increases output token usage. Set a token budget to control costs while enabling deeper analysis.

Streaming responses

Enable streaming to receive responses as they’re generated:
response = client.chat.completions.create(
    model="claude-4-5-haiku",
    messages=[
        {"role": "user", "content": "Write a short story about AI."}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
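If you also need the complete reply after streaming finishes, collect the deltas as they arrive. A sketch, where accumulate is a hypothetical helper rather than an SDK function:

```python
def accumulate(deltas) -> str:
    """Join streamed content deltas into the full reply, skipping empty ones."""
    return "".join(d for d in deltas if d)

# With a live stream:
# full_text = accumulate(chunk.choices[0].delta.content for chunk in response)
```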

Streaming benefits

  • Lower perceived latency for users
  • Display partial results immediately
  • Better UX for longer responses
  • Efficient resource usage

Tool calling

Function calling allows models to interact with external tools and APIs.

Basic tool usage

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="claude-4-5-sonnet",
    messages=[
        {"role": "user", "content": "What's the weather in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto"
)

Tool calling requirements

When using tools, ensure each message includes a non-empty content field. Messages with only tool calls and no content may cause errors.
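A defensive way to honor this requirement is to normalize messages before appending them to the conversation history. Both with_nonempty_content and its "(tool call)" placeholder below are hypothetical, not part of the API:

```python
def with_nonempty_content(message: dict) -> dict:
    """Return the message with a guaranteed non-empty content field."""
    if not message.get("content"):
        return {**message, "content": "(tool call)"}
    return message
```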

Processing tool calls

import json

# Check whether the model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    # Execute the function (execute_function is your own dispatcher)
    result = execute_function(function_name, function_args)

    # Append the assistant's tool call to the conversation history
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })

    # Get final response
    final_response = client.chat.completions.create(
        model="claude-4-5-sonnet",
        messages=messages,
        tools=tools
    )
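The execute_function call above is left abstract. A minimal dispatcher can be a dict mapping tool names to local handlers; the get_weather stub below is a placeholder, not a real weather client:

```python
def get_weather(location: str) -> dict:
    # Stub: a real implementation would call a weather service here.
    return {"location": location, "temperature_c": 18, "conditions": "fog"}

FUNCTIONS = {"get_weather": get_weather}

def execute_function(name: str, args: dict):
    """Dispatch a tool call to a local handler by name."""
    handler = FUNCTIONS.get(name)
    if handler is None:
        # Return the error to the model rather than raising,
        # so it can recover in the next turn.
        return {"error": f"unknown function: {name}"}
    return handler(**args)
```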

Model compatibility

All models available through Heroku Managed Inference and Agents work with the OpenAI SDK:
Model family         OpenAI SDK support   Extended thinking
Claude 4.5 Sonnet    Yes                  Yes
Claude 4.5 Haiku     Yes                  No
Claude 4 Sonnet      Yes                  Yes
Claude 3.7 Sonnet    Yes                  Yes
Claude 3.5 Sonnet    Yes                  No
Claude 3.5 Haiku     Yes                  No
Amazon Nova Pro      Yes                  No
Amazon Nova Lite     Yes                  No
Vision capabilities require sending images in base64 format or via URLs in the message content.

Limitations and constraints

Known limitations

  1. Parameter support: Some OpenAI parameters are ignored (seed, n, penalties)
  2. Response format: Limited support for structured output schemas
  3. Tool calling: Requires non-empty content field in messages
  4. Extended thinking: Only available for specific Claude models
  5. Batch API: Not currently supported

Error handling

Handle API errors using standard try/except patterns:
from openai import OpenAI, APIError, APIStatusError

try:
    response = client.chat.completions.create(
        model="claude-4-5-sonnet",
        messages=[{"role": "user", "content": "Hello"}]
    )
except APIStatusError as e:
    # HTTP errors carry a status code and message
    print(f"API error: {e.status_code} - {e.message}")
except APIError as e:
    # Connection failures and other client-side errors
    print(f"API error: {e}")
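The best practices in this guide recommend exponential backoff for rate limit errors. A sketch of the delay schedule follows; the retry loop around the SDK call is commented out because it needs live credentials:

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays with jitter, capped at `cap` seconds."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay * (0.5 + random.random() / 2)  # jitter in [0.5x, 1.0x)

# Usage sketch (RateLimitError is part of the OpenAI SDK):
# import time
# from openai import RateLimitError
# for delay in backoff_delays():
#     try:
#         response = client.chat.completions.create(model=..., messages=...)
#         break
#     except RateLimitError:
#         time.sleep(delay)
```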

Migration from OpenAI

Switching from OpenAI to Heroku requires minimal code changes:
1. Update base URL

Point the SDK to Heroku’s endpoint:
client = OpenAI(
    api_key=os.getenv("INFERENCE_KEY"),
    base_url=os.getenv("INFERENCE_URL") + "/v1/"
)
2. Update model names

Replace OpenAI model names with Heroku models:
# Before: model="gpt-4"
# After:  model="claude-4-5-sonnet"
3. Test and validate

Test your application thoroughly, especially if using:
  • Function calling
  • Streaming responses
  • Vision capabilities

Best practices

Performance
  • Use claude-4-5-haiku for high-volume, latency-sensitive workloads
  • Enable streaming for better perceived performance
  • Set appropriate max_tokens limits to control costs
  • Cache frequently-used prompts when possible

Reliability
  • Implement exponential backoff for rate limit errors
  • Validate inputs before sending to the API
  • Handle streaming disconnections gracefully
  • Log errors with request IDs for debugging

Security
  • Never expose API keys in client-side code
  • Use environment variables for credentials
  • Validate user inputs to prevent prompt injection
  • Monitor usage patterns for anomalies

Cost management
  • Choose appropriate models for each use case
  • Set max_tokens to prevent runaway generation
  • Use extended thinking only when necessary
  • Monitor token usage in the Heroku Dashboard

Additional resources

  • Chat Completions API: Native API reference with full parameter details
  • Models overview: Compare available models and capabilities
  • Pricing: Understand token costs and optimize spend
  • OpenAI SDK reference apps: Example applications using the OpenAI SDK