Prerequisites
Before you begin, ensure you have:

- Heroku Managed Inference and Agents add-on provisioned
- An API key from your Heroku dashboard
- The OpenAI SDK installed for your language
Why use OpenAI SDK compatibility?
OpenAI SDK compatibility offers several advantages:

- Familiar interface: Use the same code patterns you already know from OpenAI
- Easy migration: Switch between OpenAI and Heroku models with minimal code changes
- Broad ecosystem: Leverage tools and frameworks built for OpenAI SDK
- Multi-language support: Available in Python, Node.js, and other languages
Quick start
Python
Install the OpenAI Python SDK with `pip install openai`.

Node.js
Install the OpenAI Node.js SDK with `npm install openai`.

Environment variables
Set up your environment with the add-on’s config vars, typically `INFERENCE_URL`, `INFERENCE_KEY`, and `INFERENCE_MODEL_ID`.

Supported parameters
Heroku’s OpenAI-compatible API supports most standard chat completion parameters.

Request parameters
Core parameters
| Parameter | Support level | Description |
|---|---|---|
| `model` | Full | Model identifier (e.g., `claude-4-5-sonnet`) |
| `messages` | Full | Array of conversation messages |
| `max_tokens` | Full | Maximum tokens to generate |
| `temperature` | Full | Sampling temperature (0–1) |
| `top_p` | Full | Nucleus sampling parameter |
| `stream` | Full | Enable streaming responses |
| `stop` | Full | Stop sequences |
Advanced parameters
| Parameter | Support level | Description |
|---|---|---|
| `tools` | Full | Function calling tools |
| `tool_choice` | Full | Control tool selection |
| `response_format` | Partial | Structured output format |
| `seed` | Ignored | Deterministic sampling seed |
| `n` | Ignored | Number of completions |
| `presence_penalty` | Ignored | Presence penalty |
| `frequency_penalty` | Ignored | Frequency penalty |
| `logit_bias` | Ignored | Token probability bias |
Metadata parameters
| Parameter | Support level | Description |
|---|---|---|
| `user` | Full | End-user identifier |
| `metadata` | Partial | Request metadata |
Ignored parameters: Parameters marked as “Ignored” are accepted by the API but have no effect on the response. Set `allow_ignored_params: false` to receive errors instead.

Response structure
The response follows OpenAI’s standard format: an `id`, a `choices` array (each entry with a `message` and `finish_reason`), and `usage` token counts.

Extended thinking
Claude 4.5 Sonnet, Claude 4 Sonnet, and Claude 3.7 Sonnet support extended thinking for complex reasoning tasks.

Enabling extended thinking
Use the `extra_body` parameter to configure extended thinking:
Extended thinking options
| Option | Type | Description |
|---|---|---|
| `enabled` | boolean | Enable extended thinking mode |
| `budget_tokens` | integer | Token budget for reasoning (max varies by model) |
| `include_reasoning` | boolean | Include reasoning in response (default: false) |
Streaming responses
Enable streaming to receive responses as they’re generated by setting `stream=True`.

Streaming benefits
- Lower perceived latency for users
- Display partial results immediately
- Better UX for longer responses
- Efficient resource usage
Tool calling
Function calling allows models to interact with external tools and APIs.

Basic tool usage
Tool calling requirements

Messages in a tool-calling conversation must include a non-empty `content` field (see Limitations and constraints).
Processing tool calls
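A sketch of the processing loop, using plain dicts for clarity (the SDK returns typed objects with the same field names); the non-empty `content` requirement is handled with a placeholder string.

```python
import json

def run_tool_calls(message, available_tools):
    """Execute requested tools and return the messages to append.

    `message` is an assistant message (as a dict) that requested tool
    calls; `available_tools` maps tool names to Python callables.
    """
    follow_ups = [
        {
            "role": "assistant",
            # Heroku requires a non-empty content field on messages,
            # so fall back to a placeholder alongside the tool calls.
            "content": message.get("content") or "Calling tools.",
            "tool_calls": message["tool_calls"],
        }
    ]
    for call in message["tool_calls"]:
        fn = available_tools[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as JSON text
        result = fn(**args)
        follow_ups.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],   # ties the result to the request
                "content": json.dumps(result),
            }
        )
    return follow_ups
```

The returned messages are appended to the conversation and sent back in a second `chat.completions.create` call so the model can use the tool results.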
Model compatibility
All models available through Heroku Managed Inference and Agents work with the OpenAI SDK:

| Model family | OpenAI SDK support | Extended thinking | Vision |
|---|---|---|---|
| Claude 4.5 Sonnet | ✓ | ✓ | ✓ |
| Claude 4.5 Haiku | ✓ | ✗ | ✗ |
| Claude 4 Sonnet | ✓ | ✓ | ✓ |
| Claude 3.7 Sonnet | ✓ | ✓ | ✓ |
| Claude 3.5 Sonnet | ✓ | Optional | ✓ |
| Claude 3.5 Haiku | ✓ | ✗ | ✗ |
| Amazon Nova Pro | ✓ | ✗ | ✗ |
| Amazon Nova Lite | ✓ | ✗ | ✗ |
Vision capabilities require sending images in base64 format or via URLs in the message content.
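As a sketch of the base64 path: the content-part shape below follows OpenAI’s vision message format (a text part plus an `image_url` part with a data URL); whether Heroku accepts this exact shape is an assumption to verify.

```python
import base64

def image_message(text, image_bytes, mime="image/png"):
    """Build a user message pairing text with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # A data URL embeds the image directly; a plain https URL
            # works here too for the URL-based path.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Usage: pass the message to client.chat.completions.create(...) with a
# vision-capable model such as Claude 4.5 Sonnet. The bytes here are a
# placeholder, not a real image.
msg = image_message("Describe this image.", b"\x89PNG\r\n\x1a\n")
```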
Limitations and constraints
Known limitations
- Parameter support: Some OpenAI parameters are ignored (seed, n, penalties)
- Response format: Limited support for structured output schemas
- Tool calling: Requires non-empty content field in messages
- Extended thinking: Only available for specific Claude models
- Batch API: Not currently supported
Error handling
Handle API errors using standard try/except patterns; the SDK raises typed exceptions such as rate-limit and connection errors.

Migration from OpenAI
Switching from OpenAI to Heroku requires minimal code changes: typically just pointing the client’s `base_url` and `api_key` at the add-on’s values.

Best practices
Performance optimization
- Use `claude-4-5-haiku` for high-volume, latency-sensitive workloads
- Enable streaming for better perceived performance
- Set appropriate `max_tokens` limits to control costs
- Cache frequently-used prompts when possible
Error handling
- Implement exponential backoff for rate limit errors
- Validate inputs before sending to the API
- Handle streaming disconnections gracefully
- Log errors with request IDs for debugging
Security
- Never expose API keys in client-side code
- Use environment variables for credentials
- Validate user inputs to prevent prompt injection
- Monitor usage patterns for anomalies
Cost management
- Choose appropriate models for each use case
- Set `max_tokens` to prevent runaway generation
- Use extended thinking only when necessary
- Monitor token usage in the Heroku Dashboard
Additional resources
- Chat Completions API: Native API reference with full parameter details
- Models overview: Compare available models and capabilities
- Pricing: Understand token costs and optimize spend
- OpenAI SDK reference apps: Example applications using the OpenAI SDK