Error Response Format
When an error occurs, the API returns a JSON response with the following structure:

| Field | Type | Description |
|---|---|---|
| `code` | integer | The HTTP status code (e.g., 400, 401, 429) |
| `message` | string | A human-readable description of what went wrong |
| `type` | string | A machine-readable error category for programmatic handling |
The `type` field helps you categorize errors in your code. Common types include:

- `invalid_request_error` - The request was malformed or missing required fields
- `authentication_error` - The API key is invalid or missing
- `authorization_error` - The API key doesn't have access to the requested resource
- `rate_limit_error` - Too many requests in a given time period
- `server_error` - An internal error occurred on Heroku's servers
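Putting the fields together, a rate-limited request might return a body like this (illustrative message text; fields as described above):

```json
{
  "code": 429,
  "message": "Rate limit exceeded. Please retry your request later.",
  "type": "rate_limit_error"
}
```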
HTTP Status Codes
400 Bad Request
A 400 error indicates that your request was malformed or contained invalid parameters. The API could not process the request because something in the request body, query parameters, or headers was incorrect. Common causes:

- Missing required fields (`model`, `messages`)
- Invalid JSON syntax in the request body
- Parameter values outside allowed ranges (e.g., `temperature` > 1.0)
- Malformed or nonexistent model names
- Invalid message format or role values
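For reference, a minimal well-formed request body looks like this (a sketch; the model ID follows this guide's examples and must match a model actually provisioned for your app):

```python
import json

# A minimal, well-formed chat completions payload: the model ID uses dashes
# ("claude-4-5-sonnet"), and messages is a non-empty array of role/content objects.
payload = {
    "model": "claude-4-5-sonnet",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,  # within the allowed range
}
body = json.dumps(payload)
```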
To resolve a 400 error:

- Ensure `model` matches an available model ID (e.g., `claude-4-5-sonnet`, not `claude-4.5-sonnet`)
- Verify `messages` is an array with at least one message object
- Confirm each message has both `role` and `content` fields
- Check that numeric parameters are within valid ranges
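The checklist above is easy to automate as a pre-flight validation before sending a request (a sketch, assuming the 0.0-1.0 temperature range implied earlier):

```python
def validate_request(payload):
    """Pre-flight checks mirroring the checklist above; raises ValueError on failure."""
    if not payload.get("model"):
        raise ValueError("model is required")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty array")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs both role and content")
    temperature = payload.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
```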
401 Unauthorized
A 401 error means the API could not authenticate your request. This typically indicates a problem with your API key. Common causes:

- Missing `Authorization` header
- API key is malformed or contains extra whitespace
- API key has been regenerated and the old key is no longer valid
- Using the wrong environment's API key (production vs. staging)
To resolve a 401 error:

- Retrieve a fresh API key from your Heroku app.
- Ensure there are no whitespace or newline characters in your key.
- If you're using the OpenAI SDK, verify the key is being passed correctly.
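The whitespace check can be done in code when you load the key (stdlib-only sketch; `INFERENCE_KEY` mirrors the config var name used later in this guide):

```python
import os

# Strip stray whitespace/newlines that would corrupt the Authorization header.
# The fallback value here is only for illustration.
raw_key = os.environ.get("INFERENCE_KEY", "inf-example-key\n")
api_key = raw_key.strip()

# OpenAI-compatible endpoints expect a standard Bearer token.
headers = {"Authorization": f"Bearer {api_key}"}
```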
403 Forbidden
A 403 error indicates your API key is valid but doesn't have permission to access the requested resource. This is different from 401: your key authenticated successfully, but authorization failed. Common causes:

- Requesting a model that isn't provisioned for your app
- API key doesn't have access to the specific model tier
- Attempting to access resources belonging to a different organization or app
To resolve a 403 error:

- Use the model that matches your provisioned add-on.
- Provision the model you need.
- If you're using multiple models, use the correct INFERENCE_KEY for each.
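If your app has several model resources attached, each has its own key; a small lookup table keeps them straight (a sketch; the env var names below are illustrative, not Heroku-defined):

```python
import os

# Illustrative mapping from model ID to the config var holding its key.
# Replace these names with the config vars your attachments actually created.
MODEL_KEY_VARS = {
    "claude-4-5-sonnet": "INFERENCE_KEY",
    "stable-image-ultra": "DIFFUSION_KEY",  # assumed name for a second attachment
}

def key_for(model_id):
    """Return the API key for a model, failing loudly if none is configured."""
    var = MODEL_KEY_VARS.get(model_id)
    if var is None or var not in os.environ:
        raise KeyError(f"no API key configured for model {model_id!r}")
    return os.environ[var]
```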
429 Too Many Requests
A 429 error means you've exceeded the rate limits for your model. Heroku AI enforces both requests-per-minute and tokens-per-minute limits to ensure fair usage and system stability. Rate limits by model:

| Model | Requests/min | Tokens/min |
|---|---|---|
| Claude 4.5 Sonnet | 150 | 800,000 |
| Claude 4 Sonnet | 150 | 800,000 |
| Claude 3.5 Haiku | 200 | 800,000 |
| Nova Pro / Lite | 150 | 800,000 |
| Stable Image Ultra | 20 | N/A |
Rate limit information is returned in response headers:

| Header | Description |
|---|---|
| `X-RateLimit-Limit-Requests` | Maximum requests allowed per minute |
| `X-RateLimit-Remaining-Requests` | Requests remaining in the current window |
| `X-RateLimit-Reset-Requests` | Unix timestamp when the request limit resets |
| `Retry-After` | Seconds to wait before retrying (on 429 responses) |
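When a 429 does arrive, the `Retry-After` header tells you how long to pause before retrying (a framework-agnostic sketch):

```python
import time

def wait_for_retry(headers, default_delay=1.0):
    """Sleep for the duration a 429 response requests via Retry-After."""
    try:
        delay = float(headers.get("Retry-After"))
    except (TypeError, ValueError):
        delay = default_delay  # header missing or not numeric
    time.sleep(delay)
    return delay
```

Note that the HTTP spec also allows `Retry-After` to be an HTTP-date; this sketch handles only the delta-seconds form.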
To stay under rate limits:

- Batch requests: For embeddings, send up to 96 inputs per request instead of one at a time
- Use prompt caching: Cache system prompts and tool definitions to reduce token usage
- Queue requests: Implement a request queue that respects rate limits
- Choose appropriate models: Use Claude 3.5 Haiku for high-volume, latency-sensitive workloads
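The queuing suggestion above can be sketched as a simple client-side limiter that caps requests per rolling window (illustrative; the 150 requests/min default matches the Claude Sonnet rows above):

```python
import collections
import time

class RateLimiter:
    """Allow at most max_requests calls per window seconds."""

    def __init__(self, max_requests=150, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self.sent = collections.deque()  # timestamps of recent requests

    def acquire(self):
        """Block until another request may be sent, then record it."""
        now = time.monotonic()
        # Discard timestamps that have fallen out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Sleep until the oldest request ages out of the window.
            time.sleep(self.window - (now - self.sent[0]))
            self.sent.popleft()
        self.sent.append(time.monotonic())
```

Call `acquire()` before each API request; tokens-per-minute limits can be tracked the same way by recording token counts instead of timestamps.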
500 Internal Server Error
A 500 error indicates an unexpected problem on Heroku's servers. These errors are not caused by your request and are typically transient.

503 Service Unavailable

A 503 error indicates the service is temporarily unavailable, usually due to high load or maintenance.

Retry Strategies
Exponential Backoff
Exponential backoff is the recommended strategy for handling transient errors. The idea is to wait progressively longer between retries, reducing load on the server while eventually succeeding.
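A minimal implementation of the pattern (a sketch; `TransientError` stands in for whatever exception your HTTP client raises on 429/500/503 responses):

```python
import random
import time

class TransientError(Exception):
    """Stand-in for 429/500/503 errors; substitute your client's exception."""

def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn, retrying on transient errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Wait 1s, 2s, 4s, ... (capped), plus jitter to avoid thundering herds.
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay * 0.25))
```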
Request Queuing
For high-throughput applications, implement a request queue that respects rate limits.

Debugging Tips
Using Request IDs
Every API response includes a unique request ID in the `X-Request-ID` header. Include this ID when contacting support.
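For example, you can pull the ID out of any response's header mapping (a sketch; the lookup is case-insensitive, as HTTP header names are):

```python
def request_id_of(headers):
    """Return the X-Request-ID value from a response-headers mapping, if present."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None
```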
Logging Recommendations
Enable detailed logging during development to diagnose issues.

When to Contact Support
Contact Heroku Support if you experience:

- Persistent 500/503 errors lasting more than 15 minutes
- Rate limit errors when your usage is well below documented limits
- Authentication errors with keys that previously worked
- Unexpected model behavior that differs from documentation
When contacting support, include:

- The request ID from the error response
- Timestamp of when the issue occurred (with timezone)
- The exact error message and status code
- A minimal code example that reproduces the issue
- Your app name and region
Additional Resources
- Rate Limits - Detailed rate limits by model
- Troubleshooting - Common issues and solutions
- OpenAI SDK Compatibility - Using the OpenAI SDK with Heroku AI
- Heroku Status - Check for service incidents