Authentication Issues
”Invalid API key” or 401 Errors
Symptoms: You receive one of these error messages:"Invalid API key provided""Unauthorized"- HTTP status code 401
- Verify the environment variable is set:
- Python
- Bash
- Test the key with a minimal request:
200 (success) or 403 (wrong model, but key is valid).
- Retrieve a fresh key from Heroku:
| Cause | Solution |
|---|---|
| Environment variable not exported | Run export INFERENCE_KEY=your-key or add to .env file |
| Key has trailing whitespace/newline | Use tr -d '[:space:]' when setting: export INFERENCE_KEY=$(heroku config:get INFERENCE_KEY -a app | tr -d '[:space:]') |
| Key was regenerated | Get the new key from heroku config:get INFERENCE_KEY -a your-app |
| Using wrong environment’s key | Verify you’re using keys from the correct app (production vs staging) |
| Key is URL-encoded | Don’t URL-encode the key; use it as-is |
”You do not have access to that model” (403)
Symptoms: You receive:"You do not have access to that model""authorization_error"- HTTP status code 403
- List your provisioned models:
- Compare with your request:
- Use the provisioned model:
- Or provision the model you need:
- For multiple models, use the correct key:
Rate Limiting Issues
Hitting Rate Limits (429 Errors)
Symptoms:"Rate limit exceeded"- HTTP status code 429
- Requests suddenly start failing after working fine
- Check your current usage:
- Check the rate limit headers:
| Model | Requests/min | Tokens/min |
|---|---|---|
| Claude 4.5 Sonnet | 150 | 800,000 |
| Claude 3.5 Haiku | 200 | 800,000 |
| Nova Pro/Lite | 150 | 800,000 |
| Stable Image Ultra | 20 | N/A |
- Implement exponential backoff: See Error Handling - Retry Strategies
- Reduce request frequency:
- Batch embedding requests:
- Use prompt caching to reduce token usage:
Model Response Issues
Slow Responses
Symptoms:- Requests take 10+ seconds to complete
- Timeouts in production
- Perceived latency issues in user-facing applications
- Measure actual latency:
- Compare models:
| Model | Typical Latency | Use Case |
|---|---|---|
| Claude 4.5 Haiku | 0.5-2s | High-volume, latency-sensitive |
| Claude 4.5 Sonnet | 2-8s | Complex reasoning |
| Claude 4 Sonnet | 2-8s | Complex reasoning |
| Nova Lite | 1-3s | Cost-effective general use |
- Use streaming for perceived performance:
- Use a faster model:
- Reduce prompt size:
Unexpected or Truncated Output
Symptoms:- Response ends mid-sentence
finish_reasonis"length"instead of"stop"- Output seems incomplete
- Increase
max_tokens:
- Handle long responses with continuation:
Structured Output Not Matching Schema
Symptoms:- JSON parsing errors
- Response doesn’t follow the requested format
- Missing fields in structured responses
- Use response_format for JSON mode:
- Provide explicit JSON schema in the prompt:
- Use function calling for guaranteed structure:
Agent and Tool Issues
MCP Server Connection Failures
Symptoms:- Tools not appearing in agent responses
"server_status": "disconnected"in MCP server list- Agent doesn’t use expected tools
- List registered MCP servers:
- Check server status:
- Verify the MCP server is running:
- Re-register the MCP server:
- Check network connectivity:
Tool Execution Errors
Symptoms:- Agent calls tool but receives an error
- Tool returns unexpected results
"primitives_status": "error"in MCP server
- Verify tool definitions match implementation:
- Check MCP server logs for errors:
- Test tools directly:
Getting Help
If you can’t resolve your issue using this guide:-
Gather diagnostic information:
- Request ID from error response
- Timestamp (with timezone)
- Error message and status code
- Minimal code to reproduce
-
Check resources:
- Error Handling - Detailed error codes
- Heroku Status - Service incidents
- Rate Limits - Current limits
-
Contact support:
- Heroku Support - For production issues
- Include all diagnostic information gathered above