Prices are in USD per 1M tokens unless a table states otherwise. Heroku charges standard provider rates with no markup.
Chat Models
| Model | Input | Output | Extended Thinking | Status |
|---|---|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 | $25.00 | Active |
| Claude 4.5 Sonnet | $3.00 | $15.00 | $15.00 | Active |
| Claude 4 Sonnet | $3.00 | $15.00 | $15.00 | Active |
| Claude 4.5 Haiku | $1.00 | $5.00 | — | Active |
| Amazon Nova 2 Lite | $0.30 | $2.50 | Available | Active |
| Amazon Nova Pro | $0.80 | $3.20 | — | Active |
| Amazon Nova Lite | $0.06 | $0.24 | — | Active |
| Kimi K2 Thinking | $0.60 | $2.50 | Available | Active |
| MiniMax M2 | $0.50 | $2.00 | — | Active |
| Qwen3 235B | $0.80 | $3.00 | — | Active |
| Qwen3 Coder 480B | $1.00 | $4.00 | — | Active |
| OpenAI GPT OSS 120B | Self-managed | Self-managed | — | Active |
| Claude 3.7 Sonnet ⚠️ | $3.00 | $15.00 | $15.00 | Deprecated Feb 28 |
| Claude 3.5 Sonnet ⚠️ | $3.00 | $15.00 | $15.00 | Deprecated Feb 28 |
| Claude 3.5 Haiku ⚠️ | $0.80 | $4.00 | — | Deprecated Feb 28 |
| Claude 3.0 Haiku ⚠️ | $0.25 | $1.25 | — | Deprecated Feb 28 |
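Extended thinking generates additional output tokens (for the Claude models, billed at the output rate shown above), so it raises the cost of each request. Enable it only when a task needs deeper reasoning.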
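To see how extended thinking affects a bill, here is a minimal cost-estimate sketch. It copies a few per-1M-token rates from the Chat Models table and assumes, as the table shows for the Claude models, that thinking tokens are billed at the output rate. The `RATES` dictionary and `chat_cost` function are hypothetical helpers, not part of any Heroku API.

```python
# Per-1M-token rates (USD) copied from the Chat Models table above.
# Only a few models are included for illustration.
RATES = {
    "Claude 4.5 Sonnet": (3.00, 15.00),   # (input, output)
    "Claude 4.5 Haiku": (1.00, 5.00),
    "Amazon Nova Lite": (0.06, 0.24),
}


def chat_cost(model: str, input_tokens: int, output_tokens: int,
              thinking_tokens: int = 0) -> float:
    """Estimate the USD cost of one chat request.

    ASSUMPTION: thinking tokens are billed at the model's output rate,
    matching the Extended Thinking column for the Claude models.
    """
    input_rate, output_rate = RATES[model]
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * input_rate + billed_output * output_rate) / 1_000_000


if __name__ == "__main__":
    # 2,000 input tokens, 500 visible output tokens, 1,500 thinking tokens
    print(f"${chat_cost('Claude 4.5 Sonnet', 2_000, 500, 1_500):.4f}")
```

With 1,500 thinking tokens this request comes to about $0.036; the same request without extended thinking comes to about $0.0135.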
Embedding Models
| Model | Pricing |
|---|---|
| Cohere Embed Multilingual | Standard Cohere rate per 1M tokens |
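Each request accepts up to 96 texts. Batch inputs to cut the number of requests; because pricing is per token, batching does not change the token cost.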
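A minimal sketch of splitting a list of texts into request-sized batches, assuming the 96-text limit stated above. The `batches` helper is hypothetical; the actual embedding call is not shown and would be made once per yielded batch.

```python
from typing import Iterator, List, Sequence

MAX_TEXTS_PER_REQUEST = 96  # per-request limit stated above


def batches(texts: Sequence[str],
            size: int = MAX_TEXTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


if __name__ == "__main__":
    docs = [f"document {i}" for i in range(250)]
    print([len(b) for b in batches(docs)])
```

For 250 texts this yields three requests of 96, 96, and 58 texts instead of 250 single-text requests.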
Image Generation
| Model | Pricing |
|---|---|
| Stable Image Ultra | Standard Stability rate per image |
Higher resolutions cost more. Aspect ratio doesn’t affect price at a given resolution.
Rerank Models
| Model | Pricing | Rate Limit |
|---|---|---|
| Cohere Rerank 3.5 | $2.00 / 1,000 searches | 250 RPM |
| Amazon Rerank 1.0 | Standard AWS rate | 200 RPM |
Each search can rank up to 1,000 documents against a query.
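The rerank figures above can be combined into a rough cost and throughput estimate. This is a sketch under stated assumptions, using the Cohere Rerank 3.5 row: it ASSUMES a query over more than 1,000 documents is split into several separately billed searches, and that each search is one request counted against the 250 RPM limit. The helper names are hypothetical.

```python
import math

# Figures from the Rerank Models table above (Cohere Rerank 3.5).
PRICE_PER_1000_SEARCHES = 2.00
RATE_LIMIT_RPM = 250
MAX_DOCS_PER_SEARCH = 1_000


def searches_needed(num_queries: int, docs_per_query: int) -> int:
    """Searches required, ASSUMING a query over more than 1,000 documents
    is split into ceil(docs / 1,000) separately billed searches."""
    if num_queries < 0 or docs_per_query < 1:
        raise ValueError("need num_queries >= 0 and docs_per_query >= 1")
    return num_queries * math.ceil(docs_per_query / MAX_DOCS_PER_SEARCH)


def rerank_cost(num_queries: int, docs_per_query: int) -> float:
    """Estimated USD cost at $2.00 per 1,000 searches."""
    return searches_needed(num_queries, docs_per_query) * PRICE_PER_1000_SEARCHES / 1_000


def min_minutes(num_queries: int, docs_per_query: int) -> float:
    """Lower bound on wall-clock minutes at 250 requests per minute,
    ASSUMING one search per request."""
    return searches_needed(num_queries, docs_per_query) / RATE_LIMIT_RPM


if __name__ == "__main__":
    print(searches_needed(10_000, 2_500), rerank_cost(10_000, 2_500), min_minutes(10_000, 2_500))
```

Under these assumptions, 10,000 queries over 2,500 documents each need 30,000 searches, cost about $60.00, and take at least 120 minutes at the rate limit.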
Billing
- Monthly billing to your Heroku account
- Pay per use — no minimums or commitments
- View usage in Heroku Dashboard → Resources → Heroku Inference
What’s a Token?
A token is roughly 4 characters, or about 3/4 of a word, of English text.
| Text | Approximate tokens |
|---|---|
| "Hello, how are you?" | ~6 |
| ~75 words | 100 |
| ~750 words | 1,000 |
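The rules of thumb above can be turned into quick budgeting helpers. This is a minimal sketch, assuming the ~4 characters per token and ~3/4 word per token ratios stated above; the function names are hypothetical. Exact counts depend on each model's tokenizer: the character heuristic gives 5 tokens for "Hello, how are you?", where the table lists ~6.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token count from length, assuming ~4 characters per token.
    Rounds up so budgets err on the high side."""
    return math.ceil(len(text) / 4)


def tokens_for_words(words: int) -> int:
    """Tokens for a word count, assuming ~3/4 of a word per token
    (i.e. words * 4/3, rounded up)."""
    return (words * 4 + 2) // 3


def words_for_tokens(tokens: int) -> int:
    """Approximate English words that fit in a token budget."""
    return tokens * 3 // 4


if __name__ == "__main__":
    print(estimate_tokens("Hello, how are you?"), tokens_for_words(750), words_for_tokens(100))
```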