Prices per 1M tokens. Heroku charges standard provider rates with no markups.

Chat Models

| Model | Input | Output | Extended Thinking | Status |
|---|---|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 | $25.00 | Active |
| Claude 4.5 Sonnet | $3.00 | $15.00 | $15.00 | Active |
| Claude 4 Sonnet | $3.00 | $15.00 | $15.00 | Active |
| Claude 4.5 Haiku | $1.00 | $5.00 | | Active |
| Amazon Nova 2 Lite | $0.30 | $2.50 | Available | Active |
| Amazon Nova Pro | $0.80 | $3.20 | | Active |
| Amazon Nova Lite | $0.06 | $0.24 | | Active |
| Kimi K2 Thinking | $0.60 | $2.50 | Available | Active |
| MiniMax M2 | $0.50 | $2.00 | | Active |
| Qwen3 235B | $0.80 | $3.00 | | Active |
| Qwen3 Coder 480B | $1.00 | $4.00 | | Active |
| OpenAI GPT OSS 120B | Self-managed | Self-managed | | Active |
| Claude 3.7 Sonnet ⚠️ | $3.00 | $15.00 | $15.00 | Deprecated Feb 28 |
| Claude 3.5 Sonnet ⚠️ | $3.00 | $15.00 | $15.00 | Deprecated Feb 28 |
| Claude 3.5 Haiku ⚠️ | $0.80 | $4.00 | | Deprecated Feb 28 |
| Claude 3.0 Haiku ⚠️ | $0.25 | $1.25 | | Deprecated Feb 28 |
Extended thinking increases output token usage. Enable only when deeper reasoning is needed.
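
As a rough illustration of how the per-1M-token prices translate into a per-request cost, here is a minimal Python sketch. The `PRICES` dictionary and the `estimate_cost` helper are hypothetical names, not part of any Heroku API; the prices are copied from the table above for three of the models. It assumes that any extended-thinking tokens are already included in `output_tokens`.

```python
# Per-1M-token prices in USD, copied from the chat model table above.
# Each entry is (input price, output price).
PRICES = {
    "Claude 4.5 Sonnet": (3.00, 15.00),
    "Claude 4.5 Haiku": (1.00, 5.00),
    "Amazon Nova Lite": (0.06, 0.24),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: extended-thinking tokens, if any, are counted in output_tokens.
    """
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # 2,000 input tokens and 500 output tokens on Claude 4.5 Sonnet.
    print(f"${estimate_cost('Claude 4.5 Sonnet', 2_000, 500):.4f}")
```

Under these assumptions, a request with 2,000 input tokens and 500 output tokens on Claude 4.5 Sonnet comes to about $0.0135.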

Embedding Models

| Model | Pricing |
|---|---|
| Cohere Embed Multilingual | Standard Cohere rate per 1M tokens |
Batch up to 96 texts per request for efficiency.
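
One way to respect the 96-text limit is to split a larger list into batches before sending it. The sketch below shows only the chunking step; `batched` and `MAX_BATCH` are hypothetical names, and the actual embedding request is left out because its exact shape is not given here.

```python
from typing import Iterator, List

MAX_BATCH = 96  # maximum texts per request, as stated above


def batched(texts: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `size` items."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


if __name__ == "__main__":
    documents = [f"document {i}" for i in range(200)]
    for batch in batched(documents):
        # Send `batch` to the embeddings endpoint here (call not shown).
        print(len(batch))
```

For 200 texts this produces three batches of 96, 96, and 8 items.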

Image Generation

| Model | Pricing |
|---|---|
| Stable Image Ultra | Standard Stability rate per image |
Higher resolutions cost more. Aspect ratio doesn’t affect price at a given resolution.

Rerank Models

| Model | Pricing | Rate Limit |
|---|---|---|
| Cohere Rerank 3.5 | $2.00 / 1,000 searches | 250 RPM |
| Amazon Rerank 1.0 | Standard AWS rate | 200 RPM |
Each search can rank up to 1,000 documents against a query.
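
To see how the per-search price and the rate limit combine, here is a small Python sketch for Cohere Rerank 3.5. All function names are hypothetical. It makes two assumptions that are not stated above: a query with more than 1,000 documents is split into several searches that are each billed separately, and one request carries one search.

```python
import math

# Figures for Cohere Rerank 3.5, taken from the table above.
PRICE_PER_1K_SEARCHES = 2.00  # USD
MAX_DOCS_PER_SEARCH = 1_000
REQUESTS_PER_MINUTE = 250


def searches_needed(num_queries: int, docs_per_query: int) -> int:
    """Assumption: oversized document sets are split into 1,000-document searches."""
    return num_queries * math.ceil(docs_per_query / MAX_DOCS_PER_SEARCH)


def rerank_cost(num_searches: int) -> float:
    """USD cost at $2.00 per 1,000 searches."""
    return num_searches * PRICE_PER_1K_SEARCHES / 1_000


def min_minutes(num_searches: int) -> float:
    """Lower bound on wall-clock minutes, assuming one search per request."""
    return num_searches / REQUESTS_PER_MINUTE


if __name__ == "__main__":
    n = searches_needed(num_queries=500, docs_per_query=2_500)
    print(n, rerank_cost(n), min_minutes(n))
```

Under these assumptions, 500 queries with 2,500 documents each need 1,500 searches, cost $3.00, and take at least 6 minutes at 250 RPM.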

Billing

  • Monthly billing to your Heroku account
  • Pay per use — no minimums or commitments
  • View usage in Heroku Dashboard → Resources → Heroku Inference

What’s a Token?

Roughly 4 characters or 3/4 of a word in English.
| Text | Tokens |
|---|---|
| "Hello, how are you?" | ~6 |
| 100 tokens | ~75 words |
| 1,000 tokens | ~750 words |
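
The rule of thumb above can be turned into a quick estimator for budgeting. This is only a heuristic sketch with hypothetical function names; real token counts depend on each model's tokenizer and will differ, especially for short strings and non-English text.

```python
import math


def estimate_tokens_from_chars(text: str) -> int:
    """Heuristic: roughly 4 characters per token in English."""
    return math.ceil(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Heuristic: one token is roughly 3/4 of a word, so tokens = words * 4 / 3."""
    return math.ceil(word_count * 4 / 3)


if __name__ == "__main__":
    print(estimate_tokens_from_words(750))         # about 1,000 tokens
    print(estimate_tokens_from_chars("x" * 400))   # about 100 tokens
```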
