The `/v1/messages` endpoint provides native Anthropic Messages API compatibility for Claude models. If you're already using the Anthropic SDK, you can point it at Heroku AI by changing the base URL and API key; no other code changes are required.
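If you're not using the SDK, the same switch shows up in the raw request shape. The sketch below builds (without sending) a request against the endpoint, using the three config vars described on this page; the fallback values are placeholders, not real hosts or keys, and error handling is elided.

```python
# A sketch of calling /v1/messages with only the Python standard library.
# INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID are the config vars
# described on this page; the fallback values below are placeholders.
import json
import os
import urllib.request

def build_messages_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the native Messages endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Anthropic-style bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_messages_request(
    base_url=os.environ.get("INFERENCE_URL", "https://example.invalid"),
    api_key=os.environ.get("INFERENCE_KEY", "inf-placeholder"),
    payload={
        "model": os.environ.get("INFERENCE_MODEL_ID", "claude-4-sonnet"),
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
# body = urllib.request.urlopen(req).read()  # uncomment to actually send
```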
### Authentication

The Messages API uses Anthropic's authentication pattern: send your key as a bearer token in the `Authorization` header. Use the value of your app's `INFERENCE_KEY` config variable as the key, and your `INFERENCE_MODEL_ID` config var value as the model.

Example model IDs: `"claude-4-5-sonnet"`, `"claude-4-5-haiku"`, `"claude-opus-4-5"`
### Request Parameters

- `messages` (required): Array of message objects, each with a `role` (`user` or `assistant`) and `content`. Only the `user` and `assistant` roles are supported; system prompts are passed separately via the `system` parameter.
- `temperature` (default `1.0`): Controls randomness. Range: `0.0` to `1.0`.
- `top_p`: Nucleus sampling threshold. Range: `0.0` to `1.0`.
- `stream` (default `false`): Enable streaming responses via server-sent events.
- `metadata`: Optional object with a `user_id` for tracking.
### Thinking Object Structure

- `type` (string): Set to `"enabled"` to activate extended thinking
- `budget_tokens` (integer): Token budget for reasoning (minimum 1024)

### Tool Object Structure

Each tool definition has a `name`, an optional `description`, and an `input_schema` (a JSON Schema describing the tool's arguments).
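The thinking object described above can be built with a small helper that enforces the 1024-token minimum. The helper name and the example payload are illustrative, not part of any SDK:

```python
def thinking_config(budget_tokens: int) -> dict:
    """Build a `thinking` request object, enforcing the documented minimum."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    return {"type": "enabled", "budget_tokens": budget_tokens}

payload = {
    "model": "claude-4-sonnet",
    # Leave headroom above the thinking budget for the visible answer.
    "max_tokens": 2048,
    "thinking": thinking_config(1024),
    "messages": [{"role": "user", "content": "What is 27 * 43?"}],
}
```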
{"type": "auto"} - Model decides whether to use tools{"type": "any"} - Model must use at least one tool{"type": "tool", "name": "tool_name"} - Model must use the specified tool"message".
"assistant".
### Content Block Types

Text blocks:

- `type` (string): `"text"`
- `text` (string): The generated text

Tool use blocks:

- `type` (string): `"tool_use"`
- `id` (string): Unique tool use ID
- `name` (string): Name of the tool to call
- `input` (object): Arguments for the tool

Thinking blocks:

- `type` (string): `"thinking"`
- `thinking` (string): The model's reasoning process

### Stop Reasons

- `"end_turn"` - Natural stopping point
- `"max_tokens"` - Reached max tokens
- `"stop_sequence"` - Hit a stop sequence
- `"tool_use"` - Made a tool call

### Usage

- `input_tokens` (integer): Tokens in the input
- `output_tokens` (integer): Tokens in the output

### Prompt Caching

Prompt caching uses native `cache_control` blocks. This differs from Chat Completions, which uses a header to enable/disable automatic caching.
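A response's `content` array mixes these block types, so callers usually sort them by type. A minimal parser, run here on a hand-written sample rather than real API output:

```python
def split_blocks(content: list) -> dict:
    """Group the documented content block types out of a response's content array."""
    out = {"text": [], "tool_use": [], "thinking": []}
    for block in content:
        if block["type"] == "text":
            out["text"].append(block["text"])
        elif block["type"] == "tool_use":
            out["tool_use"].append(
                {"id": block["id"], "name": block["name"], "input": block["input"]}
            )
        elif block["type"] == "thinking":
            out["thinking"].append(block["thinking"])
    return out

sample = [
    {"type": "thinking", "thinking": "The user is asking about weather."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Oslo"}},
]
parsed = split_blocks(sample)
```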
Add `cache_control` to cacheable content blocks:

- Add `cache_control` to system content blocks
- Add `cache_control` to message content blocks
- Add `cache_control` to tool definitions

### Messages API vs. Chat Completions

| Use Messages API when… | Use Chat Completions when… |
|---|---|
| You have existing Anthropic SDK code | You need OpenAI SDK compatibility |
| You want native `cache_control` for prompt caching | You prefer header-based caching control |
| You’re using Anthropic-specific features like `thinking` | You’re using non-Anthropic models |
| You want to use the Anthropic Agent SDK (coming soon) | You need multi-provider support |
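As a sketch of the first caching case above, a `system` array can mark its large, stable block as cacheable. The `"ephemeral"` cache type is the one Anthropic's API defines; the prompt text and model ID are placeholders:

```python
payload = {
    "model": "claude-4-sonnet",
    "max_tokens": 1024,
    "system": [
        # Small, frequently changing part: left uncached.
        {"type": "text", "text": "You are a concise support agent."},
        # Large, stable part: everything up to and including this block
        # becomes a cache entry.
        {
            "type": "text",
            "text": "<long product documentation goes here>",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [{"role": "user", "content": "How do I reset my password?"}],
}
```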
### Parameter Reference

Authenticate with your `INFERENCE_KEY` as the API key (Anthropic-style authentication).

| Parameter | Required | Description |
|---|---|---|
| `model` | required | Claude model ID to use, e.g. `"claude-4-sonnet"` |
| `max_tokens` | required | Maximum tokens to generate, e.g. `1024` |
| `messages` | required | Array of message objects with `user`/`assistant` roles |
| `system` | optional | System prompt (string, or array of content blocks for caching) |
| `temperature` | optional | Sampling temperature, `0 <= x <= 1` |
| `top_p` | optional | Nucleus sampling threshold, `0 <= x <= 1` |
| `top_k` | optional | Only sample from the top K options |
| `stop_sequences` | optional | Custom stop sequences |
| `stream` | optional | Stream responses via SSE |
| `metadata` | optional | Request metadata for tracking |
| `thinking` | optional | Extended thinking configuration (Claude 3.7/4 Sonnet only) |
| `tools` | optional | Tools the model may use |
| `tool_choice` | optional | Controls how the model uses tools |
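The required fields and ranges in the reference table can be checked client-side before sending. This validator is a sketch of my own, not part of any SDK:

```python
def validate_request(payload: dict) -> None:
    """Raise ValueError if a request body violates the parameter table above."""
    for field in ("model", "max_tokens", "messages"):
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
    for message in payload["messages"]:
        if message.get("role") not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    for knob in ("temperature", "top_p"):
        if knob in payload and not 0 <= payload[knob] <= 1:
            raise ValueError(f"{knob} must be between 0 and 1")

validate_request({
    "model": "claude-4-sonnet",
    "max_tokens": 1024,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "Hi"}],
})  # a valid body passes silently
```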
### Response Fields

A successful response contains:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique message identifier |
| `type` | string | Object type; always `"message"` |
| `role` | string | Always `"assistant"` |
| `content` | array | Array of content blocks |
| `model` | string | Model that generated the response |
| `stop_reason` | string | Why generation stopped: `end_turn`, `max_tokens`, `stop_sequence`, or `tool_use` |
| `stop_sequence` | string or null | Stop sequence that was matched, if any |
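When `stop_reason` is `"tool_use"`, the caller runs the tool and replies with a `tool_result` block in a new `user` message. The `tool_result` shape is Anthropic's; the response dict here is hand-written, not real API output:

```python
response = {
    "role": "assistant",
    "stop_reason": "tool_use",
    "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Oslo"}},
    ],
}

def tool_result_message(response: dict, results: dict) -> dict:
    """Answer each tool_use block with a matching tool_result block."""
    blocks = [
        {"type": "tool_result", "tool_use_id": b["id"], "content": results[b["id"]]}
        for b in response["content"]
        if b["type"] == "tool_use"
    ]
    return {"role": "user", "content": blocks}

if response["stop_reason"] == "tool_use":
    follow_up = tool_result_message(response, {"toolu_01": "4°C, light rain"})
    # Append `follow_up` to the conversation and call /v1/messages again.
```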