POST /v1/agents/heroku
Agents (Heroku)
curl --request POST \
  --url https://us.inference.heroku.com/v1/agents/heroku \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "claude-4-sonnet",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "tools": [
    {}
  ],
  "temperature": 1.0,
  "top_p": 0.999
}
'
The /v1/agents/heroku endpoint allows you to interact with an agentic system powered by large language models (LLMs) that can autonomously invoke tools and execute actions on your behalf. The agent uses a loop to reason, plan, and execute tasks using Heroku Tools and MCP tools.
View our available agent models and Heroku Tools to see which models and tools are supported.

Base URL

https://us.inference.heroku.com

Authentication

All requests must include an Authorization header with your Heroku Inference API key:
Authorization: Bearer YOUR_INFERENCE_KEY
You can get your API key from your Heroku app’s INFERENCE_KEY config variable.
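
As a minimal sketch of the header described above, the following Python builds the request headers from the key. The function name build_headers is hypothetical and is not part of any Heroku SDK. Reading the key from an INFERENCE_KEY environment variable assumes you have exported the config var locally.

```python
import os


def build_headers(inference_key: str) -> dict:
    """Return the HTTP headers for a request to /v1/agents/heroku."""
    if not inference_key:
        raise ValueError("inference key is empty; check the INFERENCE_KEY config var")
    return {
        "Authorization": f"Bearer {inference_key}",
        "Content-Type": "application/json",
    }


# Assumption: INFERENCE_KEY has been exported into the local environment.
key = os.environ.get("INFERENCE_KEY")
headers = build_headers(key) if key else None
```

Failing early on an empty key avoids sending a request with a malformed "Bearer " header.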

Request Parameters

model

string · required
Model used for inference, typically the value of your INFERENCE_MODEL_ID config var. Examples: "claude-4-5-sonnet", "claude-4-5-haiku"

messages

array · required
Array of message objects used by the agent to determine its response and next actions. Supported roles: system, user, assistant, tool
[
  {
    "role": "system",
    "content": "You are a helpful DevOps assistant."
  },
  {
    "role": "user",
    "content": "Check my database schema and restart the web dynos."
  }
]

tools

array · optional
List of tools the agent is allowed to use. Heroku automatically executes tool calls via one-off dynos. The /v1/agents/heroku endpoint supports two types of tools:
  • heroku_tool: First-party tools that Heroku natively supports
  • mcp: Custom MCP tools you deploy to Heroku
See Heroku Tools for available tools.
Each tool object has the following fields:
  • type (enum): Type of tool. Options: heroku_tool, mcp
  • name (string): Name of tool (e.g., "code_exec_ruby", "pg_psql")
  • description (string, optional): Hint text to inform the model when to use this tool
  • runtime_params (object): Configuration to control automatic execution
Runtime Parameters:
  • target_app_name (string, required): Name of Heroku app to run the tool in
  • dyno_size (string, optional): Dyno size to use when running the tool (default: "standard-1x")
  • ttl_seconds (integer, optional): Max seconds a dyno is allowed to run (max: 120, default: 120)
  • max_calls (integer, optional): Max number of times this tool can be called during the agent loop (default: 3)
  • tool_params (object, optional): Additional parameters for tool (see tool-specific docs)
Example:
{
  "type": "heroku_tool",
  "name": "pg_psql",
  "description": "Runs SQL query on a Heroku database",
  "runtime_params": {
    "target_app_name": "my-heroku-app",
    "dyno_size": "standard-1x",
    "ttl_seconds": 30,
    "max_calls": 2,
    "tool_params": {
      "db_attachment": "DATABASE_URL"
    }
  }
}
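
The tool fields above can be assembled programmatically. The sketch below is one possible helper, not an official client: the name heroku_tool_definition and its keyword arguments are hypothetical and simply mirror the documented field names. Optional fields left as None are omitted so that the documented defaults apply, and the only check performed is the documented ttl_seconds maximum of 120.

```python
def heroku_tool_definition(name, target_app_name, *, description=None,
                           dyno_size=None, ttl_seconds=None, max_calls=None,
                           tool_params=None):
    """Build one entry for the "tools" array with type "heroku_tool"."""
    if ttl_seconds is not None and ttl_seconds > 120:
        raise ValueError("ttl_seconds must not exceed 120")

    # target_app_name is the only required runtime parameter.
    runtime_params = {"target_app_name": target_app_name}
    optional = {
        "dyno_size": dyno_size,
        "ttl_seconds": ttl_seconds,
        "max_calls": max_calls,
        "tool_params": tool_params,
    }
    runtime_params.update({k: v for k, v in optional.items() if v is not None})

    tool = {"type": "heroku_tool", "name": name, "runtime_params": runtime_params}
    if description is not None:
        tool["description"] = description
    return tool


# Rebuilds the pg_psql example shown above.
pg_tool = heroku_tool_definition(
    "pg_psql",
    "my-heroku-app",
    description="Runs SQL query on a Heroku database",
    dyno_size="standard-1x",
    ttl_seconds=30,
    max_calls=2,
    tool_params={"db_attachment": "DATABASE_URL"},
)
```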

temperature

float · optional · default: 1.0
Controls randomness of the response. Range: 0.0 to 1.0
  • Values closer to 0 make responses more focused and deterministic
  • Values closer to 1.0 encourage more creative and diverse responses

top_p

float · optional · default: 0.999
Nucleus sampling threshold: the model samples only from the most likely tokens whose cumulative probability reaches this value. Range: 0.0 to 1.0

max_tokens_per_inference_request

integer · optional
Max number of tokens the model can generate during each underlying inference request before stopping. A single call to /v1/agents/heroku can include multiple underlying inference requests, so this limit applies to each request rather than to the call as a whole.
  • Max value: 4096 for Haiku models
  • Max value: 8192 for Sonnet models

stop

array of strings · optional
List of strings that stop the model from generating further tokens if encountered in the response.
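
To tie the request parameters together, here is a hedged sketch of a client-side body builder that enforces the documented ranges before anything is sent. The function build_request_body is hypothetical. Detecting the model family by looking for "haiku" or "sonnet" in the model ID is an assumption made for illustration; the API itself may validate differently.

```python
# Per-request generation caps from the parameter list, keyed by model family.
MAX_TOKENS_BY_FAMILY = {"haiku": 4096, "sonnet": 8192}


def build_request_body(model, messages, *, tools=None, temperature=None,
                       top_p=None, max_tokens_per_inference_request=None,
                       stop=None):
    """Return a JSON-serializable body, omitting optional fields left as None."""
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    limit = max_tokens_per_inference_request
    if limit is not None:
        # Assumption: the family name appears somewhere in the model ID.
        for family, cap in MAX_TOKENS_BY_FAMILY.items():
            if family in model.lower() and limit > cap:
                raise ValueError(f"{family} models allow at most {cap} tokens")

    body = {"model": model, "messages": messages}
    optional = {
        "tools": tools,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens_per_inference_request": limit,
        "stop": stop,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Passing the result to json.dumps gives the string to send as the request body.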

Response Format

Agent responses are streamed back as Server-Sent Events (SSE). Each event of type message carries a JSON payload representing a completion. The stream ends with an event of type done whose data is [DONE].
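
The stream format described above can be consumed with a small line-based parser. The following is a simplified sketch, not a complete SSE implementation: it handles only the event and data fields and assumes the caller supplies already-decoded text lines from whichever HTTP client is used. The function names are hypothetical.

```python
import json


def iter_sse_events(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, drop one leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        yield event, "\n".join(data)


def iter_completions(lines):
    """Yield parsed completion objects until the done event arrives."""
    for event, data in iter_sse_events(lines):
        if event == "done" or data == "[DONE]":
            return
        if event == "message":
            yield json.loads(data)
```

Treating either the done event or the [DONE] payload as the terminator keeps the loop from trying to parse the final sentinel as JSON.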

Completion Object

Each SSE message contains either a chat.completion or tool.completion object.
  • id (string): Unique ID for agent session
  • object (enum): Type of completion. Options: chat.completion, tool.completion
  • created (integer): Unix timestamp when chunk was created
  • model (string): Model ID used to generate the message
  • choices (array): Array of length 1 containing a single choice object
  • usage (object): Token usage statistics (empty for tool completions)
Choice object:
  • index (integer): Index of the choice, always 0
  • message (object): Message content with role assistant or tool
  • finish_reason (enum): Reason model stopped. Options: stop, length, tool_calls, ""
Message object:
  • role (string): Message role
  • content (string): Text content
  • tool_calls (array, optional): Tool calls requested by the model
  • tool_call_id (string, for tool messages): ID of the tool call being responded to
Usage object:
  • prompt_tokens (integer): Tokens used in prompt
  • completion_tokens (integer): Tokens used in response
  • total_tokens (integer): Sum of prompt and completion tokens
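
Because one agent call can produce several completions, it can be useful to add up the usage objects across the stream. The sketch below assumes the completions have already been parsed into dictionaries. The helper name total_usage is hypothetical. Whether the summed figures match how usage is billed is not stated here, so treat the totals as informational.

```python
def total_usage(completions):
    """Sum token counts across completions, skipping empty usage objects."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for completion in completions:
        # tool.completion objects carry an empty usage object.
        usage = completion.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```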

Examples

curl --location $INFERENCE_URL/v1/agents/heroku \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $INFERENCE_KEY" \
  --data '{
    "model": "claude-4-sonnet",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful DevOps assistant."
      },
      {
        "role": "user",
        "content": "Run a database query to show all tables in my database."
      }
    ],
    "tools": [
      {
        "type": "heroku_tool",
        "name": "pg_psql",
        "runtime_params": {
          "target_app_name": "my-app",
          "tool_params": {
            "db_attachment": "DATABASE_URL"
          }
        }
      }
    ]
  }'

Response Example

event: message
data: {"id":"chatcmpl-abc123","object":"chat.completion","created":1746546550,"model":"claude-4-sonnet","choices":[{"index":0,"message":{"role":"assistant","content":"I'll query your database to show all tables.","tool_calls":[{"id":"toolu_abc123","type":"function","function":{"name":"pg_psql","arguments":"{\"query\":\"SELECT tablename FROM pg_tables WHERE schemaname='public';\"}"}}]},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":150,"completion_tokens":45,"total_tokens":195}}

event: message
data: {"id":"chatcmpl-abc123","object":"tool.completion","created":1746546552,"model":"claude-4-sonnet","choices":[{"index":0,"message":{"role":"tool","content":"tablename\n---------\nusers\nproducts\norders","tool_call_id":"toolu_abc123"},"finish_reason":""}],"usage":{}}

event: message
data: {"id":"chatcmpl-abc123","object":"chat.completion","created":1746546553,"model":"claude-4-sonnet","choices":[{"index":0,"message":{"role":"assistant","content":"Your database has 3 tables: users, products, and orders."},"finish_reason":"stop"}],"usage":{"prompt_tokens":180,"completion_tokens":18,"total_tokens":198}}

event: done
data: [DONE]
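
In the response example above, the arguments field of each tool call is itself a JSON-encoded string, so it needs a second decoding step after the event payload has been parsed. The following sketch shows that step for a parsed chat.completion object. The helper name tool_calls_from is hypothetical.

```python
import json


def tool_calls_from(completion):
    """Return (tool name, decoded arguments) pairs from one completion."""
    message = completion["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # "arguments" is a JSON string, not a nested object.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls
```

Completions without tool_calls, such as the final assistant message or a tool.completion, yield an empty list.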

Message Types

User message
  • role (string): Always "user"
  • content (string): Contents of user message
Example:
{
  "role": "user",
  "content": "What is the weather?"
}
Assistant message
  • role (string): Always "assistant"
  • content (string): Contents of assistant message
  • tool_calls (array, optional): Array of tool call request objects
Example:
{
  "role": "assistant",
  "content": "I'll check that for you.",
  "tool_calls": [
    {
      "id": "toolu_123",
      "type": "function",
      "function": {
        "name": "dyno_run_command",
        "arguments": "{}"
      }
    }
  ]
}
System message
  • role (string): Always "system"
  • content (string): System prompt to guide the model’s behavior
Example:
{
  "role": "system",
  "content": "You are a helpful assistant. You favor brevity and avoid hedging."
}
Tool message
  • role (string): Always "tool"
  • content (string): Output of tool call
  • tool_call_id (string): ID of the tool call this message is responding to
Example:
{
  "role": "tool",
  "content": "Command executed successfully.",
  "tool_call_id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK"
}

Chat Completions

Generate conversational responses without automatic tool execution

Heroku Tools

Available tools for agents

Working with MCP

Create custom MCP tools

Authorizations

Authorization
string
header
required

Bearer token using your INFERENCE_KEY

Body

application/json
model
string
required
Example:

"claude-4-sonnet"

messages
object[]
required
tools
object[]
temperature
number<float>
top_p
number<float>

Response

200 - text/event-stream

Successful response (Server-Sent Events)

The response is of type string.