Agents (Heroku)

The /v1/agents/heroku endpoint runs an agent powered by large language models (LLMs) that can autonomously invoke tools and execute actions on your behalf. The agent works in a loop: it reasons, plans, and executes tasks using Heroku Tools and MCP tools.

View our available agent models and Heroku Tools to see which models and tools are supported.

Base URL

https://us.inference.heroku.com

Authentication

All requests must include an Authorization header with your Heroku Inference API key:

Authorization: Bearer YOUR_INFERENCE_KEY

You can get your API key from your Heroku app’s INFERENCE_KEY config variable.
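
As a minimal sketch of the header described above, the Python helper below reads the key from the environment and builds the request headers. The function name auth_headers is hypothetical, and it assumes INFERENCE_KEY has been exported into the process environment (for example, by running on the Heroku app itself).

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the INFERENCE_KEY config var.

    `env` is any mapping; it defaults to the process environment.
    """
    key = env.get("INFERENCE_KEY")
    if not key:
        raise RuntimeError("INFERENCE_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early when the key is missing avoids sending a request with an empty bearer token.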

Request Parameters

model

string · required
Model used for inference, typically the value of your INFERENCE_MODEL_ID config var. Examples: "claude-4-5-sonnet", "claude-4-5-haiku"

messages

array · required
Array of message objects used by the agent to determine its response and next actions. Supported roles: system, user, assistant, tool

[
  {
    "role": "system",
    "content": "You are a helpful DevOps assistant."
  },
  {
    "role": "user",
    "content": "Check my database schema and restart the web dynos."
  }
]

tools

array · optional
List of tools the agent is allowed to use. Heroku automatically executes tool calls via one-off dynos. The /v1/agents/heroku endpoint supports two types of tools:

heroku_tool: First-party tools that Heroku natively supports
mcp: Custom MCP tools you deploy to Heroku

See Heroku Tools for available tools.

Tool Object Structure

type (enum): Type of tool. Options: heroku_tool, mcp
name (string): Name of tool (e.g., "code_exec_ruby", "pg_psql")
description (string, optional): Hint text to inform the model when to use this tool
runtime_params (object): Configuration to control automatic execution

Runtime Parameters:

target_app_name (string, required): Name of Heroku app to run the tool in
dyno_size (string, optional): Dyno size to use when running the tool (default: "standard-1x")
ttl_seconds (integer, optional): Max seconds a dyno is allowed to run (max: 120, default: 120)
max_calls (integer, optional): Max number of times this tool can be called during the agent loop (default: 3)
tool_params (object, optional): Additional parameters for tool (see tool-specific docs)

Example:

{
  "type": "heroku_tool",
  "name": "pg_psql",
  "description": "Runs SQL query on a Heroku database",
  "runtime_params": {
    "target_app_name": "my-heroku-app",
    "dyno_size": "standard-1x",
    "ttl_seconds": 30,
    "max_calls": 2,
    "tool_params": {
      "db_attachment": "DATABASE_URL"
    }
  }
}
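
The tool object above can also be assembled programmatically. The following Python sketch is one way to do that; the helper name heroku_tool is hypothetical, and the lower bound of 1 on ttl_seconds is an assumption (the parameter list only states the maximum of 120). Optional fields are omitted when unset so that the documented defaults apply.

```python
def heroku_tool(name, target_app_name, *, description=None, dyno_size=None,
                ttl_seconds=None, max_calls=None, tool_params=None):
    """Build a heroku_tool object for the `tools` array."""
    # Documented maximum is 120; the lower bound of 1 is an assumption.
    if ttl_seconds is not None and not 1 <= ttl_seconds <= 120:
        raise ValueError("ttl_seconds must be between 1 and 120")

    runtime_params = {"target_app_name": target_app_name}
    optional = {
        "dyno_size": dyno_size,
        "ttl_seconds": ttl_seconds,
        "max_calls": max_calls,
        "tool_params": tool_params,
    }
    # Leave out unset fields so the server-side defaults are used.
    runtime_params.update({k: v for k, v in optional.items() if v is not None})

    tool = {"type": "heroku_tool", "name": name, "runtime_params": runtime_params}
    if description is not None:
        tool["description"] = description
    return tool
```

For instance, heroku_tool("pg_psql", "my-heroku-app", ttl_seconds=30, max_calls=2, tool_params={"db_attachment": "DATABASE_URL"}) produces the runtime_params shown in the example, minus the fields left at their defaults.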

temperature

float · optional · default: 1.0
Controls randomness of the response. Range: 0.0 to 1.0

Values closer to 0 make responses more focused and deterministic
Values closer to 1.0 encourage more creative and diverse responses

top_p

float · optional · default: 0.999
Nucleus sampling threshold. Range: 0 to 1.0. Specifies the cumulative probability of tokens to consider.

max_tokens_per_inference_request

integer · optional
Max number of tokens the model can generate during each underlying inference request before stopping. A single call to /v1/agents/heroku can include multiple underlying inference requests.

Max value: 4096 for Haiku models
Max value: 8192 for Sonnet models

stop

array of strings · optional
List of strings that stop the model from generating further tokens if encountered in the response.
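
To show how the request parameters fit together, here is a small Python sketch that assembles a request body and checks the documented ranges before sending. The function names are hypothetical, and detecting the model family by looking for "haiku" or "sonnet" in the model ID is an assumption made for illustration; the API itself remains the authority on what it accepts.

```python
def max_tokens_limit(model):
    """Return the documented per-request token cap, or None if unknown.

    Matching on the model ID substring is an assumption of this sketch.
    """
    name = model.lower()
    if "haiku" in name:
        return 4096
    if "sonnet" in name:
        return 8192
    return None


def build_agent_request(model, messages, *, tools=None, temperature=None,
                        top_p=None, max_tokens_per_inference_request=None,
                        stop=None):
    """Assemble the JSON body for POST /v1/agents/heroku."""
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1.0")

    limit = max_tokens_limit(model)
    requested = max_tokens_per_inference_request
    if requested is not None and limit is not None and requested > limit:
        raise ValueError(f"max_tokens_per_inference_request exceeds {limit}")

    body = {"model": model, "messages": messages}
    optional = {
        "tools": tools,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens_per_inference_request": requested,
        "stop": stop,
    }
    # Only send parameters the caller set; the rest fall back to defaults.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Serializing the returned dictionary with json.dumps gives a payload of the same shape as the request examples on this page.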

Response Format

Agent responses are streamed back over Server-Sent Events (SSE). Each message event (event: message) carries a JSON payload representing a completion. The final event is event: done, with data [DONE].
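
The stream can be decoded with a few lines of Python. The sketch below is a minimal, hypothetical parser (iter_completions is not part of any Heroku SDK): it takes an iterable of already-decoded text lines, yields the JSON payload of each message event, and stops at the done event. It assumes events are separated by blank lines, as in standard SSE framing.

```python
import json


def iter_completions(lines):
    """Yield one decoded completion per `event: message` in an SSE stream."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates the current event.
            if event == "done":
                return
            if event == "message" and data:
                yield json.loads("\n".join(data))
            event, data = None, []
    # Flush a trailing message if the stream ended without a blank line.
    if event == "message" and data:
        yield json.loads("\n".join(data))
```

Any HTTP client that exposes the response body line by line can feed this generator; the [DONE] payload is never passed to the JSON decoder.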

Completion Object

Each SSE message contains either a chat.completion or tool.completion object.

id (string): Unique ID for agent session
object (enum): Type of completion. Options: chat.completion, tool.completion
created (integer): Unix timestamp when chunk was created
model (string): Model ID used to generate the message
choices (array): Array of length 1 containing a single choice object
usage (object): Token usage statistics (empty for tool completions)

Choice Object

index (integer): Index of the choice, always 0
message (object): Message content with role assistant or tool

role (string): Message role
content (string): Text content
tool_calls (array, optional): Tool calls requested by the model
tool_call_id (string, for tool messages): ID of the tool call being responded to

finish_reason (enum): Reason model stopped. Options: stop, length, tool_calls, ""

Usage Object

prompt_tokens (integer): Tokens used in prompt
completion_tokens (integer): Tokens used in response
total_tokens (integer): Sum of prompt and completion tokens
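
Because one agent call can produce several completions, it can be useful to add up the usage objects across the whole stream. The Python sketch below does so under the assumption that completions have already been decoded into dictionaries; the function name total_usage is hypothetical. Tool completions carry an empty usage object and therefore contribute nothing.

```python
def total_usage(completions):
    """Sum token usage over all completions from one agent call."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for completion in completions:
        # Tool completions have an empty usage object; treat it as zeros.
        usage = completion.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```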

Examples

curl --location $INFERENCE_URL/v1/agents/heroku \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $INFERENCE_KEY" \
  --data '{
    "model": "claude-4-sonnet",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful DevOps assistant."
      },
      {
        "role": "user",
        "content": "Run a database query to show all tables in my database."
      }
    ],
    "tools": [
      {
        "type": "heroku_tool",
        "name": "pg_psql",
        "runtime_params": {
          "target_app_name": "my-app",
          "tool_params": {
            "db_attachment": "DATABASE_URL"
          }
        }
      }
    ]
  }'

Response Example

event: message
data: {"id":"chatcmpl-abc123","object":"chat.completion","created":1746546550,"model":"claude-4-sonnet","choices":[{"index":0,"message":{"role":"assistant","content":"I'll query your database to show all tables.","tool_calls":[{"id":"toolu_abc123","type":"function","function":{"name":"pg_psql","arguments":"{\"query\":\"SELECT tablename FROM pg_tables WHERE schemaname='public';\"}"}}]},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":150,"completion_tokens":45,"total_tokens":195}}

event: message
data: {"id":"chatcmpl-abc123","object":"tool.completion","created":1746546552,"model":"claude-4-sonnet","choices":[{"index":0,"message":{"role":"tool","content":"tablename\n---------\nusers\nproducts\norders","tool_call_id":"toolu_abc123"},"finish_reason":""}],"usage":{}}

event: message
data: {"id":"chatcmpl-abc123","object":"chat.completion","created":1746546553,"model":"claude-4-sonnet","choices":[{"index":0,"message":{"role":"assistant","content":"Your database has 3 tables: users, products, and orders."},"finish_reason":"stop"}],"usage":{"prompt_tokens":180,"completion_tokens":18,"total_tokens":198}}

event: done
data: [DONE]
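
In the stream above, the tool completion refers back to the assistant's request through tool_call_id. As an illustration, the hypothetical Python helper below walks a list of decoded completions and pairs each tool result with the call that triggered it. It assumes the completions are dictionaries shaped like the payloads above and that function.arguments is a JSON-encoded string, as it is in the example.

```python
import json


def pair_tool_calls(completions):
    """Return (tool name, decoded arguments, output) for each tool result,
    in the order the results arrive."""
    calls = {}  # tool_call_id -> (name, decoded arguments)
    results = []
    for completion in completions:
        message = completion["choices"][0]["message"]
        if completion["object"] == "chat.completion":
            for call in message.get("tool_calls") or []:
                fn = call["function"]
                calls[call["id"]] = (fn["name"], json.loads(fn["arguments"]))
        elif completion["object"] == "tool.completion":
            # Unknown IDs yield (None, None) rather than raising.
            name, args = calls.get(message["tool_call_id"], (None, None))
            results.append((name, args, message["content"]))
    return results
```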

Message Types

User Message

role (string): Always "user"
content (string): Contents of user message

Example:

{
  "role": "user",
  "content": "What is the weather?"
}

Assistant Message

role (string): Always "assistant"
content (string): Contents of assistant message
tool_calls (array, optional): Array of tool call request objects

Example:

{
  "role": "assistant",
  "content": "I'll check that for you.",
  "tool_calls": [
    {
      "id": "toolu_123",
      "type": "function",
      "function": {
        "name": "dyno_run_command",
        "arguments": "{}"
      }
    }
  ]
}

System Message

role (string): Always "system"
content (string): System prompt to guide the model’s behavior

Example:

{
  "role": "system",
  "content": "You are a helpful assistant. You favor brevity and avoid hedging."
}

Tool Message

role (string): Always "tool"
content (string): Output of tool call
tool_call_id (string): ID of the tool call this message is responding to

Example:

{
  "role": "tool",
  "content": "Command executed successfully.",
  "tool_call_id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK"
}
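
The four message shapes can be checked client-side before a request is sent. The Python sketch below is a hypothetical validator that follows only the field lists in this section; treating a missing string content as an error is an assumption of the sketch, and the API may be more permissive.

```python
SUPPORTED_ROLES = {"system", "user", "assistant", "tool"}


def validate_message(message):
    """Raise ValueError if a message does not match the documented shapes."""
    role = message.get("role")
    if role not in SUPPORTED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not isinstance(message.get("content"), str):
        raise ValueError("content must be a string")
    # Tool messages must point back at the call they answer.
    if role == "tool" and not message.get("tool_call_id"):
        raise ValueError("tool messages require tool_call_id")
    # Only assistant messages may carry tool call requests.
    if "tool_calls" in message and role != "assistant":
        raise ValueError("only assistant messages may include tool_calls")
    return message
```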

Related Endpoints

Chat Completions: Generate conversational responses without automatic tool execution
Heroku Tools: Available tools for agents
Working with MCP: Create custom MCP tools

Authorizations

Authorization (string, header, required): Bearer token using your INFERENCE_KEY

Body

application/json

model (string, required): Example: "claude-4-sonnet"
messages (object[], required)
tools (object[])
temperature (number<float>)
top_p (number<float>)

Response

200 - text/event-stream: Successful response (Server-Sent Events). The response is of type string.
