Heroku AI exposes OpenAI-compatible endpoints, which lets you reuse the official OpenAI SDKs while routing traffic through Heroku’s managed infrastructure. Use the recipes below to bootstrap real applications quickly, then adapt them to your product needs.
All examples call https://us.inference.heroku.com/v1 and rely on a Heroku-managed API key. Provision a model first and export the key as INFERENCE_KEY.
Before you start
- Install the Heroku CLI and log in.
- Provision a model, for example `heroku ai:models:create claude-4-5-sonnet --app my-ai-app`.
- Store the resulting key securely (`heroku config:get INFERENCE_KEY --app my-ai-app`).
- Set the base URL when instantiating the SDK client (`base_url="https://us.inference.heroku.com/v1"`).
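Once the key is exported, a small helper can fail fast on misconfiguration before any request is made. This is a sketch; the `INFERENCE_URL` fallback variable is an illustrative name, not a Heroku convention:

```python
import os

def inference_config() -> dict:
    """Read inference settings from the environment, failing fast if the key is unset."""
    key = os.environ.get("INFERENCE_KEY")
    if not key:
        raise RuntimeError(
            "INFERENCE_KEY is not set; fetch it with "
            "`heroku config:get INFERENCE_KEY --app my-ai-app`"
        )
    return {
        # INFERENCE_URL is a hypothetical override; the default matches the docs above
        "base_url": os.environ.get("INFERENCE_URL", "https://us.inference.heroku.com/v1"),
        "api_key": key,
    }
```

The resulting dict can be passed straight to the SDK: `OpenAI(**inference_config())`.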
Recipe: Customer support chat
Provide agents with instant answers sourced from your knowledge base.
Create `app.py`
```python
import os

from flask import Flask, request, jsonify
from openai import OpenAI

client = OpenAI(
    base_url="https://us.inference.heroku.com/v1",
    api_key=os.environ["INFERENCE_KEY"],
)

app = Flask(__name__)

@app.post("/chat")
def chat():
    question = request.json.get("question", "")
    response = client.chat.completions.create(
        model="claude-4-5-sonnet",
        messages=[
            {"role": "system", "content": "You are a Heroku support specialist."},
            {"role": "user", "content": question},
        ],
        max_tokens=600,
        temperature=0.3,
    )
    return jsonify({"answer": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(debug=True)
```
Deploy to Heroku
```bash
heroku create my-ai-support
git push heroku main
heroku config:set INFERENCE_KEY=$(heroku config:get INFERENCE_KEY --app my-ai-app) --app my-ai-support
```
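To smoke-test the deployed endpoint you can build the request with the standard library alone; the app URL below is a placeholder for whatever `heroku create` assigned:

```python
import json
import urllib.request

def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a JSON POST for the /chat endpoint; send it with urllib.request.urlopen()."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (hypothetical app URL):
# req = build_chat_request("https://my-ai-support.herokuapp.com", "How do I scale dynos?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["answer"])
```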
For a Next.js app, add `/app/api/chat/route.ts` instead
```typescript
import { NextRequest, NextResponse } from "next/server";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://us.inference.heroku.com/v1",
  apiKey: process.env.INFERENCE_KEY,
});

export async function POST(req: NextRequest) {
  const { question } = await req.json();
  const completion = await client.chat.completions.create({
    model: "claude-4-5-sonnet",
    max_tokens: 600,
    temperature: 0.3,
    messages: [
      { role: "system", content: "You are a Heroku support specialist." },
      { role: "user", content: question },
    ],
  });
  return NextResponse.json({
    answer: completion.choices[0].message?.content,
  });
}
```
Configure environment
```bash
heroku config:set INFERENCE_KEY=$(heroku config:get INFERENCE_KEY --app my-ai-app) --app my-next-app
```
Recipe: Search + summarize notebook
Blend embeddings and chat completions to turn raw documents into actionable briefs.
Embed your corpus
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://us.inference.heroku.com/v1",
    api_key=os.environ["INFERENCE_KEY"],
)

documents = [...]

vectors = client.embeddings.create(
    model="cohere-embed-multilingual",
    input=documents,
)
```
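Embedding models cap input length, so long documents are usually split into chunks before the `embeddings.create` call. A minimal character-based chunker, sketched with illustrative sizes rather than actual model limits:

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so context is not lost at chunk edges."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```

Each chunk is then embedded and stored as its own row, so retrieval can surface the relevant passage rather than a whole document.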
Index vectors in Postgres + pgvector
```sql
CREATE TABLE docs (
    id serial PRIMARY KEY,
    content text,
    embedding vector(1024)
);
```
Answer questions with retrieval
```python
def summarize(question: str) -> str:
    # fetch_similar_vectors: your retrieval helper (e.g. a pgvector similarity query)
    related = fetch_similar_vectors(question, top_k=4)
    context = "\n\n".join(doc.text for doc in related)
    completion = client.chat.completions.create(
        model="claude-4-5-sonnet",
        temperature=0.4,
        max_tokens=700,
        messages=[
            {"role": "system", "content": "Summarize internal Heroku docs."},
            {"role": "user", "content": f"{question}\n\nContext:\n{context}"},
        ],
    )
    return completion.choices[0].message.content
```
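`fetch_similar_vectors` is left to you; against pgvector it would typically be a nearest-neighbor query (`ORDER BY embedding <=> $1 LIMIT k`). For local experiments, the same ranking can be sketched in pure Python over in-memory (text, embedding) pairs:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the texts of the k rows most similar to the query vector."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Note that the query string must be embedded with the same model as the corpus before it can be compared.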
Recipe: Image generator microservice
Expose Stable Image Ultra behind a simple REST interface.
```python
import base64
import os
from io import BytesIO

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

client = OpenAI(
    base_url="https://us.inference.heroku.com/v1",
    api_key=os.environ["INFERENCE_KEY"],
)

app = FastAPI()

@app.post("/images")
async def images(prompt: str):
    result = client.images.generate(
        model="stable-image-ultra",
        prompt=prompt,
        size="1024x1024",
    )
    data = base64.b64decode(result.data[0].b64_json)
    image_bytes = BytesIO(data)
    image_bytes.seek(0)
    return StreamingResponse(image_bytes, media_type="image/png")
```
Deploy with Heroku's container stack or a Python buildpack running uvicorn, then protect the endpoint with an API key or Heroku session token.
What to build next
- Add streaming UIs with Server-Sent Events for chat interfaces.
- Swap models dynamically by passing the model name from request payloads.
- Log completions to Heroku Data for Redis for analytics.
- Pair these recipes with the Chat Completions API guide for advanced parameters such as tool calling and structured outputs.