This guide distills prompt optimization practices from the OpenAI Cookbook so you can harden chat-based Heroku AI apps without relying on the Agents API. Adapted references: Optimize Prompts and Evaluation Flywheel.
Structure prompts for reuse
Template anatomy
- System message: role, tone, forbidden behaviors.
- Instructions block: numbered steps to follow (use bullet formatting from the cookbook).
- Reference data: optional context, clearly delimited.
- Output contract: JSON schema or textual requirements to aid parsing.
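The four blocks above can be assembled with a small template helper. A minimal sketch in Python — the section markers and delimiters are illustrative assumptions, not cookbook requirements:

```python
from string import Template

# Illustrative template covering the four blocks; markers are an assumption.
PROMPT_TEMPLATE = Template("""\
SYSTEM:
You are a $role. Tone: $tone. Never: $forbidden.

INSTRUCTIONS:
$instructions

REFERENCE DATA:
<<<
$reference
>>>

OUTPUT CONTRACT:
Respond with JSON matching: $schema
""")

def render_prompt(role, tone, forbidden, steps, reference, schema):
    """Fill the template; numbered steps keep the instructions block auditable."""
    instructions = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return PROMPT_TEMPLATE.substitute(
        role=role, tone=tone, forbidden=forbidden,
        instructions=instructions, reference=reference, schema=schema,
    )
```

Because the template is a plain string, it diffs cleanly in code review, which supports the source-control practice below.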
Keep variants in source control so changes can be reviewed just like code.
Store prompts alongside unit tests so changes must pass automated checks before deployment.
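One way to gate prompt changes in CI, sketched with Python's built-in unittest — the file layout, section markers, and length budget are assumptions:

```python
import unittest

REQUIRED_SECTIONS = ["SYSTEM:", "INSTRUCTIONS:", "OUTPUT CONTRACT:"]  # assumed markers

def load_prompt():
    # In a real repo this would read the prompt file from source control.
    return "SYSTEM:\nYou are a helper.\nINSTRUCTIONS:\n1. Answer.\nOUTPUT CONTRACT:\n{}"

class PromptContractTest(unittest.TestCase):
    def test_required_sections_present(self):
        prompt = load_prompt()
        for section in REQUIRED_SECTIONS:
            self.assertIn(section, prompt)

    def test_prompt_length_budget(self):
        # Rough token guard: ~4 characters per token is a common heuristic.
        self.assertLess(len(load_prompt()) / 4, 2000)

if __name__ == "__main__":
    unittest.main()
```

Running this suite in the deploy pipeline blocks a merge when a prompt edit drops a required section or blows the length budget.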
Automated prompt reviews
Checker workflow
- Run prompt text through heuristic checkers (lint for excessive length and missing sections).
- Optionally call the chat completions API with a “critic” system message to flag contradictions or formatting gaps (inspired by the cookbook’s multi-agent loop).
- Return actionable feedback for human review.
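The heuristic stage can be a plain function that returns findings for human review. A sketch — the length budget and section markers are assumptions:

```python
def lint_prompt(text, max_chars=8000,
                required=("SYSTEM:", "INSTRUCTIONS:", "OUTPUT CONTRACT:")):
    """Return a list of human-readable findings; an empty list means the prompt passes."""
    findings = []
    if len(text) > max_chars:
        findings.append(f"prompt is {len(text)} chars, budget is {max_chars}")
    for section in required:
        if section not in text:
            findings.append(f"missing section: {section}")
    return findings
```

Findings are plain strings, so they can be posted directly as review comments in CI.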
Keep the critic prompt deterministic (low temperature) and limit output to a structured checklist.
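Determinism is easiest to enforce if the critic request is built in one place. A sketch of the payload shape for a chat completions call — the model name and checklist fields are assumptions, and the actual network call is omitted:

```python
CRITIC_SYSTEM = (
    "You are a prompt critic. Check the prompt for contradictions, missing "
    "sections, and formatting gaps. Reply ONLY with a JSON checklist: "
    '{"contradictions": [], "missing_sections": [], "formatting_gaps": []}'
)

def build_critic_request(prompt_text, model="gpt-4o-mini"):
    """Assemble a chat-completions payload; temperature 0 keeps critiques repeatable."""
    return {
        "model": model,  # assumed model name; pick whatever your stack provides
        "temperature": 0,
        "messages": [
            {"role": "system", "content": CRITIC_SYSTEM},
            {"role": "user", "content": prompt_text},
        ],
    }
```

Pinning temperature and the output schema in one builder means every CI run critiques prompts under identical conditions.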
A smaller model such as Haiku keeps critic costs low in CI pipelines; switch to Sonnet when you need more nuanced critiques.
Evaluation flywheel
Cycle overview
- Collect failing traces from production and label failure modes.
- Measure with LLM graders scoring binary pass/fail outcomes.
- Improve prompts or context, re-run graders, and deploy if scores rise.
Align graders with human expectations using the cookbook’s TPR/TNR approach: have humans label a sample of outputs, then check the grader’s true positive and true negative rates against those labels.
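Grader alignment can be quantified with a few lines of Python; a minimal sketch, assuming boolean labels (True = pass, False = fail) and treating the human labels as ground truth:

```python
def grader_alignment(grader_labels, human_labels):
    """Return (TPR, TNR) of the grader relative to human ground-truth labels."""
    tp = sum(g and h for g, h in zip(grader_labels, human_labels))
    tn = sum((not g) and (not h) for g, h in zip(grader_labels, human_labels))
    pos = sum(human_labels)
    neg = len(human_labels) - pos
    tpr = tp / pos if pos else 0.0   # how often the grader agrees on passes
    tnr = tn / neg if neg else 0.0   # how often the grader agrees on fails
    return tpr, tnr
```

When either rate drifts low, fix the grader prompt before trusting its scores to gate deployments.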
Run graders inside Heroku Scheduler or CI; persist scores to Postgres so you can chart quality over time.
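The persistence step can be sketched with sqlite3 standing in for Postgres — the table and column names are assumptions; on Heroku you would swap in a Postgres driver and your `DATABASE_URL`:

```python
import sqlite3
from datetime import datetime, timezone

def record_scores(conn, run_id, scores):
    """Append one row per grader verdict so quality can be charted over time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grader_scores ("
        "run_id TEXT, grader TEXT, passed INTEGER, recorded_at TEXT)"
    )
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO grader_scores VALUES (?, ?, ?, ?)",
        [(run_id, grader, int(passed), now) for grader, passed in scores.items()],
    )
    conn.commit()

def pass_rate(conn, run_id):
    """Fraction of graders that passed for one run (AVG over 0/1 values)."""
    row = conn.execute(
        "SELECT AVG(passed) FROM grader_scores WHERE run_id = ?", (run_id,)
    ).fetchone()
    return row[0]
```

Storing one row per grader verdict, rather than a single aggregate, lets you chart each failure mode separately over time.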