RubyLLM is a unified Ruby API for interacting with multiple AI providers including OpenAI, Anthropic, Gemini, and more. Since Heroku AI exposes OpenAI-compatible endpoints, you can use RubyLLM to call Claude models on Heroku AI with a consistent, idiomatic Ruby interface.

Installation and Setup

Install the RubyLLM gem:
gem install ruby_llm
Or add to your Gemfile:
gem 'ruby_llm'
Set up your Heroku AI credentials as environment variables:
export INFERENCE_KEY=$(heroku config:get INFERENCE_KEY -a your-app)
export INFERENCE_URL="https://us.inference.heroku.com/v1"
export INFERENCE_MODEL_ID=$(heroku config:get INFERENCE_MODEL_ID -a your-app)
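Before configuring the client, it can help to fail fast when one of these variables is missing. A small guard sketch (the helper name is ours, not part of RubyLLM):

```ruby
# Names of the Heroku AI variables the examples below rely on.
REQUIRED_VARS = %w[INFERENCE_KEY INFERENCE_URL INFERENCE_MODEL_ID].freeze

# Returns the names of any required variables that are unset or empty.
def missing_inference_vars(env = ENV)
  REQUIRED_VARS.reject { |name| env[name] && !env[name].empty? }
end
```

At boot you might then call `abort "Missing Heroku AI config: #{missing_inference_vars.join(', ')}" unless missing_inference_vars.empty?`.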

Configure RubyLLM for Heroku AI

RubyLLM supports OpenAI-compatible endpoints through its OpenAI provider. Configure it to point to Heroku AI:
require 'ruby_llm'

RubyLLM.configure do |config|
  config.openai_api_key = ENV['INFERENCE_KEY']
  config.openai_api_base = ENV['INFERENCE_URL']
end

Basic Chat Completion

Use RubyLLM’s chat interface with Heroku AI models. Because Heroku’s model IDs aren’t in RubyLLM’s built-in model registry, pass assume_model_exists: true (later examples omit it for brevity):
chat = RubyLLM.chat(
  provider: :openai,
  model: ENV['INFERENCE_MODEL_ID'] || 'claude-4-5-sonnet',
  assume_model_exists: true
)

response = chat.ask "What are the benefits of deploying AI apps on Heroku?"
puts response.content

Streaming Responses

Stream responses to reduce perceived latency:
chat = RubyLLM.chat(
  provider: :openai,
  model: 'claude-4-5-haiku'
)

chat.ask "Explain Heroku's dyno architecture" do |chunk|
  print chunk.content
end
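When streaming, you often want both incremental display and the complete text afterward. A minimal sketch of that buffering pattern, with a Chunk struct standing in for the chunk objects yielded to the block:

```ruby
# Stand-in for the chunk objects yielded during streaming.
Chunk = Struct.new(:content)

# Prints each chunk as it arrives and returns the assembled full text.
def stream_and_collect(chunks)
  buffer = +""
  chunks.each do |chunk|
    print chunk.content     # incremental display
    buffer << chunk.content # keep the full response
  end
  buffer
end
```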

Multi-turn Conversations

RubyLLM maintains conversation context automatically:
chat = RubyLLM.chat(
  provider: :openai,
  model: 'claude-4-5-sonnet'
)

# First turn
chat.ask "I'm building a Rails API that needs to classify customer support tickets."

# Follow-up question - context is preserved
response = chat.ask "What Heroku AI model would you recommend for this?"
puts response.content
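Under the hood, multi-turn chat works by resending the accumulated messages with each request. A minimal illustration of that bookkeeping (RubyLLM manages the equivalent state for you; these class names are ours):

```ruby
# Stand-in message record; RubyLLM keeps equivalent state internally.
Message = Struct.new(:role, :content)

# Minimal conversation history: every turn is appended, and the whole
# list is sent with the next request so context is preserved.
class History
  def initialize
    @messages = []
  end

  def add(role, content)
    @messages << Message.new(role, content)
    self
  end

  def to_payload
    @messages.map { |m| { role: m.role, content: m.content } }
  end
end
```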

Using with Rails

Configuration

Create an initializer at config/initializers/ruby_llm.rb:
RubyLLM.configure do |config|
  config.openai_api_key = ENV['INFERENCE_KEY']
  config.openai_api_base = ENV['INFERENCE_URL']
end

Service Object Example

Create a service to encapsulate AI interactions:
# app/services/ai_assistant_service.rb
class AiAssistantService
  def initialize(model: ENV['INFERENCE_MODEL_ID'])
    @chat = RubyLLM.chat(provider: :openai, model: model)
  end

  def answer(question)
    @chat.ask(question).content
  rescue StandardError => e
    Rails.logger.error("AI request failed: #{e.message}")
    "I'm sorry, I couldn't process that request."
  end
end
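To unit-test a service like this without hitting the network, one option is to let the chat object be injected. A hedged sketch of that variant with minimal test doubles (all names here are ours, not RubyLLM's):

```ruby
# Variant of the service that accepts an injected chat object, so unit
# tests can substitute a fake instead of making real API calls.
class InjectableAssistantService
  def initialize(chat:)
    @chat = chat
  end

  def answer(question)
    @chat.ask(question).content
  rescue StandardError
    "I'm sorry, I couldn't process that request."
  end
end

# Minimal stand-ins for RubyLLM's chat and response objects.
FakeResponse = Struct.new(:content)

class FakeChat
  def ask(_question)
    FakeResponse.new('stubbed answer')
  end
end
```

In production code you would pass `RubyLLM.chat(provider: :openai, model: ENV['INFERENCE_MODEL_ID'])`; in a spec, `FakeChat.new`.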

Controller Usage

class QuestionsController < ApplicationController
  def create
    assistant = AiAssistantService.new
    answer = assistant.answer(params[:question])

    render json: { answer: answer }
  end
end

Structured Output

Use RubyLLM’s with_params to forward OpenAI-compatible options such as response_format and get predictable JSON back:
chat = RubyLLM.chat(
  provider: :openai,
  model: 'claude-4-5-sonnet'
).with_params(response_format: { type: 'json_object' })

response = chat.ask(<<~PROMPT)
  Analyze this support ticket and return JSON with keys: category, priority, suggested_response.

  Ticket: "My app keeps crashing when I scale to more than 5 dynos."
PROMPT

data = JSON.parse(response.content)
puts "Category: #{data['category']}"
puts "Priority: #{data['priority']}"
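Model output isn't guaranteed to be valid JSON, so it's worth parsing defensively rather than calling JSON.parse directly. A sketch (the key names follow the prompt above; the helper name is ours):

```ruby
require 'json'

# Parses the model's reply into the expected keys, returning nil when
# the reply isn't valid JSON or isn't a JSON object.
def parse_ticket_analysis(raw)
  data = JSON.parse(raw)
  return nil unless data.is_a?(Hash)

  {
    'category' => data['category'],
    'priority' => data['priority'],
    'suggested_response' => data['suggested_response']
  }
rescue JSON::ParserError
  nil
end
```

Callers can then branch on nil and fall back to a retry or a default response instead of raising mid-request.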

Advanced Configuration

Temperature and Token Control

RubyLLM exposes sampling options through chainable helpers: with_temperature for temperature, and with_params for other OpenAI-compatible options such as max_tokens:
chat = RubyLLM.chat(
  provider: :openai,
  model: 'claude-4-5-haiku'
).with_temperature(0.7)
 .with_params(max_tokens: 1000)

System Prompts

Set system-level instructions to guide model behavior with with_instructions:
chat = RubyLLM.chat(
  provider: :openai,
  model: 'claude-4-5-sonnet'
)
chat.with_instructions "You are a Heroku expert. Answer questions concisely and cite docs when helpful."

Available Models

RubyLLM works with any Heroku AI chat model. Popular options include:
  • claude-4-5-sonnet - Most capable, best for complex reasoning
  • claude-4-5-haiku - Fast and efficient, great for production workloads
  • claude-4-sonnet - Previous Sonnet generation with extended context
See the complete list in Heroku AI Model Cards.

Error Handling

Wrap RubyLLM calls in proper error handling:
begin
  chat = RubyLLM.chat(provider: :openai, model: 'claude-4-5-sonnet')
  response = chat.ask("Your question here")
rescue RubyLLM::AuthenticationError
  puts "Invalid API key. Check your INFERENCE_KEY environment variable."
rescue RubyLLM::RateLimitError
  puts "Rate limit exceeded. Try again in a moment."
rescue RubyLLM::Error => e
  puts "Request failed: #{e.message}"
end
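For transient failures such as rate limits, a simple retry with exponential backoff often suffices. A generic sketch (the helper name, attempt count, and delays are illustrative, not part of RubyLLM):

```ruby
# Retries the block up to max_attempts times, doubling the delay
# between attempts; re-raises once the attempts are exhausted.
def with_retries(max_attempts: 3, base_delay: 0.5)
  attempts = 0
  begin
    yield
  rescue StandardError
    attempts += 1
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end
```

Usage: `with_retries { chat.ask("Your question here") }`. In production you may want to rescue only retryable errors (such as rate limits) rather than StandardError.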

Additional Resources