Rank documents by semantic relevance to a query for improved RAG and search results
The /v1/rerank endpoint ranks a list of documents by their semantic relevance to a given query. This is essential for Retrieval-Augmented Generation (RAG) pipelines, semantic search, and question-answering applications where you need to surface the most relevant content.
Include an Authorization header with your Heroku Inference API key in every request.
You can read the key from the RERANK_KEY config variable (assuming you created the model resource with an --as RERANK flag).
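As a minimal sketch of the authentication step, the helper below reads the key from the RERANK_KEY config variable and builds the request headers. The function name `build_auth_headers` is hypothetical, and the `Content-Type` header is an assumption based on the endpoint accepting a JSON body.

```python
import os


def build_auth_headers(env=os.environ):
    """Build request headers from the RERANK_KEY config variable.

    `env` is injectable so the helper can be exercised without touching
    the real process environment.
    """
    api_key = env.get("RERANK_KEY")
    if not api_key:
        raise RuntimeError("RERANK_KEY is not set")
    return {
        # Bearer scheme, as described in the Authorization section of this page.
        "Authorization": f"Bearer {api_key}",
        # Assumption: the endpoint expects a JSON request body.
        "Content-Type": "application/json",
    }
```

If the model resource was attached under a different alias, the variable name changes accordingly, so adjust the lookup to match your app's config.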
| Model | Description | Rate Limit | Availability |
|---|---|---|---|
| cohere-rerank-3-5 | Enhanced reasoning with broad data compatibility and multilingual support | 250 RPM | US, EU |
| amazon-rerank-1-0 | High-performing reranker backed by AWS | 200 RPM | US, EU |
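The rate limits in the table translate into a minimum spacing between requests: 60 / 250 = 0.24 seconds for cohere-rerank-3-5 and 60 / 200 = 0.3 seconds for amazon-rerank-1-0. The sketch below shows one way a client could pace itself to stay under the limit. The `Pacer` class and `min_interval_seconds` helper are hypothetical names, not part of any Heroku SDK, and evenly spacing requests is only one possible client-side strategy.

```python
import time

# Requests per minute, taken from the model table above.
RATE_LIMITS_RPM = {"cohere-rerank-3-5": 250, "amazon-rerank-1-0": 200}


def min_interval_seconds(model):
    """Smallest gap between requests that stays within the model's RPM limit."""
    return 60.0 / RATE_LIMITS_RPM[model]


class Pacer:
    """Blocks just long enough to keep evenly spaced requests under the limit."""

    def __init__(self, model, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(model)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        now = self._clock()
        if now < self._next_allowed:
            # Too early: sleep until the next permitted send time.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `pacer.wait()` before each request is enough for a single-process client; a multi-process deployment would need shared state, which this sketch does not attempt.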
Result Object
- api_version.version (string): API version number
- api_version.is_experimental (boolean): Whether this API is experimental
- billed_units.search_units (integer): Number of search units consumed for billing

| Status Code | Description | Common Causes |
|---|---|---|
| 400 | Bad Request | Missing required fields, documents exceed 1000 limit, invalid JSON |
| 401 | Unauthorized | Missing or invalid authorization token |
| 403 | Forbidden | No access to the requested model |
| 404 | Not Found | Invalid model ID |
| 429 | Too Many Requests | Rate limit exceeded (250 RPM for Cohere, 200 RPM for Amazon) |
| 500 | Internal Server Error | Backend service errors |
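The status codes above suggest a simple client policy: retry on 429 and 500 with increasing delays, and fail fast on 400, 401, 403, and 404, since repeating those requests unchanged will not help. The sketch below is one way to express that with the standard library. The function name `send_with_retry`, the choice of retryable codes, and the exponential delay schedule are all assumptions, not documented Heroku behavior.

```python
import time
import urllib.error
import urllib.request

# Assumption: only rate-limit and backend errors are worth retrying.
RETRYABLE_STATUS = {429, 500}


def send_with_retry(request, max_attempts=3, base_delay=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Send a request, retrying retryable HTTP errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Client errors (400, 401, 403, 404) are raised immediately.
            if err.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `opener` and `sleep` parameters exist so the retry logic can be tested without network access or real waiting.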
Bearer token using your INFERENCE_KEY
ID of the rerank model to use
"cohere-rerank-3-5"
The search query or question to rank documents against
"How do I optimize database connection pooling?"
Array of text documents to rank by relevance to the query
[
"Connection pooling reduces overhead by reusing existing connections.",
"You can monitor application performance using built-in metrics.",
"Set max pool size based on your dyno count and concurrent queries."
]
Number of most relevant results to return. If not specified, returns all documents.
3
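To show how the parameters above fit together, here is a minimal sketch that assembles a request using only the Python standard library. The JSON field names `model`, `query`, `documents`, and `top_n` are assumptions, since this page describes the parameters without naming them; the POST method and the `base_url` value are likewise assumed, and `build_rerank_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_rerank_request(base_url, api_key, query, documents,
                         model="cohere-rerank-3-5", top_n=None):
    """Assemble (but do not send) a request to the /v1/rerank endpoint."""
    # Assumed field names; check them against the request schema.
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        # Omitting the field returns all documents, per the description above.
        payload["top_n"] = top_n
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # Assumption: the endpoint accepts POST with a JSON body.
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Building the request separately from sending it keeps the payload easy to inspect and test.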
Successful response
Documents ranked by relevance, highest score first
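Because results come back ordered by relevance, a client usually needs to map each result back to the original text. The sketch below assumes a response shape with a `results` array whose items carry an `index` into the submitted documents and a `relevance_score`; those field names are assumptions, and the scores in the sample are invented for illustration.

```python
def rank_documents(response, documents):
    """Pair each result with its source text, preserving the response order."""
    return [
        (documents[item["index"]], item["relevance_score"])
        for item in response["results"]
    ]


documents = [
    "Connection pooling reduces overhead by reusing existing connections.",
    "You can monitor application performance using built-in metrics.",
    "Set max pool size based on your dyno count and concurrent queries.",
]

# Hypothetical response body; field names and scores are illustrative only.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.91},
        {"index": 2, "relevance_score": 0.78},
        {"index": 1, "relevance_score": 0.05},
    ]
}

for text, score in rank_documents(sample_response, documents):
    print(f"{score:.2f}  {text}")
```

In a RAG pipeline, the top few texts from this list would typically be passed to the generation step as context.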