Embeddings

POST /v1/embeddings
The /v1/embeddings endpoint generates vector embeddings (numerical representations) for a provided set of input texts. These embeddings are optimized for semantic search, classification, clustering, and other machine learning tasks.
OpenAI Compatible: This endpoint is fully compatible with the OpenAI Embeddings API. You can use the OpenAI SDK by pointing it to our base URL.
View our available embedding models to see which models support which features.
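Because the endpoint mirrors the OpenAI Embeddings API, any OpenAI-compatible client works once pointed at the base URL. As a dependency-free illustration, here is a minimal Python sketch of the same request using only the standard library (the model name, headers, and EMBEDDING_KEY variable come from this page; the helper names are ours):

```python
import json
import os
import urllib.request

BASE_URL = "https://us.inference.heroku.com"

def build_embeddings_request(texts, api_key, model="cohere-embed-multilingual",
                             input_type="search_document"):
    """Build an urllib Request for POST /v1/embeddings (helper name is illustrative)."""
    body = json.dumps({
        "model": model,
        "input": texts,
        "input_type": input_type,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def embed(texts, api_key=None):
    """Send the request and return the parsed JSON response."""
    key = api_key or os.environ["EMBEDDING_KEY"]
    with urllib.request.urlopen(build_embeddings_request(texts, key)) as resp:
        return json.load(resp)
```

The same request shape works through the OpenAI SDK by setting its `base_url` to the base URL above and its API key to your EMBEDDING_KEY.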

Base URL

https://us.inference.heroku.com

Authentication

All requests must include an Authorization header with your Heroku Inference API key:
Authorization: Bearer YOUR_EMBEDDING_KEY
You can get your API key from your Heroku app’s EMBEDDING_KEY config variable (assuming you created the model resource with an --as EMBEDDING flag).

Request Parameters

model

string · required ID of the embedding model to use. Example: "cohere-embed-multilingual"

input

array or string · required Single string or an array of strings for the model to embed.
  • Max: 96 strings
  • Max: 2048 characters each
  • Recommended: Less than 512 tokens per string
["example string 1", "example string 2"]
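The limits above (96 strings per call, 2048 characters per string) mean larger corpora must be embedded in batches. A small helper that splits a list of texts accordingly (the batching strategy is our sketch, not part of the API; note that truncation discards text, so real code may prefer to chunk long documents instead):

```python
MAX_BATCH = 96    # max strings per request (from the limits above)
MAX_CHARS = 2048  # max characters per string

def to_batches(texts):
    """Truncate over-long strings and yield request-sized batches."""
    clipped = [t[:MAX_CHARS] for t in texts]
    for i in range(0, len(clipped), MAX_BATCH):
        yield clipped[i:i + MAX_BATCH]
```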

input_type

enum · optional Specifies the type of input passed to the model. This prepends special tokens to the input for optimal embeddings. Options: search_document, search_query, classification, clustering. Example: "search_document" for indexing documents, "search_query" for search queries.

encoding_format

enum · optional · default: "raw" Determines the encoding format of the output. Options: raw, base64

embedding_type

enum · optional · default: "float" Specifies the type(s) of embeddings to return. Options: float, int8, uint8, binary, ubinary
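With encoding_format: "base64", each vector arrives as a base64 string rather than a JSON array. Assuming the common packed little-endian float32 layout (the convention the OpenAI API uses for base64 embeddings; verify against your model's actual output), it can be decoded with the standard library:

```python
import base64
import struct

def decode_base64_floats(b64: str) -> list:
    """Decode a base64 embedding, assuming packed little-endian float32."""
    raw = base64.b64decode(b64)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))
```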

allow_ignored_params

boolean · optional · default: false Ignore unsupported parameters in request instead of throwing an error.

Response

object

string Always returns "list".

data

array List of embedding objects generated, one per input string.
Each object in the data array includes:
  • object (string): Type of object, always "embedding"
  • index (integer): Index of the input string this embedding corresponds to (starting from 0)
  • embedding (array or string): The embedding vector of the specified embedding_type

model

string ID of the model that generated the embeddings.

usage

object Token usage statistics.
  • prompt_tokens (integer): Tokens in the input
  • total_tokens (integer): Total tokens used

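The index field ties each embedding back to its position in the input array, so the order of items in data should not be assumed. A small sketch (function name is ours) that re-associates inputs with their vectors:

```python
def pair_inputs_with_embeddings(inputs, response):
    """Return {input_text: vector}, matching on each item's 'index' field."""
    return {inputs[item["index"]]: item["embedding"]
            for item in response["data"]}
```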
Examples

eval $(heroku config -a $APP_NAME --shell | grep '^EMBEDDING_' | sed 's/^/export /' | tee >(cat >&2))

curl $EMBEDDING_URL/v1/embeddings \
  -H "Authorization: Bearer $EMBEDDING_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere-embed-multilingual",
    "input": [
      "What is the capital of France?",
      "Paris is the capital of France."
    ],
    "input_type": "search_document"
  }'

Response Example

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.123, -0.456, 0.789, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.234, -0.567, 0.890, ...]
    }
  ],
  "model": "cohere-embed-multilingual",
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 15
  }
}

Use Cases

Semantic Search

Use input_type: "search_document" when embedding documents for your search index, and input_type: "search_query" when embedding user queries.
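Once documents are embedded with search_document and the query with search_query, ranking is typically done by cosine similarity. A dependency-free sketch of that step (the toy vectors in the test below stand in for real vectors returned by the endpoint):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```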

Classification

Use input_type: "classification" when creating embeddings for text classification tasks.

Clustering

Use input_type: "clustering" when grouping similar texts together.

Related

  • Rerank API: Rerank retrieved documents by semantic relevance for better RAG results
  • Chat Completions: Generate conversational responses
  • Image Generation: Create images with AI
  • Vector Database: Store embeddings in Postgres
