Embeddings

POST /v1/embeddings
The /v1/embeddings endpoint generates vector embeddings (numerical representations) for a provided set of input texts. These embeddings are optimized for semantic search, classification, clustering, and other machine learning tasks.
OpenAI Compatible: This endpoint is fully compatible with the OpenAI Embeddings API. You can use the OpenAI SDK by pointing it to our base URL.
View our available embedding models to see which models support which features.
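Because the endpoint mirrors the OpenAI Embeddings API, any OpenAI-compatible client works once pointed at the base URL. As a dependency-free illustration, here is a minimal Python sketch of the same request using only the standard library (the model name, headers, and EMBEDDING_KEY variable come from this page; the helper names are ours):

```python
import json
import os
import urllib.request

BASE_URL = "https://us.inference.heroku.com"

def build_embeddings_request(texts, api_key, model="cohere-embed-multilingual",
                             input_type="search_document"):
    """Build an urllib Request for POST /v1/embeddings (helper name is illustrative)."""
    body = json.dumps({
        "model": model,
        "input": texts,
        "input_type": input_type,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def embed(texts, api_key=None):
    """Send the request and return the parsed JSON response."""
    key = api_key or os.environ["EMBEDDING_KEY"]
    with urllib.request.urlopen(build_embeddings_request(texts, key)) as resp:
        return json.load(resp)
```

The same request shape works through the OpenAI SDK by setting its `base_url` to the base URL above and its API key to your EMBEDDING_KEY.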

Base URL

https://us.inference.heroku.com

Authentication

All requests must include an Authorization header with your Heroku Inference API key:
Authorization: Bearer YOUR_EMBEDDING_KEY
You can get your API key from your Heroku app’s EMBEDDING_KEY config variable (assuming you created the model resource with an --as EMBEDDING flag).

Request Parameters

model

string · required ID of the embedding model to use. Example: "cohere-embed-multilingual"

input

array or string · required Single string or an array of strings for the model to embed.
  • Max: 96 strings
  • Max: 2048 characters each
  • Recommended: Less than 512 tokens per string
["example string 1", "example string 2"]
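The limits above (96 strings per call, 2048 characters per string) mean larger corpora must be embedded in batches. A small helper that splits a list of texts accordingly (the batching strategy is our sketch, not part of the API; note that truncation discards text, so real code may prefer to chunk long documents instead):

```python
MAX_BATCH = 96    # max strings per request (from the limits above)
MAX_CHARS = 2048  # max characters per string

def to_batches(texts):
    """Truncate over-long strings and yield request-sized batches."""
    clipped = [t[:MAX_CHARS] for t in texts]
    for i in range(0, len(clipped), MAX_BATCH):
        yield clipped[i:i + MAX_BATCH]
```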

input_type

enum · optional Specifies the type of input passed to the model. This prepends special tokens to the input for optimal embeddings. Options: search_document, search_query, classification, clustering. Example: "search_document" for indexing documents, "search_query" for search queries.

encoding_format

enum · optional · default: "raw" Determines the encoding format of the output. Options: raw, base64

embedding_type

enum · optional · default: "float" Specifies the type(s) of embeddings to return. Options: float, int8, uint8, binary, ubinary
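With encoding_format: "base64", each vector arrives as a base64 string rather than a JSON array. Assuming the common packed little-endian float32 layout (the convention the OpenAI API uses for base64 embeddings; verify against your model's actual output), it can be decoded with the standard library:

```python
import base64
import struct

def decode_base64_floats(b64: str) -> list:
    """Decode a base64 embedding, assuming packed little-endian float32."""
    raw = base64.b64decode(b64)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))
```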

allow_ignored_params

boolean · optional · default: false Ignore unsupported parameters in request instead of throwing an error.

Response

object

string Always returns "list".

data

array List of embedding objects generated, one per input string.
Each object in the data array includes:
  • object (string): Type of object, always "embedding"
  • index (integer): Index of the input string this embedding corresponds to (starting from 0)
  • embedding (array or string): The embedding vector of the specified embedding_type

model

string ID of the model that generated the embeddings.

usage

object Token usage statistics.
  • prompt_tokens (integer): Tokens in the input
  • total_tokens (integer): Total tokens used

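The index field ties each embedding back to its position in the input array, so the order of items in data should not be assumed. A small sketch (function name is ours) that re-associates inputs with their vectors:

```python
def pair_inputs_with_embeddings(inputs, response):
    """Return {input_text: vector}, matching on each item's 'index' field."""
    return {inputs[item["index"]]: item["embedding"]
            for item in response["data"]}
```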
Examples

eval $(heroku config -a $APP_NAME --shell | grep '^EMBEDDING_' | sed 's/^/export /' | tee >(cat >&2))

curl $EMBEDDING_URL/v1/embeddings \
  -H "Authorization: Bearer $EMBEDDING_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere-embed-multilingual",
    "input": [
      "What is the capital of France?",
      "Paris is the capital of France."
    ],
    "input_type": "search_document"
  }'

Response Example

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.123, -0.456, 0.789, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.234, -0.567, 0.890, ...]
    }
  ],
  "model": "cohere-embed-multilingual",
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 15
  }
}

Use Cases

Semantic Search

Use input_type: "search_document" when embedding documents for your search index, and input_type: "search_query" when embedding user queries.
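Once documents are embedded with search_document and the query with search_query, ranking is typically done by cosine similarity. A dependency-free sketch of that step (the toy vectors in the test below stand in for real vectors returned by the endpoint):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```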

Classification

Use input_type: "classification" when creating embeddings for text classification tasks.

Clustering

Use input_type: "clustering" when grouping similar texts together.

Related

  • Rerank API: Rerank retrieved documents by semantic relevance for better RAG results
  • Chat Completions: Generate conversational responses
  • Image Generation: Create images with AI
  • Vector Database: Store embeddings in Postgres
