/embeddings
Quick Start
from litellm import embedding
import os
os.environ['OPENAI_API_KEY'] = ""
response = embedding(model='text-embedding-ada-002', input=["good morning from litellm"])
Proxy Usage
NOTE
For vertex_ai, set:
export GOOGLE_APPLICATION_CREDENTIALS="absolute/path/to/service_account.json"
Add model to config
model_list:
  - model_name: textembedding-gecko
    litellm_params:
      model: vertex_ai/textembedding-gecko

general_settings:
  master_key: sk-1234
Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Test
- Curl
- OpenAI (python)
- Langchain Embeddings
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"input": ["Academia.edu uses"], "model": "textembedding-gecko", "encoding_format": "base64"}'
from openai import OpenAI
client = OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000"
)
client.embeddings.create(
    model="textembedding-gecko",
    input="The food was delicious and the waiter...",
    encoding_format="float"
)
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="textembedding-gecko", openai_api_base="http://0.0.0.0:4000", openai_api_key="sk-1234")
text = "This is a test document."
query_result = embeddings.embed_query(text)
print(f"VERTEX AI EMBEDDINGS")
print(query_result[:5])
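To embed multiple documents with the same client, LangChain also exposes embed_documents; a minimal sketch (texts are illustrative):
docs = ["This is a test document.", "This is another test document."]
doc_results = embeddings.embed_documents(docs)
print(len(doc_results))     # one vector per input document
print(len(doc_results[0]))  # dimensionality of each vector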
Image Embeddings
For models that support image embeddings, you can pass a base64-encoded image string in the input param.
- SDK
- PROXY
from litellm import embedding
import os
# set your api key
os.environ["COHERE_API_KEY"] = ""
response = embedding(model="cohere/embed-english-v3.0", input=["<base64 encoded image>"])
- Setup config.yaml
model_list:
  - model_name: cohere-embed
    litellm_params:
      model: cohere/embed-english-v3.0
      api_key: os.environ/COHERE_API_KEY
- Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
- Test it!
curl -X POST 'http://0.0.0.0:4000/v1/embeddings' \
-H 'Authorization: Bearer sk-54d77cd67b9febbb' \
-H 'Content-Type: application/json' \
-d '{
  "model": "cohere/embed-english-v3.0",
  "input": ["<base64 encoded image>"]
}'
Input Params for litellm.embedding()
Any non-OpenAI params are treated as provider-specific params and sent in the request body as kwargs to the provider.
Required Fields
model: string - ID of the model to use. Example: model='text-embedding-ada-002'
input: string or array - Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or an array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for text-embedding-ada-002), cannot be an empty string, and any array must be 2048 dimensions or less. Example: input=["good morning from litellm"]
Optional LiteLLM Fields
user: string (optional) - A unique identifier representing your end-user.
dimensions: integer (optional) - The number of dimensions the resulting output embeddings should have. Only supported in OpenAI/Azure text-embedding-3 and later models.
encoding_format: string (optional) - The format to return the embeddings in. Can be either "float" or "base64". Defaults to encoding_format="float".
timeout: integer (optional) - The maximum time, in seconds, to wait for the API to respond. Defaults to 600 seconds (10 minutes).
api_base: string (optional) - The API endpoint you want to call the model with.
api_version: string (optional) - (Azure-specific) The API version for the call.
api_key: string (optional) - The API key to authenticate and authorize requests. If not provided, the default API key is used.
api_type: string (optional) - The type of API to use.
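A short sketch combining several of the optional fields above (values are illustrative; assumes OPENAI_API_KEY is set in the environment):
from litellm import embedding

response = embedding(
    model="text-embedding-3-small",
    input=["good morning from litellm"],
    dimensions=256,           # shrink the output vector (text-embedding-3 and later only)
    encoding_format="float",  # or "base64"
    timeout=120,              # fail if the API takes longer than 120 seconds
    user="user-1234",         # unique end-user identifier
)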
Output from litellm.embedding()
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.0022326677571982145,
        0.010749882087111473,
        ...
      ]
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
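A minimal sketch of reading this response in Python (field access shown for the OpenAI-style shape above; exact attributes may vary slightly by provider):
vector = response.data[0]["embedding"]  # embedding for the first input
print(len(vector))                      # dimensionality of the vector
print(response.model)                   # model that produced the embeddings
print(response.usage.total_tokens)      # tokens consumed by the request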
OpenAI Embedding Models
Usage
from litellm import embedding
import os
os.environ['OPENAI_API_KEY'] = ""
response = embedding(
    model="text-embedding-3-small",
    input=["good morning from litellm", "this is another item"],
    metadata={"anything": "good day"},
    dimensions=5 # Only supported in text-embedding-3 and later models.
)
Model Name | Function Call | Required OS Variables |
---|---|---|
text-embedding-3-small | embedding('text-embedding-3-small', input) | os.environ['OPENAI_API_KEY'] |
text-embedding-3-large | embedding('text-embedding-3-large', input) | os.environ['OPENAI_API_KEY'] |
text-embedding-ada-002 | embedding('text-embedding-ada-002', input) | os.environ['OPENAI_API_KEY'] |
OpenAI Compatible Embedding Models
Use this for calling /embedding endpoints on OpenAI Compatible Servers, e.g. https://github.com/xorbitsai/inference
Note: add the openai/ prefix to the model name so litellm knows to route to an OpenAI-compatible endpoint.
Usage
from litellm import embedding
response = embedding(
    model="openai/<your-llm-name>",  # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_base="http://0.0.0.0:4000/", # set API base of your custom OpenAI endpoint
    input=["good morning from litellm"]
)
Bedrock Embedding
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
Usage
from litellm import embedding
response = embedding(
    model="amazon.titan-embed-text-v1",
    input=["good morning from litellm"],
)
print(response)
Model Name | Function Call |
---|---|
Titan Embeddings - G1 | embedding(model="amazon.titan-embed-text-v1", input=input) |
Cohere Embeddings - English | embedding(model="cohere.embed-english-v3", input=input) |
Cohere Embeddings - Multilingual | embedding(model="cohere.embed-multilingual-v3", input=input) |
Cohere Embedding Models
https://docs.cohere.com/reference/embed
Usage
from litellm import embedding
import os
os.environ["COHERE_API_KEY"] = "cohere key"
# cohere call
response = embedding(
    model="embed-english-v3.0",
    input=["good morning from litellm", "this is another item"],
    input_type="search_document" # optional param for v3 models
)
Model Name | Function Call |
---|---|
embed-english-v3.0 | embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"]) |
embed-english-light-v3.0 | embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"]) |
embed-multilingual-v3.0 | embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"]) |
embed-multilingual-light-v3.0 | embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"]) |
embed-english-v2.0 | embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"]) |
embed-english-light-v2.0 | embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"]) |
embed-multilingual-v2.0 | embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"]) |
NVIDIA NIM Embedding Models
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["NVIDIA_NIM_API_KEY"] = "" # api key
os.environ["NVIDIA_NIM_API_BASE"] = "" # nim endpoint url
Usage
from litellm import embedding
import os
os.environ['NVIDIA_NIM_API_KEY'] = ""
response = embedding(
    model='nvidia_nim/<model_name>',
    input=["good morning from litellm"],
    input_type="query"
)
input_type Parameter for Embedding Models
Certain embedding models, such as nvidia/embed-qa-4 and the E5 family, operate in dual modes: one for indexing documents (passages) and another for querying. To maintain high retrieval accuracy, it's essential to specify how the input text is being used by setting the input_type parameter correctly.
Usage
Set the input_type parameter to one of the following values:
- "passage": for embedding content during indexing (e.g., documents).
- "query": for embedding content during retrieval (e.g., user queries).
Warning: Incorrect usage of input_type can lead to a significant drop in retrieval performance.
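For example, a sketch of the dual-mode pattern (model name and texts are illustrative; assumes the NVIDIA NIM env vars above are set): embed documents with "passage" at index time and the user's question with "query" at search time.
from litellm import embedding

# index time: embed documents as passages
doc_response = embedding(
    model="nvidia_nim/nvidia/nv-embedqa-e5-v5",
    input=["LiteLLM maps provider APIs to the OpenAI format."],
    input_type="passage",
)

# search time: embed the user's question as a query
query_response = embedding(
    model="nvidia_nim/nvidia/nv-embedqa-e5-v5",
    input=["How does LiteLLM talk to providers?"],
    input_type="query",
)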
All models listed here are supported:
Model Name | Function Call |
---|---|
NV-Embed-QA | embedding(model="nvidia_nim/NV-Embed-QA", input) |
nvidia/nv-embed-v1 | embedding(model="nvidia_nim/nvidia/nv-embed-v1", input) |
nvidia/nv-embedqa-mistral-7b-v2 | embedding(model="nvidia_nim/nvidia/nv-embedqa-mistral-7b-v2", input) |
nvidia/nv-embedqa-e5-v5 | embedding(model="nvidia_nim/nvidia/nv-embedqa-e5-v5", input) |
nvidia/embed-qa-4 | embedding(model="nvidia_nim/nvidia/embed-qa-4", input) |
nvidia/llama-3.2-nv-embedqa-1b-v1 | embedding(model="nvidia_nim/nvidia/llama-3.2-nv-embedqa-1b-v1", input) |
nvidia/llama-3.2-nv-embedqa-1b-v2 | embedding(model="nvidia_nim/nvidia/llama-3.2-nv-embedqa-1b-v2", input) |
snowflake/arctic-embed-l | embedding(model="nvidia_nim/snowflake/arctic-embed-l", input) |
baai/bge-m3 | embedding(model="nvidia_nim/baai/bge-m3", input) |
HuggingFace Embedding Models
LiteLLM supports all Feature-Extraction + Sentence Similarity Embedding models: https://huggingface.co/models?pipeline_tag=feature-extraction
Usage
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
    model='huggingface/microsoft/codebert-base',
    input=["good morning from litellm"]
)
Usage - Set input_type
LiteLLM infers the input type (feature-extraction or sentence-similarity) by making a GET request to the API base. Override this by setting input_type yourself.
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
    model='huggingface/microsoft/codebert-base',
    input=["good morning from litellm", "you are a good bot"],
    api_base="https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud",
    input_type="sentence-similarity"
)
Usage - Custom API Base
from litellm import embedding
import os
os.environ['HUGGINGFACE_API_KEY'] = ""
response = embedding(
    model='huggingface/microsoft/codebert-base',
    input=["good morning from litellm"],
    api_base="https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud"
)
Model Name | Function Call | Required OS Variables |
---|---|---|
microsoft/codebert-base | embedding('huggingface/microsoft/codebert-base', input=input) | os.environ['HUGGINGFACE_API_KEY'] |
BAAI/bge-large-zh | embedding('huggingface/BAAI/bge-large-zh', input=input) | os.environ['HUGGINGFACE_API_KEY'] |
any-hf-embedding-model | embedding('huggingface/hf-embedding-model', input=input) | os.environ['HUGGINGFACE_API_KEY'] |
Mistral AI Embedding Models
All models listed here https://docs.mistral.ai/platform/endpoints are supported
Usage
from litellm import embedding
import os
os.environ['MISTRAL_API_KEY'] = ""
response = embedding(
    model="mistral/mistral-embed",
    input=["good morning from litellm"],
)
print(response)
Model Name | Function Call |
---|---|
mistral-embed | embedding(model="mistral/mistral-embed", input) |
Gemini AI Embedding Models
API keys
These can be set as environment variables or passed as params to litellm.embedding()
import os
os.environ["GEMINI_API_KEY"] = ""
Usage - Embedding
from litellm import embedding
response = embedding(
    model="gemini/text-embedding-004",
    input=["good morning from litellm"],
)
print(response)
All models listed here are supported:
Model Name | Function Call |
---|---|
text-embedding-004 | embedding(model="gemini/text-embedding-004", input) |
Vertex AI Embedding Models
Usage - Embedding
import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location
response = embedding(
    model="vertex_ai/textembedding-gecko",
    input=["good morning from litellm"],
)
print(response)
Supported Models
All models listed here are supported
Model Name | Function Call |
---|---|
textembedding-gecko | embedding(model="vertex_ai/textembedding-gecko", input) |
textembedding-gecko-multilingual | embedding(model="vertex_ai/textembedding-gecko-multilingual", input) |
textembedding-gecko-multilingual@001 | embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input) |
textembedding-gecko@001 | embedding(model="vertex_ai/textembedding-gecko@001", input) |
textembedding-gecko@003 | embedding(model="vertex_ai/textembedding-gecko@003", input) |
text-embedding-preview-0409 | embedding(model="vertex_ai/text-embedding-preview-0409", input) |
text-multilingual-embedding-preview-0409 | embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input) |
Voyage AI Embedding Models
Usage - Embedding
from litellm import embedding
import os
os.environ['VOYAGE_API_KEY'] = ""
response = embedding(
    model="voyage/voyage-01",
    input=["good morning from litellm"],
)
print(response)
Supported Models
All models listed here https://docs.voyageai.com/embeddings/#models-and-specifics are supported
Model Name | Function Call |
---|---|
voyage-01 | embedding(model="voyage/voyage-01", input) |
voyage-lite-01 | embedding(model="voyage/voyage-lite-01", input) |
voyage-lite-01-instruct | embedding(model="voyage/voyage-lite-01-instruct", input) |
Provider-specific Params
Any non-OpenAI params are treated as provider-specific params and sent in the request body as kwargs to the provider.
Example
Cohere v3 models have a required parameter, input_type, which can be one of the following four values:
- input_type="search_document": (default) Use this for texts (documents) you want to store in your vector database
- input_type="search_query": Use this for search queries to find the most relevant documents in your vector database
- input_type="classification": Use this if you use the embeddings as an input for a classification system
- input_type="clustering": Use this if you use the embeddings for text clustering
https://txt.cohere.com/introducing-embed-v3/
- SDK
- PROXY
from litellm import embedding
import os
os.environ["COHERE_API_KEY"] = "cohere key"
# cohere call
response = embedding(
    model="embed-english-v3.0",
    input=["good morning from litellm", "this is another item"],
    input_type="search_document" # 👈 PROVIDER-SPECIFIC PARAM
)
via config
model_list:
  - model_name: "cohere-embed"
    litellm_params:
      model: embed-english-v3.0
      input_type: search_document # 👈 PROVIDER-SPECIFIC PARAM
via request
curl -X POST 'http://0.0.0.0:4000/v1/embeddings' \
-H 'Authorization: Bearer sk-54d77cd67b9febbb' \
-H 'Content-Type: application/json' \
-d '{
  "model": "cohere-embed",
  "input": ["Are you authorized to work in United States of America?"],
  "input_type": "search_document"
}'
Nebius AI Studio Embedding Models
Usage - Embedding
from litellm import embedding
import os
os.environ['NEBIUS_API_KEY'] = ""
response = embedding(
    model="nebius/BAAI/bge-en-icl",
    input=["Good morning from litellm!"],
)
print(response)
Supported Models
All supported models can be found here: https://studio.nebius.ai/models/embedding
Model Name | Function Call |
---|---|
BAAI/bge-en-icl | embedding(model="nebius/BAAI/bge-en-icl", input) |
BAAI/bge-multilingual-gemma2 | embedding(model="nebius/BAAI/bge-multilingual-gemma2", input) |
intfloat/e5-mistral-7b-instruct | embedding(model="nebius/intfloat/e5-mistral-7b-instruct", input) |