Building performant embedding workflows with Chroma and Baseten
You can now use Chroma, the open-source AI application database, with Baseten's inference platform to create AI-native apps—from agents to RAG pipelines and search backends.
Chroma is unique among vector databases because it is open-source: you can run it locally, self-host it in your cloud provider of choice, or use Chroma’s new hosted cloud offering. This makes Chroma a natural choice for developers building with open models who want control over their entire AI infrastructure stack.
Vector databases like Chroma are powered by embedding models, which map inputs to numerical vectors that encode their semantic meaning. Baseten offers dedicated deployments of open-source, fine-tuned, and custom embedding models (as well as other generative AI models) on autoscaling infrastructure.
Recently, Baseten announced Baseten Embedding Inference (BEI), the world’s fastest runtime for embedding models. BEI offers twice the throughput of the previous leading solutions for modern LLM-based embedding models.
BEI is useful with Chroma in two ways (both sketched in the example after this list):
When filling the Chroma vector database with an initial corpus of data, BEI provides substantial speed and cost savings for embedding large corpora.
When passing user queries to the Chroma database, BEI provides low-latency, real-time embedding inference and handles a large number of simultaneous users.
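Both patterns come down to the same embedding call at different batch sizes. Here is a minimal sketch using the openai client directly against a Baseten endpoint (the official Chroma integration shown below handles this for you, which is presumably why the install step includes the openai package). The endpoint URL uses the placeholder format from Step 3, and the model identifier is an assumption that depends on your deployment.

import os
from openai import OpenAI

# Baseten exposes an OpenAI-compatible endpoint for embedding models.
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-xxxxxxxx.api.baseten.co/environments/production/sync/v1",
)

# Bulk ingestion: embed a whole batch of documents in one request.
docs = ["first document", "second document", "third document"]
doc_embeddings = client.embeddings.create(model="my-embedding-model", input=docs)

# Query time: embed a single user query with low latency.
query_embedding = client.embeddings.create(model="my-embedding-model", input="example user query")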
You can use BEI-optimized embedding models deployed on Baseten with Chroma via our official integration.
How to use Chroma with Baseten
You can call an embedding model running on Baseten using the Chroma Python SDK in less than five minutes.
Step 1: Deploy an embedding model on Baseten
If you don’t yet have a Baseten account, you can sign up and you’ll receive $30 in free credits to cover the cost of experimentation.
With your Baseten account, you can quickly deploy BEI-optimized embedding models from Baseten’s model library and prebuilt examples, or package an embedding model from Hugging Face.
One great example model to try is Mixedbread Embed Large V1, a highly efficient model with great performance on the small, inexpensive L4 GPU. For more recommendations, see our list of the best open-source embedding models.
Step 2: Install the Chroma Python package
In your local development environment, make sure you have the latest versions of chromadb and openai:
pip install --upgrade chromadb openai
Step 3: Generate embeddings using Baseten and Chroma
Using the BasetenEmbeddingFunction from the Chroma Python SDK, you can call your deployed model.
To establish a connection with the deployed model, you need:
Your Baseten API key.
Your model inference endpoint, formatted as in the code sample below, with xxxxxxxx replaced with your deployed model ID.
This inference code returns a list of embedding vectors generated by the model.
import os
import chromadb.utils.embedding_functions as embedding_functions

# Point the embedding function at your deployed model's inference endpoint.
baseten_ef = embedding_functions.BasetenEmbeddingFunction(
    api_key=os.environ["BASETEN_API_KEY"],
    api_base="https://model-xxxxxxxx.api.baseten.co/environments/production/sync/v1",
)

# Returns one embedding vector per input string.
embeddings = baseten_ef(input=["This is my first text to embed", "This is my second document"])
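From here, you can hand the same embedding function to a Chroma collection, so that both corpus ingestion and query-time embedding run through your Baseten deployment. Here is a minimal sketch continuing from the baseten_ef defined above; the collection name, documents, and IDs are illustrative.

import chromadb

# An in-process client; swap in chromadb.HttpClient(...) for a self-hosted or hosted Chroma instance.
client = chromadb.Client()

# Chroma invokes baseten_ef automatically on add and query.
collection = client.create_collection(name="docs", embedding_function=baseten_ef)

# Bulk ingestion: documents are embedded through your Baseten deployment.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=["This is my first text to embed", "This is my second document"],
)

# Query time: the query text is embedded, then matched against stored vectors.
results = collection.query(query_texts=["first text to embed"], n_results=1)
print(results["documents"])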
Building performant embedding workflows in production
Chroma's open-source flexibility paired with Baseten Embedding Inference (BEI) makes building AI-native apps simpler and faster. If you're interested in learning more about performance optimizations for embedding inference, here's a detailed look at how we built BEI.
When you're ready, you can easily deploy your first embedding model on Baseten to begin building an agent, RAG pipeline, or search interface with Baseten and Chroma.