Building performant embedding workflows with Chroma and Baseten

You can now use Chroma, the open-source AI application database, with Baseten's inference platform to create AI-native apps—from agents to RAG pipelines and search backends.

Chroma stands out among vector databases for its open-source flexibility: you can run it locally, self-host it in your cloud provider of choice, or use Chroma’s new hosted cloud offering. This makes Chroma a natural choice for developers building with open models who want control over their entire AI infrastructure stack.

Vector databases like Chroma are powered by embedding models, which map inputs to numerical vectors that encode their semantic meaning. Baseten offers dedicated deployments of open-source, fine-tuned, and custom embedding models (alongside other generative AI models) on autoscaling infrastructure.

Recently, Baseten announced Baseten Embedding Inference (BEI), the world’s fastest runtime for embedding models. BEI offers twice the throughput of the previous leading solutions for modern LLM-based embedding models.

BEI outperforms the next-best embedding inference engine by up to 2.05x

BEI is useful with Chroma in two ways:

  1. When filling the Chroma vector database with an initial corpus, BEI provides substantial speed and cost savings for embedding at scale.

  2. When passing user queries to the Chroma database, BEI provides low-latency, real-time embedding inference and handles a large number of simultaneous users.

You can use BEI-optimized embedding models deployed on Baseten with Chroma via our official integration.

How to use Chroma with Baseten

Using the Chroma Python SDK, you can connect to an embedding model running on Baseten in less than five minutes.

Step 1: Deploy an embedding model on Baseten

If you don’t yet have a Baseten account, you can sign up and you’ll receive $30 in free credits to cover the cost of experimentation.

With your Baseten account, you can quickly deploy BEI-optimized embedding models from Baseten’s model library and prebuilt examples, or package an embedding model from Hugging Face.

One great example model to try is Mixedbread Embed Large V1, a highly efficient model with great performance on the small, inexpensive L4 GPU. For more recommendations, see our list of the best open-source embedding models.

The best embedding model under 1B parameters: Mixedbread Embed Large V1

Step 2: Install the Chroma Python package

In your local development environment, make sure you have the latest versions of the chromadb and openai packages (Chroma’s Baseten integration calls the deployment through Baseten’s OpenAI-compatible API).

pip install --upgrade chromadb openai
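
(Optional) Before wiring the model into Chroma, you can sanity-check the deployment from Step 1 by calling its OpenAI-compatible endpoint directly with the openai client. This is a minimal sketch: the model ID in the URL, the BASETEN_API_KEY environment variable, and the model name string are placeholders you should swap for your own deployment’s values.

import os

from openai import OpenAI

# Same endpoint the Chroma integration will use; replace xxxxxxxx with your model ID
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-xxxxxxxx.api.baseten.co/environments/production/sync/v1",
)

# Embed a couple of test strings and print how many vectors came back and their dimension
response = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",  # placeholder name; match your deployment
    input=["hello world", "embeddings on Baseten"],
)
print(len(response.data), len(response.data[0].embedding))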

Step 3: Generate embeddings using Baseten and Chroma

Using the BasetenEmbeddingFunction from the Chroma Python SDK, you can call your deployed model. 

To establish a connection with the deployed model, you need:

  1. Your Baseten API key.

  2. Your model’s inference endpoint; in the code sample below, replace xxxxxxxx with your deployed model ID.

This inference code returns a list of embedding vectors generated by the model.

import os

import chromadb.utils.embedding_functions as embedding_functions

# Point the embedding function at your Baseten deployment
baseten_ef = embedding_functions.BasetenEmbeddingFunction(
    api_key=os.environ["BASETEN_API_KEY"],
    api_base="https://model-xxxxxxxx.api.baseten.co/environments/production/sync/v1",
)

# Returns one embedding vector per input string
baseten_ef(input=["This is my first text to embed", "This is my second document"])
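
From here, you can plug the same embedding function into a Chroma collection so that both bulk ingestion and query-time lookups (the two patterns described earlier) run through your Baseten deployment. The sketch below assumes an in-memory Chroma client; the collection name, document IDs, and sample texts are illustrative.

import chromadb

# In-memory client; use chromadb.PersistentClient(path="...") to persist data to disk
chroma_client = chromadb.Client()

# Chroma calls baseten_ef whenever this collection needs embeddings
collection = chroma_client.create_collection(
    name="docs",
    embedding_function=baseten_ef,
)

# Bulk ingestion: documents are embedded via your Baseten deployment
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Chroma is an open-source AI application database",
        "Baseten serves embedding models on autoscaling infrastructure",
    ],
)

# Query-time embedding: the query text is embedded the same way, then matched by similarity
results = collection.query(query_texts=["Which database should I use for RAG?"], n_results=1)
print(results["documents"])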

Building performant embedding workflows in production

Chroma's open-source flexibility paired with Baseten Embedding Inference (BEI) makes building AI-native apps simpler and faster. If you're interested in learning more about performance optimizations for embedding inference, here's a detailed look at how we built BEI.

When you're ready, you can easily deploy your first embedding model on Baseten to begin building an agent, RAG pipeline, or search interface with Baseten and Chroma.