
Rachel Rapp

About

Rachel Rapp is part of Baseten's product team. With a background in applied machine learning research, she helps organizations navigate the complexities of deploying high-performance machine learning models in production. Originally from a small town in Michigan, she now lives in Germany with her husband and their crew of former Romanian street pets.

Product

Introducing Baseten Embeddings Inference: The fastest embeddings solution available

Baseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker, and classification models at scale.

Product

Baseten Chains is now GA for production compound AI systems

Baseten Chains delivers ultra-low-latency compound AI at scale, with custom hardware per model and simplified model orchestration.

Product

New observability features: activity logging, LLM metrics, and metrics dashboard customization

We added three new observability features for improved monitoring and debugging: an activity log, LLM metrics, and customizable metrics dashboards.

Product

Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Our new Speculative Decoding integration can cut latency in half for production LLM workloads.

Model performance

Generally Available: The fastest, most accurate, and most cost-efficient Whisper transcription

At Baseten, we've built the most performant (1000x real-time factor), accurate, and cost-efficient speech-to-text pipeline for production AI audio transcription.

Product

Introducing Custom Servers: Deploy production-ready model servers from Docker images

Deploy production-ready model servers on Baseten directly from any Docker image using just a YAML file.

Product

Create custom environments for deployments on Baseten

Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.

Product

Introducing canary deployments on Baseten

Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.


Machine learning infrastructure that just works

Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.