New observability features: activity logging, LLM metrics, and metrics dashboard customization
We added three new observability features for improved monitoring and debugging: an activity log, LLM metrics, and customizable metrics dashboards.
Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference
Our new Speculative Decoding integration can cut latency in half for production LLM workloads.
Introducing Custom Servers: Deploy production-ready model servers from Docker images
Deploy production-ready model servers on Baseten directly from any Docker image using just a YAML file.
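For a concrete picture, here is a minimal, hypothetical sketch of such a YAML file; the field names (base_image, docker_server, resources) and values are illustrative assumptions, not taken from the post:

```yaml
# Hypothetical config.yaml: serve a model from an existing Docker image.
# All field names below are illustrative assumptions.
base_image:
  image: vllm/vllm-openai:latest   # any public or private Docker image
docker_server:
  start_command: python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-v0.1
  server_port: 8000                # port the server inside the image listens on
  predict_endpoint: /v1/completions
  readiness_endpoint: /health      # polled to decide when the server is ready
resources:
  accelerator: A10G                # GPU to run the deployment on
```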
Create custom environments for deployments on Baseten
Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.
Introducing canary deployments on Baseten
Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.
Using asynchronous inference in production
Learn how async inference works, how it protects against common inference failures, how it's applied in common use cases, and more.
Baseten Chains explained: building multi-component AI workflows at scale
A delightful developer experience for building and deploying compound ML inference workflows.
New in May 2024
AI events, multicluster model serving architecture, tokenizer efficiency, and forward-deployed engineering.
New in April 2024
Use four new best-in-class LLMs, stream synthesized speech with XTTS, and deploy models with CI/CD.
New in March 2024
Fast Mistral 7B, fractional H100 GPUs, FP8 quantization, and API endpoints for model management.