Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Our new Speculative Decoding integration can cut latency in half for production LLM workloads.
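
For context on the technique itself: speculative decoding has a small draft model cheaply propose several tokens, which the large target model then verifies, keeping the longest agreeing prefix. The toy sketch below illustrates that accept-or-fall-back loop using stand-in functions as "models"; it is a generic illustration of the technique, not Baseten's engine implementation.

```python
# Toy speculative decoding: a cheap draft model proposes k tokens, an
# expensive target model verifies them. The two "models" here are
# stand-in functions, not real LLMs.

def draft_next_tokens(prefix: list[int], k: int) -> list[int]:
    # Hypothetical cheap draft model: propose k tokens greedily.
    return [(prefix[-1] + i + 1) % 50_000 for i in range(k)]

def target_next_token(prefix: list[int]) -> int:
    # Hypothetical expensive target model: the "true" next token.
    return (prefix[-1] + 1) % 50_000

def speculative_decode(prompt: list[int], max_new: int, k: int = 4) -> list[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        proposal = draft_next_tokens(tokens, k)
        # Verify position by position; a real engine scores all k
        # positions with the target model in one batched pass.
        accepted = 0
        for i in range(k):
            if target_next_token(tokens + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        tokens += proposal[:accepted]
        if accepted < k:
            # On the first mismatch, take the target model's token, so the
            # output matches what the target model alone would produce.
            tokens.append(target_next_token(tokens))
    return tokens[: len(prompt) + max_new]

print(speculative_decode([1, 2, 3], max_new=8))
```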

Introducing Custom Servers: Deploy production-ready model servers from Docker images

Deploy production-ready model servers on Baseten directly from any Docker image using just a YAML file.
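
As a rough illustration of what that YAML file might contain, here is a hypothetical config sketch for serving a stock vLLM image. The field names below are assumptions for illustration, not Baseten's documented schema; the docs have the authoritative keys.

```yaml
# Hypothetical config.yaml sketch for a custom server deployed from a
# Docker image. Keys are illustrative assumptions, not the real schema.
base_image:
  image: vllm/vllm-openai:latest   # any Docker image with an HTTP server
docker_server:
  start_command: vllm serve mistralai/Mistral-7B-Instruct-v0.2
  server_port: 8000                # port the container listens on
  predict_endpoint: /v1/chat/completions
  readiness_endpoint: /health      # polled before traffic is routed
resources:
  accelerator: H100
```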

Create custom environments for deployments on Baseten

Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.
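
To make "persistent endpoints" concrete, the sketch below calls a hypothetical environment-scoped endpoint. The URL shape, model ID, and environment name are placeholders we invented for illustration; the point is that the endpoint stays stable while CI/CD promotes new deployments behind it.

```python
# Hypothetical sketch of calling a persistent environment endpoint.
# The URL shape and identifiers are assumptions for illustration.
import os
import requests

MODEL_ID = "abc123"        # placeholder model ID
ENVIRONMENT = "staging"    # hypothetical environment name

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/environments/{ENVIRONMENT}/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"prompt": "Hello!"},
    timeout=30,
)
print(resp.json())
```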

Introducing canary deployments on Baseten

Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.
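
Conceptually, a canary rollout shifts a small, growing fraction of traffic to the new deployment while quality metrics are watched at each stage. The sketch below illustrates only that traffic-splitting idea; it is not Baseten's routing implementation.

```python
# Generic sketch of the traffic split behind a canary rollout: route a
# configurable fraction of requests to the canary, the rest to stable.
import random

def pick_deployment(canary_fraction: float) -> str:
    """Route one request: `canary_fraction` of traffic hits the canary."""
    return "canary" if random.random() < canary_fraction else "stable"

# Ramp the canary up in stages, checking error rates before each increase.
for fraction in (0.01, 0.05, 0.25, 1.0):
    sample = [pick_deployment(fraction) for _ in range(10_000)]
    print(f"{fraction:>5.0%} target -> {sample.count('canary') / len(sample):.1%} observed")
```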

Using asynchronous inference in production

Learn how async inference works, how it protects against common inference failures, how it's applied in common use cases, and more.
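
As a taste of the pattern, the sketch below queues a request and hands off a webhook for the result instead of holding an HTTP connection open, so client timeouts and dropped connections no longer lose the request. The endpoint path and payload fields are assumptions for illustration; the post and the docs have the exact API.

```python
# Hedged sketch of queueing an async inference request. The
# /async_predict path and payload shape are assumptions, not the
# authoritative API.
import os
import requests

MODEL_ID = "abc123"  # placeholder model ID

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "model_input": {"prompt": "Summarize this document..."},
        # The result is delivered here later, decoupling the client
        # from long-running inference.
        "webhook_endpoint": "https://example.com/inference-results",
    },
    timeout=10,
)
print(resp.json())  # expected to contain an ID for tracking the request
```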

Baseten Chains explained: building multi-component AI workflows at scale

A delightful developer experience for building and deploying compound ML inference workflows.
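
For a flavor of the programming model, a minimal two-Chainlet sketch using the truss_chains SDK might look like the following. The class names and logic are ours, and exact SDK signatures may differ from this sketch.

```python
import truss_chains as chains

class SayHello(chains.ChainletBase):
    # A Chainlet is one component of the workflow.
    def run_remote(self, name: str) -> str:
        return f"Hello, {name}!"

@chains.mark_entrypoint
class Greeter(chains.ChainletBase):
    # The entrypoint declares its dependency, which the framework wires up.
    def __init__(self, say_hello: SayHello = chains.depends(SayHello)) -> None:
        self._say_hello = say_hello

    def run_remote(self, names: list[str]) -> list[str]:
        return [self._say_hello.run_remote(n) for n in names]
```

The appeal is that each Chainlet deploys as its own component, so the pieces of a compound workflow can be provisioned and scaled independently.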

New in May 2024

AI events, multicluster model serving architecture, tokenizer efficiency, and forward-deployed engineering.

New in April 2024

Use four new best-in-class LLMs, stream synthesized speech with XTTS, and deploy models with CI/CD.

New in March 2024

Fast Mistral 7B, fractional H100 GPUs, FP8 quantization, and API endpoints for model management.