Baseten Blog
Generally Available: The fastest, most accurate, and cost-efficient Whisper transcription
At Baseten, we've built the most performant (1000x real-time factor), accurate, and cost-efficient speech-to-text pipeline for production AI audio transcription.
Introducing Custom Servers: Deploy production-ready model servers from Docker images
Deploy production-ready model servers on Baseten directly from any Docker image using just a YAML file.
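For a sense of how that YAML might be structured, here is a minimal sketch of a custom server config; the field names, image, and resource values are illustrative assumptions, not the exact Baseten schema:

```yaml
# Hypothetical config.yaml sketch: deploy a model server from an existing
# Docker image. Field names are illustrative, not the exact Baseten schema.
model_name: openai-compatible-vllm-server
docker_server:
  image: vllm/vllm-openai:latest          # any Docker image that runs an HTTP server
  start_command: python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
  server_port: 8000
  predict_endpoint: /v1/completions       # route that serves inference requests
  readiness_endpoint: /health             # used to gate traffic until the server is ready
  liveness_endpoint: /health
resources:
  accelerator: A10G
```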
Create custom environments for deployments on Baseten
Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.
Introducing canary deployments on Baseten
Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.
Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference
Are NVIDIA H200 GPUs cost-effective for model inference? We tested an 8xH200 cluster provided by Lambda to discover suitable inference workload profiles.
Export your model inference metrics to your favorite observability tool
Export model inference metrics like response time and hardware utilization to observability platforms like Grafana, New Relic, Datadog, and Prometheus.
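If the export target is Prometheus, the integration could look roughly like a standard scrape job; the host, metrics path, and auth details below are assumptions for illustration, not Baseten's documented endpoint:

```yaml
# Hypothetical prometheus.yml snippet for scraping exported inference metrics.
# Host, path, and auth are illustrative assumptions.
scrape_configs:
  - job_name: baseten-inference-metrics
    scheme: https
    metrics_path: /metrics                 # assumed export path
    authorization:
      type: Bearer
      credentials: <YOUR_BASETEN_API_KEY>  # placeholder, never commit real keys
    static_configs:
      - targets: ["app.baseten.co"]        # assumed host
```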
Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience
Baseten is now on Google Cloud Marketplace, empowering organizations with the tools to build and scale AI applications effortlessly.
Introducing Baseten Hybrid: control and flexibility in your cloud and ours
Baseten Hybrid is a multi-cloud solution that enables you to run inference in your cloud—with optional spillover into ours.
Building high-performance compound AI applications with MongoDB Atlas and Baseten
Using MongoDB Atlas and Baseten's Chains framework, you can build high-performance compound AI systems.
How to build function calling and JSON mode for open-source and fine-tuned LLMs
Use a state machine to generate token masks for logit biasing to enable function calling and structured output at the model server level.
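As a minimal sketch of the token-masking idea (plain PyTorch with hypothetical names; the post covers the full state-machine construction), the server can add a large negative bias to every token the current state disallows before sampling:

```python
import torch

def mask_logits(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    """Bias next-token logits so only tokens permitted by the current
    state-machine state can be sampled.

    `logits` is the model's next-token distribution of shape (vocab_size,);
    `allowed_token_ids` would come from the schema/grammar state machine.
    Names here are illustrative, not Baseten's model-server API.
    """
    bias = torch.full_like(logits, float("-inf"))  # disallow every token...
    bias[allowed_token_ids] = 0.0                  # ...except the allowed set
    return logits + bias

# Example: suppose the state machine says only tokens 7, 11, and 42 are valid next.
logits = torch.randn(32_000)
next_token = torch.argmax(mask_logits(logits, [7, 11, 42]))
assert next_token.item() in {7, 11, 42}
```

At each decoding step the state machine advances on the emitted token and produces the next allowed set, which is how JSON mode and function-call schemas are enforced throughout generation.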