Baseten Blog

Product

Introducing canary deployments on Baseten

Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.

GPU guides

Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference

Are NVIDIA H200 GPUs cost-effective for model inference? We tested an 8xH200 cluster provided by Lambda to discover suitable inference workload profiles.

News

Export your model inference metrics to your favorite observability tool

Export model inference metrics like response time and hardware utilization to observability platforms like Grafana, New Relic, Datadog, and Prometheus.
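
As a rough illustration of the kind of metrics involved, here is a minimal sketch using Python's prometheus_client library; the metric names, values, and port are hypothetical stand-ins, not Baseten's actual exporter.

```python
import random
import time

from prometheus_client import Gauge, Histogram, start_http_server

# Hypothetical metric names illustrating the two kinds of inference metrics
# mentioned above: response time and hardware utilization.
RESPONSE_TIME = Histogram("model_response_time_seconds", "End-to-end inference latency")
GPU_UTILIZATION = Gauge("gpu_utilization_percent", "GPU utilization sampled per request")

def handle_request() -> None:
    with RESPONSE_TIME.time():                   # records latency into the histogram
        time.sleep(random.uniform(0.05, 0.2))    # stand-in for running model inference
    GPU_UTILIZATION.set(random.uniform(40, 95))  # stand-in for an NVML utilization reading

if __name__ == "__main__":
    start_http_server(9090)  # Prometheus can now scrape http://localhost:9090/metrics
    while True:
        handle_request()
```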

News

Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience

Baseten is now on Google Cloud Marketplace, empowering organizations with the tools to build and scale AI applications effortlessly.

News

Introducing Baseten Hybrid: control and flexibility in your cloud and ours

Baseten Hybrid is a multi-cloud solution that enables you to run inference in your cloud—with optional spillover into ours.

Glossary

Building high-performance compound AI applications with MongoDB Atlas and Baseten

Using MongoDB Atlas and Baseten’s Chains framework, you can build high-performance compound AI systems.

Model performance

How to build function calling and JSON mode for open-source and fine-tuned LLMs

Use a state machine to generate token masks for logit biasing to enable function calling and structured output at the model server level.
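
The core idea fits in a few lines. Below is a minimal, hypothetical sketch (not the post's actual implementation): a toy state machine over a six-token vocabulary adds -inf to the logit of every grammar-illegal token, so decoding can only produce output of the shape "{ key : value }". The token ids, states, and random logits are stand-ins for a real tokenizer, JSON grammar, and model.

```python
import torch

OPEN, KEY, COLON, VALUE, CLOSE, EOS = range(6)  # toy vocabulary

# state -> {allowed next token id: state after emitting it}
TRANSITIONS = {
    "start": {OPEN: "key"},
    "key":   {KEY: "colon"},
    "colon": {COLON: "value"},
    "value": {VALUE: "close"},
    "close": {CLOSE: "end"},
    "end":   {EOS: "done"},
}

def masked_next_token(logits: torch.Tensor, state: str) -> tuple[int, str]:
    # Bias logits so every token the grammar forbids gets -inf (zero
    # probability); the model can only choose among grammar-legal tokens.
    mask = torch.full_like(logits, float("-inf"))
    mask[list(TRANSITIONS[state])] = 0.0
    token = int(torch.argmax(logits + mask))
    return token, TRANSITIONS[state][token]

torch.manual_seed(0)
state, tokens = "start", []
while state != "done":
    logits = torch.randn(6)  # stand-in for a model forward pass
    token, state = masked_next_token(logits, state)
    tokens.append(token)

print(tokens)  # always [OPEN, KEY, COLON, VALUE, CLOSE, EOS]
```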

News

Introducing function calling and structured output for open-source and fine-tuned LLMs

Automatically add function calling and structured output capabilities to any open-source or fine-tuned large language model supported by TensorRT-LLM.

ML models

The best open-source image generation model

Explore the strengths and weaknesses of state-of-the-art image generation models like FLUX.1, Stable Diffusion 3, SDXL Lightning, and Playground 2.5.