Baseten Blog | Page 7

The benefits of globally distributed infrastructure for model serving

Multi-cloud and multi-region infrastructure for model serving provides availability, redundancy, lower latency, cost savings, and data residency compliance.

Phil Howes and 1 other

Prompt: a movie still of a gondola lift in the Alps

Product

New in February 2024

3x throughput with H100 GPUs, 40% lower SDXL latency with TensorRT, and multimodal open source models.

Baseten

Prompt: A futuristic submarine in a colorful coral reef

Model performance

40% faster Stable Diffusion XL inference with NVIDIA TensorRT

Using NVIDIA TensorRT to optimize each component of the SDXL pipeline, we improved SDXL inference latency by 40% and throughput by 70% on NVIDIA H100 GPUs.

Pankaj Gupta and 2 others

Prompt: A movie still of an astronaut coming through a technicolor wormhole

Glossary

Why GPU utilization matters for model inference

Save money on high-traffic model inference workloads by increasing GPU utilization to maximize performance per dollar for LLMs, SDXL, Whisper, and more.

Marius Killinger and 1 other

Prompt: A retrofuturistic pickup truck loaded with green plants on a sunny highway

ML models

The best open source large language model

Explore the best open source large language models for 2025 for any budget, license, and use case.

Philip Kiely

Prompt: A sleek orange robot hoisting a trophy on top of a mountain.

Model performance

Unlocking the full power of NVIDIA H100 GPUs for ML inference with TensorRT

Double or triple throughput at same-or-better latencies by switching to H100 GPUs from A100s for model inference with TensorRT/TensorRT-LLM.

Pankaj Gupta and 1 other

Prompt: a retro rocket ship taking off on the beach at sunrise. Model: Playground 2

Product

New in January 2024

A library for open source models, general availability for L4 GPUs, and performance benchmarking for ML inference.

Baseten

Prompt: A futuristic bullet train crossing under a waterfall with soft lighting. Model: Playground 2.

Glossary

Introduction to quantizing ML models

Quantizing ML models like LLMs makes it possible to run big models on less expensive GPUs. But it must be done carefully to avoid quality reduction.

Abu Qader and 1 other

Prompt: A steampunk microscope in a lab run by lord of the rings elves. Model: Playground 2

Glossary

How to benchmark image generation models like Stable Diffusion XL

Benchmarking Stable Diffusion XL performance across latency, throughput, and cost depends on factors from hardware to model variant to inference config.

Philip Kiely

Prompt: a sleek bus driving through the mountains. Model: Playground 2
