Baseten Blog | Page 8

Glossary

Understanding performance benchmarks for LLM inference

This guide helps you interpret LLM performance metrics to make direct comparisons on latency, throughput, and cost.

Philip Kiely

Prompt: Two racecars on the beach at sunset. Model: Playground v2.

Product

New in December 2023

Faster Mixtral inference, Playground v2 image generation, and ComfyUI pipelines as API endpoints.

Baseten

Prompt: A forest green airplane on the runway at dawn. Model: Playground v2.

Model performance

Faster Mixtral inference with TensorRT-LLM and quantization

Mixtral 8x7B's architecture already gives it faster inference than the similarly powerful Llama 2 70B, and TensorRT-LLM with int8 quantization makes it faster still.

Pankaj Gupta

2 others

Prompt: An illustration of a face divided in half. Half the face is Marie Curie, the other half of the face is Einstein. Model: Playground v2.

ML models

Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation

Playground v2, a new text-to-image model, matches SDXL's speed and quality and adds a distinctive AAA game-style aesthetic. Which model is the better choice depends on your use case and artistic taste.

Philip Kiely

Model: Playground v2. Prompt: The meaning of life.

Hacks & projects

How to serve your ComfyUI model behind an API endpoint

This guide shows how to deploy ComfyUI image generation pipelines behind an API endpoint for app integration, using Truss for packaging and production deployment.

Het Trivedi

1 other

Model: SDXL + ControlNet, Prompt: A top down view of a river through the woods

Product

New in November 2023

Switching to open source ML, a guide to model inference math, and Stability.ai's new generative AI image-to-video model.

Baseten

SDXL prompt: A green sailboat in the icy sea

GPU guides

NVIDIA A10 vs A10G for ML model inference

The A10, an Ampere-series GPU, excels at tasks like running 7B-parameter LLMs. AWS's A10G variant has similar GPU memory and bandwidth, so the two are mostly interchangeable.

Philip Kiely