Baseten Blog | Page 4

News

Introducing automatic LLM optimization with TensorRT-LLM Engine Builder

The TensorRT-LLM Engine Builder empowers developers to deploy highly efficient, performant inference servers for open-source and fine-tuned LLMs.

Hacks & projects

Deploying custom ComfyUI workflows as APIs

Easily package your ComfyUI workflow to use any custom node or model checkpoint.

Community

Ten reasons to join Baseten

Baseten is a Series B startup building infrastructure for AI. We're actively hiring for many roles — here are ten reasons to join the Baseten team.

Model performance

How to serve 10,000 fine-tuned LLMs from a single GPU

LoRA swapping with TRT-LLM supports in-flight batching and loads LoRA weights in 1-2 ms, enabling each request to hit a different fine-tune.

Product

Using asynchronous inference in production

Learn how async inference works, how it protects against common inference failures, where it's applied in common use cases, and more.

Product

Baseten Chains explained: building multi-component AI workflows at scale

A delightful developer experience for building and deploying compound ML inference workflows.

News

Introducing Baseten Chains

Learn about Baseten's new Chains framework for deploying complex ML inference workflows across compound AI systems using multiple models and components.

ML models

Comparing few-step image generation models

Few-step image generation models like LCMs, SDXL Turbo, and SDXL Lightning can generate images fast, but there's a tradeoff between speed and quality.

Glossary

How latent consistency models work

Latent Consistency Models (LCMs) improve on generative AI methods to produce high-quality images in just 2-4 steps, taking less than a second for inference.