Technical Writer

Philip Kiely

Glossary

Jan 12, 2024

Understanding performance benchmarks for LLM inference

This guide helps you interpret LLM performance metrics to make direct comparisons on latency, throughput, and cost.

Philip Kiely

Prompt: Two racecars on the beach at sunset. Model: Playground v2.

Model performance

Dec 22, 2023, revised Mar 1, 2024

Faster Mixtral inference with TensorRT-LLM and quantization

Mixtral 8x7B's architecture gives it faster inference than the similarly powerful Llama 2 70B, but we can make it even faster using TensorRT-LLM and int8 quantization.

Pankaj Gupta

Timur Abishev

1 other

Prompt: An illustration of a face divided in half. Half the face is Marie Curie, the other half of the face is Einstein. Model: Playground v2.

ML models

Dec 13, 2023

Playground v2 vs Stable Diffusion XL 1.0 for text-to-image generation

Playground v2, a new text-to-image model, matches SDXL's speed and quality while adding a unique AAA game-style aesthetic. The ideal choice varies by use case and artistic taste.

Philip Kiely

Model: Playground v2. Prompt: The meaning of life.

Hacks & projects

Dec 8, 2023

How to serve your ComfyUI model behind an API endpoint

This guide details deploying ComfyUI image generation pipelines via API for app integration, using Truss for packaging & production deployment.

Het Trivedi

Philip Kiely

Model: SDXL + ControlNet, Prompt: A top down view of a river through the woods

GPU guides

Nov 28, 2023

NVIDIA A10 vs A10G for ML model inference

The A10, an Ampere-series GPU, excels at tasks like running 7B-parameter LLMs. AWS's A10G variant, similar in GPU memory and bandwidth, is mostly interchangeable with it.

Philip Kiely

Hacks & projects

Nov 22, 2023

GPT vs Mistral: Migrate to open source LLMs seamlessly

Use the ChatCompletions API to test open-source LLMs like Mistral 7B in your AI app with just three minor code modifications.

Sid Shanker

Philip Kiely

Prompt: A sturdy stone bridge under a full moon, warm colors

ML models

Nov 21, 2023

Open source alternatives for machine learning models

Building on top of open source models gives you access to a wide range of capabilities that you would otherwise lack from a black box endpoint provider.