Engineering
Machine learning infrastructure that just works
Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.
Our new Speculative Decoding integration can cut latency in half for production LLM workloads.
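Speculative decoding cuts latency by letting a small draft model propose several tokens cheaply and having the large target model verify them in one pass. A minimal greedy sketch of the idea, using toy next-token functions in place of real models (all names and logic here are illustrative, not Baseten's integration):

```python
def target_next(tokens):
    # Toy "large" target model: deterministic next-token rule.
    return (sum(tokens) * 7 + 3) % 50

def draft_next(tokens):
    # Toy "small" draft model: agrees with the target most of the time.
    t = (sum(tokens) * 7 + 3) % 50
    return t if sum(tokens) % 4 else (t + 1) % 50

def speculative_decode(prompt, n_tokens, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_tokens:
        # 1. The draft model proposes k tokens cheaply.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            nxt = draft_next(ctx)
            draft.append(nxt)
            ctx.append(nxt)
        # 2. The target model verifies the proposals: accept the matching
        #    prefix, then emit the target's own token at the first mismatch.
        for tok in draft:
            expected = target_next(tokens)
            if tok == expected:
                tokens.append(tok)
            else:
                tokens.append(expected)
                break
            if len(tokens) == len(prompt) + n_tokens:
                break
    return tokens[len(prompt):]
```

Because accepted tokens always match what the target would have produced, the output is identical to running the target model alone; the speedup comes from verifying several draft tokens per target step.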
Use a state machine to generate token masks for logit biasing to enable function calling and structured output at the model server level.
Automatically add function calling and structured output capabilities to any open-source or fine-tuned large language model supported by TensorRT-LLM.
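The mechanism above can be sketched in miniature: a state machine over a toy vocabulary emits an additive mask each step (0 for allowed tokens, negative infinity for everything else), forcing the decoder to produce a valid structure. The vocabulary, states, and pattern here are illustrative assumptions, not the actual TensorRT-LLM integration:

```python
import math

VOCAB = ['{', '"a"', ':', '0', '1', '}']

# Hand-written automaton accepting exactly the pattern {"a":<digit>}.
# Each state maps allowed tokens to the next state; "done" ends generation.
TRANSITIONS = {
    "start": {'{': "key"},
    "key":   {'"a"': "colon"},
    "colon": {':': "value"},
    "value": {'0': "close", '1': "close"},
    "close": {'}': "done"},
}

def token_mask(state):
    # Additive logit mask: 0.0 for allowed tokens, -inf for the rest.
    allowed = TRANSITIONS.get(state, {})
    return [0.0 if tok in allowed else -math.inf for tok in VOCAB]

def constrained_decode(logits_fn):
    """Greedy decoding with the state machine's mask added to raw logits."""
    state, out = "start", []
    while state != "done":
        logits = logits_fn(out)          # model scores for each vocab token
        mask = token_mask(state)         # bias from the current state
        biased = [l + m for l, m in zip(logits, mask)]
        idx = max(range(len(VOCAB)), key=lambda i: biased[i])
        out.append(VOCAB[idx])
        state = TRANSITIONS[state][VOCAB[idx]]
    return "".join(out)
```

Whatever the model's raw preferences, every sampled token stays inside the automaton's language, which is what makes function-call arguments and structured output reliable at the model-server level.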