Introducing the Speculative Decoding Engine Builder integration
Our new Speculative Decoding integration brings speculative decoding to our streamlined TensorRT-LLM Engine Builder flow. Specify your draft model in the new speculator
configuration in the Engine Builder YAML file, then either run with the default build or tune parameters further to fit your needs.
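As a rough sketch, the configuration could look something like the fragment below. The exact key names and values (`speculator`, `speculative_decoding_mode`, the model identifiers, and `num_draft_tokens`) are illustrative assumptions, not the definitive schema; consult the Engine Builder documentation for the exact fields.

```yaml
# Hypothetical Engine Builder config fragment illustrating a speculator block.
# Key names and values below are assumptions for illustration only.
build:
  base_model: llama                     # target (verifier) model family
  checkpoint_repository:
    source: HF
    repo: meta-llama/Llama-3.1-70B-Instruct
  speculator:                           # new speculative decoding section
    speculative_decoding_mode: DRAFT_TOKENS_EXTERNAL
    checkpoint_repository:
      source: HF
      repo: meta-llama/Llama-3.1-8B-Instruct   # smaller draft model
    num_draft_tokens: 4                 # tokens proposed per verification step
```

In this pattern, the small draft model proposes several tokens per step and the large target model verifies them in a single forward pass, which can cut end-to-end latency when the draft model's acceptance rate is high.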
Check out the launch blog to learn more, or talk to one of our engineers if you think speculative decoding could be a good fit for your latency-sensitive LLM applications!