Introducing the Speculative Decoding Engine Builder integration
Our new Speculative Decoding integration brings speculative decoding to our streamlined TensorRT-LLM Engine Builder flow. Specify your draft model in the new speculator
configuration in the Engine Builder YAML file, then either run with the default build or tune parameters further to fit your needs.
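As a rough sketch, the configuration could look something like the fragment below. The exact key names and values (`speculator`, `speculative_decoding_mode`, the model identifiers, and `num_draft_tokens`) are illustrative assumptions, not the definitive schema; consult the Engine Builder documentation for the exact fields.

```yaml
# Hypothetical Engine Builder config fragment illustrating a speculator block.
# Key names and values below are assumptions for illustration only.
build:
  base_model: llama                     # target (verifier) model family
  checkpoint_repository:
    source: HF
    repo: meta-llama/Llama-3.1-70B-Instruct
  speculator:                           # new speculative decoding section
    speculative_decoding_mode: DRAFT_TOKENS_EXTERNAL
    checkpoint_repository:
      source: HF
      repo: meta-llama/Llama-3.1-8B-Instruct   # smaller draft model
    num_draft_tokens: 4                 # tokens proposed per verification step
```

In this pattern, the small draft model proposes several tokens per step and the large target model verifies them in a single forward pass, which can cut end-to-end latency when the draft model's acceptance rate is high.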
Check out the launch blog to learn more, or talk to one of our engineers if you think speculative decoding could be a good fit for your latency-sensitive LLM applications!