Changelog

See our latest feature releases, product improvements and bug fixes

Apr 17, 2026

Model API deprecation notice (DeepSeek v3 0324, GLM 4.6)

The DeepSeek v3 0324 and GLM 4.6 Model APIs will be deprecated at 5pm PT on May 1, 2026.

Apr 16, 2026

Cache token pricing now available for Model APIs

Cached input tokens are billed at a discounted rate on Model APIs for all models (excluding GPT-OSS), starting April 17, 2026. Cache token pricing is applied automatically to the portion of each...
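As a rough illustration of how a discounted cached rate changes an input bill, here is a minimal sketch. The function name and both rates are hypothetical placeholders, not published prices, and the sketch assumes the discount applies only to the cached share of a request's input tokens.

```python
def input_token_cost(total_input_tokens, cached_tokens,
                     base_rate_per_million, cached_rate_per_million):
    """Estimate input cost when part of the input is billed at a cached rate.

    All rates here are hypothetical placeholders; check the pricing page
    for the real per-model numbers.
    """
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    uncached_tokens = total_input_tokens - cached_tokens
    # Uncached tokens pay the base rate, cached tokens pay the discounted rate.
    total = (uncached_tokens * base_rate_per_million
             + cached_tokens * cached_rate_per_million)
    return total / 1_000_000


# Example with made-up rates: 1M input tokens, 400k of them cached.
estimate = input_token_cost(1_000_000, 400_000,
                            base_rate_per_million=1.00,
                            cached_rate_per_million=0.25)
print(f"estimated input cost: ${estimate:.2f}")
```

With these illustrative rates, the more of a prompt that is served from cache, the closer the bill moves toward the cached rate.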

Apr 6, 2026

Copy and download logs

You can now copy or download all visible logs directly from the logs viewer. A new export menu next to the search box lets you copy logs to your clipboard, or download them as CSV or JSON. To export...

Apr 6, 2026

Named entity recognition on BEI-Bert

BEI-Bert now supports token-classification models for named-entity recognition. Deploy any ForTokenClassification model with the /predict_tokens endpoint and get structured entity predictions with...

Apr 1, 2026

Per-request log filtering

Every predict call now returns a unique request ID in the X-Baseten-Request-Id response header. Use this ID to filter your model's logs to a single request, cutting through the noise when debugging...
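One way to pick up the request ID on the client side is to read it from the response headers after a predict call. The sketch below is a small helper, with a hypothetical name, that takes any header mapping and looks up X-Baseten-Request-Id without depending on header casing. It makes no assumption about the ID's format.

```python
from typing import Mapping, Optional

REQUEST_ID_HEADER = "X-Baseten-Request-Id"


def get_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the request ID from a header mapping, or None if it is absent.

    HTTP header names are case-insensitive, so compare in lowercase
    in case the mapping is a plain dict.
    """
    wanted = REQUEST_ID_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


# Example with a plain dict standing in for real response headers.
example_headers = {"content-type": "application/json",
                   "x-baseten-request-id": "example-id"}
print(get_request_id(example_headers))
```

Most HTTP clients expose response headers as a mapping, so the object returned by the client can usually be passed in directly. Logging the ID next to any client-side error makes it easy to find the matching request in the logs later.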

Mar 31, 2026

Health check improvements

Startup probes now handle initialization more reliably by waiting until the model has loaded before executing any liveness checks. The startup phase still defaults to 30 minutes and can be configured...

Mar 30, 2026

Rolling deployments

You can now gradually shift traffic to new deployments instead of swapping all at once. Candidate replicas scale up incrementally while previous replicas scale down in controlled steps, giving you...
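To picture the incremental handoff, here is a simplified sketch of a step schedule. It is not the platform's actual algorithm: the function name, the fixed step size, and the assumption that the total replica count stays constant at every step are all illustrative.

```python
def rollout_steps(total_replicas, step):
    """Return (candidate, previous) replica counts after each rollout step.

    Illustrative only: assumes a fixed step size and a constant total,
    which a real rollout may not follow.
    """
    if total_replicas < 0 or step <= 0:
        raise ValueError("total_replicas must be >= 0 and step must be > 0")
    schedule = []
    candidate = 0
    while candidate < total_replicas:
        # Scale the candidate up by one step, capped at the total.
        candidate = min(candidate + step, total_replicas)
        # Previous replicas scale down by the same amount.
        schedule.append((candidate, total_replicas - candidate))
    return schedule


for candidate, previous in rollout_steps(total_replicas=5, step=2):
    print(f"candidate={candidate} previous={previous}")
```

In this toy model, a smaller step means more intermediate stages, so a bad candidate is exposed to less traffic before it is noticed.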

Mar 27, 2026

Hot reload for development deployments

truss watch and truss push --watch now support hot-reloading model code changes with the --hot-reload and --watch-hot-reload flags. Instead of restarting the inference server, hot reload swaps your...

Mar 27, 2026

Terminate deployment replica via API

You can now terminate a specific replica within a deployment using the new management API endpoint. This lets you remove individual replicas without affecting the rest of the deployment, making it...

Mar 26, 2026

Observability improvements

We've redesigned the logs and metrics views for better visibility and faster debugging.