See our latest feature releases, product improvements, and bug fixes.
Oct 9, 2023
Measure end-to-end response time vs inference time: On the model metrics tab, you can now use the dropdown menu to toggle between two different views of model inference time. End-to-end response time includes time for cold starts, queuing, and inference...
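To see the difference yourself, you can compare the inference time reported on the metrics tab against latency measured from the client. Here is a minimal sketch, assuming the standard Baseten predict endpoint; the model ID, API key, and payload are placeholders:

```python
import time

import requests

# Placeholders -- substitute your own model ID and API key.
MODEL_ID = "YOUR_MODEL_ID"
API_KEY = "YOUR_API_KEY"

start = time.perf_counter()
resp = requests.post(
    f"https://app.baseten.co/models/{MODEL_ID}/predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"prompt": "hello"},
)
elapsed = time.perf_counter() - start

# The client-side measurement is end-to-end: it includes network transit plus
# any cold start and queuing, so it should always be at least as large as the
# inference time shown on the metrics tab.
print(f"end-to-end response time: {elapsed:.3f}s")
```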
Sep 29, 2023
The replica count chart on the model metrics page is now broken out into “active” and “starting up” replicas. An active replica has loaded the model for inference and is actively responding to requests...
Sep 25, 2023
ML models deployed on Baseten can automatically scale to zero when not in use so that you’re not paying for unnecessary idle GPU time. When a scaled-to-zero model is invoked, it spins up on a new replica...
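To observe the cold-start cost in practice, you can time a first (waking) request against a second (warm) one. A rough sketch, reusing the placeholder endpoint and API key from the example above:

```python
import time

import requests

URL = "https://app.baseten.co/models/YOUR_MODEL_ID/predict"
HEADERS = {"Authorization": "Api-Key YOUR_API_KEY"}

def timed_call(payload):
    start = time.perf_counter()
    resp = requests.post(URL, headers=HEADERS, json=payload, timeout=600)
    return resp, time.perf_counter() - start

# The first call wakes the scaled-to-zero model and pays the cold-start cost;
# the second call should hit the now-active replica.
_, cold = timed_call({"prompt": "wake up"})
_, warm = timed_call({"prompt": "hello again"})
print(f"cold: {cold:.1f}s, warm: {warm:.1f}s")
```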
Sep 7, 2023
Models deployed on Baseten using Truss 0.7.1 or later now return the 500 response code when there is an error during model invocation. This change only affects newly deployed models. Any exception raised in the model code results in a 500 response...
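In Truss's model format, this means an exception raised inside predict now surfaces to the caller as an HTTP 500 rather than a response that looks successful. A minimal sketch (the prompt field is illustrative, not part of the Truss spec):

```python
# model/model.py -- minimal Truss model sketch
class Model:
    def load(self):
        # Load model weights here; omitted in this sketch.
        pass

    def predict(self, model_input):
        # With Truss 0.7.1+, an exception raised here is returned to the
        # client as a 500 response instead of a successful-looking one.
        if "prompt" not in model_input:
            raise ValueError("missing required field: prompt")
        return {"echo": model_input["prompt"]}
```

Clients can then branch on resp.status_code == 500 instead of parsing response bodies for error markers.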
Sep 7, 2023
We’ve updated your API key management panel with four key changes. First, randomly generated key names (e.g. legit-artichoke) have been dropped; instead, the first 8 characters of the key are displayed to make it easy to identify each key...
Aug 24, 2023
Every model deployed on Baseten uses Truss, our open-source framework for packaging models with their dependencies, hardware requirements, and more. You can now securely download the Truss of any deployed model...
Aug 22, 2023
A small but mighty change: a 🚫 icon now clearly marks inactive models in the model dropdown and model version sidebar. The green dot for active models and the moon icon for scaled-to-zero models remain unchanged.
Aug 14, 2023
Get cleaner, more accurate insights into your model’s performance and load with the refreshed model metrics charts in each model’s overview tab. Monitor requests per minute and both mean and peak response time...
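For reference, requests per minute and mean/peak latency are easy to sanity-check against your own request logs. A small sketch with made-up numbers:

```python
from collections import Counter
from statistics import mean

# Hypothetical request log: (timestamp in seconds, response time in ms).
log = [(0.2, 310), (12.8, 295), (31.5, 1200), (59.9, 330), (75.1, 305)]

# Requests per minute, bucketed by whole minutes.
rpm = Counter(int(ts // 60) for ts, _ in log)

latencies = [ms for _, ms in log]
print(dict(rpm))  # {0: 4, 1: 1}
print(f"mean: {mean(latencies):.0f} ms, peak: {max(latencies)} ms")
```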
Aug 7, 2023
Baseten now supports streaming model output. Instead of having to wait for the entire output to be generated, you can immediately start returning results to users with a sub-one-second...
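Consuming a streamed response from Python might look like the sketch below. Whether a given model streams, and the exact request payload, depend on how the model is implemented, so treat the stream flag here as a placeholder:

```python
import requests

URL = "https://app.baseten.co/models/YOUR_MODEL_ID/predict"
HEADERS = {"Authorization": "Api-Key YOUR_API_KEY"}

with requests.post(
    URL,
    headers=HEADERS,
    json={"prompt": "Tell me a story.", "stream": True},  # "stream" is model-specific
    stream=True,  # tell requests not to buffer the whole response body
) as resp:
    resp.raise_for_status()
    # Chunks arrive as the model generates them, so the first words can be
    # shown to the user long before the full output is complete.
    for chunk in resp.iter_content(chunk_size=None):
        print(chunk.decode("utf-8"), end="", flush=True)
```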
Jul 22, 2023
If you deploy a model by accident or want to shut down a failed deployment quickly, you can now stop any in-progress model deployment from the model page. Stopping a model deployment puts that...