Baseten Blog | Page 2
Export your model inference metrics to your favorite observability tool
Export model inference metrics such as response time and hardware utilization to observability platforms like Grafana, New Relic, Datadog, and Prometheus.
Baseten partners with Google Cloud to deliver high-performance AI infrastructure to a broader audience
Baseten is now on Google Cloud Marketplace, empowering organizations with the tools to build and scale AI applications effortlessly.
Introducing Baseten Hybrid: control and flexibility in your cloud and ours
Baseten Hybrid is a multi-cloud solution that enables you to run inference in your cloud—with optional spillover into ours.
Building high-performance compound AI applications with MongoDB Atlas and Baseten
Use MongoDB Atlas with Baseten’s Chains framework to build high-performance compound AI systems.
How to build function calling and JSON mode for open-source and fine-tuned LLMs
Use a state machine to generate token masks for logit biasing to enable function calling and structured output at the model server level.
Introducing function calling and structured output for open-source and fine-tuned LLMs
Automatically add function calling and structured output capabilities to any open-source or fine-tuned large language model supported by TensorRT-LLM.
The best open-source image generation model
Explore the strengths and weaknesses of state-of-the-art image generation models like FLUX.1, Stable Diffusion 3, SDXL Lightning, and Playground 2.5.
How to double tokens per second for Llama 3 with Medusa
We observe up to a 122% increase in tokens per second for Llama 3 after training custom Medusa heads and running the updated model with TensorRT-LLM.
SPC hackathon winners build with Llama 3.1 on Baseten
SPC hackathon winner TestNinja and finalist VibeCheck used Baseten to power apps for test generation and mood board creation.