
Learn, Build, Deploy


By fine-tuning Qwen models on Baseten Training, we exceeded the intelligence of closed-source models while cutting overall inference costs by 60%. We also saw a dramatic speedup: p90 latency fell from 2.2 seconds to just 250 milliseconds, opening the door to entirely new, latency-sensitive LLM use cases.

Mario Martone
Head of Applied Research, EliseAI
Related posts:

- DFlash: 3x faster LLM inference (Model performance; Aaryam Sharma)
- Baseten Frontier Gateway (Product; Bola Malek and 1 other)