Meet the engineers behind Baseten
Baseten engineers work on the hardest problems in AI inference and infrastructure: model development, serving, orchestration, observability, and low-level optimization across the full stack. Everything is built for real production traffic, with a constant focus on throughput, latency, and reliability.
Post-training
We go beyond generic fine-tuning and push post-training to its limits: reinforcement learning (RL), reward shaping, and custom training pipelines tuned on your data. Models optimized for your exact use case, not the average one.
Recent research
- Iterative SFT (iSFT): dense reward learning (October 15, 2025)
- Repeated KV cache for long-running agents (March 5, 2026)
Model performance
Model performance is never one size fits all. We profile your workload, find the bottlenecks, and optimize every layer of the inference stack. Kernels, quantization, batching, routing, hardware selection. The right configuration for your model and your traffic, not a generic preset.
Recent research
- Open-sourcing Baseten’s suffix automaton MTP accelerator (January 23, 2026)
- The fastest Whisper — with streaming and diarization (January 15, 2026)
Infrastructure
Uptime is table stakes. The hard part is scaling predictably under real load. We build for 99.99% uptime across clouds and regions with infrastructure that stays fast, reliable, and cost-efficient as traffic spikes. Deploy anywhere. Scale without surprises.
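To make the 99.99% figure concrete, the arithmetic below converts an availability target into a downtime budget. It is a minimal sketch: the function name `downtime_budget_minutes` and the choice of a 365-day year as the measurement window are illustrative assumptions, not how Baseten measures or reports uptime.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
# Assumption: the window is a 365-day year; the page does not state a window.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, allowed within the period."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    yearly = downtime_budget_minutes(99.99)
    monthly = downtime_budget_minutes(99.99, period_minutes=30 * 24 * 60)
    print(f"99.99% over a year:  {yearly:.2f} minutes")   # roughly 52.56
    print(f"99.99% over 30 days: {monthly:.2f} minutes")  # roughly 4.32
```

Under these assumptions, a 99.99% target leaves under an hour of total downtime per year, which is why predictable scaling under load matters more than headline uptime alone.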
Recent research
- How we built Multi-cloud Capacity Management (MCM) (June 23, 2025)
- How we built RBAC that scales for the enterprise (April 23, 2026)
Forward deployed engineering
There’s no universal setup for AI inference. Every model, workload, and latency target changes the equation. FDEs work side by side with customers under real traffic, tuning deployments to hit performance targets from first prototype to production scale.
Founded by engineers
We started Baseten in 2019 after seeing the same failure over and over: strong models stuck in deployment hell. Weeks to production. Fragile infrastructure. Systems that broke the moment real traffic hit. We’d lived the problem ourselves across research, infrastructure, and ML engineering. Training, serving, scaling, and hardware orchestration were all disconnected. Shipping ML systems meant stitching together tools that were never designed to work together.
So we built the platform we wanted to use ourselves. Baseten gives teams the infrastructure and engineering depth to run AI systems in production at scale. Fast inference. Reliable deployments. Real performance under real load.
- Tuhin Srivastava, CEO and Co-Founder
- Amir Haghighat, CTO and Co-Founder
- Phil Howes, Co-Founder and Chief Scientist
- Pankaj Gupta, Co-Founder