About us

Meet the engineers behind Baseten

Baseten engineers work on the hardest problems in AI inference and infrastructure: model development, serving, orchestration, observability, and low-level optimization across the full stack. Everything is built for real production traffic, with a constant focus on throughput, latency, and reliability.

Post-training

We go beyond generic fine-tuning. We push post-training to its limits. RL, reward shaping, and custom training pipelines tuned on your data. Models optimized for your exact use case, not the average one.

Recent research

Towards infinite context windows: neural KV cache compaction (April 1, 2026)
Iterative SFT (iSFT): dense reward learning (October 15, 2025)
Repeated KV cache for long-running agents (March 5, 2026)

Model performance

Model performance is never one size fits all. We profile your workload, find the bottlenecks, and optimize every layer of the inference stack. Kernels, quantization, batching, routing, hardware selection. The right configuration for your model and your traffic, not a generic preset.

Recent research

Open-sourcing Baseten’s suffix automaton MTP accelerator (January 23, 2026)
The fastest Whisper — with streaming and diarization (January 15, 2026)
Your client code matters: 12x higher embedding throughput with Python and Rust (June 12, 2025)
Sub-3 millisecond named entity recognition (NER) inference (April 6, 2026)

Infrastructure

Uptime is table stakes. The hard part is scaling predictably under real load. We build for 99.99% uptime across clouds and regions with infrastructure that stays fast, reliable, and cost-efficient as traffic spikes. Deploy anywhere. Scale without surprises.

Recent research

How we built Multi-cloud Capacity Management (MCM) (June 23, 2025)
Introducing the Baseten Delivery Network: Fast cold starts for big models (March 19, 2026)
How we built RBAC that scales for the enterprise (April 23, 2026)

Forward deployed engineering

There’s no universal setup for AI inference. Every model, workload, and latency target changes the equation. Our forward deployed engineers (FDEs) work side by side with customers under real traffic, tuning deployments to hit performance targets from first prototype to production scale.

Founded by engineers

We started Baseten in 2019 after seeing the same failure over and over. Strong models stuck in deployment hell. Weeks to production. Fragile infrastructure. Systems that broke the moment real traffic hit.

We’d lived the problem ourselves across research, infrastructure, and ML engineering. Training, serving, scaling, and hardware orchestration were all disconnected. Shipping ML systems meant stitching together tools that were never designed to work together.

So we built the platform we wanted to use ourselves. Baseten gives teams the infrastructure and engineering depth to run AI systems in production at scale. Fast inference. Reliable deployments. Real performance under real load.

Tuhin Srivastava, CEO and Co-Founder
Amir Haghighat, CTO and Co-Founder
Phil Howes, Co-Founder & Chief Scientist
Pankaj Gupta, Co-Founder