Frontier RL with Baseten Loops
Async RL at long sequence lengths with one-click checkpoint deploys to the Baseten Inference Stack.
Two ways to train: pick what fits your needs
Loops (early access)
Write training logic, not infra code.
A training SDK that supports long sequence length, async RL, and one-click checkpoint deploys.
Supports 131K+ token sequence lengths and 1T+ parameter model training. The Qwen3.5/3.6 family and Kimi K2.6 are supported today, with the Nemotron, DeepSeek, GLM, and MiniMax series to follow shortly.
Train → deploy loop: Models trained with Loops promote directly to Baseten Dedicated Inference with one command.
Asynchronous RL primitives like policy versioning and non-blocking weight sync that enable bounded off-policy learning.
Full ownership of your trained weights, no lock-in.
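The "bounded off-policy learning" idea above can be sketched in plain Python: each rollout is tagged with the policy version that produced it, and the trainer only accepts rollouts whose version lags the current weights by at most a fixed bound. This is a conceptual sketch only; names like `BoundedOffPolicyBuffer` and `Rollout` are illustrative and not part of the Loops API.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this rollout
    tokens: list

class BoundedOffPolicyBuffer:
    """Illustrative sketch (not the Loops API) of bounded off-policy intake."""

    def __init__(self, max_lag: int):
        self.max_lag = max_lag          # maximum allowed staleness in versions
        self.current_version = 0        # bumped after every optimizer step
        self.accepted = []

    def advance(self):
        """Called after each optimizer step; bumps the policy version."""
        self.current_version += 1

    def submit(self, rollout: Rollout) -> bool:
        """Accept a rollout only if it is within max_lag versions of current."""
        if self.current_version - rollout.policy_version <= self.max_lag:
            self.accepted.append(rollout)
            return True
        return False

buffer = BoundedOffPolicyBuffer(max_lag=2)
buffer.submit(Rollout(policy_version=0, tokens=[1, 2]))       # lag 0: accepted
buffer.advance(); buffer.advance(); buffer.advance()          # version -> 3
stale = buffer.submit(Rollout(policy_version=0, tokens=[3]))  # lag 3 > 2: rejected
print(len(buffer.accepted), stale)  # 1 False
```

The bound is what makes the training "async but bounded": samplers never block the trainer, but the trainer never learns from arbitrarily stale data.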
Training Jobs (GA)
Run your existing training scripts on managed GPUs.
A framework-agnostic training product designed for teams who want bare-metal-like control on managed infra.
Multi-node training with automatic checkpoint syncing between nodes.
On-demand compute acquired in seconds.
Plugs into your existing stack (W&B, Hugging Face, S3) via Baseten Secrets.
SSH access built in for live debugging on any running container.
Solving the key problems with large model post-training
We've run into deployment friction, synchronous weight syncs, and unpredictable runtimes ourselves. We built Loops to solve these problems.

Train and deploy, one platform
Current Problem: Teams have to manually merge LoRAs, quantize across formats, and burn iteration cycles before serving prod traffic.
Loops: Inference is a first-class citizen in the product. After the last gradient step, your model is ready to deploy as a production endpoint, closing the training-to-inference loop.

Async RL at scale
Current Problem: Training 1T+ parameter models at long sequence lengths means hand-tuning parallelisms on fragile training libraries. True async RL is often an afterthought.
Loops: Take a gradient step with primitives like forward_backward, optim_step, and sample; Loops handles the memory management and parallelism. Training and sampling also overlap: new weights are pushed in the background, so the trainer never waits on a weight sync.
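The non-blocking weight sync described above can be sketched with a background worker: the training loop enqueues a weight snapshot and immediately continues, while a separate thread ships snapshots to the samplers. A minimal stdlib sketch, assuming nothing about the Loops API (all names here are illustrative stand-ins):

```python
import threading
import queue

# Queue carrying weight snapshots from the trainer to the sync worker.
weight_queue: "queue.Queue" = queue.Queue()
synced_versions = []  # versions the samplers have received, in order

def weight_sync_worker():
    """Drains pushed checkpoints and 'ships' them to samplers."""
    while True:
        ckpt = weight_queue.get()
        if ckpt is None:  # sentinel: shut down
            break
        synced_versions.append(ckpt["version"])

syncer = threading.Thread(target=weight_sync_worker, daemon=True)
syncer.start()

weights = {"version": 0}
for step in range(3):
    # Stand-ins for forward_backward + optim_step: produce new weights.
    weights = {"version": weights["version"] + 1}
    # Non-blocking push: enqueue a snapshot and keep training immediately,
    # rather than waiting for samplers to finish loading the new weights.
    weight_queue.put(dict(weights))

weight_queue.put(None)  # signal shutdown
syncer.join()
print(synced_versions)  # [1, 2, 3]
```

The point of the pattern is that `weight_queue.put` returns immediately, so the optimizer loop's step time is independent of how long the samplers take to swap in new weights.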

Predictable performance
Current Problem: Training large models on shared infra creates painful variance, with the same script taking hours one day and minutes the next.
Loops: Scale your samplers and trainers independently on dedicated infra that delivers consistent throughput run-over-run.
Our AI engineers build domain-specific models that beat frontier labs in medical record interpretation. With Baseten Training, we can stay focused on our research and value to customers, not hardware and job orchestration. The Baseten platform powers our workflows from training through to production, saving us tons of time and stress.
Baseten helped us train models to be 23x faster and is projected to save us $1.9M, while making the process so easy that even non-ML engineers could get results in under 30 minutes.
Research support
FAQ
A training SDK for RL and SFT that supports long sequence lengths, async RL, and one-click checkpoint deploys. It provides familiar primitives like forward_backward, optim_step, and sample, plus the infrastructure to train at scale. Currently in early access via public waitlist.
Loops is built around four pillars: (1) one-click train → deploy to a Baseten inference endpoint, powered by the Baseten Inference Stack, (2) async and bounded off-policy RL primitives, (3) 131K+ sequence length out of the box, and (4) predictable run-over-run performance with dedicated infra.
The full Qwen3.5/3.6 family, along with 1T+ parameter models like Kimi K2.6. We are continually adding model support.
Yes. Loops is one of two ways to train on Baseten. Teams that prefer bare-metal-like control can run their own training scripts, with SSH access, through Training Jobs, and get the same one-click path to production inference.