
Frontier RL with Baseten Loops

Async RL on long sequence lengths with one-click checkpoint deploys to the Baseten Inference Stack.

Training Platform

Two ways to train: pick what fits your needs

Loops (early access)

Write training logic, not infra code.

A training SDK that supports long sequence lengths, async RL, and one-click checkpoint deploys.

  • 131K+ token sequence lengths and 1T+ parameter model training. Support for the Qwen3.5/3.6 family and Kimi K2.6, with the Nemotron, DeepSeek, GLM, and MiniMax series to follow shortly.

  • Train → deploy loop: Models trained with Loops promote directly to Baseten Dedicated Inference with one command.

  • Asynchronous RL primitives, like policy versioning and non-blocking weight sync, that enable bounded off-policy learning (see the sketch after this list).

  • Full ownership of your trained weights, no lock-in.
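As a concrete illustration of the bounded off-policy idea, here is a minimal Python sketch. The Rollout type, the staleness bound, and the helper below are assumptions made for the example, not the Loops API:

    # Illustrative only: policy versioning with a staleness bound.
    # None of these names come from the Loops SDK.
    from dataclasses import dataclass

    @dataclass
    class Rollout:
        tokens: list[int]
        rewards: list[float]
        policy_version: int  # version of the sampler weights that produced it

    MAX_STALENESS = 2  # accept rollouts at most 2 policy versions behind

    def usable(rollout: Rollout, trainer_version: int) -> bool:
        # Bounded off-policy learning: discard (or down-weight) rollouts
        # sampled from weights that have fallen too far behind the trainer.
        return trainer_version - rollout.policy_version <= MAX_STALENESS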

Training Jobs (GA)

Run your existing training scripts on managed GPUs.

A framework-agnostic training product designed for teams who want bare-metal-like control on managed infra.

  • Multi-node training with automatic checkpoint syncing between nodes.

  • On-demand compute acquired in seconds.

  • Plugs into the rest of your stack, including W&B, HuggingFace, and S3, via Baseten Secrets (see the example after this list).

  • SSH access built in for live debugging on any running container.
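Wiring those integrations together is ordinary Python. The sketch below assumes Baseten Secrets surface as environment variables inside the training container; the variable names are illustrative:

    # Illustrative: pull credentials injected via Baseten Secrets
    # (assumed here to be environment variables) and connect to
    # W&B for tracking and S3 for checkpoint storage.
    import os

    import boto3   # pip install boto3
    import wandb   # pip install wandb

    wandb.login(key=os.environ["WANDB_API_KEY"])
    run = wandb.init(project="my-training-job")

    s3 = boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )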

Loops SDK

Solving the key problems with large model post-training

We've run into deployment friction, synchronous weight syncs, and unpredictable runtimes ourselves. We built Loops to solve these problems.

Train and deploy, one platform

Current Problem: Teams have to manually merge LoRAs, quantize across formats, and burn iteration cycles before serving prod traffic.

Loops: Inference is a first-class citizen in the product. After the last gradient step, your model is ready to deploy as a prod endpoint, closing the training-to-inference loop.
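As a hypothetical sketch of that promotion step (the loops module, the deploy function, and its parameters are invented for illustration; the real command may differ):

    # Hypothetical one-command promotion of a trained checkpoint to a
    # Baseten Dedicated Inference endpoint. `loops.deploy` and its
    # arguments are illustrative, not a documented API.
    from loops import deploy

    endpoint = deploy(
        checkpoint="checkpoints/final",  # last saved checkpoint
        name="my-custom-model",          # endpoint name on Baseten
    )
    print(endpoint.url)                  # ready for prod traffic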

Async RL at scale

Current Problem: Training 1T+ parameter models at long sequence lengths means hand-tuning parallelisms on fragile training libraries. True async RL is often an afterthought.

Loops: Take a gradient step with primitives like forward_backward, optim_step, and sample. Loops handles all the memory management and parallelisms. Training and sampling also overlap by pushing new weights in the background, so the trainer never waits for the weight sync.
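A minimal sketch of how a step built from those primitives might look. The signatures, the push_weights call, and the loop structure are assumptions for the example, not the documented Loops API:

    def train_loop(trainer, prompts, num_steps, sample, forward_backward, optim_step):
        # Illustrative async RL loop using the primitives named above.
        # All signatures here are assumptions, not the Loops API.
        for step in range(num_steps):
            batch = sample(prompts)                  # rollouts from the current samplers
            loss = forward_backward(trainer, batch)  # fused forward/backward pass; Loops
                                                     # manages memory and parallelism
            optim_step(trainer)                      # apply the gradient update
            trainer.push_weights(blocking=False)     # non-blocking weight sync: samplers pick
                                                     # up new weights in the background, so the
                                                     # trainer never waits
            print(f"step {step}: loss {loss:.4f}")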

Predictable performance

Current Problem: Training large models on shared infra creates painful variance, with the same script taking hours one day and minutes the next.

Loops: Scale your samplers and trainers independently on dedicated infra that delivers consistent throughput run-over-run.


Baseten helped us train models to be 23x faster and is projected to save us $1.9M, while making the process so easy that even non-ML engineers could get results in under 30 minutes.

Eric Lehman
SVP of ML, OpenEvidence

Training Expertise

Partner with world-class RL researchers

Our team trains custom models for your use case that outperform closed-source models.

Your Models

Own your model artifacts

All artifacts including model weights, evals, and training scripts belong entirely to you.

Production Inference

Continual learning from inference

Easily deploy your custom model to inference and continually improve model quality with real-world data.

Our Research


Towards infinite context windows: neural KV cache compaction

Building an intermediate memory layer is a prerequisite for continual learning in LLMs.


Dense, on-policy, or both?

Constitutional alignment as a testbed for comparing learning signals in SFT, RL, and everything in between.


Repeated KV cache for long-running agents

Finding the core barrier to repeated KV cache compression for infinite context.


Distillation without the dark

A co-evolving discriminator enables on-policy distillation from closed-source models without logit access.


Iterative SFT (iSFT): dense reward learning

Iterative grader feedback turns imperfect model outputs into gold-quality SFT data.


RGT (Rationale-Guided Training)

Upweight the strategy, not the tokens: faster training with explicit reasoning.


Get early access to Loops