Llama 3.1 Nemotron Ultra 253B

A high-efficiency distillation of Llama 3.1 405B with leading accuracy for reasoning, tool calling, chat, and instruction following.

Deploy Llama 3.1 Nemotron Ultra 253B behind an API endpoint in seconds.

Example usage

Input
import codecs
import os

import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "user", "content": "Write a limerick about the wonders of GPU computing."},
]
data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512,
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)
res.raise_for_status()  # Fail early on an HTTP error instead of printing the error body

# Print the generated tokens as they get streamed.
# An incremental decoder handles multi-byte UTF-8 characters split across chunks.
decoder = codecs.getincrementaldecoder("utf-8")()
for chunk in res.iter_content(chunk_size=None):
    print(decoder.decode(chunk), end="", flush=True)
JSON output
null
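
The request in the example above has three parts: the endpoint URL, the authorization header, and the JSON body. As a minimal sketch, they can be assembled in one place and checked before anything is sent. The helper name `build_request` and the model id `abc123` are hypothetical and used only for illustration; the URL pattern, header format, and payload fields are copied from the example above.

```python
import os


def build_request(model_id, api_key, prompt, stream=True, max_new_tokens=512):
    """Assemble the URL, headers, and JSON body for a chat-style predict call.

    Hypothetical helper: it only mirrors the fields used in the example above
    and performs no network I/O.
    """
    if not model_id:
        raise ValueError("model_id must be set to your deployed model's id")
    return {
        "url": f"https://model-{model_id}.api.baseten.co/production/predict",
        "headers": {"Authorization": f"Api-Key {api_key}"},
        "json": {
            "messages": [{"role": "user", "content": prompt}],
            "stream": stream,
            "max_new_tokens": max_new_tokens,
        },
    }


# "abc123" is a made-up model id; substitute the id of your own deployment.
request_args = build_request(
    model_id="abc123",
    api_key=os.environ.get("BASETEN_API_KEY", "<your-api-key>"),
    prompt="Write a limerick about the wonders of GPU computing.",
)
print(request_args["url"])
```

The resulting dictionary can be unpacked into the same call as in the example, for instance `requests.post(request_args["url"], headers=request_args["headers"], json=request_args["json"], stream=True)`. Raising on an empty `model_id` catches the unfilled placeholder before a request goes to a malformed URL.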

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init --example stable-diffusion-2-1-base ./my-sd-truss

$ cd ./my-sd-truss

$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe

$ truss push

INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G