
Llama 3.1 Nemotron 70B

Llama 3.1 70B fine-tuned by NVIDIA to beat GPT-4o on alignment benchmarks such as Arena Hard and MT-Bench

Deploy Llama 3.1 Nemotron 70B behind an API endpoint in seconds.


Example usage

Input
```python
import os

import requests

# Replace the empty string with your model ID
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "user", "content": "How many r in strawberry?"},
]
data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512,
}

# Call the model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)
res.raise_for_status()

# Print the generated tokens as they are streamed. Setting the encoding
# lets requests decode incrementally, so multi-byte UTF-8 characters
# split across chunks are not garbled.
res.encoding = "utf-8"
for text in res.iter_content(chunk_size=None, decode_unicode=True):
    print(text, end="", flush=True)
```
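Streamed chunks arrive as raw bytes, and a multi-byte UTF-8 character can be split across two chunks, so decoding each chunk on its own can raise `UnicodeDecodeError`. The sketch below shows incremental decoding with only the standard library. `iter_text` is a hypothetical helper name (not part of requests or Baseten), and the byte chunks are made up to split `é` across a boundary.

```python
import codecs
from typing import Iterable, Iterator


def iter_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode a stream of UTF-8 byte chunks into text (hypothetical helper).

    The incremental decoder buffers an incomplete multi-byte sequence
    until the rest of it arrives in a later chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the buffer; raises UnicodeDecodeError if the stream was truncated
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is two bytes (0xC3 0xA9); here it is split across two chunks
chunks = [b"caf", b"\xc3", b"\xa9 ", b"au lait"]
print("".join(iter_text(chunks)))  # café au lait
```

With requests, the helper can wrap `res.iter_content(chunk_size=None)` and print each decoded piece as it arrives.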
JSON output
```json
[
    "A sweet question!",
    "Let's count the 'R's in 'strawberry':",
    "1. S",
    "2. T",
    "3. R",
    "4. A",
    "5. W",
    "6. B",
    "7. E",
    "8. R",
    "9. R",
    "10. Y",
    "There are **3 'R's** in the word 'strawberry'."
]
```
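The model's answer can be sanity-checked with a few lines of Python that enumerate the letters the same way and count the `r`s.

```python
word = "strawberry"

# List the letters with their positions, as in the model's answer
for position, letter in enumerate(word.upper(), start=1):
    print(f"{position}. {letter}")

# Count occurrences of "r"
r_count = word.count("r")
print(f"There are {r_count} 'r's in '{word}'.")  # There are 3 'r's in 'strawberry'.
```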

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models. The commands below create a Truss from the Stable Diffusion 2.1 example and push it to Baseten.

```
$ truss init --example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=YOUR_API_KEY
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% |          | 0.00G/2.39G
```
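Once `truss push` finishes, the deployed model is called over HTTP much like the Nemotron example above. A minimal sketch, assuming the same `https://model-{model_id}.api.baseten.co/production/predict` URL pattern and `Api-Key` header apply to your deployment. `build_predict_request` is a hypothetical helper, and the payload schema depends on the model, so it is left to the caller.

```python
from typing import Any


def build_predict_request(model_id: str, api_key: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for requests.post() against a production deployment.

    Hypothetical helper: the URL and header format are copied from the
    Nemotron example on this page and assumed to apply to any deployment.
    """
    if not model_id:
        raise ValueError("model_id is empty; set it to your deployed model's ID")
    return {
        "url": f"https://model-{model_id}.api.baseten.co/production/predict",
        "headers": {"Authorization": f"Api-Key {api_key}"},
        "json": payload,
    }
```

Usage, where `payload` is a placeholder whose keys must match your model's input schema: `requests.post(**build_predict_request(model_id, os.environ["BASETEN_API_KEY"], payload))`. Keeping the request construction separate from the network call makes the URL and headers easy to check without contacting Baseten.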