[Image prompt: "Three llamas in a mountain"]

Llama 3 8B Instruct

Formerly state-of-the-art 8-billion-parameter LLM from Meta

Deploy Llama 3 8B Instruct behind an API endpoint in seconds.


Example usage

Llama 3 8B Instruct uses a standard multi-turn messaging format with system and user prompts, and has recommended values for temperature, top_p, top_k, and frequency_penalty.

Input
import os
import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable, engaging history teacher."},
        {"role": "user", "content": "What was the role of llamas in the Inca empire?"},
    ],
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 1.0,
    "top_k": 40,
    "frequency_penalty": 1
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)

# Print the generated tokens as they get streamed
for content in res.iter_content():
    print(content.decode("utf-8"), end="", flush=True)
JSON output
[
    "streaming",
    "output",
    "text"
]
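
If you don't need token-by-token output, you can make the same call without streaming. The sketch below is a variant of the example above rather than something shown on this page: it assumes that setting "stream": False returns the full completion in the response body, so check your deployment's output schema if the shape differs.

import os
import requests

# Same placeholder as above: replace with your model id
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable, engaging history teacher."},
        {"role": "user", "content": "What was the role of llamas in the Inca empire?"},
    ],
    "stream": False,  # assumption: disabling streaming returns the whole completion at once
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 1.0,
    "top_k": 40,
    "frequency_penalty": 1
}

res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)

# With streaming off, the response body holds the complete generated text
print(res.text)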

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init -- example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G
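
Once truss push finishes and the deployment goes live, the model is served behind the same kind of endpoint used for Llama above. The snippet below is a minimal sketch of calling it; the "prompt" input key and the response shape are assumptions based on typical Stable Diffusion trusses, not something this page specifies, so check the generated truss's model.py for the exact schema.

import os
import requests

# Hypothetical: fill in the model id Baseten assigns after `truss push`
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Assumed input schema; your truss's model.py defines the real keys it accepts
data = {"prompt": "Three llamas in a mountain"}

res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)

# The response typically contains the generated image, often base64-encoded
print(res.status_code)
print(res.json())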