DeepSeek-R1 Llama 70B
Llama 3.3 70B Instruct, fine-tuned with DeepSeek-R1 for chain-of-thought (CoT) reasoning
Deploy DeepSeek-R1 Llama 70B behind an API endpoint in seconds.
Example usage
The fine-tuned version of Llama uses a standard multi-turn messaging framework with `system` and `user` prompts, and has recommended values for `temperature`, `top_p`, `top_k`, and `frequency_penalty`. The example request uses 0.6, 1.0, 40, and 1, respectively.
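To make those recommended values more concrete, here is a minimal, self-contained sketch of how the four parameters typically reshape a next-token distribution. It assumes the common OpenAI-style definition of `frequency_penalty` (subtract the penalty times a token's prior count from its logit) and the order penalty, then temperature, then top-k, then top-p; the serving engine behind this endpoint may implement them differently. `filter_distribution` and the toy logits are hypothetical and only for illustration.

```python
import math
from collections import Counter


def filter_distribution(logits, generated, temperature=0.6, top_k=40,
                        top_p=1.0, frequency_penalty=1.0):
    """Return the renormalized next-token distribution as {token: prob}.

    Hypothetical helper: one common formulation of these sampling parameters.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    counts = Counter(generated)
    # Frequency penalty: lower each logit by penalty * (times the token already appeared)
    adjusted = {tok: v - frequency_penalty * counts[tok] for tok, v in logits.items()}
    # Temperature scaling, then a numerically stable softmax
    scaled = {tok: v / temperature for tok, v in adjusted.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most likely tokens
    probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


toy_logits = {"same": 2.0, "bricks": 1.0, "feathers": 1.0, "pound": 0.5}
print(filter_distribution(toy_logits, generated=[], top_p=0.5))  # → {'same': 1.0}
```

With the recommended `top_p` of 1.0, nucleus filtering keeps the whole top-k set, so in this formulation `temperature`, `top_k`, and `frequency_penalty` do most of the shaping.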
```python
import os

import requests

# Replace the empty string with your model ID
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are a helpful and harmless assistant. You are Llama developed by Meta. You should think step-by-step."},
        {"role": "user", "content": "Which weighs more, a pound of bricks or a pound of feathers?"},
    ],
    "stream": True,
    "max_new_tokens": 2048,
    "temperature": 0.6,
    "top_p": 1.0,
    "top_k": 40,
    "frequency_penalty": 1,
}

# Call the model endpoint and keep the connection open for streaming
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)
res.raise_for_status()

# Print the generated tokens as they are streamed; decoding incrementally
# avoids splitting multi-byte UTF-8 characters across chunk boundaries
res.encoding = "utf-8"
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)
```
The endpoint returns its output as streaming text.
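To keep the streamed text instead of only printing it, you can append each decoded chunk to a list and join it once the stream ends. DeepSeek-R1 models typically write their chain of thought between `<think>` and `</think>` tags before the final answer. The sketch below assumes this endpoint passes those tags through unchanged, and that the opening `<think>` may be missing if the chat template prefills it. `split_reasoning` and `next_turn_messages` are hypothetical helpers, not part of any Baseten API, and the response string is made up.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completed response into (reasoning, answer) at the first </think> tag."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole response as the answer
        return "", text.strip()
    # The opening <think> tag may or may not appear in the output
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


def next_turn_messages(messages: list[dict], response_text: str,
                       user_content: str) -> list[dict]:
    """Return a new message list that continues the conversation (hypothetical helper)."""
    _, answer = split_reasoning(response_text)
    return messages + [
        # Assumption: only the final answer is carried into the next turn
        {"role": "assistant", "content": answer},
        {"role": "user", "content": user_content},
    ]


# Usage with a made-up response standing in for the joined streamed output
history = [{"role": "user", "content": "Which weighs more, a pound of bricks or a pound of feathers?"}]
response = "<think>\nBoth are one pound.\n</think>\n\nThey weigh the same."
reasoning, answer = split_reasoning(response)
print(answer)  # → They weigh the same.
history = next_turn_messages(history, response, "What about a kilogram of each?")
print(len(history))  # → 3
```

Dropping the reasoning from the history keeps follow-up requests shorter; check the model's chat template before relying on this choice.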