Qwen 2.5 72B Instruct
The largest model in the Qwen family of LLMs
Deploy Qwen 2.5 72B Instruct behind an API endpoint in seconds.
Example usage
Qwen uses the standard Llama-style multi-turn messaging framework with system and user prompts.
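As a rough illustration of the multi-turn format described above, the sketch below assembles a message list from a system prompt, earlier exchanges, and a new user message. The helper name `build_messages` is hypothetical, and the `"assistant"` role for earlier model replies is an assumption based on common chat formats. This page itself only names system and user prompts.

```python
def build_messages(system_prompt, previous_turns, next_user_message):
    """Assemble a chat history as a list of {"role", "content"} dicts.

    previous_turns is a list of (user_text, assistant_text) pairs.
    The "assistant" role is an assumption; check the model's chat template.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in previous_turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The conversation ends with the message the model should answer next
    messages.append({"role": "user", "content": next_user_message})
    return messages


messages = build_messages(
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    [("What does Tongyi Qianwen mean?", "(earlier model reply goes here)")],
    "Can you say that more briefly?",
)
```

The resulting list could be passed as the `"messages"` value of the request body.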
Input
import os

import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "What does Tongyi Qianwen mean?"},
    ],
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.9,
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)

# Decode as UTF-8 incrementally so multi-byte characters split across chunks print correctly
res.encoding = "utf-8"

# Print the generated tokens as they get streamed
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)
JSON output
[
    "streaming",
    "output",
    "text"
]
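If the full response is needed as a single string rather than printed token by token, the streamed byte chunks can be joined after decoding. The sketch below is one way to do that, assuming the endpoint streams UTF-8 text. The helper name `collect_stream` is hypothetical. It uses an incremental decoder so that a multi-byte character split across two chunks still decodes correctly.

```python
import codecs


def collect_stream(chunks, encoding="utf-8"):
    """Join an iterable of byte chunks into one decoded string."""
    # An incremental decoder buffers partial multi-byte sequences between chunks
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = []
    for chunk in chunks:
        parts.append(decoder.decode(chunk))
    # Flush anything still buffered at the end of the stream
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

With the response object from the Input example, this might be called as `full_text = collect_stream(res.iter_content(chunk_size=None))`.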