Kokoro
Kokoro is a text-to-speech (TTS) model (text in, audio out) with 82 million parameters, a frontier model for its size.
Deploy Kokoro behind an API endpoint in seconds.
Example usage
Kokoro uses the following request and response format:
request:
{"text": "Hello", "voice": "af", "speed": 1.0}
text: str, defaults to "Hi, I'm kokoro"
voice: str, defaults to "af". Available options: "af", "af_bella", "af_sarah", "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky"
speed: float, defaults to 1.0. The speed of the generated audio.
response:
{"base64": "base64 encoded bytestring"}
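The request and response shapes above can be wrapped in two small helpers. This is a minimal sketch, not part of Kokoro or Baseten: the names `VOICES`, `build_payload`, and `decode_audio` are hypothetical, and the positive-speed check is an assumption, since the page does not state a valid range for `speed`.

```python
import base64

# Voice names copied from the option list above.
VOICES = {
    "af", "af_bella", "af_sarah", "am_adam", "am_michael",
    "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
    "af_nicole", "af_sky",
}


def build_payload(text="Hi, I'm kokoro", voice="af", speed=1.0):
    """Build the request body, mirroring the documented defaults."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    # Assumption: speed must be positive; the page gives no valid range.
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {"text": text, "voice": voice, "speed": float(speed)}


def decode_audio(response_json):
    """Return the raw audio bytes from a {"base64": ...} response body."""
    return base64.b64decode(response_json["base64"])
```

Checking the voice name locally surfaces a typo before a request is sent, which may be convenient when the model takes a while to respond.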
Input
import base64
import os

import httpx

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

with httpx.Client() as client:
    # Make the API request
    resp = client.post(
        f"https://model-{model_id}.api.baseten.co/production/predict",
        headers={"Authorization": f"Api-Key {baseten_api_key}"},
        json={"text": "Hello world", "voice": "af", "speed": 1.0},
        timeout=None,
    )

# Stop here if the server returned an error status
resp.raise_for_status()

# Get the base64 encoded audio
response_data = resp.json()
audio_base64 = response_data["base64"]

# Decode the base64 string
audio_bytes = base64.b64decode(audio_base64)

# Write to a WAV file
with open("output.wav", "wb") as f:
    f.write(audio_bytes)

print("Audio saved to output.wav")
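To sanity-check the decoded audio before using it, the header can be read with Python's standard `wave` module. This is a sketch under one assumption: that the returned bytes are a PCM WAV file, which the example above implies by writing them to `output.wav` but which the page does not otherwise specify. The helper name `describe_wav` is hypothetical.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Return basic header facts for PCM WAV bytes."""
    # Assumption: the bytes are PCM WAV; wave.open raises wave.Error otherwise.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling `describe_wav(audio_bytes)` on the decoded response reports the channel count, sample rate, and duration in seconds, which makes an empty or truncated response easy to spot.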