numpy and sounddevice for capture and playback.
Session token
REST endpoint used to obtain a short-lived JWT for the WebSocket connection.
Connection
WebSocket channel, message types, and request/response flow.
Prerequisites
- Python 3.9+
- Organization API key and agent ID (agent UUID)
- Dependencies: Vatel Python SDK, numpy, sounddevice (for mic and speaker)
Setup
Clone or copy the example
Use the example from the vatel-nextjs-starters repo, or copy examples/python-cli/ (e.g. run_session.py and requirements.txt) into your project.
Install dependencies
Install the dependencies from the directory that contains requirements.txt. This installs the Vatel SDK (from the repo), numpy, and sounddevice.
How it works
- Client and token: A Client is created with the API key and base URL (default https://api.vatel.ai). The script gets a session token with client.session.generate_token_async(agent_id).
- Connection: client.connect(token=token, url=base_url) opens the WebSocket. A background thread captures microphone audio at 24 kHz mono 16-bit PCM, base64-encodes it, and feeds it into an async queue; a coroutine reads from the queue and sends chunks via conn.send_input_audio(chunk).
- Playback: A sounddevice output stream runs in a callback. When response_audio messages arrive, the base64 payload is decoded and pushed into a queue; the callback plays the PCM data.
- Events: The script prints session_started, response_text, input_audio_transcript, and session_ended to the console. For tool_call it sends a simple "ok" reply so the agent can continue.
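The capture path described above (mic thread, base64 encoding, async queue, sender coroutine) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the example's actual code: the 20 ms chunk size, the `read_chunk` stand-in for the microphone, and `fake_send` (a placeholder where `conn.send_input_audio` would go) are all hypothetical.

```python
import asyncio
import base64
import threading

SAMPLE_RATE = 24_000  # Hz, as stated above
SAMPLE_WIDTH = 2      # bytes per 16-bit mono sample
CHUNK_MS = 20         # assumed chunk duration; the source does not specify one
CHUNK_BYTES = SAMPLE_RATE * CHUNK_MS // 1000 * SAMPLE_WIDTH  # 960 bytes


def encode_chunk(pcm: bytes) -> str:
    """Base64-encode raw 16-bit PCM so it can travel as text."""
    return base64.b64encode(pcm).decode("ascii")


def mic_worker(loop, out_queue, read_chunk, n_chunks):
    """Runs in a background thread; read_chunk stands in for the real mic read."""
    for _ in range(n_chunks):
        encoded = encode_chunk(read_chunk())
        # asyncio.Queue is not thread-safe, so hand off through the event loop.
        loop.call_soon_threadsafe(out_queue.put_nowait, encoded)
    # Sentinel: tells the sender that capture has finished.
    loop.call_soon_threadsafe(out_queue.put_nowait, None)


async def sender(out_queue, send):
    """Drain the queue and pass each chunk to an async send function."""
    sent = 0
    while (chunk := await out_queue.get()) is not None:
        await send(chunk)
        sent += 1
    return sent


async def demo():
    loop = asyncio.get_running_loop()
    out_queue = asyncio.Queue()
    received = []

    async def fake_send(chunk):  # hypothetical stand-in for conn.send_input_audio
        received.append(chunk)

    def fake_read():  # hypothetical mic read: one chunk of silence
        return b"\x00" * CHUNK_BYTES

    thread = threading.Thread(
        target=mic_worker, args=(loop, out_queue, fake_read, 3), daemon=True
    )
    thread.start()
    sent = await sender(out_queue, fake_send)
    thread.join()
    return sent, received


if __name__ == "__main__":
    sent, _ = asyncio.run(demo())
    print(f"sent {sent} chunks")
```

Scheduling the `put_nowait` with `loop.call_soon_threadsafe` keeps all queue access on the event loop thread, which is why the capture thread never touches the `asyncio.Queue` directly.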
Project structure
| Path | Purpose |
|---|---|
| run_session.py | CLI entrypoint: argparse, client, connection, mic thread, playback queue, event loop over stream_messages() |
| requirements.txt | Vatel SDK (git), numpy, sounddevice |
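To make the event loop's job concrete, here is a rough sketch of how each message might be routed. The message shape is an assumption: a dict with a `type` key and a `data` payload, with the base64 audio under a hypothetical `audio` field. The real SDK may expose typed objects or different field names, so treat `handle_message` as illustrative only.

```python
import base64
import queue
from typing import Any, Optional

PRINTED_EVENTS = {
    "session_started",
    "response_text",
    "input_audio_transcript",
    "session_ended",
}


def handle_message(msg: dict[str, Any], playback: queue.Queue) -> Optional[str]:
    """Route one message; return a tool reply to send, or None."""
    msg_type = msg.get("type")
    data = msg.get("data") or {}  # assumed payload container

    if msg_type == "response_audio":
        # "audio" is an assumed field name for the base64 PCM payload.
        playback.put(base64.b64decode(data.get("audio", "")))
    elif msg_type == "tool_call":
        return "ok"  # simple acknowledgement so the agent can continue
    elif msg_type in PRINTED_EVENTS:
        print(f"[{msg_type}] {data}")
    return None


# Usage: audio goes to the playback queue, tool calls yield a reply.
playback = queue.Queue()
audio = base64.b64encode(b"\x01\x02").decode("ascii")
handle_message({"type": "response_audio", "data": {"audio": audio}}, playback)
reply = handle_message({"type": "tool_call", "data": {}}, playback)
```

Returning the reply rather than sending it inside the handler keeps the routing logic free of network calls, so it can be exercised without a live connection.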
Code
- run_session.py
- requirements.txt
Full-duplex session: mic → server, agent audio → speaker. Uses a thread for mic capture and a queue + sounddevice callback for playback.
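The playback side hinges on one detail: an audio callback must fill a buffer of exactly the requested size, even when queued chunks are larger, smaller, or missing. A minimal sketch of that logic follows, using only the standard library. The `PlaybackQueue` class is hypothetical and is not taken from run_session.py; it shows one way a queue of decoded PCM chunks could feed a fixed-size callback.

```python
import queue


class PlaybackQueue:
    """Queue of PCM chunks that can be read back in exact-size pieces."""

    def __init__(self) -> None:
        self._chunks: queue.Queue = queue.Queue()
        self._leftover = b""  # unread tail of the last dequeued chunk

    def push(self, pcm: bytes) -> None:
        """Called by the receive loop when decoded audio arrives."""
        self._chunks.put(pcm)

    def read(self, nbytes: int) -> bytes:
        """Called by the audio callback; pads with silence on underrun."""
        out = bytearray()
        while len(out) < nbytes:
            if not self._leftover:
                try:
                    self._leftover = self._chunks.get_nowait()
                except queue.Empty:
                    break  # nothing queued: fill the rest with zeros
            take = nbytes - len(out)
            out += self._leftover[:take]
            self._leftover = self._leftover[take:]
        out += b"\x00" * (nbytes - len(out))
        return bytes(out)


# Usage: chunk boundaries do not need to match the callback's block size.
pq = PlaybackQueue()
pq.push(b"abcd")
pq.push(b"ef")
first = pq.read(3)
second = pq.read(4)
```

With a raw sounddevice output stream, the callback body would presumably reduce to something like `outdata[:] = pq.read(len(outdata))`. Padding with zeros means an underrun plays as silence rather than raising inside the audio thread.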

