Session token
REST endpoint used to obtain a short-lived JWT for the WebSocket connection.
Connection
WebSocket channel, message types, and request/response flow.
Prerequisites
- Next.js 14+ (App Router)
- Environment variable: `ORG_API_KEY` (organization API key). The example uses `api.vatel.ai` as the API host.
- Package: `voice-stream` for microphone capture and chunking
Setup
Configure environment
Create or edit `.env.local` and set `ORG_API_KEY` to your organization API key.
The example uses `api.vatel.ai` and builds the WebSocket URL as `wss://api.vatel.ai/v1/connection`.
How it works
- Server: The root page is a server component. It uses the host `api.vatel.ai` and builds `wsUrl` (e.g. `wss://api.vatel.ai/v1/connection`). It defines a server action `getSessionToken(agentId)` that calls `POST /v1/session-token` with the header `Authorization: Bearer <ORG_API_KEY>` and returns the JWT.
- Client: `SessionDemo` is a client component. On connect it calls `getSessionToken(agentId)`, then opens a WebSocket to `wsUrl?token=<jwt>`. It uses `useSessionWebSocket` to handle incoming messages and send `input_audio`, `useAudioPlayback` to play base64 response audio, and `voice-stream` to capture microphone audio, chunk it at 24 kHz, and send it via `sendInputAudio`.
- Events: Incoming events (`session_started`, `response_text`, `input_audio_transcript`, `speech_started`, `speech_stopped`, `interruption`, `session_ended`, `tool_call`) are listed in the UI; `response_audio` is played but not shown in the list. The `speech_started` event includes an `emulated` boolean in `data`: when true, the VAD system did not detect speech but a transcript arrived, so the speech start is emulated, and the example shows “(VAD emulated)”. When a `tool_call` is received, the hook automatically sends a `tool_call_output` with `output: "success"` so the agent can continue; use `sendToolCallOutput(toolCallId, output)` to send custom outputs.
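The server-side flow described above can be sketched as follows. This is a minimal illustration, not the example's actual `app/page.tsx`: this page only says the action "returns the JWT", so the request body `{ agentId }` and the `token` field in the JSON response are assumptions, and `buildConnectionUrl` is a hypothetical helper name.

```typescript
// Hedged sketch of the server-side logic in app/page.tsx.
// ASSUMPTIONS (not documented on this page): the request body is { agentId }
// and the JWT comes back in a `token` field of the JSON response.

const API_HOST = "api.vatel.ai";

// Hypothetical helper: puts the JWT in the query string,
// e.g. wss://api.vatel.ai/v1/connection?token=<jwt>
export function buildConnectionUrl(host: string, token: string): string {
  const url = new URL(`wss://${host}/v1/connection`);
  url.searchParams.set("token", token);
  return url.toString();
}

// Server action: runs only on the server, so ORG_API_KEY never reaches the browser.
export async function getSessionToken(agentId: string): Promise<string> {
  "use server";
  const apiKey = process.env.ORG_API_KEY;
  if (!apiKey) throw new Error("ORG_API_KEY is not set");

  const res = await fetch(`https://${API_HOST}/v1/session-token`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ agentId }), // hypothetical body shape
  });
  if (!res.ok) throw new Error(`POST /v1/session-token failed: ${res.status}`);

  const json = (await res.json()) as { token?: unknown }; // hypothetical field name
  if (typeof json.token !== "string") throw new Error("no token in response");
  return json.token;
}
```

On the client, `SessionDemo` would then call `getSessionToken(agentId)` and open something like `new WebSocket(buildConnectionUrl("api.vatel.ai", token))`. Keeping the token request in a server action is what keeps the organization key out of the browser bundle.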
Project structure
| Path | Purpose |
|---|---|
| `app/page.tsx` | Server component: env, `wsUrl`, `getSessionToken`; renders `SessionDemo` with props |
| `app/session-demo.tsx` | Client component: form, status, event list, connect/disconnect; uses hooks |
| `hooks/use-session-websocket.ts` | WebSocket connection, event list, `sendInputAudio`, `sendToolCallOutput`, optional `onResponseAudio` callback |
| `hooks/use-audio-playback.ts` | Queue and play base64 PCM chunks at 24 kHz |
| `voice-stream` | External library: mic → 24 kHz chunks, passed to `sendInputAudio` |
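`use-audio-playback.ts` is described only as queuing and playing base64 PCM chunks at 24 kHz. The decoding and scheduling arithmetic behind such a hook might look like this sketch, which assumes 16-bit signed little-endian mono samples (this page does not state the sample format); the function names are hypothetical.

```typescript
// Hedged sketch of the arithmetic behind a PCM playback queue.
// ASSUMPTION: chunks are 16-bit signed little-endian mono PCM at 24 kHz.
export const SAMPLE_RATE = 24_000;

// Decode one base64 chunk into Float32 samples in [-1, 1), as Web Audio expects.
export function pcm16Base64ToFloat32(base64: string): Float32Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);

  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2)); // a trailing odd byte is dropped
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}

// Gapless queueing: a chunk starts when the previous one ends,
// or immediately if the queue has already drained.
export function nextStartTime(currentTime: number, queuedUntil: number): number {
  return Math.max(currentTime, queuedUntil);
}

// Playback length of a chunk in seconds.
export function chunkDuration(samples: Float32Array): number {
  return samples.length / SAMPLE_RATE;
}
```

In the browser, each decoded chunk could be copied into an `AudioBuffer` from `ctx.createBuffer(1, samples.length, SAMPLE_RATE)` via `copyToChannel`, played through an `AudioBufferSourceNode` started at `nextStartTime(ctx.currentTime, queuedUntil)`, and `queuedUntil` then set to that start time plus `chunkDuration(samples)` so consecutive chunks play back to back.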
Code
- app/page.tsx
- SessionDemo (session-demo.tsx)
- useSessionWebSocket
- useAudioPlayback
Root page: server component that builds `wsUrl` and defines the session-token server action, then renders `SessionDemo`.
Message format: Server messages include `type` and `data`; the client sends `{ type: "input_audio", data: { audio: "<base64>" } }` and `{ type: "tool_call_output", data: { toolCallId, output } }`. See Connection for the full schema.
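A sketch of these message shapes, together with the event handling this page attributes to `useSessionWebSocket` (audio routed to playback, the “(VAD emulated)” label, and the automatic `tool_call_output` reply). It is illustrative only: the builder and handler names are hypothetical, and it assumes a `tool_call` carries its id in `data.toolCallId` and `response_audio` carries base64 audio in `data.audio`; the Connection schema is authoritative for the real field names.

```typescript
// Hedged sketch of the message shapes and dispatch logic.
// ASSUMPTIONS: tool_call -> data.toolCallId, response_audio -> data.audio (base64).

type ServerMessage = { type: string; data?: Record<string, unknown> };

type ClientMessage =
  | { type: "input_audio"; data: { audio: string } }
  | { type: "tool_call_output"; data: { toolCallId: string; output: string } };

// Hypothetical builders for the two client message types.
export const inputAudio = (audio: string): ClientMessage => ({
  type: "input_audio",
  data: { audio },
});

export const toolCallOutput = (toolCallId: string, output: string): ClientMessage => ({
  type: "tool_call_output",
  data: { toolCallId, output },
});

// Returns the label for the event list, or null when the event is not listed.
export function handleServerMessage(
  raw: string,
  send: (msg: ClientMessage) => void,
  playAudio: (base64: string) => void,
): string | null {
  const msg = JSON.parse(raw) as ServerMessage;
  const data: Record<string, unknown> = msg.data ?? {};

  switch (msg.type) {
    case "response_audio": {
      // Played, but not shown in the event list.
      const audio = data.audio;
      if (typeof audio === "string") playAudio(audio);
      return null;
    }
    case "speech_started":
      // emulated === true: VAD missed the speech, but a transcript arrived.
      return data.emulated === true ? "speech_started (VAD emulated)" : "speech_started";
    case "tool_call": {
      // Auto-reply so the agent can continue.
      const id = data.toolCallId;
      if (typeof id === "string") send(toolCallOutput(id, "success"));
      return "tool_call";
    }
    default:
      return msg.type;
  }
}
```

Inside a hook, `send` could be `(m) => ws.send(JSON.stringify(m))` and `playAudio` the enqueue function of the playback hook; a base64-encoded microphone chunk would go out as `send(inputAudio(chunk))`.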
