Use WebRTCSession in the browser for WebRTC voice. Mint the credential on the server with Client.generateSessionToken and TRANSPORT_WEBRTC, pass token, url, room, and identity to the page, then connect.

Session token

API reference for the session-token request and response fields.

JavaScript SDK

Package install and SDK entry points.

Connection (message types)

WebSocket Session message catalog. WebRTCSession uses the same event names for signaling except it does not emit response_audio (agent speech is only on the WebRTC audio path into remoteAudioContainer).

Prerequisites

  • @vatel/sdk
  • Organization API key (server only)
  • Agent UUID
  • WebRTC enabled on your stack

generateSessionToken (WebRTC)

import { Client, TRANSPORT_WEBRTC } from "@vatel/sdk";

const client = new Client({
  baseUrl: process.env.VATEL_BASE_URL ?? "https://api.vatel.ai",
  getToken: () => process.env.VATEL_API_KEY,
});

// agentId is your agent's UUID (see Prerequisites)
const { data, status } = await client.generateSessionToken(agentId, {
  transport: TRANSPORT_WEBRTC,
});

Use data.token and, when present, data.url, data.room, and data.identity for WebRTCSession.connect on the client. Check status and data for failures as with any other SDK call.
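On the server, the result is typically shaped into a small JSON payload for the page. A minimal sketch: credentialForPage and its injected mint function are illustrative names, not part of @vatel/sdk; in a real route, mint would wrap the client.generateSessionToken call above.

```javascript
// Illustrative helper (not part of @vatel/sdk): shape the generateSessionToken
// result into the JSON the browser page needs. The mint function is injected so
// the logic can run without network access; in a real route it would call
// client.generateSessionToken(agentId, { transport: TRANSPORT_WEBRTC }).
async function credentialForPage(mint) {
  const { data, status } = await mint();
  if (status !== 200 || !data?.token) {
    // Hide upstream details from the browser; log them server-side instead.
    return { statusCode: 502, body: { error: "could not mint session token" } };
  }
  const { token, url, room, identity } = data;
  return { statusCode: 200, body: { token, url, room, identity } };
}
```

The page then fetches this payload and hands it to WebRTCSession.connect.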

Connect WebRTCSession (client)

import { WebRTCSession } from "@vatel/sdk";

const session = new WebRTCSession({
  // Element that receives the agent's remote audio stream,
  // e.g. <div id="remote-audio"></div> in your page.
  remoteAudioContainer: document.getElementById("remote-audio"),
});

await session.connect({
  token: credential.token,
  url: credential.url,
  room: credential.room,
  identity: credential.identity,
});

await session.start();
await session.setMicrophoneEnabled(true);

Here credential is the data object returned by the server-side generateSessionToken call.
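Before connecting, it can help to fail fast on an incomplete credential rather than let connect fail later with a less obvious error. A sketch; toConnectOptions is a hypothetical helper, not an SDK function:

```javascript
// Hypothetical guard (not part of @vatel/sdk): validate the credential fetched
// from the server before handing it to session.connect.
function toConnectOptions(credential) {
  if (!credential || typeof credential.token !== "string" || credential.token.length === 0) {
    throw new Error("session credential is missing a token");
  }
  // url, room, and identity are passed through when the server provided them.
  const { token, url, room, identity } = credential;
  return { token, url, room, identity };
}
```

Used as: await session.connect(toConnectOptions(credential)).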

Session events

WebRTCSession uses the same signaling-style events as a WebSocket Session, except response_audio is not fired — agent audio is delivered only as a remote media stream (attach with remoteAudioContainer or your own playback). Subscribe with session.on(eventName, (msg) => { ... }). Each msg has type, timestamp, and data (fields depend on type).
  • session_started: session is live. Data: id (session identifier).
  • session_ended: session closed. Data: may be an empty object; treat as end-of-call.
  • response_text: agent text for the current turn. Data: text, turn_id.
  • input_audio_transcript: STT of user speech. Data: transcript.
  • speech_started: user speech detected (VAD). Data: emulated; if true, VAD did not fire but a transcript arrived, so start-of-speech is synthetic.
  • speech_stopped: user speech segment ended (VAD). Data: usually empty.
  • interruption: user cut off the agent while it was speaking. Data: usually empty.
  • tool_call: agent invoked a client tool. Data: toolCallId, toolName, arguments (array of parameter descriptors with optional value).
For base64 response_audio chunks, use a WebSocket Session instead of WebRTCSession. Reply to tools with session.sendToolCallOutput(toolCallId, outputString) (async in some builds; use .catch(...) if needed).
session.on("session_started", (msg) => {
  console.log("session", msg.data?.id);
});
session.on("session_ended", () => console.log("ended"));
session.on("response_text", (msg) => console.log("agent:", msg.data?.text));
session.on("input_audio_transcript", (msg) => console.log("you:", msg.data?.transcript));
session.on("speech_started", (msg) => console.log("speech", msg.data?.emulated ? "emulated" : "vad"));
session.on("speech_stopped", () => console.log("speech stopped"));
session.on("interruption", () => console.log("interruption"));
session.on("tool_call", async (msg) => {
  const id = msg.data?.toolCallId;
  if (id) await session.sendToolCallOutput(id, "ok").catch(() => {});
});
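For agents with more than one tool, the tool_call handler above can grow into a small dispatcher. A sketch under two assumptions: that each entry in arguments is a descriptor with name and optional value fields (the docs above only guarantee an optional value), and that get_time is a tool your agent actually exposes:

```javascript
// Sketch of a tool_call dispatcher. The descriptor shape ({ name, value }) and
// the get_time handler are illustrative assumptions, not part of @vatel/sdk.
function argsToObject(descriptors = []) {
  const out = {};
  for (const d of descriptors) {
    if (d && typeof d.name === "string") out[d.name] = d.value;
  }
  return out;
}

const toolHandlers = {
  // Example tool: return the current time as an ISO string.
  get_time: async () => new Date().toISOString(),
};

async function handleToolCall(msg, sendOutput) {
  const { toolCallId, toolName, arguments: args } = msg.data ?? {};
  if (!toolCallId) return; // nothing to reply to
  const handler = toolHandlers[toolName];
  const result = handler
    ? await handler(argsToObject(args))
    : `unknown tool: ${toolName}`;
  await sendOutput(toolCallId, String(result));
}
```

Wire it up with session.on("tool_call", (msg) => { handleToolCall(msg, (id, out) => session.sendToolCallOutput(id, out)).catch(() => {}); }).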

See also

  • Browser voice demo (Next.js): Next.js
  • SDK overview: JavaScript SDK