This example shows a Session WebSocket demo in Next.js: the user enters an agent ID, the app fetches a session token on the server, opens a WebSocket connection, streams microphone audio to Vatel, and plays back agent audio while displaying real-time events (transcripts, turn text, session lifecycle).

Session token

REST endpoint used to obtain a short-lived JWT for the WebSocket connection.

Connection

WebSocket channel, message types, and request/response flow.

Prerequisites

  • Next.js 14+ (App Router)
  • Environment variable: ORG_API_KEY (organization API key). The example uses api.vatel.ai as the API host.
  • Package: voice-stream for microphone capture and chunking

Setup

1. Install dependencies

   Add the voice-stream package (and any UI dependencies you use):

   npm install voice-stream

2. Configure environment

   Create or edit .env.local with your organization API key:

   ORG_API_KEY=your-organization-api-key

   The example uses api.vatel.ai and builds the WebSocket URL as wss://api.vatel.ai/v1/connection.

3. Run the app

   Start the dev server and open the page. Enter an agent ID and click Connect to open the session and stream audio.

How it works

  1. Server — The root page is a server component. It uses host api.vatel.ai and builds wsUrl (e.g. wss://api.vatel.ai/v1/connection). It defines a server action getSessionToken(agentId) that calls POST /v1/session-token with Authorization: Bearer ORG_API_KEY and returns the JWT.
  2. ClientSessionDemo is a client component. On connect it calls getSessionToken(agentId), then opens a WebSocket to wsUrl?token=<jwt>. It uses useSessionWebSocket to handle incoming messages and send input_audio; useAudioPlayback to play base64 response audio; and voice-stream to capture and chunk mic audio at 24 kHz and send it via sendInputAudio.
  3. Events — Incoming events (session_started, response_text, input_audio_transcript, speech_started, speech_stopped, interruption, session_ended, tool_call) are listed in the UI; response_audio is played and not shown in the list. The speech_started event includes an emulated boolean in data: when true, the VAD system did not detect speech but a transcript arrived, so speech start is emulated. The example shows “(VAD emulated)” when true. When a tool_call is received, the hook automatically sends a tool_call_output with output: "success" so the agent can continue; you can use sendToolCallOutput(toolCallId, output) for custom outputs.
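The event handling described above can be sketched as a pure message handler. The function and type names here are illustrative, not the actual code of useSessionWebSocket; the sketch only shows the decisions the hook makes: response_audio is played rather than listed, speech_started is labeled "(VAD emulated)" when data.emulated is true, and a tool_call triggers an automatic tool_call_output reply with output: "success".

```typescript
// Hypothetical sketch of the incoming-message logic (names are assumptions).
type ServerMessage = { type: string; data?: Record<string, unknown> };

type HandlerResult = {
  // Label to show in the UI event list, or null for messages that are not listed.
  label: string | null;
  // Optional message to send back automatically (the tool_call_output reply).
  reply?: { type: "tool_call_output"; data: { toolCallId: unknown; output: string } };
};

function handleServerMessage(msg: ServerMessage): HandlerResult {
  switch (msg.type) {
    case "response_audio":
      // Played via the audio playback hook; intentionally not shown in the list.
      return { label: null };
    case "speech_started": {
      // data.emulated === true means no VAD speech detection, but a transcript arrived.
      const emulated = msg.data?.emulated === true;
      return { label: emulated ? "speech_started (VAD emulated)" : "speech_started" };
    }
    case "tool_call":
      // Auto-acknowledge so the agent can continue; a real app could
      // call sendToolCallOutput(toolCallId, output) with a custom result instead.
      return {
        label: "tool_call",
        reply: {
          type: "tool_call_output",
          data: { toolCallId: msg.data?.toolCallId, output: "success" },
        },
      };
    default:
      // session_started, response_text, input_audio_transcript, etc. are listed as-is.
      return { label: msg.type };
  }
}
```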

Project structure

Path                              Purpose
app/page.tsx                      Server component: env, wsUrl, getSessionToken; renders SessionDemo with props
app/session-demo.tsx              Client component: form, status, event list, connect/disconnect; uses hooks
hooks/use-session-websocket.ts    WebSocket connection, event list, sendInputAudio, sendToolCallOutput, optional onResponseAudio callback
hooks/use-audio-playback.ts       Queue and play base64 PCM chunks at 24 kHz
voice-stream                      External lib: mic → 24 kHz chunks, passed to sendInputAudio
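The core step inside a playback hook like use-audio-playback.ts can be sketched as pure logic: decode a base64 chunk into 16-bit PCM samples and scale them to the [-1, 1) floats Web Audio expects. The function name and the assumption of little-endian mono 16-bit PCM are ours, not taken from the example's source:

```typescript
// Sketch only: assumes the wire format is 16-bit signed mono PCM (an assumption).
function base64PcmToFloat32(b64: string): Float32Array {
  // Decode base64 into raw bytes.
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  // Reinterpret the bytes as 16-bit signed samples.
  const samples = new Int16Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 2);
  // Scale each sample from [-32768, 32767] to [-1, 1).
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] / 32768;
  return out;
}
```

In the real hook, the resulting Float32Array would be copied into an AudioBuffer created at a 24000 Hz sample rate and scheduled sequentially so chunks play back-to-back without gaps.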

Code

Root page: server component that builds wsUrl and defines the session-token server action, then renders SessionDemo.
import { SessionDemo } from "./session-demo";

const host = "api.vatel.ai";
const useTls = true;

async function getSessionToken(agentId: string): Promise<string | null> {
  "use server";
  const apiKey = process.env.ORG_API_KEY ?? "";
  if (!apiKey) return null;
  try {
    const url = new URL("/v1/session-token", useTls ? `https://${host}` : `http://${host}`);
    url.searchParams.set("agentId", agentId);
    const res = await fetch(url.toString(), {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
    });
    if (!res.ok) return null;
    const body = (await res.json()) as { token: string };
    return body.token;
  } catch (error) {
    console.error("Error getting session token", error);
    return null;
  }
}

export default async function Home() {
  const wsUrl = useTls ? `wss://${host}/v1/connection` : `ws://${host}/v1/connection`;

  return (
    <SessionDemo
      wsUrl={wsUrl}
      getSessionToken={getSessionToken}
    />
  );
}
Message format — Server messages include type and data; client sends { type: "input_audio", data: { audio: "<base64>" } } and { type: "tool_call_output", data: { toolCallId, output } }. See Connection for the full schema.
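The two client-to-server message shapes above can be captured in a small discriminated union; the type and helper names below are illustrative (the example may structure this differently), but the payloads match the schema described:

```typescript
// Sketch of the client→server message shapes (names are assumptions).
type ClientMessage =
  | { type: "input_audio"; data: { audio: string } }        // base64 mic chunk
  | { type: "tool_call_output"; data: { toolCallId: string; output: string } };

// Build an input_audio message from a base64-encoded audio chunk.
function inputAudioMessage(base64Audio: string): ClientMessage {
  return { type: "input_audio", data: { audio: base64Audio } };
}

// Build a tool_call_output reply for a received tool_call.
function toolCallOutputMessage(toolCallId: string, output: string): ClientMessage {
  return { type: "tool_call_output", data: { toolCallId, output } };
}
```

Either message would be sent over the open socket with ws.send(JSON.stringify(msg)).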