This example shows a Session WebSocket demo in Next.js: the user enters an agent ID, the app fetches a session token on the server, opens a WebSocket connection, streams microphone audio to Vatel, and plays back agent audio while displaying real-time events (transcripts, turn text, session lifecycle).

Session token

REST endpoint used to obtain a short-lived JWT for the WebSocket connection.

Connection

WebSocket channel, message types, and request/response flow.

Prerequisites

  • Next.js 14+ (App Router)
  • Environment variable: ORG_API_KEY (organization API key). The example uses api.vatel.ai as the API host.
  • Package: voice-stream for microphone capture and chunking

Setup

1. Install dependencies

   Add the voice-stream package (and any UI dependencies you use):

   npm install voice-stream

2. Configure environment

   Create or edit .env.local with your organization API key:

   ORG_API_KEY=your-organization-api-key

   The example uses api.vatel.ai and builds the WebSocket URL as wss://api.vatel.ai/v1/connection.

3. Run the app

   Start the dev server and open the page. Enter an agent ID and click Connect to open the session and stream audio.

How it works

  1. Server — The root page is a server component. It uses host api.vatel.ai and builds wsUrl (e.g. wss://api.vatel.ai/v1/connection). It defines a server action getSessionToken(agentId) that calls POST /v1/session-token with Authorization: Bearer ORG_API_KEY and returns the JWT.
  2. ClientSessionDemo is a client component. On connect it calls getSessionToken(agentId), then opens a WebSocket to wsUrl?token=<jwt>. It uses useSessionWebSocket to handle incoming messages and send input_audio; useAudioPlayback to play base64 response audio; and voice-stream to capture and chunk mic audio at 24 kHz and send it via sendInputAudio.
  3. Events — Incoming events (session_started, response_text, input_audio_transcript, speech_started, speech_stopped, interruption, session_ended, tool_call) are listed in the UI; response_audio is played and not shown in the list. The speech_started event includes an emulated boolean in data: when true, the VAD system did not detect speech but a transcript arrived, so speech start is emulated. The example shows “(VAD emulated)” when true. When a tool_call is received, the hook automatically sends a tool_call_output with output: "success" so the agent can continue; you can use sendToolCallOutput(toolCallId, output) for custom outputs.
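The event handling described above can be sketched as a pure message handler. The function and type names here are illustrative, not the actual code of useSessionWebSocket; the sketch only shows the decisions the hook makes: response_audio is played rather than listed, speech_started is labeled "(VAD emulated)" when data.emulated is true, and a tool_call triggers an automatic tool_call_output reply with output: "success".

```typescript
// Hypothetical sketch of the incoming-message logic (names are assumptions).
type ServerMessage = { type: string; data?: Record<string, unknown> };

type HandlerResult = {
  // Label to show in the UI event list, or null for messages that are not listed.
  label: string | null;
  // Optional message to send back automatically (the tool_call_output reply).
  reply?: { type: "tool_call_output"; data: { toolCallId: unknown; output: string } };
};

function handleServerMessage(msg: ServerMessage): HandlerResult {
  switch (msg.type) {
    case "response_audio":
      // Played via the audio playback hook; intentionally not shown in the list.
      return { label: null };
    case "speech_started": {
      // data.emulated === true means no VAD speech detection, but a transcript arrived.
      const emulated = msg.data?.emulated === true;
      return { label: emulated ? "speech_started (VAD emulated)" : "speech_started" };
    }
    case "tool_call":
      // Auto-acknowledge so the agent can continue; a real app could
      // call sendToolCallOutput(toolCallId, output) with a custom result instead.
      return {
        label: "tool_call",
        reply: {
          type: "tool_call_output",
          data: { toolCallId: msg.data?.toolCallId, output: "success" },
        },
      };
    default:
      // session_started, response_text, input_audio_transcript, etc. are listed as-is.
      return { label: msg.type };
  }
}
```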

Project structure

Path                              Purpose
app/page.tsx                      Server component: env, wsUrl, getSessionToken; renders SessionDemo with props
app/session-demo.tsx              Client component: form, status, event list, connect/disconnect; uses hooks
hooks/use-session-websocket.ts    WebSocket connection, event list, sendInputAudio, sendToolCallOutput, optional onResponseAudio callback
hooks/use-audio-playback.ts       Queue and play base64 PCM chunks at 24 kHz
voice-stream                      External lib: mic → 24 kHz chunks, passed to sendInputAudio
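The core step inside a playback hook like use-audio-playback.ts can be sketched as pure logic: decode a base64 chunk into 16-bit PCM samples and scale them to the [-1, 1) floats Web Audio expects. The function name and the assumption of little-endian mono 16-bit PCM are ours, not taken from the example's source:

```typescript
// Sketch only: assumes the wire format is 16-bit signed mono PCM (an assumption).
function base64PcmToFloat32(b64: string): Float32Array {
  // Decode base64 into raw bytes.
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  // Reinterpret the bytes as 16-bit signed samples.
  const samples = new Int16Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 2);
  // Scale each sample from [-32768, 32767] to [-1, 1).
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] / 32768;
  return out;
}
```

In the real hook, the resulting Float32Array would be copied into an AudioBuffer created at a 24000 Hz sample rate and scheduled sequentially so chunks play back-to-back without gaps.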

Code

Root page: server component that builds wsUrl and defines the session-token server action, then renders SessionDemo.
import { SessionDemo } from "./session-demo";

const host = "api.vatel.ai";
const useTls = true;

async function getSessionToken(agentId: string): Promise<string | null> {
  "use server";
  const apiKey = process.env.ORG_API_KEY ?? "";
  if (!apiKey) return null;
  try {
    const url = new URL("/v1/session-token", useTls ? `https://${host}` : `http://${host}`);
    url.searchParams.set("agentId", agentId);
    const res = await fetch(url.toString(), {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
    });
    if (!res.ok) return null;
    const body = (await res.json()) as { token: string };
    return body.token;
  } catch (error) {
    console.error("Error getting session token", error);
    return null;
  }
}

export default async function Home() {
  const wsUrl = useTls ? `wss://${host}/v1/connection` : `ws://${host}/v1/connection`;

  return (
    <SessionDemo
      wsUrl={wsUrl}
      getSessionToken={getSessionToken}
    />
  );
}
Message format — Server messages include type and data; client sends { type: "input_audio", data: { audio: "<base64>" } } and { type: "tool_call_output", data: { toolCallId, output } }. See Connection for the full schema.
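The two client-to-server message shapes above can be captured in a small discriminated union; the type and helper names below are illustrative (the example may structure this differently), but the payloads match the schema described:

```typescript
// Sketch of the client→server message shapes (names are assumptions).
type ClientMessage =
  | { type: "input_audio"; data: { audio: string } }        // base64 mic chunk
  | { type: "tool_call_output"; data: { toolCallId: string; output: string } };

// Build an input_audio message from a base64-encoded audio chunk.
function inputAudioMessage(base64Audio: string): ClientMessage {
  return { type: "input_audio", data: { audio: base64Audio } };
}

// Build a tool_call_output reply for a received tool_call.
function toolCallOutputMessage(toolCallId: string, output: string): ClientMessage {
  return { type: "tool_call_output", data: { toolCallId, output } };
}
```

Either message would be sent over the open socket with ws.send(JSON.stringify(msg)).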