VoiceClaw

Voice interface for any AI. Talk to Gemini, OpenAI, and more through a unified relay server -- on your phone, on your Mac, or from any WebSocket client.

Real-time voice conversations

Speak naturally and hear AI responses with low latency. Full-duplex audio with barge-in support — interrupt the AI mid-sentence just like a real conversation.
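One way the client side of this full-duplex loop can work is sketched below: when voice activity is detected while the model is still talking, cancel the in-flight response before forwarding the new mic audio. `audio.append` is the message type from the protocol diagram; `response.cancel` and the class shape are illustrative assumptions, not the actual relay protocol.

```typescript
type OutboundMessage =
  | { type: "audio.append"; audio: string } // base64 PCM from the mic
  | { type: "response.cancel" };            // hypothetical interrupt message

class BargeInController {
  private modelSpeaking = false;
  public sent: OutboundMessage[] = [];

  // Called when an audio.delta arrives: the model has started talking.
  onModelAudio() {
    this.modelSpeaking = true;
  }

  // Called when the model's turn completes.
  onModelDone() {
    this.modelSpeaking = false;
  }

  // Called for every mic chunk. If the user talks over the model,
  // cancel the in-flight response before forwarding the new audio.
  onMicChunk(base64Pcm: string, voiceDetected: boolean) {
    if (voiceDetected && this.modelSpeaking) {
      this.sent.push({ type: "response.cancel" });
      this.modelSpeaking = false; // local playback would be flushed here too
    }
    this.sent.push({ type: "audio.append", audio: base64Pcm });
  }
}
```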

Multi-provider support

Switch between Gemini and OpenAI models without changing client code. The relay server handles protocol translation transparently.
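A client might select its provider with the `session.config` message shown in the diagram below. Only the message type comes from the diagram; the field and model names here are illustrative assumptions.

```typescript
// Hypothetical shape of the session.config message; the fields and
// model names are assumptions, not the relay's actual schema.
interface SessionConfig {
  type: "session.config";
  provider: "gemini" | "openai";
  model: string;
  voice?: string;
}

function configFor(provider: "gemini" | "openai"): SessionConfig {
  // Example model names only; the relay maps them to provider calls.
  return provider === "gemini"
    ? { type: "session.config", provider, model: "gemini-live-example" }
    : { type: "session.config", provider, model: "gpt-realtime-example" };
}
```

Because the relay translates protocols, switching providers is just a different `session.config` payload; the rest of the client code is unchanged.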

Brain agent

An async tool-calling agent that gives the voice AI access to web search, calendars, tasks, memory, and more — powered by any OpenAI-compatible agent.
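A sketch of how the relay might bridge a voice-model tool call to the async agent: emit the call, stream progress while the agent works, then emit the result. `tool.call` and `tool.progress` appear in the protocol diagram; the handler signature and `tool.result` are assumptions.

```typescript
type ToolEvent =
  | { type: "tool.call"; name: string; args: unknown }
  | { type: "tool.progress"; note: string }
  | { type: "tool.result"; output: string };

async function runTool(
  call: { name: string; args: unknown },
  emit: (e: ToolEvent) => void,
): Promise<void> {
  emit({ type: "tool.call", name: call.name, args: call.args });
  emit({ type: "tool.progress", note: `running ${call.name}` });
  // In the real system this would await an OpenAI-compatible agent;
  // a placeholder result stands in here.
  const output = await Promise.resolve(`${call.name} done`);
  emit({ type: "tool.result", output });
}
```

Streaming `tool.progress` while the agent runs lets the voice side acknowledge the request ("searching now…") instead of going silent.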

Screen sharing

Share your screen on desktop so the AI can see what you see. JPEG frames streamed at 1 FPS with context window compression.
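The 1 FPS throttle described above can be sketched as a small gate in front of the send path. `frame.append` is the message type from the protocol diagram; the payload fields are assumptions.

```typescript
const FRAME_INTERVAL_MS = 1000; // 1 FPS

class FrameThrottle {
  private lastSent = -Infinity;

  // Returns a frame.append message if enough time has passed, else null.
  maybeSend(jpegBase64: string, nowMs: number) {
    if (nowMs - this.lastSent < FRAME_INTERVAL_MS) return null;
    this.lastSent = nowMs;
    return { type: "frame.append" as const, mimeType: "image/jpeg", data: jpegBase64 };
  }
}
```

Dropping frames on the client keeps bandwidth predictable, and the relay's context compression keeps old frames from crowding out the conversation.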

Session resumption

Gemini sessions survive network drops and transparently reconnect. OpenAI sessions rotate with transcript summaries to maintain context.
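A reconnect schedule like the one a client might use when a session drops can be as simple as capped exponential backoff; the constants here are illustrative, not taken from the codebase.

```typescript
// Exponential backoff: 500ms, 1s, 2s, 4s, ... capped at 15s.
function backoffMs(attempt: number, baseMs = 500, capMs = 15000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```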

Conversation history

Local SQLite storage on both mobile and desktop with full transcript search and conversation continuity.
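An illustrative shape for such storage: a messages table paired with an FTS5 virtual table for transcript search, plus a parameterized query so user input is never string-interpolated. The table and column names are assumptions, not the app's actual schema.

```typescript
const SCHEMA = `
CREATE TABLE IF NOT EXISTS messages (
  id INTEGER PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  role TEXT NOT NULL,           -- 'user' or 'assistant'
  content TEXT NOT NULL,
  created_at TEXT DEFAULT (datetime('now'))
);
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
  USING fts5(content, content='messages', content_rowid='id');
`;

// Builds a parameterized full-text search over transcripts.
function searchQuery(): { sql: string; bind: (term: string) => [string] } {
  return {
    sql: `SELECT m.* FROM messages_fts f
          JOIN messages m ON m.id = f.rowid
          WHERE messages_fts MATCH ?
          ORDER BY m.created_at DESC`,
    bind: (term) => [term],
  };
}
```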

Relay Server

TypeScript / Node.js WebSocket relay that translates between clients and AI providers. Handles brain agent calls, session management, and observability via Langfuse.

Read the docs

Desktop App

Electron + React + Tailwind macOS voice assistant with screen sharing capabilities.

Read the docs

Mobile App

React Native / Expo iOS voice assistant with native audio I/O and conversation history.

Read the docs

  1. Clone and install

    git clone https://github.com/yagudaev/voiceclaw.git
    cd voiceclaw
    yarn install

  2. Start the relay server

    cd relay-server
    cp .env.example .env # add your API keys
    yarn dev

  3. Start a client

    cd desktop
    yarn dev
+-----------+       WebSocket        +---------------+       WebSocket        +----------------+
|           | ---session.config----> |               | ---provider setup----> |                |
|  Client   | ---audio.append------> |  Relay Server | ---audio stream------> |  AI Provider   |
| (mobile   | <---audio.delta------- |               | <---model audio------- |  (Gemini /     |
|  or       | <---transcript.delta-- | - protocol    | <---transcription----- |   OpenAI)      |
| desktop)  | ---frame.append------> |   translate   | ---video frames------> |                |
|           | <---tool.call--------- | - brain agent |                        +----------------+
|           | <---tool.progress----- | - tracing     |
+-----------+                        +---------------+

The relay server sits between clients and AI providers. It normalizes the different provider protocols into a single, clean WebSocket API. Clients never talk directly to Gemini or OpenAI — they speak the relay protocol, and the relay handles the translation.
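A minimal client therefore only needs to dispatch on the relay's message types. The types below come from the diagram; the payload fields and handler interface are assumptions.

```typescript
type Inbound =
  | { type: "audio.delta"; audio: string }
  | { type: "transcript.delta"; text: string }
  | { type: "tool.progress"; note: string };

// Route each relay message to the right client-side handler.
function dispatch(
  msg: Inbound,
  out: { play(audio: string): void; show(text: string): void },
) {
  switch (msg.type) {
    case "audio.delta":
      out.play(msg.audio); // queue model audio for playback
      break;
    case "transcript.delta":
      out.show(msg.text); // render the live caption
      break;
    case "tool.progress":
      out.show(`[${msg.note}]`); // surface brain-agent status
      break;
  }
}
```

Because every provider is normalized to this one shape, the same dispatcher serves Gemini and OpenAI sessions alike.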

  • Architecture — detailed system design, audio flow, and session lifecycle
  • Relay Server — WebSocket protocol reference, configuration, and brain agent
  • Desktop App — building from source, screen sharing, and settings
  • Mobile App — Expo build setup and iOS-specific notes
  • Contributing — development setup, code style, and branch strategy