VoiceClaw

Voice interface for any AI. Talk to Gemini, OpenAI, and more through a unified relay server -- on your phone, on your Mac, or from any WebSocket client.

Real-time voice conversations

Speak naturally and hear AI responses with low latency. Full-duplex audio with barge-in support — interrupt the AI mid-sentence just like a real conversation.
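One way the client side of this full-duplex loop can work is sketched below: when voice activity is detected while the model is still talking, cancel the in-flight response before forwarding the new mic audio. `audio.append` is the message type from the protocol diagram; `response.cancel` and the class shape are illustrative assumptions, not the actual relay protocol.

```typescript
type OutboundMessage =
  | { type: "audio.append"; audio: string } // base64 PCM from the mic
  | { type: "response.cancel" };            // hypothetical interrupt message

class BargeInController {
  private modelSpeaking = false;
  public sent: OutboundMessage[] = [];

  // Called when an audio.delta arrives: the model has started talking.
  onModelAudio() {
    this.modelSpeaking = true;
  }

  // Called when the model's turn completes.
  onModelDone() {
    this.modelSpeaking = false;
  }

  // Called for every mic chunk. If the user talks over the model,
  // cancel the in-flight response before forwarding the new audio.
  onMicChunk(base64Pcm: string, voiceDetected: boolean) {
    if (voiceDetected && this.modelSpeaking) {
      this.sent.push({ type: "response.cancel" });
      this.modelSpeaking = false; // local playback would be flushed here too
    }
    this.sent.push({ type: "audio.append", audio: base64Pcm });
  }
}
```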

Multi-provider support

Switch between Gemini and OpenAI models without changing client code. The relay server handles protocol translation transparently.
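A client might select its provider with the `session.config` message shown in the diagram below. Only the message type comes from the diagram; the field and model names here are illustrative assumptions.

```typescript
// Hypothetical shape of the session.config message; the fields and
// model names are assumptions, not the relay's actual schema.
interface SessionConfig {
  type: "session.config";
  provider: "gemini" | "openai";
  model: string;
  voice?: string;
}

function configFor(provider: "gemini" | "openai"): SessionConfig {
  // Example model names only; the relay maps them to provider calls.
  return provider === "gemini"
    ? { type: "session.config", provider, model: "gemini-live-example" }
    : { type: "session.config", provider, model: "gpt-realtime-example" };
}
```

Because the relay translates protocols, switching providers is just a different `session.config` payload; the rest of the client code is unchanged.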

Brain agent

An async tool-calling agent that gives the voice AI access to web search, calendars, tasks, memory, and more — powered by any OpenAI-compatible agent.
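A sketch of how the relay might bridge a voice-model tool call to the async agent: emit the call, stream progress while the agent works, then emit the result. `tool.call` and `tool.progress` appear in the protocol diagram; the handler signature and `tool.result` are assumptions.

```typescript
type ToolEvent =
  | { type: "tool.call"; name: string; args: unknown }
  | { type: "tool.progress"; note: string }
  | { type: "tool.result"; output: string };

async function runTool(
  call: { name: string; args: unknown },
  emit: (e: ToolEvent) => void,
): Promise<void> {
  emit({ type: "tool.call", name: call.name, args: call.args });
  emit({ type: "tool.progress", note: `running ${call.name}` });
  // In the real system this would await an OpenAI-compatible agent;
  // a placeholder result stands in here.
  const output = await Promise.resolve(`${call.name} done`);
  emit({ type: "tool.result", output });
}
```

Streaming `tool.progress` while the agent runs lets the voice side acknowledge the request ("searching now…") instead of going silent.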

Screen sharing

Share your screen on desktop so the AI can see what you see. JPEG frames streamed at 1 FPS with context window compression.
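The 1 FPS throttle described above can be sketched as a small gate in front of the send path. `frame.append` is the message type from the protocol diagram; the payload fields are assumptions.

```typescript
const FRAME_INTERVAL_MS = 1000; // 1 FPS

class FrameThrottle {
  private lastSent = -Infinity;

  // Returns a frame.append message if enough time has passed, else null.
  maybeSend(jpegBase64: string, nowMs: number) {
    if (nowMs - this.lastSent < FRAME_INTERVAL_MS) return null;
    this.lastSent = nowMs;
    return { type: "frame.append" as const, mimeType: "image/jpeg", data: jpegBase64 };
  }
}
```

Dropping frames on the client keeps bandwidth predictable, and the relay's context compression keeps old frames from crowding out the conversation.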

Session resumption

Gemini sessions survive network drops and transparently reconnect. OpenAI sessions rotate with transcript summaries to maintain context.
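A reconnect schedule like the one a client might use when a session drops can be as simple as capped exponential backoff; the constants here are illustrative, not taken from the codebase.

```typescript
// Exponential backoff: 500ms, 1s, 2s, 4s, ... capped at 15s.
function backoffMs(attempt: number, baseMs = 500, capMs = 15000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```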

Conversation history

Local SQLite storage on both mobile and desktop with full transcript search and conversation continuity.
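An illustrative shape for such storage: a messages table paired with an FTS5 virtual table for transcript search, plus a parameterized query so user input is never string-interpolated. The table and column names are assumptions, not the app's actual schema.

```typescript
const SCHEMA = `
CREATE TABLE IF NOT EXISTS messages (
  id INTEGER PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  role TEXT NOT NULL,           -- 'user' or 'assistant'
  content TEXT NOT NULL,
  created_at TEXT DEFAULT (datetime('now'))
);
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
  USING fts5(content, content='messages', content_rowid='id');
`;

// Builds a parameterized full-text search over transcripts.
function searchQuery(): { sql: string; bind: (term: string) => [string] } {
  return {
    sql: `SELECT m.* FROM messages_fts f
          JOIN messages m ON m.id = f.rowid
          WHERE messages_fts MATCH ?
          ORDER BY m.created_at DESC`,
    bind: (term) => [term],
  };
}
```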

Relay Server

TypeScript / Node.js WebSocket relay that translates between clients and AI providers. Handles brain agent calls, session management, and observability via Langfuse.

Read the docs

Desktop App

Electron + React + Tailwind macOS voice assistant with screen sharing capabilities.

Read the docs

Mobile App

React Native / Expo iOS voice assistant with native audio I/O and conversation history.

Read the docs

  1. Clone and install

    git clone https://github.com/yagudaev/voiceclaw.git
    cd voiceclaw
    yarn install

  2. Start the relay server

    cd relay-server
    cp .env.example .env # add your API keys
    yarn dev

  3. Start a client

    cd desktop
    yarn dev
+-----------+       WebSocket        +---------------+       WebSocket        +----------------+
|           | ---session.config----> |               | ---provider setup----> |                |
|  Client   | ---audio.append------> |  Relay Server | ---audio stream------> |  AI Provider   |
| (mobile   | <---audio.delta------- |               | <---model audio------- |  (Gemini /     |
|  or       | <---transcript.delta-- | - protocol    | <---transcription----- |   OpenAI)      |
| desktop)  | ---frame.append------> |   translate   | ---video frames------> |                |
|           | <---tool.call--------- | - brain agent |                        +----------------+
|           | <---tool.progress----- | - tracing     |
+-----------+                        +---------------+

The relay server sits between clients and AI providers. It normalizes the different provider protocols into a single, clean WebSocket API. Clients never talk directly to Gemini or OpenAI — they speak the relay protocol, and the relay handles the translation.
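A minimal client therefore only needs to dispatch on the relay's message types. The types below come from the diagram; the payload fields and handler interface are assumptions.

```typescript
type Inbound =
  | { type: "audio.delta"; audio: string }
  | { type: "transcript.delta"; text: string }
  | { type: "tool.progress"; note: string };

// Route each relay message to the right client-side handler.
function dispatch(
  msg: Inbound,
  out: { play(audio: string): void; show(text: string): void },
) {
  switch (msg.type) {
    case "audio.delta":
      out.play(msg.audio); // queue model audio for playback
      break;
    case "transcript.delta":
      out.show(msg.text); // render the live caption
      break;
    case "tool.progress":
      out.show(`[${msg.note}]`); // surface brain-agent status
      break;
  }
}
```

Because every provider is normalized to this one shape, the same dispatcher serves Gemini and OpenAI sessions alike.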

  • Architecture — detailed system design, audio flow, and session lifecycle
  • Relay Server — WebSocket protocol reference, configuration, and brain agent
  • Desktop App — building from source, screen sharing, and settings
  • Mobile App — Expo build setup and iOS-specific notes
  • Contributing — development setup, code style, and branch strategy