# Connecting Hermes
Hermes by Nous Research is a family of fine-tuned open-source language models known for strong instruction following and tool use. Because Hermes can be served behind any OpenAI-compatible API, it works as a drop-in brain agent for VoiceClaw.
VoiceClaw is a thin voice layer on top of your agent. You run Hermes locally, point the relay at it, and VoiceClaw gives it a voice.
## Prerequisites

- One of: Ollama, vLLM, or any OpenAI-compatible inference server
- A Hermes model downloaded (e.g., `hermes3:8b` for Ollama)
- VoiceClaw relay server cloned and ready
## Run Hermes via an OpenAI-compatible server

**Ollama**

```sh
# Install Ollama if you haven't
brew install ollama

# Pull and run Hermes
ollama pull hermes3:8b
ollama serve
```

Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1`.

**vLLM**

```sh
pip install vllm
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --host 0.0.0.0 \
  --port 8000 \
  --api-key your-secret-key
```

vLLM exposes an OpenAI-compatible API at `http://localhost:8000/v1`.

**LM Studio**

Download Hermes from the LM Studio model browser, load it, and start the local server. It defaults to `http://localhost:1234/v1`.
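All three servers speak the same OpenAI-style chat-completions protocol, which is why the relay can treat them interchangeably. A minimal sketch of the request body such a server expects at `POST <base-url>/chat/completions` (the model name `hermes3:8b` matches the Ollama example; vLLM and LM Studio use their own model identifiers):

```python
import json

# OpenAI-style chat-completions request body; every server above
# accepts this shape at POST <base-url>/chat/completions.
payload = {
    "model": "hermes3:8b",  # Ollama naming; substitute your server's model id
    "messages": [
        {"role": "system", "content": "You are the brain behind a voice assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

body = json.dumps(payload)
print(body)
```

The response comes back in the matching OpenAI format, with the answer under `choices[0].message.content`.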
## Configure the VoiceClaw relay server

In your relay server's `.env` file, point at your Hermes instance:

**Ollama**

```sh
BRAIN_GATEWAY_URL=http://localhost:11434/v1
BRAIN_GATEWAY_AUTH_TOKEN=
```

**vLLM**

```sh
BRAIN_GATEWAY_URL=http://localhost:8000/v1
BRAIN_GATEWAY_AUTH_TOKEN=your-secret-key
```

**LM Studio**

```sh
BRAIN_GATEWAY_URL=http://localhost:1234/v1
BRAIN_GATEWAY_AUTH_TOKEN=
```
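At startup the relay turns these two variables into a base URL and an auth header for every request it forwards to Hermes. A sketch of that contract (the helper name `brain_gateway_config` is ours for illustration, not taken from the relay's source):

```python
def brain_gateway_config(env):
    """Hypothetical helper: build the base URL and headers for the brain gateway.

    Illustrates the .env contract above, not VoiceClaw's actual code.
    """
    url = env.get("BRAIN_GATEWAY_URL", "").rstrip("/")
    headers = {"Content-Type": "application/json"}
    token = env.get("BRAIN_GATEWAY_AUTH_TOKEN", "")
    if token:  # Ollama and LM Studio run without auth; vLLM checks the key
        headers["Authorization"] = f"Bearer {token}"
    return url, headers

url, headers = brain_gateway_config({
    "BRAIN_GATEWAY_URL": "http://localhost:8000/v1",
    "BRAIN_GATEWAY_AUTH_TOKEN": "your-secret-key",
})
print(url)
print(headers)
```

Leaving `BRAIN_GATEWAY_AUTH_TOKEN` empty simply omits the `Authorization` header, which is what the Ollama and LM Studio configurations above rely on.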
## Start the relay server

```sh
cd relay-server
yarn dev
```
## Connect and test

Open the desktop or mobile app with `brainAgent` set to `"enabled"`. Ask a question like “What is the capital of France?”: the voice model will call `ask_brain`, the relay will query Hermes, and you will hear the answer.
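The round trip, then, is: voice model calls `ask_brain`, the relay forwards the question to Hermes, and the plain-text answer is spoken back. A toy sketch of that flow with the Hermes call stubbed out (function names hypothetical; the real relay handles this internally):

```python
def ask_brain(question, query_hermes):
    # The relay forwards the question to Hermes and hands the plain-text
    # answer back to the voice model to be spoken aloud.
    return query_hermes(question).strip()

# Stub standing in for a live Hermes server:
def fake_hermes(question):
    return "Paris. "

print(ask_brain("What is the capital of France?", fake_hermes))
```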
## System prompt for Hermes

Since Hermes runs locally without built-in tools like web search or calendar, adapt the system prompt to match what your setup can do. Paste this into the model’s system instructions:
```
You are the brain behind a voice assistant called VoiceClaw.

You receive questions via the ask_brain tool when the realtime voice model
needs help -- factual questions, analysis, reasoning, or knowledge retrieval.

Guidelines:
- Respond concisely. Your answers will be spoken aloud, not read on a screen.
- Keep responses to 2-3 sentences when possible. The user is listening.
- Skip formatting (no markdown, no bullet lists, no headers). Plain text only.
- Lead with the answer, then add context if needed.
- If you do not know something, say so briefly rather than guessing.
- You are running locally as a Hermes model. Be direct and helpful.
```

## Choosing a model size

| Model | VRAM | Best for |
|---|---|---|
| `hermes3:8b` | ~6 GB | Fast responses, lighter hardware |
| `hermes3:70b` | ~40 GB | Deeper reasoning, better tool use |
For a voice assistant where latency matters, the 8B model is a good starting point. Upgrade to 70B if you need more nuanced answers and have the hardware.
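The VRAM figures in the table are roughly parameter count times bytes per weight, plus runtime overhead for the KV cache and buffers. A back-of-the-envelope check (the ~4.5 bits per weight and 1 GB overhead are our assumptions, reflecting the 4-bit quantization Ollama commonly ships, not figures from the table):

```python
def approx_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.0):
    # Weight memory (params * bits / 8, in GB) plus a rough allowance
    # for KV cache and runtime buffers.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

print(f"8B:  ~{approx_vram_gb(8):.1f} GB")   # in the ballpark of the ~6 GB above
print(f"70B: ~{approx_vram_gb(70):.1f} GB")  # in the ballpark of the ~40 GB above
```

Running an unquantized model (16 bits per weight) roughly quadruples these numbers, which is why quantized builds are the practical default for local voice assistants.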