Voicebox

Voice AI Agents · Free · Open Source · Technology, Media, Education, Entertainment, Productivity

Open-source local AI voice studio for cloning voices, generating speech, dictation, and giving MCP-aware agents custom voices.

🛡️ AgentReady threat assessment

MAESTRO 7-layer threat model + OWASP AIVSS risk score for Voicebox, derived from its capabilities.

AIVSS 6.7 · Medium

These scores are auto-generated from public information (the agent's own listing, docs, and repository) using the canonical OWASP AIVSS formula and the MAESTRO framework. They are an estimate for guidance, not a penetration test, audit, or certification.

Overview

Voicebox is a free, open-source, local-first AI voice studio from the jamiepine/voicebox GitHub project. It is described as an alternative to ElevenLabs and WisprFlow in one app, combining voice output and voice input workflows. Voicebox can clone a voice from a few seconds of audio, generate speech in 23 languages across seven TTS engines, provide global-hotkey dictation into text fields, and let MCP-aware AI agents speak using voices the user owns. The project emphasizes local execution and privacy: models run on the user's machine, and voice data and captures stay there.

Key features

Voice cloning
Text-to-speech (TTS)
Speech-to-text
Dictation
Local-first
MCP
Open source
Multilingual
Privacy

Use cases

Cloning a voice from a short reference audio sample
Generating multilingual text-to-speech locally
Dictating into text fields with a global hotkey
Giving MCP-aware AI agents a selected voice
Creating speech with preset voices
Running private voice workflows on a local machine