
Voicebox
Open-source local AI voice studio for cloning voices, generating speech, dictation, and giving MCP-aware agents custom voices.
🛡️ AgentReady threat assessment
MAESTRO 7-layer threat model + OWASP AIVSS risk score for Voicebox, derived from its capabilities.
These scores are auto-generated from public information (the agent's own listing, docs, and repository) using the canonical OWASP AIVSS formula and the MAESTRO framework. They are an estimate for guidance, not a penetration test, audit, or certification.
Overview
Voicebox is a free, open-source, local-first AI voice studio from the jamiepine/voicebox GitHub project. It is described as an alternative to ElevenLabs and WisprFlow in one app, combining voice output and voice input workflows. Voicebox can clone a voice from a few seconds of audio, generate speech in 23 languages across seven TTS engines, provide global-hotkey dictation into text fields, and let MCP-aware AI agents speak using voices the user owns. The project emphasizes local execution and privacy: models run on the user's machine, and voice data and captures are kept there as well.
Key features
- Voice cloning
- Text-to-speech (TTS)
- Speech-to-text
- Dictation
- Local-first
- MCP
- Open source
- Multilingual
- Privacy
Use cases
- Cloning a voice from a short reference audio sample
- Generating multilingual text-to-speech locally
- Dictating into text fields with a global hotkey
- Giving MCP-aware AI agents a selected voice
- Creating speech with preset voices
- Running private voice workflows on a local machine