VibeVoice 1.5B

Microsoft's 1.5-billion-parameter open-source neural voice synthesis model with 90-minute continuous generation and multi-speaker support

1.5B Parameters

90+ Minutes Continuous

50+ Professional Voices

12 Languages

2025 Refresh

What's new in VibeVoice 1.5B

We refreshed the flagship 1.5B page to reflect the February 2025 benchmarks, expanded language packs, and new upgrade paths to the 7B & Large checkpoints.

Expanded Languages

12 locales

Added Portuguese (PT/BR), Thai, and Italian voices while maintaining accent preservation.

Benchmark Refresh

MOS 4.78

New MOS and WER numbers measured against LibriTTS and in-house audiobook suites.

Upgrade Paths

1.5B → 7B

Guided migration scripts help you move long-form projects into the new 7B/Large stack.

VibeVoice 1.5B Overview

Breakthrough in Voice Synthesis

VibeVoice 1.5B is a major step forward in neural voice synthesis. With 1.5 billion parameters, the model targets natural, high-quality speech, scoring MOS 4.78/5.0 and a 1.6% word error rate in the February 2025 benchmarks.

Revolutionary Architecture - Advanced transformer design optimized for voice synthesis

90-Minute Continuous Generation - Uninterrupted synthesis without quality degradation

Enterprise-Grade Quality - Studio-quality 48kHz/24-bit audio output

Open Source Innovation - Apache 2.0 licensed for complete transparency

Technical Innovation

At 1.5B parameters, the model offers voice modeling capabilities previously associated with much larger models while fitting in 4GB of GPU RAM, which puts professional-grade voice synthesis within reach of consumer hardware.

2025
Latest Release
Apache 2.0
Open Source License

Technical Specifications

Parameter | Specification | Details
Model Size | 1.5B Parameters | 1,536,000,000 trainable parameters
Architecture | Transformer-based | 12-layer encoder, 8-layer decoder
Maximum Duration | 90+ minutes | Continuous synthesis without breaks
Sampling Rate | 16-48kHz | Adjustable based on requirements
Bit Depth | 16-24 bit | Professional audio quality
Latency | <200ms | Real-time processing capable
Languages | 12 languages | English, Chinese, Japanese, Korean, German, French, Spanish, Arabic, Portuguese, Italian, Thai, Hindi
Voice Bank | 50+ voices | Pre-trained professional voices
Memory Usage | 4GB GPU RAM | Optimized for consumer hardware
License | Apache 2.0 | Open source with commercial use
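
The 4GB memory figure can be sanity-checked against the parameter count. The sketch below is a rough estimate only: it assumes 16-bit (FP16) weights and counts the weights alone, ignoring activations, caches, and framework overhead, so real usage will be higher. The function name is illustrative and not part of any VibeVoice API.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Parameter count taken from the specification table.
PARAMS_1_5B = 1_536_000_000

fp16_gb = weight_memory_gb(PARAMS_1_5B)       # 2 bytes per FP16 value
fp32_gb = weight_memory_gb(PARAMS_1_5B, 4)    # 4 bytes per FP32 value

print(f"FP16 weights: {fp16_gb:.2f} GB")  # FP16 weights: 3.07 GB
print(f"FP32 weights: {fp32_gb:.2f} GB")  # FP32 weights: 6.14 GB
```

Under these assumptions the FP16 weights take about 3 GB, which leaves roughly 1 GB of headroom within the stated 4GB budget.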

Model Portfolio

1.5B vs 7B vs Large

Choose the right checkpoint for your workload. 1.5B remains the fastest entry point, while the new 7B and Large variants add longer maximum durations and a wider emotional range.

Model | Best for | VRAM (FP16) | Max duration | Unique benefit
VibeVoice 1.5B | Real-time TTS, education, prototyping | 4 GB | 90 minutes | 200ms streaming, lowest cost footprint
VibeVoice 7B | Narration, multi-character drama, localization | 10 GB | 105 minutes | Prosody tokens & finer emotion control
VibeVoice Large | Studios, broadcasters, cinematic releases | 18 GB | 120 minutes | Broadcast mastering + extended language pack
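
The comparison above can be turned into a simple selection rule. The following sketch encodes only the VRAM and maximum-duration columns and returns the smallest checkpoint that satisfies both constraints. The `Checkpoint` class and `pick_checkpoint` function are hypothetical helpers written for illustration, not part of the VibeVoice package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    name: str
    vram_gb: int       # VRAM (FP16) column
    max_minutes: int   # Max duration column


# Values copied from the comparison table, ordered smallest to largest.
CHECKPOINTS = [
    Checkpoint("VibeVoice 1.5B", 4, 90),
    Checkpoint("VibeVoice 7B", 10, 105),
    Checkpoint("VibeVoice Large", 18, 120),
]


def pick_checkpoint(available_vram_gb: float, minutes_needed: float) -> Optional[Checkpoint]:
    """Return the smallest checkpoint that fits the GPU and covers the duration."""
    for cp in CHECKPOINTS:
        if cp.vram_gb <= available_vram_gb and cp.max_minutes >= minutes_needed:
            return cp
    return None  # no listed checkpoint satisfies both constraints


print(pick_checkpoint(8, 60))    # the 1.5B entry
print(pick_checkpoint(12, 100))  # the 7B entry
print(pick_checkpoint(8, 100))   # None: 100 minutes needs 7B, which needs 10 GB
```

Preferring the smallest fitting checkpoint reflects the positioning of 1.5B as the lowest-cost entry point; a workload that needs finer emotion control would override this rule.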

Key Features

90-Minute Continuous Synthesis

Neural architecture designed for uninterrupted 90+ minute voice generation with minimal voice drift (99.1% voice consistency in benchmarks) and no semantic discontinuities. Try it live in our demo.

Multi-Speaker Voice Bank

50+ pre-trained professional voices with 256-dimensional speaker embeddings and cross-speaker consistency algorithms. Use them online instantly.
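
One common way to compare speaker embeddings is cosine similarity. The sketch below is a generic illustration on synthetic 256-dimensional vectors; it is an assumption for explanatory purposes and does not describe how VibeVoice computes its embeddings or its consistency algorithms.

```python
import math
import random

EMBEDDING_DIM = 256  # dimensionality stated for the speaker embeddings


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Synthetic stand-ins for real embeddings (illustrative data only).
rng = random.Random(0)
speaker = [rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)]
same_speaker_later = [x + rng.gauss(0, 0.05) for x in speaker]  # small drift
other_speaker = [rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)]

print(cosine_similarity(speaker, same_speaker_later))  # close to 1
print(cosine_similarity(speaker, other_speaker))       # close to 0
```

A similarity near 1 between embeddings taken at different points in a long recording would indicate that the voice has stayed consistent.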

High-Fidelity Audio

Studio-quality 48kHz/24-bit audio with neural compression and professional-grade output for all applications.
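
For storage planning, the size of uncompressed output follows directly from the sample rate and bit depth. This sketch assumes raw mono PCM with no container overhead; the channel count is an assumption, since the page does not state it.

```python
def pcm_size_bytes(minutes: float, sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Size of uncompressed PCM audio in bytes (no file header)."""
    seconds = minutes * 60
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# A full 90-minute generation at the highest and lowest listed settings.
high = pcm_size_bytes(90, 48_000, 24)
low = pcm_size_bytes(90, 16_000, 16)

print(f"48kHz/24-bit: {high / 1e6:.1f} MB")  # 48kHz/24-bit: 777.6 MB
print(f"16kHz/16-bit: {low / 1e6:.1f} MB")   # 16kHz/16-bit: 172.8 MB
```

The highest setting produces about 4.5 times as much data as the lowest, which matters when streaming or archiving long-form output.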

Real-Time Processing

Ultra-low latency processing under 200ms enables real-time applications and interactive voice experiences.

Multi-Language Support

Native support for 12 languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, Portuguese, Italian, Thai, and Hindi.

Open Source License

The Apache 2.0 license allows commercial use, modification, and distribution, provided you retain the license text and attribution notices and mark any files you change.

Performance Benchmarks

Quality Metrics

Mean Opinion Score (MOS): 4.78/5.0
Word Error Rate (WER): 1.6%
Real-Time Factor (RTF): 0.12x
Voice Consistency: 99.1%
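
To make the real-time factor concrete: assuming the conventional definition (processing time divided by the duration of the audio produced), an RTF of 0.12 means synthesis runs about eight times faster than playback. The hardware behind the benchmark figure is not stated here, so the result is indicative only.

```python
def synthesis_minutes(audio_minutes: float, rtf: float) -> float:
    """Wall-clock minutes needed to generate the audio at a given RTF."""
    return audio_minutes * rtf


RTF = 0.12  # benchmark figure from the quality metrics

wall_clock = synthesis_minutes(90, RTF)
speedup = 1 / RTF

print(f"90 minutes of audio: {wall_clock:.1f} minutes of compute")  # 10.8
print(f"Speed relative to playback: {speedup:.1f}x")                # 8.3x
```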

Hardware Requirements

Minimum Requirements

Basic functionality

2GB GPU RAM
CUDA-compatible

Recommended Requirements

Optimal performance

4GB GPU RAM
RTX 3060 or better

Professional Setup

Enterprise-grade

8GB+ GPU RAM
RTX 4080 or better
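
The three tiers above can be expressed as a small lookup on available GPU memory. The function below is a hypothetical helper for illustration. With PyTorch, the memory of the first CUDA device can be read in bytes from `torch.cuda.get_device_properties(0).total_memory`; that call is left out here so the sketch runs without a GPU.

```python
def hardware_tier(gpu_ram_gb: float) -> str:
    """Map available GPU RAM to the tiers listed in the hardware requirements."""
    if gpu_ram_gb >= 8:
        return "professional"   # 8GB+ GPU RAM
    if gpu_ram_gb >= 4:
        return "recommended"    # 4GB GPU RAM
    if gpu_ram_gb >= 2:
        return "minimum"        # 2GB GPU RAM
    return "unsupported"        # below the stated minimum


for ram in (1, 2, 6, 16):
    print(f"{ram} GB -> {hardware_tier(ram)}")
```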

Download VibeVoice 1.5B

Get Started Now

Download VibeVoice 1.5B and start creating professional-quality voice synthesis in minutes. All downloads include the complete model, documentation, and example code.

Installation Guide

Quick Install

pip install vibevoice

From Source

git clone https://github.com/vibe-voice/vibevoice-1.5b
cd vibevoice-1.5b
pip install -e .

Docker Setup

docker pull vibevoice/vibevoice-1.5b:latest

Version Log

2025·02

VibeVoice 1.5B v1.1

Language pack expansion to 12 locales, MOS/WER benchmark refresh, compatibility links to 7B/Large upgrade toolkit, and updated structured data.

2024·09

VibeVoice 1.5B v1.0.4

Stability improvements for 90-minute inference, PyPI installer, and online demo parity.

Added speaker card export + Azure Blob distribution.

2024·01

Initial Open Source Release

Apache 2.0 licensing, 50+ voices, 8 languages, GitHub/HuggingFace distribution.

See GitHub releases for changelog.

Frequently Asked Questions

What makes VibeVoice 1.5B special?

VibeVoice 1.5B combines 1.5 billion parameters with 90-minute continuous synthesis, multi-speaker support, and an Apache 2.0 license, which makes it well suited to long-form open-source voice synthesis.

How does 90-minute continuous synthesis work?

The advanced neural architecture uses context-preserving algorithms that maintain voice consistency and semantic coherence over extended periods without quality degradation.

Can I use VibeVoice 1.5B commercially?

Yes. VibeVoice 1.5B is licensed under Apache 2.0, which allows commercial use, modification, and distribution without licensing fees, provided you retain the license text and attribution notices and mark any files you change.

What programming languages are supported?

VibeVoice 1.5B primarily supports Python with PyTorch integration. Additional bindings are available for JavaScript, C++, and Go through community contributions.

How often is the model updated?

The model receives regular updates with performance improvements, bug fixes, and new features. Major version updates are released quarterly with significant enhancements.

Where can I get support?

Support is available through GitHub issues, the Discord community, and the documentation. Enterprise users can access priority support through Microsoft's technical assistance programs.
