Model Comparison for English Speech Synthesis
This page contains some of the most representative synthesised speech samples for each model. Code for sample generation and data processing is available at github.com/krsaulitis/tts-survey-2023
Quality (NISQA naturalness) metric samples
Good sample average score: 3.94; bad sample average score: 2.49.
| Model | Good sample | Bad sample |
|---|---|---|
| | "The tip of the leaf is rounded" | "They were very difficult moments in my life" |
| CoMoSpeech | | |
| MQTTS | | |
| OverFlow | | |
| YourTTS | | |
| VITS | | |
| Grad-TTS | | |
| FastSpeech 2 | | |
| Glow-TTS | | |
| MaryTTS | | |
| Common Voice | | |
Precision (CER) metric samples
Two samples with some of the worst average CER scores.
| Model | Sample Nr. 26 | Sample Nr. 32259925 |
|---|---|---|
| | "or a man into the wind." | "Unknown error." |
| CoMoSpeech | | |
| MQTTS | | |
| OverFlow | | |
| YourTTS | | |
| VITS | | |
| Grad-TTS | | |
| FastSpeech 2 | | |
| Glow-TTS | | |
| MaryTTS | | |
| Common Voice | | |
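
For readers unfamiliar with the CER metric, the sketch below shows how a character error rate is commonly computed: the character-level edit distance between the reference text and a transcript of the synthesised audio, divided by the reference length. This is a minimal illustration and not necessarily the survey's implementation. The function names `edit_distance`, `normalise`, and `cer` are hypothetical, and the lowercasing and whitespace normalisation is an assumption, since the page does not state how text was normalised or which speech recogniser produced the transcripts.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalise(text: str) -> str:
    """Assumed normalisation: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a transcript against its reference text."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # The second argument is an invented transcript, used only for illustration.
    print(cer("Unknown error.", "Unknown era."))
```

Because insertions count as errors, a CER computed this way can exceed 1.0 when the transcript is much longer than the reference.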