Model Comparison for English Speech Synthesis

This page contains some of the most representative synthesised speech samples for each model. Code for sample generation and data processing is available at github.com/krsaulitis/tts-survey-2023.

Quality (NISQA naturalness) metric samples

Good sample average score: 3.94; bad sample average score: 2.49.

Good sample text: "The tip of the leaf is rounded"
Bad sample text: "They were very difficult moments in my life"
Models (one good and one bad sample each):
CoMoSpeech
MQTTS
OverFlow
YourTTS
VITS
Grad-TTS
FastSpeech 2
Glow-TTS
MaryTTS
Common Voice

Precision (CER) metric samples

Two samples with some of the worst average CER scores.

Sample Nr. 26 text: "or a man into the wind."
Sample Nr. 32259925 text: "Unknown error."
Models (one rendering of each sample):
CoMoSpeech
MQTTS
OverFlow
YourTTS
VITS
Grad-TTS
FastSpeech 2
Glow-TTS
MaryTTS
Common Voice
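
For readers unfamiliar with the precision metric, the following is a minimal sketch of how a character error rate (CER) is commonly computed: the character-level edit distance between the reference text and a transcript of the synthesised audio, divided by the reference length. It assumes this standard definition; the ASR model and any text normalisation used for the scores on this page are not stated here, and the function names and the example transcript are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical transcript with one inserted character (assumption, not page data).
score = cer("Unknown error.", "Unknown terror.")
print(f"CER = {score:.3f}")
```

A lower CER means the synthesised speech was transcribed closer to the input text. Because each error is divided by the reference length, a single misrecognised character weighs far more in a short prompt than in a long one.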