Model Comparison for English Speech Synthesis

This page presents representative synthesised speech samples for each of the models. The code for sample generation and data processing is available at github.com/krsaulitis/tts-survey-2023.

Quality (NISQA naturalness) metric samples

Good sample average score: 3.94; bad sample average score: 2.49.

Model	Good sample	Bad sample
	"The tip of the leaf is rounded"	"They were very difficult moments in my life"
CoMoSpeech
MQTTS
OverFlow
YourTTS
VITS
Grad-TTS
FastSpeech 2
Glow-TTS
MaryTTS
Common Voice