Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs
Text-to-speech systems are typically evaluated on single sentences. When long-form content, such as data consisting of full paragraphs or dialogues, is considered, evaluating sentences in isolation is not always appropriate, because the context in which the sentences are synthesized is missing. In this paper, we investigate three different ways of evaluating the …