DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon
Abstract
Finding word boundaries in continuous speech is challenging, as there is little or no equivalent of a ‘space’ delimiter between words. Popular Bayesian non-parametric models for text segmentation (Goldwater et al., 2006, 2009) use a Dirichlet process to jointly segment sentences and build a lexicon of word types. We …
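The sketch below shows how a Dirichlet process lets segmentation and lexicon building interact. It scores a candidate segmentation of text under a DP unigram model in the style of Goldwater et al. Each word is drawn in turn with predictive probability (n_w + α·P0(w)) / (n + α), where n_w counts earlier tokens of that word type and n counts all earlier tokens. Words that are already in the lexicon therefore get cheaper to reuse. This is a minimal illustration, not the DP-Parse method. The base distribution `base_prob` is assumed here: it uses a geometric length prior and uniform symbols. The parameter values `alpha`, `alphabet_size` and `p_stop` are hypothetical. The sketch also ignores utterance boundaries and operates on characters rather than speech.

```python
import math
from collections import Counter


def base_prob(word: str, alphabet_size: int, p_stop: float) -> float:
    """Hypothetical base distribution P0(w): geometric word length times
    uniformly distributed symbols (an illustrative assumption)."""
    return p_stop * (1 - p_stop) ** (len(word) - 1) * (1 / alphabet_size) ** len(word)


def segmentation_log_prob(words, alpha=20.0, alphabet_size=26, p_stop=0.5):
    """Log probability of a word sequence under a DP unigram model,
    generating words one at a time (Chinese restaurant process view)."""
    counts = Counter()  # n_w: how often each word type has been used so far
    n = 0               # total number of word tokens generated so far
    logp = 0.0
    for w in words:
        # Reuse an existing lexicon entry (n_w) or draw a new one from P0.
        p = (counts[w] + alpha * base_prob(w, alphabet_size, p_stop)) / (n + alpha)
        logp += math.log(p)
        counts[w] += 1
        n += 1
    return logp


# Reusing a two-letter word scores higher than splitting it into letters.
print(segmentation_log_prob(["ab", "ab"]) > segmentation_log_prob(["a", "b", "a", "b"]))
```

Under these assumptions, the comparison prints `True`. The second occurrence of "ab" is paid for mostly through its count in the lexicon rather than through P0 again, so the model favors segmentations that reuse a compact set of word types.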