Transitional probabilities are not used for word-segmentation

Saksida, A., Langus, A., Nespor, M. & Mehler, J.

International School for Advanced Studies, Trieste, Italy

Statistical learning is a key mechanism in language acquisition, most prominently in research on how infants segment words from fluent speech. Statistical regularities, such as transitional probabilities (TPs) between adjacent syllables, tend to be present across languages, and infants may compute them. However, it is unclear whether these regularities are universally salient enough to support word extraction, and where the limits of such associative computation lie. We first present a study of child-directed corpora from 8 languages, showing that the most widely used models based on adjacent TPs do not extract a significant number of correct word candidates in all of the languages. Statistical regularities may therefore not be universally salient. Moreover, if humans do track adjacent probabilities, the number of statistically highly probable but unheard syllable sequences that can be extracted from each corpus vastly exceeds the number of real words in the same corpus. Such false, phantom-like candidates would seriously impede learning. Second, we present 4 experiments with 6-month-old infants using the head-turn preference procedure. We familiarized infants with a continuous stream of artificial trisyllabic words in which the only cues to segmentation were TP drops at word boundaries. In the test phases of Experiments 1 and 2, infants heard words versus part-words, and words versus non-words, respectively. In both, they looked significantly longer at the words. In Experiments 3 and 4, they heard statistically probable but unheard phantom-words versus part-words, and phantom-words versus non-words. Phantom-words received significantly less attention than part-words, but more than non-words. Thus, even if infants compute relative frequencies, they rely more on positive evidence in the input than primarily on probabilities.
We conclude that TPs are not universally salient, can produce misleading word candidates, and are not the primary source of information for infants.
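The following Python sketch illustrates why a learner that relies only on adjacent TPs would treat phantom-words like words. It is a hypothetical illustration and not the authors' corpus models or stimuli. Several parts are assumptions. The forward TP is estimated as TP(x→y) = freq(xy) / freq(x followed by any syllable). Boundaries are placed at local TP minima, which is one common rule among TP-based segmenters. Candidates are scored by their mean internal TP. The syllables, words, and word order are invented. The phantom "pa bi la" combines the first pair of "pa bi ku" with the last pair of "go bi la", so it never occurs in the stream.

```python
from collections import Counter

# Hypothetical familiarization words (NOT the study's actual stimuli).
# "pa bi ku" and "go bi la" share their middle syllable, so the unheard
# sequence "pa bi la" is built entirely from attested adjacent pairs.
WORDS = [("pa", "bi", "ku"), ("go", "bi", "la"), ("ti", "bu", "do"), ("ne", "mo", "fe")]
# Hypothetical word order: every word follows every other word once per cycle,
# never itself, so TPs drop at word boundaries.
ORDER = [0, 1, 2, 3, 0, 2, 1, 3, 2, 0, 3, 1]


def make_stream(cycles):
    """Concatenate the words into one continuous syllable stream (no pauses)."""
    return [syl for _ in range(cycles) for w in ORDER for syl in WORDS[w]]


def transitional_probabilities(stream):
    """Forward TP(x -> y) = count(xy) / count(x followed by any syllable)."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}


def segment_at_tp_minima(stream, tps):
    """Place a boundary wherever a TP is lower than both neighbouring TPs."""
    tp_seq = [tps[pair] for pair in zip(stream, stream[1:])]
    chunks, start = [], 0
    for i in range(1, len(tp_seq) - 1):
        if tp_seq[i] < tp_seq[i - 1] and tp_seq[i] < tp_seq[i + 1]:
            chunks.append(tuple(stream[start:i + 1]))  # boundary after syllable i
            start = i + 1
    chunks.append(tuple(stream[start:]))
    return Counter(chunks)


def mean_tp(candidate, tps):
    """Average TP over a candidate's internal transitions (0.0 if a pair is unseen)."""
    values = [tps.get(pair, 0.0) for pair in zip(candidate, candidate[1:])]
    return sum(values) / len(values)


def occurs_in(candidate, stream):
    """True if the candidate appears contiguously anywhere in the stream."""
    n = len(candidate)
    return any(tuple(stream[i:i + n]) == candidate for i in range(len(stream) - n + 1))


if __name__ == "__main__":
    stream = make_stream(cycles=5)
    tps = transitional_probabilities(stream)
    print(sorted(segment_at_tp_minima(stream, tps)))  # the four words, recovered
    for cand in [("pa", "bi", "ku"), ("pa", "bi", "la"), ("bi", "ku", "ti")]:
        # word: 0.75 True / phantom: 0.75 False / part-word: 0.417 True
        print(cand, round(mean_tp(cand, tps), 3), occurs_in(cand, stream))
```

In this toy stream the TP-minimum rule recovers the four words. However, the phantom-word receives exactly the same mean TP as a real word, and it scores higher than a part-word that the learner did hear. A purely TP-driven learner would therefore rank the phantom with the words. This is the prediction the infant data in Experiments 3 and 4 argue against.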