Statistical learning in non-Chinese speakers exposed to a sequence of words without spaces

Wang, T., Liu, Y., Chen, J., & Li, A.

National Taiwan Normal University, Taiwan

A printed Chinese sentence is a string of characters with no extra spaces between words. We hypothesize that automatic word segmentation in Chinese readers develops through a statistical learning mechanism that is universal to all language learners. We tested this hypothesis by exposing 20 non-Chinese speakers to a sequence of 3600 characters constructed from six two-character (disyllabic) words built from six different characters. Each word was repeated 300 times, and the words were concatenated in random order with the constraint that no word immediately followed itself. The transitional probabilities between adjacent characters in the sequence ranged from .46 to 1 within words and from 0 to .29 between words. The sequence was presented at a rate of one character every half second, from left to right and from top to bottom, over 36 screen pages. Each character disappeared when the next one appeared. Occasionally, with a probability of 0 to .03, the presentation rate doubled. The participants' task was to follow the characters as they appeared and to detect each instance of doubled presentation rate. At the end of this disguised vigilance task, the participants were given a surprise test consisting of words and nonwords (reversals of the two characters in a word) from the sequence. On each trial, a word and a nonword were shown one above the other, in random order. The participants pressed the up arrow key if they thought the top character string had appeared before, and the down arrow key if they thought the bottom one had. The mean accuracy rate was .53, slightly but significantly above chance (.50). This suggests that automatic word segmentation in reading Chinese text can be achieved via a universal statistical learning mechanism that is also available to non-Chinese speakers.
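
The construction of the character stream and the transitional probabilities described above can be illustrated with a minimal sketch. The abstract does not list the actual stimuli, so the six placeholder "characters" (A to F) and the six two-character "words" below are hypothetical, as are the function names and the random seed. The sampling scheme (weighted draws with a restart on dead ends) is one plausible way to satisfy the no-immediate-repetition constraint, not necessarily the procedure used in the study, and the resulting probabilities are not expected to reproduce the reported ranges.

```python
import random
from collections import Counter

# Hypothetical stand-ins for the stimuli: six placeholder characters (A-F)
# forming six two-character words. The real items are not given in the abstract.
WORDS = ["AB", "CD", "EF", "BC", "DE", "FA"]
REPEATS = 300  # each word appears 300 times -> 6 * 300 * 2 = 3600 characters


def build_sequence(words, repeats, rng):
    """Return a random word order in which each word occurs `repeats` times
    and no word immediately follows itself. Restarts if a draw dead-ends."""
    while True:
        remaining = {w: repeats for w in words}
        order, prev = [], None
        for _ in range(repeats * len(words)):
            candidates = [w for w in words if w != prev and remaining[w] > 0]
            if not candidates:
                break  # only the previous word is left; start over
            weights = [remaining[w] for w in candidates]
            prev = rng.choices(candidates, weights=weights)[0]
            remaining[prev] -= 1
            order.append(prev)
        else:
            return order  # loop finished without a dead end


def transitional_probabilities(chars):
    """TP(x -> y) = count(x immediately followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(chars, chars[1:]))
    first_counts = Counter(chars[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


rng = random.Random(0)  # arbitrary seed, for reproducibility of the sketch only
order = build_sequence(WORDS, REPEATS, rng)
stream = "".join(order)
tp = transitional_probabilities(stream)

# TPs for character pairs that form a word vs. pairs seen only across word boundaries.
within = {w: tp[(w[0], w[1])] for w in WORDS}
between = {pair: p for pair, p in tp.items() if "".join(pair) not in WORDS}

print("within-word TP range: ", min(within.values()), max(within.values()))
print("between-word TP range:", min(between.values()), max(between.values()))
```

Because each placeholder character here serves as the first character of one word and the second character of another, within-word transitional probabilities fall below 1, which mirrors the situation in the abstract where six characters are shared among six words. A different assignment of characters to words would yield different ranges.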