Quality not quantity in caregiver speech: Using a (statistical) computational model to isolate the effects of lexical diversity of input and amount of input

Jones, G. 1 & Rowland, C. 2

1 Nottingham Trent University
2 University of Liverpool

Children who hear large amounts of speech, and hear very diverse speech, learn language more quickly than children who do not. However, high correlations between exposure and diversity in speech samples confound previous research, making it difficult to isolate the influence of each. We overcame this problem by controlling the input to a (statistical) computational model of language learning so that amount of exposure to linguistic input (quantity) and the lexical diversity of that input (quality) were independently manipulated. Sublexical, lexical, and multi-word knowledge were charted across development (study 1), showing that while quantity may be important early in learning, quality is ultimately more crucial, a prediction confirmed against children's data (study 2). The model trained on a lexically diverse input also performed better on nonword repetition and sentence recall tests (study 3) and was quicker to learn new words over time (study 4). A language input that is rich in lexical diversity outperforms an input that is equivalently rich in amount of exposure for learned sublexical and lexical knowledge, for well-established language tests, and for acquiring words that have never been encountered before.