Predicting semantic constraints in implicit language learning with distributional semantics

Alikaniotis, D. & Williams, J. N.

Department of Theoretical and Applied Linguistics, University of Cambridge

In distributional semantics, words acquire meaning by exploiting the statistical information inherent in their linguistic environment. A common criticism of such semantic representations is that they lack rich conceptual structure, which calls into question the cognitive relevance of such statistical mechanisms during language learning.

Here we show that distributional semantic models provide a good fit to data obtained from implicit language learning experiments on adults (e.g. Leung & Williams, 2014). In these experiments, participants are introduced to novel non-words that co-occur with already known words, with the co-occurrence conditioned on underlying semantic regularities such as concrete/abstract or animate/inanimate. Participants can implicitly learn such underlying semantic regularities, although whether they do so depends upon the nature of the conceptual distinction involved and their first language.

In the present study, we trained two vector-space models on the distributional statistics of an English and a Chinese corpus. We used the models' resulting semantic representations as input to a feed-forward neural network, which predicted the novel non-words, thereby discovering the relevant elements of the input representation. Using datasets from four behavioural experiments that employed different semantic manipulations, we obtained generalisation gradients that closely matched those of humans, capturing the effects of various conceptual distinctions and cross-linguistic differences.
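To make the modelling setup concrete, the sketch below illustrates the general idea of feeding distributional word vectors into a small feed-forward classifier that predicts which novel non-word a known word co-occurred with. It is a minimal illustration, not the authors' implementation: the random stand-in vectors, the word list, the concrete/abstract labels, and all dimensions and hyperparameters are assumptions chosen only for demonstration.

```python
# Minimal sketch (NOT the authors' implementation) of the abstract's setup:
# distributional word vectors -> feed-forward network -> predicted non-word.
# All words, labels, vectors, and hyperparameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for corpus-derived word vectors (in the study these would come
# from a vector-space model trained on an English or Chinese corpus).
EMBED_DIM = 50
vocab = ["dog", "table", "idea", "justice", "cat", "stone", "river", "freedom"]
embeddings = {w: rng.normal(size=EMBED_DIM) for w in vocab}

# Hypothetical training pairs: each known word co-occurs with one of two
# novel non-words (label 0 or 1), conditioned on a semantic regularity
# (here, concrete vs. abstract).
pairs = [("dog", 0), ("table", 0), ("cat", 0), ("stone", 0),
         ("idea", 1), ("justice", 1)]
X = np.stack([embeddings[w] for w, _ in pairs])
y = np.array([label for _, label in pairs])

# One-hidden-layer feed-forward network.
HIDDEN = 20
W1 = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, 2))

def forward(x):
    h = np.tanh(x @ W1)                                  # hidden layer
    logits = h @ W2
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return h, p / p.sum(axis=-1, keepdims=True)          # softmax over non-words

# Plain gradient descent on the cross-entropy loss.
lr = 0.1
for _ in range(500):
    h, p = forward(X)
    grad_logits = p.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0             # softmax-CE gradient
    grad_W2 = h.T @ grad_logits / len(y)
    grad_h = (grad_logits @ W2.T) * (1.0 - h ** 2)       # tanh derivative
    grad_W1 = X.T @ grad_h / len(y)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

# Generalisation test: which non-word does the network prefer for words it
# was never trained on? (Purely illustrative with random vectors.)
for test_word in ("river", "freedom"):
    _, p_new = forward(embeddings[test_word][None, :])
    print(f"P(non-word | {test_word!r}) = {p_new.round(3)}")
```

With real corpus-derived vectors rather than random ones, the network's predictions for held-out words would form the kind of generalisation gradient compared against the human data in the abstract.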