OS_06.1 - Learning semantic representations from sequential and syntactic statistics

Andrews, M. 1, 2 & Vigliocco, G. 2

1 Nottingham Trent University
2 University College London

In recent years, a common computational approach to the problem of learning semantic representations has been premised on the hypothesis that aspects of the meaning of words can be inferred from their statistical distribution across spoken and written language. Well-known examples of models of this kind include Latent Semantic Analysis, due to Landauer and colleagues. A widely shared assumption of these models, however, has been to treat the linguistic context in which a word occurs as an unordered set of words, thereby disregarding fine-grained sequential and syntactic information. In the present work, we describe a set of Bayesian distributional models that go beyond this so-called "bag-of-words" paradigm. These models make use of the sequential order in which words occur, as well as the argument structure and general syntactic relationships within sentences, all of which potentially provide vital information about the possible meanings of words. By reference to word-association norms and experimental behavioural measures of semantic representation for both monosemous and polysemous words, we demonstrate that more precise and psychologically valid semantic representations can be learned when these more fine-grained sources of statistical information are used.
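
The contrast between bag-of-words statistics and order-sensitive statistics can be made concrete with a minimal sketch. The following Python fragment is purely illustrative (it is not the authors' model, and the toy corpus is an assumption): it counts unordered co-occurrences within a sentence, as a bag-of-words model would, and directed bigram counts, which preserve the sequential information that the models described here additionally exploit.

# Illustrative sketch only: bag-of-words vs. order-sensitive context counts
# on a hypothetical toy corpus (not the authors' model or data).
from collections import Counter
from itertools import combinations

sentences = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "chased", "the", "mouse"],
]

# Bag-of-words: each sentence is an unordered set of words, so only
# symmetric co-occurrence within the sentence is recorded.
bag_counts = Counter()
for sent in sentences:
    for w1, w2 in combinations(set(sent), 2):
        bag_counts[frozenset((w1, w2))] += 1

# Order-sensitive: directed bigram counts, so "dog chased" is
# distinguished from "chased dog".
ordered_counts = Counter()
for sent in sentences:
    for w1, w2 in zip(sent, sent[1:]):
        ordered_counts[(w1, w2)] += 1

print(bag_counts[frozenset(("dog", "chased"))])  # symmetric co-occurrence count
print(ordered_counts[("dog", "chased")])         # 1: "dog" immediately precedes "chased"
print(ordered_counts[("chased", "dog")])         # 0: this order never occurs

In the bag-of-words counts the two orderings are indistinguishable, whereas the directed counts retain who-did-what-to-whom information of the kind that argument structure and syntax make available.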