PS_3.011 - Procura-PALavras (P-PAL): A web application for a new European Portuguese lexical database

Soares, A. 1, Montserrat, C. 2, Iriarte, A. 3, Almeida, J. J. 4, Simões, A. 4, Costa, A. 2, França, P. 2 & Machado, J. 2

1 School of Psychology, University of Minho
2 Centre for Research in Psychology, University of Minho
3 Institute of Arts and Human Sciences, University of Minho
4 Department of Informatics, University of Minho

Procura-PALavras (P-PAL) is a web application for a new European Portuguese (EP) lexical database that provides a series of objective (lexical and sublexical) and subjective indexes for ≈250,000 non-lemmatized and ≈42,000 lemmatized EP words. Based on a corpus of over 200 million EP words, the P-PAL web application enables users to obtain a broad range of statistics on the properties of word stimuli, including several measures of word frequency, syllable frequency, bigram and biphone frequency, orthographic and phonological structure, morphological and syntactic structure, orthographic and phonological similarity, lexical semantic indexes, and concreteness, familiarity, imageability, valence, arousal, and dominance measures. To obtain these statistics, the user chooses between a lemma and a wordform search and between two word-based queries: (i) generating lists of words with specific (objective and/or subjective) characteristics; or (ii) analyzing existing word lists on specific (objective and/or subjective) characteristics. In this work we present the wordform and lemma frequency indexes already available (frequency per million words and contextual diversity), as well as some orthographic structure and similarity measures, such as word length, neighborhood density and frequency, transposition neighbors, and addition and deletion neighbors. Bigram and trigram type and token frequencies will also be presented.
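To make some of the orthographic measures named above concrete, the following minimal Python sketch computes them on a toy lexicon. It is an illustration only and does not reproduce the P-PAL implementation: the lexicon, the raw counts, and all function names are hypothetical, and the definitions are assumptions, namely one-letter substitution for neighborhood density, adjacent-letter swaps for transposition neighbors, and bigram type and token frequencies counted once per word that contains the bigram.

```python
from collections import Counter

# Hypothetical toy lexicon: wordform -> raw count. The counts are invented
# for illustration and are NOT taken from the P-PAL corpus.
TOY_COUNTS = {
    "prato": 120, "parto": 45, "preto": 200, "prado": 30,
    "pratos": 60, "rato": 80, "pato": 70,
}


def per_million(count, corpus_size):
    """Convert a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def substitution_neighbors(word, lexicon):
    """Same-length words that differ from `word` in exactly one letter."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    )


def transposition_neighbors(word, lexicon):
    """Words obtained by swapping two adjacent letters of `word`."""
    found = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word and swapped in lexicon:
            found.add(swapped)
    return sorted(found)


def deletion_neighbors(word, lexicon):
    """Words obtained by deleting exactly one letter from `word`."""
    candidates = {word[:i] + word[i + 1:] for i in range(len(word))}
    return sorted(c for c in candidates if c in lexicon)


def addition_neighbors(word, lexicon):
    """Words that yield `word` when exactly one letter is deleted from them."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word) + 1
        and any(w[:i] + w[i + 1:] == word for i in range(len(w)))
    )


def bigram_frequencies(counts):
    """Return (type, token) Counters; each word contributes once per bigram."""
    type_freq, token_freq = Counter(), Counter()
    for word, count in counts.items():
        for bigram in {word[i:i + 2] for i in range(len(word) - 1)}:
            type_freq[bigram] += 1        # number of words containing it
            token_freq[bigram] += count   # summed counts of those words
    return type_freq, token_freq


if __name__ == "__main__":
    target = "prato"
    neighbors = substitution_neighbors(target, TOY_COUNTS)
    print("neighborhood density:", len(neighbors), neighbors)
    print("transposition:", transposition_neighbors(target, TOY_COUNTS))
    print("deletion:", deletion_neighbors(target, TOY_COUNTS))
    print("addition:", addition_neighbors(target, TOY_COUNTS))
    types, tokens = bigram_frequencies(TOY_COUNTS)
    print("bigram 'pr' type/token:", types["pr"], tokens["pr"])
```

In this sketch the neighborhood density of a word is simply the length of its substitution-neighbor list. Other definitions of the bigram measures, for example position-specific counts, are equally plausible and would require only a change to the key used in `bigram_frequencies`.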