SUBTLEX-GR is a Modern Greek word frequency database derived from a corpus of more than 23 million Modern Greek words taken from 6,000 subtitle files.
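The sketch below shows one common way a subtitle-based frequency list can be built: count word tokens across subtitle text, then normalize each count to a frequency per million words so that estimates from corpora of different sizes are comparable. It is an illustration under stated assumptions, not the procedure used to compile SUBTLEX-GR. The tokenizer (lowercasing, keeping runs of Unicode letters) and the `log10(count + 1)` transform are assumptions, and `count_words` and `frequency_table` are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Assumption: a simple tokenizer that keeps runs of Unicode letters
# (digits and underscores excluded). SUBTLEX-GR's real tokenization is not described here.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def count_words(lines):
    """Count lowercased word tokens across an iterable of subtitle text lines."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_RE.findall(line.lower()))
    return counts


def frequency_table(counts):
    """Map each word to (raw count, frequency per million, log10(count + 1))."""
    total = sum(counts.values())
    table = {}
    for word, n in counts.items():
        per_million = n / total * 1_000_000   # size-independent frequency
        log_freq = math.log10(n + 1)          # assumed log transform; +1 avoids log(0)
        table[word] = (n, per_million, log_freq)
    return table


# Usage with two toy subtitle lines (illustrative input, not corpus data).
subtitle_lines = ["Καλημέρα, τι κάνεις;", "Καλημέρα!"]
table = frequency_table(count_words(subtitle_lines))
```

Per-million normalization matters because raw counts from a 23-million-word corpus cannot be compared directly with counts from a written corpus of a different size.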

Previous evidence has shown that word frequencies calculated from corpora of film and television subtitles readily account for reading performance, because the language used in subtitles closely approximates everyday language. The present study examines this issue in a society with high exposure to subtitle reading. We compiled SUBTLEX-GR, a subtitle-based corpus of more than 23 million Modern Greek words, and tested the extent to which subtitle-based frequency estimates and estimates taken from a written corpus of Modern Greek account for the lexical decision performance of young Greek adults who read subtitles on a daily basis. Results showed that SUBTLEX-GR frequency estimates effectively accounted for participants’ reading performance in two different visual word recognition experiments. More importantly, several analyses showed that frequencies estimated from the subtitle corpus explained the results significantly better than traditional frequencies derived from written corpora.
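One simple way to make the kind of comparison described in the abstract is to ask how much variance in lexical decision reaction times each log-frequency measure explains on its own. The sketch below computes R² for a simple linear regression of RTs on a single predictor (equivalent to the squared Pearson correlation). It is not the authors' analysis, whose details are not given here; real analyses usually also control for variables such as word length, and the function names are hypothetical.

```python
def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear regression on x.

    For one predictor this equals the squared Pearson correlation between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def compare_frequency_measures(rts, subtitle_log_freq, written_log_freq):
    """Return (R² for subtitle-based frequency, R² for written-corpus frequency).

    Hypothetical helper: each list holds one value per word, in the same order.
    """
    return r_squared(subtitle_log_freq, rts), r_squared(written_log_freq, rts)
```

A higher R² means that measure accounts for more RT variance in that particular sample. Deciding whether one measure is *significantly* better, as the abstract reports, requires an additional statistical test that this sketch omits.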

Full citation: Dimitropoulou, M., Duñabeitia, J., Avilés, A., Corral, J., & Carreiras, M. (2010). Subtitle-based word frequencies as the best estimate of reading behaviour: the case of Greek. Frontiers in Psychology, 1:218, 1-12.
