González Torre, I. 1, Luque, B. 1,2, Lacasa, L. 2, Luque, J. 3 & Hernández-Fernández, A. 4
1 Department of Applied Mathematics and Statistics, EIAE, Technical University of Madrid, Plaza Cardenal Cisneros, 28040 Madrid (Spain)
2 School of Mathematical Sciences, Queen Mary University of London, Mile End Road, E1 4NS London (UK)
3 Telefonica Research, Edificio Telefónica-Diagonal 00, Barcelona (Spain)
4 Complexity and Quantitative Linguistics Lab, Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), Institut de Ciències de l'Educació, Universitat Politècnica de Catalunya, Barcelona (Spain)
Linguistic laws have routinely been investigated in written corpora, or in oral corpora that were first transcribed to their written "equivalent". As a result, inferences about the statistical patterns found in language may be biased by the arbitrary choice of segmentation of the acoustic signal, which in turn makes comparative studies between the human voice and other animal communication systems difficult.
In a previous work we explored a method to study patterns directly in the energy releases of speech. The method was applied for the first time to sixteen different languages, successfully recovering some well-known laws of human communication (Gutenberg-Richter Law, Zipf's Law, Brevity Law and Heaps' Law). Universal patterns were reported below the phonetic temporal scale, suggesting that linguistic laws emerge in speech as a consequence of the mechanism of phonation itself. The proposed methodology could be further applied to other acoustic communication systems.
Here this methodology is extended to the word and phoneme levels, as produced by an automatic lexical analysis of the acoustic wave. We employ an oral corpus to explore the possible biophysical and acoustic cues that statistical learning exploits in human language, as well as their connections with the regularities involved in these classical linguistic laws.