Learning language's abstract and rule-like structure

Willits, J.

Indiana University

Learning to represent language's hierarchical structure and its nonadjacent dependencies is thought to be difficult for association-based mechanisms. Most notably, it is argued that such mechanisms have extreme difficulty learning some of language's abstract and rule-like relations. In the following work, I present two simulations of language learning using a simple recurrent network (SRN), demonstrating that SRNs are capable of learning abstract and rule-like knowledge. In Simulation 1, I show that SRNs can learn distance-invariant representations of nonadjacent dependencies when they experience those dependencies under variable conditions. For example, SRNs trained such that A predicts B consistently at a distance of 3 (e.g., A-x1-x2-B) do not easily transfer their A-B knowledge to other distances (e.g., A-x1-x2-x3-B). However, SRNs that experience distance variability (A-x1-B, A-x1-x2-B) easily transfer their expectation that A predicts B to distances they have not seen. The fact that SRNs can learn distance-invariant relations is evidence that association-based mechanisms capture this important property of natural language. These results are also consistent with broad evidence that variability is useful in language acquisition. In Simulation 2, I show that SRNs can learn abstract, rule-like relationships. Based on experiments with 7-month-old infants, Marcus (2000) argued that connectionist networks are fundamentally incapable of learning abstract, rule-like knowledge. Contra Marcus's claims, I show that SRNs (even purely localist ones that do not represent microfeatural information about phonology or semantics) can learn arbitrary, abstract, and rule-like knowledge, as long as improper assumptions are not built into the model. Together, these simulations show that, contrary to previous claims, SRNs are capable of learning abstract and rule-like nonadjacent dependencies. The studies refute the claim that neural networks and other associative models are fundamentally incapable of representing hierarchical structure, and show how recurrent networks can provide insight into the principles underlying human learning and the representation of language.
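
To make the Simulation 1 setup concrete, the following is a minimal sketch (not the author's code) of an Elman-style SRN trained to predict the next token in sequences containing a nonadjacent A...B dependency, with a variable number of filler tokens between A and B. The vocabulary, filler items, network size, and training regime are illustrative assumptions rather than the parameters used in the simulations described above.

```python
# Minimal sketch: an Elman-style SRN trained on A...B sequences with
# variable filler distance, then probed at an unseen, longer distance.
# All tokens are localist (one-hot); no phonological or semantic features.
import random
import torch
import torch.nn as nn

VOCAB = ["A", "B", "x1", "x2", "x3", "x4"]     # assumed toy vocabulary
IDX = {w: i for i, w in enumerate(VOCAB)}
V = len(VOCAB)

def make_sequence(min_gap, max_gap):
    """A, then a variable number of filler tokens, then B."""
    gap = random.randint(min_gap, max_gap)
    fillers = random.choices(["x1", "x2", "x3", "x4"], k=gap)
    return ["A"] + fillers + ["B"]

def one_hot(seq):
    """Encode a token sequence as a (1, T, V) one-hot tensor."""
    return torch.eye(V)[[IDX[w] for w in seq]].unsqueeze(0)

class SRN(nn.Module):
    def __init__(self, vocab_size, hidden_size=16):
        super().__init__()
        self.rnn = nn.RNN(vocab_size, hidden_size, batch_first=True)  # Elman recurrence
        self.out = nn.Linear(hidden_size, vocab_size)                 # next-token prediction

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)

model = SRN(V)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Train with distance variability: gaps of 1 or 2 fillers between A and B.
for step in range(2000):
    seq = make_sequence(1, 2)
    inputs = one_hot(seq[:-1])
    targets = torch.tensor([IDX[w] for w in seq[1:]])
    logits = model(inputs).squeeze(0)
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Probe generalization: does A still predict B at an untrained distance of 3?
with torch.no_grad():
    probe = ["A", "x1", "x2", "x3"]
    probs = torch.softmax(model(one_hot(probe)).squeeze(0)[-1], dim=0)
    print(f"P(B | A at distance 3) = {probs[IDX['B']]:.3f}")
```

Under this setup, comparing a network trained only at a fixed gap against one trained with variable gaps (as in the sketch) would show the contrast described above: only the variability-trained network should assign high probability to B at the unseen distance.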