In an earlier post, I argued for the logical necessity of admitting some kind of “magic” to account for lexically arbitrary behaviors like Romance metaphony or Slavic yers. In this post I’d like to briefly consider the consequences for the theory of language acquisition.

If mature adult representations have magic, infants’ hypothesis space must also include the possibility of positing magical URs (as Jim Harris argues for Spanish or Jerzy Rubach argues for Polish). What might happen if the hypothesis space were not so specified? Consider the following thought experiment:

The Rigelians from Thought Experiment #1 did not do a good job sterilizing their space ships. (They normally just lick the flying saucer real good.) Specks of Rigelian dust carry a retrovirus that infects human infants and modifies their faculty of language so that they no longer entertain magical analyses.

What then do we suppose might happen to Spanish and Polish patterns we previously identified as instances of magic? Initially, the primary linguistic data will not have changed, just the acquisitional hypothesis space. What kind of grammar will infected Spanish-acquiring babies acquire?

For Harris (and Rubach), the answer must be that infected babies cannot acquire the metaphonic patterns present in the PLD. Since there is reason to think (see, e.g., Gorman & Yang 2019:§3) that diphthongization is the minority pattern in Spanish, it seems most likely that the children will acquire a novel grammar in which negar ‘to deny’ has an innovative non-alternating first person singular indicative *nego rather than niego ‘I deny’.

Not all linguists agree. For instance, Bybee & Pardo (1981; henceforth BP) claim that there is some local segmental conditioning on diphthongization, in the sense that Spanish speakers may be able to partially predict whether or not a stem diphthongizes on the basis of nearby segments.¹ Similarly, Albright, Andrade, & Hayes (2001; henceforth AAH) develop a computational model which can extract generalizations of this sort.² For instance, BP claim that an e followed by __r, __nt, or __rt is more likely to diphthongize, and AAH claim that a following stem-final __rr (the alveolar trill [r], not the alveolar tap [ɾ]) and a following __mb also favor diphthongization. BP are somewhat fuzzy about the representational status of these generalizations, but for AAH, who reject the magical segment analysis, they are expressed by a series of competing rules.

I am not yet convinced by this proposal. Neither BP nor AAH give the reader any general sense of the coverage of the segmental generalizations they propose (or in the case of AAH, that their computational model discovers): I’d like to know basic statistics like precision and recall for existing words. Furthermore, AAH note that their computational model sometimes needs to fall back on “word-specific rules” (their term), rules in which the segmental conditioning is an entire stem, and I’d like to know how often this is necessary.³ Rather than reporting coverage, BP and AAH instead correlate their generalizations with the results of wug™-tasks (i.e., nonce word production tasks) by Spanish-speaking adults. The obvious objection here is that no evidence, or even an explicit linking hypothesis, links adults’ generalizations about nonce words in a lab to children’s generalizations about novel words in more naturalistic settings.
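To make concrete what I mean by precision and recall here, the following is a minimal sketch, not anything BP or AAH actually report. It treats the contexts mentioned above as categorical predictors (a simplifying assumption, since both papers describe tendencies rather than exceptionless rules) and scores them against a tiny hand-picked sample of stems. The sample, the function names, and the decision to look only at the last stem e are all my own assumptions for illustration.

```python
# Segmental contexts mentioned in the text: BP's __r, __nt, __rt and AAH's __rr, __mb.
# Treating them as categorical predictors is a simplifying assumption.
CONTEXTS = ("rr", "rt", "nt", "mb", "r")

# Hypothetical hand-picked sample: (verb stem, does the stem diphthongize?).
# It is far too small and unrepresentative to say anything about Spanish.
TOY_LEXICON = [
    ("neg", True),       # negar, niego
    ("cerr", True),      # cerrar, cierro
    ("perd", True),      # perder, pierdo
    ("sent", True),      # sentir, siento
    ("ment", True),      # mentir, miento
    ("dent", True),      # dentar, diento
    ("tembl", True),     # temblar, tiemblo
    ("pens", True),      # pensar, pienso
    ("peg", False),      # pegar, pego
    ("beb", False),      # beber, bebo
    ("vend", False),     # vender, vendo
    ("intent", False),   # intentar, intento
    ("present", False),  # presentar, presento
]


def predicts_diphthong(stem):
    """Return True if the material after the last stem e begins with a listed context."""
    i = stem.rfind("e")
    if i == -1:
        return False
    return stem[i + 1:].startswith(CONTEXTS)


def precision_recall(lexicon):
    """Score the context-based predictions against the attested behavior."""
    tp = fp = fn = 0
    for stem, diphthongizes in lexicon:
        predicted = predicts_diphthong(stem)
        if predicted and diphthongizes:
            tp += 1
        elif predicted:
            fp += 1
        elif diphthongizes:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = precision_recall(TOY_LEXICON)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

On this toy sample, stems like neg- and pens- are misses (they diphthongize without a listed context) and intent- and present- are false alarms (a listed context, but no diphthong). The interesting question is what these two numbers look like over a full lexicon of existing verbs.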

However, I want to extend an olive branch to linguists who are otherwise inclined to agree with BP and AAH. It is entirely possible that children do use local segmental conditioning to learn the patterns linguists have analyzed with magical segments and/or morphs, even if we continue to posit magic segments or morphs. It is even possible that sensitivity to this segmental conditioning persists into adulthood, as reflected in the aforementioned wug™-tasks. Local segmental conditioning might be an example of domain-general pattern learning, and might be likened to sound symbolism (such as the well-known statistical tendency for English words beginning in gl– to relate to “light, vision, or brightness”; Charles Yang, p.c.), insofar as both types of patterns reduce the apparent arbitrariness of the lexicon. I am also tempted to identify both local segmental conditioning and sound symbolism as examples of third factor effects (in the sense of Chomsky 2005). Chomsky identifies three factors in the design of language: the genetic endowment, “experience” (the primary linguistic data), and finally “[p]rinciples not specific to the faculty of language”. Some examples of third factors (as these principles not specific to the faculty of language are called) given in the paper include domain-general principles of “data processing” or “data analysis” and biological constraints, whether “architectural”, “computational”, or “developmental”. I submit that general-purpose pattern learning might be an example of domain-general “data analysis”.

As it happens, we do have one way to probe the coverage of local segmental conditioning. Modern sequence-to-sequence neural networks, arguably the most powerful domain-general string pattern learning tool known to us, have been used for morphological generation tasks. For instance, in the CoNLL-SIGMORPHON 2017 shared task, neural networks were used to predict the inflected form of various words given some citation form and a morphological specification. Given the pair (dentar, V;IND;PRS;1;SG), for example, the models have to predict diento ‘I am teething’. Very briefly, these models, as currently designed, are much like babies infected with the Rigelian retrovirus: their hypothesis space does not include “magic” segments or lexical diacritics and they must rely solely on local segmental conditioning. It is perhaps not surprising, then, that they misapply diphthongization in Spanish (e.g., *recolan for recuelan ‘they re-strain’; Gorman et al. 2019) or yer deletion in Polish when presented with previously unseen lemmata. But it is an open question how closely these errors pattern with those made by children, or with adults’ behaviors in wug™-tasks.
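For readers who have not seen this kind of task, here is a rough sketch of its shape. This is not a neural model and not the shared task’s evaluation code: the “model” is a deliberately trivial stand-in of my own (the hypothetical `non_alternating_baseline`) which, like an infected baby, has no way to mark a stem as diphthongizing. The dentar and recolar items are the ones mentioned above; negar comes from earlier in this post, and pegar ‘to hit’ is added by me as a non-alternating control.

```python
# Present indicative endings for regular -ar verbs, keyed by feature bundle.
SUFFIXES = {"V;IND;PRS;1;SG": "o", "V;IND;PRS;3;PL": "an"}

# (lemma, features, attested form)
ITEMS = [
    ("dentar", "V;IND;PRS;1;SG", "diento"),
    ("negar", "V;IND;PRS;1;SG", "niego"),
    ("pegar", "V;IND;PRS;1;SG", "pego"),
    ("recolar", "V;IND;PRS;3;PL", "recuelan"),
]


def non_alternating_baseline(lemma, features):
    """Hypothetical baseline: strip -ar, add the ending, never diphthongize."""
    return lemma[:-2] + SUFFIXES[features]


def accuracy(items):
    """Exact-match accuracy of the baseline against the attested forms."""
    correct = sum(
        non_alternating_baseline(lemma, features) == gold
        for lemma, features, gold in items
    )
    return correct / len(items)


for lemma, features, gold in ITEMS:
    predicted = non_alternating_baseline(lemma, features)
    mark = "" if predicted == gold else "*"
    print(f"({lemma}, {features}) -> {mark}{predicted} (attested: {gold})")
print(f"accuracy = {accuracy(ITEMS):.2f}")
```

The baseline gets only pego right and produces *dento, *nego, and *recolan. A real sequence-to-sequence model does far better than this, of course, but the point is the same: whatever it gets right about diphthongization, it must get right from the segments alone.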

Acknowledgments

I thank Charles Yang for drawing my attention to some of the issues discussed above.

Endnotes

¹ Similarly, Rysling (2016) argues that Polish yers are epenthesized to avoid certain branching codas, though she admits that their appearance is governed in part by magic (according to her analysis, exceptional morphs of the Gouskova/Pater variety).
² Later versions of this model developed by Albright and colleagues are better known for popularizing the notion of “islands of reliability”.
³ Bill Idsardi (p.c.) raises the question of whether magical URs and morpholexical rules are extensionally equivalent. Good question.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Bybee, J., and Pardo, E. 1981. Morphological and lexical conditioning of rules: experimental evidence from Spanish. Linguistics 19: 937-968.
Chomsky, N. 2005. Three factors in language design. Linguistic Inquiry 36(1): 1-22.
Gorman, K., and Yang, C. 2019. When nobody wins. In Franz Rainer, Francesco Gardani, Hans Christian Luschützky and Wolfgang U. Dressler (eds.), Competition in inflection and word formation, pages 169-193. Springer.
Gorman, K., McCarthy, A.D., Cotterell, R., Vylomova, E., Silfverberg, M., and Markowska, M. 2019. Weird inflects but okay: making sense of morphological generation errors. In Proceedings of the 23rd Conference on Computational Natural Language Learning, pages 140-151.
Rysling, A. 2016. Polish yers revisited. Catalan Journal of Linguistics 15: 121-143.

Thought experiment #2
