On the past tense debate; Part 2: dual-route models are (still) incomplete

Dual-route models remain, for the most part, incompletely specified. Because crucial details are missing from their specification, they have generally not been implemented as computational cognitive models, and as a result there is far less empirical rigor in dual-route thinking. To put it starkly, dual-route proponents have conducted expensive, elaborate brain imaging studies to validate their model, but have not proposed a model detailed enough to implement on a $400 laptop.

The dual-route description of the English past tense can be given as follows:

  1. Use associative memory to find a past tense form.
  2. If this lookup fails, or times out, append /-d/ and apply phonology.

Note that this ordering is critical: one cannot simply ask whether a verb is regular, since by hypothesis some or all regular verbs are not stored as such. And, as we know (Berko 1958), novel and nonce verbs are almost exclusively inflected with /-d/, consistent with the current ordering.1 This model equates—rightly, I think—the notion of regularity with the elsewhere condition. The problem is the fuzziness in how one might reach condition (2). We do not have any notion of what it might mean for associative memory lookup to fail. Neural nets, for instance, certainly do not fail to produce an output, though they will happily produce junk in certain cases. Nor do we have much of a notion of how lookup might time out.
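
The two-step procedure can be sketched in a few lines of Python. The lexicon dictionary and the phonology stub below are hypothetical stand-ins of my own, not anything the dual-route literature provides; indeed, the sketch makes the underspecification vivid, since nothing here says what it would mean for step (1) to "fail" or "time out" in a real associative memory.

```python
def apply_phonology(form: str) -> str:
    # Stub: a real implementation would handle voicing assimilation
    # and epenthesis (cf. walked [-t], wanted [-ɪd]).
    return form


def past_tense(verb: str, lexicon: dict) -> str:
    stored = lexicon.get(verb)  # (1) associative memory lookup
    if stored is not None:
        return stored
    # (2) elsewhere case: append /-d/ and apply phonology
    return apply_phonology(verb + "d")
```

On this sketch, a nonce verb like wug trivially falls through to step (2), consistent with Berko's findings; the hard part, swept under the rug by `dict.get`, is specifying the lookup mechanism itself.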

I am aware of two serious attempts to spell out this crucial detail. The first is Baayen et al.’s 1997 visual word recognition study of Dutch plurals. They imagine that (1) and (2) are competing activation “routes” and that recognition occurs when either route reaches an activation threshold, as if both routes run in parallel. To actually fit their data, however, their model immediately spawns epicycles in the form of poorly justified hyperparameters (see their fn. 2), and as far as I know, no one has ever bothered to reuse or reimplement their model.2 The second is O’Donnell’s 2015 book, which proposes a cost-benefit analysis of storage vs. computation. However, this complex and clever model is not described in enough detail for a “white room” implementation, and no software has been provided. What dual-route proponents owe us, in my opinion, is a next toolkit. Without serious investment in formal computational description and reusable, reimplementable, empirically validated models, it is hard to take the dual-route proposal seriously.
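
The core of such a "race" architecture is simple enough to sketch. Below, both routes accumulate activation in parallel at constant rates, and whichever reaches threshold first determines the response and its latency. The rates and threshold are hypothetical placeholders of mine; Baayen et al.'s actual model adds further frequency-sensitive hyperparameters to fit their data, which is exactly the epicycle problem noted above.

```python
def race(rate_lookup: float, rate_rule: float, threshold: float = 1.0):
    """Deterministic linear race between two accumulators.

    Each route's finishing time is threshold / rate; the faster
    route wins and determines the response latency.
    """
    t_lookup = threshold / rate_lookup  # route (1): associative lookup
    t_rule = threshold / rate_rule      # route (2): rule application
    winner = "lookup" if t_lookup <= t_rule else "rule"
    return winner, min(t_lookup, t_rule)
```

For instance, a high-frequency stored form might be modeled with a fast lookup rate, so `race(4.0, 1.0)` has the lookup route winning; a nonce form with negligible lookup activation loses the race to the rule route.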

Endnotes

  1. There’s a lot of work which obfuscates this point. An impression one might get from Albright & Hayes (2003) is that adult nonce word studies produce quite a bit of irregularity, but this is only true in their rating task and hardly at all in their “volunteering” (production) task, and a hybrid task finds much higher ratings for nonce irregulars. Schütze (2005) argues—convincingly, in my opinion—that this is because speakers use a different task model in rating tasks, one that is mostly irrelevant to what Albright & Hayes are studying.
  2. One might be tempted to fault Baayen et al. for using visual stimulus presentation (in a language with one of the more complex and opaque writing systems), or for using recognition as a proxy for production. While these are probably reasonable critiques today, visual word recognition was still the gold standard in 1997.

References

Albright, A. and Hayes, B. 2003. Rules vs. analogy in English past tenses: a computational/experimental study. Cognition 90(2): 119-161.
Baayen, R. H., Dijkstra, T., and Schreuder, R. 1997. Singulars and plurals in Dutch: evidence for a parallel dual-route model. Journal of Memory & Language 37(1): 94-117.
Berko, J. 1958. The child’s learning of English morphology. Word 14: 150-177.
O’Donnell, T. 2015. Productivity and Reuse in Language: a Theory of Linguistic Computation and Storage. MIT Press.
Schütze, C. 2005. Thinking about what we are asking speakers to do. In S. Kepser and M. Reis (ed.), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, pages 457-485. Mouton de Gruyter.

On the past tense debate; Part 1: the RAWD approach

I have not had time to blog in a while, and I really don’t have much time now either. But here is a quick note (one of several, I anticipate) about the past tense debate.

It is common to talk as if connectionist approaches and dual-route models are the two opposing approaches to morphological irregularity, when in fact there are three. Linguists since at least Bloch (1947)1 have claimed that regular, irregular, and semiregular patterns are all rule-governed and ontologically alike. Of course, the irregular and semiregular rules may require some degree of lexical conditioning, but phonologists have rightly never seen this as some kind of defect or scandal. Chomsky & Halle (1968), Halle (1977), Rubach (1984), and Halle & Mohanan (1985) all spend quite a bit of space developing these rules, using formalisms that should be accessible to any modern-day student of phonology. These rules-all-the-way-down (henceforth RAWD) approaches are empirically adequate and have been implemented computationally with great success: some prominent instances include Yip & Sussman 1997, Albright & Hayes 2003,2 and Payne 2022. It is malpractice to ignore these approaches.

One might think that RAWD has more in common with dual-route approaches than with connectionist thinking, but as Mark Liberman noted many years ago, that is not obviously the case. Mark Seidenberg, for instance, one of the most prominent Old Connectionists, has argued that there is a tendency for regulars and irregulars to share certain structural similarities. To take one example, semi-regular slept does not look so different from stepped, and the many zero past tense forms (e.g., hit, bid) end in the same phones—[t, d]—used to mark the regular past tense. While I am not sure this is a meaningful generalization, it clearly is something that both connectionist and RAWD models can encode.3 This is in contradistinction to dual-route models, which have no choice but to treat these observations as coincidences. Thus, as Mark notes, connectionists and RAWD proponents find themselves allied against dual-route models.

(Mark’s post, which I recommend, continues to draw a parallel between dual-routism and bi-uniqueness which will amuse anyone interested in the history of phonology.)

Endnotes

  1. This is not exactly obscure work: Bloch taught at two Ivies and was later the president of the LSA. 
  2. To be fair, Albright & Hayes’s model does a rather poor job recapitulating the training data, though as they argue, it generalizes nonce words in a way consistent with human behavior.
  3. For instance, one might propose that slept is exceptionally subject to a vowel shortening rule of the sort proposed by Myers (1987) but otherwise regular.

References

Albright, A. and Hayes, B. 2003. Rules vs. analogy in English past tenses: a computational/experimental study. Cognition 90(2): 119-161.
Bloch, B. 1947. English verb inflection. Language 23(4): 399-418.
Chomsky, N., and Halle, M. 1968. The Sound Pattern of English. Harper & Row.
Halle, M. 1977. Tenseness, vowel shift and the phonology of back vowels in Modern English. Linguistic Inquiry 8(4): 611-625.
Halle, M., and Mohanan, K. P. 1985. Segmental phonology of Modern English. Linguistic Inquiry 16(1): 57-116.
Myers, S. 1987. Vowel shortening in English. Natural Language & Linguistic Theory 5(4): 485-518.
Payne, S. R. 2022. When collisions are a good thing: the acquisition of morphological marking. Bachelor’s thesis, University of Pennsylvania. 
Pinker, S. 1999. Words and Rules: the Ingredients of Language. Basic Books.
Rubach, J. 1984. Segmental rules of English and cyclic phonology. Language 60(1): 21-54.
Yip, K., and Sussman, G. J. 1997. Sparse representations for fast, one-shot learning. In Proceedings of the 14th National Conference on Artificial Intelligence and 9th Conference on Innovative Applications of Artificial Intelligence, pages 521-527.

The Wordlikeness Project

We (myself, Karthik Durvasula, and Jimin Kahng) recently got the good news that our NSF collaborative research proposal has been funded. This work springs ultimately from my dissertation. There I argue—using a mix of logical argumentation and “archival” wordlikeness data mostly taken from appendices of previously published work—that the view of phonotactic grammar as statistical patterns or constraints projected from the lexicon is not strongly supported by the available data. My conclusions are perhaps weakened by the low overall quality of this archival data, which is drawn from various stimulus presentation modalities (i.e., auditory vs. orthographic) and response modalities (Likert scale vs. binary forced-choice vs. transcription). In the NSF study, we will collect wordlikeness data in English and Korean, manipulating these stimulus presentation and response modalities, and this data will be made publicly available under the name of the Wordlikeness Project. (Here we draw inspiration from the English Lexicon Project and its spinoffs.) We will also use this data for extensive computational modeling, to answer some of the questions raised in my dissertation and in Karthik and Jimin’s subsequent work.

Stop being weird about the Russian language

As you know, Russia is waging an unprovoked war on Ukraine. It should go without saying that my sympathies are with Ukraine, but of course both states are undemocratic, one-party kleptocracies and I have little hope for anything good coming from the conflict.

That’s all beside the point. Since the start of the war, I have had several conversations with linguists who suggested that the study of the Russian language—one of the most important languages in linguistic theorizing over the years—is now “cringe”. This is nonsense. First, official statistics show that a substantial minority of Ukrainian citizens identify as ethnically Russian, and that many speak Russian as a first language (and this is probably skewed by social-desirability bias). Secondly, it is wrong to identify a language with any one nation. (It is “cringe” to use flag emojis to label languages; just use the ISO codes.) Third, it is foolish to equate the state with the people who live under it, particularly after the end of the kind of mass political movements that in earlier times could stop this kind of state violence. It is a basic corollary of the i-language view that children learn whatever languages they’re sufficiently exposed to, regardless of their location or of their caretakers’ politics. The iniquity of war does not travel from nation to language to its speakers. Stop being weird about it.

The end of defectivity

As of yesterday I have completed my series of defectivity case studies, at least for the time being. From these I propose the following tentative taxonomy:

It is not clear to me whether three categories are really needed. In both of the latter two, there seems to be some tight phonotactic constraint on inflectional variants which results in ungrammaticality, and thus defectivity, if not satisfied. In the two cases from Africa, these constraints are of a metrical nature and impact many lexemes; in the cases from Scandinavia, they concern stem-final consonant clusters and possible mutations to them. And this looks a lot like the case of Russian verbs. This just leaves Tagalog, which I think has simply been misanalyzed, and Turkish, where the only defective lexemes are a handful of subminimal borrowings.

I am aware of two other cases of interest: (various stages of) Sanskrit (Stump 2010) and Latvian. These are phenomenologically quite different from the ones I’ve discussed so far: both involve gaps in the paradigms of inflected pronouns. I do not find gaps in the distribution of functional elements to be nearly as shocking as the failure of, say, an otherwise unobjectionable Russian or Spanish verb to have a 1sg. form. I should mention that the constraint against contracting n’t to am in standard English (see Yang 2017:§3 and references therein) is also possibly an example of this sort; I suppose it depends on whether n’t is really an inflectional affix.

References

Stump, G. 2010. Interactions between defectiveness and syncretism. In M. Baerman, G. G. Corbett, and D. Brown (ed.), Defective Paradigms: Missing Forms and What They Tell Us, pages 181-210. Oxford University Press.
Yang, C. 2017. How to wake up irregular (and speechless). In C. Bowern, L. Horn, and R. Zanuttini (ed.), On Looking into Words (and Beyond), pages 211-233. Language Science Press.

Defectivity in Hungarian

[This is part of a series of defectivity case studies.]

Hungarian verbs exhibit one of the better-documented cases of defectivity, first discussed in English by Hetzron (1975). In particular, the study by Rebrus & Törkenczy (2009), henceforth RT, is a wealth of information, though this information is presented in so many different forms that it has taken me some effort to extract the basic generalizations. Furthermore, they seem to have deliberately avoided putting forth an actual analysis. Rather than trying to explain the data RT’s way, I’ve tried to put matters into my own words. Note that below I often give multiple allomorphs for a suffix or epenthetic segment; these are conditioned by Hungarian’s well-known process of vowel harmony (see, e.g., Siptár & Törkenczy 2000 for description).

Hungarian has a class of verbs, mostly intransitive, which form the 3sg. indefinite present indicative in -ik rather than with the ordinary null ending. Some of these verb stems end in surface consonant clusters. When consonant-initial suffixes (e.g., the imperative -j, the imperative subjunctive -d, the potential -hat/-het) are added to verb stems of this form, a fleeting vowel (o, ö, or e) normally breaks up the stem cluster (e.g., ugrik ‘s/he jumps’, ugorhat ‘s/he may jump’). It is not clear to me whether this fleeting vowel is underlying or epenthetic, though it seems to be standard in Hungarian philology to assume the latter. However, for certain verbs, this does not occur and the verb is simply defective (e.g., csuklik ‘s/he hiccups’ but *csuklhat, *csukolhat). According to Hetzron (ibid., 864f.), all the defective verbs have stems of the form /…Cl/ or /…Cz/, though it is not the case that all verbs of this form are defective (e.g., vérzik ‘s/he bleeds’, vérezhet ‘s/he may bleed’; hajlik ‘s/he bends’, hajolhat ‘s/he may bend’). A study by Lukács, Rebrus, and Törkenczy (2010) conducted detailed grammaticality judgment tasks and appears largely to have confirmed the description given by Hetzron and RT.
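
Hetzron's descriptive generalization is simple enough to state as a predicate. The sketch below is purely illustrative and uses a simplified orthographic consonant set (ignoring Hungarian digraphs like sz); crucially, the implication runs one way only, so the predicate picks out candidates for defectivity, not the defective verbs themselves.

```python
# Simplified single-letter consonant inventory; digraphs (sz, zs, cs,
# gy, ly, ny, ty) are not handled in this toy sketch.
CONSONANTS = set("bcdfghjklmnprstvz")


def has_defective_shape(stem: str) -> bool:
    """True iff the stem matches Hetzron's /…Cl/ or /…Cz/ description."""
    return (len(stem) >= 2
            and stem[-1] in "lz"
            and stem[-2] in CONSONANTS)
```

Thus the stem of defective csuklik matches, but so does the stem of non-defective hajlik, while the stem of ugrik (which takes the fleeting vowel unproblematically) does not.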

What is the source of these patterns? It may be that whatever causes the fleeting vowel to be epenthesized is less than fully productive; but then it is also necessary to appeal to absolute phonotactic ill-formedness to derive the defectivity. It is not obvious that this will work either, because at least some of these clusters ought to simplify according to RT.

References

Hetzron, R. 1975. Where the grammar fails. Language 51: 859-872.
Lukács, Á., Rebrus, P., and Törkenczy, M. 2010. Defective verbal paradigms in Hungarian: description and experimental study. In M. Baerman, G. C. Corbett, and D. Brown (ed.), Defective Paradigms: Missing Forms and What They Tell Us, pages 85-102. Oxford University Press.
Rebrus, P., and Törkenczy, M. 2009. Covert and overt defectiveness in paradigms. In C. Rice and S. Blaho (ed.), When Nothing Wins: Modeling Ungrammaticality in OT, pages 195-234. Equinox.
Siptár, P., and Törkenczy, M. 2000. The Phonology of Hungarian. Oxford University Press.

Defectivity in Spanish

[This is part of a series of defectivity case studies.]

Harris (1969:114) observes that the Spanish verbs agredir ‘to attack’ and aguerrir ‘to harden’ are defective in certain inflectional forms. Gorman & Yang (2019:180), consulting various Spanish dictionaries, expand this list to include abolir ‘to abolish’, arrecir(se) ‘to freeze’, aterir(se) ‘to freeze’, colorir ‘to color/dye’, descolorir ‘to discolor/bleach’, despavorir ‘to fear’, empedernir ‘to harden’, preterir ‘to ignore’, and tra(n)sgredir ‘to transgress’.

All of the defective Spanish verbs appear to belong to the 3rd (-ir) conjugation, which is the smallest of the three and characterized by extensive irregularity, including raising and/or diphthongization of the mid vowels e, o to i, u and to ie, ue, respectively. For instance, the verb dormir ‘to sleep’ (along with the minimally different morir ‘to die’) undergoes diphthongization to ue when stress falls on the stem (e.g., duermo ‘I sleep’) and raising in various desinence-stressed forms (e.g., durmamos ‘we would sleep’). According to Maiden & O’Neill (2010), it is exactly those forms which would show stem vowel changes that are defective (e.g., in the paradigm of abolir), though this claim ought to be verified with native speakers.

Harris has long argued that these stem changes are limited to stems bearing abstract phonemes. In his 1969 book, stems which undergo diphthongization bear an abstract feature +D; in his 1985 paper, a similar distinction between diphthongizing and non-alternating stem mid vowels is marked using moraic prespecification. The apparent raising of stem mid vowels is analyzed as a dissimilatory lowering of an underlying [+high] vowel, but since there are some stems which do not lower (e.g., vivir/vivo ‘to live/I live’), similar magic (i.e., abstract specifications) is likely called for. In contrast, I understand Bybee & Pardo (1981) and Albright et al. (2001) as arguing that these stem changes are locally predictable, conditioned by nearby segments.1 Albright (2003) suggests that competition between these locally predictable conditioning factors is responsible for defectivity.

Gorman & Yang (2019) argue, on the basis of a Tolerance Principle analysis, that there are no productive generalizations for 3rd-conjugation mid vowel stem verbs in Spanish, if diphthongization and raising are viewed as competitors to a “no change” analysis.2 In support of this, they note that children acquiring Spanish as their first language only very rarely produce incorrect stem changes in the 3rd conjugation. This suggests that during production, they may be “picking and choosing” verbs for which they have already acquired the relevant inflectional patterns.
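
The Tolerance Principle calculation underlying this kind of analysis is easy to state: a rule applying to N items remains productive just in case its exceptions do not exceed N / ln N. The counts in the usage example are illustrative placeholders, not Gorman & Yang's actual Spanish figures.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Yang's Tolerance Principle threshold: a rule over n items
    tolerates at most n / ln(n) exceptions."""
    return n / math.log(n)


def is_productive(n_items: int, n_exceptions: int) -> bool:
    return n_exceptions <= tolerance_threshold(n_items)
```

For example, with 100 relevant verbs the threshold is about 21.7, so a candidate rule with 20 exceptions would be productive, but one with 30 would not; on Gorman & Yang's analysis, each of the competing stem-change generalizations fails its own version of this test.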

Endnotes

  1. Of course their arguments are largely limited to adult nonce word studies. I consider this inherently dubious for reasons discussed by Schütze (2005).
  2. They claim “no change” is productive in both the 1st and 2nd conjugations. In support of this they note that diphthongization is commonly underapplied in these conjugations by children acquiring Spanish as their first language.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Albright, A. 2003. A quantitative study of Spanish paradigm gaps. In Proceedings of the 22nd West Coast Conference on Formal Linguistics, pages 1-14.
Bybee, J. L. and Pardo, E. 1981. On lexical and morphological conditioning of alternations: a nonce-probe experiment with Spanish verbs. Linguistics 19: 937-968.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (ed.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Harris, J. W. 1969. Spanish Phonology. MIT Press.
Harris, J. W. 1985. Spanish diphthongisation and stress: a paradox resolved. Phonology 2: 31-45.
Maiden, M. and O’Neill, P. 2010. On morphomic defectiveness: evidence from the Romance languages of the Iberian peninsula. In M. Baerman, G. G. Corbett, and D. Brown (ed.), Defective Paradigms: Missing Forms and What They Tell Us, pages 103-124. Oxford University Press.
Schütze, C. 2005. Thinking about what we are asking speakers to do. In S. Kepser and M. Reis (ed.), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, pages 457-485. Mouton de Gruyter.

Animal liberation for linguists

I recently read Animal Liberation (Singer 1975), an attempt to develop a consistent ethics of human treatment of animals. Somewhat surprisingly, several parts were interesting to me as a linguist. In the first chapter (and a few other places throughout), Singer ribs Descartes for the speciesism he sees in the Frenchman’s notions of the human soul; Singer prefers a focus on the capacity for suffering rather than anything so ethereal as a soul, and I guess this would be a problem if you think Cartesian dualism has something to do with the human language endowment.

But more interesting to me was the second chapter, entitled “Tools for research…”, with the subtitle “your taxes at work”. The subject of this chapter is animals as research subjects. This is not something I think about much: animals are not terribly useful subjects in linguistics. Singer argues that experimentation on animals is not so much cruel as gratuitously cruel, and unconnected to any sensible scientific hypotheses. I simply must describe some of these cruelties to make the point; you’re welcome to skip along if you are easily disturbed.

Singer discusses baroque and sadistic experiments in which apes are first trained to operate primitive flight simulators and then exposed to lethal doses of radiation, presumably to test whether they really are lethal…I’m not sure. He discusses experiments trying to distress and impair monkeys into psychopathology, noting that other experts doubt the existence of psychopathic monkeys outside of the lab. He mentions attempts to induce “learned helplessness” in rats via arbitrary shock, though notes that the authors report that the “implications of these findings for learned helplessness theory are not entirely clear”. I can at this point interject that there does not seem to be any substantive content in “learned helplessness theory” beyond the ideas that helplessness can be induced in animals via arbitrary cruelty and that this state of mind is similar to depression in humans, and there are numerous reasons to doubt both claims. Singer mentions labs that have run their regimens of cruelty over dozens of animals: Shetland ponies but also “white rats, kangaroo rats, wood rats, hedgehogs, dogs, cats, monkeys, opossums, seals, dolphins, and elephants” (p. 91). Singer concludes (loc. cit.) that “despite the suffering the animals have gone through, the results obtained, even as reported by the experimenters themselves, are trivial, obvious, or meaningless.” Singer describes studies, going back into the 19th century, establishing that dogs do in fact die in hot cars.

Why does this cruelty have so little point? I suspect in part it is due to the theoretical morass of mid-century behaviorist doctrine and its confusion about the ways in which humans are like and unlike other animals, its tendency to see humans as atypical rats. This is illustrated clearly by a quote from an unnamed psychologist:

    When fifteen years ago I applied to do a degree course in psychology, a steely-eyed interviewer, himself a psychologist, questioned me closely on my motives and asked me what I believed psychology to be and what was its principal subject matter? Poor naive simpleton that I was, I replied that it was the study of the mind and that human beings were its raw material. With a glad cry at being able to deflate me so effectively, the interviewer declared that psychologists were not interested in the mind, that rats were the golden focus of study, not people, and then he advised me strongly to trot around to the philosophy department next door… (p. 94)

Certainly the rats don’t benefit from the cruelty, and if the cruelty is not for humans either, what could the point be?

One thing I am acutely aware of is that students need something to do. Each scientific paradigm vying for hegemony must have an answer to this question. For behaviorism, the answer is to torture and manipulate animals. One of the virtues of the so-called cognitive turn has been the development of cruelty-free ways to penetrate the mind.

References

Singer, P. 1975. Animal Liberation: A New Ethics for Our Treatment of Animals. HarperCollins.

The next toolkit 2: electric boogaloo

I just got back from the excellent Workshop on Model Theoretic Representations in Phonology. While I am not exactly a member of the “Delaware school”, i.e., the model-theoretic phonology (MTP) crowd, I am a big fan. In my talk, I contrasted the model-theoretic approach to an approach I called the black box approach, using neural networks and program synthesis solvers as examples of the latter. I likened the two styles to neat vs. scruffy, better is better vs. worse is better, rationalists vs. empiricists, and cowboys vs. aliens.

One lesson I drew from this comparison is the need for MTPists to develop high-quality software—the next toolkit 2. I didn’t say much during my talk about what I imagine this to be like, so I thought I’d leave my thoughts here. Several people—Alëna Aksënova, Hossep Dolatian, and Dakotah Lambert, for example—have developed interesting MTP-oriented libraries. While I do not want to give short shrift to their work, I think there are two useful models for the next next toolkit: (my own) Pynini and PyTorch. Here are what I see as their key features:

  1. They are ordinary Python on the front-end. Of course, both have a C++ back-end, and PyTorch has a rarely used C++ API, but that’s purely a matter of performance; both have been slowly moving Python code into the C++ layer over the course of their development. The fact of the matter is that in 2022, just about anyone who can code at all can do so in Python.
  2. While both are devilishly complex, their design follows the principle of least surprise; there is only a bit of what Pythonistas call exuberant syntax (Pynini’s use of the @ operator, PyTorch’s use of _ to denote in-place methods).
  3. They have extensive documentation (both in-module and up to book length).
  4. They have extensive test suites.
  5. They are properly packaged and can be installed via PyPI (i.e., via pip) or conda-forge (via conda).
  6. They have corporate backing.

I understand that many in the MTP community are naturally—constitutionally, even—drawn to functional languages and literate programming. I think this should not be the initial focus. It should be ease of use, and for that it is hard to beat ordinary Python in 2022. Jupyter/Colab support is a great idea, though, and might satisfy the literate programming itch too.

Dialectical vs. dialectal

The adjective dialectical describes ideas reasoned about through dialectic, or the interaction of opposing or contradictory ideas. However, it is often used in a rather different sense: ‘pertaining to dialects’. For that sense, the more natural word—and here I am being moderately prescriptivist, or at least distinctivist—is dialectal. Dialectical used in this latter sense is, in my opinion, a solecism. Keeping the two apart preserves a nice distinction, like the ones between classic and classical and between economic and economical. And certainly there are linguists who have good reason to write about both dialects and dialectics, perhaps even in the same study.