Defectivity in Hungarian

[This is part of a series of defectivity case studies.]

Hungarian verbs exhibit one of better-documented cases of defectivity, discussed first in English by Hetzron (1975). In particular, the study by Rebrus & Törkenczy (2009), henceforth RT, is a wealth of information, though this information is presented in so many different forms that it has taken me some effort to extract the basic generalizations. Furthermore, they seem to have deliberately avoided putting forth an actual analysis. Rather than trying to explain the data RT’s way I’ve tried to put matters into my own words. Note that below I often give multiple allomorphs for a suffix or epenthetic segment; these are conditioned by Hungarian’s well-known process of vowel harmony (see, e.g., Siptár & Törkenczy 2000 for description).

Hungarian has a class of verbs, mostly intransitive, which form a 3sg. indefinite present indicative in -ik rather than the ordinary null ending. Of these, some such verb stems end in surface consonant clusters. When consonant-initial suffixes (e.g., the imperative -j, the imperative subjunctive -d, the potential -hat/-het) are added to verb stems of this form, a fleeting vowel (o, ӧ, or e) normal breaks up the stem cluster (e.g., ugrik ‘s/he jumps’, ugorhat ‘s/he may jump’). It is not clear to me whether this fleeting vowel is underlying or epenthetic, though it seems to be standard in Hungarian philology to assume the latter. However, for certain verbs, this does not occur and the verb is simply defective (e.g., csuklik ‘s/he hiccups’ but *csuklhat, *csukolhat). According to Hetzron (ibid., 864f.) all the defective verbs have stems of the form /…Cl/ or /…Cz/, though it is not the case that all verbs of this form are defective (e.g., vérzik ‘s/he bleeds’, vérezhet ‘s/he may bleed’; hajlik ‘s/he bends’, hajolhat ‘s/he may bend’). A study by Lukács, Rebrus, and Törkenczy (2010), conducted detailed grammaticality judgment tasks and appears to largely have confirmed the description given by Hetzron and RT.

What is the source of these patterns? It may be that whatever causes the fleeting vowel to be epenthesized is less than fully productive; but then it is also necessary to also appeal to absolute phonotactic illformedness to derive the defectivity. It is not obvious that this will work either, because at least some of these clusters ought to simplify according to RT.

References

Hetzron, R. 1975. Where the grammar fails. Language 51: 859-872.
Lukács, Á., Rebrus, P., and Törkenczy, M. 2010. Defective verbal paradigms in Hungarian: description and experimental study. In M. Baerman, G. C. Corbett, and D. Brown (ed.), Defective Paradigms: Missing Forms and What They Tell Us, pages 85-102. Oxford University Press.
Rebrus, P., and Törkenczy, M. 2009. Covert and overt defectiveness in paradigms. In C. Rice and S. Blaho (ed)., When Nothing Wins: Modeling Ungrammaticality in OT, pages 195-234. Equinox.
Siptár, P., and Törkenczy, M. 2000. The Phonology of Hungarian. Oxford University Press.

Defectivity in Spanish

[This is part of a series of defectivity case studies.]

Harris (1969:114) observes that the Spanish verbs agredir ‘to attack’ and aguerrir ‘to harden’ are defective in certain inflectional forms. Gorman & Yang (2019:180), consulting various Spanish dictionaries, expand this list to include abolir ‘to abolish’, arrecir(se) ‘to freeze’, aterir(se) ‘to freeze’, colorir ‘to color/dye’, descolorir ‘to discolor/bleach’, despavorir ‘to fear’, empedernir ‘to harden’, preterir ‘to ignore’, and tra(n)sgredir ‘to transgress’.

All of the defective Spanish verbs appeear to belong to the 3rd (-ir) conjugation, which is the smallest of the three and characterized by extensive irregularity. including raising and/or diphthongization of mid e, o to i, u and to ie, ue, respectively. For instance, the verb dormir ‘to sleep’ (along with the minimally different morir ‘to die’) undergoes diphthongization to ue when stress falls on the stem (e.g., duermo ‘I sleep’) and raising in various desinence-stressed forms (e.g., durmamos ‘we would sleep’). According to Maiden & O’Neill (2010), it is exactly these forms which would show stem vowel changes which are defective (e.g., in the paradigm of abolir), though this claim ought to be verified with native speakers.

Harris has long argued that these stem changes are limited to stems bearing abstract phonemes. In his 1969 book, stems which undergo diphthongization bear an abstract feature +D; in his 1985 paper, a similar distinction between diphthongizing and non-alternating stem mid vowels is marked using moraic prespecification. The apparent raising of stem mid vowels is analyzed as a dissimilatory lowering of an underlying [+high] vowel, but since there are some stems which do not lower (e.g., vivir/vivo ‘to live/I live’), similar magic (i.e., abstract specifications) is likely called for. In contrast, I understand Bybee & Pardo (1981) and Albright et al. (2001) as arguing that these stem changes are locally predictable, conditioned by nearby segments.¹ Albright (2003) suggests that competition between these locally predictable conditioning factors is responsible for defectivity.

Gorman & Yang (2019) argue that, on the basis of a Tolerance Principle analysis, that there are no productive generalizations for 3rd conjugation mid vowel stem verbs in Spanish, if diphthongization and raising are viewed as competitors to a “no change” analysis.² In support of this, they note that children acquiring Spanish as their first language only very rarely produce incorrect stem changes in the 3rd conjugation. This suggests that during production, they may be “picking and choosing” verbs for which they have already acquired the relevant inflectional patterns.

Endnotes

Of course their arguments are largely limited to adult nonce word studies. I consider this inherently dubious for reasons discussed by Schütze (2005).
They claim “no change” is productive in both the 1st and 2nd conjugations. In support of this they note that diphthongization is commonly underapplied in these conjugations by children acquiring Spanish as their first language.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Albright, A. 2003. A quantitative study of Spanish paradigm gaps. In Proceedings of the 22th West Coast Conference on Formal Linguistics, pages 1-14.
Bybee, J. L. and Pardo, E. 1981. On lexical and morphological conditioning of alternations: a nonce-prob e experiment with Spanish verbs. Linguistics 19: 937-968.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (ed.). Competition in Inflection and Word Formation, pages 169-193. Springer.
Harris, J. W. 1969. Spanish Phonology. MIT Press.
Harris, J. W. 1985. Spanish diphthongisation and stress: a paradox resolved. Phonology 2: 31-45.
Maiden, M. and O’Neill, P. 2010. On morphomic defectiveness: evidence from the Romance languages of the Iberian peninsula. In M. Baerman, G. G. Corbett, and D. Brown (ed)., Defective Paradigms: Missing Forms and What They Tell Us, pages 103-124. Oxford University Press.
Schütze, C. 2005. Thinking about what we are asking speakers to do. In S. Kepser and M. Reis (ed.), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, pages 457-485. Mouton de Gruyter.

Animal liberation for linguists

I recently read Animal Liberation (Singer 1975), an attempt to develop a consistent ethics of human treatment of animals. Somewhat surprisingly, several parts were interesting to me as a linguist. In the first chapter (and a few other places throughout), Singer ribs Descartes for the speciesism he sees in the Frenchman’s notions of the human soul; Singer prefers the focus on the capacity for suffering rather than anything so ethereal as a soul, and I guess this would be a problem if you think Cartesian duality has something to do with the human language endowment.

But more interesting to me was the second chapter, entitled “Tools for research…”, with the subtitle “your taxes at work”. The subject of this chapter is animals as research subjects. This is not something I think about much: animals are not terribly useful subjects in linguistics. Singer argues that experimentation on animals is not so much cruel as gratituously cruel, and unconnected to any sensible scientific hypotheses. I simply must describe some of these cruelties to make the point; you’re welcome to skip along if you are easily disturbed.

Singer discusses baroque and sadistic experiments in which apes are first trained to operate primitive flight simulators and then exposed to lethal doses of radiation, presumably to test whether they really are lethal…I’m not sure. He discusses experiments trying to distress and impair monkeys into psychopathololgy, noting that other experts doubt the existence of psychopathic monkeys outside of the lab. He mentions attempts to inflict “learned helplessness” in rats via arbitrary shock, though notes that the authors report that the “implications of these findings for learned helplessness theory are not entirely clear”. I can at this point interject that there does not seem to be any substantive content in “learned helplessness theory” beyond the ideas that helplessness can be induced in animals via arbitrary cruelty, that this state of mind would be similar to depression in humans, and there are numerous reasons to doubt both claims. Singer mentions labs that have run their regimens of cruelty over dozens of animals: Shetland ponies but also “white rats, kangaroo rats, wood rats, hedgehogs, dogs, cats, monkeys, opposums, seals, dolphins, and elephants” (p. 91). Singer concludes (loc. cit.) that “despite the suffering the animals have gone through, the results obtained, even as reported by the experimenters themselves, are trivial, obvious, or meaningless.” Singer describes studies, going back into the 19th century, establishing that dogs do in fact die in hot cars.

Why does this cruelty have so little point? I suspect in part it is due to the theoretical morass of mid-century behaviorist doctrine and its confusion about the ways in which humans are like and unlike other animals, its tendency to see humans as atypical rats. This is illustrated clearly by a quote from an unnamed psychologist:

When fifteen years ago I applied to do a degree course in
psychology, a steely-eyed interviewer, himself a psychologist, questioned me closely on my motives and asked me what I believed psychology to be and what was its principal subject matter? Poor naive simpleton that I was, I replied that it was the study of the mind and that human beings were its raw material. With a glad cry at being able to deflate me so effectively, the interviewer declared that psychologists were not interested in the mind, that rats were the golden focus of study, not people, and then he advised me strongly to trot
around to the philosophy department next door… (p. 94)

Certainly the rats don’t benefit from the cruelty, and if the cruelty is not for humans either, what could the point be?

One thing I am acutely aware of is that students need something to do. Each scientific paradigm vying for hegemony must have an answer to this question. For behaviorism, the answer is to torture and manipulate animals. One of the virtues of the so-called cognitive turn has been the development of cruelty-free ways to penetrate the mind.

References

Singer, P. 1975. Animal Liberation: A New Ethics for Our Treatment of Animals. HarperCollins.

The next toolkit 2: electric boogaloo

I just got back from the excellent Workshop on Model Theoretic Representations in Phonology. While I am not exactly a member of the “Delaware school”, i.e., the model-theoretic phonology (MTP) crowd, I am a big fan. In my talk, I contrasted the model-theoretic approach to an approach I called the black box approach, using neural networks and program synthesis solvers as examples of the latter. I likened the two styles to neat vs. scruffy, better is better vs. worse is better, rationalists vs. empiricists, and cowboys vs. aliens.

One lesson I drew from this comparison is the need for MTPists to develop high-quality software—the next toolkit 2. I didn’t say much during my talk about what I imagine this to be like, so I thought I’d leave my thoughts here. Several people—Alëna Aksënova, Hossep Dolatian, and Dakotah Lambert, for example—have developed interesting MTP-oriented libraries. While I do not want to give short schrift to their work, I think there are two useful models for the the next next toolkit: (my own) Pynini and PyTorch. Here is what I see as the key features:

They are ordinary Python on the front-end. Of course, both have a C++ back-end, and PyTorch has a rarely used C++ API, but that’s purely a matter of performance; both have been slowly moving Python code into the C++ layer over the course of their development.The fact of the matter is that in 2022, just about anyone who can code at all can do so in Python.
While both are devilishly complex, their design follows the principle of least suprise; there is only a bit of what Pythonistas call exhuberant syntax (Pynini’s use of the @ operator, PyTorch’s use of _ to denote in-place methods).
They have extensive documentation (both in-module and up to book length).
They have extensive test suites.
They are properly packaged and can be installed via PyPi (i.e., via pip) or Conda-Forge (via conda).
They have corporate backing.

I understand that many in the MTP community are naturally—constitutionally, even—drawn to functional languages and literate programming. I think this should not be the initial focus. It should be ease of use, and for that it is hard to beat ordinary Python in 2022. Jupyter/Colab support is a great idea, though, and might satisfy the literate programming itch too.

Dialectical vs. dialectal

The adjective dialectical describes ideas reasoned about through dialectic, or the interaction of opposing or contradictory ideas. However, it is often used to in a rather different sense: ‘pertaining to dialects’. For that sense, the more natural word—and here I am being moderately prescriptivist, or at least distinctivist—is dialectal. Dialectical used for this latter sense is, in my opinion, a solecism. This essentially preserves a nice distinction, like the ones between classic and classical and between economic and economical. And certainly there are linguists who have good reason to write about both dialects and dialectics, perhaps even in the same study.

Defectivity in Greek

[This is part of a series of defectivity case studies.]

Sims (2006: ch. 4, henceforth just S) documents gaps in the genitive plural (gen.pl.) forms of modern Greek nouns. These gaps appear to be reasonably well known and quite common: S’s primary data comes from two Greek dictionaries published in the 1990s. The LNEG dictionary contains over 1,600 nouns defective in the gen.pl.; the LKN over 1,300.¹ If this is even remotely close to correct, this is still probably the most extensive case of defectivity, in terms of affected lexemes, documented in any language. Interestingly, S notes that there is “surprisingly little agreement between the two dictionaries about which lexemes are defective in the genitive plural” (S: 81, fn. 54), with the two dictionaries agreeing on 470 nouns.

Modern Greek declines nouns for four cases and two numbers. The primary gen.pl. marker, written -ων (-on) may or many not trigger rightward shift of primary stress, depending on the noun. Pappús ‘grandfather’ has no gen.pl. stress shift (pappúdon), whereas náftis ‘sailor’ and ónoma ‘name’ do (naftón, onomáton).² There does not seem to be much—except etymology—which predicts whether or not a noun will shift. S suggests that there is an important distinction between nouns which already have final stress in the nom.sg., and others which have stress on earlier syllables in the nom.sg., as the former, represented by pappús and referred to by S as “type 1” and “columnar stress” nouns, never have stress shift in the gen.pl., and do not exhibit gen.pl. gaps.³ Assuming this is correct, all that remains is to understand why there is no productive default in the non-“type 1” declensional classes.

There are several other things going on here that are not yet well understood. First, modern Greek is in some sense diglossic, and more specifically pluricentric, with vernacular and learned registers (known as dimotiki and katharevousa, respectively) existing alongside various divergent dialects. One might be tempted to suggest that this would result in heightened awareness of lexical variation in gen.pl. stress shift. Second, S notes that the genitive itself is in competition with a periphrastic alternative that uses a preposition governing the accusative case (S: 86f.). According to S, the periphrastic accusative is associated with dimotiki and the genitive with katharevousa. Since katharevousa is itself reasonably artificial, colloquial use of the genitive may be increasingly moribund. Third, the LNK dictionary also lists quite a few nouns are defective in other plural forms as well. These have received little attention; some are probably singularia tantum (see, e.g., Sims 2015: 150) but perhaps something else is going on too.

Most of the points made by S in her 2006 dissertation are echoed in her 2015 monograph. I have little to say about her idea that defectivity is encoded via implicational hierarchies in the morphological paradigms, and I didn’t find the experimental results informative.

Endnotes

See S (ibid.) for these citations. I have briefly consulted both works in print, just to spot check S’s coding. Since I found no major issues, I am instead working from tables in S’s appendices. Thanks to Lucas Ashby for help digitizing these.
I have taken the liberty of standardizing S’s transliterations somewhat.
It is hard to tell whether S thinks this is a synchronic explanation for defectivity. Elsewhere, S (ch. 5) claims that defectivity is grammaticalized (or perhaps lexicalized) by essentially having the morphological module return some kind of “null” for defective combinations, so that there is no need for further synchronic motivation. S seems to consider this a desirable conclusion.

References

Sims, A. 2006. Minding the gap: Inflectional defectiveness in a paradigmatic theory. Doctoral dissertation, Ohio State University.
Sims, A. 2015. Inflectional Defectiveness. Cambridge University Press.

Defectivity in Russian; part 2: nouns

[This is part of a series of defectivity case studies.]

Pertsova (2005, henceforth P) describes defectivity in the genitive plural (henceforth, gen.pl.) of Russian nouns. This defectivity is not quite as extensive as in the verbs: Zaliznyak’s (1977) renowned morphological dictionary, Pertsova’s primary source, labels roughly forty noun lexemes as Р.мн. затрудн. ‘gen.pl. difficult’ (P gives this as ‘awkward’), and a smaller number are labeled Р.мн. нет ‘gen.pl. does not exist’. Pre-theoretically, the Russian gen.pl. form has three complexities. First, there are three (surface) gen.pl. suffixes: -ej, -ov, and a null -∅ (though this is often taken to be the surface form of a back yer; e.g., Bailyn & Nevins 2008). Secondly, stress may either fall on the stem or the desinence, depending on a number of factors. First, the fleeting vowels found in many Slavic languages, sometimes surface in the final syllable of the stem of null gen.pl. forms; e.g., kiška ‘gut’ has a gen.pl kišok.

Let us consider the ‘do not exist’ cases first. A few examples are dno ‘bottom’, mgla ‘haze’, and mzda, an archaic word meaning ‘bribe’. All three of these are built from stems consisting solely of consonants—the -o and -a are nom.sg. affixes—and according to P, we would expect them to form a null gen.pl.¹ Were they do so, then, the gen.pl. form would be a purely consonantal sequence (*dn, *mgl, *mzd, etc.). While Russian allows quite rich onsets, it does not permit vowel-less prosodic words. So, presumably, these ‘do not exist’ cases are in some sense derived, but are unpronounceable.

The cases labeled by Zaliznyak as ‘difficult’ are perhaps more challenging. These include fata ‘veil’, yula ‘weasel’, and suma ‘bag, pouch’. Many of these (e.g., čalma ‘turban’, taxta ‘ottoman’, mulla ‘mullah’) are borrowings from Tatar or other Turkic languages. All forty ‘difficult’ nouns are all feminines in -a, and all of them have desinential stress (i.e., stress on the case/number suffix) throughout. According to P, these ‘difficult’ nouns would all have a null gen.pl. were it not ineffable. This means that stem stress is the only option (because duh, there’s no way to stress a null desinence). Yet speakers are reluctant to introduce a new, stem-stressed allomorph of the stem that does not otherwise exhibit stem stress, because of a principle known as lexical conservatism.

I follow this basic logic, but I am not sure how to make the lexical conservatism filter into something mechanistic and appropriately parameterized so that it can actually predict the data we observe. P points out that there are other nouns which have exactly the same properties (null gen.pl., desinential stress) but resolve this conflict by retracting stress to the final syllable of the stem. For example, borodá ‘beard’ has a null, stem-stressed, but unobjectionable gen.pl. boród, as does the previously mentioned kišká/kišók.² After considering and rejecting an appeal to frequency, P suggests that there is something special about monosyllabic stems that makes them more resistent to stem allomorphy. The problems do not end here, because as P gamely points out, the gen.pl. is not the only case/number combination which forces stress retraction; it also happens in the null nom.sg. of masculines like stol ‘table’, but I am aware of any Russian nouns which are defective in the nom.sg.

While I find P’s description clear, I do not think her analysis works. It correctly identifies two interesting explananda—the fact that gaps are localized to monosyllabic stems, and to the genitive plurals—but gives little explanation for these facts. And the invocation of lexical conservatism, never that well-defined in the first place, has done little to help; it could just as well be that the conflict between desinential stress and the selection of a null gen.pl. produces ineffability. Lexical conservatism actually prevents us from uniting the ‘does not exist’ and the ‘difficult’ cases, which after all both involve monosyllabic stems expected to exhibit desinential stress and a null gen.pl.³

Sims (2015) notes that these defective nouns might be an interesting case for studying the interaction between defectivity and syncretism, a topic first discussed by Stump (2010). In Russian there is syncretism between gen.pl. and accusative plural (acc.pl.) forms for all animate nouns. Since twelve of the defective nouns are animates, this means that either there is an exception to this syncretism (in the case that the acc.pl. is acceptable) or perhaps, the defectivity itself is carried over by the syncretism and nouns are defective in the acc.pl. too. No one seems to have gathered the relevant judgments about the well-formedness of these animates in their gen.pl. vs. acc.pl. forms yet.

Postscript

As is also the case for Russian verbs, defectivity in Russian nouns has arguably risen to the level of conscious awareness among Russophones. In his short story Kocherga, the Soviet humorist Mikhail Zoshchenko tells of workers who wish to order five more fire pokers for their drafty office. The nom.sg. kocherga ‘fire poker’, is unobjectionable, but no one they talk to is sure what the gen.pl., the case used for five or more of an object in a quantified noun phrase like ‘five fire pokers’, ought to be, and both they and their interlocutors use clever circumlocutions to dodge the question.

Endnotes

I note that Wiktionary gives the gen.pl. of dno as don’ev, with a back stem yer surfacing (also bearing stress) and the -ev gen.pl. suffix. If this is correct, this is a problem for P’s claim that stems of this class are expected to have a null gen.pl., unless there is some other reason dno to behaves differently than mgla or mzda.
Here I am using acute accents to indicate stress, as is common practice in (anglophone) Russian linguistics.
I am loathe to assign grammatical distinctions to informal gradations of unacceptability, so I see no incentive to distinguish these cases.

References

Bailyn, J. F. and Nevins, A. 2008. Russian genitive plurals are impostors. In A. Bachrach and A. Nevins (ed.), Inflectional Identity, pages 237-270. Oxford University Press.
Pertsova, K. 2005. How lexical conservatism can lead to paradigm gaps. UCLA Working Papers in Linguistics 11: 13-30.
Sims, A. 2015. Inflectional Defectiveness. Cambridge University Press.
Stump, G. 2010. Interactions between defectiveness and syncretism. In M. Baerman, G. G. Corbett, and D. Brown (ed.), Defective Paradigms: Missing Forms and What They Tell Us, pages 181-210. Oxford University Press.
Zaliznyak, A. A. 1977. Grammatičeskij slovar’ russkogo jazyka. Moskva.

Defectivity in Russian; part 1: verbs

[This is part of a series of defectivity case studies.]

The earliest discussion of defectivity within the generativist tradition can be found in an early paper by Halle (1973:7f.).

…one finds various kinds of defective paradigms in the inflection. For instance, in Russian there are about 100 verbs (all, incidentally, belonging to the so-called “second conjugation”) which lack first person singular forms of the nonpast tense. Russian grammar books frequently note that such forms as (8) “do not exist” or “are not used”, or “are avoided.”

(8)
*lažu ‘I climb’
*pobežu (or *pobeždu) ‘I conquer’
*deržu ‘I talk rudely’
*muču ‘I stir up’
*erunžu ‘I behave foolishly’

Subsequent work slightly lowers Halle’s estimate of 100 verbs. By combining evidence from Russian morphological dictionaries, Sims (2006) provides a list of 70 defective verbs, and Pertsova (2016) further refines Sims’ list to 63. But by any account, defectivity affects many more verb types than, for example, in the English verbs.

All of the defective verbs end in a dental consonant—s, z, t, or d—and belong to the second conjugation, in which verbs form infinitives in -et’ or -it’), and are defective only in the 1sg. non-past form, marked with a –u and a mutation of the stem-final dental.

Baerman (2008) provides a detailed history of the mutations of t and d. The modern mutations, to š and ž respectively, represent the expected Russian reflexes of Common Slavic *tʲ, *dʲ, respectively. Christianization, beginning at the close of the first millennium, brought about a period of substantial contact with southern Slavic speakers, and their liturgical language, Old Church Slavonic (OCS), contributed novel reflexes of *tʲ, *dʲ, namely č [tʃʲ] and žd [ʐd] . The OCS reflexes were found in, among other contexts, the 1sg. non-past—where they competed with the native mutations—and the past passive participle, where they were largely entrenched. Ultimately, č persisted in the 1sg. non-past but žd was driven out sometime in the early 20th century (ibid., 85). However, the latter persists in past passive participles (e.g., rodit’ ‘to give birth’ has the past passive participle roždënnyj).¹The OCS affricate mutations are rarely found in contemporarily written Russian. However, Gorman & Yang (2019; henceforth G&K) cite some weak evidence that the OCS mutation has some synchronic purchase in the minds of Russian speakers. First, Sims (2006) administers a cloze task in which Russian speakers are asked to produce the 1sg. non-past of a defective verb shown in the infinitive (e.g., ubedit’ ‘to convince’) and several participants select the OCS-like ubeždu, which is proscribed. Secondly, Slioussar and Kholodilova (2013), Pertsova (2016), and Spektor (2021) catalog what happens when verbs borrowed from English end in a dental consonant. For instance, from the English friend comes zafrendit’ ‘to add s.o. to one’s friend list on social media and rasfrendit’ ‘to unfriend s.o. on social media’, and among the many options, they find instances of the OCS-like zafrenždu in addition to the expected zafrenžu. To add to the confusion, there there is some hesitation on the part of Russian speakers to apply either of the expected 1sg. non-past mutations, and some speakers produce the the unexpected, unmutated zafrendu.² There is no precedent for this among native Russian verb lexemes.

The mutations of s and z to š and ž, respectfully, have no competitor inherited from contact with OCS. These mutations occur across the board. However, that’s not quite the whole story: English borrowings in the Slioussar and Kholodilova corpus often fail to alternate. For instance, for fiksit’ ‘to fix s.t.’, they record both the expected fikšu as well as the unexpected, unmutated fiksu.

G&K develop an account of the Russian verbal gaps which assume that each of these four dental consonants does have a synchronically active competitor, and that there is simply no default. They couch this in terms of the Yang’s Tolerance Principle, but even one rejecting that particular method of deciding what is and is not productive might still agree with the basic insight—as indicated by English dental-stem loanwords—that the dental mutations are no longer productive and that this lack of productivity, along with sparse data during acquisition, results in defectivity.

Other accounts of this phenomena can be found in ch. 7 of Sims 2015 and in Pertsova 2016. These two studies contain many interesting suggestions for future work. However, with respect I must say I am not sure how to operationalize their suggestions as part of a mechanistic account of these observations.

Postscript

The aforementioned defectivity is the subject of occasional humor among Russian speakers. For instance, as discussed by as discussed by Sims (2015:5), a Russian translation of one of Milne’s Winnie the Pooh stories has the anthropomorphic bear puzzling over the 1sg. of pobedit‘ ‘to be victorious’. This suggests that Russian verbal defectivity has risen to the level of consciousness, and may reflect sociolinguistic “change from above”.

Endnotes

This form was cited in G&K:186; I have taken the liberty of fixing an inconsistency in the transliteration: there рождённый was transliterated as roždënny (note the missing final glide).
Russian has many indeclinable nouns, nouns which do not bear the ordinary case-number suffixes (Wade 2020:§36-40). For instance, radio ‘ibid.’ and VIČ ‘HIV’ can be used in any of the six cases and two numbers, but never bears any case-number suffixes. Crucially, though, indeclinables, unlike the aforementioned verbs, are either phonotactically-odd loanwords or acronyms, but as far as I can tell there is nothing phonotactically odd about zafrendit’ or its stem. And one should certainly not equate indeclinability and defectivity.

References

Baerman, M. 2008. Historical observations on defectiveness: The first singular non-past. Russian Linguistics 32: 81-97.
Gorman,. K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (ed.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Halle, M. 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4: 3-16.
Pertsova. 2016. Transderivational relations and paradigm gaps in Russian verbs.
Glossa 1: 13.
Sims, A. 2006. Minding the gap: Inflectional defectiveness in a paradigmatic theory. Doctoral dissertation, Ohio State University.
Sims, A. 2015. Inflectional Defectiveness. Cambridge University Press.
Slioussar, N. and Kholodilova, M. 2011. Paradigm leveling in non-standard Russian. In
Proceedings of the 20th meeting of Formal Approaches to Slavic Linguistics, pages 243-258.
Spektor, Y. 2021. Detection and morphological analysis of novel Russian
loanwords. Master’s thesis, Graduate Center, City University of New York.
Wade, T. 2020. A Comprehensive Russian Grammar. Wiley Blackwell, 4th edition.

On “significance levels”

R (I think it was R) introduced a practice in which multiple asterisk characters are used to indicate different significance levels for tests. [Correction: Bill Idsardi points out some prior art that probably predates the R convention. I have no idea what S or S-Plus did, nor what R was like before 2006 or so. But certainly R has helped popularize it.] For instance, in R statistical summaries, * denotes a p-value such that .01 < p < .05, ** denotes a p-value such that .001 < p < .01, and *** denotes a p-value < .001. This type of reporting increasingly can be found in papers also, but there are good reasons not to copy R’s bad behavior.

In null hypothesis testing, the mere size of the p-value itself has no meaning. All that matters is whether p is greater than or less than the α-level. Depending on space, we may report the exact value of p for a test (often rounded to two digits and “< .01″ used for abbreviatory purposes, since you don’t want to round down here), but we need not. And it simply does not matter at all how small p is when it’s less than the α-level. There is no notion of “more significant” or “less significant”.

R also uses the period character ‘.’ is used to indicate a p-value between .05 and .1. Of course, I have never read a single study using an α-level greater than .05 (I suppose this would simply make the possibility of Type I error too high), so I’m not sure what the point is.

My suggestion here is simple. If you want, use ‘*’ to indicate a significant (p < α) result, and then in the caption write something like “*: p < .05″ (assuming that your α-level is .05). Do not use additional asterisks.

Major projects at the Computational Linguistics lab

[The following is geared towards our incoming students. I’m just using the blog as a easy publishing mechanism.]

The following are some major projects ongoing in the GC Computational Linguistics Lab.

Many phonologists believe that phonotactic knowledge is independent of knowledge of phonological alternations. In my dissertation I evaluated computational models of autonomous phonotactic knowledge as predictions of speakers’ judgments of wordlikeness, and I found that these fail to consistently outperform simple baselines. In part, these models fail because they predict gradience that is poorly correlated with human judgments. However, these conclusions were tentative because of the poor quality of the available data, collected little attention paid to experimental design or choice of stimuli. With funding from the National Science Foundation, and in collaboration with professors Karthik Durvasula at Michigan State University and Jimin Kahng at the University of Mississippi, we are building a open-source “megastudy” of human wordlikeness judgments and performing computational modeling of the resulting data.

Speech recognizers and synthesizers are, essentially, engines for synthesizing or recognizing sequences of phonemes. Therefore, it is necessary to transform text into phoneme sequences. Such transformations are challenging insofar as they require linguistic expertise—and language-specific knowledge—and are not always amenable to generic machine learning techniques. We are engaged in several projects involving these mappings. The lab maintains WikiPron (Lee et al. 2020), software and databases for building multilingual pronunciation dictionaries, and has organized two SIGMORPHON shared tasks on multilingual grapheme-to-phoneme conversion (Gorman et al. 2020, Ashby et al. 2021). And with funding from the CUNY Professional Staff Congress, PhD student Amal Aissaoui is engaged building diacritization engines for Arabic and Latin, engines which supply missing pronunciation information for these scripts.

Morphological generation systems use machine learning to predict the inflected forms of words. In 2019 I led a team of researchers in an error analysis of the top two systems in the CoNLL-SIGMORPHON 2017 shared task on morphological generation (Gorman et al. 2019). We found that the top models struggled with inflectional patterns which are sensitive to lexeme-inherent morphosyntactic features like gender, animacy, and aspect, which are not provided in the task data. For instance, the top models often inflect Russian perfective verbs as if they were imperfective, or Polish inanimate nouns as if they were animate. Finally, we find that models struggle with abstract morphophonological patterns which cannot be inferred from the citation form alone. For instance, the top models struggle to predict whether or not a Spanish verb will undergo diphthongization under stress (e.g., negar–niego ‘to deny-I deny’ vs. pegar–pego ‘to stick-I stick’). In collaboration with professor Katharina Kann and PhD student Adam Weimerslage at the University of Colorado, Boulder, we are developing an open-source “challenge set” for morphological generation, a set that targets complex inflectional patterns in a diverse sample of 10-20 languages. This challenge set will act as benchmarks for neural network models of inflection, and will allow us to further study inherent features and abstract morphophonological patterns. In designing these challenge sets we have targeted a wide variety of morphological processes, including reduplication and templatic formation in addition to affixation and stem change. MA students Kristysha Chan, Mariana Graterol, and M. Elizabeth Garza, and PhD student Selin Alkan have all contributed to the development of this challenge set thus far.

Inflectional defectivity is the poorly-understood dark twin of productivity. With funding from the CUNY Professional Staff Congress, Emily Charde (MA 2020) is engaged in a computational study of defectivity in Greek nouns and Russian verbs.