Wall clock time

Computers and humans have radically different ways to reckon time. While a computer can tell you how long something took it, the computer is constantly switching between tasks, so this number has to be converted to wall clock time, i.e., how much time elapsed in the real world while it was working on the job.
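In Python, for instance, the standard library exposes both clocks: time.process_time() counts the CPU time charged to the current process, while time.perf_counter() tracks elapsed (wall clock) time. The following is a minimal sketch; sleepy_task is a hypothetical stand-in for a job that mostly waits, which makes the gap between the two clocks easy to see.

```python
import time

def measure(task):
    """Return (wall_seconds, cpu_seconds) spent running task()."""
    wall_start = time.perf_counter()   # wall clock: elapsed real time
    cpu_start = time.process_time()    # CPU time charged to this process
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def sleepy_task():
    # Hypothetical task: waiting advances the wall clock but uses almost no CPU.
    time.sleep(0.2)

wall, cpu = measure(sleepy_task)
print(f"wall clock: {wall:.2f}s, CPU: {cpu:.2f}s")
```

For a CPU-bound job the two numbers converge. For a job that waits on disk, network, or other processes, the wall clock figure is the one a human actually experiences.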

I guess I’m a dualist, because I think there’s something special about sentience. I think humans (and possibly other creatures) are essentially divine but finite beings, whereas to me computers are mere objects. We divine beings can spend some of our finite time on earth to make a program run faster, but at some point it makes more sense to simply wait, and do something else while the program is running. It is hard to draw an equivalence between the opportunity cost for a divine being and for an object. Learning when to just wait is one of the most important skills a developer can acquire.

Defectivity in Russian; part 1: verbs

[This is part of a series of defectivity case studies.]

The earliest discussion of defectivity within the generativist tradition can be found in Halle (1973:7f.).

…one finds various kinds of defective paradigms in the inflection. For instance, in Russian there are about 100 verbs (all, incidentally, belonging to the so-called “second conjugation”) which lack first person singular forms of the nonpast tense. Russian grammar books frequently note that such forms as (8) “do not exist” or “are not used”, or “are avoided.”

(8)
*lažu ‘I climb’
*pobežu (or *pobeždu) ‘I conquer’
*deržu ‘I talk rudely’
*muču ‘I stir up’
*erunžu ‘I behave foolishly’

Subsequent work slightly lowers Halle’s estimate of 100 verbs. By combining evidence from Russian morphological dictionaries, Sims (2006) provides a list of 70 defective verbs, and Pertsova (2016) further refines Sims’ list to 63. But by any account, defectivity affects many more verb types in Russian than it does in, for example, English.

All of the defective verbs end in a dental consonant (s, z, t, or d), belong to the second conjugation (in which verbs form infinitives in -et’ or -it’), and are defective only in the 1sg. non-past form, which is marked with the suffix -u and a mutation of the stem-final dental.

Baerman (2008) provides a detailed history of the mutations of t and d. The modern mutations, to č and ž respectively, represent the expected Russian reflexes of Common Slavic *tʲ, *dʲ. Christianization, beginning at the close of the first millennium, brought about a period of substantial contact with southern Slavic speakers, and their liturgical language, Old Church Slavonic (OCS), contributed novel reflexes of *tʲ, *dʲ, namely šč [ɕː] and žd [ʐd]. The OCS reflexes were found in, among other contexts, the 1sg. non-past, where they competed with the native mutations, and the past passive participle, where they were largely entrenched. Ultimately, šč persisted in the 1sg. non-past but žd was driven out sometime in the early 20th century (ibid., 85). However, the latter persists in past passive participles (e.g., rodit’ ‘to give birth’ has the past passive participle roždënnyj).1

The OCS žd mutation is thus rarely found in the 1sg. non-past of contemporary written Russian. However, Gorman & Yang (2019; henceforth G&K) cite some weak evidence that it has some synchronic purchase in the minds of Russian speakers. First, Sims (2006) administers a cloze task in which Russian speakers are asked to produce the 1sg. non-past of a defective verb shown in the infinitive (e.g., ubedit’ ‘to convince’), and several participants select the OCS-like ubeždu, which is proscribed. Secondly, Slioussar and Kholodilova (2013), Pertsova (2016), and Spektor (2021) catalog what happens when verbs borrowed from English end in a dental consonant. For instance, from the English friend come zafrendit’ ‘to add s.o. to one’s friend list on social media’ and rasfrendit’ ‘to unfriend s.o. on social media’, and among the many options, they find instances of the OCS-like zafrenždu in addition to the expected zafrenžu. To add to the confusion, there is some hesitation on the part of Russian speakers to apply either of the expected 1sg. non-past mutations, and some speakers produce the unexpected, unmutated zafrendu.2 There is no precedent for this among native Russian verb lexemes.

The mutations of s and z to š and ž, respectively, have no competitor inherited from contact with OCS. These mutations occur across the board. However, that’s not quite the whole story: English borrowings in the Slioussar and Kholodilova corpus often fail to alternate. For instance, for fiksit’ ‘to fix s.t.’, they record both the expected fikšu and the unexpected, unmutated fiksu.

G&K develop an account of the Russian verbal gaps which assumes that each of these four dental consonants has a synchronically active competitor, and that there is simply no default. They couch this in terms of Yang’s Tolerance Principle, but even someone who rejects that particular method of deciding what is and is not productive might still agree with the basic insight, supported by English dental-stem loanwords, that the dental mutations are no longer productive, and that this lack of productivity, along with sparse data during acquisition, results in defectivity.
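For readers unfamiliar with it: as Yang formulates the Tolerance Principle, a rule that applies to N items is productive only if its number of exceptions e satisfies e ≤ θ_N = N / ln N. The Python sketch below implements just that inequality. It is not G&K’s actual calculation, the function names are hypothetical, and the counts are invented for illustration rather than taken from Russian.

```python
import math

def tolerance_threshold(n: int) -> float:
    """theta_N = N / ln N: the most exceptions a rule over n items tolerates."""
    if n < 2:
        raise ValueError("need at least two items (ln 1 = 0)")
    return n / math.log(n)

def is_productive(n: int, exceptions: int) -> bool:
    """A rule over n items is productive iff its exceptions <= theta_N."""
    return exceptions <= tolerance_threshold(n)

# Hypothetical counts, for illustration only (not Russian data).
print(is_productive(100, 20))  # True: 20 <= 100 / ln 100 (about 21.7)
print(is_productive(100, 30))  # False

# Two competing patterns splitting 100 items 55/45: each pattern's
# exceptions are the other's items, so neither is productive.
print(is_productive(100, 45), is_productive(100, 55))  # False False
```

The last line shows the scenario G&K have in mind. When competitors split the data roughly evenly, no pattern clears the threshold, so there is no default to fill a form the learner has not heard.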

Other accounts of this phenomenon can be found in ch. 7 of Sims 2015 and in Pertsova 2016. These two studies contain many interesting suggestions for future work. However, with respect, I must say I am not sure how to operationalize their suggestions as part of a mechanistic account of these observations.

Postscript

The aforementioned defectivity is the subject of occasional humor among Russian speakers. For instance, as discussed by Sims (2015:5), a Russian translation of one of Milne’s Winnie the Pooh stories has the anthropomorphic bear puzzling over the 1sg. of pobedit’ ‘to be victorious’. This suggests that Russian verbal defectivity has risen to the level of consciousness, and may reflect sociolinguistic “change from above”.

Endnotes

  1. This form was cited in G&K:186; I have taken the liberty of fixing an inconsistency in the transliteration: there рождённый was transliterated as roždënny (note the missing final glide).
  2. Russian has many indeclinable nouns, nouns which do not bear the ordinary case-number suffixes (Wade 2020:§36-40). For instance, radio ‘id.’ and VIČ ‘HIV’ can be used in any of the six cases and two numbers, but never bear any case-number suffixes. Crucially, though, indeclinables are either phonotactically odd loanwords or acronyms, whereas as far as I can tell there is nothing phonotactically odd about zafrendit’ or its stem. And one should certainly not equate indeclinability and defectivity.

References

Baerman, M. 2008. Historical observations on defectiveness: The first singular non-past. Russian Linguistics 32: 81-97.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (eds.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Halle, M. 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4: 3-16.
Pertsova, K. 2016. Transderivational relations and paradigm gaps in Russian verbs. Glossa 1: 13.
Sims, A. 2006. Minding the gap: Inflectional defectiveness in a paradigmatic theory. Doctoral dissertation, Ohio State University.
Sims, A. 2015. Inflectional Defectiveness. Cambridge University Press.
Slioussar, N. and Kholodilova, M. 2013. Paradigm leveling in non-standard Russian. In Proceedings of the 20th meeting of Formal Approaches to Slavic Linguistics, pages 243-258.
Spektor, Y. 2021. Detection and morphological analysis of novel Russian loanwords. Master’s thesis, Graduate Center, City University of New York.
Wade, T. 2020. A Comprehensive Russian Grammar. Wiley Blackwell, 4th edition.

On “significance levels”

R (I think it was R) introduced a practice in which multiple asterisk characters are used to indicate different significance levels for tests. [Correction: Bill Idsardi points out some prior art that probably predates the R convention. I have no idea what S or S-Plus did, nor what R was like before 2006 or so. But certainly R has helped popularize it.] For instance, in R statistical summaries, * denotes a p-value such that .01 < p < .05, ** denotes a p-value such that .001 < p < .01, and *** denotes a p-value < .001. This type of reporting is increasingly found in papers too, but there are good reasons not to copy R’s bad behavior.

In null hypothesis testing, the mere size of the p-value itself has no meaning. All that matters is whether p is greater than or less than the α-level. Depending on space, we may report the exact value of p for a test (often rounded to two digits, with “< .01” used for abbreviatory purposes, since you don’t want to round down here), but we need not. And it simply does not matter at all how small p is when it’s less than the α-level. There is no notion of “more significant” or “less significant”.

R also uses the period character ‘.’ to indicate a p-value between .05 and .1. Of course, I have never read a single study using an α-level greater than .05 (I suppose this would simply make the probability of Type I error too high), so I’m not sure what the point is.

My suggestion here is simple. If you want, use ‘*’ to indicate a significant (p < α) result, and then in the caption write something like “*: p < .05” (assuming that your α-level is .05). Do not use additional asterisks.
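This recommendation is easy to apply mechanically when generating tables. The following is a minimal Python sketch, with hypothetical function names, that emits a single asterisk when p < α and nothing otherwise. As a result, p = .0001 and p = .03 receive exactly the same mark.

```python
def significance_mark(p: float, alpha: float = 0.05) -> str:
    """Return a single '*' if p < alpha, else an empty string."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must be in [0, 1], got {p}")
    return "*" if p < alpha else ""

def caption_note(alpha: float = 0.05) -> str:
    """Caption text explaining the mark, e.g. '*: p < .05'."""
    return f"*: p < {str(alpha).lstrip('0')}"

for p in (0.0001, 0.03, 0.2):
    print(p, repr(significance_mark(p)))
print(caption_note())
```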

Avoid adjacent delimiters

A mundane but highly effective writing tip is to avoid structures like “…) ( …” in your writing. For instance, instead of

As argued by Chomsky & Halle (1968) (henceforth, SPE)…

you can (and should!) write

As argued by Chomsky & Halle (1968; henceforth, SPE)…

which I think you’ll agree Just Looks Better. A closely related trick is to avoid things like

The Greek letter Υ denotes /y/, /yː/…

and instead write

The Greek letter Υ denotes /y, yː/…

You can do this with phonemic forward slashes, phonetic square brackets, or the curly braces used to specify sets.
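If you want to catch these automatically in a draft, a crude check is easy to write. The following is a minimal Python sketch. The checker and its patterns are hypothetical and heuristic, and they will produce false positives in code or math (e.g., f(x)(y)). In keeping with the post on re.compile below, it passes a plain pattern string to the module-level re.findall.

```python
import re
from typing import List

# Hypothetical heuristic: each alternative matches a closing delimiter
# followed by an opening delimiter of the same kind.
ADJACENT = "|".join([
    r"\)\s*\(",   # "...) (..."
    r"/,\s*/",    # "/y/, /yː/"
    r"\],\s*\[",  # "[a], [b]"
    r"\},\s*\{",  # "{a}, {b}"
])

def adjacent_delimiters(text: str) -> List[str]:
    """Return every adjacent-delimiter sequence found in text."""
    return re.findall(ADJACENT, text)

print(adjacent_delimiters("As argued by Chomsky & Halle (1968) (henceforth, SPE)"))
print(adjacent_delimiters("The Greek letter Υ denotes /y/, /yː/"))
print(adjacent_delimiters("As argued by Chomsky & Halle (1968; henceforth, SPE)"))
```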

Major projects at the Computational Linguistics lab

[The following is geared towards our incoming students. I’m just using the blog as an easy publishing mechanism.]

The following are some major projects ongoing in the GC Computational Linguistics Lab.

Many phonologists believe that phonotactic knowledge is independent of knowledge of phonological alternations. In my dissertation I evaluated computational models of autonomous phonotactic knowledge as predictors of speakers’ judgments of wordlikeness, and I found that these fail to consistently outperform simple baselines. In part, these models fail because they predict gradience that is poorly correlated with human judgments. However, these conclusions were tentative because of the poor quality of the available data, which were collected with little attention paid to experimental design or choice of stimuli. With funding from the National Science Foundation, and in collaboration with professors Karthik Durvasula at Michigan State University and Jimin Kahng at the University of Mississippi, we are building an open-source “megastudy” of human wordlikeness judgments and performing computational modeling of the resulting data.

Speech recognizers and synthesizers are, essentially, engines for recognizing or synthesizing sequences of phonemes. Therefore, it is necessary to transform text into phoneme sequences. Such transformations are challenging insofar as they require linguistic expertise (and language-specific knowledge) and are not always amenable to generic machine learning techniques. We are engaged in several projects involving these mappings. The lab maintains WikiPron (Lee et al. 2020), software and databases for building multilingual pronunciation dictionaries, and has organized two SIGMORPHON shared tasks on multilingual grapheme-to-phoneme conversion (Gorman et al. 2020, Ashby et al. 2021). And with funding from the CUNY Professional Staff Congress, PhD student Amal Aissaoui is engaged in building diacritization engines for Arabic and Latin, engines which supply missing pronunciation information for these scripts.

Morphological generation systems use machine learning to predict the inflected forms of words. In 2019 I led a team of researchers in an error analysis of the top two systems in the CoNLL-SIGMORPHON 2017 shared task on morphological generation (Gorman et al. 2019). We found that the top models struggled with inflectional patterns which are sensitive to lexeme-inherent morphosyntactic features like gender, animacy, and aspect, which are not provided in the task data. For instance, the top models often inflect Russian perfective verbs as if they were imperfective, or Polish inanimate nouns as if they were animate. Finally, we found that models struggle with abstract morphophonological patterns which cannot be inferred from the citation form alone. For instance, the top models struggle to predict whether or not a Spanish verb will undergo diphthongization under stress (e.g., negar/niego ‘to deny/I deny’ vs. pegar/pego ‘to stick/I stick’). In collaboration with professor Katharina Kann and PhD student Adam Wiemerslage at the University of Colorado, Boulder, we are developing an open-source “challenge set” for morphological generation, a set that targets complex inflectional patterns in a diverse sample of 10-20 languages. This challenge set will act as a benchmark for neural network models of inflection, and will allow us to further study inherent features and abstract morphophonological patterns. In designing this challenge set we have targeted a wide variety of morphological processes, including reduplication and templatic formation in addition to affixation and stem change. MA students Kristysha Chan, Mariana Graterol, and M. Elizabeth Garza, and PhD student Selin Alkan have all contributed to the development of this challenge set thus far.

Inflectional defectivity is the poorly-understood dark twin of productivity. With funding from the CUNY Professional Staff Congress, Emily Charde (MA 2020) is engaged in a computational study of defectivity in Greek nouns and Russian verbs.

Quiet quitting is work-to-rule but worse

This week’s hot media trend is quiet quitting, and if you’re even remotely familiar with the US labor movement, you’ll recognize this as a version of organized labor’s work-to-rule actions, in which workers do the absolute minimum amount of work required by the contract. The difference is that a quiet quitter slacks off alone, whereas work-to-rule actions are applied across organized groups of employees under similar work conditions. The Wall St. Journal is willing to tell you about the former behavior, which is youth-coded and unlikely to result in improved conditions, but is not in a hurry to tell you about traditional forms of collective labor action.

On who is allowed to graduate

There is a convention I’ve seen at several institutions whereby a (usually PhD) student who already has a job or post-doc lined up is permitted to defend a dissertation that is less complete than would otherwise be accepted were they not up against a deadline. One suspects this sort of thing is applied in a rather biased fashion, but let’s suppose it were not. I cannot see any justification for it. It produces poor science, it is bad for departmental morale and esprit de corps, and it doesn’t prepare the student for future success in an environment where their advisor can no longer put a finger on the scale.

Now it is true that advisors or committee members, for whatever reason, occasionally try to squeeze a student for one more experiment that is more of a nice-to-have than essential to the argument being made in the thesis, but it is not clear why accepting a sub-par dissertation should be a remedy for this, or why such a remedy should only be available if you have a new job starting in two weeks.

Defectivity in Kinande

[This is part of a series of defectivity case studies.]

I have already written a bit about reduplication in Kinande; it too is an example of inflectional defectivity, and here I’ll focus on that fact.

In this language, most verbs participate in a form of reduplication with the semantics of roughly ‘to hurriedly V’ or ‘to repetitively V’. Mutaka & Hyman (1990; henceforth MH) argue that the reduplicant is a bisyllabic prefix. For instance, the reduplicated form of e-ri-gend-a ‘to leave’ is e-ri-gend-a-gend-a ‘to leave hurriedly’, in which the first gend-a is the reduplicant. (In MH’s terms, e- is the “augment”, ri- the “prefix”, and -a the “final vowel” morpheme.)

Certain verbal suffixes, known to Bantuists as extensions, may also be found in the reduplicant when the reduplicant would otherwise be less than bisyllabic. For instance, the passive suffix, underlyingly /-u-/, surfaces as [w] and is copied by reduplication. Thus for the verb root hum ‘beat’, the passive e-ri-hum-w-a reduplicates as e-ri-hum-w-a-hum-w-a. More interesting are the “unproductive” (MH’s term) extensions.1 Verbs bearing these extensions rarely have a compositional semantic relationship with their unextended form (if an unextended verb stem exists at all). For instance, whereas luh-uk-a ‘take a rest’ may be semantically related to luh-a ‘be tired’, there is no unextended *bát-a to go with bát-uk-a ‘move’.

Interesting things happen when we try to reduplicate monosyllabic verb roots bearing unproductive extensions. For some such verbs, the extension is not reduplicated; e.g., e-rí-bang-uk-a ‘to jump about’ has a reduplicated form e-rí-bang-a-bang-uk-a. This is the same behavior found for “productive” extensions. For others, the extension is reduplicated, producing a trisyllabic (instead of the normal bisyllabic) reduplicant; e.g., e-ri-hur-ut-a ‘to snore’ has a reduplicated form e-ri-hur-ut-a-hur-ut-a. Finally, there are some stems, all monosyllabic verb roots with unproductive extensions, which do not undergo reduplication at all; e.g., e-rí-bug-ul-a ‘to find’ does not reduplicate, and neither *e-rí-bug-a-bug-ul-a nor *e-rí-bug-ul-a-bug-ul-a exists.

While one could imagine that there are certain semantic restrictions on reduplication, as in Chaha, MH make no mention of such restrictions in Kinande. If possible, we should rule this out as an explanation for the aforementioned defectivity.

Endnotes

  1. I will segment these with hyphens though it may make sense to regard some unproductive extensions as part of morphologically simplex stems.

References

Mutaka, N. and Hyman, L. M. 1990. Syllables and morpheme integrity in Kinande reduplication. Phonology 7: 73-119.

re.compile is otiose

Unlike its cousins Perl and Ruby, Python has no literal syntax for regular expressions. Whereas one can express the sheep language /baa+/ with a simple forward-slashed literal in Perl and Ruby, in Python one has to compile them using the function re.compile, which produces objects of type re.Pattern. Such objects have various methods for string matching.

import re

sheep = re.compile(r"baa+")
assert sheep.match("baaaaaaaa")

Except, one doesn’t actually have to compile regular expressions at all, as the documentation explains:

Note: The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.

What this means is that in the vast majority of cases, re.compile is otiose (i.e., unnecessary). One can just define expression strings, and pass them to the equivalent module-level functions rather than using the methods of re.Pattern objects.

import re

sheep = r"baa+"
assert re.match(sheep, "baaaaaaaa")

This, I would argue, is slightly easier to read, and certainly no slower. It also makes typing a bit more convenient since str is easier to type than re.Pattern.

Now, I am sure there is some usage pattern which would favor explicit re.compile, but I have not encountered one in code worth profiling.
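Readers who want to check the equivalence, and the speed claim, on their own setup might use something like the following sketch. It relies only on re and timeit from the standard library. The test strings are arbitrary, and timings vary by machine and Python version, so none are reported here.

```python
import re
import timeit

SHEEP = r"baa+"               # plain pattern string
SHEEP_RE = re.compile(SHEEP)  # explicitly compiled re.Pattern

texts = ["baa", "baaaaaaaa", "ba", "moo"]

# Both styles agree on every input.
module_level = [bool(re.match(SHEEP, t)) for t in texts]
compiled = [bool(SHEEP_RE.match(t)) for t in texts]
assert module_level == compiled == [True, True, False, False]

# Rough timing of each style on the same input.
t_module = timeit.timeit(lambda: re.match(SHEEP, "baaaaaaaa"), number=10_000)
t_compiled = timeit.timeit(lambda: SHEEP_RE.match("baaaaaaaa"), number=10_000)
print(f"module-level: {t_module:.4f}s; compiled: {t_compiled:.4f}s")
```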

Defectivity in Polish

[This is part of a series of defectivity case studies.]

Gorman & Yang (2019), following up on a tip from Margaret Borowczyk (p.c.), discuss inflectional gaps in Polish declension. In this language, masculine genitive singular (gen.sg.) forms are marked either with -a or -u. The two gen.sg. suffixes have a similar type frequency, and neither appears to be more default-like than the other. For instance, both allomorphs are used with loanwords. Because of this, it is generally agreed that the gen.sg. allomorphy is purely arbitrary and must be learned by rote, a process that continues into adulthood (e.g., Dąbrowska 2001, 2005).

Kottum (1981: 182) reports that his informants have no gen.sg. for masculine-gender toponyms like Dublin ‘id.’ (e.g., *Dublina/*Dublinu), Göteborg ‘Gothenburg’ and Tarnobrzeg ‘id.’, and Gorman & Yang (2019: 184) report their informants do not have a gen.sg. for words like drut ‘wire’ (e.g., *druta/*drutu, though the latter is prescribed), rower ‘bicycle’, balon ‘balloon’, karabin ‘rifle’, autobus ‘bus’, and lotos ‘lotus flower’.

References

Dąbrowska, E. 2001. Learning a morphological system without a default: The Polish genitive. Journal of Child Language 28: 545-574.
Dąbrowska, E. 2005. Productivity and beyond: mastering the Polish genitive inflection. Journal of Child Language 32:191-205.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (eds.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Kottum, S. S. 1981. The genitive singular form of masculine nouns in Polish. Scando-Slavica 27: 179-186.