Thought experiment #1

A non-trivial portion of what we know about the languages we speak includes information about lexically-arbitrary behaviors, behaviors that are specific to certain roots and/or segments and absent in other superficially-similar roots and/or segments. One of the earliest examples is the failure of English words like obesity to undergo Chomsky & Halle’s (1968: 181) rule of trisyllabic shortening: compare serene-serenity to obese-obesity (Halle 1973: 4f.). Such phenomena are very common in the world’s languages. Some of the well-known examples include Romance mid-vowel metaphony and the Slavic fleeting vowels, which delete in certain phonological contexts.1

Linguists have long claimed (e.g., Harris 1969) that one cannot predict whether a Spanish e or o in the final syllable of a verb stem will or will not undergo diphthongization (to ie or ue, respectively) when stress falls on the stem rather than the desinence. For instance negar ‘to deny’ diphthongizes (niego ‘I deny’, *nego) whereas the superficially similar pegar ‘to stick to s.t.’ does not (pego ‘I stick to s.t.’, *piego). There is no reason to suspect that the preceding segment (n vs. p) has anything to do with it; the Spanish speaker simply needs to memorize which mid vowels diphthongize.2 The same is arguably true of the Polish fleeting vowels known as yers, which delete in, among other contexts, the genitive singular (gen.sg.) of masculine nouns. Thus sen ‘dream’ has a gen.sg. snu, with deletion of the internal e, whereas the superficially similar basen ‘pool’ has a gen.sg. basenu, retaining the internal e (Rubach 2016: 421). Once again, the Polish speaker needs to memorize whether or not each e deletes.

So as to not presuppose a particular analysis, I will refer to segments with these unpredictable alternations—diphthongization in Spanish, deletion in Polish—as magical. Exactly how this magic ought to be encoded is unclear.3 One early approach was to exploit the feature system so that they were underlyingly distinct from non-magical segments. These “exploits” might include mapping magical segments onto gaps in the surface segmental inventory, underspecification, or simply introducing new features. Nowadays, phonologists are more likely to use prosodic prespecification. For instance, Rubach (1986) proposes that the Polish yers are prosodically defective compared to non-alternating e.4 Others have claimed that magic resides in the morph, not the segment.
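To make the encoding question concrete, here is a minimal Python sketch of the bluntest possible option, a lexical diacritic. The dictionary MAGICAL, the function first_singular, and the bare-stem spellings are assumptions introduced purely for illustration; the sketch is not meant to stand in for any of the analyses cited above.

```python
# Hypothetical toy lexicon: True marks a stem whose final mid vowel is "magical".
MAGICAL = {"neg": True, "peg": False}

# Stressed outcomes for magical mid vowels.
DIPHTHONGS = {"e": "ie", "o": "ue"}


def first_singular(stem: str) -> str:
    """Builds the 1sg. present indicative, where stress falls on the stem."""
    form = stem
    if MAGICAL[stem]:
        # Diphthongize the last mid vowel of the stem.
        for i in range(len(form) - 1, -1, -1):
            if form[i] in DIPHTHONGS:
                form = form[:i] + DIPHTHONGS[form[i]] + form[i + 1:]
                break
    return form + "o"


print(first_singular("neg"))  # niego
print(first_singular("peg"))  # pego
```

The only thing distinguishing the two outputs is the stored boolean; delete it and no function of the segments alone can separate negar from pegar.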

Regardless of how the magic is encoded, it is a deductive necessity that it be encoded somehow. Clearly something is representationally different in negar and pegar, and sen and basen. Any account which discounts this will be descriptively inadequate. To make this a bit clearer, consider the following thought experiment:

We are contacted by a benign, intelligent alien race, carbon-based lifeforms from the Rigel system with feliform physical morphology and a fondness for catnip. Our scientists observe that they exhibit a strange behavior: when they imbibe fountain soda, their normally-green eyes turn yellow, and when they imbibe soda from a can, their eyes turn red. Scientists have not yet been able to determine the mechanisms underlying these behaviors.

What might we reason about the alien’s seemingly magical soda sense? If we adopt a sort of vulgar uniformitarianism—one which rejects outlandish explanations like time travel or mind-reading—then the only possible explanation remaining to us is that there really is something chemically distinct between the two classes of soda, and the Rigelian sensory system is sensitive to this difference.

Really, this deduction isn’t so different from the one made by linguists like Harris and Rubach: both observe different behaviors and posit distinct entities to explain them. Of course, there is something ontologically different between the two types of soda and the two types of Polish e. The former is a purely chemical difference; the latter arises  because the human language faculty turns primary linguistic data, through the epistemic process we call first language acquisition, into one type of meat (brain tissue), and that type of meat makes another type of meat (the articulatory apparatus) behave in a way that, all else held equal, will recapitulate the primary linguistic data. But both of these deductions are equally valid.

Endnotes

  1. Broadly-similar phenomena previously studied include fleeting vowels in Finnish, Hungarian, Turkish, and Yine, ternary voice contrasts in Turkish, possessive formation in Huichol, and passive formation in Māori.
  2. For simplicity I put aside the arguments by Pater (2009) and Gouskova (2012) that morphs, not segments, are magical. While I am not yet convinced by their arguments, everything I have to say here is broadly consistent with their proposal.
  3. This is yet another feature of language that is difficult to falsify. But as Ollie Sayeed once quipped, the language faculty did not evolve to satisfy a vulgar Popperian falsificationism.
  4. Specifically, Rubach assumes that the non-alternating e’s have a prespecified mora, whereas the alternating e’s do not.

References

Chomsky, N. and Halle, M. 1968. The Sound Pattern of English. Harper & Row.
Gouskova, M. 2012. Unexceptional segments. Natural Language & Linguistic Theory 30: 79-133.
Halle, M. 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4: 3-16.
Harris, J. 1969. Spanish Phonology. MIT Press.
Pater, J. 2009. Morpheme-specific phonology: constraint indexation and inconsistency resolution. In S. Parker (ed.), Phonological Argumentation: Essays on Evidence and Motivation, pages 123-154. Equinox.
Rubach, J. 1986. Abstract vowels in three-dimensional phonology: the yers. The Linguistic Review 5: 247-280.
Rubach, J. 2016. Polish yers: Representation and analysis. Journal of Linguistics 52: 421-466.

Asymmetries in Latin glide formation

Let us assume, as I have in the past, that the Classical Latin glides [j, w] are allophones of the short high monophthongs /i, u/. Then, any analysis of this allophony must address the following four asymmetries between [j] and [w]:

  1. Intervocalic /i/ is [j.j], as in peior [pej.jor] ‘worse’; intervocalic /u/ is simple.
  2. Intervocalically, /iu/ is realized as [jw], as in laeua [laj.wa] ‘left, leftwards’ (fem. nom.sg.), but /ui/ is realized as [wi], as in pauiō [pa.wi.oː] ‘I beat’.
  3. /u/ preceded by a liquid and followed by a vowel is also realized as [w], as in ceruus [ker.wus] ‘deer’ and silua [sil.wa] ‘forest’, but /i/ is never realized as a glide in this position.
  4. There are two cases in which [u] alternates with [w] (the adjectival suffix /-u-/ is realized as [-w-] when preceded by a liquid, as in caluus [kal.wus] ‘bald’, and the perfect suffix /-u-/ is realized as [-w-] in “thematic” stems like cupīuī [ku.piː.wiː] ‘I desired’); there are no alternations between [i] and [j].

What rules give rise to these asymmetries?

A theory of error analysis

Manual error analyses can help to identify the strengths and weaknesses of computational systems, ultimately suggesting future improvements and guiding development. However, they are often treated as an afterthought or neglected altogether. In three recent papers, my coauthors and I have been slowly developing what might be called a theory of error analysis. The systems evaluated include:

  • number normalization (Gorman & Sproat 2016); e.g., mapping 97000 onto quatre vingt dix sept mille,
  • inflection generation (Gorman et al. 2019); e.g., mapping pairs of citation form and inflectional specification like (aufbauen, V;IND;PRS;2) onto inflected forms like baust auf, and
  • grapheme-to-phoneme conversion (Lee et al. 2020); e.g., mapping orthographic forms like almohadilla onto phonemic or phonetic forms like /almoaˈdiʎa/ and [almoaˈðiʎa].

While these are rather different types of problems, the systems all have one thing in common: they generate linguistic representations. I discern three major classes of error such systems might make.

  • Target errors are only apparent errors; they arise when the gold data, the data to be predicted, is linguistically incorrect. This is particularly likely to arise with crowd-sourced data, though such errors are also present in professionally annotated resources.
  • Linguistic errors are caused by misapplication of independently attested linguistic behaviors to the wrong input representations.
    • In the case of number normalization, these include using the wrong agreement affixes in Russian numbers; e.g., nom.sg. *семьдесят миллион for gen.pl. семьдесят миллионов ‘seventy million’ (Gorman & Sproat 2016:516).
    • In inflection generation, these are what Gorman et al. (2019) call allomorphy errors; e.g., overapplying ablaut to the Dutch weak verb printen ‘to print’ to produce a preterite *pront instead of printte (Gorman et al. 2019:144).
    • In grapheme-to-phoneme conversion, these include failures to apply allophonic rules; e.g., in Korean, 익명 ‘anonymity’ is incorrectly transcribed as [ikmjʌ̹ŋ] instead of [iŋmjʌ̹ŋ], reflecting a failure to apply a rule of obstruent nasalization not indicated in the highly abstract hangul orthography (Lee et al. under review).
  • Silly errors are those errors which cannot be analyzed as either target errors or linguistic errors. These have long been noted as a feature of neural network models (e.g., Pinker & Prince 1988, Sproat 1992:216f. for discussion of *membled) and occur even with modern neural network models.

I propose that this tripartite distinction is a natural starting point when building an error taxonomy for many other language technology tasks, namely those that can be understood as generating linguistic sequences.
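As a rough illustration of how the three classes might be recorded during a manual analysis, here is a small Python sketch. The names ErrorClass, ANNOTATIONS, and tally are hypothetical, and the class labels are assigned by hand, since deciding which class an error belongs to is exactly the part that requires a linguist.

```python
from collections import Counter
from enum import Enum


class ErrorClass(Enum):
    TARGET = "target"          # the gold data is itself wrong
    LINGUISTIC = "linguistic"  # an attested behavior applied to the wrong input
    SILLY = "silly"            # everything else


# Hand-labeled (predicted, gold, class) triples, taken from the examples above.
ANNOTATIONS = [
    ("семьдесят миллион", "семьдесят миллионов", ErrorClass.LINGUISTIC),
    ("pront", "printte", ErrorClass.LINGUISTIC),
    ("ikmjʌ̹ŋ", "iŋmjʌ̹ŋ", ErrorClass.LINGUISTIC),
]


def tally(annotations):
    """Counts how many annotated errors fall into each class."""
    return Counter(label for _, _, label in annotations)


counts = tally(ANNOTATIONS)
for error_class in ErrorClass:
    print(error_class.value, counts[error_class])
```

Keeping the predicted and gold strings alongside the label makes it easy to revisit a judgment later, for instance when a suspected silly error turns out to be a target error.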

References

K. Gorman, A. D. McCarthy, R. Cotterell, E. Vylomova, M. Silfverberg, and M. Markowska (2019). Weird inflects but OK: making sense of morphological generation errors. In CoNLL, 140-151.
K. Gorman and R. Sproat (2016). Minimally supervised number normalization. Transactions of the Association for Computational Linguistics 4: 507-519.
J. L. Lee, L. F.E. Ashby, M. E. Garza, Y. Lee-Sikka, S. Miller, A. Wong, A. D. McCarthy, and K. Gorman (under review). Massively multilingual pronunciation mining with WikiPron.
S. Pinker and A. Prince (1988). On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition 28(1–2):73–193.
R. Sproat (1992). Morphology and computation. Cambridge: MIT Press.

Is formal phonology in trouble?

I recently attended the 50th meeting of the North East Linguistics Society (NELS), which is not so much a society as a prestigious generative linguistics conference. In recognition of the golden jubilee, Paul Kiparsky gave a keynote in which he managed to reconstruct nearly all of the NELS 1 schedule, complete with at least one handout, from a talk by Anthony Kroch and Howard Lasnik. Back then, apparently, handouts were just examples: no prose.

In his talk, Paul presented a graph showing that phonology accounts for an increasingly small number of papers at NELS, and that the gap has in fact widened over the last few decades. Paul proposed something of an explanation: that the introduction of Optimality Theory (OT) and its rejection of “derivational” explanations has forever introduced a schism between phonology and other subareas, and that syntacticians and semanticists are simply uncomfortable with the non-derivational nature of modern phonological theorizing.

With all due respect, I do not find this explanation probable. As he admits, most OT theorizing (including his own) now actually rejects the earlier rejection of derivational explanations. And on the other hand, modern syntactic theories are a heady brew of derivational (phases, copy theory, etc.) and non-derivational (move α, uninterpretable feature matching, etc.) thinking. And finally it’s not really clear why the aesthetic preferences of syntacticians (if that’s all they are) should produce the data, i.e., fewer phonology papers at NELS.

But I do agree that OT is the elephant in the room, responsible for an enormous amount of fragmentation in phonological theorizing.

I would liken Prince & Smolensky’s “founding document” (1993) to Martin Luther’s Ninety-five Theses. Scholars believe that Luther wished to start a scholarly theological debate rather than a popular revolution, and I suspect the founders of OT were similarly surprised with the enormous impact their proposal had on the field. Luther’s magnificent heresy may have failed to move the Church in the directions he wished, but he is the father of hundreds if not thousands of Protestant sects, each with their own new and vibrant “heresies”. The founders of OT, I think, are similarly unable to put the cat back into the bag (if they wish to at all).

In my opinion, OT’s early rejection of derivationalism has been an enormous empirical failure, and the full-blown functionalistic-externalist thinking—one of the first post-OT heresies (let’s liken it to Calvinism)—is ontologically incoherent. That said, I would encourage OT believers to try more theory-comparison. The article on “Christian denominations” in Diderot’s & d’Alembert’s Encyclopédie begins with the obviously insincere suggestion that someone ought to study which of the various Protestant sects is most likely to lead to salvation. But I would sincerely love to find out which variant of OT is in fact most optimal.

[Thanks to Charles Reiss for discussion.]

Latin vowel-glide alternations

Post-war structuralist phonology greatly emphasized phonemics and largely ignored morphophonemics. But in 1959, Morris Halle’s Sound Pattern of Russian argued that the distinction between allophony and alternation has little cognitive importance, and in fact the distinction leads to an unnecessary duplication of effort. As a result of Halle’s forceful arguments, the contrast between phonemic and morphophonemic processes plays little role in modern phonological theory. I would like to go one step further and suggest that patterns of alternation are actually more principled facts than those of allophony. Simply put, a speaker must command the pattern of alternation in their language; but it is not at all clear whether they exploit allophony when constructing their lexical entries. This is highlighted most clearly by the notions of lexicon optimization, Stampean occultation, and richness of the base in Optimality Theory, though as Hale et al. (1998) note, similar points apply to rule-based theories.

In writing the Romans did not draw distinctions between the high monophthongs [i, u, iː, uː] and glides [j, w], respectively. This naturally led structuralist linguists (e.g., Hall 1946) to suggest that the glides are allophones of the high monophthongs. There are some apparent problems with this suggestion, though not all of them are fatal. One point that has largely been ignored in this discussion is that Classical Latin has at least four types of plausible alternations between high monophthongs and the corresponding glides. In this squib I review these alternations.

Deverbal -u- derivatives

There are a large number of adjectival derivatives formed from verbal stems by the addition of -u- and the appropriate agreement suffixes, e.g., masculine nominative singular (masc. nom.sg.) -u-us, feminine nom.sg. -u-a, neuter nom.sg. -u-um, and so on. These derivatives have a similar semantics to past participles (“having been Xed”) but in some cases have a secondary meaning “able to be Xed”. For example, the masc. nom.sg. form dīuiduus [diː.wi.du.us] means ‘divided’ (cf. dīuidō [diː.wi.doː] ‘I divide’) but also ‘divisible’. This is a fairly productive process, as the following examples show. (I have taken the liberty of leaving off certain further productive derivatives, such as intensified adjectives in per-.)

(1) assiduus ‘constant’, ambiguus ‘hither and thither’, annuus ‘annual’, arduus ‘elevated’, cernuus ‘bowed forward’, circumfluus ‘flowing around’ (refluus ‘ebbing’), cōnspicuus ‘visible’, contiguus ‘neighboring’, continuus ‘continuous’, dīuiduus ‘divided; divisible’ (indīuiduus ‘undivided; indivisible’), exiguus ‘strict’, fatuus ‘foolish’, incaeduus ‘uncut’, ingenuus ‘indigenous’, irriguus ‘irrigated’, mēnstruus ‘monthly’, mortuus ‘dead’ (dēmortuus ‘departed’, intermortuus ‘decayed’, praemortuus ‘prematurely dead’), mūtuus ‘borrowed’ (prōmūtuus ‘paid in advance’), nocuus ‘harmful’ (innocuus ‘harmless’), occiduus ‘westerly’, pāscuus ‘for pasturing’, perpetuus ‘perpetual’, perspicuus ‘transparent’, praecipuus ‘particular’, prōmiscuus ‘indiscriminate’, residuus ‘remaining’, riguus ‘irrigated’, strēnuus ‘brisk’, succiduus ‘sinking’, superuacuus ‘superfluous’, uacuus ‘empty’, uiduus ‘destitute’

In all the above cases …uus is read [u.us]. However, when the stem ends in a liquid [l, r] …uus is read [wus], indicating that the adjectival affix is realized as [w].

(2)
a. caluus ‘bald’, fuluus ‘reddish-yellow, tawny’, giluus ‘pale yellow’, heluus ‘honey yellow’
b. aruus ‘arable’, curuus ‘bent’ (incuruus ‘bent’), furuus ‘dark, swarthy’, paruus ‘small’, prōteruus ‘violent’, toruus ‘savage’

It is interesting to note that the contexts where -u- is realized as [w] align with a well-known allophonic generalization (Devine & Stephens 1977: 61., 134f.): a u preceded by a (tautomorphemic) coda liquid or front glide, and followed by a vowel, is realized as [w], as in silua [sil.wa] ‘forest’ or ceruus [ker.wus] ‘deer’, but is realized as a vowel when the preceding consonant is either a nasal, an obstruent, or part of a consonant cluster, as in lituus [li.tu.us] ‘trumpet’ or patruus [pa.tru.us] ‘paternal uncle’.
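The allophonic generalization just stated can be sketched in a few lines of Python. This is a toy under stated assumptions: the function name realize_u is hypothetical, inputs are strings of single-character segments with vowel length omitted, any front glide is assumed to be written j already, and a consonant counts as a coda only if a vowel immediately precedes it.

```python
VOWELS = set("aeiou")
CODA_TRIGGERS = {"l", "r", "j"}  # liquids and the front glide


def realize_u(segments: str) -> str:
    """Realizes u as w after a post-vocalic l, r, or j and before a vowel."""
    out = list(segments)
    for i, seg in enumerate(out):
        if seg != "u" or i < 2 or i + 1 >= len(out):
            continue
        prev, before_prev, nxt = out[i - 1], out[i - 2], out[i + 1]
        # The trigger must follow a vowel, so it is a coda and not part of a cluster.
        if prev in CODA_TRIGGERS and before_prev in VOWELS and nxt in VOWELS:
            out[i] = "w"
    return "".join(out)


for word in ("silua", "keruus", "lituus", "patruus"):
    print(word, "->", realize_u(word))
```

Scanning left to right matters for forms like ceruus: once the first u has become w, the second u no longer follows a liquid and so remains a vowel.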

Two residual issues remain. First, when the verbal stem ends in qu [kw], the adjectival derivative is spelled …quus. By the normal rules of spelling this would be read as [kwus], which would suggest that a zero allomorph of the adjectival suffix is selected for here.

(3) aequus ‘equal’, antīquus ‘old’, fallāciloquus ‘falsely speaking’ (fātiloquus ‘prophetic’, flexiloquus ‘ambiguous’, grandiloquus ‘grandiloquent’, magniloquus ‘boastful’, uāniloquus ‘lying’, uersūtiloquus ‘slyly speaking’), inīquus ‘unjust’, longinquus ‘distant’, oblīquus ‘slanting, oblique’, pedisequus ‘following on foot’, propinquus ‘near’, reliquus ‘remaining’

This is consistent with the metrical evidence. For instance in the following verse, aequus must be read as bisyllabic.

(4)
hoc opus hic labor est paucī quōs aequus amāuit (Verg., Aen. 6.129)
[ok.ko.pu|sik.la.bo|rest.paw|kiː.kwoː|saj.kwu.sa|maː.wit]

Secondly, there are a number of deverbal derivatives in -u-us where the verb form also has a stem-final [w]. In this case we also observe [wus].

(5)
a. cauus [ka.wus] ‘hollowed; hollow’ (concauus ‘hollow’); cf. cauō [ka.woː] ‘I excavate’
b. flāuus [flaː.wus] ‘yellow, gold, blonde’ (sufflāuus ‘yellowish’); cf. flāueō [flaː.we.oː] ‘I am yellow’
c. (g)nāuus [naː.wus] ‘active’ (īgnāuus ‘lazy’); cf. nāuō [naː.woː] ‘I do s.t. enthusiastically’
d. nouus [no.wus] ‘new’; cf. nouō [no.woː] ‘I renew’
e. saluus [sal.wus] ‘safe; well’; cf. salueō [sal.we.oː] ‘I am well’
f. uīuus [wiː.wus] ‘living’ (rediuīuus ‘restored to life’); cf. uīuō [wiː.woː] ‘I live’

This may be another context where the adjectival suffix has a zero allomorph, though it is not clear whether we are looking at the same derivational process as above.

The foregoing discussion leads me to posit a deverbal adjective-forming suffix /-u-/ with two phonologically-predictable allomorphs: [w] after liquids, and zero after [kw] and, possibly, [w].
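The distribution of the adjectival suffix described in this section can be summarized as a small selection function. This is only a sketch: the name u_adjective and the broad-transcription stems are assumptions, and the requirement that the liquid follow a vowel is borrowed from the allophonic generalization so that cluster-final stems like that of mēnstruus keep the vocalic allomorph.

```python
VOCALIC = set("aeiouː")  # vowels plus the length mark


def u_adjective(stem: str) -> str:
    """Returns the masc. nom.sg. of a -u- adjective built on a transcribed stem."""
    if stem.endswith(("kw", "w")):
        allomorph = ""   # zero after [kw] and (possibly) [w]
    elif stem[-1] in "lr" and len(stem) >= 2 and stem[-2] in VOCALIC:
        allomorph = "w"  # glide after a post-vocalic liquid
    else:
        allomorph = "u"  # vowel elsewhere
    return stem + allomorph + "us"


for stem in ("kal", "kur", "ajkw", "now", "diːwid", "meːnstr"):
    print(stem, "->", u_adjective(stem))
```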

The “third stem”

Schoolchildren learning Latin memorize four forms (or principal parts) of each verb: the first person singular (1sg.) present active indicative (e.g., amō ‘I love’), the present infinitive (amāre ‘to love’), the 1sg. perfect active (amāuī ‘I loved’), and the perfect passive participle (amātus masc. nom.sg. ‘loved’). The first two principal parts effectively index the so-called “present stem” of the verb, and the third principal part gives the so-called “perfect stem”. The relationship between the present and perfect stem is often unpredictable. Some perfect stems lengthen a monophthong in the final syllable of the present stem (e.g., legō/lēgī ‘I choose/chose’); some perfect stems omit a post-vocalic nasal in the final syllable with concomitant lengthening (uincō/uīcī ‘I win/won’); some are mutated by the addition of a -s- perfect suffix (dīcō/dīxī [diː.koː, diːk.siː] ‘I say/said’); others bear a CV-reduplication prefix, and so on. This has led some to suggest that the latter two stems are essentially “listed” or “stored” for all verbs. This is, for instance, the position of Lieber (1980:141f., 152f.), but has been disputed by Aronoff (1994: chap. 2) and Steriade (2012), among others, who claim there are many productive regularities in both cases.

The majority of verbs have perfects that consist of the bare verb root, the theme vowel, a high back vocoid perfect suffix, and the appropriate person-number agreement suffixes (e.g., 1sg. -ī-). The perfect suffix is preceded by a theme vowel and as the appropriate agreement suffixes are all vowel-initial, it is always intervocalic. Allophonically, this is a context where [u] is never found but [w] is, and this is what we find here: amāuī [a.maː.wiː] ‘I loved’. This type of perfect is in fact found in all conjugations, and found in the overwhelming majority of 1st (-ā- theme vowel) and 4th conjugation (-ī-) verbs (Aronoff 1994:43f.).

(6)
a. cōnsōlāuī [kon.soː.laː.wiː], portāuī [por.taː.wiː] ‘I carried’
b. dēlēuī [deː.leː.wiː] ‘I destroyed’, plēuī [pleː.wiː] ‘I filled up’
c. cupīuī [ku.piː.wiː] ‘I desired’, petīuī [pe.tiː.wiː] ‘I sought’
d. audīuī [aw.diː.wiː] ‘I listened to’, mūnīuī [muː.niː.wiː] ‘I fortified’

However, there is an alternative formulation in which the theme vowel is omitted,  placing the perfect suffix to the right of a consonant, and in this context it is instead realized as [u]. This type of perfect is also found in all conjugations but is most common in the 2nd (-ē-) conjugation.

(7)
a. domuī [do.mu.iː] ‘I tamed’, uetuī [we.tu.iː] ‘I forbade’
b. docuī [do.ku.iː] ‘I taught’, tenuī [te.nu.iː] ‘I held’
c. rapuī [ra.pu.iː] ‘I snatched’, texuī [tek.su.iː] ‘I wove’
d. aperuī [a.pe.ru.iː] ‘I opened’, saluī [sa.lu.iː] ‘I leapt’

Together the patterns in (6-7) account for the vast majority of perfects in all conjugations except the 3rd (itself a grab-bag of etymologically dissimilar verbs).

I propose that the default perfect suffix is /-u-/ and that it undergoes glide formation to [w] in (6), in intervocalic position, a generalization consistent with the allophonic facts. In (7), when adjacent to the verb root, glide formation is blocked. However, the examples in (7) cannot take a “free ride” on any allophonic generalization. As can be seen in (7d), the perfect suffix does not form [l.w, r.w] syllable contact clusters, unlike the adjectival suffix in (2). There is a surfeit of possible analyses for the failure of glide formation in this context: it might be an effect specific to the perfect suffix or to the category of verb, or the result of cyclicity or phase-based spellout. We leave the question open for now.
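The proposal for the perfect suffix can be sketched as follows, assuming broad transcriptions in which ː marks length. The function name perfect_1sg and the split into root and theme vowel are illustrative assumptions; the sketch covers only the two patterns in (6-7) and simply stipulates, as the text does, that the suffix stays vocalic after any root-final consonant, liquids included.

```python
VOWELS = set("aeiou")


def perfect_1sg(root: str, theme: str = "") -> str:
    """Builds a 1sg. perfect from a root, an optional theme vowel, /-u-/, and -iː."""
    stem = root + theme
    # Ignore the length mark when inspecting the stem-final segment.
    last = stem.rstrip("ː")[-1]
    # Intervocalic /u/ glides to [w]; after a consonant it surfaces as [u].
    suffix = "w" if last in VOWELS else "u"
    return stem + suffix + "iː"


print(perfect_1sg("am", "aː"))  # amaːwiː, as in (6)
print(perfect_1sg("dom"))       # domuiː, as in (7a)
print(perfect_1sg("aper"))      # aperuiː, as in (7d): no glide after the liquid
```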

The “fourth stem”

The form of the perfect passive participle, the fourth principal part, is similarly problematic. For many verbs, the perfect passive participle is formed by adding to the verb root a -t- suffix and the appropriate agreement suffixes (e.g., in citation form, the masc. nom.sg. -us), once again sometimes accompanied by lengthening of the stem-final vowel and/or leftward voice assimilation (an exception-less rule of Latin) triggered by the -t- as in (8b).

(8)
a. docuī [do.ku.iː] ‘I taught’, doctus [dok.tus] masc. nom.sg. ‘taught’
b. tegō [te.goː] ‘I clothe’, tēctus [teːk.tus] masc. nom.sg. ‘clothed’

Two verb roots end in a consonant followed by a high back vocoid and form a -t- perfect passive participle: soluō [sol.woː] ‘I loosen; I explain’ and uoluō [wol.woː] ‘I roll’. This places the root-final high back vocoid, by hypothesis /u/, between two consonants, a context where glides are forbidden. The result is solūtus [so.luː.tus] and uolūtus [wo.luː.tus]. However, it should be noted that this particular pattern is limited to these two verbs and their derivatives, and that the long ū is unexpected unless it reflects stem vowel lengthening (cf. tēctus above).

Synizesis and diaeresis

Latin poetry exhibits variation in glide formation. (The following examples are all drawn from Lehmann 2005.) Synizesis, the unexpected overapplication of glide formation in response to the meter, can be seen in the following verse.

(9)
tenuis ubī argilla et dūmōsīs calculus aruīs (Verg., G. 2.180)
[ten.wi.su|biːr.gil|let.duː|moː.siːs|kal.ku.lu|sar.wiːs]

In this verse, tenuis ‘thin’ occurs initially, which requires that the first syllable be heavy. The only way to accomplish this is to read it as the bisyllabic [ten.wis] rather than the expected trisyllabic [te.nu.is]. Similarly, in another verse (Verg., Aen. 8.599), abiēte, the ablative singular of abiēs ‘silver fir’, must be read as trisyllabic [ab.jeː.te] rather than the expected [ab.i.eː.te].
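The metrical reasoning here rests on syllable weight, which can be read directly off a syllabified transcription. The sketch below assumes the transcription conventions used in this post (syllables separated by periods, ː for length, diphthongs written with a final glide), so that a syllable is light only if it ends in a short vowel. The function name weights is hypothetical.

```python
SHORT_VOWELS = set("aeiou")


def weights(transcription: str) -> list:
    """Labels each syllable H(eavy) or L(ight) from a dotted transcription."""
    # A syllable ending in a consonant, glide, or length mark counts as heavy.
    return ["L" if syl[-1] in SHORT_VOWELS else "H"
            for syl in transcription.split(".")]


print(weights("ten.wis"))    # ['H', 'H']
print(weights("te.nu.is"))   # ['L', 'L', 'H']
print(weights("ab.jeː.te"))  # ['H', 'H', 'L']
```

Only the reading with synizesis supplies the heavy initial syllable that the verse-initial position demands.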

On the other hand, the poets also make use of diaeresis, or apparent underapplication of glide formation. For example, siluae, the genitive singular of silua ‘forest’, is in one verse (Hor., Carm. 1.23.4) read as trisyllabic [si.lu.aj] rather than as the expected bisyllabic [sil.waj]. The conditions governing synizesis and diaeresis are not yet well understood, but they constitute further evidence for the close grammatical relationship between [i ~ j] and [u ~ w] in Classical Latin.

Conclusion

We have seen four ways in which the Latin high vocoids alternate between vowels and glides. Together, these four patterns provide indirect evidence for the hypothesis that Latin glides are allophones of the corresponding high vowels, though there are some minor dissociations between patterns of allophony and alternations.

[Earlier writing about Latin glides: Latin glides and the case of “belua”]

References

Aronoff, Mark. 1994. Morphology by itself: stems and inflectional classes. Cambridge: MIT Press.
Devine, Andrew M., and Stephens, Laurence D. 1977. Two studies in Latin phonology. Saratoga: Anma Libri.
Hall, Robert A. 1946. Classical Latin noun inflection. Classical Philology 41(2): 84-90.
Hale, Mark, Kissock, Madelyn, and Reiss, Charles. 1998. Output-output correspondence in Optimality Theory. In Proceedings of WCCFL, pages 223-236.
Halle, Morris. 1959. The sound pattern of Russian. The Hague: Mouton.
Lehmann, Christian. 2005. La structure de la syllabe latine. In Touratier, Christian (ed.), Essais de phonologie latine, pages 157-206. Aix-en-Provence: Publications de l’Université de Provence.
Lieber, Rochelle. 1980. On the organization of the lexicon. Doctoral dissertation, MIT.
Steriade, Donca. 2012. The cycle without containment: Latin perfect stems. Ms., MIT.

Latin glides and the case of “belua”

Latin texts leave the distinction between high monophthongs [i, u, ī, ū] and glides [j, w] unspecified. This has led some to suggest that the glides are allophones of the monophthongs. For instance, Steriade (1984) implies that the syllabicity of [+high, +vocalic] segments in Latin is largely predictable. Steriade points out two contexts where high vocoids are (almost) always glides: initially before a vowel (# __ V) and intervocalically (V __ V). In these two contexts, the only complications I am aware of arise from competition between generalizations. For instance, in ūua [uː.wa] ‘grape’ and ūuidus [uː.wi.dus] ‘damp’, intervocalic glide formation appears to bleed word-initial glide formation. (Or it could be the case that ū is ineligible for glide formation by virtue of its length.) And the behavior of two adjacent high vocoids flanked by vowels is somewhat idiosyncratic: compare naeuus [naj.wus] ‘birthmark’ and saeuiō [saj.wi.oː] ‘I am furious’, where (by hypothesis) /ViuV/ surfaces as [j.w], to dēuius [deː.wi.us] ‘devious’ and pauiō [pa.wi.oː] ‘I beat’, where (by hypothesis) /VuiV/ surfaces as [.wi] but never as *[w.j]. And so on.

However, Cser (2012) claims that syllabicity of high vocoids is not at all predictable after a consonant and before a vowel, i.e., in the context C __ V. Here we usually observe [w] when the preceding consonant is coda [j, l, r], as in the aforementioned naeuus or silua [sil.wa] ‘forest’. Cser contrasts this latter form with belua ‘wild beast’, which is trisyllabic rather than bisyllabic. However, it is not clear this is a good near-minimal pair. The word was clearly not pronounced as [be.lu.a] because the first syllable scans heavy. In the following hexameter verse, the word comprises the fifth foot, a dactyl:

et centumgeminus Briareus, ac belua Lernae (Verg., Aen. 6.287)

Lewis & Short and the Oxford Latin Dictionary both give this word as bēlua [beː.lu.a]. However, it seems much more likely that the word is in fact bellua [bel.lu.a], as it was sometimes written. (Note also that tautomorphemic geminate ll is robustly attested in Latin.) In this case we would expect glide formation to be blocked because the [lw] complex onset is totally unattested, just as Cser predicts from general principles of sonority sequencing. Thus the above verse is:

[et.ken|tũː.ge.mi|nus.bri.a|re.u.sak|bel.lu.a|ler.naj]

As Cser notes, many of the remaining near-minimal pairs occur at morphological boundaries⁠—and thus look to someone with my theoretical commitments as evidence for the phonological cycle—or relate to the complex onsets qu [kw] and su [sw], which might be treated as contour segments underlyingly. But much work will be needed to show that these apparent exceptions follow from the grammar of Latin.

References

Cser, András. 2012. The role of sonority in the phonology of Latin. In Parker, Steve (ed.), The sonority controversy, pages 39-64. Berlin: Mouton de Gruyter.
Steriade, Donca. 1984. Glides and vowels in Romanian. In Proceedings of the Berkeley Linguistics Society, pages 47-64.

Exceptions to reduplication in Kinande

Mutaka & Hyman’s (1990) study of reduplication in Kinande, a Bantu language spoken in “Eastern Zaire” (now the Democratic Republic of the Congo), is the sort of phonology study one doesn’t see much of anymore. The authors begin by noting the recent interest in reduplication phenomena, but note that most of the major work has completely ignored Bantu, an enormous language family in which nearly every language has one or more types of reduplication. Mutaka & Hyman (MH) proceed to describe Kinande reduplication in detail, with only occasional reference to other languages.

Nouns that undergo reduplication have the semantics of roughly ‘the real X’. Most Kinande verbs also undergo reduplication, with the semantics of roughly ‘to hurriedly X’ or ‘to repetitively X’. Verbal reduplication is somewhat more interesting because certain other verbal suffixes (or “extensions”, as they’re sometimes called in Bantu) may also be found in the reduplicant, argued to be a roughly-bisyllabic prefix.  For instance, the passive suffix is argued to be underlyingly /u/ but surfaces as [w], and is copied over in reduplication. Thus for the verb hum ‘beat’ the passive e-ri-hum-w-a ‘to be beaten’ reduplicates as erihumwahumwa. However, larger vowel-consonant verbal suffixes are not copied; the applied (-ir-) passive infinitive e-ri-hum-ir-w-a ‘to be beaten for’ has a reduplicated form erihumahumirwa, and for the verb tum ‘send’ the applied passive reciprocal (-an-) infinitive e-rí-tum-ir-an-w-a ‘to be sent to each other’ has a reduplicated form erítumatumiranwa (MH, 56).

What’s even more interesting to me is the behavior of verb stems with what MH call ‘unproductive’ extensions (all of which appear to be vowel-consonant). MH report that for only a small minority of these verb stems is there any plausible etymological relationship to a verb without the extension. One example is luh-uk-a ‘take a rest’ which is plausibly related to luh-a ‘be tired’ (MH, 73e), but there is no *bát-a paired with bát-uk-a ‘move’ (MH, 74d). Verb stems bearing unproductive suffixes may have one of three behaviors with respect to reduplication. For some such stems, reduplication is forbidden: eríbugula ‘to find’. For others, reduplication occurs but the ‘unproductive’ extension is stranded (the same behavior as the ‘productive’ extensions): e-rí-banguk-a ‘to jump about’ reduplicates as eríbangabanguka. Finally, some such stems (roughly half) unexpectedly build a trisyllabic (rather than bisyllabic) reduplicant consisting of the verb root and the unproductive extension: e-ri-hurut-a ‘to snore’ reduplicates as erihurutahuruta (MH, 75). This entire distribution poses a fascinating puzzle. How is the failure of reduplication encoded in the first case? What licenses the trisyllabic reduplicant in the last case?

References

Mutaka, Ngessimo and Hyman, Larry M. 1990. Syllables and morpheme integrity in Kinande reduplication. Phonology 7: 73-119.

A Morris Halle memory

Morris Halle passed away earlier today. Morris was an absolute giant in the field of linguistics. His work in the 1950s and 1960s completely revolutionized phonological theory. He did this, primarily, by rejecting an axiom of the previous century’s work. The theory of phonology was so utterly transformed by his argument against the principle of biuniqueness that the very concept is rarely even taught in the 21st century. And this was just one of his earliest scientific contributions.

I could say a lot more about Morris’s work, but instead let me tell a short anecdote. In 2010 or so I happened to be in the Boston area and my advisor kindly arranged for me to meet Morris. After getting coffee we walked to his spare shared office. The only thing of note was a single wall-mounted bookshelf containing three books: Morris’s own Sound Pattern of Russian and Sound Pattern of English (with the dust cover removed so as to exhibit the unique bas-relief cover designed by Morris’s wife, a talented visual artist) and, of course, Walker’s rhyming dictionary. For whatever reason, we started to discuss Latin. Working on a legal pad, Morris first showed me a novel analysis of thematic vowels. Ignoring a few irregular (“athematic”) stems, all Latin verb stems have a characteristic final vowel: -ā- in the first conjugation, -ē- in the second, a vowel of varying quality (usually e or i) in the third, and -ī- in the fourth. In the first conjugation and most of the third conjugation, this vowel disappears in the first person singular active indicative verb, which is marked with an -ō suffix. Thus for the second conjugation verb docēre ‘teach’, we have doceō ‘I teach’, with the theme vowel preserved, and similarly for the fourth conjugation. In contrast, for the first conjugation verb amāre ‘love’, we have amō ‘I love’, with the theme vowel omitted, and similarly for the majority of the third conjugation. This much I already knew. To me it was just one of those conjugational quirks one has to memorize when learning Latin, but Morris suggested that it was not necessarily so. What if, he argued, the first conjugation -ā- was deleted by a following -ō? (Certainly that rule is surface-true, except for a handful of Greek loanwords like chaos.) But what about the third conjugation? Morris said that he had long believed the underlying form of the third conjugation theme vowel was [+back], something like /ɨ/, and he proceeded to lay out the necessary allophonic rules, and finally a rule which deletes the first of two [+back] segments! I was floored.

I then showed him an analysis I was working on at the time. Once again ignoring a few irregulars, Latin masculine and feminine nouns of the third declension are characterized by a nominative singular suffix -s. When the noun stem is athematic and ends in /t/ or /d/, this consonant is deleted in the nominative singular (e.g., frons, frontis ‘forehead’). I argued that this rule ought to be extended to also target /r/ so as to account for the so-called “rhotic” stems like honōs, honōris ‘honor’ (e.g., /honōr-s/ → [honōs]). To make this work, one must write the rule so that it bleeds its own application (see here for the full analysis), and as one of several opaque rules. This is something which is possible in the rule-application framework proposed by Morris and colleagues, but which cannot be straightforwardly implemented in more recent theoretical frameworks. I must have hesitated for a moment as I was talking through this, because Morris grabbed my hand and said to me: “Young man, remember always to speak clearly and to never apologize for your rule ordering.” And then he bid me adieu.

Libfix report for February 2018

  • While -splain (and -splainer, -splaining) clearly have potential, they hadn’t, as far as I could tell, gotten much beyond mansplain and, occasionally, womansplain. But I changed my mind once I saw a podcast episode entitled “Orbsplainer”, about, well, the orb, you remember the orb, right? How could you forget the Orb? The Orb forbids it! Anyways, looks like a libfix to me.
  • Constantine Lignos draws my attention to -tainment, a term which refers to media (particularly video and video games) which entertains while also doing something else. The locus classicus is the ’90s term edutainment, which looks much more like a blend than a libfix, as do infotainment, politainment, and psychotainment. But pornotainment suggests this is on its way to affix liberation.

Libfix report for May 2016

Two bits of creative morphology I’ve been seeing around the city:

  • Lime-a-rita: This trademark (of Anheuser-Busch InBev) isn’t just a redundant way to refer to a margarita (which has a lime base; a non-lime “margarita” is a barbarism), but rather a “light American lager” blended with additional lime-y-ness. I have to imagine this coinage, albeit rather corporate, was helped along by the existence of the truncation ‘rita, occasionally used in casual conversation by their most committed devotees.
  • -otto: I first became aware of this through pastotto, the suggested name for a dish of pasta (perhaps penne), fried in olive oil and butter and then cooked in stock, like risotto; according to popularizer Mark Bittman, this is an old trick. Now, that one looks a bit blend-y, given that the ris- part of risotto is really a reference to arborio rice, and that the final -a in the base pasta appears to be lost in the combination. But not so much for barleyotto, which satisfies even the most stringent criteria for libfix-hood.