Optimality Theory on exceptionality

[This post is part of a series on theories of lexical exceptionality.]

I now take a large jump in exceptionality theory from the late ’70s to the mid-aughts. (I am skipping over a characteristically ’90s approach, which I’ll cover in my final post.) I will focus on a particular approach—not the only one, but arguably the most robust one—to exceptionality in Optimality Theory (OT). This proposal is as old as OT itself, but is developed most clearly in Pater (2006), and Pater & Coetzee (2005) propose how it might be learned. I will also briefly discuss the application of the morpheme-specific approach to the yer patterns characteristic of Slavic languages by Gouskova and Rysling, and Rubach’s critique of this approach.

I will have very little to say about the cophonology approach to exceptionality that has also been sporadically entertained in OT. Cophonology holds that morphemes may have arbitrarily different constraint rankings. Pater (henceforth P) is quite critical of this approach throughout his 2006 paper: among other criticisms, he regards it as completely unconstrained. I agree: it makes few novel predictions, and I would challenge cophonologists (if any exist in 2024) to consider how cophonology might be constrained so as to derive interesting predictions about exceptionality.

Indexed constraints

Even the earliest work in Optimality Theory supposed that some constraints might be specific to particular grammatical categories or morphemes. This of course is a loosening of the idea that Con, the constraint set, is language-universal and finite, but it seems to be a necessary assumption. P claims that this device is powerful enough to handle all known instances of exceptionality in phonology. The basic idea is extremely simple: for every constraint X there may also exist indexed constraints of the form X(i) whose violations are only recorded when the violation occurs in the context of some morpheme i.1 There are then two general schemas that produce interesting results.

(1) M(i) >> F >> M
(2) F(i) >> M >> F

Here M stands for markedness and F for faithfulness. As will be seen below, (1) has a close connection to the notions of mutability and catalysis introduced in my earlier post; (2) in turn has a close connection with quiescence and inalterability.
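
To make the indexation device concrete, here is a minimal sketch (in Python, with an invented toy constraint and toy forms, not anything from Pater's paper) of the one thing X(i) adds to X: violations are assessed in exactly the same way, but only counted when the offending material belongs to a morpheme on the index list.

```python
# A minimal sketch of constraint indexation (toy example).
# A surface form is a list of (segment, morpheme) pairs, so that every
# violation can be localized to the morpheme sponsoring the offending segment.

def star_voiced_coda(word):
    """Toy markedness constraint: one violation per voiced obstruent coda
    (codas are crudely identified as word-final segments here)."""
    violations = []
    if word and word[-1][0] in {"b", "d", "g", "z"}:
        violations.append(word[-1])
    return violations

def indexed(constraint, index):
    """Return a clone of `constraint` that only counts violations whose
    sponsoring morpheme is on the index list -- this is all that X(i) adds."""
    def clone(word):
        return [v for v in constraint(word) if v[1] in index]
    clone.__name__ = f"{constraint.__name__}({','.join(sorted(index))})"
    return clone

# Two hypothetical /pad/-shaped roots, only one of which is indexed.
word_indexed = [("p", "ROOT1"), ("a", "ROOT1"), ("d", "ROOT1")]
word_plain   = [("p", "ROOT2"), ("a", "ROOT2"), ("d", "ROOT2")]

m_general = star_voiced_coda
m_special = indexed(star_voiced_coda, {"ROOT1"})

for w in (word_indexed, word_plain):
    print([s for s, _ in w],
          m_general.__name__, len(m_general(w)),
          m_special.__name__, len(m_special(w)))
```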

One of P’s goals is to demonstrate that this approach can be applied to Piro syncope. His proposal is not quite as detailed as one might wish, but it is still worth discussing and trying to fill in the gaps. For P, the general syncope pattern arises from the ranking Align-Suf-C >> Max; in prose, a segment is deleted if doing so brings a suffix into contact with a preceding consonant. This also naturally derives the non-derived environment condition since it specifically mentions suffixhood. P derives the avoidance of tautomorphemic clusters, previously expressed with the VC_CV environment, with the markedness constraint *CCC. This gives us *CCC >> Align-Suf-C >> Max thus far. This should suffice for derivations whose roots are all mutable and catalytic.

For P, inalterable roots are distinguished from mutable ones by an undominated, indexed clone of Max which I’ll call Max(inalt), giving us a partial ranking like so.

(3) Max(inalt) >> Align-Suf-C >> Max

This is of course an instance of schema (2). Note that since the ranking without the indexing is just Align-Suf-C >> Max, it seemingly treats mutability as the default and inalterability as exceptional, a point I’ll return to shortly.

Quiescent roots in P’s analysis are distinguished from catalytic ones by a lexically specific clone of Align-Suf-C; here the lexically indexed one targets the catalytic suffixes, so we’ll write it Align-Suf-C(cat), giving us the following partial ranking.

(4) Align-Suf-C(cat) >> Max >> Align-Suf-C

This is an instance of schema (1). It is interesting to note that the Align constraint bridges the distinction between target and trigger, since the markedness is a property of the boundary itself. Note also that this ranking seems to treat quiescence as the default and catalysis as exceptional.

Putting this together we obtain the full ranking below.

(5) *CCC, Max(inalt) >> Align-Suf-C(cat) >> Max >> Align-Suf-C

P, unfortunately, does not take the time to compare his analysis to Kisseberth’s (1970) proposal, or to contrast it with Zonneveld’s (1978) critiques, which I discussed in detail in the earlier post. I do observe one potential improvement over Kisseberth. Recall that Kisseberth had trouble with the example /w-čokoruha-ha-nu-lu/ [wčokoruhahanru] ‘let’s harpoon it’, because /-ha/ is quiescent and having a quiescent suffix in the left environment is predicted counterfactually to block deletion in /-nu/. As far as I can tell this is not a problem for P; the following suffix /-lu/ is on the Align-Suf-C(cat) lexical list and /-nu/ is not on the Max(inalt) list, and that’s all that matters. Presumably, P gets this effect because the joint operation of the two flavors of Align-Suf-C and *CCC properly localizes the catalysis/quiescence component of the exceptionality. However, P’s analysis does not seem to generate the right-to-left application; it has no reason to favor the attested /n-xipa-lu-ne/ [nxipalne] ‘my sweet potato’ over *[nxiplune]. This reflects a general issue in OT in accounting for directional application.
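
Since P does not provide tableaux for these forms, here is a minimal sketch of how ranking (5) evaluates candidates for /n-xipa-lu-ne/. The constraint implementations are my own reconstructions (Align-Suf-C is taken to assign one violation per suffix whose left edge is not immediately preceded by a consonant, *CCC one violation per triconsonantal sequence), so treat this as an illustration rather than P's analysis; the point is just that the attested [nxipalne] and the unattested *[nxiplune] receive identical violation profiles.

```python
# A sketch of evaluation under ranking (5) for /n-xipa-lu-ne/ 'my sweet potato'.
# Candidates are lists of (morpheme, surface form) pairs; the constraint
# definitions are my reconstructions, not P's.

VOWELS = set("aeiou")

def segments(cand):
    return [s for _, form in cand for s in form]

def star_ccc(cand):
    segs = segments(cand)
    return sum(1 for i in range(len(segs) - 2)
               if all(s not in VOWELS for s in segs[i:i + 3]))

def align_suf_c(cand, only=None):
    """One violation per suffix whose left edge is not preceded by a consonant;
    `only` restricts the constraint to an indexed list of suffixes."""
    viols, preceding = 0, ""
    for morph, form in cand:
        if morph.startswith("-") and form and (only is None or morph in only):
            if not preceding or preceding[-1] in VOWELS:
                viols += 1
        preceding += form
    return viols

def max_(cand, only=None):
    return sum(len(INPUTS[m]) - len(form) for m, form in cand
               if only is None or m in only)

INPUTS = {"n-": "n", "xipa": "xipa", "-lu": "lu", "-ne": "ne"}
CATALYTIC = {"-lu", "-ne"}   # indexed to Align-Suf-C(cat)
INALTERABLE = set()          # no morpheme here is indexed to Max(inalt)

def evaluate(cand):          # violation profile, top stratum first, as in (5)
    return (star_ccc(cand),
            max_(cand, only=INALTERABLE),        # Max(inalt)
            align_suf_c(cand, only=CATALYTIC),   # Align-Suf-C(cat)
            max_(cand),                          # Max
            align_suf_c(cand))                   # Align-Suf-C

candidates = {
    "nxipalune": [("n-", "n"), ("xipa", "xipa"), ("-lu", "lu"), ("-ne", "ne")],
    "nxipalne":  [("n-", "n"), ("xipa", "xipa"), ("-lu", "l"),  ("-ne", "ne")],
    "nxiplune":  [("n-", "n"), ("xipa", "xip"),  ("-lu", "lu"), ("-ne", "ne")],
    "nxiplne":   [("n-", "n"), ("xipa", "xip"),  ("-lu", "l"),  ("-ne", "ne")],
}

for name, cand in candidates.items():
    print(name, evaluate(cand))
# The attested [nxipalne] and unattested *[nxiplune] come out with the same
# profile (0, 0, 1, 1, 1), so nothing in (5) chooses between them.
```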

As I mentioned above, P’s analysis of Piro treats mutability and quiescence as productive and inalterability and catalysis as exceptional. Indeed, it predicts mutability and quiescence in the absence of any indexing, and one might hypothesize that Piro speakers would treat a new suffix of the appropriate shape as mutable and quiescent. I know of no reason to suppose this is correct; for Matteson (1965), these are arbitrary and there is no obvious default, whereas my impression is that Kisseberth views mutability (like P) and catalysis (unlike P) as the default. This question of productivity is one that I’ll return to below as I consider how indexing might be learned.

Learning indexed constraints

Pater and Coetzee (2005, henceforth P&C) propose that indexed constraint rankings can be learned using a variant of the Biased Constraint Demotion (BCD) algorithm developed earlier by Prince and Tesar (2004). Most of the details of that algorithm are not strictly relevant here; I will focus on the ones that are. BCD supposes that learners are able to accumulate UR/SR pairs and then use the current state of their constraint hierarchy to record them as a data structure called a mark-data pair. These give, for each constraint violation, whether that violation prefers the actual SR or a non-optimal candidate. From a collection of these pairs it is possible to rank constraints via iterative demotion.2 The presence of lexical exceptionality produces a case where it is not possible for vanilla BCD to advance the demotion because a conflict exists: some morphemes favor one ranking whereas others favor another. P&C propose that in this scenario, indexed constraints will be introduced to resolve the conflict.

P&C are less than formal in specifying how this cloning process works, so let us consider how it might function. Their example, a toy, concerns syllable shape. They suppose that they are dealing with a language in which /CVC/ is marked (via NoCoda) but there are a few words of this shape which surface faithfully (via Max). They suppose that this results in a ranking paradox which cannot be resolved with the existing constraints. As stated, I have to disagree: their toy provides no motivation for NoCoda >> Max.3 Let us suppose, though, for the sake of argument that there is some positive evidence, after all, for that ranking. Perhaps we have the following.

(6) Toy grammar (after P&C):
a. /kap/ -> [ka]
b. /gub/ -> [gu]
c. /net/ -> [net]
d. /mat/ -> [mat]

Let us also suppose that there is some positive evidence that /kap, gub/ are the correct URs so they are not changed to faithful URs via Lexicon Optimization. Then, (6ab) favor NoCoda >> Max but (6cd) favor Max >> NoCoda. P&C suppose this is resolved by cloning (i.e., generating an indexed variant of) Max, producing a variant for each faithfully-surfacing /CVC/ morpheme. If these morphemes are /net/ and /mat/, then we obtain the following partial ranking after BCD.

(7) Max(net), Max(mat) >> NoCoda >> Max

This is another instance of schema (2); there are just multiple indexed constraints in the highest stratum. Indeed, P&C imagine various mechanisms by which Max(net) and Max(mat) might be collapsed or conflated at a later stage of learning.
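
P&C give no pseudocode, so the sketch below is my own reconstruction of the logic, using plain Recursive Constraint Demotion rather than their biased variant and nothing beyond the toy data in (6): with just Max and NoCoda the winner-loser comparisons are inconsistent, and cloning Max for the faithfully surfacing morphemes restores consistency, yielding (7).

```python
# A sketch of the ranking inconsistency in (6) and its resolution by cloning.
# Plain Recursive Constraint Demotion (RCD) stands in for P&C's biased BCD.

DATA = [  # (morpheme, underlying form, winner, loser)
    ("kap", "kap", "ka",  "kap"),
    ("gub", "gub", "gu",  "gub"),
    ("net", "net", "net", "ne"),
    ("mat", "mat", "mat", "ma"),
]

def no_coda(morph, ur, sr):
    return 1 if sr and sr[-1] not in "aeiou" else 0

def max_(morph, ur, sr):
    return len(ur) - len(sr)

def indexed_max(index):
    return (f"Max({index})",
            lambda morph, ur, sr: max_(morph, ur, sr) if morph == index else 0)

def mark_data(constraints):
    """For each winner-loser pair, record which constraints prefer the
    winner (W) and which prefer the loser (L)."""
    pairs = []
    for morph, ur, winner, loser in DATA:
        row = {}
        for name, con in constraints:
            w, l = con(morph, ur, winner), con(morph, ur, loser)
            row[name] = "W" if w < l else "L" if w > l else ""
        pairs.append((f"/{ur}/ -> [{winner}] vs *[{loser}]", row))
    return pairs

def rcd(constraints):
    """Install, stratum by stratum, constraints preferring no loser among the
    not-yet-explained pairs; return None if no such constraint exists."""
    pairs = mark_data(constraints)
    remaining, hierarchy = list(constraints), []
    while pairs:
        stratum = [name for name, _ in remaining
                   if all(row[name] != "L" for _, row in pairs)]
        if not stratum:
            return None                      # inconsistency detected
        hierarchy.append(stratum)
        pairs = [(d, row) for d, row in pairs
                 if not any(row[n] == "W" for n in stratum)]
        remaining = [(n, c) for n, c in remaining if n not in stratum]
    hierarchy.append([n for n, _ in remaining])
    return hierarchy

basic = [("NoCoda", no_coda), ("Max", max_)]
print("without cloning:", rcd(basic))    # None: (6ab) and (6cd) conflict
cloned = [indexed_max("net"), indexed_max("mat")] + basic
print("with cloning:   ", rcd(cloned))   # [[Max(net), Max(mat)], [NoCoda], [Max]]
```

Cloning NoCoda for /kap/ and /gub/ instead would make the same procedure return (8), which is exactly the indeterminacy discussed next.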

It is crucial to P&C’s proposal that the child actually observes both of the exceptional morphemes in (6cd) surfacing faithfully; however, it is not necessary to observe (6ab), just some morphemes in which, as in (6ab), a coda consonant is deleted, so as to trigger cloning. The critical sample for (7), then, is either (6acd) or (6bcd). It is not necessary to see both (6a) and (6b), but it is necessary to see both of (6cd). Thus, there is some very real sense in which this analysis treats coda deletion as the productive default and coda retention as exceptional behavior, much like how P’s analysis of Piro treated mutability and quiescence as productive. However, it seems like P&C could have instead adapted schema (1) and proposed that what is cloned is NoCoda, obtaining the following ranking.

(8) NoCoda(kap), NoCoda(gub) >> Max >> NoCoda

Then, for this analysis, the crucial sample is either (6abc) or (6abd), and there is a similar sense in which coda retention is now the default behavior.

P&C give no reason to prefer (7) over (8). Reading between the lines, I suspect they imagine that the relative number of morpheme types which retain or lose their coda is the crucial issue, and perhaps they would appeal to an informal “majority-rules” principle. That is, if forms like (6ab) are more frequent than those like (6cd) they would probably prefer (7), and would prefer (8) if the opposite is true. However, I think P&C should have taken up this question and explained what is cloned when. Indeed, there is an alternative possibility: perhaps cloning produces all of the following constraints in addition to Max and NoCoda.

(9) Max(kap), Max(gub), Max(net), Max(mat), NoCoda(kap), NoCoda(gub), NoCoda(net), NoCoda(mat)

While I am not sure, I think BCD would be able to proceed and would either converge on (7) or (8), depending on how it resolves apparent “ties”.

Another related issue, which also may lead to the proliferation of indexed constraints, is that P&C have little to say about how constraint cloning works in complex words. Perhaps the cloning module is able to localize the violation to particular morphemes. For instance, it seems plausible that one could inspect a Max violation, like the ones produced by Piro syncope, to determine which morpheme is unfaithful and thus mutable. However, if we wish to preserve P’s treatment of mutability as the default (and that inalterable morphemes have a high-ranked Max clone), we instead need to do something more complex: we need to determine that a certain morpheme does not violate Max (good so far), but also that, under a counterfactual ranking of this constraint and its “antagonist” Align-Suf-C, it would have done so; this may be something which can be read off of mark-data pairs, but I am not sure. Similarly, to preserve P’s treatment of quiescence as the default, we need to determine that a certain suffix has an Align-Suf-C violation (again, good so far), but also that, under a counterfactual ranking of this constraint and its antagonist, it would not have done so.

While I am unsure if this counterfactual reasoning part of the equation can be done in general, I can think of at least one case where the localization reasoning cannot be done: epenthesis at morpheme boundaries, as in the [-əd] allomorph of the English regular past. Here there is no sense in which the Dep violation can be identified with a particular morpheme. Indeed, Dep violations are defined by the absence of correspondence. This is perhaps an unfortunate example for P&C’s approach. English has a number of “semiweak” past tense forms (e.g., from Myers 1987: bit, bled, hid, met, sped, led, read, fed, lit, slid) which are characterized by a final dental consonant and shortening of the long nucleus of the present tense form. Given related pairs like keep-kept, one might suppose that these bear a regular /-d/ suffix, but fail to trigger epenthesis (thus *[baɪtəd], etc.). To make this work, we assume the following.

(10) Properties of semiweak pasts:
a.  Verbs with semiweak pasts are exceptionally indexed to a high-ranking Dep constraint which dominates relevant syllable structure markedness constraints.
b. Verbs with semiweak pasts are exceptionally indexed to high-ranking markedness constraint(s) triggering “Shortening” (in the sense of Myers 1987).
c. A general (i.e., non-indexed) markedness constraint against hetero-voiced obstruent clusters dominates antagonistic voice faithfulness constraints.4

The issue is this: how do children localize the failure of epenthesis in (10a) to the root and not the suffix, given that the counterfactual epenthetic segment is not an exponent of either, occurring rather at the boundary between the two? Should one reject the sketchy analysis given in (10), there are surely many other cases where correspondence alone is insufficient; for example, consider vowels which coalesce in hiatus.

The yers

I have again already gone on quite long, but before I stop I should briefly discuss the famous Slavic yers as they relate to this theory.

In a very interesting paper, Gouskova (2012) presents an analysis of the yers in modern Russian. In Russian, certain instances of the vowels e and o alternate with zero in certain contexts. These alternating vowels are termed yers in traditional Slavic grammar. For example, лев [lʲev] ‘lion’ has a genitive singular (gen.sg.) льва [lʲva] and мох [mox] ‘moss’ has a gen.sg. [mxa]. A tradition, going back to early work by Lightner (1965), treats yers in Russian and other Slavic languages as underlyingly distinct from non-alternating e and o, either featurally or, in later work, prosodically.

Gouskova (henceforth G) wishes to argue that yer patterns are better analyzed using indexed constraints, thus treating morphemes with yer alternations as exceptional rather than treating the yer segments as underspecified. In terms of the constraint indexing technology, G’s analysis is straightforward. Alternating vowels are underlyingly present in all cases, and their deletion is triggered by a high-ranked constraint *Mid (which disfavors mid vowels, naturally) which is indexed to target exactly those morphemes which contain yers. Additional phonotactic constraints relating to consonant sequences are used to prevent deletion that produces word-final consonant clusters. Roughly, then, the analysis is:

(11) *CC]σ >> *Mid(yer morphemes) >> Max-V >> *Mid

As G writes (99-100, fn. 18): “In Russian, deletion is the exception rather than the rule: most morphemes do not have deletion, and neither do loanwords…”
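
As a concreteness check, here is a minimal sketch of how (11) treats the лев/льва pattern. The constraint implementations are my simplifications (in particular, *CC]σ is checked only word-finally), the root is transliterated as "lev", and the non-alternating control root "dom" is a schematic stand-in of mine rather than data from G.

```python
# A sketch of evaluation under (11). Constraint definitions are simplified;
# "lev" transliterates the yer root, "dom" is a schematic non-yer control.

MID = set("eo")
VOWELS = set("aeiou")
YER = {"lev"}                                   # morphemes indexed to *Mid
INPUTS = {"lev": "lev", "dom": "dom", "-a": "a"}

def star_cc_final(segs):                        # *CC]sigma, word-final only here
    return 1 if len(segs) >= 2 and segs[-1] not in VOWELS and segs[-2] not in VOWELS else 0

def star_mid(cand, only=None):
    return sum(1 for m, form in cand for s in form
               if s in MID and (only is None or m in only))

def max_v(cand):
    return sum(sum(1 for s in INPUTS[m] if s in VOWELS) -
               sum(1 for s in form if s in VOWELS)
               for m, form in cand)

def evaluate(cand):                             # profile ordered as in (11)
    segs = "".join(form for _, form in cand)
    return (star_cc_final(segs), star_mid(cand, only=YER), max_v(cand), star_mid(cand))

def winner(cands):
    return min(cands, key=evaluate)             # lexicographic = strict domination

nom = [[("lev", "lev")], [("lev", "lv")]]
gen = [[("lev", "lev"), ("-a", "a")], [("lev", "lv"), ("-a", "a")]]
ctl = [[("dom", "dom"), ("-a", "a")], [("dom", "dm"), ("-a", "a")]]

for label, cands in (("nom.sg. (yer root, no suffix)", nom),
                     ("gen.sg. (yer root + -a)", gen),
                     ("gen.sg. (non-yer control)", ctl)):
    w = winner(cands)
    print(label, "->", "".join(f for _, f in w), [evaluate(c) for c in cands])
# The yer vowel is retained word-finally (blocked by *CC]sigma), deleted before
# -a, and the non-indexed control keeps its mid vowel (Max-V >> *Mid).
```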

It should be noted that G’s analysis departs from the traditional (“Lightnerian”) analysis in ways not directly related to the question of localizing exceptionality (i.e., in the morpheme vs. the segment). For one, (11) seems to frame retention of a mid vowel as a default. In contrast, the traditional analysis does not seem to have any opinion on the matter. In that analysis, whether or not a mid vowel is alternating is a property of its underlying form, and should thus be arbitrary in the Saussurean sense. This is not to say that we expect to find yers in arbitrary contexts. There are historical reasons why yers are mostly found in the final syllable: this is one of the few places where the historical sound change called Havlík’s Law, operating more or less blindly, could introduce synchronic yer/zero alternations in the first place (in many other contexts the yers were simply lost), and in other positions it is impossible to ascertain whether or not a mid vowel is a yer. Whether or not an alternative version of the sound change could have produced an alternative-universe Russian where yers target the first syllable is an unknowable counterfactual given that we live in our universe, with our universe’s version of Havlík’s Law. Secondly, the traditional analysis (see Bailyn & Nevins 2008 for a recent exemplar) usually conditions the retention of yers on the presence of a yer (which may or may not be itself retained) in the following syllable. In contrast, G does not seem to posit yers for this purpose nor does she condition their retention on the presence of nearby yers. In the traditional analysis, these conditioning yers are motivated by the behavior of yers in prefixes and suffixes in derivational morphology, and much of this hinges on apparent cyclicity. G provides an appendix in which she attempts to handle several of these issues in her theory, but it remains to be seen whether this has been successful in dismissing all the concerns one might raise.

G provides a few arguments as to why the exceptional morpheme analysis is superior to the traditional analysis. G wishes to establish that mid vowels are in fact marked in Russian, so that yer deletion can take something of a “free ride” on this constraint. As such, she claims that yer deletion is related to the reduction of mid vowels in unstressed syllables. But how do we know that these facts are connected? And, if they are in fact connected, is it possible that there is an extra-grammatical explanation? For instance, there may be a “channel bias” in production and/or perception that disfavors faithful realization of mid vowels (and thus imposes a systematic bias in favor of reduction and deletion) compared to the more extreme phonemic vowels (in her analysis, /a, i, u/), and it is this bias that caused the actuation of both changes. Phenomenologically speaking, it is true that there are two ways in which certain Russian mid vowels are unfaithful, but this is just one of an infinite set of true statements about Russian phonology, and there is something “just so” about this one.

Before I conclude, let us now turn briefly to Polish. Like Russian, this language has mid vowels which alternate with zero in certain contexts. (Unlike in Russian, for whatever reason, the vast majority of alternating vowels are e; there are just three morphemes which have an alternating o.)

Rubach (2013, 2016) explicitly critiques constraint indexation using data from Polish. Rubach argues that G’s analysis cannot be generalized straightforwardly to Polish. He draws attention to stems that contain multiple mid vowels, only one of which is a yer (e.g., sfeter/sfetri ‘sweater’), and concludes that it is not necessarily possible to determine which of them (or both) should undergo deletion in an “exceptional” morpheme. The only mechanism with which one might handle this is a rather complex series of markedness constraints on consonant sequences. Unfortunately, Polish is quite permissive of complex consonant clusters and this mechanism cannot always be relied upon to deliver the correct answer. He also draws attention to the behavior of derivational morphology such as double diminutives. In contrast, Rysling (2016) attempts to generalize G’s indexed constraint analysis of yers to Polish. However, her analysis differs from G’s analysis of Russian in that she derives the yers from epenthesis to avoid word-final consonant clusters. Furthermore, for Rysling, epenthesis in the relevant phonotactic contexts (to a first approximation, certain C_C#) is the default, and failure to epenthesize is exceptional.5 Sadly, there is little interaction between the Rubach and Rysling papers (the latter briefly discusses the former’s 2013 paper), so I am not prepared to say whether Rysling’s radical revision addresses Rubach’s concerns with constraint indexation.

Endnotes

  1. P and colleagues refer to these constraints as “lexically specific”, but in fact it seems the relevant structures are all morphemes, and never involve polymorphemic words or lexemes.
  2. As far as I know, though, there is no proof of convergence, under any circumstances, for BCD.
  3. Perhaps they are deriving this from the assumption that the initial state is M >> F, but without alternation evidence, BCD would rerank this as Max >> NoCoda and cloning would not be triggered.
  4. A subsequent rule of obstruent voice assimilation, which is needed independently, would give us [kɛpt] from /kip-d/, and so on.
  5. Rysling seems to derive this proposal from an analysis of lexical statistics: she counts how many Polish nouns have yer alternations in the context …C_C# and compares this to non-alternating …CeC# and …CC#. It isn’t clear to me how the proposal follows from the statistics, though: non-epenthesis and epenthesis in …C_C# are about equally common in Polish, and their relative frequencies are not much different from what she finds in Russian.

References

Bailyn, J. F. and Nevins, A. 2008. Russian genitive plurals are impostors. In A. Bachrach and A. Nevins (ed.), Inflectional Identity, pages 237-270. Oxford University Press.
Gouskova, Maria. 2012. Unexceptional segments. Natural Language & Linguistic Theory 30: 79-133.
Kisseberth, C. W. 1970. The treatment of exceptions. Papers in Linguistics 2: 44-58.
Lightner, T. 1965. Segmental phonology of Modern Standard Russian. Doctoral dissertation, Massachusetts Institute of Technology.
Matteson, E. 1965. The Piro (Arawakan) Language. University of California Press.
Myers, S. 1987. Vowel shortening in English. Natural Language & Linguistic Theory 5(4): 485-518.
Pater, J. and Coetzee, A. W. 2005. Lexically specific constraints: gradience, learnability, and perception. In Proceedings of the Korea International Conference on Phonology, pages 85-119.
Pater, J. 2006. The locus of exceptionality: morpheme-specific phonology as constraint indexation. In University of Massachusetts Occasional Papers 32: Papers in Optimality Theory: 1-36.
Pater, J. 2009. Morpheme-specific phonology: constraint indexation and inconsistency resolution. In S. Parker (ed.), Phonological Argumentation: Essays on Evidence and Motivation, pages 123-154. Equinox.
Prince, A. and Tesar, B. 2004. Learning phonotactic distributions. In Kager, R., Pater, J. and Zonneveld, W. (ed.), Constraints in Phonological Acquisition, pages 245-291. Cambridge University Press.
Rubach, J. 2013. Exceptional segments in Polish. Natural Language & Linguistic Theory 31: 1139-1162.
Rysling, A. 2016. Polish yers revisited. Catalan Journal of Linguistics 15:121-143.
Zonneveld, W. 1978. A Formal Theory of Exceptions in Generative Phonology. Peter de Ridder.

Kisseberth & Zonneveld on exceptionality

[This post is part of a series on theories of lexical exceptionality.]

In a paper entitled simply “The treatment of exceptions”, Kisseberth (1970) proposes an interesting revision to the theory of exceptionality. Many readers may be familiar with the summary of this work given by Kenstowicz & Kisseberth 1977:§2.3 (henceforth K&K). Others may know it from the critique by Zonneveld (1978: ch. 3) or Zonneveld’s (1979) review of K&K’s book. I will discuss all of these in this post.

Kisseberth (1970)

A quick sidebar: Kisseberth’s paper is a fascinating scholarly artifact in that it probably could not be published in its current form today. (To be fair it was published in an otherwise-obscure journal, Papers in Linguistics.) For one, all the data is drawn from Matteson’s (1965) grammar of Piro;1 the only other referenced work is SPE. Kisseberth (henceforth K) gives no page, section, or example numbers for the forms he cites. I have tried to track down some of the examples in Matteson’s book, and it is extremely difficult to find them. K gives no derivations, only a few URs, and the entire study hinges on a single rule. But it’s provocative stuff all the same.

K observes that Piro has a rule which syncopates certain stem-final vowels. He gives the following formulation:

(1) Vowel Drop:

V -> ∅ / VC __ + CV


For example, [kama] ‘to make, form’ has a nominalization [kamlu] ‘handicraft’ with nominalizing suffix /-lu/, and [xipalu] ‘sweet potato’ has a possessed form /n-xipa-lu-ne/ [nxipalne] ‘my sweet potato’.2 One might think that (1) is intended to be applied simultaneously, as this is the convention for rule application in SPE, but this would predict *[nxiplne], with a medial triconsonantal cluster. Left-to-right application gives *[nxiplune]; the only way to get the observed [nxipalne] is via right-to-left (RTL) application, which I’ll assume henceforth. As far as I know, the directionality issue has not been noticed in prior work.
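
To make the directionality point checkable, here is a small sketch of rule (1) applied simultaneously, left-to-right, and right-to-left to /n-xipa-lu-ne/. Following the discussion in endnote 3 below, I let the VC left context span a morpheme boundary while the right context requires one; these implementation choices are mine, not Kisseberth's.

```python
# A sketch of Vowel Drop (1) under three application regimes.
# "+" marks morpheme boundaries in the input representation.

VOWELS = set("aeiouɨ")

def is_v(s): return s in VOWELS
def is_c(s): return s != "+" and not is_v(s)

def matches(segs, i):
    """Does the vowel at index i meet V C __ + C V?  The left-hand VC may
    span a boundary; the right context must contain one."""
    if not is_v(segs[i]):
        return False
    left = [s for s in segs[:i] if s != "+"]
    if len(left) < 2 or not (is_v(left[-2]) and is_c(left[-1])):
        return False
    right = segs[i + 1:]
    return (len(right) >= 3 and right[0] == "+"
            and is_c(right[1]) and is_v(right[2]))

def vowel_drop(word, order):
    segs = list(word)
    if order == "simultaneous":
        hits = {i for i in range(len(segs)) if matches(segs, i)}
        segs = [s for i, s in enumerate(segs) if i not in hits]
    else:
        step = 1 if order == "left-to-right" else -1
        i = 0 if order == "left-to-right" else len(segs) - 1
        while 0 <= i < len(segs):
            if matches(segs, i):
                del segs[i]
                if step == 1:
                    continue   # the next segment has shifted into position i
            i += step
    return "".join(s for s in segs if s != "+")

for order in ("simultaneous", "left-to-right", "right-to-left"):
    print(order, vowel_drop("n+xipa+lu+ne", order))
# simultaneous  -> nxiplne   (unattested: medial CCC)
# left-to-right -> nxiplune  (unattested)
# right-to-left -> nxipalne  (the attested form)
```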

Of course, there are exceptions of several types. (I am drawing additional data from the unpublished paper by CUNY graduate student Héctor González, henceforth G. I will not make any attempt to make González’s transcriptions or glosses comparable to those used by K, but doing so should be straightforward.)

One type is exemplified by /nama/ ‘mouth of’, which does not undergo Vowel Drop, as in /hi-nama-ya/ [hinamaya] ‘3sgmpssr-mouth.of-Obl.’ (G 5a); under RTL application we would expect *[hinmaya]. This is handled easily in the SPE exceptionality theory I reviewed a few weeks ago by marking /nama/ as [-Vowel Drop].

However, other apparent instances of exceptionality are not so easily handled. Consider two forms involving the verb root /nika/ ‘eat’. In /n-nika-nanɨ-m-ta/ [hnikananɨmta] ‘1sg-eat-Extns-Nondur-Vcl’ (G 5b) both vowels of the root satisfy (1) but do not undergo syncope. One might be tempted to mark this root as [-Vowel Drop], but it does undergo deletion in other derivations, such as in /nika-ya-pi/ [nikyapi] ‘eat-Appl-Instr.Nom’ (G 4d). Rather, it seems to be that the following /-nanɨ/ fails to trigger deletion. This is not easily handled in the SPE approach. K gives a number of similar examples involving the verbal theme suffixes /-ta/ and /-wa/, which also do not trigger syncope. If morphemes vary in whether or not they undergo and whether or not they trigger Vowel Drop, one can imagine that these properties might cross-classify:

  • Mutable, catalytic: The nominalizing suffix /-lu/, discussed above, is both mutable (i.e., undergoes syncope) and catalytic (triggers syncope) in /n-xipa-lu-ne/ [nxipalne].
  • Inalterable, catalytic: I have not found any relevant examples in Piro; Kenstowicz & Kisseberth 1977 (118f.) present a hastily-described example from Slovak.
  • Mutable, quiescent: /meyi-wa-lu/ [meyiwlu] ‘celebration’ shows that the intransitive verb theme suffix /-wa/ is mutable but quiescent (does not trigger syncope; *[meywalu]).
  • Inalterable, quiescent: /yimaka-le-ta-ni-wa-yi/ shows that the imperfective suffix /-wa/ (not to be confused with the homophonous intransitive /-wa/) is inalterable; /r-hina-wa/ [rɨnawa] ‘3-come-Impfv’ (G 6c) shows that it is quiescent (*[rɨnwa]).

According to G, there is also one additional category that does not fit into the above taxonomy: the elative suffix /-pa/ triggers deletion of the penultimate (rather than preceding) vowel, as in /r-hitaka-pa-nɨ-lo/ [rɨtkapanro] ‘3-put-Evl-Antic-3sgf’ (G 7a). Furthermore, /-pa/ appears to lose its catalytic property when it undergoes syncope, as in /cinanɨ-pa-yi/ [cinanɨpyi] ‘full-Elv-2sg’ (G 7c). Given the rather unexpected set of behaviors here, apparently confined to a single suffix, I wonder if this is the full story.

Having reviewed this data, I don’t have an abundance of confidence in it, particularly given K’s hasty presentation. However, K has identified something not obviously anticipated by the SPE theory. K’s proposal is a simple extension of the SPE theory; in addition to rule features for the target, we also need rule features for the context. For instance, the inalterable, quiescent imperfective marker /-wa/,4 which neither undergoes nor triggers Vowel Drop, would be underlyingly [-rule Vowel Drop, -env. Vowel Drop]. Then, the rule interpretative procedure applies a rule R when its structural description is met, when the target is [+rule R], and when all morphemes in the context are [+env. R].
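
K states this procedure only in prose; the following sketch (the feature bookkeeping and defaults are my reconstruction) shows the morpheme-level decision it encodes, using the classification from the bulleted list above. For simplicity I pass only the neighboring morphemes as the context; as Zonneveld's critique below shows, segments of the target morpheme itself may also fall inside the structural context.

```python
# A sketch of K's interpretive procedure at the morpheme level: Vowel Drop
# may apply only if the target morpheme is [+rule VD] and every morpheme in
# the context is [+env VD]; unmarked morphemes default to plus values.

LEXICON = {
    "nama":       {"rule VD": False},                   # inalterable
    "-nanɨ":      {"env VD": False},                    # quiescent
    "-wa(intr)":  {"env VD": False},                    # mutable, quiescent
    "-wa(impfv)": {"rule VD": False, "env VD": False},  # inalterable, quiescent
    # /nika/, /xipa/, /-lu/, /-ne/, /-ya/ etc. carry no marks at all.
}

def feat(morph, feature):
    return LEXICON.get(morph, {}).get(feature, True)

def vowel_drop_may_apply(target, context):
    return feat(target, "rule VD") and all(feat(m, "env VD") for m in context)

print(vowel_drop_may_apply("nika", ["-ya"]))          # True:  /nika-ya-pi/ -> [nikyapi]
print(vowel_drop_may_apply("nika", ["-nanɨ"]))        # False: quiescent /-nanɨ/ blocks
print(vowel_drop_may_apply("nama", ["-ya"]))          # False: inalterable target
print(vowel_drop_may_apply("-lu", ["xipa", "-ne"]))   # True:  /n-xipa-lu-ne/ -> [nxipalne]
```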

Zonneveld (1978, 1979)

I have already gone on pretty long, but I should briefly discuss what subsequent writers have had to say about this proposal. Kenstowicz & Kisseberth (1977, henceforth K&K), perhaps unsurprisingly, endorse the proposal, and provide some very hasty examples of how one might use this new mechanism. Zonneveld (henceforth Z), in turn, is quite critical of K’s theory. These criticisms are laid out in chapter 3 of Zonneveld 1978 (a published version of his doctoral dissertation), which reviews quite a bit of contemporary work dealing with this issue. The 1978 book chapter (about 120 typewritten pages in all) is a really good review; it is well organized and written, and full of useful quotations from the sources it reviews, and while it is somewhat dense it is hard to imagine how it could be made less so. Z reprises the criticisms of K’s theory briefly, and near verbatim, in his uncommonly-detailed review of K&K’s book (Zonneveld 1979). Z has several major criticisms of rule environment theory.

First, he draws attention to an example where the conventions proposed by K will fail; I will spell this out in a bit more detail than Z does. The key example is /w-čokoruha-ha-nu-lu/ [wčokoruhahanru] ‘let’s harpoon it’. The anticipatory /-nu/ is mutable (but quiescent) and it is in the phonological context for syncope. To its left is the ‘sinister hortatory’ /-ha/, and this is known to be quiescent because it does not trigger deletion of the final vowel in /čokoruha/; cf. /čokoruha-kaka/ [čokoruhkaka] ‘to cause to harpoon’, which shows that the substring /…čokoruha-ha…/ does not undergo deletion because /-ha/ is quiescent rather than because /čokoruha/ is inalterable. To its right is the catalytic /-lu/. By K’s conventions, syncope should not apply to the /u/ in the anticipatory morpheme because /-ha/, in the left context, is [-env. Vowel Drop], but in fact it does. Z anticipates that one might want to introduce separate left and right context environment features: maybe /-ha/ is [+left env. Vowel Drop, -right env. Vowel Drop]. The following additional issues suggest the very idea is on the wrong track, though.

Secondly, Z shows that rule environment features cause additional issues if one adopts the SPE conventions. The /-ta/ in /yona-ta-nawa/ [yonatnawa] ‘to paint oneself’ is presumably quiescent because it fails to trigger syncope in /yona/.5 Thus we would expect it to be lexically [-env. Vowel Drop], and for this specification to percolate to the segments /t/ and /a/. (I referred to this as Convention 2 in my previous post, and K adopts this convention.) However, it is a problem for this specification to be present on /t/, since that /t/ is itself in the left context for Vowel Drop, and this would counterfactually block its application to the second /a/! This is schematized below.

(2) Structural description matching for /yona-ta-nawa/

   VCVCV
yonatanawa

As a related point, Z points out that there are many cases where under K’s proposal it is arbitrary whether one uses rule or environmental exception features. For instance, in the famous example obesity, the root-final s is part of the structural context so the root could be marked [-rule Trisyllabic Shortening], which would percolate to the focus e, or it could be marked [-env. Trisyllabic Shortening], which would percolate to the right-context s, or both; all three options would derive non-application. This is also schematized below.

(3) Structural description matching for obesity:

  VC-VCV
obes-ity

Z continues to argue that a theory that distinguishes between leftward and rightward contextual exceptionality also will not go through. Sadly, he does not provide a full analysis of the Piro facts in his preferred theory.

Z has much more to say about the (then-)contemporary literature on rule exceptionality. For example, he discusses an idea, originally proposed by Harms (1968:119f.) and also exemplified by Kenstowicz (1970), that there are exceptions such that a rule applies to morphemes that do not meet its (phonologically defined) structural description. While he does seem to accept this, possible examples of such rules are quite thin on the ground, and the very idea seems to reflect the mania for minimizing rule descriptions and counting features that—and this is not just my opinion—polluted early generative phonology. If one rejects this frame, it is obvious that the effect desired can be simulated with two rules, applied in any order. The first will be a phonologically general one (with or without negative exceptions); the second will be the same change but targeting certain morphemes using whatever theory of exceptionality one prefers. Indeed, most examples of rules applying where their structural description is not met are already disjunctive, and I doubt whether such cases really involve a single rule in the first place.

The ultimate theory Z settles on is one quite similar to that proposed by SPE. First, readjustment rules introduce rule features like [+R] and these handle simple exceptions of the obesity type. Z proposes further that such readjustment rules must be context-free, which clearly rules out using this mechanism for phonologically defined classes of negative exceptions; cf. (4-5) in my previous post. Secondly, Z proposes that so-called morphological features like Lightner’s [±Russian] will be used for deriving what we might now call “stratal” effects: morphemes that are exceptions to multiple rules. For instance, if we have three rules A, B, C that all [-Russian] morphemes are exceptions to, then context-free redundancy rules will introduce the following rule features.

(4)
[-Russian] -> {-A}
[-Russian] -> {-B}
[-Russian] -> {-C}

Z replays several arguments from Lightner about why morphological features of this sort should be distinguished from rule features; I won’t repeat them again here. Finally, Z derives minor rules via readjustment rules triggered by so-called “alphabet” features. For instance, let us again consider umlauting English plurals like goose-geese. Z supposes, adding some detail to a sketchier portion of the SPE proposal, that morphemes targeted by umlaut are marked [+G] (where G is some arbitrary feature). There are two ways one could imagine doing this.

First, the underlying form, perhaps /guːs/, could be underlyingly [+G]. Then, let us assume that umlauting is simply fronting in the context of a Plural morphosyntactic feature, and that subsequent phonological adjustments (like the diphthongization in mouse-mice) are handled by later rules. If so, it is possible to write this as follows:

(5) Umlaut (variant 1): [+Back, +G] -> {-Back} / __ [+Plural]

This rule is phonologically “context-free”, but its application is conditioned by the presence of the alphabet feature specification in the focus and the morphosyntactic feature in the context. I will take up the question of whether such rules are always phonologically context-free in a (much) later post.

I suspect that the analysis in (5) is the one Z has in mind, and it also seems to be the orthodoxy in Distributed Morphology (henceforth DM); see, e.g., Embick & Marantz 2008 and particularly their (4) for a conceptually similar analysis of the English past tense. Applying their approach strictly would lead us to miss the generalization (if it is in fact a linguistically meaningful generalization) that umlauting plurals all have a null plural suffix. Umlauting plurals have an underlying feature [+G] (there is no “list” per se; it is just an underlying feature), but their rules of exponence also need to “list” these umlauting morphemes as exceptionally selecting the null plural rather than the regular /-z/. It seems to me this is not necessary, because the rules of exponence for the plural could perhaps instead be sensitive to the presence or absence of [+G]. This would greatly reduce the amount of “listing” necessary. (I do not have an analysis of—and thus put aside—the other class of zero plurals in English, mass nouns like corn.)

(6) Rules of exponence for English noun plural (variant 1):

a. [+Plural] <=>  ∅       / __ [+G]
b.                     <=> -ɹɘn / __ {√CHILD, …}
c.                      <=> -ɘn / __ {√OX, …}
d.                      <=> -z   / __

Secondly and more elaborately, one could imagine that [+G] is inserted by, and is perhaps the expression of, plurality for umlauting morphemes. In piece-based realizational theories like DM, affixes are said to expone (and thus delete) syntactic uninterpretable features. One possibility (which brings this closer to amorphous theories without completely discarding the idea of morphs) is to treat insertion of [+G] as an exponent of plurality.

(7) Rules of exponence for English noun plural (variant 2):

a. [+Plural] <=> {+G} / __ {√GOOSE, √FOOT, √MOUSE, …}
b.                     <=> -ɹɘn  / __ {√CHILD}
c.                     <=> -ɘn    / __ {√OX}
d.                     <=> -z      / __

(7a) and (7b-d) implicate different types of computations—the former inserts an alphabet feature, the latter inserts vocabulary items—but I am supposing here that they can be put into competition. Under this alternative analysis, umlaut no longer requires a morphosyntactic context:

(8) Umlaut (variant 2): [+Back, +G] -> {-Back}

Beyond precedent, I do not see any reason to prefer analysis (5-6) over (7-8). Either can clearly derive what Lakoff called minor rules, though they differ in how exceptionality information is stored/propagated, and thus may have interesting consequences for how we relate the major/minor class distinction to theories of productivity. I have written enough for now, however, and I’ll have to return to that question and others another day.

Endnotes

  1. I too will refer to this language as Piro, as do Matteson and Kisseberth. It should not be confused with the unrelated language known as Piro Pueblo. Some subsequent work on this phenomenon refers to the language as Yine (and says it “was previously known as Piro”), though I also found another source that says that Yine is simply a major variety of Piro. I have been unable to figure out whether there’s a preferred endonym.
  2. I am not prepared to rule out the possibility that /xipa/ is itself an exception (“inalterable”), but all evidence is consistent with RTL application.
  3. In his endnote 2, K says the rule is even narrower than stated above, since it does not apply to monosyllabic roots. However, he might have failed to note that this condition is implicit in his rule, if we interpret (1) strictly as holding that the left context should be tautomorphemic. Piro requires syllables to be consonant-initial, so the minimal bisyllabic root is CV.CV. Combining this observation with (1), we see that the shortest root which can undergo vowel deletion is also bisyllabic, since concatenating the left context and target gives us a bisyllabic VCV substring. In fact, things are more complicated because monosyllabic suffixes do undergo syncope; many examples are provided above. Clearly, the deleting vowel need not be tautomorphemic with the preceding vowel, contrary to what a strict reading of the “+” in (1) would seem to imply. According to González, syncope imposes no constraints on the morphological structure of its context except that it only applies in derived environments—CVCVCV trisyllables like /kanawa/ ‘canoe’ surface faithfully as [kanawa], not *[kanwa]—and is subject to the lexical exceptionality discussed here.
  4. K glosses this as ‘still, yet’.
  5. As was the case with /xipa/ in endnote 2, we’d like to confirm that /yona/ is mutable rather than inalterable, but one does not simply walk into Matteson 1965.

References

Embick, D. and Marantz, A. 2008. Architecture and blocking. Linguistic Inquiry 39(1): 1-53.
González, H. 2023. An evolutionary account of vowel syncope in Yine. Ms., CUNY Graduate Center.
Harms, R. T. 1968. Introduction to Phonological Theory. Prentice-Hall.
Kenstowicz, M. 1970. Lithuanian third person future. In J. R. Sadock and A. L. Vanek (ed.), Studies Presented to Robert B. Lees by His Students, pages 95-108. Linguistic Research.
Kenstowicz, M. and Kisseberth, C. W. 1977. Topics in Phonological Theory. Academic Press.
Kisseberth, C. W. 1970. The treatment of exceptions. Papers in Linguistics 2: 44-58.
Matteson, E. 1965. The Piro (Arawakan) Language. University of California Press.
Zonneveld, W. 1978. A Formal Theory of Exceptions in Generative Phonology. Peter de Ridder.
Zonneveld, W. 1979. On the failure of hasty phonology: A review of Michael Kenstowicz and Charles Kisseberth, Topics in Phonological Theory. Lingua 47: 209-255.

Underspecification in Barrow Inupiaq

Dresher (2009:§7.2.1) discusses an interesting morphophonological puzzle from the Inuit (Canadian) and Inupiaq (Alaskan) dialects of Eskimo-Aleut. These dialects descend from a four-phoneme vowel system *i, *u, *ə, *a, but in most dialects *ə has merged into *i, yielding three surface vowels: [i, a, u]. However, in some dialects (including Barrow Inupiaq), there appears to be a covert contrast between two “flavors” of i: “strong i” triggers palatalization of a following coronal consonant whereas “weak i” does not.

(1) Barrow Inupiaq (Kaplan 1981:§3.22, his 27-29):

a. iglu ‘house’, iglulu ‘and a house’, iglunik ‘houses’
b. ini ‘place’, inilu ‘and a place’, ininik ‘places’
c. iki ‘wound’, ikiʎu ‘and a wound’, ikiɲik ‘wounds’

Presumably, the stem-final i in (1b) is weak and the one in (1c) is strong.

Following some prior work, Dresher supposes that there is an underlying contrast between weak and strong i. He posits the following featural specification:1

(2) Features for Barrow Inupiaq vowels (to be revised):

strong i: [Coronal, -Low]
weak i: [-Low]
/u/: [Labial, -Low]
/a/: [+Low]

This analysis has a close relationship to the theory of underspecification used in Logical Substance-Free Phonology (henceforth LP); I assume familiarity with the assumptions and operations of that theory, which have been discussed at length by Reiss and colleagues (Bale et al. 2014, Reiss 2021), including in an introductory textbook (Bale & Reiss 2018). Just a few modifications are needed, however.

For Dresher, who hypothesizes that contrastive features (computed using an algorithm he describes in detail; op. cit.:16) are the only ones which are phonologically active, it is not clear why strong i palatalizes coronal segments: clearly it is not that they are spreading the privative [Coronal], since that certainly would not trigger palatalization! One could, of course, adopt an analysis in which palatalization is not assimilatory. Alternatively, we could identify another feature specification which is characteristic of strong i and which might trigger palatalization of coronal consonants. Let us suppose this is in fact just [Palatal].2 This gives us the following minimally-modified feature specification:

(2) Features for Barrow Inupiaq vowels (revised):

strong i: [Palatal, -Low]
weak i: [-Low]
/u/: [Labial, -Low]
/a/: [+Low]

According to Kaplan (§1.2), there are both plain and palatal coronal phonemes, so this seems to be a feature-changing process. Following the assumptions of LP that feature-changing processes derive from a deletion rule followed by an insertion rule, two rules are needed here; we give these below.

(3) [+Consonantal] \ {Coronal} / [Palatal] __
(4) [+Consonantal] ⊔ {Palatal} / [Palatal] __

Crucially, vowels other than strong i lack the [Palatal] specification to trigger (3-4).
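
Here is a minimal sketch of (3)-(4) as set subtraction followed by unification over feature bundles, in the general style of that literature; the segment representations are simplified stand-ins of mine rather than Kaplan's or Dresher's.

```python
# A sketch of (3)-(4): feature-changing as deletion plus insertion.
# Segments are modeled as sets of feature specifications (toy inventory).

STRONG_I = frozenset({"Palatal", "-Low"})
WEAK_I   = frozenset({"-Low"})
N        = frozenset({"+Consonantal", "Coronal", "Nasal"})
N_PAL    = frozenset({"+Consonantal", "Palatal", "Nasal"})  # palatal(ized) nasal

def rule_3(target, trigger):
    # (3): delete {Coronal} from a [+Consonantal] segment after [Palatal].
    if "+Consonantal" in target and "Palatal" in trigger:
        return target - {"Coronal"}
    return target

def rule_4(target, trigger):
    # (4): unify {Palatal} into a [+Consonantal] segment after [Palatal].
    if "+Consonantal" in target and "Palatal" in trigger:
        return target | {"Palatal"}
    return target

def derive(trigger, target):
    return rule_4(rule_3(target, trigger), trigger)

print(derive(STRONG_I, N) == N_PAL)   # True: strong i palatalizes the nasal
print(derive(WEAK_I, N) == N)         # True: weak i, lacking [Palatal], does not
```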

Endnotes

  1. Dresher’s analysis assumes privative features, but he notes elsewhere in the book that he usually adopts the features of his sources unless there is some relevant reason to dispute them.
  2. If preferred, it is easy to translate the proposed analysis into one in which palatals are [-Back] and plain coronals are [+Back], à la Padgett (2003).

References

Bale, A., Papillon, M., and Reiss, C. 2014. Targeting underspecified segments: a formal analysis of feature-changing and feature-filling rules. Lingua 148: 240-253.
Bale, A., and Reiss, C. 2018. Phonology: a Formal Introduction. MIT Press.
Dresher, B. E. 2009. The Contrastive Hierarchy in Phonology. Cambridge University Press.
Kaplan, L. D. 1981. Phonological issues in North Alaskan Inupiaq. Alaska Native Language Center.
Padgett, J. 2003. Contrast and post-velar fronting in Russian. Natural Language and Linguistic Theory 21: 39-87.
Reiss, C. 2021. Towards a complete Logical Phonology model of intrasegmental changes. Glossa 6:107.

The phenomenology of assimilation

[This is adapted from part of a paper I’m working on with Charles Reiss.]

Assimilation is a key notion for many phonological theories, and there are even intimations of it in the Prague School. Hyman provides an early formalization: he defines it as the insertion of a feature specification αF on a segment immediately adjacent to another segment specified αF.

(1) Assimilation schemata (after Hyman 1975: 159):
X → {αF} / __ [αF]
X → {αF} / [αF] __

In autosegmental phonology, assimilation is instead conceptualized as the sharing (rather than the “copying”) of a feature specification via the insertion of association lines, and phonological tiers provide a more general notion of adjacency, but the basic notion remains the same.

Substance-free phonology (SFP) also makes use of these “Greek letter” coefficients to express segmental identity (or non-identity) between features on various segments. What SFP denies is that there is any need to recognize or formalize notions like assimilation (or dissimilation) in the first place, because SFP rejects the notion of formal markedness. Yes, there are rules that cause an obstruent to agree in voicing with an obstruent to its immediate right, or which delete a glide between identical vowels, or which raise mid vowels before high vowels, and SFP can easily express such rules. However, there are also rules which raise mid vowels before low vowels, before nasals, or before a word boundary. The following principle expresses this position in general terms:

(2) Substance-freeness of structural change: featural specifications changed by rule application need not be present in the rule’s structural environment.

This principle is a claim that proposed phonological rules need not “make sense” in featural terms. It holds that the whatness of a rule, what feature is being added to a segment, is logically independent of the whereness, the triggering environment. This in turn echoes Chomsky & Halle’s (1968:428) claim that “the phonological component requires wide latitude in the freedom to change features.” Note that principle (2) is not itself an axiom of SFP; rather, it records the absence of any contrary requirement in the theory.
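
To make the independence of change and context concrete, here is a toy sketch (not SFP's actual formalism, and without the Greek-letter variable machinery) of a rule format in which the structural change is an arbitrary feature update and the environment an arbitrary predicate. Both rules below are stated in exactly the same format; only the first happens to "make sense" as assimilation.

```python
# A toy rule format: the "whatness" (change) and "whereness" (context) are
# independent components. Feature values below are illustrative inventions.

from dataclasses import dataclass
from typing import Callable, Dict

Segment = Dict[str, int]   # e.g. {"voice": 1, "obstruent": 1}

@dataclass
class Rule:
    target: Callable[[Segment], bool]    # structural description of the focus
    change: Dict[str, int]               # features rewritten on the focus
    context: Callable[[Segment], bool]   # condition on the following segment

    def apply(self, focus: Segment, following: Segment) -> Segment:
        if self.target(focus) and self.context(following):
            return {**focus, **self.change}
        return focus

# "Assimilatory": an obstruent becomes voiceless before a voiceless obstruent...
devoice = Rule(target=lambda s: s["obstruent"] == 1,
               change={"voice": 0},
               context=lambda s: s["obstruent"] == 1 and s["voice"] == 0)

# ...and "non-assimilatory": a mid vowel raises before a nasal. Same format.
raise_mid = Rule(target=lambda s: s["mid"] == 1,
                 change={"high": 1, "mid": 0},
                 context=lambda s: s["nasal"] == 1)

b = {"obstruent": 1, "voice": 1}
t = {"obstruent": 1, "voice": 0}
e = {"mid": 1, "high": 0, "nasal": 0, "obstruent": 0}
n = {"nasal": 1, "obstruent": 0}

print(devoice.apply(b, t))     # the voiced obstruent surfaces voiceless
print(raise_mid.apply(e, n))   # the mid vowel surfaces high
```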

References

Chomsky, N. and Halle, M. 1968. The Sound Pattern of English. Harper & Row.
Hyman, L. M. 1975. Phonology: Theory and Analysis. Holt, Rinehart and Winston.

Lees on underspecification

I know very little about the life of American linguist Robert Lees but he dropped two bangers in the early 1960s: his 1960 book on English nominalizations is heavily cited, and his 1961 phonology of Turkish has a lot of great ideas. In this passage, he seems to presage (though not formally) the idea that underspecified segments do not form singleton natural classes, and he correctly notes that that’s a feature, not a bug.
The rest of the details of vowel- and consonant-harmony we shall discuss later; but there is one unavoidable theoretical issue to be settled in connection with this contrast between borrowed and native lexicon. The solution we have proposed for Turkish vowels, namely that they be written “archiphonemically” in all contexts where gravity is predictable by the usual progressive assimilation rules of harmony but be split into grave/acute pairs of “phonemes” elsewhere, will have the following important theoretical consequence. The phonetic rule which ensures the insertion of a “plus” or a “minus” sense for the gravity feature under harmonic assimilation must distinguish between occurrences of columns of features in which gravity is unspecified, as in the case of the “morphophoneme” /E/, from occurrences of the otherwise identical columns of features in the utterance being generated in which gravity has already been specified in a lexical rule, as in the corresponding cases of the “phonemes” /e/ and /a/, for now both the morphophoneme /E/ and the phonemes /e/  and /a/ occur simultaneously in the transcriptions. The gravity rule is intended to apply to /E/, not to /a/ or /e/. But this is tantamount to entering, as it were, a “zero” into the feature table of /E/ to distinguish it from the columns for /a/ and /e/, in which the feature of gravity has already been determined, or perhaps from other columns in which there is simply no relevant indication of this feature.
Thus the system of phonetic decisions will have been rendered trinary rather than the customary binary. The objection to this result is not based on a predilection for binary features, though there are good reasons to prefer a binary system. Rather, it arises because in a system of phonological decisions in which rules may distinguish between columns of binary features differing solely in the presence or absence of a zero for some feature one may also ipso facto always introduce vacuous reductions or simplifications without any empirical knowledge of the phonetic facts.
As a brief illustration of such an empty simplification, we might note that if a rule be permitted in English phonology which distinguishes between the features of /p/ and /b/ on one hand and on the other, the set of these same features with the exception that voice is unspecified (a set which we shall designate by means of the “archiphonemic” symbol /B/), then we could easily eliminate from English phonology, without knowing anything about English pronunciation, the otherwise relevant feature of Voice from all occurrences of either /p/ or /b/, or in fact from all occurrences of any voiced stop. Clearly, if a rule could distinguish /B/ from /p/ by the presence of zero in the voice-feature position, then that feature can be restored to occurrences of /b/ automatically and is thus rendered redundant. The same could then be done for inumerable [sic] other features with no empirical justification required.
Thus, we must assume that any rule which applies to a column of features like /B/ also at the same time applies to every other type of column which contains that same combination of features, such as /p/ and /b/. This is tantamount to imposing the constraint on phonological features that they never be required to identify unspecified, or zero, features. To the best of our present knowledge, there seems to be no other reasonable way to prevent the awkward consequences mentioned above.
To return to Turkish this decision means that the grammar is incapable of distinguishing native vowel-harmonic morphemes from borrowed non-vowel-harmonic morphemes simply by the presence of the archiphoneme /E/ in the former versus /e/ or /a/ in the latter. (Lees 1961: 12-14)

References

Lees, R. B. 1961. The Phonology of Modern Standard Turkish. Indiana University Press.

SPE & Lakoff on exceptionality

[Minor correction: after rereading Zonneveld 1978, I think Lakoff misrepresents the SPE theory slightly, and I repeated his misrepresentation in what I wrote. Lakoff writes that the SPE theory could have phonological rules that introduce minus-rule features. In fact C&H say (374-5) that they have found no compelling examples of such rules and that they “propose, tentatively” that such rules “not be permitted in the phonology”; any such rules must be readjustment rules, which are assumed to precede all phonological rules. This means that (4-5) are probably ruled out. Lakoff’s mistake may reflect the fact that the 1970 book is a lightly-adapted version of his 1965 dissertation, for which he drew on a pre-publication version of SPE.]

Recently I have attempted to review and synthesize different theories of what we might call lexical (or morpholexical or morpheme-specific) exceptionality. I am deliberately ignoring accounts that take this to be a property of segments via underspecification (or in a few cases, pre-specification, usually of prosodic-metrical elements like timing slots or moras), since I have my own take on that sort of thing under review now. Some takeaways from my reading thus far:

  • This is an understudied and undertheorized topic.
  • At the same time, it seems at least possible that some of these theories are basically equivalent.
  • Exceptionality and its theories play only a minor role in adjudicating between competing theories of phonological or morphological representation, despite their obvious relevance.
  • Also despite their obvious relevance, theories of exceptionality make little contact with theories of productivity and defectivity.

Since most of the readings are quite old, I will include PDF links when I have a digital copy available.

Today, I’m going to start off with Chomsky & Halle’s (1968) Sound Pattern of English (SPE), which has two passages dealing with exceptionality: §4.2.2 and §8.7. While I attempt to summarize these two passages as if they are one, they are not fully consistent with one another and I suspect they may have been written at different times or by different authors. Furthermore, it seemed natural for me to address, in this same post, some minor revisions proposed by Lakoff (1970: ch. 2). Lakoff’s book is largely about syntactic exceptionality, but the second chapter, in just six pages, provides important revisions to the SPE system. I myself have also taken some liberties filling in missing details.

Chomsky & Halle give a few examples of what they have in mind when they mention exceptionality. There is in English a rule which laxes vowels before consonant clusters, as in convene/conven+tion or (more fancifully) wide/wid+th. However, this generally does not occur when the consonant cluster is split by a “#” boundary, as in restrain#t.1 The second, and more famous, example involves the trisyllabic shortening of the sort triggered by the -ity suffix. Here laxing also occurs (e.g., serene-seren+ity, obscene-obscen+ity) though not in the backformation obese-obesity.2 As Lakoff (loc. cit.:13) writes of this example, “[n]o other fact about obese is correlated to the fact that it does not undergo this rule. It is simply an isolated fact.” Note that both of these examples involve underapplication, and the latter passage gives more obesity-like examples from Lightner’s phonology of Russian, where one rule applies only to “Russian” roots and another only to “Church Slavonic” roots.

SPE supposes that, by default, there is a feature associated with each rule. So, for instance, if there is a rule R there exists a feature [±R] as well. A later passage likens these to features for syntactic category (e.g., [+Noun]), intrinsic morpho-semantic properties like animacy, declension or conjugation class features, and the lexical strata features introduced by Lees or Lightner in their grammars of Turkish and Russian. SPE imagines that URs may bear values for [R]. The conventions are then:

(1) Convention 1: If a UR is not specified [-R], introduce [+R] via redundancy rule.
(2) Convention 2: If a UR is [αR], propagate feature specification [αR] to each of its segments via redundancy rule.
(3) Convention 3: A rule R does not apply to segments which are [-R].

Returning to our two examples above, SPE proposes that obese is underlyingly [−Trisyllabic Shortening], which accounts for the lack of shortening in obesity. They also propose rules which insert these minus-rule features in the course of the derivation; for instance, it seems they imagine that the absence of laxing in restraint is the result of a rule like V → {−Laxing} / _ C#C, with a phonetic-morphological context.
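
Here is a minimal sketch of Conventions 1 and 3 at the morpheme level (Convention 2's percolation to segments is set aside; see the discussion at the end of this post). The lexical entries are toy stand-ins, and the rule's phonological structural description is simply assumed to be met.

```python
# A sketch of Conventions 1 and 3 at the morpheme level. Lexical entries list
# only their minus-rule features; Convention 1 supplies the plus values, and
# Convention 3 blocks a rule on a [-R] target.

LEXICON = {
    "obese":  {"Trisyllabic Shortening": False},   # lexically marked exception
    "serene": {},
    "-ity":   {},
}

def rule_feature(morph, rule):
    # Convention 1: unless the UR is marked [-R], it counts as [+R].
    return LEXICON.get(morph, {}).get(rule, True)

def shortening_applies(root):
    # Convention 3: Trisyllabic Shortening applies only to a [+R] target
    # (its phonological structural description is taken to be satisfied).
    return rule_feature(root, "Trisyllabic Shortening")

print(shortening_applies("serene"))   # True:  serene -> seren+ity, laxed
print(shortening_applies("obese"))    # False: obese -> obes+ity, no laxing
```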

Subsequent work in the theory of exceptionality has mostly considered cases like obesity, where the rule features are present underlyingly; with one exception, discussed below, the restraint-type analysis, in which rule features are introduced during the derivation, does not seem to have been further studied. It seems to me that the possibility of introducing minus-rule features in a certain phonetic context could be used to derive a rule that applies to unnatural classes. For example, imagine an English rule (call it Tensing) which tenses a vowel in the context of anterior nasals {m, n} and the voiceless fricatives {f, θ, s, ʃ} but not voiced fricatives like {v, ð}.3 Under any conventional feature system, there is no natural class which includes {m, n, f, θ, s, ʃ} but not also {ŋ, v}, etc. However, one could derive the desired disjunctive effect by introducing a -Tensing specification when the vowel is followed by a dorsal, or by a voiced fricative. This might look something like this:

(4) No Tensing 1: [+Vocalic] → {−Tensing} / _ [+Dorsal]
(5) No Tensing 2: [+Vocalic] → {−Tensing} / _ [+Voice, +Obstruent, +Continuant]

This could continue for a while. For instance, I implied that Tensing does not apply before a stop, so we could insert a -Tensing specification when the following segment is [+Obstruent, -Continuant], or we could do something similar with a following oral sonorant, and so on. Then, the actual Tensing rule would need little (or even no) phonetic conditioning.
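
To see the combined effect, here is a rough sketch of how the No Tensing rules in (4-5), plus the stop and sonorant cases just mentioned, leave Tensing applicable before exactly the unnatural class {m, n, f, θ, s, ʃ}. The feature bundles and rule encodings are invented for illustration.

```python
# Each "No Tensing" rule is a predicate over the following segment; Tensing
# survives only where no such rule has marked the vowel [-Tensing].
SEGMENTS = {
    "m": {"nasal", "anterior"},        "n": {"nasal", "anterior"},
    "ŋ": {"nasal", "dorsal"},
    "f": {"obstruent", "continuant"},  "θ": {"obstruent", "continuant"},
    "s": {"obstruent", "continuant"},  "ʃ": {"obstruent", "continuant"},
    "v": {"obstruent", "continuant", "voice"},
    "ð": {"obstruent", "continuant", "voice"},
    "k": {"obstruent", "dorsal"},      "t": {"obstruent"},
    "l": {"sonorant"},                 "r": {"sonorant"},
}

NO_TENSING = [
    lambda f: "dorsal" in f,                               # (4): before dorsals
    lambda f: {"voice", "obstruent", "continuant"} <= f,   # (5): before voiced fricatives
    lambda f: "obstruent" in f and "continuant" not in f,  # before stops
    lambda f: "sonorant" in f,                             # before oral sonorants
]

def tensing_applies_before(seg: str) -> bool:
    feats = SEGMENTS[seg]
    # If any No Tensing rule fires, the (now nearly context-free) Tensing rule
    # is blocked for this vowel by Convention 3.
    return not any(rule(feats) for rule in NO_TENSING)

print(sorted(s for s in SEGMENTS if tensing_applies_before(s)))
# -> ['f', 'm', 'n', 's', 'ʃ', 'θ']: a class not definable conjunctively
```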

To put it another way, these rules allow Tensing to apply to a set of segments which cannot be formed conjunctively from features, but can be formed via set difference.4 Is this undesirable? Is it logically distinct from the desirable “occluding” effect of bleeding in regular plural and past tense allomorphy in English (see Volenec & Reiss 2020:28f.)? I don’t know. The latter SPE passage seems to suggest it is undesirable: “…we have not found any convincing example to demonstrate the need for such rules [like my (4-5)–KBG]. Therefore we propose, tentatively, that rules such as [(4-5)], with the great increase in descriptive power that they provide, not be permitted in the phonology.” (loc. cit.:375). They propose instead that only readjustment rules should be permitted to introduce rule features; otherwise rule feature specifications must be underlyingly present or introduced via redundancy rule.

As far as I can see, SPE does not give any detailed examples in which rule feature specifications are introduced via rule. Lakoff however does argue for this device. There are rules which seem to apply to only a subset of possible contexts; one example given is the umlaut-type plurals in English like foot/feet or goose/geese. Later in the book (loc. cit., 126, fn. 59), the rules which generate such pairs are referred to as minor rules. Let us call the English umlauting rule simply Umlaut. Lakoff notes that if one simply applies the above conventions naïvely, it will be necessary to mark a huge number of nouns—at the very least, all nouns which have a [+Back] monophthong in the final syllable and which form a non-umlauting plural—as [-Umlaut]. This, as Lakoff notes, would wreak havoc on the feature counting evaluation metric (see §8.1), and would treat what we intuitively recognize as exceptionality (forming an umlauting plural in English) as “more valued” than non-exceptionality. Even if one does not necessarily subscribe to the SPE evaluation metric, one may still feel that this has failed to truly encode the productivity distinction between minor rules and major rules that have exceptions. To address this, Lakoff proposes there is another rule which introduces [−Umlaut], and that this rule (call it No Umlaut) applies immediately before Umlaut. Morphemes which actually undergo Umlaut are underlyingly [−No Umlaut]. Thus the UR of a noun with an umlauting plural, like foot, will be specified [−No Umlaut], and it will not undergo a rule like the following:

(6) No Umlaut: [ ] → {−Umlaut}

However, a noun with a regular plural, like juice, will undergo this rule, and thus Umlaut will not apply to it because it was marked [−Umlaut] by (6).
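
Here is a compact sketch of the minor-rule device, under the same caveat that the rule-feature names and the toy lexicon are mine: only nouns that actually umlaut carry an underlying mark, so the feature-counting cost falls on the exceptions, as Lakoff intends.

```python
def plural(noun: str, rule_features: dict) -> str:
    feats = dict(rule_features)
    # No Umlaut (6): applies to anything not marked [-No Umlaut], assigning [-Umlaut].
    if feats.get("NoUmlaut", True):
        feats["Umlaut"] = False
    # Umlaut (the minor rule): blocked for anything now marked [-Umlaut].
    if feats.get("Umlaut", True):
        return {"foot": "feet", "goose": "geese"}.get(noun, noun)
    return noun + "s"  # the regular plural, as the elsewhere case

print(plural("foot", {"NoUmlaut": False}))  # -> feet
print(plural("juice", {}))                  # -> juices
```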

One critique is in order here. It is not clear to me why SPE introduces (what I have called) Convention 2; Lakoff simply ignores it and proposes an alternative version of Convention 3 where target morphemes, rather than segments, must be [+R] to undergo rule R. Of his proposal, he writes: “This system makes the claim that exceptions to phonological rules are morphemic in nature, rather than segmental.” (loc. cit., 18) This claim, while not necessarily its 1970-era implementation, is very much in vogue today. There are some reasons to think that Convention 2 introduces unnecessary complexities, which I’ll discuss in a subsequent post. One example (SPE:374) makes it clear that for Chomsky & Halle, Convention 3 requires that for rule R the target be [+R], but later on, they briefly consider what, if anything, happens if any segments in the environment (i.e., elsewhere in the structural description) are [-R].5 They claim (loc. cit., 375) there are problems with allowing [-R] specifications in the environment to block application of R, but give no examples. To me, this seems like an issue created by Convention 2, when one could simply reject it and keep the rule features at the morpheme level.

[This post, then, is the first in a series on theories of lexical exceptionality.]

Endnotes

  1. The modern linguist would probably not regard words like restraint as subject to this rule at all. Rather, they would probably assign #t to the “word” stratum (equivalent to the earlier “Level 2”) and place the shortening rule in the “stem” stratum (roughly equivalent to “Level 1”). Arguably, C&H have stated this rule more broadly than strictly necessary to make the point.
  2. It is said that the exceptionality of this pair reflects its etymology: obese was backformed from the earlier obesity. I don’t really see how this explains anything synchronically, though.
  3. This is roughly the context in which Philadelphia short-a is tense, though the following consonant must be tautosyllabic and tautomorphemic with the vowel. Philadelphia short-a is, however, not a great example since it’s not at all clear to me that short-a tensing is a synchronic process.
  4. Formally, the set in question is something like [−Dorsal] ∖ [+Voice, +Consonantal, +Continuant, −Nasal].
  5. This issue is taken up in more detail by Kisseberth (1970); I’ll review his proposal in a subsequent post.

References

Chomsky, N. and Halle, M. 1968. The Sound Pattern of English. Harper & Row.
Kisseberth, C. W. 1970. The treatment of exceptions. Papers in Linguistics 2: 44-58.
Lakoff, G. 1970. Irregularity in Syntax. Holt, Rinehart and Winston.

Representation vs. explanation?

I have often wondered whether detailed representational formalism is somehow in conflict with genuine explanation in linguistics. I have been tangentially involved in the cottage industry that is applying the Tolerance Principle (Yang 2005, 2016) to linguistic phenomena, most notably morphological defectivity. In our paper on the subject (Gorman & Yang 2019), we are admittedly somewhat nonchalant about the representations in question, a nonchalance which is, frankly, characteristic of this microgenre.

In my opinion, however, our treatment of Polish defectivity is representationally elegant. (See here for a summary of the data.) In this language, fused case/number suffixes show suppletion based on the gender—in the masculine, animacy—of the stem, and there is lexically conditioned suppletion between -a and -u, the two allomorphs of the gen.sg. for masculine inanimate nouns. To derive defectivity, all we need to show is that Tolerance predicts that, in the masculine inanimate, there is no default suffix to realize the gen.sg. If there are two realization rules in competition, we can implement this by making both of them lexically conditioned, and leaving nouns which are defective in the gen.sg. off both lexical “lists”. We can even imagine, in theories with late insertion, that the grammatical crash is the result of uninterpretable gen.sg. features which are, in defective nouns, still present at LF.1
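
For concreteness, the Tolerance calculation can be stated in a couple of lines: a rule over N relevant items is productive only if its exceptions e satisfy e ≤ N/ln N. The noun counts below are invented purely for illustration; the actual Polish figures are in the paper.

```python
from math import log

def tolerates(n_items: int, n_exceptions: int) -> bool:
    # Yang's Tolerance Principle: productive iff e <= N / ln(N).
    return n_exceptions <= n_items / log(n_items)

# Hypothetical masculine inanimate nouns, split between gen.sg. -a and -u.
N = 1000
n_a, n_u = 450, 550  # invented counts

print(tolerates(N, N - n_u))  # could -u be the default? 450 exceptions -> False
print(tolerates(N, N - n_a))  # could -a be the default? 550 exceptions -> False
# Neither rule tolerates the other's items as exceptions, so there is no default
# realization of the gen.sg.; a noun on neither lexical list is simply defective.
```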

It is useful to contrast this with our less elegant treatment of Spanish defectivity in the same paper. (See here for a summary of the data.) There we assume that there is some kind of grammatical competition for verbal stems between the rules that might be summarized as “diphthongize a stem vowel when stressed” and “do not change”. We group the two types of diphthongization (o to ue [we] and e to ie [je]) as a single change, even though it is not trivial to make these into a single change.2 This much at least has a venerable precedent, but what does it mean to treat diphthongization as a rule in the first place? The same tradition tends to treat the propensity to diphthongize as a phonological (i.e., perhaps via underspecification or prespecification, à la Harris 1985) or morphophonological property of the stem (a lexical diacritic à la Harris 1969, or competition between pseudo-suppletive stems à la Bermúdez-Otero 2013), and the phonological content of a stem is presumably stored in the lexicon, and not generated by any sort of rule.3 Rather, our Tolerance analysis seems to imply we have thrown in our lot with Albright and colleagues (Albright et al. 2001, Albright 2003) and Bybee & Pardo (1981), who analyze diphthongization as a purely phonological rule depending solely on the surface shape of the stem. This is despite the fact that we are bitterly critical of these authors for other reasons,4 and I would have preferred—aesthetically at least—to adopt an analysis where diphthongization is a latent property of particular stems.

At this point, I could say, perhaps, that the data—combined with our theoretical conception of the stem inventory portion of the lexicon as a non-generative system—is trying to tell me something about Spanish diphthongization, namely that Albright, Bybee, and colleagues are onto something, representationally speaking. But, compared with our analysis of Polish, it is not clear how these surface-oriented theories of diphthongization might generate grammatical crash. Abstracting from the details, Albright (2003) imagines that there is a series of competing rules for diphthongization, whose “strength” derives from the number of exemplars they cover. In his theory, the “best” rule can fail to apply if its strength is too low, but he does not propose any particular threshold, and, as we show in our paper, his notion of strength is poorly correlated with the actual gaps. Is it possible our analysis is onto something if Albright, Bybee, and colleagues are wrong about the representational basis for Spanish diphthongization?

Endnotes

  1. This case may still be a problem for Optimality Theory-style approaches to morphology, since Gen must produce some surface form.
  2. I don’t have the citation in front of me right now, but I believe J. Harris originally proposed that the two forms of diphthongization can be united insofar as both of them can be modeled as insertion of e triggering glide formation of the preceding mid vowel.
  3. For the same reason, I don’t understand what morpheme structure constraints are supposed to do exactly. Imagine, fancifully, that you had a mini-stroke and the lesion it caused damaged your grammar’s morpheme structure constraint (MSC) #3. How would anyone know? Presumably, you don’t have any lexical entries which violate MSC #3, and adults generally do not make up new lexical entries for the heck of it.
  4. These have to do with what we perceive as the poor quality of their experimental evidence, to be fair, not their analyses.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Albright, A. 2003. A quantitative study of Spanish paradigm gaps. In Proceedings of the 22nd West Coast Conference on Formal Linguistics, pages 1-14.
Bermúdez-Otero, R. 2013. The Spanish lexicon stores stems with theme vowels, not roots with inflectional class features. Probus 25: 3-103.
Bybee, J. L. and Pardo, E. 1981. On lexical and morphological conditioning of alternations: a nonce-probe experiment with Spanish verbs. Linguistics 19: 937-968.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky and W. U. Dressler (eds.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Harris, J. W. 1969. Spanish Phonology. MIT Press.
Harris, J. W. 1985. Spanish diphthongisation and stress: a paradox resolved. Phonology 2: 31-45.

Another quote from Ludlow

Indeed, when we look at other sciences, in nearly every case, the best theory is arguably not the one that reduces the number of components from four to three, but rather the theory that allows for the simplest calculations and greatest ease of use. This flies in the face of the standard stories we are told about the history of science. […] This way of viewing simplicity requires a shift in our thinking. It requires that we see simplicity criteria as having not so much to do with the natural properties of the world, as they have to do with the limits of us as investigators, and with the kinds of theories that simplify the arduous task of scientific theorizing for us. This is not to say that we cannot be scientific realists; we may very well suppose that our scientific theories approximate the actual structure of reality. It is to say, however, that barring some argument that “reality” is simple, or eschews machinery, etc., we cannot suppose that there is a genuine notion of simplicity apart from the notion of “simple for us to use.” […] Even if, for metaphysical reasons, we supposed that reality must be fundamentally simple, every science (with the possible exception of physics) is so far from closing the book on its domain it would be silly to think that simplicity (in the absolute sense) must govern our theories on the way to completion. Whitehead (1955, 163) underlined just such a point.

Nature appears as a complex system whose factors are dimly discerned by us. But, as I ask you, Is not this the very truth? Should we not distrust the jaunty assurance with which every age prides itself that it at last has hit upon the ultimate concepts in which all that happens can be formulated. The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, Seek simplicity and distrust it.

(Ludlow 2011:158-160)

References

Ludlow, P. 2011. The Philosophy of Generative Grammar. Oxford University Press.
Whitehead, A. N. 1955. The Concept of Nature. Cambridge University Press.

Entrenched facts

Berko’s (1958) wug-test is a standard part of the phonologist’s toolkit. If you’re not sure if a pattern is productive, why not ask whether speakers extend it to nonce words? It makes sense; it has good face validity. However, I increasingly see linguists who think that the results of wug-tests actually trump contradictory evidence coming from traditional phonological analysis applied to real words. I respectfully disagree.

Consider for example a proposal by Sanders (2003, 2006). He demonstrates that an alternation in Polish (somewhat imprecisely called o-raising) is not applied to nonce words. From this he takes o-raising to be handled via stem suppletion. He asks, and answers, the very question you may have on your mind. (Note that his H here is the OT constraint hierarchy; you may want to read it as “grammar”.)

Is phonology obsolete?! No! We still need a phonological H to explain how nonce forms conform to phonotactics. We still need a phonological H to explain sound change. And we may still need H to do more with morphology than simply allow extant (memorized) morphemes to trump nonce forms. (Sanders 2006:10)1

I read a sort of nihilism into this quotation. However, I submit that the fact that 50 million people just speak Polish—and “raise” and “lower” their ó’s with a high degree of consistency across contexts, lexemes, and so on—is a more entrenched fact than the results of a small nonce word elicitation task. I am not saying that Sanders’s results are wrong, or even misleading, just that his theory has elevated the importance of these results to the point where it has almost nothing to say about the very interesting fact that the genitive singular of lód [lut] ‘ice’ is lodu [lɔdu] and not *[ludu], and that tens of millions of people agree.

Endnotes

  1. Sanders’ 2006 manuscript is a handout, but apparently it’s a summary of his 2003 dissertation (Sanders 2003), stripped of some phonetic-interface details not germane to the question at hand. I just mention this so that it doesn’t look like I’m picking on a rando. Those familiar with my work will probably guess that I disagree with just about everything in this quotation, but kudos to Sanders for saying something interesting enough to disagree with.

References

Berko, J. 1958. The child’s learning of English morphology. Word 14: 150-177.
Sanders, N. 2003. Opacity and sound change in the Polish lexicon. Doctoral dissertation, University of California, Santa Cruz.
Sanders, N. 2006. Strong lexicon optimization. Ms., Williams College and University of Massachusetts, Amherst.

Why binarity is probably right

Consider the following passage, about phonological features:

I have not seen any convincing justification for the doctrine that all features must be underlyingly binary rather than ternary, quaternary, etc. The proponents of the doctrine often realize it needs defending, but the calibre of the defense is not unfairly represented by the subordinate clause devoted to the subject in SPE (297): ‘for the natural way of indicating whether or not an item belongs to a particular category is by means of binary features.’ The restriction to two underlying specifications creates problems and solves none. (Sommerstein 1977: 109)

Similarly, I recently had a conversation with someone who insisted that certain English multi-object constructions in syntax are better handled by assuming the possibility of ternary branching.

I disagree with Sommerstein, though: a logical defense of the assumption of binarity—both for the specification of phonological feature polarity and for the arity of syntactic trees—is so obvious that it fits on a single page. Roughly: 1) less than two is not enough, and 2) two is enough.

Less than two is not enough. This much should be obvious: theories in which features only have one value, or in which syntactic constituents cannot dominate more than one element, have no expressive power whatsoever.1,2

Two is enough. Every time we might desire to use a ternary feature polarity, or a ternary-branching non-terminal, there exists a weakly equivalent specification which uses binary polarity or binary branching, respectively, and more features or non-terminals. It is then up to the analyst to determine whether or not they are happy with the natural classes and/or constituents obtained, but this possibility is always available. One opposed to this strategy has a duty to say why the hypothesized features or non-terminals are wrong.
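
As a toy illustration of the re-encoding move for feature polarity, here is the familiar treatment of a three-way vowel height contrast with two binary features; the feature names and the little vowel inventory are just for exposition.

```python
# Ternary height {high, mid, low} re-encoded as [±high, ±low]; the fourth
# cell [+high, +low] simply goes unused. Each "ternary" value is recoverable,
# at the cost of one extra feature.
VOWELS = {
    "i": {"high": True,  "low": False},
    "u": {"high": True,  "low": False},
    "e": {"high": False, "low": False},
    "o": {"high": False, "low": False},
    "a": {"high": False, "low": True},
}

def natural_class(**specs):
    return sorted(v for v, f in VOWELS.items()
                  if all(f[k] == val for k, val in specs.items()))

print(natural_class(high=True))              # the "high" value  -> ['i', 'u']
print(natural_class(high=False, low=False))  # the "mid" value   -> ['e', 'o']
print(natural_class(low=True))               # the "low" value   -> ['a']
print(natural_class(high=False))             # bonus class: mid + low -> ['a', 'e', 'o']
```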

Endnotes

  1. It is important to note in this regard that privative approaches to feature theory (as developed by Trubetzkoy and disciples) are themselves special cases of the binary hypothesis which happen to treat absence as a non-referable. For instance, if we treat the set of nasals as a natural class (specified [Nasal]) but deny the existence of the (admittedly rather diverse) natural class [−Nasal]—and if we further insist rules be defined in terms of natural classes, and deny the possibility of disjunctive specification—we are still working in a binary setting; we have just added the stipulation that negated features cannot be referred to by rules.
  2. I put aside the issue of cumulativity of stress—a common critique in the early days—since nobody believes this is done by feature in 2023.

References

Sommerstein, A. 1977. Modern Phonology. Edward Arnold.