On conspiracies

Kisseberth (1970) introduces the notion of conspiracies, cases in which a series of phonological rules in a single language “conspire” to create similar output configurations. Supposedly, Haj Ross chose the term “conspiracy”, and it is perhaps not an accident that the term he chose immediately reminds one of conspiracy theory, which has a strong negative connotation implying that the existence of the conspiracy cannot be proven. Kisseberth’s discovery of conspiracies motivated the rise of Optimality Theory (OT) two decades later—Prince & Smolensky (1993:1) refer to conspiracies as a “conceptual crisis” at the heart of phonological theory, and Zuraw (2003) explicitly links Kisseberth’s data to OT—but curiously, it seemingly had little effect on contemporary phonological theorizing. (A positivist might say that the theoretical technology needed to encode conspiratorial thinking simply did not exist at the time; a cynic might say that contemporaries did not take Kisseberth’s conspiratorial thinking seriously until it became easy to do so.) I discern two major objections to the logic of conspiracies: the evolutionary argument and the prosodic argument, which I’ll briefly review.

The evolutionary argument

What I am calling the evolutionary argument was first made by Kiparsky (1973:75f.) and is presented as an argument against OT by Hale & Reiss (2008:14). Roughly, if a series of rules leads to the same set of output configurations, those rules must be surface true, or they would not contribute to the putative conspiracy. Since surface-true rules are assumed to be easy to learn, especially relative to opaque rules, which are assumed to be difficult to learn, and since failure to learn rules would contribute to language change, grammars will naturally accumulate functionally related surface-true rules. I think we should question the assumption (au courant in 1973) that opacity is the be-all and end-all of what makes a rule difficult to acquire, but otherwise I find this basic logic sound.

The prosodic argument

At the time Kisseberth was writing, standard phonological theory included few prosodic primitives; even the notion of the syllable was considered dubious. Subsequent revisions of the theory introduced rich hierarchies of prosodic primitives. In particular, a subsequent generation of phonologists hypothesized that speakers “build” or “parse” sequences of segments into onsets and rimes, syllables, and feet, with repairs like stray erasure (i.e., deletion of unsyllabified segmental material) or epenthesis used to resolve conflicts (McCarthy 1979, Steriade 1982, Itô 1986). It seems to me that this approach accounts for most of the facts of Yowlumne (formerly Yawelmani) reviewed by Kisseberth in his study:

  1. there are no word-initial CC clusters
  2. there are no word-final CC clusters
  3. derived CCCs are resolved either by deletion or i-epenthesis
  4. there are no CCC clusters in underlying form

The relevant observation linking all these facts is simply that Yowlumne does not permit branching onsets or codas; more specifically, Yowlumne’s syllable-parsing algorithm does not build branching onsets or codas. This immediately accounts for facts #1-2. Assuming the logic of McCarthy and contemporaries, #3 is also unsurprising: these clusters simply cannot be realized faithfully; the fact that there are multiple resolutions for the *CCC pathology is beside the point. And finally, adopting the logic that Prince & Smolensky (1993:54) were later to call Stampean occultation, the absence of underlying CCC clusters follows from their inability to surface, since the generalizations in question are all surface-true. (Here we are treading close to Kiparsky’s thoughts on the matter too.) Crucially, the analysis given above does not reify any surface constraints; the facts all follow from the feed-forward derivational structure of prosodically-informed phonological theory current a decade before Prince & Smolensky.
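To make the structure-building intuition concrete, here is a minimal sketch of such a parser, under the toy assumptions that segments are abstracted to just C and V and that onsets and codas are limited to a single consonant; the representation and function names are mine, not Kisseberth’s or Itô’s.

```python
# Toy sketch (not Kisseberth's or Itô's formalism): parse a CV string into
# syllables with at most one onset C and one coda C, flagging anything that
# cannot be parsed. Unparsed consonants are the locus of "repair" (stray
# erasure or i-epenthesis). Segments are abstracted as "C" and "V".

def parse_syllables(segments: str) -> tuple[list[str], list[int]]:
    """Returns (syllables, indices of unsyllabified consonants)."""
    syllables = []
    stray = []
    i = 0
    n = len(segments)
    while i < n:
        if segments[i] == "V":
            # Vowel-initial syllable (no onset available).
            syll = "V"
            i += 1
        elif i + 1 < n and segments[i + 1] == "V":
            # Single-consonant onset plus nucleus.
            syll = "CV"
            i += 2
        else:
            # Consonant that cannot begin a syllable: leave it stray for now.
            stray.append(i)
            i += 1
            continue
        # Add at most one coda consonant, but only if the next consonant
        # could not itself serve as the onset of a following syllable.
        if i < n and segments[i] == "C" and not (i + 1 < n and segments[i + 1] == "V"):
            syll += "C"
            i += 1
        syllables.append(syll)
    return syllables, stray

# A derived CCC sequence (e.g., across a morpheme boundary) leaves one C stray:
print(parse_syllables("CVCCCVC"))  # (['CVC', 'CVC'], [3]) -> repair needed
print(parse_syllables("CVCVC"))    # (['CV', 'CVC'], [])   -> fully parsed
```

On this view, the stray consonant in a derived CCC sequence is exactly the locus where deletion or i-epenthesis must apply; no *CCC constraint is ever stated.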

Conclusion

While Prince & Smolensky are right to say that OT provides a principled solution to Kisseberth’s notion of conspiracies, researchers in the ’70s and ’80s treated conspiracies as epiphenomena of acquisition (Kiparsky) or of prosodic structure-building (McCarthy and contemporaries). Perhaps, then, OT does not deserve credit for solving an unsolved problem in this regard. Of course, it remains to be seen whether the many implicit conjectures in these two objections can be sustained.

References

Hale, M. and Reiss, C. 2008. The Phonological Enterprise. Oxford University Press.
Itô, J. 1986. Syllable theory in prosodic phonology. Doctoral dissertation, University of Massachusetts, Amherst. Published by Garland Publishers, 1988.
Kiparsky, P. 1973. Phonological representations. In O. Fujimura (ed.), Three Dimensions of Linguistic Theory, pages 1-135. TEC Corporation.
Kisseberth, C. W. 1970. On the functional unity of phonological rules. Linguistic Inquiry 1(3): 291-306.
McCarthy, J. 1979. Formal problems in Semitic phonology and morphology. Doctoral dissertation, MIT. Published by Garland Publishers, 1985.
Prince, A., and Smolensky, P. 1993. Optimality Theory: constraint interaction in generative grammar. Rutgers Center for Cognitive Science Technical Report TR-2.
Steriade, D. 1982. Greek prosodies and the nature of syllabification. Doctoral dissertation, MIT.
Zuraw, K. 2003. Optimality Theory in linguistics. In M. Arbib (ed.), Handbook of Brain Theory and Neural Networks, pages 819-822. 2nd edition. MIT Press.

On the Germanic *tl gap

One “parochial” constraint in Germanic is the absence of branching onsets consisting of a coronal stop followed by /l/. Thus /pl, bl, kl, gl/ are all common in Germanic, but *tl and *dl are not. It is difficult to understand what might give rise to this phonotactic gap.

Blevins & Grawunder (2009), henceforth B&G, note that in portions of Saxony and points south, *kl has in fact shifted to [tl] and *gl to [dl]. This sound change has been noted in passing by several linguists, going back to at least the 19th century. This change has the hallmarks of a change from below: it does not appear to be subject to social evaluation and is not subject to “correction” in careful speech styles. B&G also note that many varieties of English have undergone this change; according to Wright, it could be found in parts of east Yorkshire. Similarly, no social stigma seems to have attached to this pronunciation, and B&G suggest it may have even made its way into American English. B&G argue that since it has occurred at least twice, KL > TL is a natural sound change in the relevant sense.

Of particular interest to me is B&G’s claim that one structural factor supporting *KL > TL is the absence of TL in Germanic before this change; in all known instances of *KL > TL, the preceding stage of the language lacked (contrastive) TL. While many linguists have argued that TL is universally marked, and that its absence in Germanic is a structural gap in the relevant sense, this does not seem to be borne out by quantitative typology of a wide range of language families.

Of course, other phonotactic gaps, even statistically robust ones, are similarly filled with ease. I submit that evidence of this sort suggests that phonologists habitually overestimate the “structural” nature of phonotactic gaps.

References

Blevins, J. and Grawunder, S. 2009. *KL > TL sound change in Germanic and elsewhere: descriptions, explanations, and implications. Linguistic Typology 13: 267-303.

The role of phonotactics in language change

How does phonotactic knowledge influence the path taken by language change? As is often the case, the null hypothesis seems to be simply that it doesn’t. Perhaps speakers have projected a phonotactic constraint C into the grammar of Old English, but that doesn’t necessarily mean that Middle English will conform to C, or even that Middle English won’t freely borrow words that flagrantly violate C.

One case comes from the history of English. As is well known, modern English /ʃ/ descends from Old English sk; modern instances of word-initial sk are mostly borrowed from Dutch (e.g., skipper) or Norse (e.g., ski); sky was borrowed from an Old Norse word meaning ‘cloud’ (which tells you a lot about the weather in the Danelaw). Furthermore, Old English forbade super-heavy rimes consisting of a long vowel followed by a consonant cluster. Because the one major source for /ʃ/ was sk, and because a word-final long vowel followed by sk was unheard of, V̄ʃ# was rare in Middle English, and word-final sequences of tense vowels followed by [ʃ] are still rare in Modern English (Iverson & Salmons 2005). Of course there are exceptions, but according to Iverson & Salmons, they tend to:

  • be markedly foreign (e.g., cartouche),
  • be proper names (e.g., LaRouche),
  • or convey an “affective, onomatopoeic quality” (e.g., sheesh, woosh).

However, it is reasonably clear that all of these were added during the Middle or Modern English period. Clearly, this constraint, which is still statistically robust (Gorman 2014a:85), did not prevent speakers from borrowing and coining exceptions to it. Still, it is hard to rule out any historical effect of the constraint: perhaps there would be more Modern English V̄ʃ# words otherwise.

Another case of interest comes from Latin. As is well known, Old Latin went through a near-exceptionless “Neogrammarian” sound change, a “primary split” or “conditioned merger” of intervocalic s with r. (The terminus ante quem, i.e., the latest possible date, for the actuation of this change is the 4th c. BCE.) This change had the effect of temporarily eliminating all traces of intervocalic s in late Old Latin (Gorman 2014b). From this fact, one might posit that speakers of this era of Latin projected a *VsV constraint, and that this constraint would prevent subsequent sound changes from reintroducing intervocalic s. But this is clearly not the case: in the 1st c. BCE, degemination of ss after diphthongs and long monophthongs reintroduced intervocalic s (e.g., caussa > classical causa ‘cause’). It is also clear that loanwords with intervocalic s were freely borrowed, and with the exception of the very early Greek borrowing tūs-tūris ‘incense’, none of them were adapted in any way to conform to a putative *VsV constraint:

(1) Greek loanwords: ambrosia ‘id.’, *asōtus ‘libertine’ (acc.sg. asōtum), basis ‘pedestal’, basilica ‘public hall’, casia ‘cinnamon’ (cf. cassia), cerasus ‘cherry’, gausapa ‘woolen cloth’, lasanum ‘cooking utensil’, nausea ‘id.’, pausa ‘pause’, philosophus ‘philosopher’, poēsis ‘poetry’, sarīsa ‘lance’, seselis ‘seseli’
(2) Celtic loanwords: gaesī ‘javelins’, omāsum ‘tripe’
(3) Germanic loanwords: glaesum ‘amber’, bisōntes ‘wild oxen’

References

Gorman, K. 2014a. A program for phonotactic theory. In Proceedings of the 47th Annual Meeting of the Chicago Linguistic Society, pages 79-93.
Gorman, K. 2014b. Exceptions to rhotacism. In Proceedings of the 48th Annual Meeting of the Chicago Linguistic Society, pages 279-293.
Iverson, G. K. and Salmons, J. C. 2005. Filling the gap: English tense vowel plus final /š/. Journal of English Linguistics 33: 1-15.

Allophones and pure allophones

I assume you know what an allophone is. But what this blog post supposes […beat…] is that you could be more careful about how you talk about them.

Let us suppose the following:

  • the phonemic inventory of some grammar G contains t and d
  • but does not contain s or z
  • yet instances of s or z are found on the surface

Thus we might say that /t, d/ are phonemes and [s, z] are allophones (perhaps of /t, d/: maybe in G, derived coronal stop clusters undergo assibilation).

Let us suppose that you’re writing the introduction to a phonological analysis of G, and in Table 1—it’s usually Table 1—you list the phonemes you posit, sorted by place and manner. Perhaps you will place s and z in italics or brackets, and the caption will indicate that these refer to segments which are allophones.

I find this imprecise. It suggests that all instances of surface t or d are phonemic (or perhaps more precisely, and more vacuously, are faithful allophones),1 which need not be the case. Perhaps G has a rule of perseveratory obstruent cluster voice assimilation and one can derive surface [pt] from /…p-d…/, or surface [gd] from /…g-t…/, and so on. The confusion here seems to be that we are implicitly treating the sets of allophones and phonemes as disjoint when the former is in fact a superset of the latter. What we seem to actually mean when we say that [s, z] are allophones is rather that they are pure allophones: allophones which are not also phonemes.
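To make the set-theoretic point explicit, here is a trivial illustration with a hypothetical inventory (the segments are made up; only the relationships matter):

```python
# Illustration of the terminological point above, with a hypothetical inventory.
# "Allophones" here means all segments that occur in surface representations;
# the pure allophones are those surface segments that are not also phonemes.

phonemes = {"p", "t", "k", "b", "d", "g", "a", "i", "u"}
surface_segments = {"p", "t", "k", "b", "d", "g", "a", "i", "u", "s", "z"}

pure_allophones = surface_segments - phonemes
assert phonemes <= surface_segments  # phonemes are a subset of the surface segments
print(sorted(pure_allophones))       # ['s', 'z']
```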

Another possible way to clarify the hypothetical Table 1 is to simply state what phonemes s and z are allophones of, exactly. For instance, if they are purely derived by assibilation, we might write that “the stridents s, z are (pure) allophones of the associated coronal stops /t, d/ respectively”. However, since this might be beside the point, and because there’s no principled upper bound on how many phonemic sources a given (pure or otherwise) allophone might have, I think it should suffice to say that s and z are pure allophones and leave it at that.2

This imprecision, I suspect, is a hangover from structuralist phonemics, which viewed allophony as separate from (and arguably more privileged or entrenched than) alternations (then called morphophonemics). Of course, this assumption does not appear to have any compelling justification, and as Halle (1959) shows, it leads to substantial duplication (in the sense of Kisseberth 1970) between rules of allophony and rules of neutralization.3 Most linguists since Halle seem to have found the structuralist stipulation and the duplication it gives rise to aesthetically displeasing; I concur.

Endnotes

  1. I leave open the question of whether surface representations ever contain phonemes: perhaps vacuous rules “faithfully” convert them to allophones.
  2. One could (and perhaps should) go further into feature logic, and as such, regard both phonemes and pure allophones as mere bundles of features linked to a single timing slot. However, this makes things harder to talk about.
  3. I do not assume that “neutralization” is a grammatical primitive. It is easily defined (see Bale & Reiss 2018, ch. 20) but I see no reason to suppose that grammars distinguish neutralizing processes from other processes.

References

Bale, A. and Reiss, C. 2018. Phonology: A Formal Introduction. MIT Press.
Halle, M. 1959. The Sound Pattern of Russian. Mouton.
Kisseberth, C. W. 1970. On the functional unity of phonological rules. Linguistic Inquiry 1(3): 291-306.

The alternation phonotactic hypothesis

The hypothesis

In a recent handout, I discuss the following hypothesis, implicit in my dissertation (Gorman 2013):

(1) Alternation Phonotactic Hypothesis: Let A, B, C, and D be (possibly-null) string sets. Then, if a grammar G contains a surface-true rule of alternation A → B / C __ D, nonce words containing the subsequence CAD are ill-formed for speakers of G.

Before I continue, note that this definition is “phenomenological” in the sense that it refers to two notions—alternations and surface-trueness—which are not generally considered to be encoded directly in the grammar. Regarding the notion of alternations, it is not difficult to formalize whether or not a rule is alternating.

(2) Let a rule be defined by possibly-null string sets A, B, C, and D as in (1). Then, if any elements of B are phonemes, the rule is a rule of alternation.

(3) [ditto] If no elements of B are phonemes, then the rule is a rule of (pure) allophony.

But from the argument against bi-uniqueness in The Sound Pattern of Russian (Halle 1959), it follows that we should reject a grammar-internal distinction between rules of alternation and rules of allophony, and subsequent theory provides no way to encode this distinction in the grammar. Similarly, it is not hard to define what it means for a rule to be surface-true.

(4) [ditto] If no instances of CAD are generated by the grammar G, then the rule is surface-true.

But there does not seem to be much reason for that notion to be encoded in the grammar, and the theory does not provide any way to encode it.1 Note further that I am also deliberately stating in (1) that a constraint against CAD has been “projected” from the alternation, rather than treating such constraints as autonomous entities of the theory, as is done in Optimality Theory (OT) and friends. Finally, I have phrased this in terms of grammaticality (“are ill-formed”) rather than acceptability.
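For concreteness, here is a minimal sketch of how (2)-(4) could be operationalized, under the simplifying assumptions that a rule is a tuple of single-segment strings, that word boundaries are written “#”, and that the grammar is represented only by the set of surface forms it generates; these representational choices are mine and not part of the proposal itself.

```python
# Minimal sketch of definitions (2)-(4). A rule is a 4-tuple (A, B, C, D) of
# single-segment strings; "#" marks a word boundary; the "grammar" is stood in
# for by the set of surface forms it generates. All names are illustrative.

from dataclasses import dataclass

@dataclass
class Rule:
    a: str  # focus (A)
    b: str  # structural change (B)
    c: str  # left context (C)
    d: str  # right context (D)

def is_alternation(rule: Rule, phonemes: set[str]) -> bool:
    """(2)/(3): a rule is an alternation iff its output B contains a phoneme."""
    return any(seg in phonemes for seg in rule.b)

def is_surface_true(rule: Rule, surface_forms: set[str]) -> bool:
    """(4): a rule is surface-true iff no surface form contains C+A+D."""
    cad = rule.c + rule.a + rule.d
    return not any(cad in form for form in surface_forms)

# Toy example: word-final devoicing d -> t / __ #, in a language whose phonemes
# include both /t/ and /d/ and whose surface forms never end in a voiced stop.
phonemes = {"t", "d", "a", "m"}
devoicing = Rule(a="d", b="t", c="", d="#")
surface_forms = {"#tat#", "#dama#"}
print(is_alternation(devoicing, phonemes))      # True: [t] is also a phoneme
print(is_surface_true(devoicing, surface_forms))  # True: no "d#" on the surface
```

In the toy example, final devoicing counts as a surface-true alternation, so by the APH it would project a constraint against word-final voiced obstruents.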

Why might the Alternation Phonotactic Hypothesis (henceforth, APH) be true? First, I take it as obvious that alternations are more entrenched facts about grammars than pure allophony. For instance, in English, stop aspiration could be governed by a rule of allophony, but it is also plausible that English speakers simply represent aspirated stops as such in their lexical entries, since there are no aspiration alternations. This point was made separately by Dell (1973) and Stampe (1973), and motivates the notion of lexicon optimization in OT. In contrast, though, rules of alternation (or something like them) are actually necessary to obtain the proper surface forms. An English speaker who does not have a rule of obstruent voice assimilation will simply not produce the right allomorphs of various affixes. In contrast, the same speaker need not encode a process of nasalization—which in English is clearly allophonic (see, e.g., Kager 1999: 31f.)—to obtain the correct outputs. Given that alternations are entrenched in the relevant sense, it is not impossible to imagine that speakers might “project” constraints out of alternation generalizations in the manner described above. Such constraints could be used during online processing, assuming a strong isomorphism between the grammatical representations used during production and perception.2 Secondly, since not all alternations are surface-true, it seems reasonable to limit this process of projection to those which are. Were one to project non-surface-true constraints in this fashion, the speaker would find themselves in the awkward position of treating actual words as ill-formed.3,4

The APH is interesting contrasted with the following:

(5) Lexicostatistic Phonotactic Hypothesis: Let A, C, and D be (possibly-null) string sets. Then, if CAD is statistically underrepresented (in a sense to be determined) in the lexicon L of a grammar G, nonce words containing the subsequence CAD are ill-formed for speakers of G.

According to the LSPH (as we’ll call it), phonotactic knowledge is projected not from alternations but from statistical analysis of the lexicon. The LSPH is at least implicit in the robust cottage industry which uses statistical and/or computational modeling of the lexicon to infer the existence of phonotactic generalizations; it is notable that virtually none of this work discusses anything like the APH. Finally, one should note that the APH and the LSPH do not exhaust the set of possibilities. For instance, Berent et al. (2007) and Daland et al. (2011) test for effects of the Sonority Sequencing Principle, a putative linguistic universal, on wordlikeness judgments. And some have denied the existence of phonotactic constraints altogether.
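Since (5) deliberately leaves the statistical sense open, here is just one possible operationalization, an observed/expected (O/E) ratio computed over a toy lexicon; the toy lexicon and the independence assumption behind the expected count are mine, not a claim about how the LSPH must be implemented.

```python
# One possible operationalization of "statistically underrepresented" in (5):
# an observed/expected (O/E) ratio for a two-segment sequence, computed over a
# toy lexicon of segment strings.

from collections import Counter

def observed_expected(lexicon: list[str], seq: str) -> float:
    """O/E for a bigram `seq`, assuming independence of adjacent segments."""
    assert len(seq) == 2
    unigrams = Counter(seg for word in lexicon for seg in word)
    bigrams = Counter(word[i:i + 2] for word in lexicon for i in range(len(word) - 1))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    observed = bigrams[seq] / total_bi if total_bi else 0.0
    expected = (unigrams[seq[0]] / total_uni) * (unigrams[seq[1]] / total_uni)
    return observed / expected if expected else float("inf")

toy_lexicon = ["tap", "pat", "spat", "taps", "stap"]
print(observed_expected(toy_lexicon, "ta"))  # well above 1: overrepresented
print(observed_expected(toy_lexicon, "pt"))  # 0.0: unattested, so underrepresented
```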

Gorman (2013) reviews some prior results that argue in favor of the APH; I describe them below.

Consider the putative English phonotactic constraint *V̄ʃ#, a constraint against word-final sequences of tense vowels followed by [ʃ] proposed by Iverson & Salmons (2005). Exceptions to this generalization tend to be markedly foreign (e.g., cartouche), to be proper names (e.g., LaRouche), or to convey an “affective, onomatopoeic quality” (e.g., sheesh, woosh). As Gorman (2013:43f.) notes, this constraint is statistically robust, but Hayes & White (2013) report that it has no measurable effect on English speakers’ wordlikeness judgments. In contrast, three English alternation rules (nasal place assimilation, obstruent voice assimilation, and degemination) have a substantial impact on wordlikeness judgments (Gorman 2013, ch. 4).

A second, more elaborate example comes from Turkish. Lees (1966a,b) proposes three phonotactic constraints in this language: backness harmony, roundness harmony, and labial attraction. All three of these constraints have exceptions, but Gorman (2013:57-60) shows that they are statistically robust generalizations. Thus, under the LSPH, speakers ought to be sensitive to all three.

Endnotes

  1. I note that the CONTROL module proposed by Orgun & Sprouse (1999) might be a mechanism by which this information could be encoded.
  2. Some evidence that phonotactic knowledge is deployed in production comes from the study of Finnish and Turkish, both of which have robust vowel harmony. Suomi et al. (1997) and Vroomen et al. (1998) find that disharmony seemingly acts as a cue for word boundaries in Finnish, and Kabak et al. (2010) find something similar for Turkish, but not in French, which lacks harmony.
  3. Durvasula & Kahng (2019) find that speakers do not necessarily judge a nonce word to be ill-formed just because it fails to follow certain subtle allophonic generalizations, which suggests that the distinction between allophony and alternation may be important here.
  4. I note that it has sometimes been proposed that actual words of G may in fact be gradiently marked or otherwise degraded with respect to the grammar G if they violate phonotactic constraints projected from G (e.g., Coetzee 2008). However, the null hypothesis, it seems to me, is that all actual words are also possible words, and so it does not make sense to speak of actual words as marked or ill-formed, gradiently or otherwise.

References

Berent, I., Steriade, D., Lennertz, T., and Vaknin, V. 2007. What we know about what we have never heard: evidence from perceptual illusions. Cognition 104: 591-630.
Coetzee, A. W. 2008. Grammaticality and ungrammaticality in phonology. Language 84(2): 218-257. [I critique this briefly in Gorman 2013, p. 4f.]
Daland, R., Hayes, B., White, J., Garellek, M., Davis, A., and Norrmann, I. 2011. Explaining sonority projection effects. Phonology 28: 197-234.
Dell, F. 1973. Les règles et les sons. Hermann.
Durvasula, K. and Kahng, J. 2019. Phonological acceptability is not isomorphic with phonological grammaticality of stimulus. Talk presented at the Annual Meeting on Phonology.
Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.
Halle, M. 1959. The Sound Pattern of Russian. Mouton.
Hayes, B. and White, J. 2013. Phonological naturalness and phonotactic learning. Linguistic Inquiry 44: 45-75.
Iverson, G. K. and Salmons, J. C. 2005. Filling the gap: English tense vowel plus final /š/. Journal of English Linguistics 33: 1-15.
Kabak, B., Maniwa, K., and Kazanina, N. 2010. Listeners use vowel harmony and word-final stress to spot nonsense words: a study of Turkish and French. Journal of Laboratory Phonology 1: 207-224.
Kager, R. 1999. Optimality Theory. Cambridge University Press.
Lees, R. B. 1966a. On the interpretation of a Turkish vowel alternation. Anthropological Linguistics 8: 32-39.
Lees, R. B. 1966b. Turkish harmony and the description of assimilation. Türk Dili Araştırmaları Yıllığı Belletene 1966: 279-297.
Orgun, C. O. and Sprouse, R. 1999. From MPARSE to CONTROL: deriving ungrammaticality. Phonology 16: 191-224.
Stampe, D. 1973. A Dissertation on Natural Phonology. Garland. [I don’t have this in front of me but if I remember correctly, Stampe argues non-surface true phonological rules are essentially second-class citizens.]
Suomi, K., McQueen, J. M., and Cutler, A. 1997. Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language 36: 422-444.
Vroomen, J., Tuomainen, J. and de Gelder, B. 1998. The roles of word stress and vowel harmony in speech segmentation. Journal of Memory and Language 38: 133-149.

Anatomy of an analogy

I have posted a lightly-revised version of the handout of a talk I gave at Stony Brook University last November here on LingBuzz. In it, I argue that analogical leveling phenomena in Latin previously attributed to pressures against interparadigmatic analogy or towards phonological process overapplication are better understood as the result of Neogrammarian sound change, loss of productivity, and finally covert reanalysis.

What phonotactics-free phonology is not

In my previous post, I showed how many phonological arguments are implicitly phonotactic in nature, using the analysis of the Latin labiovelars as an example. If we instead adopt a restricted view of phonotactics as derived from phonological processes, as I argue for in Gorman 2013, what specific forms of argumentation must we reject? I discern two such types:

  1. Arguments from the distribution of phonemes in URs. Early generative phonologists posited sequence structure constraints, constraints on sequences found in URs (e.g., Stanley 1967, et seq.). This seems to reflect the then-contemporary mania for information theory and lexical compression, ideas which appear to have led nowhere and which were abandoned not long after. Modern forms of this argument may use probabilistic constraints instead of categorical ones, but the same critiques apply. It has never been articulated why these constraints, whether categorical or probabilistic, are considered key acquirenda: why would speakers bother to track these constraints, given that they simply recapitulate information already present in the lexicon? Furthermore, as I noted in the previous post, it is clear that some of these generalizations are apparent even to non-speakers of the language; for example, monolingual New Zealand English speakers have a surprisingly good handle on Māori phonotactics despite knowing few if any Māori words. Finally, as discussed elsewhere (Gorman 2013: ch. 3, Gorman 2014), some statistically robust sequence structure constraints appear to have little if any effect on speakers’ judgments of nonce word well-formedness, loanword adaptation, or the direction of language change.
  2. Arguments based on the distribution of SRs not derived from neutralizing alternations. Some early generative phonologists also posited surface-based constraints (e.g., Shibatani 1973). These were posited to account for supposed knowledge of “wordlikeness” that could not be explained on the basis of constraints on URs. One example is that of German, which has across-the-board word-final devoicing of obstruents, but which clearly permits underlying root-final voiced obstruents in free stems (e.g., [gʀaːt]-[gʀaːdɘ] ‘degree(s)’ from /grad/). In such a language, Shibatani claims, a nonce word with a word-final voiced obstruent would be judged un-wordlike. Two points should be made here. First, the surface constraint in question derives directly from a neutralizing phonological process. Constraint-based theories which separate “disease” and “cure” posit a constraint against word-final voiced obstruents, but in procedural/rule-based theories there is no reason to reify this generalization, which after all is a mere recapitulation of the facts of alternation, arguably a more entrenched source of evidence for grammar construction. Secondly, Shibatani did not in fact validate his claim about German speakers’ judgments in any systematic fashion. Some recent work by Durvasula & Kahng (2019) reports that speakers do not necessarily judge a nonce word to be ill-formed just because it fails to follow certain subtle allophonic principles.

References

Durvasula, K. and Kahng, J. 2019. Phonological acceptability is not isomorphic with phonological grammaticality of stimulus. Talk presented at the Annual Meeting on Phonology.
Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.
Gorman, K. 2014. A program for phonotactic theory. In Proceedings of the Forty-Seventh Annual Meeting of the Chicago Linguistic Society: The Main Session, pages 79-93.
Shibatani, M. 1973. The role of surface phonetic constraints in generative phonology. Language 49(1): 87-106.
Stanley, R. 1967. Redundancy rules in phonology. Language 43(2): 393-436.

Towards a phonotactics-free phonology

Early generative phonology had surprisingly little to say about the theory of phonotactics. Chomsky and Halle (1965) claim that English speakers can easily distinguish between real words like brick, well-formed or “possible” nonce words like blick, and ill-formed or “impossible” nonce words like bnick. Such knowledge must be in part language-specific, since [bn] onsets are, in some languages—Hebrew, for instance—totally unobjectionable. But few attempts were made at the time to figure out how to encode this knowledge.

Chomsky and Halle, and later Stanley (1967), propose sequence structure constraints (SSCs), generalizations which encode sequential redundancies in underlying representations.1 Chomsky and Halle (p. 100) hypothesize that such generalizations might account for the ill-formedness of bnick: perhaps English consonants preceded by a word-initial obstruent must be liquids: thus blick but not bnick. Shibatani (1973) claims that not all language-specific generalizations about (im)possible words can derive from restrictions on underlying representations, and that some must (instead or also) be expressed in terms of restrictions on surface forms. For instance, in German, obstruent voicing is contrastive but neutralized word-finally; e.g., [gʀaːt]-[gʀaːtɘ] ‘ridge(s)’ vs. [gʀaːt]-[gʀaːdɘ] ‘degree(s)’. Yet Shibatani claims that German speakers judge word-final voiced obstruents, as in the hypothetical but unattested [gʀaːd], to be ill-formed. Similar claims were made by Clayton (1976). And that roughly exhausts the debate at the time. Decades later, Hale and Reiss (2008), for instance, would deny that this kind of knowledge is part of the narrow faculty of language:

Even if we, as linguists, find some generalizations in our description of the lexicon, there is no reason to posit these generalizations as part of the speaker’s knowledge of their language, since they are computationally inert and thus irrelevant to the input-output mappings that the grammar is responsible for. (Hale and Reiss 2008:17f.)

Many years later, Charles Reiss (p.c.) proposed to me a brief thought experiment. Imagine that you were to ask a naïve non-linguist monolingual English speaker to discern whether a short snippet of spoken language was either, say, Māori or Czech. Would you not expect that such a speaker would do far better than chance, even if they themselves do not know a single word in either language? Clearly then, (at least some form of) phonotactic knowledge can be acquired extremely indirectly, effortlessly, without any substantial exposure to the language, and does not imply any deep knowledge of the grammar(s) in question.2

In a broader historical context, though, early generativists’ relative lack of interest in phonotactic theory is something of an anomaly. Structuralist phonologists, in developing phonemicizations, were at least sometimes concerned with positing phonemes that have a restricted distribution. And for phonologists working in the strains of thinking that ultimately spawned Harmonic Grammar and Optimality Theory, phonotactic generalizations are to a considerable degree what phonological grammars are made of.

A phonological theory which rejects phonotactics as part of the narrow language faculty—as do Hale and Reiss—is one which makes different predictions than theories which do include it, if only because such an assumption necessarily excludes certain sources of evidence. Such a grammar cannot make reference to generalizations about distributions of phonemes that are not tied to allophonic principles or to alternations. Nor can it make reference to the distribution of contrast except in the presence of neutralizing phonological processes.

I illustrated this point very briefly in Gorman 2014 with a famous case from Sanskrit (the so-called diaspirate roots); here I’d like to provide a more detailed example using a language I know much better, namely Latin. Anticipating the conclusions drawn below, it seems that nearly all the arguments mustered in this well-known case are phonotactic in nature and thus are irrelevant in a phonotactics-free theory of phonology.

In Classical Latin, the orthographic sequence qu (or more specifically <QV>) denotes the sound [kw].3 Similarly, gu is ambiguously either [gu], as in exiguus [ek.si.gu.us] ‘scanty’, or [gw], as in anguis [aŋ.gwis] ‘snake’. For whatever reason, it seems that gu was pronounced as [gw] if and only if it was preceded by an n. It is not at all clear whether this should be regarded as an orthographic generalization, a phonological principle, or a mere accident of history.

How should the labiovelars qu and (post-nasal) gu be phonologized? This topic has been the subject of much speculation. Devine and Stephens (1977) devoted half a lengthy book to the topic, for instance. More recently, Cser’s (2020: 22f.) phonology of Latin reconsiders the evidence, revising an earlier presentation (Cser 2013) of these facts. In fact three possibilities are imaginable: qu, for instance, could be unisegmental /kʷ/, bisegmental /kw/, or even /ku/ (Watbled 2005), though as Cser correctly observes, the latter does not seem to be workable. Cser reluctantly concludes that the question is not yet decidable. Let us consider this question briefly, departing from Cser’s theorizing only in the assumption of a phonotactics-free phonology.

  1. Frequency. Following Devine and Stephens, Cser notes that the lexical frequency of qu greatly exceeds that of k and glide [w] (written u) in general. They take this as evidence for unisegmental /kʷ, gʷ/.4 However, it is not at all clear to me why this ought to matter to the child acquiring Latin. In a phonotactics-free phonology, there is simply no reason for the learner to attend to this statistical discrepancy.
  2. Phonetic issues. Cser reviews testimonia from ancient grammarians suggesting that the “[w] element in <qu> was less consonant-like than other [w]s” (p. 23). However, as he points out, this is trivially handled in the unisegmental analysis and is a trivial example of allophony in the bisegmental analysis.
  3. Geminates. Cser points out that the labiovelars, unlike all consonants but [w], fail to form intervocalic geminates. However, phonotactics-free phonology has no need to explain which underlying geminates are and are not allowed in the lexicon.
  4. Positional restrictions. Under a bisegmental interpretation, the labiovelars are “marked” in that obstruent-glide sequences are rare in Latin. On the other hand, under a unisegmental interpretation, the absence of word-final labiovelars is unexpected. However, neither of these observations has any status in phonotactics-free phonology.
  5. The question of [sw]. The sequence [sw] is attested initially in a few words (e.g., suāuis ‘sweet’). Is [sw] uni- or bisegmental? Cser notes that were one to adopt a unisegmental analysis for the labiovelars qu and gu, [sw] would be the only complex onset in which [w] may occur. However, an apparently restricted distribution for [w] has no evidentiary status in phonotactics-free phonology; it can only be a historical accident encoded implicitly in the lexicon.
  6. Verb root structure. Devine and Stephens claim that verb roots ending in a three-consonant sequence are unattested except for roots ending in a sonorant-labiovelar sequence (e.g., torquere ‘to turn’, tinguere ‘to dip’). While this is unexplained under a bisegmental analysis, this is an argument based on distributional restrictions that have no status in phonotactics-free phonology. 
  7. Voicing contrast in clusters. Voicing is contrastive in Latin nasal-labiovelar clusters; thus linquam ‘I will/would leave’ (1sg. fut./subj. act.) vs. linguam ‘tongue’ (acc.sg.). According to Cser, under the biphonemic analysis this would be the only context in which a CCC cluster has contrastive voicing, and “[t]his is certainly a fact that points towards the greater plausibility of the unisegmental interpretation of labiovelars” (p. 27). It is not clear that the distribution of voicing contrasts ought to be taken into account in a phonotactics-free theory, since there is no evidence for a process neutralizing voicing contrasts in word-internal trisegmental clusters.
  8. Alternations. In two verbs, qu alternates with cū [kuː] in the perfect participle (ppl.): loquī ‘to speak’ vs. its ppl. locūtus and sequī ‘to follow’ vs. its ppl. secūtus. Superficially this resembles alternations in which [lv, bv, gv] alternate with [luː, buː, guː] in the perfect participle. This suggests a bisegmental analysis, and since it is based on patterns of alternation, it is consistent with a phonotactics-free theory. On the other hand, qu also alternates with plain c [k]. For example, consider the verb coquere ‘to cook’, which has a perfect participle coctus. Similarly, the verb relinquere ‘to leave’ has a perfect participle relictus, but the loss of the Indo-European “nasal insert” (as it is known) found in the infinitive may suggest an alternative—possibly suppletive—analysis. Cser concludes, and I agree, that this evidence is ambiguous.
  9. ad-assimilation. The prefix ad- variably assimilates in place and manner to the following stem-initial consonant. Cser claims that this is rare with qu-initial stems (e.g., unassimilated adquirere ‘to acquire’ is far more frequent than assimilated acquirere in the corpus), whereas ad-assimilation is extremely common with [k]-initial stems. This seems to weakly support the bisegmental analysis.5
  10. Diachronic considerations. Latin qu is a descendant of Indo-European *kʷ, one member of a larger labiovelar series. All members of this series appear to be unisegmental in the proto-language. However, as Cser notes, this is simply not relevant to the synchronic status of qu and gu.
  11. Poetic licence. Rarely, the poets used a device known as diaeresis, the reading of [w] as [u] to make the meter. Cser claims this does not obtain for qu. This is weak evidence for the unisegmental analysis, because the labial-glide portion of /kʷ/ would not obviously be in the scope of diaeresis.
  12. The distribution of gu. As noted above the voiced labiovelar gu is lexically quite rare, and always preceded by n. In a phonological theory which attends to phonotactic constraints, this is an explanandum crying out for an explanans. Cser argues that it is particularly odd under the unisegmental analysis because there is no other segment so restricted. But in phonotactics-free phonology, there is no need to explain this accident of history.

Cser concludes that this series of arguments is largely inconclusive. He takes (7, 11) to be evidence for the unisegmental analysis, (3, 5, 8, 9) to be evidence for the bisegmental analysis, and all the other points to be largely inconclusive. Reassessing the evidence in a phonotactics-free theory, only (9) and (11), both based on rather rare evidence, remain as possible arguments for the status of the labiovelars. I too have to regard the evidence as inconclusive, though I am now on the lookout for diaeresis of qu and gu, and hope to obtain a better understanding of prefix-final consonant assimilation.

Clearly, working phonologists are heavily dependent on phonotactic arguments, and rejecting them as explanations would substantially limit the evidence base used in phonological inquiry.

Endnotes

  1. In part this must reflect the obsession with information theory in linguistics at the time. Of this obsession Halle (1975) would later write that this general approach was “of absolutely no use to anyone working on problems in linguistics” (532).
  2. As it happens, monolingual English-speaking New Zealanders are roughly as good at discriminating between “possible” and “impossible” Māori nonce words as are Māori speakers (Oh et al. 2020).
  3. I write this phonetically as [kw] rather than [kʷ] because it is unclear to me how the latter might differ phonetically from the former. These objections do not apply to the phonological transcription /kʷ/, however.
  4. Recently Gouskova and Stanton (2021) have revived this theory and applied it to a number of case studies in other languages. 
  5. It is at least possible that unassimilated spellings are “conservative” spelling conventions and do not reflect speech. If so, one may still wish to explain the substantial discrepancy in rates of (orthographic) assimilation to different stem-initial consonants and consonant clusters.

References

Chomsky, N. and Halle, M. 1965. Some controversial questions in phonological theory. Journal of Linguistics 1(2): 97-138.
Clayton, M. L. 1976. The redundance of underlying morpheme-structure conditions. Language 52(2): 295-313.
Cser, A. 2013. Segmental identity and the issue of complex segments. Acta Linguistica Hungarica 60(3): 247-264.
Cser, A. 2020. The Phonology of Classical Latin. John Wiley & Sons.
Devine, A. M. and Stephens, L. D. 1977. Two Studies in Latin Phonology. Anma Libri.
Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.
Gorman, K. 2014. A program for phonotactic theory. In Proceedings of the Forty-Seventh Annual Meeting of the Chicago Linguistic Society: The Main Session, pages 79-93.
Gouskova, M. and Stanton, J. 2021. Learning complex segments. Language 97(1): 151-193.
Hale, M. and Reiss, C. 2008. The Phonological Enterprise. Oxford University Press.
Halle, M. 1975. Confessio grammatici. Language 51(3): 525-535.
Oh, Y., Todd, S., Beckner, C., Hay, J., King, J., and Needle, J. 2020. Non-Māori-speaking New Zealanders have a Māori proto-lexicon. Scientific Reports 10: 22318.
Shibatani, M. 1973. The role of surface phonetic constraints in generative phonology. Language 49(1): 87-106.
Stanley, R. 1967. Redundancy rules in phonology. Language 43(2): 393-436.
Watbled, J.-P. 2005. Théories phonologiques et questions de phonologie latine. In C. Touratier (ed.), Essais de phonologie latine, pages 25-57. Publications de l’Université de Provence.

Thought experiment #2

In an earlier post, I argued for the logical necessity of admitting some kind of “magic” to account for lexically arbitrary behaviors like Romance metaphony or Slavic yers. In this post I’d like to briefly consider the consequences for the theory of language acquisition.

If mature adult representations have magic, infants’ hypothesis space must also include the possibility of positing magical URs (as Jim Harris argues for Spanish and Jerzy Rubach argues for Polish). What might happen if the hypothesis space were not so specified? Consider the following thought experiment:

The Rigelians from Thought Experiment #1 did not do a good job sterilizing their space ships. (They normally just lick the flying saucer real good.) Specks of Rigelian dust carry a retrovirus that infects human infants and modifies their faculty of language so that they no longer entertain magical analyses.

What then do we suppose might happen to the Spanish and Polish patterns we previously identified as instances of magic? Initially, the primary linguistic data will not have changed, just the acquisitional hypothesis space. What kind of grammar will infected Spanish-acquiring babies acquire?

For Harris (and Rubach), the answer must be that infected babies cannot acquire the metaphonic patterns present in the PLD. Since there is reason to think (see, e.g., Gorman & Yang 2019:§3) that the diphthongization is the minority pattern in Spanish, it seems most likely that the children will acquire a novel grammar in which negar ‘to deny’ has an innovative non-alternating first person singular indicative *nego rather than niego ‘I deny’.

Not all linguists agree. For instance, Bybee & Pardo (1981; henceforth BP) claim that there is some local segmental conditioning on diphthongization, in the sense that Spanish speakers may be able to partially predict whether or not a stem diphthongizes on the basis of nearby segments.1 Similarly, Albright, Andrade, & Hayes (2001; henceforth AAH) develop a computational model which can extract generalizations of this sort.2 For instance, BP claim that an e followed by __r, __nt, or __rt is more likely to diphthongize, and AAH claim that a following stem-final __rr (the alveolar trill [r], not the alveolar tap [ɾ]) and a following __mb also favor diphthongization. BP are somewhat fuzzy about the representational status of these generalizations, but for AAH, who reject the magical segment analysis, they are expressed by a series of competing rules.

I am not yet convinced by this proposal. Neither BP nor AAH give the reader any general sense of the coverage of the segmental generalizations they propose (or, in the case of AAH, that their computational model discovers): I’d like to know basic statistics like precision and recall for existing words. Furthermore, AAH note that their computational model sometimes needs to fall back on “word-specific rules” (their term), rules in which the segmental conditioning is an entire stem, and I’d like to know how often this is necessary.3 Rather than reporting coverage, BP and AAH instead correlate their generalizations with the results of wug-tasks (i.e., nonce word production tasks) performed by Spanish-speaking adults. The obvious objection here is that no evidence—or even an explicit linking hypothesis—links adults’ generalizations about nonce words in a lab to children’s generalizations about novel words in more naturalistic settings.
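To be clear about what I am asking for, here is a sketch of how one might compute precision and recall for a single BP-style environment against a verb list; the verbs and outcomes below are invented for illustration, so the numbers say nothing about Spanish.

```python
# Sketch of the coverage statistics requested above: precision and recall of a
# single BP-style generalization ("mid vowel before r/nt/rt favors
# diphthongization") against a toy list of (stem rhyme, diphthongizes?) pairs.
# The data are made up; real figures would require an actual Spanish verb lexicon.

def precision_recall(predictions: list[bool], gold: list[bool]) -> tuple[float, float]:
    tp = sum(p and g for p, g in zip(predictions, gold))
    fp = sum(p and not g for p, g in zip(predictions, gold))
    fn = sum(not p and g for p, g in zip(predictions, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def rule_predicts(rhyme: str) -> bool:
    """True if the stem rhyme matches one of the favoring environments."""
    return rhyme.endswith(("rt", "nt", "r"))

# (stem rhyme, actually diphthongizes?) -- invented for illustration only.
verbs = [("ert", True), ("ent", True), ("er", False), ("eg", True), ("el", False)]
preds = [rule_predicts(rhyme) for rhyme, _ in verbs]
gold = [d for _, d in verbs]
print(precision_recall(preds, gold))  # (0.667, 0.667) on this toy sample
```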

However, I want to extend an olive branch to linguists who are otherwise inclined to agree with BP and AAH. It is entirely possible that children do use local segmental conditioning to learn the patterns linguists analyze with magical segments and/or morphs, even if we continue to posit magic segments or morphs. It is even possible that sensitivity to this segmental conditioning persists into adulthood, as reflected in the aforementioned wug-tasks. Local segmental conditioning might be an example of domain-general pattern learning, and might be likened to sound symbolism—such as the well-known statistical tendency for English words beginning in gl- to relate to “light, vision, or brightness” (Charles Yang, p.c.)—insofar as both types of patterns reduce the apparent arbitrariness of the lexicon. I am also tempted to identify both local segmental conditioning and sound symbolism as examples of third factor effects (in the sense of Chomsky 2005). Chomsky identifies three factors in the design of language: the genetic endowment, “experience” (the primary linguistic data), and finally “[p]rinciples not specific to the faculty of language”. Some examples of third factors—as these principles not specific to the faculty of language are called—given in the paper include domain-general principles of “data processing” or “data analysis” and biological constraints, whether “architectural”, “computational”, or “developmental”. I submit that general-purpose pattern learning might be an example of domain-general “data analysis”.

As it happens, we do have one way to probe the coverage of local segmental conditioning. Modern sequence-to-sequence neural networks, arguably the most powerful domain-general string pattern learning tools known to us, have been used for morphological generation tasks. For instance, in the CoNLL-SIGMORPHON 2017 shared task, neural networks are used to predict the inflected form of a word given a citation form and a morphological specification: given the pair (dentar, V;IND;PRS;1;SG), a model must predict diento ‘I am teething’. Very briefly, these models, as currently designed, are much like babies infected with the Rigelian retrovirus: their hypothesis space does not include “magic” segments or lexical diacritics, and they must rely solely on local segmental conditioning. It is perhaps not surprising, then, that they misapply diphthongization in Spanish (e.g., *recolan for recuelan ‘they re-strain’; Gorman et al. 2019) or yer deletion in Polish when presented with previously unseen lemmata. But it is an open question how closely these errors pattern with those made by children, or with adults’ behaviors in wug™-tasks.
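For readers unfamiliar with the shared task: if I recall the format correctly, the training data are just tab-separated triples of lemma, inflected form, and a semicolon-delimited feature bundle, so the model sees no lexical diacritics or “magic” segments of any kind. A minimal sketch:

```python
# Assumed CoNLL-SIGMORPHON-style line: lemma <tab> inflected form <tab> features.
# The model is trained to map (lemma, features) to the inflected form.

sample = "dentar\tdiento\tV;IND;PRS;1;SG"

lemma, form, features = sample.split("\t")
print(lemma, features.split(";"), "->", form)
# dentar ['V', 'IND', 'PRS', '1', 'SG'] -> diento
```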

Acknowledgments

I thank Charles Yang for drawing my attention to some of the issues discussed above.

Endnotes

  1. Similarly, Rysling (2016) argues that Polish yers are epenthesized to avoid certain branching codas, though she admits that their appearance is governed in part by magic (according to her analysis, exceptional morphs of the Gouskova/Pater variety).
  2. Later versions of this model developed by Albright and colleagues are better known for popularizing the notion of “islands of reliability”.
  3. Bill Idsardi (p.c.) raises the question of whether magical URs and morpholexical rules are extensionally equivalent. Good question.

References

Albright, A., Andrade, A., and Hayes, B. 2001. Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics 7: 117-151.
Bybee, J., and Pardo, E. 1981. Morphological and lexical conditioning of rules: experimental evidence from Spanish. Linguistics 19: 937-968.
Chomsky, N. 2005. Three factors in language design. Linguistic Inquiry 36(1): 1-22.
Gorman, K. and Yang, C. 2019. When nobody wins. In F. Rainer, F. Gardani, H. C. Luschützky, and W. U. Dressler (eds.), Competition in Inflection and Word Formation, pages 169-193. Springer.
Gorman, K., McCarthy, A. D., Cotterell, R., Vylomova, E., Silfverberg, M., and Markowska, M. 2019. Weird inflects but okay: making sense of morphological generation errors. In Proceedings of the 23rd Conference on Computational Natural Language Learning, pages 140-151.
Rysling, A. 2016. Polish yers revisited. Catalan Journal of Linguistics 15: 121-143.