UDTube now supports dependency parsing

Almost two years ago we announced UDTube, our neural morphological analyzer. As of v0.2, we have added a dependency parser as an additional head/task. UDTube’s parser head uses a deep biaffine parser and the Chu-Liu-Edmonds maximum spanning tree algorithm for decoding, meaning that the results are globally optimal and the parser supports non-projective dependencies. Check it out.

On auto-coding

An unlikely series of confusions and mishaps in my early career resulted in my brief involvement with the forced alignment-industrial complex. I’m grateful to people like the excellent Michael Wagner who supported my work on this topic, but I’m glad to see other people who have a deeper interest in acoustic phonetics methodology (like the also-excellent Michael McAuliffe) take over the enterprise. Phonetics has not been a major research interest for me for some time now.

I published a single two-page paper (Gorman et al. 2011) on forced alignment in 2011, and somehow, it’s my most cited work. Perhaps for that reason, I receive a lot of requests to review work involving forced aligners. Two frustrating “tricks” are extremely common in this literature.

The first involves manipulating the phonetic dictionary as a way to auto-code (socio)phonetic variables; I’ll refer to this method as dictionary hacking. For instance, Yuan & Liberman (2011) studied American English ‘g-dropping’ using this method. For each word ending in ing [ɪŋ] they add a competing pronunciation variant [ɪn]. As a result, the final phonemic alignment contains information about whether the overall model took each ing rendition to be [ɪŋ] or [ɪn]. This sort of works (neat!), but I don’t think it’s a particularly good way to auto-code. First, good HMM acoustic models represent phones (or diphones, or triphones) using mixtures of multivariate Gaussians (GMMs), and such models are capable of representing phonetically disparate renditions as instances of the same mixture; they don’t really reflect linguists’ intuitions about allophony. Second, and specific to the Yuan & Liberman approach, they ignore a third possibility for this variable: [in], a tense high front vowel with an apical nasal. In my “style a” (i.e., low attention paid to my speech), this variant alternates with lax [ɪn]; I rarely produce [ɪŋ]. Dealing with different variants is hard using the dictionary-hacking method. There is of course a simple solution here. You use the forced aligner as is to find reasonably good timestamps of the relevant intervals, you extract acoustic features from those intervals, and you feed them into a discriminative supervised machine learning system trained on a small amount of labeled data. (In some cases, relevant corpora already have sufficiently detailed phonetic transcriptions so no additional labeling is necessary.)
Done right, this will produce strictly better (more human-annotator-like) results than dictionary hacking: discriminative models optimized specifically for the coding task at hand, provided with appropriate acoustic features, will be more accurate than the forced aligner’s generative HMM-GMM system optimized for an objective only distantly related to the question at hand.
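To make the last step of that pipeline concrete, here is a minimal sketch of the discriminative classifier, written in plain Python so that it runs without any dependencies. Everything in it is an assumption made for illustration: the two features are synthetic stand-ins for z-scored acoustic measurements taken within the aligner's intervals, the labels (0 for [ɪŋ], 1 for [ɪn]) stand in for a small hand-coded sample, and logistic regression is just one of many discriminative models one might choose.

```python
import math
import random


def make_toy_tokens(n, seed=0):
    """Simulates n tokens, each with two z-scored acoustic measurements.

    These are synthetic placeholders, not real data; real features would be
    extracted from the intervals the forced aligner provides.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # Hypothetical coding: 0 = [ɪŋ], 1 = [ɪn].
        center = 1.0 if label else -1.0
        features.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
        labels.append(label)
    return features, labels


def train_logistic(features, labels, epochs=500, rate=0.1):
    """Fits binary logistic regression by batch gradient descent."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Returns the predicted label (0 or 1) for one feature vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


def accuracy(weights, bias, features, labels):
    """Returns the proportion of tokens coded the same way as the labels."""
    hits = sum(predict(weights, bias, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Train on a small "hand-labeled" sample, then evaluate on held-out tokens.
train_x, train_y = make_toy_tokens(200, seed=0)
test_x, test_y = make_toy_tokens(100, seed=1)
weights, bias = train_logistic(train_x, train_y)
held_out_accuracy = accuracy(weights, bias, test_x, test_y)
```

In practice one would swap in real measurements and probably an off-the-shelf classifier; handling a third variant like [in] only requires a multi-class model, not another round of dictionary edits.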

The second trick involves using (mono-, di-, or tri-)phone GMMs from different dialects or languages to auto-code. I’ll refer to this as phone hacking. For example, if one has a Montreal French acoustic model and an American English acoustic model, one can use the forced aligner to determine whether a rendition of Scottish English r is more like the Montreal French or American English r. Milne (2011, 2014), for example, describes some early work of this type. Once again, this sort of works (jeepers!) but it has all the same sorts of problems, problems which could be fixed by once again using the forced aligner for approximate timing information (it’s reasonably good at that), extracting phonetic features from the relevant intervals, and then feeding them into discriminative models optimized to code whatever variants of r you’re interested in. There’s no excuse, really, for using Montreal French acoustic models on your Scottish data.

In my opinion, dictionary hacking and phone hacking are unnecessarily lazy, sloppy solutions to coding problems that aren’t really all that hard in the first place, and I tell the editors as much when asked to review papers using these techniques. The discriminative approach is not only relatively easy for a computationally sophisticated phonetician, but was almost as easy a full two decades ago. Since I don’t really work in this area anymore, I don’t know if there’s a library for discriminative auto-coding as well-designed or well-documented as the Montreal Forced Aligner, but if not, something like this is greatly needed.

References

Gorman, K., Howell, J. and Wagner, M. 2011. Prosodylab-Aligner: A tool for forced alignment of laboratory speech. Journal of the Canadian Acoustical Association 39(3): 192-193.
Milne, P. 2011. The effects of syllable position on allophonic variation in Québec French /ʀ/: A corpus analysis using a modified version of the Penn Phonetics Lab Forced Aligner. Paper presented at NWAV 40.
Milne, P. 2014. The variable pronunciations of word-final consonant clusters in a force aligned corpus of spoken French. Doctoral dissertation, University of Ottawa.
Yuan, J., and Liberman, M. 2011. Automatic detection of “g-dropping” in American English using forced alignment. In 2011 IEEE workshop on automatic speech recognition & understanding, pages 490-493.

 

Nothing in nature is “all the way down”

I recently saw a talk where the speaker was endorsing construction grammar (which type, I’m not sure) and in particular, a view of language as “constructions all the way down”. On the contrary: nothing in nature is all the way down.

Consider the material world, in which many things can be described as particles: discrete, bounded entities which occupy positions in space and which can interact with other particles. And indeed, some particles are made up of other particles: solids contain molecules, which contain atoms, which contain subatomic particles, some of which (protons and neutrons) are in turn made up of quarks. But at some point, this really does break down: electrons and quarks are not known to be made of smaller particles.

I take this to be a property of the natural world: many natural types are recursive, but the recursion must ultimately terminate.

The same is true for constructions. Certainly we can imagine a theory of constructions where larger constructions contain smaller ones, but when we get to atomic units that non-construction-sympathetic grammarians recognize (phonemic features, syntactic heads, etc.), we eventually have to stop subdividing, and the recursion terminates. One could define, say, terminals as syntactic constructions, but one cannot do so if an essential property of constructions is that they are constructed from smaller parts, since terminals are not so constructed in any relevant sense. This weaker sense of “all the way down” is a property common to any recursively defined data structure, including, say, the standard definition of context-free grammars and the constituency trees they build; it is nothing special about construction grammar as a discipline.
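The point about recursively defined data structures can be sketched in a few lines of Python. The class names (Terminal, Phrase) and the toy sentence are hypothetical, chosen only for illustration: a constituency tree is either a terminal, which has no smaller parts, or a phrase built from smaller trees, and any function over trees bottoms out in the terminal case.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Terminal:
    """Base case: an atomic unit that is not built from smaller trees."""
    symbol: str


@dataclass(frozen=True)
class Phrase:
    """Recursive case: a labeled node built from one or more smaller trees."""
    label: str
    children: Tuple["Tree", ...]


Tree = Union[Terminal, Phrase]


def depth(tree: Tree) -> int:
    """Counts the phrase levels above the deepest terminal."""
    if isinstance(tree, Terminal):
        return 0  # The recursion terminates here.
    # Assumes every phrase has at least one child.
    return 1 + max(depth(child) for child in tree.children)


sentence = Phrase(
    "S",
    (
        Phrase("NP", (Terminal("dogs"),)),
        Phrase("VP", (Terminal("bark"),)),
    ),
)
sentence_depth = depth(sentence)  # S over NP/VP over terminals.
```

Without the Terminal case the type would have no finite inhabitants and depth would never return, which is the sense in which the recursion has to stop somewhere.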

More Pynchonian eye dialect

Twelve years ago I wrote a bit about Pynchon’s use of eye dialect in his underappreciated 2013 novel Bleeding Edge. In that book, the dialogue of a Californian woman (Vyrna McElmo) is stylized so that her -ings are spelled -een, presumably denoting [in]; e.g., “I’m still, like, vibrateen”. I am now working through Vineland (1990). In that book, another Californian, DEA agent Hector Zuñiga, uses a different eye dialect take on the same variable: his -ings are spelled -ín, presumably denoting something similar, as in the following passage (p. 28):

All of you are still children inside, livín your real life back then. Still waitín for that magic payoff. […] Rill puzzlín.

I wonder if there’s a prosodic difference between (Caucasian) McElmo and (Latino) Zuñiga’s renditions of -ing in Pynchon’s mind, though.

Linguistics beach reads

Since I started grad school, I have made a practice of reading books, and pop-science linguistics books in particular. I genuinely think I’ve gotten a lot out of it over the years. Let me make a few recommendations for your summer beach reading, focusing on lighter fare.

  • The Riddle of the Labyrinth: The Quest to Crack an Ancient Code (Margalit Fox, 2013) is a breezy take on the decipherment of Linear B, with particular emphasis on crucial early work done by Brooklyn College professor Alice Kober. Kober was in heavy correspondence with amateur Michael Ventris, who announced the decipherment just eighteen months after her untimely death at age 43. (Ventris himself died even younger, at 34, in a car accident that some think a concealed suicide.) The Linear B saga is a neverending source of interest, and Fox is good on both the drama (she used to write the obituaries in the Times) and the linguistics (she has a master’s degree from Stony Brook).
  • Chinese Characters across Asia: How the Chinese Script Came to Write Japanese, Korean, and Vietnamese (Zev Handel, 2025) talks amateurs through the history of writing in East Asia, summarizing his much more technical 2019 book on the same topic for a non-linguistic audience. 
  • Patterns In The Mind: Language And Human Nature (Ray Jackendoff, 1994) is my favorite of the Language Instinct-alikes. It is focused more or less on selling the idea of UG to normies, and on those terms, it succeeds mightily.
  • Because Internet: Understanding the New Rules of Language (Gretchen McCulloch, 2019) does a good job summarizing disparate threads in the sociolinguistics of computer-mediated language with just enough humor to lighten the mood.
  • Language and Problems of Knowledge: The Managua Lectures (Noam Chomsky, 1988) is the text of five lectures given to a lay audience in Nicaragua, illustrating the core ideas of the generative program. Most of the examples are based on comparing the syntax of English and Spanish, and the book is easily the most accessible thing Chomsky has written (and far more relevant to current thinking than, say, the equally-accessible Syntactic Structures).

I of course welcome other suggestions in the comments section. 

Inverting microfiche negatives

My university’s library has issues of many older journals currently only available on microfiche, 4×6″ sheets of film used to store analogue copies of pages. These sheets are highly compact and durable (if you ignore the fact that film is quite flammable) and while they sort of look messy, they tend to preserve contrast well and thus are quite readable.

When I request a paper available on microfiche, the librarian uses a machine which scans the microfiche and generates a PDF. Unfortunately, the resulting PDF is a negative of the text, which I find slightly difficult to read. Thus I have been using UNIX tools to invert the negative into a positive image. One solution that works is based on the widely-available ImageMagick. For instance, the following command does this at 300 DPI without any loss of quality:

magick -density 300 input.pdf -negate -quality 100 output.pdf

Just sharing in case this is useful for anyone else.
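If you have a whole folder of scans, the same command can be scripted. What follows is only a sketch: it assumes ImageMagick 7's magick binary is on your PATH, and the "-positive" output suffix and the function names are my own inventions rather than anything the library's scanner produces.

```python
import subprocess
from pathlib import Path


def negate_command(source: Path, target: Path, density: int = 300) -> list:
    """Builds the ImageMagick command shown above for one PDF."""
    return [
        "magick",
        "-density", str(density),
        str(source),
        "-negate",
        "-quality", "100",
        str(target),
    ]


def invert_all(directory: Path) -> None:
    """Writes a positive copy of every PDF in the directory."""
    for source in sorted(directory.glob("*.pdf")):
        if source.stem.endswith("-positive"):
            continue  # Skips files this script already produced.
        target = source.with_name(f"{source.stem}-positive.pdf")
        # check=True raises CalledProcessError if magick exits with an error.
        subprocess.run(negate_command(source, target), check=True)
```

Calling invert_all(Path("scans")) would then leave the originals in place and write, e.g., scans/paper-positive.pdf next to scans/paper.pdf.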

Sonority sequencing is a zombie

In the last few years, I have seen a number of talks and papers which tested for an effect of sonority sequencing on various phenomena, either synchronic or diachronic. Without exception, I think all these studies found a null effect, but in each case I was struck by how conciliatory and uncritical the author(s) were of the idea that sonority sequencing exists in the first place, given that they failed to find any effect of sonority sequencing in a domain where they expected to find one. I submit that the sonority sequencing principle is something of a zombie idea, a bad idea that just keeps coming back.

The idea of sonority itself is over a century old, but it has proved extremely difficult to ground in any physical reality or to provide a precise, generalizable definition of what exactly it is. This is probably because it doesn’t exist. At best, the construct we are trying to measure may be some kind of perceptual salience, which is weakly correlated with sound change (and thus with synchronic phonological processes which arise from sound change), but which is highly contextual and just one of many contingencies governing sound change.

The idea that sonority is a scale (properly, a ratio measurement in the sense of Stevens) is itself decades old as well, and gives rise to the idea that grammars constrain the differences in sonority between adjacent phones in specific ways, as in the principle of sonority sequencing. If we focus our attention on languages that permit tautosyllabic consonant clusters of any sort, I have yet to see a single case where syllable phonotactics are cleanly described by imposing thresholds on this principle. In nearly every case I have seen—Turkish is a famous example—there are many systematic gaps which cannot be explained with reference to sonority or to any other known cause beyond historical contingency (e.g., in Turkish, the absence of coda *[rn], *[lm] despite their favorable sonority profile). In such cases, I see no reason to give any credit to the sonority sequencing principle.
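To see what “imposing thresholds” amounts to, here is a toy sketch of a sonority sequencing check for coda clusters. The numeric scale is an assumption (one conventional ordering, with stops lowest and liquids highest among consonants), as is the requirement that sonority fall by at least one step between adjacent coda segments; neither is taken from any particular published analysis.

```python
# Hypothetical sonority values; higher means more sonorous.
SONORITY = {
    "p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,  # stops
    "f": 2, "s": 2, "v": 2, "z": 2,                  # fricatives
    "m": 3, "n": 3,                                  # nasals
    "l": 4, "r": 4,                                  # liquids
}


def sonority_falls(cluster):
    """Returns the sonority drop between each pair of adjacent segments."""
    return [SONORITY[a] - SONORITY[b] for a, b in zip(cluster, cluster[1:])]


def licit_coda(cluster, min_fall=1):
    """Checks that sonority falls by at least min_fall at every step."""
    return all(fall >= min_fall for fall in sonority_falls(cluster))


rn_ok = licit_coda("rn")  # Liquid then nasal: sonority falls.
lm_ok = licit_coda("lm")  # Liquid then nasal: sonority falls.
tr_ok = licit_coda("tr")  # Stop then liquid: sonority rises.
```

Under these assumptions a check of this kind accepts [rn] and [lm] as codas just as readily as it accepts liquid-stop clusters, so it has nothing to say about why the former are missing.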

Sometimes, theoretical progress involves not just the introduction of good new ideas, but also the rejection of old, bad ideas. I think sonority is one of those old, bad ideas, and I think phonologists should view it with a much more critical lens than they currently do.

Why we armchair

In the last two years or so I have gradually transitioned away from experimental-behavioral and computational work towards a larger proportion of what used to be called “pencil-and-paper” research: the development of theories, formalisms, and analyses. (“Description” also is pencil-and-paper in the relevant sense, but I am not really trained as a descriptive linguist.) While there are several reasons for this, one is the rather poor state of science funding in the US, which suggests that we may be entering a moneyball era for linguistics.

When describing or presenting my pencil-and-paper work on phonology and its interfaces with morphology (which, to be fair, is mostly done on my trusty desktop computer, and which is sometimes quantitative), a few colleagues have suggested that I ought to be doing fine-grained acoustic or articulatory phonetics instead. I find this suggestion vexing. Consider something like my analysis of Spanish “raising verbs” in Gorman & Reiss (in press), which in turn is used to illustrate a series of formal-theoretical proposals under the umbrella of the theory of Logical Phonology. What could phonetic analysis contribute to this discussion? It’s obvious that, e.g., p[i]do ‘I ask’ has the same surface vowel as in v[i]vo ‘I live’ whereas p[e]dir ‘to ask’ has a different surface vowel, one that is the same as the surface vowel in sum[e]rgir ‘to submerge’. There are of course subtle differences between renditions and speakers, but there’s no reason to think those differences are relevant to the analysis of raising verbs. Anyone reading this is welcome to show that I’m wrong, but I for one think it’d be a waste of time to so much as check.

Similarly, a few colleagues have suggested that I ought to be doing human subjects experiments to figure out how such things (i.e., Spanish raising verb alternations) work. I again find this vexing. Now one can do a wug-test, and people have, but it’s not really clear what one gains from this, since we don’t have an agreed-upon linking hypothesis. Indeed, the most likely hypothesis is that adults are using a mix of task models, some of which might be relevant to our account of raising verbs, but some of which surely aren’t. How any of this might link up to the relevant linguistic notions like underspecification, morphophonological rules, suppletion, etc. (or whatever you think the relevant notions might be) is unknown. A colleague thought the task ought to be some kind of online processing experiment, exploiting an unspoken form of what we might call a “derivational theory of complexity”, a totally discredited idea from before I was born. Similar issues plague neuroimaging work. The sorts of things that we can instrument in the brain at present (single-neuron firing rates as measured by single-cell recordings, magnetic current in boxes of 50,000 or so neurons as measured by MEG, blood oxygen levels in boxes of a million or so neurons as measured by fMRI, the smeared electrical currents measured by EEG, and so on) simply do not match the “grain size” of the linguistic constructs we are interested in (Poeppel & Embick 2005): a single neuron is almost surely too small to store a “raising rule” (whatever sort of thing that is), and an fMRI voxel far, far too large.

I am happy to have phonetician, experimental linguist, and neurolinguist colleagues; I just think that it’s sort of their j-o-b to figure out how to translate interesting linguistic ideas into something their tools can test, and I somewhat resent the implication that I am leaving hanging any phonetic or experimental low-hanging fruit. In those rare cases where I myself have ideas that I think can be tested using phonetic analysis or human-subjects experiments with clear linking hypotheses, I do phonetic analysis or human-subjects experiments. Indeed, the august National Science Foundation has even funded some of these experiments. But most of the time, I don’t (we don’t), and so I theorize, formalize, or analyze instead.

References

Gorman, K. and Reiss, C. In press. Metaphony in Substance-Free Logical Phonology. Phonology to appear.

Poeppel, D. and Embick, D. 2005. Defining the relation between linguistics and neuroscience. In Cutler, A. (ed.), Twenty-First Century Psycholinguistics: Four Cornerstones, pages 103-118. Routledge.

Italian palatalization

In his Phonology of Italian, Krämer (2009:§4.2.1) is interested in the productivity of velar palatalization before /i/-initial suffixes, such as the masculine noun plural /-i/. Palatalization obtains, for example, in amico-amici [aˈmiːko, aˈmiːtʃi] ‘friend(s)’, but not in cuoco-cuochi [ˈkwɔːko, ˈkwɔːki] ‘cook(s)’. Krämer (henceforth K) further claims that non-palatalization has much higher type frequency.

K performs a small experiment in which ten adult native speakers are presented with nonce words in the singular and asked to complete a sentence which requires them to form the /-i/ plural. Four subjects never palatalized; one palatalized all plurals; and five others produced a mix of the two strategies. Summarizing this result, Krämer (2012:125) concludes: “Thus, in Italian it is a personal decision whether velar palatalization is productive or not.”

I am not sure I agree. The most straightforward interpretation of this data, I think, is that the subjects used a mix of different task models. Some subjects may have been reasoning about whether palatalization is actually productive (a true “grammatical task model”), which for me means that the generalization is encoded (or not encoded, as seems more likely here) so as to apply to arbitrary words. Others may have been guessing based on form similarity to existing words (a “dictionary task model”), and others may have used a mix of the two strategies. It is perhaps not surprising that adults can make use of the dictionary task model, because one can, with some conscious effort, think of phonemically or semantically related real words, and it’s easy to imagine deciding on whether or not to palatalize a nonce word based on the behavior of similar real words.

I think, unfortunately, that this is an unavoidable problem when wug-testing adults. I submit that the conscious analogizing abilities of adults are probably not relevant to questions of productivity, because I don’t think that’s what productivity is. But I don’t know of any way to prevent adult participants from using a dictionary task model. Thus linguists and reviewers should be more skeptical about the utility of adult wug-tasks.

Schütze (2005) makes a similar point about what we might call wug-rating tasks. In such tasks, speakers are asked to assign a wellformedness rating (e.g., on a Likert scale) to candidate inflected forms of nonce words. Arguably, this setting encourages speakers to adopt a highly permissive variant of the dictionary task model, which might be framed as asking “could such a word ever have such a plural?” Answers to such questions are often interesting to the linguist, but I think them quite distinct from the question of productivity that K and others wish to study.

References

Krämer, M. 2009. The Phonology of Italian. Oxford University Press.
Krämer, M. 2012. Underlying Representations. Cambridge University Press.
Schütze, C. 2005. Thinking about what we are asking speakers to do. In Kepser, S. and Reis, M. (eds.), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, pages 457-485. De Gruyter Mouton.

Two conjectures about exceptionality

Kisseberth’s (1970) theory of exceptionality is arguably one of the most expressive yet proposed. Roughly, Kisseberth proposes that for every rule R, every morpheme bears two equipollent features, one indicating whether the morpheme is a potential target (±R Target) and another indicating whether the morpheme is a potential trigger (±R Trigger). R then applies if and only if its structural description is met, the target morpheme is +R Target, and the trigger morpheme is +R Trigger.1
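The application condition is just a three-way conjunction, which the following sketch spells out. The encoding is my own, not Kisseberth's: each morpheme carries the set of rules for which it is +R Target and the set for which it is +R Trigger, with absence from a set standing in for the minus value, and the morpheme names are placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    """A morpheme with its rule-exception features."""
    form: str
    targets: frozenset = frozenset()   # Rules R for which this is +R Target.
    triggers: frozenset = frozenset()  # Rules R for which this is +R Trigger.


def applies(rule, description_met, target, trigger):
    """R applies iff its SD is met and both morphemes are positively marked."""
    return (
        description_met
        and rule in target.targets
        and rule in trigger.triggers
    )


regular_stem = Morpheme("stem-1", targets=frozenset({"R"}))
exceptional_stem = Morpheme("stem-2")  # -R Target.
active_suffix = Morpheme("suffix-1", triggers=frozenset({"R"}))
inert_suffix = Morpheme("suffix-2")    # -R Trigger.

regular_case = applies("R", True, regular_stem, active_suffix)
target_exception = applies("R", True, exceptional_stem, active_suffix)
trigger_exception = applies("R", True, regular_stem, inert_suffix)
```

Because both features are stated per rule and per morpheme, a lexicon of m morphemes and r rules has 2mr independent settings, which gives some sense of how expressive the theory is.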

I conjecture that Inkelas and colleagues’ notion of inalterability as prespecification (IAP), as implemented in Logical Phonology (LP), completely eliminates the need for morphemic ±R Target. Rather, particular morphemes’ ability to undergo R can be encoded via underspecification of individual target segments in those morphemes, rendering the segments mutable to feature-filling processes in contrast to fully-specified inalterable segments.2 In at least a few cases, e.g., Turkish ternary voice alternations (Inkelas & Orgun 1995) and k-deletion (Gorman & Reiss in press a), and Polish yer deletion (Rubach 2013), it seems that target exceptionality cannot be expressed as a morpheme-level property, so we have good reason to prefer the “exceptional segments” of IAP/LP to “exceptional morphemes” with respect to targeting.3

LP generalizes the IAP notion from targets to triggers, using underspecification to render possible triggers quiescent in contrast to fully-specified catalytic segments (see, e.g., Gorman & Reiss in press b). However, I conjecture that a complete theory will still need rules which are triggered in the context of specific morphemes or morphosyntactic contexts.

For example, consider umlaut in Standard German. Umlaut targeting is implemented by leaving o and u (which mutate to ö [ø] and ü [y], respectively) underspecified for Back; some additional complexities are raised by umlauting au and a (which mutate to äu [ɔʏ] and ä [ɛ], respectively). The primary umlaut rule is thus a unification rule which specifies these segments as -Back; separate rules fill in additional details for au and a.
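A feature-filling rule of this sort can be sketched as unification over partial feature bundles. This is a toy illustration rather than the LP formalism itself: segments are dictionaries from feature names to values, the bundles below are simplified assumptions (only High, Round, and Back are shown), and a failed unification is modeled by returning the segment unchanged.

```python
def unify(segment, change):
    """Adds the features in change to segment if they are consistent.

    If the segment already has a conflicting value for any feature,
    unification fails and a copy of the original segment is returned.
    """
    for feature, value in change.items():
        if feature in segment and segment[feature] != value:
            return dict(segment)
    merged = dict(segment)
    merged.update(change)
    return merged


UMLAUT = {"Back": "-"}

# A mutable high round vowel: unspecified for Back, so umlaut can fill it in.
mutable_u = {"High": "+", "Round": "+"}
# An inalterable high round vowel: prespecified as +Back.
inalterable_u = {"High": "+", "Round": "+", "Back": "+"}

umlauted = unify(mutable_u, UMLAUT)       # Gains -Back, i.e., ü-like.
unchanged = unify(inalterable_u, UMLAUT)  # Conflict, so it stays +Back.
```

The design point is that exceptionality lives in the individual segment: the rule is stated once, and whether it has any effect depends only on what the segment already specifies.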

Umlaut triggering is more complex. The triggers are particular suffixes: noun plurals in -er (e.g., Würmer ‘worms’), -e (Nüsse ‘nuts’), and zero (Mütter ‘mothers’), the diminutive -chen (Häuschen ‘little house’), comparatives and superlatives of adjectives (größerer ‘bigger’, am größten ‘biggest’), the 2nd/3rd singular present indicative (du fängst ‘you catch’, er fängt ‘s/he catches’), and a few others. These suffixes have nothing in common morphosyntactically, and exclude related suffixes like noun plural -(e)n or diminutive -lein. And crucially, the triggering suffixes have no common segments on the surface. It is true that many of these suffixes once contained an *i, but many others never did, and Janda (1998) argues umlaut triggering had a morphemic characteristic in even the earliest written German. LP could of course posit these suffixes contain /i/-triggers which never surface (such a grammar is computable, and Bach & King (1970) try to make a proposal of this form work), but Gorman & Reiss (2025) suggest that such analyses are not considered by the language acquisition device (LAD).4 Thus we must admit the possibility that umlaut is triggered by specific morphemes, in line with Kisseberth’s ±R Trigger.

A counterexample to the first conjecture would involve some case where targeting must be a morphemic property—what such an example would look like, I don’t know—and a counterexample to the second conjecture would involve an argument that all apparent morphemic triggering is in fact computed within the narrow phonology.

Endnotes

  1. One might imagine that some of these specifications are filled in by redundancy rules. For example, if R is productive (however that’s encoded…), maybe +R Target and +R Trigger are defaults but the opposite is true if a morpheme lacks the phonological or morphosyntactic properties needed to target and/or trigger R respectively. But Kisseberth doesn’t discuss this matter.
  2. In contrast, when R is a segment deletion rule, a segment targeted by R is fully-specified for reasons we discuss in Gorman & Reiss in press a.
  3. Of course, LP also assumes that children are epistemically bound to provide a narrow phonological analysis (like the IAP pattern), so this does not require further motivation.
  4. Gorman & Reiss (2025) specifically propose a LAD principle no wandering targets; to rule out the /i/-deletion analysis, one would want to generalize that principle from targets to triggers. I see no obstacles to doing so.

References

Bach, E. and King, R. D. 1970. Umlaut in Modern German. Glossa 4:3-21.
Gorman, K. and Reiss, C. 2025. How not to acquire exchange rules in Logical Phonology. In Proceedings of the 2025 annual conference of the Canadian Linguistic Association.
Gorman, K. and Reiss, C. In press a. Natural class reasoning in segment deletion rules. Paper presented at the 56th annual meeting of the North East Linguistic Society, to appear in the proceedings.
Gorman, K. and Reiss, C. In press b. Metaphony in Substance-Free Logical Phonology. Phonology to appear.
Inkelas, S. and Orgun, C. O. 1995. Level ordering and economy in the lexical phonology of Turkish. Language 71: 763-793.
Janda, R. D. 1998. German umlaut: Morpholexical all the way down from OHG to NHG (Two Stützepunkte for Romance metaphony). Rivista di Linguistica 10: 1563-232.
Kisseberth, C. W. 1970. The treatment of exceptions. Papers in Linguistics 2: 44-58.
Rubach, J. 2013. Exceptional segments in Polish. Natural Language & Linguistic Theory 31: 1139-1163.