Action, not ritual

It is achingly apparent that an overwhelming amount of research in speech and language technologies considers exactly one human language: English. This is done so unthinkingly that some researchers seem to see the use of English data (and only English) as obvious, so obvious as to require no comment. This is unfortunate in part because English is, typologically speaking, a bit of an outlier. For instance, it has uncommonly impoverished inflectional morphology, a particularly rigid word order, and a rather large vowel inventory. It is not hard to imagine how lessons learned designing for, or evaluating on, English data might not generalize to the rest of the world’s languages. In an influential paper, Bender (2009) encourages researchers to be more explicit about the languages studied, and this suggestion, framed as an imperative, has come to be called the Bender Rule.

This “rule”, and the aforementioned observations underlying it, have taken on an almost mythical interpretation. They can easily be seen as a ritual granting the authors a dispensation to continue their monolingual English research. But this is a mistake. English hegemony is not merely bad science, nor is it a mere scientific inconvenience—a threat to validity.

It is no accident of history that the scientific world is in some sense an English colony. Perhaps you live in a country that owes an enormous debt to a foreign bank, and the bankers are demanding cuts to social services or reduction of tariffs: then there’s an excellent chance the bankers’ first language is English and that your first language is something else. Or maybe, fleeing the chaos of austerity and intervention, you find yourself and your children in cages in a foreign land: chances are you are in Yankee hands. And it is no accident that the first large-scale treebank is a corpus of English rather than of Delaware or Nahuatl or Powhatan or even Spanish, nor that the entire boondoggle was paid for by the largest military apparatus the world has ever known.

Such material facts respond to just one thing: concrete actions. Rituals, indulgences, or dispensations will not do. We must not confuse the act of perceiving and naming the hegemon with the far more challenging act of actually combating it. It is tempting to see the material conditions dualistically, as a sin we can never fully cleanse ourselves of. But they are the past and a more equitable world is only to be found in the future, a future of our own creation. It is imperative that we, as a community of scientists, take steps to build the future we want.

References

Bender, Emily M. 2009. Linguistically naïve != language independent: why NLP needs linguistic typology. In EACL Workshop on the Interaction Between Linguistics and Computational Linguistics, pages 26-32.

Is formal phonology in trouble?

I recently attended the 50th meeting of the North East Linguistics Society (NELS), which is not so much a society as a prestigious generative linguistics conference. In recognition of the golden jubilee, Paul Kiparsky gave a keynote in which he managed to reconstruct nearly all of the NELS 1 schedule, complete with at least one handout, from a talk by Anthony Kroch and Howard Lasnik. Back then, apparently, handouts were just examples: no prose.

In his talk, Paul presented a graph showing that phonology accounts for an increasingly small number of papers at NELS, and that the gap has in fact gotten worse over the last few decades. Paul proposed something of an explanation: that the introduction of Optimality Theory (OT) and its rejection of “derivational” explanations has forever introduced a schism between phonology and other subareas, and that syntacticians and semanticists are simply uncomfortable with the non-derivational nature of modern phonological theorizing.

With all due respect, I do not find this explanation probable. As he admits, most OT theorizing (including his own) now actually rejects the earlier rejection of derivational explanations. And on the other hand, modern syntactic theories are a heady brew of derivational (phases, copy theory, etc.) and non-derivational (move α, uninterpretable feature matching, etc.) thinking. And finally it’s not really clear why the aesthetic preferences of syntacticians (if that’s all they are) should produce the data, i.e., fewer phonology papers at NELS.

But I do agree that OT is the elephant in the room, responsible for an enormous amount of fragmentation in phonological theorizing.

I would liken Prince & Smolensky’s “founding document” (1993) to Martin Luther’s Ninety-five Theses. Scholars believe that Luther wished to start a scholarly theological debate rather than a popular revolution, and I suspect the founders of OT were similarly surprised by the enormous impact their proposal had on the field. Luther’s magnificent heresy may have failed to move the Church in the directions he wished, but he is the father of hundreds if not thousands of Protestant sects, each with their own new and vibrant “heresies”. The founders of OT, I think, are similarly unable to put the cat back into the bag (if they wish to at all).

In my opinion, OT’s early rejection of derivationalism has been an enormous empirical failure, and the full-blown functionalistic-externalist thinking, one of the first post-OT heresies (let’s liken it to Calvinism), is ontologically incoherent. That said, I would encourage OT believers to try more theory-comparison. The article on “Christian denominations” in Diderot & d’Alembert’s Encyclopédie begins with the obviously insincere suggestion that someone ought to study which of the various Protestant sects is most likely to lead to salvation. But I would sincerely love to find out which variant of OT is in fact most optimal.

[Thanks to Charles Reiss for discussion.]