On “alternative” grammar formalisms

A common suggestion to graduate students in linguistics (computational or otherwise) is to study “alternative” grammar formalisms [not my term-KBG]. The implication is that the student is only familiar with formal grammars inspired by the supposedly hegemonic generativist tradition—though it is not clear whether we’re talking about the GB-lite of the Penn Treebank, the minimalist grammars (MGs) of Ed Stabler, or perhaps something else—and that the set of “alternatives” includes lexical-functional grammars (LFGs), tree-adjoining grammars (TAGs), combinatory categorial grammars (CCGs), head-driven phrase structure grammar (HPSG), or one of the various forms of construction grammar. I would never say that students should study less rather than more, but I am not convinced that this diversity of formalisms is key to training well-rounded students. TAGs and CCGs are known to be strongly equivalent (Schiffer & Maletti 2021), and the major unification-based grammar systems (which include CCGs and HPSGs, as well as formalized versions of construction grammar) are equivalent to MGs. I suspect we should be emphasizing similarities rather than differences, insofar as those differences are not reflected in relative generative capacity.

Another way to assess the relative utility of alternative formalisms is to look at their actual use in wide-coverage computational grammars, since, as Chomsky (1981: 6) says, it is possible to put systems to the test “only to the extent that we have grammatical descriptions that are reasonably compelling in some domain…”. Put another way, grammar frameworks both hegemonic and alternative can be assessed for coverage (which can be extensive, in some languages and domains) or for general utility, rather than for the often-spicy rhetoric of their proponents.

Finally, it is at least possible that some alternative frameworks are simply the losers of a multi-agent coordination game, and that at least some consolidation is desirable.

References

Chomsky, N. 1981. Lectures on Government and Binding. Foris.
Schiffer, L. K. and Maletti, A. 2021. Strong equivalence of TAG and CCG. Transactions of the Association for Computational Linguistics 9: 707-720.

Academic reviewing in NLP

It is obvious to me that NLP researchers are, on average, submitting manuscripts far earlier and more often than they ought to. The average manuscript I review is typo-laden, full of figures and tables that are far too small to read or that intrude into the margins, and saddled with an unusable bibliography that the authors have clearly never inspected. Sometimes I receive manuscripts whose actual titles are transparently ungrammatical.

There are several reasons this is bad, but most of all it is a waste of reviewer time, since the reviewers have to point out (in triplicate or worse) minor issues that would have been flagged by proof-readers, advisors, or colleagues, were they involved before submission. Then, once these issues are corrected, the reviewers are again asked to read the paper and confirm they have been addressed. This is work the authors could have done, but which instead is pushed onto committees of unpaid volunteers.

The second issue is that the reviewer pool lacks relevant experience. I am regularly tasked with “meta-reviewing”, or critically summarizing the reviews. This is necessary in part because many, perhaps a majority, of the reviewers simply do not know how to review an academic paper, having not received instruction on this topic from their advisors or mentors, and their comments need to be recast in language that can be quickly understood by conference program committees.

[Moving from general to specific.]

I have recently been asked to review an uncommonly large collection of papers on the topic of prompt engineering. Several years ago, it became apparent that neural network language models, trained on enormous amounts of text data, could often provide locally coherent (though rarely globally coherent) responses to prompts or queries. The parade example of this type of model is GPT-2. For instance, if the prompt was:

Malfoy hadn’t noticed anything.

the model might continue:

“In that case,” said Harry, after thinking it over, “I suggest you return to the library.”

I assume this is because there’s fan fiction in the corpus, but I don’t really know. Now it goes without saying that at no point will Facebook, say, launch a product in which a gigantic neural network is allowed to regurgitate Harry Potter fan fiction (!) at its users. However, researchers persist, for some reason (perhaps novelty), in trying to “engineer” clever prompts that produce subjectively “good” responses, rather than attempting to understand how any of this works. (It is not an overstatement to say that we have little idea why neural networks, and the methods we use to train them in particular, work at all.) What am I to do when asked to meta-review papers like this? I try to remain collegial, but I’m not sure this kind of work ought to exist at all. I consider GPT-2 a billionaire’s plaything, and a rather wasteful one at that, and it is hard for me to see how this line of work might make the world a better place.

Is linguistics “unusually vituperative”?

The picture of linguistics one can get from books like The Linguistics Wars (Harris 1993) and press coverage of l’affaire du Pirahã suggests it is a quite nasty sort of field, full of hate and invective. Is linguistics really, as an engineer colleague would have it, “unusually vituperative”?

In my opinion it is not, for I object to the modifier unusually. Indeed, while such stories rarely make the nightly news, the sciences have never been without a hefty dose of vituperation. For instance, anthropologist Napoleon Chagnon was accused, slanderously and at book length, of causing a measles epidemic among indigenous peoples of the Amazon. And entomologist E.O. Wilson had a pitcher of water poured on his head at a lecture because, according to a lone audience member, his research on ants implied support for eugenics. And even gentleman Darwin was not above keeping an ill-tempered bulldog.

References

Harris, R. A. 1993. The Linguistics Wars: Chomsky, Lakoff, and the Battle over Deep Structure. Oxford University Press. [I don’t recommend this book: Harris, instead of explaining the issues at stake, focuses on “horse race” coverage, quoting extensively from interviews with America’s grumpiest octogenarians.]

The 24th century Universal Translator is unsupervised and requires minimal resources

The Star Trek: Deep Space Nine episode “Sanctuary” pretty clearly establishes that by the 24th century, the Star Trek universe’s Universal Translator works in an unsupervised fashion and requires only what we in the real 21st century would consider a minimal monolingual corpus, plus a few hours of processing, to translate Skrreean, a language new to Starfleet and friends. Free paper idea: how do the Universal Translator’s capabilities (in the 22nd through the 24th centuries, from Enterprise to the original series to the 24th-century shows) map onto known terms of art in machine translation in our universe?

On being scooped

Some of my colleagues have over the years expressed concern that their ongoing projects are in danger of being “scooped”, and that, as a result, they need to work rapidly to disseminate the projects in question. This concern is particularly prominent in the fast-moving (and unusually cargo-cultish) natural language processing community, though I have occasionally heard similar concerns in the company of theoretical linguists. Assuming this is not merely hysteria caused by material conditions like casualization and pandemic-related isolation, there is a simple solution: work on something else, something you yourself deem to be less obvious. If you’re in danger of being scooped, it suggests that you’re taking obvious next steps—that you’re engaging in what Kuhn calls normal science—and that you lack a competitive advantage (such as rare-on-the-ground expertise, special knowledge, proprietary or unreleased data, etc.) that would help you in particular advance the state of knowledge. If you find yourself in this predicament, you should consider allowing somebody else to carry the football across the goal line. Or don’t, but then you might just get scooped after all.

How to write linguistic examples

There is a standard, well-designed way in which linguists write examples, and failure to use it in a paper about language is a strong shibboleth suggesting unfamiliarity with linguistics as a field. In brief, it is as follows:

  • When an example (affix, word, phrase, or sentence) appears in the body (i.e., the middle of a sentence):
    • if written in the Roman alphabet, it should be italicized.
    • if written in a non-Roman but alphabetic script like Cyrillic, italicization is optional. (Cyrillic italics are, like the Russian cursive hand they’re based on, famously hard for Western amateurs like myself to read.)
    • if written in a non-alphabetic script, it can just be written as is, though you’re welcome to experiment.
    • Examples should never be underlined, bolded, or placed in single or double quotes, regardless of the script used.
  • When an example is set off from the body (i.e., as a numbered example or in a table), it need not be italicized.
  • Any non-English example should be immediately followed by a gloss.
    • A gloss should always be single-quoted.
    • Don’t intersperse words like “meaning”, as in “…kitab meaning ‘book’…”; just write “…kitab ‘book’…”.
  • If using morph-by-morph or word-by-word glossing, follow the Leipzig glossing conventions (a LaTeX sketch follows this list).
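
For LaTeX users, here is a minimal sketch of these conventions. It assumes the gb4e package for the numbered, glossed example (linguex or expex follow similar conventions), and the in-text and German examples are invented purely for illustration:

    % In the body: italicize Roman-script examples; single-quote glosses.
    ... the Arabic loanword \textit{kitab} `book' ...

    % Set off as a numbered example: no italics needed, morph-by-morph
    % glossing per the Leipzig conventions, free translation single-quoted.
    % Assumes \usepackage{gb4e} in the preamble.
    \begin{exe}
      \ex
      \gll Hund-e bell-en\\
           dog-\textsc{pl} bark-\textsc{pl}\\
      \glt `Dogs bark.'
    \end{exe}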

How to write numbers

A lot of students—and increasingly, given how young the field of NLP is, more senior researchers too—don’t know how to write numbers in papers. Here are a few basic principles (some of them loosely based on the APA guidelines):

  • Use the same number of decimals every time and don’t omit trailing zeros after the decimal. Thus “.50” or “.5000” and not “.5”.
  • Round to a small number of decimals: 2, 4, or 6 are all standard choices.
  • Omit leading zeros before the decimal if possible values of whatever quantity are always within [0, 1], thus you might say you got “.9823” accuracy.
  • (For LaTeX users) put the minus sign in math mode, too, or it’ll appear as a hyphen (ASCII character 45), which is quite a bit shorter and just looks wrong (see the sketch after this list).
  • Use commas to separate the hundreds and thousands place (etc.) in large integers, and try not to use too many large exact integers; rounding is fine once they get large.
  • Expressions like “3k”, “1.3m” and “2b” are too informal; just write “3,000”, “1.3 million”, and “2 billion”.
  • Many evaluation metrics can either be written as (pseudo-)probabilities or percentages. Pick one or the other format and stick with it.
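
To make the LaTeX-specific points concrete, here is a minimal sketch; the numbers and quantities are invented for illustration:

    % Consistent decimals, no leading zeros for [0, 1]-bounded metrics,
    % and a math-mode minus sign rather than a bare hyphen.
    The model reaches .9823 accuracy; the mean log-likelihood is $-1.5200$.
    % Commas separate thousands in large integers; very large counts are
    % rounded and spelled out rather than written as "1.3m".
    The training corpus contains 1,104 documents, or roughly 1.3 million tokens.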

A few other points about tables with numbers (looking at you, LaTeX users; a sketch follows the list):

  • Right-align numbers in tables.
  • Don’t put two numbers (like mean and standard deviation or a range) in a single cell; the alignment will be all wrong. Just use more cells and tweak the intercolumnar spacing. 
  • Don’t make the text of your tables smaller than the body text, which makes the table hard to read. Just redesign the table instead.
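
Here is a minimal sketch of a table laid out along these lines, assuming the booktabs package; the systems and numbers are invented for illustration:

    % Assumes \usepackage{booktabs} in the preamble.
    % Numbers are right-aligned, the mean and standard deviation get separate
    % columns rather than sharing a cell, and the text stays at body size.
    \begin{table}
      \centering
      \begin{tabular}{l r r}
        \toprule
        Model    & Accuracy & SD \\
        \midrule
        Baseline & .9017 & .0021 \\
        Proposed & .9823 & .0013 \\
        \bottomrule
      \end{tabular}
      \caption{Hypothetical results, for illustration only.}
    \end{table}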

Moneyball Linguistics

[This is just a fun thought experiment. Please don’t get mad.]

The other day I had an intrusive thought: the phrase moneyball linguistics. Of course, as soon as I had a moment to myself, I had to sit down and think about what this might denote. At first I imagined building out a linguistics program on a small budget, like Billy Beane and the Oakland A’s. But it seems to me that linguistics departments aren’t really much like baseball teams—they’re only vaguely competitive (occasionally for graduate students or junior faculty), there’s no imperative to balance the roster, there’s no DL (or is that just sabbatical?), and so on—and the metaphor sort of breaks down. But the ideas of Beane and co. do seem to have some relevance to talking about individual linguists and labs. I don’t have OBP or slugging percentage for linguists, and I wouldn’t dare to propose anything so crude, but I think we can talk about linguists and their research as a sort of “cost center” and identify two major types of “costs” for the working linguist:

  1. cash (money, dough, moolah, chedda, cheese, skrilla, C.R.E.A.M., green), and
  2. carbon (…dioxide emissions).

I think it is a perfectly fine scientific approximation (not unlike competence vs. performance) to treat the linguistic universe as having a fixed amount of cash and carbon, so that we could use this kind of thinking to build out a department roster and come in just under the salary cap. While state research budgets do fluctuate—and while our imaginings of a better world should also include more science funding—it is hard to imagine near-term political change in the West that would substantially increase that funding. And similarly, while there is roughly 10¹² kg of carbon in the earth’s crust, climate scientists agree that the vast majority of it really ought to stay there. Finally, I should note that maybe we shouldn’t treat these as independent factors, given that a non-trivial amount of linguistics funding comes via petrodollars. But anyway, without further ado, let’s talk about some types of researchers and how they score on the cash-and-carbon rubric.

  • Armchair research: The armchairist is clearly both low-cash (if you don’t count the sports coats) and low-carbon (if you don’t count the pipe smoke).
  • Field work: “The field” could be anywhere, even the reasonably affordable, accessible, and often charming Queens, but the archetypical fieldworker flies in, first on a jet and then perhaps by helicopter or seaplane. Once there, though, life in the field is often reasonably affordable, so this scores as low-cash, high-carbon.
  • Experimental psycholinguistics: Experimental psycholinguists have reasonably high capital/startup costs (in the form of eyetracking devices, for instance) and steady marginal costs for running subjects: the subjects themselves may come from the Psych 101 pool but somebody’s gotta be paid to consent them and run them through the task. We’ll call this medium-cash, low-carbon.
  • Neurolinguistics: The neurolinguistic imaging technique du jour, magnetoencephalography (MEG), requires superconducting coils cooled to a chilly 4.2 K (roughly −452 °F); this in turn is accomplished with liquid helium. Not only is the cooling system expensive and power-hungry, but the helium is mostly wasted (i.e., vented to the atmosphere). Helium is the second-most common element in the universe, but we are quite literally running out of the stuff here on Earth. So MEG, at least, is high-cash, high-carbon.
  • Computational linguistics: There was a time not so long ago when I would have said that computational linguists were a bunch of hacky-sackers filling up legal pads with Greek letters (the weirder the better) and typing some kind of line noise they call “Haskell” into ten-year-old Thinkpads. But nowadays, deep learning is the order of the day, and the substantial carbon impact of these methods is well-documented, or at least well-estimated (e.g., Strubell et al. 2019). Now, it probably should be noted that a lot of the worst offenders (BigCos and the Quebecois) locate their data centers near sources of plentiful hydroelectric power, but not all of us live within the efficient transmission zones for hydropower. And of course, graphics processing units are expensive too. So most computational linguistics is, increasingly, high-cash, high-carbon.

On a more serious note, just so you know, unless you run an MEG lab or are working on something called “GPT-G6”, chances are your biggest carbon contributions are the meat you eat, the cars you drive, and the short-haul jet flights you take, not other externalities of your research.

References

Strubell, E., Ganesh, A. and McCallum, A. 2019. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3645-3650.