Robot autopsies

I don’t really understand the exuberance for studying whether neural networks know syntax. I have a lot to say about this issue—I’ll return to it later—but for today I’d like to briefly discuss this passage from a recent(ish) paper by Baroni (2022). The author expresses great surprise that few formal linguists have cited a particular paper (Linzen et al. 2016) about the ability of neural networks to learn long-distance agreement phenomena. (To be fair, Baroni is not a coauthor of said paper.) He then continues:

While it is possible that deep nets are relying on a completely different approach to language processing than the one encoded in human linguistic competence, theoretical linguists should investigate what are the building blocks making these systems so effective: if not for other reasons, at least in order to explain why a model that is supposedly encoding completely different priors than those programmed into the human brain should be so good at handling tasks, such as translating from a language into another, that should presuppose sophisticated linguistic knowledge. (Baroni 2022: 11).

This passage is a useful stepping-off point for my own views. I want to be clear: I am not “picking on” Baroni, who is probably far senior to me and certainly better known anyways; this is just a particularly clearly written version of the claim, and I happen to disagree.

Baroni says it is “possible that deep nets are relying on a completely different approach to language processing…” than humans; I’d say it’s basically certain that they are. We simply have no reason to think they might be using similar mechanisms since humans and neural networks don’t contain any of the same ingredients. Any similarities will naturally be analogies, not homologies.

Without a strong reason to think neural models and humans share some kind of cognitive homologies, there is no reason for theoretical linguists to investigate them; as artifacts of human culture, they are no more in linguists’ domain of study than zebra finches, carburetors, or the perihelion of Mercury.

It is not even clear how one ought to poke into the neural black box. Complex networks are mostly resistant to the kind of proof-theoretic techniques that mathematical linguists (witness the Delaware school or even just work by, say, Tesar) actually rely on, and most of the results are both negative and of minimal applicability: for instance, we know that there always exists a network with a single hidden layer large enough to approximate, to arbitrary precision, any function a multi-layer network computes, but we have no way to figure out how big is big enough for a given function.
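
To make the worry concrete, here is a sketch of the result I have in mind, the classical universal approximation theorem (in one common formulation, glossing over the exact conditions on the activation function σ): for any continuous function f on a compact set K and any tolerance ε > 0, there exist a width N and parameters α_i, w_i, b_i such that

\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma(w_i \cdot x + b_i) \right| < \varepsilon.

The catch is that the proof is non-constructive: it tells us nothing practically useful about how large N must be for a given f and ε.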

Probing and other interpretative approaches exist, but have not yet proved themselves, and it is not clear that theoretical linguists have the relevant skills to push things forward anyways. Quality assurance and adversarial data generation are not exactly high-status jobs; how can Baroni demand that Cinque or Rizzi (to choose two of Baroni’s well-known countrymen) put down their chalk and start doing free or poorly-paid QA for Microsoft?

Why should theoretical linguists of all people be charged with doing robot autopsies when the creators of the very same robots are alive and well? Either it’s easy and they’re refusing to do the work, or—and I suspect this is the case—it’s actually far beyond our current capabilities and that’s why little progress is being made.

I for one am glad that, for the time being, most linguists still have a little more self-respect. 

References

Baroni, M. 2022. On the proper role of linguistically oriented deep net analysis in linguistic theorising. In S. Lappin (ed.), Algebraic Structures in Natural Language, pages 1–16. Taylor & Francis.
Linzen, T., Dupoux, E., and Goldberg, Y. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4: 521–535.

Isaacson and Lewis

It’s amusing to me that Walter Isaacson and Michael Lewis—who happened to go to the same elite private high school in New Orleans, just a few years apart—are finally having their oeuvres as favorable stenographers for the rich and powerful reassessed more or less simultaneously. Isaacson clearly met his match with Elon Musk, a deeply incurious abuser who gave Isaacson quite minimal access; Lewis does seem to be one of a handful of people who actually believed in that effective altruism nonsense Sam Bankman-Fried was cooking up. Good riddance, I say.

Use the minus sign for feature specifications

LaTeX has a dizzying number of options for different types of horizontal dash. The following are available:

  • A single - is a short dash appropriate for hyphenated compounds (like encoder-decoder).
  • A single dash in math mode, $-$, is a longer minus sign.
  • A double -- is a longer “en-dash” appropriate for numerical ranges (like 3–5).
  • A triple --- is a long “em-dash” appropriate for interjections (like this—no, I mean like that).
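
For reference, the source for the examples above looks roughly like this (a minimal sketch; the comments are mine):

an encoder-decoder model            % hyphen
pages 3--5                          % en-dash
like this---no, I mean like that    % em-dash
a minus sign, $-$, in math mode     % minus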

My plea to linguists is to actually use math mode and the minus sign when writing binary features. If you want to turn this into a simple macro, you can place the following in your preamble:

\newcommand{\feature}[2]{\ensuremath{#1}\textsc{#2}}

and then write \feature{-}{Back} for nicely formatted feature specifications.
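
For instance, a minimal document using the macro (the particular feature names here are just for illustration) would be:

\documentclass{article}
\newcommand{\feature}[2]{\ensuremath{#1}\textsc{#2}}
\begin{document}
Vowels that are \feature{-}{Back} and \feature{+}{Round} pattern together here.
\end{document}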

Note that this issue has an exact parallel in Word and other WYSIWYG setups: there the fix is as simple as selecting the Unicode minus sign (U+2212) from the inventory of special characters (or just googling “Unicode minus sign” and copying and pasting what you find).

Trust me, I’m a linguist

Grice’s maxim of quantity requires that one give no more information than is strictly required. This is sometimes misunderstood as a firm constraint, but one intuition you may have—and which is nicely expressed by rational speech act theory—is that apparent violations of this maxim by an otherwise cooperative speaker may actually tell you that seemingly irrelevant information is, in the mind of the speaker, of great relevance to the discourse.
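
For readers who have not seen it, the core of the rational speech act framework is a short probabilistic recursion (this is just a sketch of the standard formulation; the relevance-signaling intuition above is handled by various extensions of it). A literal listener interprets an utterance u according to its literal semantics, a speaker chooses utterances so as to be informative to that listener, and a pragmatic listener reasons about why the speaker said what they said:

P_{L_0}(w \mid u) \propto [[u]](w) \cdot P(w)
P_{S_1}(u \mid w) \propto \exp(\alpha \, [\log P_{L_0}(w \mid u) - C(u)])
P_{L_1}(w \mid u) \propto P_{S_1}(u \mid w) \cdot P(w)

Here [[u]](w) is 1 if u is literally true of world w and 0 otherwise, C(u) is the cost of producing u, and α controls how close to optimal the speaker is assumed to be.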


I recently read two interviews in which the subject—crucially, not a working linguist—drew attention to their linguistics education.

The first is this excellent profile of Joss Sackler, a woman who married into the Sackler opioid fortune. To be fair, she does hold a PhD in Hispanic & Luso-Brazilian literatures & languages (dissertation here), which ought to qualify one, but her response to the Town & Country reporter asking about some bad press is exactly the kind of non sequitur a rational speaker ought not to make: “They’re going to regret fucking with a linguist. They already do.”

The second comes from interviews with Nicole Daedone, the co-founder of an organization (it’s hard to describe, just read about it if you’re interested) called OneTaste. I’ve now read several profiles of her (the first was in the New York Times, I think, years ago, but I can’t find it anymore), and in each they mention that she studied linguistics in San Francisco; one source says she has a bachelor’s degree in “gender communications and semantics” from San Francisco State University, another says she was at some point working on a linguistics PhD there. The relevance was again unclear to me, but later I read the very interesting book Future Sex, which also profiles her. That book contains a brief discussion of the lexical semantics of pussy; Daedone, who is (it’s complicated) a sex educator, proposes that it fills a lexical lacuna by providing a single term that refers to the human vulva and vagina as a whole.


This all makes me wonder whether, to the general public, linguistics really connotes brilliance and, perhaps, tenacity. And it makes me wonder whether one could actually wield one’s linguistics education as a shield against criticisms having nothing to do with language per se.

Linguistics and prosociality

It is commonly said that linguistics as a discipline has enormous prosocial potential. What I actually suspect is that this potential is smaller than some linguists imagine. Linguistics is of course essential to the deep question of “what is human nature”, but we are up against our own epistemic bounds in answering that question, and the social impact of answering it is not at all clear to me. Linguistics is also essential to the design of speech and language processing technologies (despite what you may have heard: don’t believe the hype), and while I find these technologies exciting, it remains to be seen whether they will be as societally transformative as investors think. And language documentation is transformative for some of society’s most marginalized. But I am generally skeptical of linguistics’ and linguists’ ability to combat societal biases more generally. While I don’t think any member of society should be considered well-educated until they’ve thought about the logical problems of language acquisition, considered the idea of language as something that exists in the mind rather than just in the ether, or confronted standard language ideologies, I have to question whether the broader discipline has been very effective at getting these messages out.

Online poisoning

One of my working theories for why natural language processing feels unusually contentious at present is, yes, social media. The outspoken researchers speak, more or less constantly, to a large social media audience, and use this forum as the primary way to form and disseminate opinions. For instance, there is a very strong correlation between being an “ACL thought leader”, if not an officer, and tweeting often and aggressively. People of my age understand the addictive and corrosive nature of presenting oneself for online kudos (and jeers), but some people of the older generations lack the appropriate internet literacy to use these tools in moderation, and some people of the younger generations lack the maturity to do the same. Such people have online poisoning. Side-effects include outing oneself as the subject of a subtweet and complaining to a student’s advisor. If you have any of these symptoms, please log off immediately and touch grass.

A prediction

You didn’t build that. – Barack Obama, July 13, 2012

Connectionism originates in psychology, but the “old connectionists” are mostly gone, having largely failed to pass on their ideology to their trainees, and there really aren’t many “young connectionists” to speak of. But I predict that in the next few years we’ll see a bunch of psychologists of language—the ones who define themselves by their opposition to internalism, innateness, and generativism—become some of the biggest cheerleaders for large language models (LLMs). In fact, psychologists have not made substantial contributions to neural network modeling in many years. Virtually all the work on improving neural networks over the last few decades has been done by computer scientists who cared not a whit whether they had anything to do with human brains or cognitive plausibility.1 (Sometimes they’ll put things like “…inspired by the human brain…” in the press releases, but we all know that’s just fluff.) At this point, psychology as a discipline has no more claim to neural networks than the Irish do to Gaul, and in the rather unlikely case that LLMs do end up furnishing deep truths about cognition, the discipline will have failed us by not following up on a promising lead. I think it will be particularly revealing if psychologists who previously worshipped at the Church of Bayes suddenly lose all interest in mathematical rigor and find themselves praying to the great Black Box. I want to say it now: if this happens—and I am starting to see signs that it will—those people will be cynics, haters, and trolls, and you shouldn’t pay them any mind.

Endnotes

  1. I am also critical of machine learning pedagogy, and it is therefore interesting to see that those same computer scientists pushing things forward don’t seem to care much for machine learning as an academic discipline either.

Industry postdocs

I find the very idea of industry postdocs funny (funny-sad, though). Sure, it makes sense for the academy, with all of its scarcities, to make use of precarious, casualized post-graduate labor, but to extend this to the tech sector is vaguely monstrous. It’s extra funny (but funny-sad too) when you hear of a senior professor doing an industry postdoc at a company with a name like baz.ly during their sabbatical.

Neurolinguistic deprogramming

I venture to say most working linguists would reject—outright—strong versions of linguistic relativity and the Sapir-Whorf hypothesis, and would regard neuro-linguistic programming as pseudoscientific rubbish. This is of course in contrast to the general public: even the highly educated take linguistic relativity as an obvious description of human life. Yet it is not uncommon for the same linguists to endorse beliefs in the power of renaming that are hard to reconcile with the general disrepute of the vulgar Whorfian view such a power presupposes.

For instance, George Lakoff’s work on “framing” in politics argued that renaming social programs was the one weird trick needed to get Howard Dean into the White House. While this seems quaint in retrospect, his proposal was widely debated at the time. Pinker’s (sigh) takedown is necessary reading. The problem, of course, is that Lakoff ought to have provided, and ought to have been expected to provide, at least some evidence for a view of language his colleagues widely regard as untutored.

The case of renaming languages is a grayer one. I believe that one ought to call people what they want to be called, and that if stakeholders would prefer their language to be referred to as Tohono Oʼodham rather than Pápago, I am and will remain happy to oblige.1 If African American Vernacular English is renamed to African American Language (as seems to be increasingly common in scholarship), I will gladly follow suit. But I can’t imagine how it could be the case that the renaming represents either a reconceptualization of the language itself or a change in how we study it. Indeed, it would be strange for the name of any language to reflect any interesting property of said language. French by any other name would still have V-to-T movement and liaison.

It may be that these acts of renaming have power. Indeed, I hope they do. But I have to suspect the opposite: they’re the sort of fiddling one does when one is out of power, when one is struggling to believe that a better world is possible. And if I’m wrong, who is better suited to show that than the trained linguist?

Endnotes

  1. Supposedly, the older name of the language comes from a pejorative used by a neighboring tribe, the Pima. Ba꞉bawĭkoʼa means, roughly, ‘tepary bean eater’. The Spanish colonizers adapted this as Pápago. I feel like the gloss sounds like a cutting insult in English too, so I get why this exonym has fallen into disrepute.