RoboCop

I like a lot of different types of films, but my favorites are the subtextually rich, nuance-light action/science fiction films of the late 1970s, 1980s, and early 1990s, made by directors like Cameron, Carpenter, Cronenberg, McTiernan, Scott, and Verhoeven. Perhaps the most prescient of all of these is RoboCop (1987). The film’s feel is set by over-the-top comic sex and violence and silly diegetic TV clips. In less deft hands, it could easily have become the sort of campy farce best described (or perhaps, denigrated) as a “cult classic”. (This usually means a film is just bad.) But Verhoeven wields sex and violence like a master wields a paintbrush. (I take this to be a sort of self-critique of his aesthetic appreciation of the violence he saw as a boy growing up in Nazi-occupied Holland, not far from the V-2 launch sites.) The film is thematically rich, so much so that one can easily forgive Verhoeven’s apparent decision to leave out (in what is probably the most “dated” element of the film) any overt criticism of policing as an institution. It is ruthlessly critical of corporatism and of what we’d now call neoliberalism, and it has much to say about the nature of the self. The theme that strikes me as most prescient is the film’s very modern realization that, to a striking degree, what we call “AI” is fundamentally just “other people”, alienated and dehumanized by contractual labor relations. Verhoeven could somehow see this coming decades before anything that could reasonably be called AI existed.

1-on-1 Zoom

If you’re just doing a “meeting” with one other person located in the same country, I don’t see the point of using Zoom. Ordinary phone lines are more reliable and have more familiar acoustic qualities (this is why VoIP sounds worse: unless you’re quite young, you’re probably far more familiar with the 8kHz sampling rate and whatever compression curve the phone system uses). Just call people on the phone!

ACL Workshop on Computation and Written Language

The first ACL Workshop on Computation and Written Language (CAWL) will be held in conjunction with ACL 2023 in Toronto, Canada, on July 13th or 14th 2023 (TBD). It will feature invited talks by Mark Aronoff (Stony Brook University) and Amalia Gnanadesikan (University of Maryland, College Park). We welcome submissions of scientific papers to be presented at the conference and archived in the ACL Anthology. Information on submission and format will be posted at https://cawl.wellformedness.com shortly.

Generalized capitalist realism

One of the most memorable books I’ve read over the last decade or so is Mark Fisher’s Capitalist Realism: Is There No Alternative? (2009). The book is a slim, 81-page pamphlet describing the feeling that “not only is capitalism the only viable political and economic system, but also that it is now impossible even to imagine a coherent alternative to it.” As Fisher explains, a lot of ideological work is done to prevent us from imagining alternatives, including the increasingly capitalist sheen of anti-capitalism, and there are a few areas—the overall non-response to climate change and biosphere-scale threats, for example—where capitalist realism ideology has failed to co-opt dissent, suggesting at least the possibility of an alternative on the horizon, even if Fisher himself does not imagine or present one.

A very clear example of capitalist realism can be found in the effective altruism (EA) movement, which focuses on getting charity to the less well-off via existing capitalist structures. Singer (2015), the movement’s resident philosopher, justifies this by setting the probability of a viable alternative to capitalism surfacing in any reasonable time frame to zero. Therefore the most good one can do is to ruthlessly accumulate wealth in the metropole and then give it away where it is most needed. Any synergies between the wealth of the first world and the dire economic conditions in the third world simply have to be set aside.

Fisher’s term capitalist realism is a sort of pun on socialist realism, a term for idealized, realistic, literal art from 20th century socialist countries. His use of the term realism is (deliberately, I think) ironic, since both capitalist and socialist realism apply firm ideological filters to the real world. The continental philosophy stuff that this ultimately gets down to is a bit above my pay grade, but I think we can generalize the basic idea: X realism is an ideology that posits and enforces the hypothesis that there is no alternative to X.

If one is willing to go along with this, we can easily talk about, for instance, neural realism, which posits that there is simply no alternative to neural networks for machine learning. You can see this for instance in the debate between “deep learning fundamentalists” like LeCun and the rigor police like Rahimi (see Sproat 2022 for an entertaining discussion): LeCun does seem to believe there to be no alternative to employing methods we do not understand with the scientific rigor that Rahimi demands, when it seems obvious that these technologies remain a small part of the overall productive economy. An even clearer example is the term foundation model, which has the fairly obvious connotation that such models are crucial to the future of AI. Foundation model realism would also necessarily posit that there is no alternative and discard any disconfirming observation.

References

Fisher, M. 2009. Capitalist Realism: Is There No Alternative? Zero Books.
Singer, P. 2015. The Most Good You Can Do. Yale University Press.
Sproat, R. 2022. Boring problems are sometimes the most interesting. Computational Linguistics 48(2): 483-490.

Codon math

It is well known that there are twenty “proteinogenic” amino acids, those capable of creating proteins, in eukaryotes (i.e., lifeforms with nucleated cells). When biologists first began to realize that DNA is transcribed into RNA, which is in turn translated into chains of amino acids, it was not yet known how many DNA bases (the vocabulary being A, T, C, and G) were required to code an amino acid. It turns out the answer is three: each codon is a base triple, each corresponding to an amino acid. However, one might have deduced that answer ahead of time using some basic algebra, as did Soviet-American polymath George Gamow. Given that one needs at least 20 aminos (and admitting that some redundancy is not impossible), it should be clear that pairs of bases will not suffice to uniquely identify the different aminos: 4² = 16, which is less than 20 (+ some epsilon). However, triples will more than suffice: 4³ = 64. This holds assuming that the codons are interpreted consistently, independently of their context (as Gamow correctly deduced), and it holds whether or not the triplets are interpreted as overlapping (Gamow incorrectly guessed that they overlapped, so that a six-base sequence contains four triplet codons; in fact it contains no more than two).
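The arithmetic above is easy to check mechanically. The following is only a minimal sketch; the function names `min_codon_length` and `count_codons` are made up for illustration and do not come from Gamow or any library.

```python
def min_codon_length(n_bases, n_amino_acids):
    """Returns the smallest k such that n_bases ** k >= n_amino_acids."""
    if n_bases < 2:
        raise ValueError("need at least two distinct bases")
    k = 1
    while n_bases ** k < n_amino_acids:
        k += 1
    return k


def count_codons(sequence_length, codon_length=3, overlapping=False):
    """Counts the codons read off a sequence of the given length."""
    if sequence_length < codon_length:
        return 0
    if overlapping:
        # Every position that leaves room for a full codon starts one.
        return sequence_length - codon_length + 1
    # Otherwise codons tile the sequence end to end.
    return sequence_length // codon_length


print(min_codon_length(4, 20))            # pairs give 16 < 20; triples give 64
print(count_codons(6, overlapping=True))  # Gamow's (incorrect) overlapping reading
print(count_codons(6))                    # the actual non-overlapping reading
```

With four bases and twenty amino acids the first call returns 3, and the last two calls return 4 and 2 respectively, matching the six-base example in the text.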

All of this is a long way to link back to the idea of counting entities in phonology. It seems to me we can ask just how many features might be necessary to mark all the distinctions needed. At the same time, Matamoros & Reiss (2016), for instance, following some broader work by Gallistel & King (2009), take it as desirable that a cognitive theory involve a small number of initial entities that give rise to a combinatoric explosion that, at the etic level, is “essentially infinite”. Surely similar thinking can be applied throughout linguistics.

References

Gallistel, C. R., and King, A. P. 2009. Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience. Wiley-Blackwell.
Matamoros, C. and Reiss, C. 2016. Symbol taxonomy in biophonology. In A. M. Di Sciullo (ed.), Biolinguistic Investigations on the Language Faculty, pages 41-54. John Benjamins Publishing Company.

Foundation models

It is widely admitted that the use of language in terms like formal language and language model tends to mislead neophytes, since it suggests the common-sense notion (roughly, e-language) rather than the narrow technical sense referring to a set of strings. Scholars at Stanford have been trying to push foundation model as an alternative to what were previously called large language models. But I don’t really like the implication, which I take to be quite salient, that such models ought to serve as the foundation for NLP, AI, whatever. I use large language models in my research, but not that often, and I actually don’t think they have to be part of every practitioner’s toolkit. I can’t help thinking that Stanford is trying to “make fetch happen”.

Stress transcription

I have recently encountered several published works which claim to use IPA-style transcriptions, but mark stress immediately before or after the vowel. This is wrong. The IPA guidelines clearly state that stress is marked at the start of the syllable; it thus acts as an indication of syllable boundary. The more you know…
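To make the convention concrete, here is a minimal sketch that renders a transcription from syllables that have already been divided by hand. The function name `transcribe`, its parameters, and the rough transcription of "banana" are all illustrative assumptions; nothing here performs syllabification.

```python
PRIMARY = "\u02c8"    # IPA primary stress mark (ˈ)
SECONDARY = "\u02cc"  # IPA secondary stress mark (ˌ)


def transcribe(syllables, primary=None, secondary=()):
    """Joins syllables, placing stress marks at the start of the syllable.

    A stress mark doubles as a syllable boundary, so "." is only written
    before unstressed, non-initial syllables.
    """
    pieces = []
    for i, syllable in enumerate(syllables):
        if i == primary:
            pieces.append(PRIMARY + syllable)
        elif i in secondary:
            pieces.append(SECONDARY + syllable)
        elif i > 0:
            pieces.append("." + syllable)
        else:
            pieces.append(syllable)
    return "".join(pieces)


# A rough, illustrative transcription of "banana" with stress on the
# second syllable: the mark precedes the onset [n], not the vowel.
print(transcribe(["bə", "næ", "nə"], primary=1))  # bəˈnæ.nə
```

The point of the sketch is simply that the mark attaches to the whole syllable, onset included, which is why it can stand in for the boundary symbol.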

Is NLP stuck?

I can’t help but feel that NLP is once again stuck.

From about 2011 to 2019, I can identify a huge step forward just about every year. But the last thing that truly excited me is BERT, which came out in 2018 and was published in 2019. For those not in the know, the idea of BERT is to pre-train a gigantic language model, with either monolingual or multilingual data. The major pre-training task is masked language model prediction: we pretend some small percentage (usually 15%) of the words in a sentence are obscured by noise and try to predict what they were. Ancillary tasks like predicting whether two sentences are adjacent or not (or if they were, what was their order) are also used, but appear to be non-essential. Pre-training (done a single time, at some expense, at BigCo HQ) produces a contextual encoder, a model which can embed words and sentences in ways that are useful for many downstream tasks. But then one can also take this encoder and fine-tune it to some other downstream task (an instance of transfer learning). It turns out that the combination of task-general pre-training using free-to-cheap ordinary text data and a small amount of task-specific fine-tuning using labeled data results in substantial performance gains over what came before. The BERT creators gave away both software and the pre-trained parameters (which would be expensive for an individual or a small academic lab to reproduce on their own), and an entire ecosystem of sharing pre-trained model parameters has emerged. I see this toolkit-development ecosystem as a sign of successful science.
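The data side of the masked language model task can be sketched in a few lines. This is a deliberate simplification and not BERT's actual preprocessing: it masks whitespace-delimited words rather than subword pieces, and it always substitutes a mask symbol, whereas the published recipe sometimes swaps in a random token or leaves the word unchanged. The function name `mask_tokens` and its parameters are made up for illustration.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rate=0.15, seed=0):
    """Hides roughly `rate` of the tokens; returns (masked, targets).

    `targets` maps each masked position to the original token, which is
    what the model would be trained to predict.
    """
    rng = random.Random(seed)
    # Mask at least one token, but never more than the sentence contains.
    n_masked = min(len(tokens), max(1, round(rate * len(tokens))))
    positions = rng.sample(range(len(tokens)), n_masked)
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets


sentence = "the quick brown fox jumps over the lazy dog near the river bank today"
masked, targets = mask_tokens(sentence.split())
print(" ".join(masked))
print(targets)
```

During pre-training the model sees only `masked` and is scored on how well it recovers the entries in `targets`; no human labeling is involved, which is why ordinary text suffices.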

From my limited perspective, very little has happened since then that is not just more BERTology, that is, exploiting BERT and similar models. The only alternatives on the horizon, for the last 4 years now, are pre-trained large language models without the encoder component, of which the best known are the GPT family (now up to GPT-3). These models do one thing well: they take a text prompt and produce more text that seemingly responds to the prompt. However, whereas BERT and family are free to reuse, GPT-3’s parameters and software are both closed source and can only be accessed at scale by paying a licensing fee to Microsoft. That itself is a substantial regression compared to BERT. More importantly, though, the GPT family are far less expressive tools than BERT, since they don’t really support fine-tuning. (More precisely, I don’t see any difficult technical barriers to fine-tuning GPT-style models; it’s just not supported.) Thus they can really only be used for one thing: zero-shot text generation tasks, in which the task is “explained” to the model in the input prompt, and the output is also textual. Were it possible to simply write out, in plain English, what you want, and then get the output in a sensible text format, this of course would be revolutionary, but that’s not the case. Rather, GPT has spawned a cottage industry of prompt engineering. A prompt engineer, roughly, is someone who specializes in crafting prompts. It is of course impressive that this can be done at all, but just because an orangutan can be taught to make an adequate omelette doesn’t mean I am going to pay one to make breakfast. I simply don’t see how any of this represents an improvement over the BERT ecosystem, which at least is easy to use, free, and open-source. And as you might expect, GPT’s zero-shot approach is quite often much worse than what one would obtain using the light supervision of the BERT-style fine-tuning approach.

Phonological nihilism

One might argue that phonology is in something of a crisis period. Phonology seems to be going through early stages of grief for what I see as the failure of teleological, substance-rich, constraint-based, parallel-evaluation approaches to make headway, but the next paradigm shift is yet to become clear to us. I personally think that logical, substance-free, serialist approaches ought to represent our next i-phonology paradigm, with “evolutionary”-historical thinking providing the e-language context, but I may be wrong and an altogether different paradigm may be waiting in the wings. The thing that troubles me is that phonologists from these still-dominant constraint-based traditions seem to have less and less faith in the tenets of their theories, and in the worst case this expresses itself as a sort of nihilism. I discern two forms of this nihilism. The first is the phonologist who thinks we’re doing “word sudoku”, playing games of minimal description that produce generalizations without a shred of cognitive support. The second is the phonologist who thinks that everything is memorized, so that the actual domain of phonological generalization is just Psych 101 subject pool nonce word experiments. My pitch to both types of nihilists is the same: if you truly believe this, you ought to spend more time at the beach and less in the classroom, and save some space in the discourse for those of us who believe in something.

On the past tense debate; Part 3: the overestimation of overirregularization

One final, and still unresolved, issue in the past tense debate is the role of so-called overirregularization errors.

It is well-known that children acquiring English tend to overregularize irregular verbs; that is, they apply the regular -d suffix to verbs which in adult English form irregular pasts, producing, e.g., *thinked for thought. Maratsos (2000) estimates that children acquiring English very frequently overregularize irregular verbs; for instance, Abe, recorded roughly 45 minutes a week from ages 2;5 to 5;2, overregularizes rare irregular verbs as much as 58% of the time, and even the most frequent irregular verbs are overregularized 18% of the time. Abe appears to have been exceptional in that he had a very large receptive vocabulary for his age (as measured by the Peabody Picture Vocabulary Test), giving him more opportunities (and perhaps more grammatical motivation) for overregularization,¹ but Maratsos estimates that less-precocious children have lower but overall similar rates of overregularization.

In contrast, it is generally agreed that overirregularization, or the application of irregular patterns (e.g., in English, of ablaut, shortening, etc.), is quite a bit rarer. The only serious attempt to count overirregularizations is by Xu & Pinker (1995; henceforth XP). They estimate that children produce such errors no more than 0.2% of the time, which would make overirregularizations roughly two orders of magnitude rarer than overregularizations. This is a substantial difference. If anything, I think that XP overestimate overirregularizations. For instance, XP count brang as an overirregularization, even though this form does exist quite robustly in adult English (though it is somewhat stigmatized). Furthermore, XP count *slep for slept as an overirregularization, though this is probably just ordinary (td)-deletion, a variable rule that is attested already in early childhood (Payne 1980). But by any account, overirregularization is extremely rare. The same is found in nonce word elicitation experiments such as those conducted by Berko (1958): both children and adults are loath to generate irregular past tenses for nonce verbs.²
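The "two orders of magnitude" figure can be checked against the rates quoted above. This is only a back-of-the-envelope sketch: it compares XP's 0.2% upper bound with Abe's 18% and 58% overregularization rates, and Abe, as noted, may overregularize somewhat more than a typical child. The helper name `orders_of_magnitude` is made up for illustration.

```python
import math


def orders_of_magnitude(larger, smaller):
    """Returns the base-10 logarithm of the ratio between two rates."""
    return math.log10(larger / smaller)


OVERIRREGULARIZATION = 0.002  # XP's upper bound (0.2%)
OVERREGULARIZATION = {
    "frequent irregulars": 0.18,  # Abe's rate, per Maratsos (2000)
    "rare irregulars": 0.58,      # Abe's rate, per Maratsos (2000)
}

for label, rate in OVERREGULARIZATION.items():
    gap = orders_of_magnitude(rate, OVERIRREGULARIZATION)
    print(f"{label}: {gap:.2f} orders of magnitude")
```

The ratios work out to roughly 90 and 290, so the gap is a little under two orders of magnitude for frequent irregular verbs and closer to two and a half for rare ones.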

This is a problem for most existing computational models. Nearly all of them—Albright & Hayes’ (2003) rule-based model (see their §4.5.3), O’Donnell’s (2015) rules-plus-storage system, and all analogical models and neural networks I am aware of—not only overregularize, like children do, but also overirregularize at rates far exceeding what children do. I submit that any computational model which produces substantial overirregularization is simply on the wrong track.

Endnotes

¹ It is amusing to note that Abe is now, apparently, a trial lawyer and partner at a white-shoe law firm.
² As I mentioned in a previous post, this is somewhat obscured by ratings tasks, but that’s further evidence we should disregard such tasks.

References

Albright, A. and Hayes, B. 2003. Rules vs. analogy in English past tenses: a computational/experimental study. Cognition 90(2): 119-161.
Berko, J. 1958. The child’s learning of English morphology. Word 14: 150-177.
Maratsos, M. 2000. More overregularizations after all: new data and discussion on Marcus, Pinker, Ullman, Hollander, Rosen & Xu. Journal of Child Language 27: 183-212.
O’Donnell, T. 2015. Productivity and Reuse in Language: a Theory of Linguistic Computation and Storage. MIT Press.
Payne, A. 1980. Factors controlling the acquisition of the Philadelphia dialect by out-of-state children. In W. Labov (ed.), Locating Language in Time and Space, pages 143-178. Academic Press.
Xu, F. and Pinker, S. 1995. Weird past tense forms. Journal of Child Language 22(3): 531-556.