Growing consensus

Any time I read a paper that begins, roughly, “there is a growing consensus that P”, there is not in fact, as far as I can tell, a growing consensus in support of P.

Optionality as acquirendum

A lot of work deals with the question of acquiring “optional” or “variable” grammatical rules, and my impression is that different communities are mostly talking at cross-purposes. I discern at least three ways linguists conceive of optionality as something which the child must acquire.

1. Some linguists assume (I think without much evidence) that optionality is mere “free variation”, so that the learner simply needs to infer which rules bear a binary [optional] feature. This is an old idea, going back to at least Dell (1981); Rasin et al. (2021:35) explicitly state the problem in this form.
2. Variationist sociolinguists focus on the differential rates at which grammatical rules apply. They generally recognize the acquirenda as essentially conditional probability distributions which give the probability of rule application in a given grammatical context. Bill Labov is a clear avatar of this strain of thinking (e.g., Labov 1989). David Adger and colleagues have attempted to situate this within modern syntactic frameworks (e.g., Adger 2006).
3. Some linguists believe that optionality is not statable within a single grammar, and must instead reflect competing grammars. The major proponent of this approach is Anthony Kroch (e.g., Kroch 1989). While this conception might license some degree of “nihilism” about optionality, it has also led to interesting work hypothesizing substantive grammar-internal constraints on variation, as in the work of Laurel MacKenzie and colleagues (e.g., MacKenzie 2019). This work is also very good at ridding (2) of some of its unfortunate “externalist” thinking.

I have to reject (1) as overly simplistic. I find (2) and (3) both compelling in some ways, but a lot of work remains to synthesize or adjudicate between them.

References

Adger, D. 2006. Combinatorial variability. Journal of Linguistics 42(3): 503-530.
Dell, F. 1981. On the learnability of optional phonological rules. Linguistic Inquiry 12(1): 31-37.
Kroch, A. 1989. Reflexes of grammar in patterns of language change. Language Variation & Change 1(3): 199-244.
Labov, W. 1989. The child as linguistic historian. Language Variation & Change 1(1): 85-97.
MacKenzie, L. 2019. Perturbing the community grammar: Individual differences and community-level constraints on sociolinguistic variation. Glossa 4(1): 28.
Rasin, E., Berger, I., Lan, N., Shefi, I., and Katzir, R. 2021. Approaching explanatory adequacy in phonology using Minimum Description Length. Journal of Language Modelling 9(1): 17-66.

Kill yr darlings…

…or at least make them more rigorous. In the field of computational phonology, there were three mid-pandemic articles that presented elaborate computational “theories of everything” in phonology: Ellis et al. (2022), Rasin et al. (2021), and Yang & Piantadosi (2022).¹ I am quite critical of all three offerings. All three provide computational models evaluated for their ability to acquire phonological patterns, with varying amounts of overheated rhetoric about what this means for generative grammar, and in each case there is an utter lack of rigor. None of the papers prove, or even conjecture, anything hopeful or promising about the computational complexity of the proposed models, how long they take to converge (or whether they do), or whether there is any bound on the kinds of mistakes the models might make once they converge. What they do instead is demonstrate that the models produce satisfactory results on toy problem sets. One might speculate that these three papers are the result of lockdown-era hyperfocus on thorny passion projects. But I think it’s unfortunate that the authors (and doubly so the reviewers and editors) considered these projects complete before providing a formal characterization of the proposed models’ substantive properties.² By stating this critique here, I hope to commit myself to aligning my actions with my values in future work, and I challenge the aforementioned authors to study these properties.

Endnotes

¹ To be fair, Yang and Piantadosi claims to be a theory of not just phonology…
² I am permitted to state that I reviewed one of these papers (my review was “signed” and made public, along with the paper) and my review was politely negative. However, it was clear to me that the editor and other reviewers had a very high opinion of this work and there was no reason for me to fight the inevitable.

References

Ellis, K., Albright, A., Solar-Lezama, A., Tenenbaum, J. B., and O’Donnell, T. J. 2022. Synthesizing theories of human language with Bayesian program induction. Nature Communications 2022:1–13.
Rasin, E., Berger, I., Lan, N., Shefi, I. and Katzir, R. 2021. Approaching explanatory adequacy in phonology using Minimum Description Length. Journal of Language Modelling 9:17–66.
Yang, Y. and Piantadosi, S. T. 2022. One model for the learning of language. Proceedings of the National Academy of Sciences 119:e2021865119.

Yet more on the Pirahã debate

I just read a draft of Geoff Pullum’s paper on the Pirahã controversy, presented at a workshop of the recent LSA meeting.

It’s not a particularly interesting paper to me, since it has nothing to say about the conflicting data claims at the center of the controversy. No one has ever given an explanation of how one might integrate the evidence for clausal embedding in Everett 1986 (etc.) with the writings of Everett from 2005 onward. These two Everetts are in mortal conflict. Everett (1986), for example, gives examples of embedded clauses; Everett (2005) denies that the language has clausal embedding; and Everett (2009), faced with the contradiction, glosses one such example (Nevins et al. 2009, ex. 13, reproduced from Everett 1986, ex. 232) as two sentences, with no argument provided for why the earlier Everett was wrong. While one ought not to reason from one’s own limited imagination, it’s hard for me to fathom anything other than incompetence in 1986 or dishonesty from 2005 to the present. Either way, it suggests that additional attention is probably needed on other specific claims about this language, such as the presence of rare phonetic elements (Everett 1988a) and the presence of ternary metrical feet (Everett 1988b); and on these topics there is far less room for creative hermeneutics.

If people have been nasty to Everett—and this seems to be the real complaint from Pullum—it’s because the whole thing stinks to high heaven; it’s a shame Pullum can’t smell the bullshit.

References

Everett, D. L. 1986. Pirahã. In Handbook of Amazonian Languages, vol. 1, D. C. Derbyshire and G. K. Pullum (eds.), pages 200-326. Mouton de Gruyter.
Everett, D. L. 1988a. Phonetic rarities in Pirahã. Journal of the International Phonetic Association 12: 94-96.
Everett, D. L. 1988b. On metrical constituent structure in Pirahã. Natural Language & Linguistic Theory 6: 207-246.
Everett, D. L. 2005. Cultural constraints on grammar and cognition in Pirahã: another look at the design features of human language. Current Anthropology 46: 621-646.
Everett, D. L. 2009. Pirahã culture and grammar: a response to some criticisms. Language 85: 405-442.
Nevins, A., Pesetsky, D., and Rodrigues, C. 2009. Pirahã exceptionality: a reassessment. Language 85: 355-404.

Streaming decompression for the Reddit dumps

I was recently working with the Reddit comments and submission dumps from PushShift (RIP).¹ These are compressed in Zstandard .zst format. Unfortunately, Python’s extensive standard library doesn’t have native support for this format, and some of the files are quite large,² so a streaming API is necessary.

After trying various third-party libraries, I finally found one that worked with a minimum of fuss: pyzstd, available from PyPI or Conda. This appears to be using ~~Facebook~~Meta’s reference C implementation as the backend, but more importantly, it provides a stream API like the familiar gzip.open, bz2.open, and lzma.open for .gz, .bz2 and .xz files, respectively. There’s one nit: PushShift’s Reddit dumps were compressed with an uncommonly large window size (1 << 31 bytes, i.e., a window log of 31), and one has to inform the decompression backend. Without this, I was getting the following error:

_zstd.ZstdError: Unable to decompress zstd data: Frame requires too much memory for decoding.

All I have to do to fix this is to pass the relevant parameter:

import pyzstd

# A window log of 31 permits windows of up to 1 << 31 bytes.
PARAMS = {pyzstd.DParameter.windowLogMax: 31}

with pyzstd.open(yourpath, "rt", level_or_option=PARAMS) as source:
    for line in source:
        ...

Then, each line is a JSON message with the post (either a comment or submission) and all the metadata.
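As a minimal sketch of what one might do with those lines, here is a generator that decodes each line and optionally keeps only the posts from one subreddit. The function name iter_posts is hypothetical, and the "subreddit" key is an assumption about the dump schema that you should check against your own files.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_posts(
    lines: Iterable[str], subreddit: Optional[str] = None
) -> Iterator[dict]:
    """Yields one decoded post per non-empty line.

    If `subreddit` is given, only posts whose "subreddit" field (an assumed
    key; verify it against your dump) matches it are yielded.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skips blank lines rather than failing on them.
        post = json.loads(line)
        if subreddit is None or post.get("subreddit") == subreddit:
            yield post
```

Because it accepts any iterable of strings, it can be handed the open file object directly, e.g. `for post in iter_posts(source, subreddit="linguistics"): ...`, and it never holds more than one line in memory.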

Endnotes

¹ Psst, don’t tell anybody, but… while these are no longer being updated they are available through December 2023 here. We have found them useful!
² Unfortunately, they’re grouped first by comments vs. submissions, and then by month. I would have preferred the files to be grouped by subreddit instead.

Alt-lingfluencers

It’s really none of my business whether or not a linguist decides to leave the field. Several people I consider friends have, and while I miss seeing them at conferences, none of them were close collaborators. Reasonable people can disagree about just how noble it is to be a professor (I think it is, or can be, but it’s not a major part of my self-worth), and I certainly understand why one might prefer a job in the private sector. At the same time, I think linguists wildly overestimate how easy it is to get rewarding, lucrative work in the private sector, and also underestimate how difficult that work can be on a day-to-day basis. (Private sector work, like virtually everything else in the West, has gotten substantially worse, becoming more socially alienating and more morally compromising, in the last ten years.)

In this context, I am particularly troubled by the rise of a small class of “alt-ac” ex-linguist influencers. I realize there is a market for advice on how to transition careers, and there are certainly honest people working in this space. (For instance, my department periodically invites graduates from our program to talk about their private sector jobs.) But what the worst of the alt-lingfluencers do in actuality is farm for engagement and prosecute grievances from their time in the field. If they were truly happy with their career transitions, they simply wouldn’t care enough (let alone have the time) to post about their obsessions for hours every day. These alt-lingfluencers were bathed in privilege when they were working linguists, so to see them rail against the field is a bit like listening to a lottery winner telling you not to play. These are deeply unhappy people, and unless you know them well enough to check in on their well-being from time to time, you should pay them no mind. You’d be doing them a favor, in the end. Narcissism is a disease: get well soon.

Chomsky & Katz (1974) on language diversity

Chomsky and others, Stich asserts, do not really study a broad range of languages in attempting to construct theories about universal grammatical structure and language acquisition, but merely speculate on the basis of “a single language, or at best a few closely related languages” (814). Stich’s assertion is both false and irrelevant. Transformational grammarians have investigated languages drawn from a wide range of unrelated language families. But this is beside the point, since even if Stich were right in saying that all but a few closely related languages have been neglected by transformational grammarians, this would imply only that they ought to get busy studying less closely related languages, not that there is some problem in relating grammar construction to the study of linguistic universals. (Chomsky & Katz 1974:361)

References

Chomsky, N. and Katz, J. J. 1974. What the linguist is talking about. Journal of Philosophy 71(2): 347-367.

Another quote from Ludlow

Indeed, when we look at other sciences, in nearly every case, the best theory is arguably not the one that reduces the number of components from four to three, but rather the theory that allows for the simplest calculations and greatest ease of use. This flies in the face of the standard stories we are told about the history of science. […] This way of viewing simplicity requires a shift in our thinking. It requires that we see simplicity criteria as having not so much to do with the natural properties of the world, as they have to do with the limits of us as investigators, and with the kinds of theories that simplify the arduous task of scientific theorizing for us. This is not to say that we cannot be scientific realists; we may very well suppose that our scientific theories approximate the actual structure of reality. It is to say, however, that barring some argument that “reality” is simple, or eschews machinery, etc., we cannot suppose that there is a genuine notion of simplicity apart from the notion of “simple for us to use.” […] Even if, for metaphysical reasons, we supposed that reality must be fundamentally simple, every science (with the possible exception of physics) is so far from closing the book on its domain it would be silly to think that simplicity (in the absolute sense) must govern our theories on the way to completion. Whitehead (1955, 163) underlined just such a point.

Nature appears as a complex system whose factors are dimly discerned by us. But, as I ask you, Is not this the very truth? Should we not distrust the jaunty assurance with which every age prides itself that it at last has hit upon the ultimate concepts in which all that happens can be formulated. The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, Seek simplicity and distrust it.

(Ludlow 2011:158-160)

References

Ludlow, P. 2011. The Philosophy of Generative Grammar. Oxford University Press.
Whitehead, A. N. 1955. The Concept of Nature. Cambridge University Press.

A page from Ludlow (2011)

Much writing in linguistic theory appears to be driven by a certain common wisdom, which is that the simplest theory either is the most aesthetically elegant or has the fewest components, or that it is the theory that eschews extra conceptual resources. This common wisdom is reflected in a 1972 paper by Paul Postal entitled “The Best Theory,” which appeals to simplicity criteria for support of a particular linguistic proposal. A lot of linguists would wholeheartedly endorse Postal’s remark (pp. 137–138) that, “[w]ith everything held constant, one must always pick as the preferable theory that proposal which is most restricted conceptually and most constrained in the theoretical machinery it offers.”

This claim may seem pretty intuitive, but it stands in need of clarification, and once clarified, the claim is much less intuitive, if not obviously false. As an alternative, I will propose that genuine simplicity criteria should not involve appeal to theoretical machinery, but rather a notion of simplicity in the sense of “simplicity of use”. That is, simplicity is not a genuine property of the object of investigation (whether construed as the human language faculty or something else), but is rather a property that is entirely relative to the investigator, and turns on the kinds of elements that the investigator finds perspicuous and “user friendly.”

Let’s begin by considering Postal’s thesis that the simplest (and other things being equal the best) theory is the one that utilizes less theoretical machinery. It may seem natural to talk about “theoretical machinery,” but what exactly is theoretical machinery? Consider the following questions that arise in cross-theoretical evaluation of linguistic theories of the sort discussed in Chapter 1. Is a level of linguistic representation part of the machinery? How about a transformation? A constraint on movement? A principle of binding theory? A feature? How about an algorithm that maps from level to level, or that allows us to dispense with levels of representation altogether? These questions are not trivial, nor are they easy to answer. Worse, there may be no theory neutral way of answering them.

The problem is that ‘machinery’ can be defined any way we choose. The machinery might include levels of representation, but then again it might not (one might hold that the machinery delivers the level of representation, but that the level of representation itself is not part of the machinery). Alternatively, one might argue that levels of representation are part of the machinery (as they are supported by data structures of some sort), but that the mapping algorithms which generate the levels of representation are not (as they never have concrete realization). Likewise one might argue that constraints on movement are part of the machinery (since they constrain other portions of the machinery), or one might argue that they are not (since they never have concrete realizations).

Even if we could agree on what counts as part of the machinery, we immediately encounter the question of how one measures whether one element or another represents more machinery. Within a particular well-defined theory it makes perfect sense to offer objective criteria for measuring the simplicity of the theoretical machinery, but measurement across theories is quite another matter. (Ludlow 2011: 153)

References

Ludlow, P. 2011. The Philosophy of Generative Grammar. Oxford University Press.
Postal, P. 1972. The best theory. In S. Peters (ed.), Goals of Linguistic Theory, pages 131-179. Prentice-Hall.

Lottery winners

It is commonplace to compare the act of securing a permanent faculty position in linguistics to winning the lottery. I think this is mostly unfair. There are fewer jobs than interested applicants, but the demand is higher, and the supply lower, than students these days suppose. And my junior faculty colleagues mostly got to where they are by years of dedicated, focused work. Because there are a lot of pitfalls on the path to the tenure track, their egos are often a lot smaller than one might suppose.

I wonder if the lottery ticket metaphor might be better applied to graduate trainees in linguistics finding work in the tech sector. I have held both types of positions, and I think I had to work harder to get into tech than to get back into the academy. Some of the “alt-ac influencers” in our field—the ones who ended up in tech, at least—had all the privileges in the world, including some reasonably prestigious teaching positions, before they made the jump. Being able to stay and work in the US—where the vast majority of this kind of work is—requires a sort of luck too, particularly when you reject the idea that “being American” is some kind of default. And finally demand for linguist labor in the tech sector varies enormously from quarter to quarter, meaning that some people are going to get lucky and others won’t.