My colleague Richard Sproat and I recently went through it with “general science” journal editors over an overhyped and undercooked paper about the origins of writing on Easter Island. Language Log has the scoop.
Author: Kyle Gorman
Growing consensus
Whenever I read a paper that begins, roughly, “there is a growing consensus that P”, I find that there is not, as far as I can tell, any such growing consensus in support of P.
Self-taught C++
I have recently fielded a few requests from students about self-directed learning of C++, so I thought I’d combine my notes here. First, compared to, say, Python, C++ is a very large language, both in its syntactic richness and in the size of its standard library. Second, it has been popular for at least two decades longer than Python, so there is a lot of really dated material out there that doesn’t incorporate the huge positive changes made to the language in C++11.
I recommend two books. First and most importantly, there is the 4th edition of The C++ Programming Language by Bjarne Stroustrup (the creator of C++). This is a gigantic hardback textbook that covers basically everything you need to know through C++11. It does not cover C++14, C++17, C++20, or C++23, but those are all pretty minor changes by comparison, and you’ll catch on. Stroustrup is actually a pretty good technical writer, too. (If a 5th edition ever comes out, get that one instead.) The other book I recommend is Scott Meyers’s Effective Modern C++, a smaller book that focuses on the newer C++11 and C++14 features. Meyers’s book is structured as a series of essays about when and how to incorporate these new features.
I also recommend two other resources to aspiring C++ programmers. The first is a good style guide: C++ itself just isn’t very opinionated, but good code is. I definitely recommend the widely used Google C++ Style Guide, but I’m sure there are other good ones out there. The second is Godbolt (Compiler Explorer), an incredible website that combines the functionality of a pastebin with an in-browser compiler.
Optionality as acquirendum
A lot of work deals with the question of acquiring “optional” or “variable” grammatical rules, and my impression is that different communities are mostly talking at cross-purposes. I discern at least three ways linguists conceive of optionality as something which the child must acquire.
1. Some linguists assume (I think without much evidence) that optionality is mere “free variation”, so that the learner simply needs to infer which rules bear a binary [optional] feature. This is an old idea, going back at least to Dell (1981); Rasin et al. (2021:35) explicitly state the problem in this form.
2. Variationist sociolinguists focus on the differential rates at which grammatical rules apply. They generally take the acquirenda to be conditional probability distributions, which give the probability of rule application in a given grammatical context. Bill Labov is a clear avatar of this strain of thinking (e.g., Labov 1989). David Adger and colleagues have attempted to situate this within modern syntactic frameworks (e.g., Adger 2006).
3. Some linguists believe that optionality is not statable within a single grammar and must instead reflect competing grammars. The major proponent of this approach is Anthony Kroch (e.g., Kroch 1989). While this conception might license some degree of “nihilism” about optionality, it has also led to interesting work hypothesizing substantive grammar-internal constraints on variation, as in the work of Laurel MacKenzie and colleagues (e.g., MacKenzie 2019). This work is also very good at ridding (2) of some of its unfortunate “externalist” thinking.
I have to reject (1) as overly simplistic. I find (2) and (3) both compelling in some ways, but a lot of work remains to synthesize or adjudicate between them.
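To make the variationist conception concrete, here is a minimal Python sketch under simplifying assumptions: the acquirendum for a single variable rule is a table of conditional probabilities P(apply | context), estimated here by plain relative frequency. The rule, the context labels `_C` and `_V` (following consonant vs. following vowel), and the token counts are all invented for illustration and are not real data; actual variationist work typically fits logistic regression models (as in the VARBRUL tradition) rather than raw proportions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical toy observations of a variable rule (e.g., coronal stop
# deletion), each recorded as (following context, whether the rule applied).
# These are invented for illustration, not real data.
TOY_TOKENS = [
    ("_C", True), ("_C", True), ("_C", False),
    ("_V", True), ("_V", False), ("_V", False), ("_V", False),
]


def application_rates(tokens: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimates P(rule applies | context) as a relative frequency."""
    applied: Counter = Counter()
    total: Counter = Counter()
    for context, did_apply in tokens:
        total[context] += 1
        if did_apply:
            applied[context] += 1
    # Every context in `total` has at least one token, so no division by zero.
    return {context: applied[context] / total[context] for context in total}


if __name__ == "__main__":
    for context, rate in sorted(application_rates(TOY_TOKENS).items()):
        print(f"P(apply | {context}) = {rate:.2f}")
```

A binary [optional] feature as in (1) would record only that the rule sometimes applies, discarding exactly the context-dependent rates this table preserves.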
References
Adger, D. 2006. Combinatorial variability. Journal of Linguistics 42(3): 503-530.
Dell, F. 1981. On the learnability of optional phonological rules. Linguistic Inquiry 12(1): 31-37.
Kroch, A. 1989. Reflexes of grammar in patterns of language change. Language Variation & Change 1(3): 199-244.
Labov, W. 1989. The child as linguistic historian. Language Variation & Change 1(1): 85-97.
MacKenzie, L. 2019. Perturbing the community grammar: Individual differences and community-level constraints on sociolinguistic variation. Glossa 4(1): 28.
Rasin, E., Berger, I., Lan, N., Shefi, I., and Katzir, R. 2021. Approaching explanatory adequacy in phonology using Minimum Description Length. Journal of Language Modelling 9(1): 17-66.
Kill yr darlings…
Endnotes
- To be fair, Yang and Piantadosi present their model as a theory of not just phonology…
- I am permitted to state that I reviewed one of these papers—my review was “signed” and made public, along with the paper—and my review was politely negative. However, it was clear to me that the editor and other reviewers had a very high opinion of this work and there was no reason for me to fight the inevitable.
References
Ellis, K., Albright, A., Solar-Lezama, A., Tenenbaum, J. B., and O’Donnell, T. J. 2022. Synthesizing theories of human language with Bayesian program induction. Nature Communications 2022: 1-13.
Rasin, E., Berger, I., Lan, N., Shefi, I., and Katzir, R. 2021. Approaching explanatory adequacy in phonology using Minimum Description Length. Journal of Language Modelling 9(1): 17-66.
Yang, Y. and Piantadosi, S. T. 2022. One model for the learning of language. Proceedings of the National Academy of Sciences 119: e2021865119.
Yet more on the Pirahã debate
I just read a draft of Geoff Pullum’s paper on the Pirahã controversy, presented at a workshop of the recent LSA meeting.
It’s not a particularly interesting paper to me, since it has nothing to say about the conflicting data claims at the center of the controversy. No one has ever explained how one might reconcile the evidence for clausal embedding in Everett 1986 (etc.) with Everett’s writings from 2005 onward. These two Everetts are in mortal conflict. Everett (1986), for example, gives examples of embedded clauses; Everett (2005) denies that the language has clausal embedding; and Everett (2009), faced with the contradiction, has decided to gloss this same example (Nevins et al. 2009, ex. 13, reproduced from Everett 1986, ex. 232) as two sentences, without providing any argument for why the earlier Everett was wrong. While one ought not to reason from the limits of one’s own imagination, it’s hard for me to fathom anything other than incompetence in 1986 or dishonesty from 2005 to the present. Either way, it suggests that other specific claims about this language probably deserve additional scrutiny, such as the presence of rare phonetic elements (Everett 1988a) and of ternary metrical feet (Everett 1988b); on these topics there is far less room for creative hermeneutics.
If people have been nasty to Everett—and this seems to be the real complaint from Pullum—it’s because the whole thing stinks to high heaven; it’s a shame Pullum can’t smell the bullshit.
References
Everett, D. L. 1986. Pirahã. In Handbook of Amazonian Languages, vol. 1, D. C. Derbyshire and G. K. Pullum (eds.), pages 200-326. Mouton de Gruyter.
Everett, D. L. 1988a. Phonetic rarities in Pirahã. Journal of the International Phonetic Association 12: 94-96.
Everett, D. L. 1988b. On metrical constituent structure in Pirahã. Natural Language & Linguistic Theory 6: 207-246.
Everett, D. L. 2005. Cultural constraints on grammar and cognition in Pirahã: another look at the design features of human language. Current Anthropology 46: 621-646.
Everett, D. L. 2009. Pirahã culture and grammar: a response to some criticisms. Language 85: 405-442.
Nevins, A., Pesetsky, D., and Rodrigues, C. 2009. Pirahã exceptionality: a reassessment. Language 85: 355-404.
Streaming decompression for the Reddit dumps
I was recently working with the Reddit comment and submission dumps from PushShift (RIP). These are compressed in the Zstandard (.zst) format. Unfortunately, Python’s extensive standard library doesn’t have native support for this format, and some of the files are quite large, so a streaming API is necessary.
After trying various third-party libraries, I finally found one that worked with a minimum of fuss: pyzstd, available from PyPI or Conda. It appears to use Facebook’s (now Meta’s) reference C implementation as its backend, but more importantly, it provides a streaming API like the familiar gzip.open, bz2.open, and lzma.open for .gz, .bz2, and .xz files, respectively. There’s one nit: PushShift’s Reddit dumps were compressed with an unusually large window size (2^31 bytes, i.e., a window log of 31), and one has to inform the decompression backend of this. Without this, I was getting the following error:
_zstd.ZstdError: Unable to decompress zstd data: Frame requires too much memory for decoding.
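The error makes more sense with a little arithmetic. As I understand it, windowLogMax is a base-2 exponent, so a value of 31 permits a window of 2^31 bytes, whereas libzstd’s documented default limit for decoding is an exponent of 27. The short sketch below just converts those exponents to mebibytes; the constant names are my own, and the default of 27 comes from the zstd documentation rather than from pyzstd itself.

```python
def window_mib(window_log: int) -> int:
    """Converts a zstd window log (a base-2 exponent, in bytes) to MiB."""
    return (1 << window_log) >> 20


# libzstd's default decoding limit (assumed from the zstd docs), versus the
# window log the PushShift dumps require.
DEFAULT_WINDOW_LOG_MAX = 27
PUSHSHIFT_WINDOW_LOG = 31

if __name__ == "__main__":
    print(window_mib(DEFAULT_WINDOW_LOG_MAX))  # 128
    print(window_mib(PUSHSHIFT_WINDOW_LOG))  # 2048
```

So the dumps need a window 16 times larger than the decoder accepts by default, which is why it refuses the frame until told otherwise.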
All I had to do to fix this was pass the relevant parameter:
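```python
import pyzstd

# Allows the 2^31-byte decompression window the PushShift dumps require.
PARAMS = {pyzstd.DParameter.windowLogMax: 31}
with pyzstd.open(yourpath, "rt", level_or_option=PARAMS) as source:
    for line in source:
        ...
```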
Then each line is a JSON message containing the post (either a comment or a submission) and all of its metadata.
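Putting the pieces together, here is a minimal sketch of a streaming reader, assuming pyzstd’s `open` accepts decompression parameters via its `level_or_option` keyword as above. The helper names `open_dump`, `iter_records`, and `count_by_field` are hypothetical, and the `subreddit` key used in the demo is an assumption about the dump schema. pyzstd is imported inside `open_dump` so the JSON-handling helpers can be tried on an in-memory stream without it.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator, TextIO


def open_dump(path: str) -> TextIO:
    """Opens a PushShift .zst dump as a UTF-8 text stream (needs pyzstd)."""
    import pyzstd  # Imported lazily so the helpers below work without it.

    # Allows the 2^31-byte window these dumps were compressed with.
    params = {pyzstd.DParameter.windowLogMax: 31}
    return pyzstd.open(path, "rt", level_or_option=params, encoding="utf-8")


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yields one parsed JSON object per non-blank line, lazily."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_field(records: Iterable[dict], field: str) -> Counter:
    """Counts records by the value of `field`; missing values count as None."""
    return Counter(record.get(field) for record in records)


if __name__ == "__main__":
    # An in-memory stand-in for a real dump; with a real file one would
    # instead write `with open_dump(path) as source:` and pass `source`.
    sample = io.StringIO(
        '{"subreddit": "linguistics", "body": "first"}\n'
        "\n"
        '{"subreddit": "cpp", "body": "second"}\n'
        '{"subreddit": "linguistics", "body": "third"}\n'
    )
    print(count_by_field(iter_records(sample), "subreddit"))
```

Because `iter_records` is a generator, only one line is held in memory at a time, which is the point of streaming over files this large.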
Alt-lingfluencers
It’s really none of my business whether or not a linguist decides to leave the field. Several people I consider friends have, and while I miss seeing them at conferences, none of them were close collaborators. Reasonable people can disagree about just how noble it is to be a professor (I think it is, or can be, but it’s not a major part of my self-worth), and I certainly understand why one might prefer a job in the private sector. At the same time, I think linguists wildly overestimate how easy it is to get rewarding, lucrative work in the private sector, and also underestimate how difficult that work can be on a day-to-day basis. (Private sector work, like virtually everything else in the West, has gotten substantially worse in the last ten years: more socially alienating and more morally compromising.)
In this context, I am particularly troubled by the rise of a small class of “alt-ac” ex-linguist influencers. I realize there is a market for advice on how to transition careers, and there are certainly honest people working in this space. (For instance, my department periodically invites graduates from our program to talk about their private sector jobs.) But what the worst of the alt-lingfluencers do in actuality is farm for engagement and prosecute grievances from their time in the field. If they were truly happy with their career transitions, they simply wouldn’t care enough—let alone have the time—to post about their obsessions for hours every day. These alt-lingfluencers were bathed in privilege when they were working linguists, so to see them harangue against the field is a bit like listening to a lottery winner telling you not to play. These are deeply unhappy people, and unless you know them well enough to check in on their well-being from time to time, you should pay them no mind. You’d be doing them a favor, in the end. Narcissism is a disease: get well soon.
Chomsky & Katz (1974) on language diversity
Chomsky and others, Stich asserts, do not really study a broad range of languages in attempting to construct theories about universal grammatical structure and language acquisition, but merely speculate on the basis of “a single language, or at best a few closely related languages” (814). Stich’s assertion is both false and irrelevant. Transformational grammarians have investigated languages drawn from a wide range of unrelated language families. But this is beside the point, since even if Stich were right in saying that all but a few closely related languages have been neglected by transformational grammarians, this would imply only that they ought to get busy studying less closely related languages, not that there is some problem in relating grammar construction to the study of linguistic universals. (Chomsky & Katz 1974:361)
References
Chomsky, N. and Katz, J. J. 1974. What the linguist is talking about. Journal of Philosophy 71(2): 347-367.
It’s “Penn”
This is probably a losing battle at this point, but the University of Pennsylvania’s short name is, and has always been, “Penn”; “UPenn” is something of a shibboleth (probably derived from the URL upenn.edu).