About two years ago, I got an unexpected medical bill in the mail. The amount wasn’t very high, but I was quite frustrated and annoyed. First, this was from a local College of Dentistry, where most procedures are free for the insured (and probably for the uninsured too); there was no “explanation of benefits” stating that this was a co-pay, or that my insurance only covered some portion. Second, I hadn’t been to the College of Dentistry in quite a while, so I had no idea which of the various procedures this was or even what day I received the billed service. Third, there was no way to get more information: the absolute worst thing about this provider is that the administrative staff are some of the most overloaded and overworked people I have ever seen, and I have witnessed them just let the phone ring because they’re dealing with a huge line of in-person patients (some of whom are bleeding from the mouth). So I didn’t pay it. After a while, though, the bills kept coming and I started to worry. Was I wasting paper for no reason? Would this harm my credit score? So I put about an hour into finding a way to actually get in touch with the billing office: it turns out there is a Google Form buried somewhere on a website, and if you fill it out, someone calls you (in my case, within the hour!), looks up your chart, and can tell you the date of service and why you were billed. Why didn’t they just include this in the bill in the first place? I have to imagine this makes it even harder for the College to actually collect on these debts.
Category: Presentation of self in everyday life
“Indic” considered harmful
Indic is an adjective referring to the Indo-Aryan languages such as Hindi-Urdu or Bengali. These languages are spoken mostly in the northern parts of India, as well as in Bangladesh, Pakistan, Sri Lanka, Nepal, and the Maldives. This term can be confusing, because hundreds of millions of people in the Indian subcontinent (and nearby island nations) speak non-Indic first languages: over 250 million people, particularly in the south of India and the north of Sri Lanka, speak Dravidian languages, which include Malayalam, Tamil, and Telugu. Austronesian, Tibeto-Burman, and Tai-Kadai languages, and many language isolates, are also spoken in India and the other nations of the subcontinent, as is English (and French, and Portuguese). Unfortunately, there is now a trend to use Indic to mean ‘languages of the subcontinent’. See here for a prominent example. This is a new sense for Indic, and while there is probably a need for a lexeme to express this notion (language of India or subcontinental language would work), reusing Indic, which already has a distinct and well-established sense, just adds unnecessary confusion.
A minor syntactic innovation in English: “BE crazy”
I recently became aware of an English syntactic construction I hadn’t noticed before. It involves the predicate BE crazy, which itself is nothing new, but here the subject of that predicate is, essentially, quoted speech from a second party. I myself am apparently a user of this variant. For example, a friend told me of someone who describes themselves (on an online dating platform) as someone who “…likes travel and darts”, and I responded, simply, “Likes darts is crazy.” That is to say, I am making some kind of assertion that the description “likes darts”, or perhaps the speech act of describing oneself as such, is itself a bit odd. Now in this case, the subject is simply the quotation (with the “travel and” part elided), and while this forms a constituent, a tensed VP, we don’t normally accept tensed VPs as the subjects of predicates. And I suspect constituenthood is not even required. So this is distinct from the ordinary use of BE crazy with a nominal subject.
I suspect, though I do not have the means to prove it, that this is a relatively recent innovation; I hear it from my peers (i.e., those of similar age, not my colleagues at work, who may be older) and students, but not often elsewhere. I also initially thought it might be associated with the Mid-Atlantic, but I am no longer so sure.
Your thoughts are welcome.
“Segmented languages”
In a recent paper (Gorman & Sproat 2023), we complain about the conflation of writing systems with the languages they are used to write, highlighting the nonsense underlying common expressions like “right-to-left language”, “syllabic language”, or “ideographic language” found in the literature. Thus we were surprised to find the following:
Four segmented languages (Mandarin, Japanese, Korean and Thai) report character error rate (CER), instead of WER… (Gemini Team 2023:18)
Since the most salient feature of the writing systems used to write Mandarin, Japanese, Korean, and Thai is the absence of segmentation information (e.g., whitespace used to indicate word boundaries), presumably the authors mean to say that the data they are using has already been pre-segmented (by some unspecified means). This, however, is a property not of these languages but of the available data.
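To make the WER/CER distinction concrete, here is a minimal sketch in Python. It is not taken from the Gemini report: the function names, the example sentence, and the decision to strip whitespace before computing CER are all assumptions made for illustration. It shows that two different segmentations of the same character string can disagree at the word level while being identical at the character level.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate; assumes whitespace marks word boundaries."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate; whitespace is dropped first (an assumption)."""
    ref_chars = ref.replace(" ", "")
    return edit_distance(ref_chars, hyp.replace(" ", "")) / len(ref_chars)


# Hypothetical example: the same characters, segmented two different ways.
reference = "我 喜欢 语言学"
hypothesis = "我 喜欢 语言 学"

print(wer(reference, hypothesis))  # nonzero: the segmentations differ
print(cer(reference, hypothesis))  # zero: the characters are identical
```

Under these assumptions, WER depends on whatever segmentation was imposed on the data, whereas CER does not.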
[h/t: Richard Sproat]
References
Gemini Team. 2023. Gemini: A family of highly capable multimodal models. arXiv preprint 2312.11805. URL: https://arxiv.org/abs/2312.11805.
Gorman, K. and Sproat, R. 2023. Myths about writing systems in speech & language technology. In Proceedings of the Workshop on Computation and Written Language, pages 1-5.
Growing consensus
Any time I read a paper that begins, roughly, “there is a growing consensus that P”, there is not in fact, as far as I can tell, a growing consensus in support of P.
Yet more on the Pirahã debate
I just read a draft of Geoff Pullum’s paper on the Pirahã controversy, presented at a workshop of the recent LSA meeting.
It’s not a particularly interesting paper to me, since it has nothing to say about the conflicting data claims at the center of the controversy. No one has ever given an explanation of how one might integrate the evidence for clausal embedding in Everett 1986 (etc.) with the writings of Everett from 2005 onward. These two Everetts are in mortal conflict. Everett (1986), for example, gives examples of embedded clauses; Everett (2005) denies that the language has clausal embedding; and Everett (2009), faced with the contradiction, has decided to gloss one such example (Nevins et al. 2009, ex. 13, reproduced from Everett 1986, ex. 232) as two sentences, with no argument provided for why earlier Everett was wrong. While one ought not to reason from one’s own limited imagination, it’s hard for me to fathom anything other than incompetence in 1986 or dishonesty from 2005 to the present. Either way, it suggests that additional attention is probably needed on other specific claims about this language, such as the presence of rare phonetic elements (Everett 1988a) and the presence of ternary metrical feet (Everett 1988b); and on these topics there is far less room for creative hermeneutics.
If people have been nasty to Everett—and this seems to be the real complaint from Pullum—it’s because the whole thing stinks to high heaven; it’s a shame Pullum can’t smell the bullshit.
References
Everett, D. L. 1986. Pirahã. In Handbook of Amazonian Languages, vol. 1, D. C. Derbyshire and G. K. Pullum (eds.), pages 200-326. Mouton de Gruyter.
Everett, D. L. 1988a. Phonetic rarities in Pirahã. Journal of the International Phonetic Association 12: 94-96.
Everett, D. L. 1988b. On metrical constituent structure in Pirahã. Natural Language & Linguistic Theory 6: 207-246.
Everett, D. L. 2005. Cultural constraints on grammar and cognition in Pirahã: another look at the design features of human language. Current Anthropology 46: 621-646.
Everett, D. L. 2009. Pirahã culture and grammar: a response to some criticisms. Language 85: 405-442.
Nevins, A., Pesetsky, D., and Rodrigues, C. 2009. Pirahã exceptionality: a reassessment. Language 85: 355-404.
Alt-lingfluencers
It’s really none of my business whether or not a linguist decides to leave the field. Several people I consider friends have, and while I miss seeing them at conferences, none of them were close collaborators. Reasonable people can disagree about just how noble it is to be a professor (I think it is, or can be, but it’s not a major part of my self-worth), and I certainly understand why one might prefer a job in the private sector. At the same time, I think linguists wildly overestimate how easy it is to get rewarding, lucrative work in the private sector, and also underestimate how difficult that work can be on a day-to-day basis. (Private sector work, like virtually everything else in the West, has gotten substantially worse in the last ten years: more socially alienating, more morally compromising.)
In this context, I am particularly troubled by the rise of a small class of “alt-ac” ex-linguist influencers. I realize there is a market for advice on how to transition careers, and there are certainly honest people working in this space. (For instance, my department periodically invites graduates from our program to talk about their private sector jobs.) But what the worst of the alt-lingfluencers do in actuality is farm for engagement and prosecute grievances from their time in the field. If they were truly happy with their career transitions, they simply wouldn’t care enough—let alone have the time—to post about their obsessions for hours every day. These alt-lingfluencers were bathed in privilege when they were working linguists, so to see them harangue against the field is a bit like listening to a lottery winner telling you not to play. These are deeply unhappy people, and unless you know them well enough to check in on their well-being from time to time, you should pay them no mind. You’d be doing them a favor, in the end. Narcissism is a disease: get well soon.
It’s “Penn”
This is probably a losing battle at this point, but the University of Pennsylvania’s short name is and has always been Penn, and UPenn is something of a shibboleth (probably derived from the URL upenn.edu).
Lottery winners
It is commonplace to compare the act of securing a permanent faculty position in linguistics to winning the lottery. I think this is mostly unfair. There are fewer jobs than interested applicants, but the demand is higher, and the supply lower, than students these days suppose. And my junior faculty colleagues mostly got to where they are by years of dedicated, focused work. Because there are a lot of pitfalls on the path to the tenure track, their egos are often a lot smaller than one might suppose.
I wonder if the lottery ticket metaphor might be better applied to graduate trainees in linguistics finding work in the tech sector. I have held both types of positions, and I think I had to work harder to get into tech than to get back into the academy. Some of the “alt-ac influencers” in our field—the ones who ended up in tech, at least—had all the privileges in the world, including some reasonably prestigious teaching positions, before they made the jump. Being able to stay and work in the US—where the vast majority of this kind of work is—requires a sort of luck too, particularly when you reject the idea that “being American” is some kind of default. And finally demand for linguist labor in the tech sector varies enormously from quarter to quarter, meaning that some people are going to get lucky and others won’t.
Citation practices
In a previous post I talked about an exception to the general rule that you should expand acronyms: sometimes what the acronym expands to is a clear joke made up after the fact. This is an instance of a more general principle: you should provide, via citations, information the reader needs to know or stands to benefit from. To that point, nobody has ever really cared about the mere fact that you “used R (R Core Team 2021)”. It’s usually not relevant. R is one of hundreds of Turing-complete programming environments, and most of the things it can do can be done in any other language. Your work almost surely can be replicated in other environments. It might be worth mentioning if a major point of your paper is that you wrote, say, a new open-source software package for R: there the reader needs to know what platform this library targets. But otherwise it’s just cruft.