Feature maximization and phonotactics

[This is a quick writing exercise for in-progress work with Charles Reiss. Sorry if it doesn’t make sense out of context.]

An anonymous reviewer asks:

I wonder how the author(s) would reconcile this learning model with the evidence that both children and adults seem to aggressively generalize phonotactic restrictions from limited data (e.g. just [p]) to larger, unobserved natural classes (e.g. [p f b v]). See e.g. the discussion in Linzen & Gallagher (2017). If those results are credible, they seem much more consistent with learning minimal feature specifications for natural classes than learning maximal ones.

First, note that Linzen & Gallagher’s study is a study of phonotactic learning, whereas our proposal concerns induction of phonological rules. We have been, independently but complementarily, quite critical of the naïve assumptions inherent in prior work on this topic (e.g., Gorman 2013, ch. 2; Reiss 2017, §6); we have both argued that knowledge of phonotactic generalizations may require much less grammatical knowledge than is generally believed.

Secondly, we note that Linzen & Gallagher’s subjects are (presumably; they were recruited on Mechanical Turk and were paid $0.65 USD for their efforts) adults briefly exposed to an artificial language. While we recognize that adult “artificial language learning” studies are common practice in psycholinguistics, it is not clear what such studies contribute to our understanding of phonotactic acquisition (whatever the phonotactic acquirenda turn out to be) by children robustly exposed to realistic languages in situ.

Third, the reviewer is incorrect; the result reported by Linzen & Gallagher (henceforth L&G) is not consistent with minimal generalization. Let us grant—for sake of argument—that our proposal about rule induction in children is relevant to their work on rapid phonotactic learning in adults. One hypothesis they entertain is that their participants will construct “minimal classes”:

For example, when acquiring the phonotactics of English, learners may first learn that both [b] and [g] are valid onsets for English syllables before they can generalize to other voiced stops (e.g., [d]). This generalization will be restricted to the minimal class that contained the attested onsets (i.e., voiced stops), at least until a voiceless stop onset is encountered.

If by a “minimal class” L&G are referring to a natural class which is consistent with the data and has an extension with the fewest members, then presumably they would endorse our proposal of feature maximization, since the class that satisfies this definition is the most fully specified empirically adequate class. However, it is an open question whether or not such a class would actually contain [d]. For instance, if one assumes that major place features are bivalent, then the intersection of the features associated with [b, g] will contain the specification [−coronal], which rules out [d].
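The dependence on featural assumptions can be made concrete with a small sketch. The feature table below is a toy, and the feature values, the helper names, and the `privative_place` switch are all illustrative assumptions rather than part of our proposal or of L&G's. The sketch intersects the specifications of [b] and [g] and then computes the extension of the result, once with bivalent place features and once with privative ones.

```python
# Toy feature table for six stops; the values are illustrative assumptions.
FEATURES = {
    "p": {"voice": "-", "continuant": "-", "labial": "+", "coronal": "-", "dorsal": "-"},
    "b": {"voice": "+", "continuant": "-", "labial": "+", "coronal": "-", "dorsal": "-"},
    "t": {"voice": "-", "continuant": "-", "labial": "-", "coronal": "+", "dorsal": "-"},
    "d": {"voice": "+", "continuant": "-", "labial": "-", "coronal": "+", "dorsal": "-"},
    "k": {"voice": "-", "continuant": "-", "labial": "-", "coronal": "-", "dorsal": "+"},
    "g": {"voice": "+", "continuant": "-", "labial": "-", "coronal": "-", "dorsal": "+"},
}
PLACE = {"labial", "coronal", "dorsal"}


def feature_set(segment, privative_place=False):
    """Returns the (feature, value) pairs for a segment.

    If privative_place is true, negative place values are simply absent.
    """
    pairs = set(FEATURES[segment].items())
    if privative_place:
        pairs = {(f, v) for f, v in pairs if f not in PLACE or v == "+"}
    return pairs


def maximal_specification(segments, privative_place=False):
    """Intersects the specifications of the observed segments."""
    return set.intersection(
        *(feature_set(s, privative_place) for s in segments)
    )


def extension(spec, privative_place=False):
    """Returns every segment compatible with all pairs in the specification."""
    return {s for s in FEATURES if spec <= feature_set(s, privative_place)}


bivalent = maximal_specification(["b", "g"])
privative = maximal_specification(["b", "g"], privative_place=True)
print(sorted(bivalent), sorted(extension(bivalent)))
print(sorted(privative), sorted(extension(privative, privative_place=True)))
```

Under these toy assumptions, the bivalent intersection retains [−coronal] and so excludes [d], whereas the privative intersection has nothing to say about place and so admits it.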

Interestingly, the matter is similarly unclear if we interpret “minimal class” intensionally, in terms of the number of features, rather than in terms of the number of phonemes the class picks out. The (featurewise-)minimal specification for a single phone (as in the reviewer’s example) is the empty set, which would (it is generally assumed) pick out any segment. Then, we would expect any generalization which held of [p], as in the reviewer’s example, to generalize not just to other labial obstruents (as the reviewer suggests), but to any segment at all. Minimal feature specification cannot yield a generalization from [p] to any proper subset of segments, contra the anonymous reviewer and L&G. An adequate minimal specification which picks out [p] will pick out just [p]. L&G suggest that maximum entropy models of phonotactic knowledge may have this property, but do not provide a demonstration of this for any particular implementation of these models.
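A small sketch may help here too. It enumerates, smallest first, every specification compatible with [p] over a toy inventory; the three features, their values, and the function names are assumptions made purely for illustration. The featurewise-smallest specification is empty and picks out the whole inventory, while the smallest specification that picks out [p] alone uses every feature in the table. The reviewer's class [p f b v] corresponds to neither.

```python
from itertools import combinations

# Toy inventory with three features; the values are illustrative assumptions.
FEATURES = {
    "p": {"voice": "-", "continuant": "-", "labial": "+"},
    "f": {"voice": "-", "continuant": "+", "labial": "+"},
    "b": {"voice": "+", "continuant": "-", "labial": "+"},
    "v": {"voice": "+", "continuant": "+", "labial": "+"},
    "t": {"voice": "-", "continuant": "-", "labial": "-"},
    "d": {"voice": "+", "continuant": "-", "labial": "-"},
}


def extension(spec):
    """Returns every segment compatible with all pairs in the specification."""
    return {s for s, feats in FEATURES.items() if spec <= set(feats.items())}


def specifications_by_size(segment):
    """Yields every specification compatible with the segment, smallest first."""
    pairs = sorted(FEATURES[segment].items())
    for size in range(len(pairs) + 1):
        for combo in combinations(pairs, size):
            yield frozenset(combo)


# The featurewise-minimal specification compatible with [p] is empty.
smallest = next(specifications_by_size("p"))
print(len(smallest), sorted(extension(smallest)))

# The smallest specification whose extension is exactly {p}.
exact = next(s for s in specifications_by_size("p") if extension(s) == {"p"})
print(len(exact), sorted(extension(exact)))

# The reviewer's class requires the one-feature specification [+labial].
labial_obstruents = extension({("labial", "+")})
print(sorted(labial_obstruents))
```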

We thank the anonymous reviewer for drawing our attention to this study and the opportunity their comment has given us to clarify the scope of our proposal and to draw attention to a defect in L&G’s argumentation.

References

Gorman, K. 2013. Generative phonotactics. Doctoral dissertation, University of Pennsylvania.
Linzen, T., and Gallagher, G. 2017. Rapid generalization in phonotactic learning. Laboratory Phonology: Journal of the Association for Laboratory Phonology 8(1): 1-32.
Reiss, C. 2017. Substance free phonology. In S.J. Hannahs and A. Bosch (eds.), The Routledge Handbook of Phonological Theory, pages 425-452. Routledge.

Journal websites

It is now 2023, and virtually every journal I review for has a broken website, which further penalizes me for volunteer work I ought to be paid for. This is really unacceptable. Maybe some of the big publishers can take a tiny bite out of their massive revenues (Springer Nature apparently pulled down 1.72b USD in revenue in 2021) and invest it into actually testing their CRUD apps.

Large LMs and disinformation

I have never understood the idea that large LMs are uniquely positioned to enable the propagation of disinformation. Let us stipulate, for sake of argument, that large LMs can generate high-quality disinformation and that its artificial quality (i.e., not generated by human writers) cannot be reliably detected either by human readers or by computational means. At the same time, I know of no reason to suppose that large LMs can generate better (less detectable, more plausible) disinformation than can human writers. Then, it is hard to see what advantage there is to using large LMs for disinformation generation beyond a possible economic benefit realized by firing PR writers and replacing them with “prompt engineers”. Ignoring the dubious economics (copywriters are cheap, engineers are expensive), there is a presupposition that disinformation needs to scale, i.e., be generated in bulk, but I see no reason to suppose this either. Disinformation, it seems to me, comes to us either in the form of “big lies” from sources deemed reputable by journalists and lay audiences (think WMDs), or increasingly, from the crowds (think QAnon).

e- and i-France

It will probably not surprise the reader to see me claim that France and French are both sociopolitical abstractions. France is, like all states, an abstraction, and it is hard to point to physical manifestations of France the state. But we understand that states are a bundle of related institutions with (mostly) shared goals. These institutions give rise to our impression of the Fifth Republic, though at other times in history conflict between these institutions gave rise to revolution. But currently the defining institutions share a sufficient alignment that we can usefully talk as if they are one. This is not so different from the i-language perspective on languages. Each individual “French” speaker has a grammar projected by their brain, and these are (generally speaking) sufficiently similar that we can maintain the fiction that they are the same. The only difference I see is that linguists can give a rather explicit account of any given instance of i-French whereas it’s difficult to describe political institutions in similarly detailed terms (though this may just reflect my own ignorance about modern political science). In some sense, this explicitness at the i-language level makes e-French seem even more artificial than e-France.

Caffeine

I recently stopped consuming caffeine on a daily basis. For at least a dozen years, I’d had a cup of fully caffeinated coffee first thing pretty much every morning. And over the last few years, I also found myself getting a lot of pleasure out of a 3pm espresso shot. I quit because I hoped to improve my sleep. I understand from browsing the literature that caffeine actually has a reasonably long half- and quarter-life, and a morning cup really does negatively impact your sleep 14 hours later. I also understand that caffeine does not “give” you energy; it just temporarily causes your body to consume energy stores at a higher rate. This seems to have worked; I am certainly more refreshed in the morning than I used to be, and I am as active as ever. Only negative thinking and parties keep me up late now.

Having tried to quit caffeine before, I knew that I would have to titrate down gradually to avoid painful headaches. I therefore reduced my consumption gradually, over the course of two weeks, and didn’t experience much pain. I understood, of course, that there is a low-level addictive component to caffeine, the sort of thing that gives you transitory headaches if you don’t get your fix. What I didn’t understand, however, is the degree to which my addiction to caffeine (and that’s the right word here) had seeped into my higher-level consciousness. I found my mind coming up with elaborate justifications for why I needed caffeine. During the first few weeks, my mind was telling me that perhaps I’m just not as smart, handsome, clever, or strong without it. I recognize this as classic addict talk.

I have kept up my coffee ritual. As I have for many years, I start every morning by grinding 10g of fresh roasted beans, heating water to 205°, and using these to prepare about 12 oz of hot coffee. However, this coffee has no more than a tiny trace of caffeine thanks to the solvent-free “Swiss Water” diffusion process. My roaster provides a decent sample of different coffees prepared with this process (with no real markup over the caffeinated variety), including a nice fair trade Sumatran. I am also allowing myself to have one caffeinated cup a week (at least until I run out of caffeinated beans), on Friday morning just before I go to the gym to lift weights.

I think I have to recommend going through this detox, if you’re in a state of mind where you can exert a bit of will power.

Character-based speech technology

Right now everyone seems to be moving to character-based speech recognizers and synthesizers. A character-based speech recognizer is an ASR system in which there is no explicit representation of phones, just Unicode codepoints on the output side. Similarly, a character-based synthesizer is a TTS engine without an explicit mapping onto pronunciations, just orthographic inputs. It is generally assumed that the model ought to learn this sort of thing implicitly (and only as needed).

I genuinely don’t understand why this is supposed to be better. Phonemic transcription really does carry more information than orthography, in the vast majority of languages, and making it an explicit target is going to do a better job of guiding the model than hoping the system automatically self-organizes. Neural nets trained for language tasks often have an implicit representation of some linguistically well-defined feature, but they often do better when that feature is made explicit.

My understanding is that end-to-end systems have potential advantages over feed-forward systems when information and uncertainty from previous steps can be carried through to help later steps in the pipeline. But that doesn’t seem applicable here. Building these explicit mappings from words to pronunciations and vice versa is not all that hard, and the information used to resolve ambiguity is not particularly local. Cherry-picked examples aside, it is not at all clear that these models can handle locally conditioned pronunciation variants (the article a pronounced uh or aye), homographs (the two pronunciations of bass in English), or highly deficient writing systems (think Perso-Arabic) better than the ordinary pipeline approach. One has to suspect the long tail of these character-based systems is littered with nonsense.

RoboCop

I like a lot of different types of films, but my favorites are the subtextually rich, nuance-light action/science fiction films of the late 1970s, 1980s, and early 1990s, made by directors like Cameron, Carpenter, Cronenberg, McTiernan, Scott, and Verhoeven. Perhaps the most prescient of all of these is RoboCop (1987). The film’s feel is set by over-the-top comic sex and violence and silly diegetic TV clips. In less deft hands, it could easily have become the sort of campy farce best described (or perhaps, denigrated) as a “cult classic”. (This usually means a film is just bad.) But Verhoeven wields sex and violence like a master wields a paintbrush. (I take this to be a sort of self-critique of his childhood aesthetic appreciation of the violence he saw as a boy growing up in Nazi-occupied Holland, not far from the V-2 launch sites.) The film is thematically rich, so much so that one can easily forgive Verhoeven’s apparent decision to leave out (in what is probably the most “dated” element of the film) any overt criticism of policing as an institution. It is ruthlessly critical of what we’d now call neoliberalism, of corporatism, and has much to say about the nature of the self. The theme that strikes me as most prescient is how the film hinges on the very modern realization that, to a striking degree, what we call “AI” is fundamentally just “other people”, alienated and dehumanized by contractual labor relations. Verhoeven could somehow see this coming decades before anything that could reasonably be called AI.

1-on-1 Zoom

If you’re just doing a “meeting” with one other person located in the same country, I don’t see the point of using Zoom. Ordinary phone lines are more reliable and have more familiar acoustic qualities (this is why VoIP sounds worse: unless you’re quite young, you’re probably far more familiar with the 8kHz sampling rate and whatever compression curve the phone system uses). Just call people on the phone!

ACL Workshop on Computation and Written Language

The first ACL Workshop on Computation and Written Language (CAWL) will be held in conjunction with ACL 2023 in Toronto, Canada, on July 13th or 14th 2023 (TBD). It will feature invited talks by Mark Aronoff (Stony Brook University) and Amalia Gnanadesikan (University of Maryland, College Park). We welcome submissions of scientific papers to be presented at the conference and archived in the ACL Anthology. Information on submission and format will be posted at https://cawl.wellformedness.com shortly.

Generalized capitalist realism

One of the most memorable books I’ve read over the last decade or so is Mark Fisher’s Capitalist Realism: Is There No Alternative? (2009). The book is a slim, 81-page pamphlet describing the feeling that “not only is capitalism the only viable political and economic system, but also that it is now impossible even to imagine a coherent alternative to it.” As Fisher explains, a lot of ideological work is done to prevent us from imagining alternatives, including the increasingly capitalist sheen of anti-capitalism, and there are a few areas—the overall non-response to climate change and biosphere-scale threats, for example—where capitalist realism ideology has failed to co-opt dissent, suggesting at least the possibility of an alternative on the horizon, even if Fisher himself does not imagine or present one.

A very clear example of capitalist realism can be found in the effective altruism (EA) movement, which focuses on getting charity to the less well-off via existing capitalist structures. Singer (2015), the movement’s resident philosopher, justifies this by setting the probability of a viable alternative to capitalism surfacing in any reasonable time frame to be zero. Therefore the most good one can do is to ruthlessly accumulate wealth in the metropole and then give it away where it is most needed. Any synergies between the wealth of the first world and the dire economic conditions in the third world simply have to be set aside.

Fisher’s term capitalist realism is a sort of pun on socialist realism, a term for idealized, realistic, literal art from 20th century socialist countries. His use of the term realism is (deliberately, I think) ironic, since both capitalist and socialist realism apply firm ideological filters to the real world. The continental philosophy stuff that this ultimately gets down to is a bit above my pay grade, but I think we can generalize the basic idea: X realism is an ideology that posits and enforces the hypothesis that there is no alternative to X.

If one is willing to go along with this, we can easily talk about, for instance, neural realism, which posits that there is simply no alternative to neural networks for machine learning. You can see this for instance in the debate between “deep learning fundamentalists” like LeCun and the rigor police like Rahimi (see Sproat 2022 for an entertaining discussion): LeCun does seem to believe there to be no alternative to employing methods we do not understand with the scientific rigor that Rahimi demands, when it seems obvious that these technologies remain a small part of the overall productive economy. An even clearer example is the term foundation model, which has the fairly obvious connotation that they are crucial to the future of AI. Foundation model realism would also necessarily posit that there is no alternative and discard any disconfirming observation.

References

Fisher, M. 2009. Capitalist Realism: Is There No Alternative? Zero Books.
Singer, P. 2015. The Most Good You Can Do. Yale University Press.
Sproat, R. 2022. Boring problems are sometimes the most interesting. Computational Linguistics 48(2): 483-490.