A Morris Halle memory

Morris Halle passed away earlier today. Morris was an absolute giant in the field of linguistics. His work in the 1950s and 1960s completely revolutionized phonological theory. He did this, primarily, by rejecting an axiom of the previous century’s work. The theory of phonology was so utterly transformed by his argument against the principle of biuniqueness that the very concept is rarely even taught in the 21st century. And this was just one of his earliest scientific contributions.

I could say a lot more about Morris’s work, but instead let me tell a short anecdote. In 2010 or so I happened to be in the Boston area and my advisor kindly arranged for me to meet Morris. After getting coffee we walked to his spare shared office. The only thing of note was a single wall-mounted bookshelf containing three books: Morris’s own Sound Pattern of Russian and Sound Pattern of English (with the dust cover removed so as to exhibit the unique bas-relief cover designed by Morris’s wife, a talented visual artist) and, of course, Walker’s rhyming dictionary.

For whatever reason, we started to discuss Latin. Working with a legal pad, Morris first showed me a novel analysis of thematic vowels. Ignoring a few irregular (“athematic”) stems, all Latin verb stems have a characteristic final vowel: -ā- in the first conjugation, -ē- in the second, a vowel of varying quality (usually e or i) in the third, and -ī- in the fourth. In the first conjugation and most of the third conjugation, this vowel disappears in the first person singular active indicative verb, which is marked with an -ō suffix. Thus for the second conjugation verb docēre ‘teach’, we have doceō ‘I teach’, with the theme vowel preserved, and similarly for the fourth conjugation. In contrast, for the first conjugation verb amāre ‘love’, we have amō ‘I love’, with the theme vowel omitted, and similarly for the majority of the third conjugation. This much I already knew.

To me it was just one of those conjugational quirks one has to memorize when learning Latin, but Morris suggested that it was not necessarily so. What if, he argued, the first conjugation -ā- was deleted by a following -ō? (Certainly that rule is surface-true, except for a handful of Greek loanwords like chaos.) But what about the third conjugation? Morris said that he had long believed the underlying form of the third conjugation theme vowel was [+back], something like /ɨ/, and he proceeded to lay out the necessary allophonic rules, and finally a rule which deletes the first of two [+back] segments! I was floored.

I then showed him an analysis I was working on at the time. Once again ignoring a few irregulars, Latin masculine and feminine nouns of the third declension are characterized by a nominative singular suffix -s. When the noun stem is athematic and ends in /t, d/, this consonant is deleted in the nominative singular (e.g., frons, frontis ‘forehead’). I argued that this rule ought to be extended to also target /r/ so as to account for the so-called “rhotic” stems like honōs, honōris ‘honor’ (e.g., /honōr-s/ → [honōs]). To make this work, one must write the rule so that it bleeds its own application (see here for the full analysis), and treat it as one of several opaque rules. This is something which is possible in the rule-application framework proposed by Morris and colleagues, but which cannot be straightforwardly implemented in more recent theoretical frameworks. I must have hesitated for a moment as I was talking through this, because Morris grabbed my hand and said to me: “Young man, remember always to speak clearly and to never apologize for your rule ordering.” And then he bid me adieu.

When should we call it “terrorism”?

According to White House Press Secretary Sarah Huckabee Sanders, a recent spate of serial bombings targeting prominent African-Americans in Austin, TX, has “no apparent nexus to terrorism at this time”. I want to make a pedantic lexicographic point about the definition of terrorism (and terrorist) regarding this. There is certainly a sense of terrorism which just involves random lethal violence against civilians, and by that definition this absolutely qualifies. But that is not the definition used by the state (or mass media). Rather, they favor an alternative sense which emphasizes the way in which the violence undermines the authority of the state. This is in fact encoded in the (deeply evil) PATRIOT Act, which defines terrorism as an attempt “…to influence the policy of a government by intimidation or coercion; or to affect the conduct of a government by mass destruction, assassination, or kidnapping.” Let’s assume, as seems likely though by no means certain, that the bomber(s) are white supremacists targeting African-American communities. You’d be hard-pressed to argue that terrorizing people of color undermines the authority of a deeply racist society and its institutions any more than, say, trafficking crack cocaine in African-American communities to support right-wing death squads abroad. Terrorizing people of color is absolutely in line with US domestic and foreign policy, and the language chosen by the White House (and parroted by the media) naturally reflects that.

Making high-quality graphics in R

There are a lot of different ways to make an R graph for TeX; this is my workflow.

In R

I use cairo_pdf to write a graph to disk. This command takes arguments for image size and for font size and face. If you’re on a Mac, you will need to install X11.

Image size

I always specify graph size by hand, in inches. For manuscripts and handouts, I usually set the width to be the printable width. If you’re using 1″ margins, that’s 6.5″. Then, I adjust height until a pleasing form emerges.

Fonts

I match the font face of the manuscript (whatever I’m using) and graph labels by passing the font name as the argument to family. This matters most if you’re writing a handout, and matters less if you’re sending it to, say, Oxford University Press, who will redo your text anyways. I found out the hard way that the family keyword argument is absent in older versions of R, so you may need to upgrade. By default, image fonts are 12pt. This is generally fine, but can be adjusted with the pointsize argument.

Graphing

This is a no-brainer: use ggplot2.

All together now

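library(ggplot2)  # provides qplot
cairo_pdf('mygraph.pdf', width=6.5, height=4, family='Times New Roman')
print(qplot(X, Y, data=dat))  # explicit print so the plot is also drawn when run from a script
dev.off()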

In TeX

Add \usepackage{graphicx} to your preamble, if it’s not already there. In the body, use \includegraphics{mygraph.pdf}.
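Here is a minimal sketch of a complete file that pulls the graph in. The geometry settings, the figure environment, the caption text, and the label are illustrative assumptions, not part of the workflow above; the only pieces taken from it are graphicx and mygraph.pdf.

```latex
% Minimal sketch: include the PDF written by cairo_pdf at its natural size.
\documentclass[letterpaper,12pt]{article}
\usepackage[margin=1in]{geometry}  % 1in margins on US letter leave a 6.5in text width
\usepackage{graphicx}

\begin{document}

\begin{figure}
  \centering
  % The graph was written 6.5in wide, so it needs no scaling options here.
  \includegraphics{mygraph.pdf}
  \caption{Placeholder caption.}  % illustrative text
  \label{fig:mygraph}             % illustrative label
\end{figure}

\end{document}
```

Because the graph is already as wide as the text block, the 12pt labels in the image come out at their true size next to the body text.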

TeX tips for linguists

I’ve been using TeX to write linguistics papers for nearly a decade now. I still think it’s the best option. Since TeX is a big, complex ecosystem and not at all designed with linguists in mind, I thought it might be helpful to describe the tools and workflow I use to produce my papers, handouts, and abstracts.

Michael Becker’s notes are recommended as well.

Software

I use xelatex, the LaTeX front end to the XeTeX engine. It has two advantages over the traditional pdflatex and related tools. First, you can use system fonts via fontspec and mathspec. If you are using Computer Modern or packages like txfonts or times, etc., it’s time to join the modern world.

Secondly, it expects UTF-8. If you are using tipa, or multi-character sequences to enter non-ASCII characters, then you probably have ugly transcriptions. (Don’t want to name names…)

Fonts

Linguists generally demand the following types of characters:

  • Alphabetic small caps
  • The complete IPA, especially IPA [g] (which is not English “g”)
  • “European” extensions to ASCII: enye (año), diaeresis (coöperation, über), acute (résumé), grave (à), macron (māl), circumflex (être), háček (očudit), ogonek (Pająk), eth (fracoð), thorn (þæt), eszett (Straße), cedilla (açai), dotted g (ealneġ), and so on, for Roman-like writing systems

The only font I’ve found that has all this is Linux Libertine. It has nothing to do with Linux, per se. In general, it’s pretty handsome, especially when printed small (though the Q is ridiculously large). If you can deal without small caps (and arguably, linguists use them too much), then a recent version of Times New Roman (IPA characters were added recently) also fits the bill. Unfortunately, if you’re on Linux and using the “MS Core Fonts”, that version of Times New Roman doesn’t have the IPA characters.
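To make the UTF-8 point concrete, here is a minimal sketch of a file that types IPA and small caps directly, with no tipa macros. It assumes the OpenType version of Linux Libertine is installed and visible to XeTeX under the family name “Linux Libertine O”; check your own system’s font list if that name doesn’t resolve. The example words are mine, chosen only for illustration.

```latex
% Compile with xelatex. The font name below is an assumption about your installation.
\documentclass[12pt]{article}
\usepackage{fontspec}
\setmainfont[Mapping=tex-text]{Linux Libertine O}

\begin{document}

% IPA typed directly as UTF-8, including IPA [ɡ] (U+0261) rather than ASCII g.
The word \emph{phonology} is transcribed [fəˈnɑlədʒi]; compare [ɡ] with g.

% Accented letters and small caps come from the same font.
Pająk, þæt, Straße, and the gloss \textsc{nom}.

\end{document}
```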

This is real important: do not allow your mathematical characters to be in Computer Modern if your paper is not in Computer Modern. It sticks out like a sore thumb. What you do is put something like this in the preamble:

\usepackage{mathspec}
\setmainfont[Mapping=tex-text]{Times New Roman}
\setmathfont(Digits,Greek,Latin){Times New Roman}

Examples

The gb4e package seems to be the best one for syntax-style examples, with morph-by-morph glossing and the like. I myself deal mostly in phonology, so I use the tabular environment wrapped with an example environment of my own creation called simplex.sty and packaged in my own LingTeX (which is a work-in-progress).
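For readers who haven’t seen gb4e, here is a minimal sketch of a glossed example. The Latin sentence, its glosses, and the label are my own illustration; the exe environment and the \ex, \gll, and \glt commands are gb4e’s.

```latex
% Minimal sketch of a morph-by-morph glossed example with gb4e.
\documentclass[12pt]{article}
\usepackage{gb4e}  % usually safest to load this late in the preamble

\begin{document}

\begin{exe}
  \ex \label{ex:puella}  % illustrative label
  \gll Puella rosam amat.\\
       girl.\textsc{nom} rose.\textsc{acc} love.\textsc{3sg}\\
  \glt `The girl loves the rose.'
\end{exe}

\end{document}
```

The two lines after \gll are aligned word by word, so each must contain the same number of space-separated items.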

Bibliographies

I use natbib. When I have a choice, I usually reach for pwpl.bst, the bibliography style that we use for the Penn Working Papers in Linguistics. It’s loosely based on the Linguistic Inquiry style.
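A minimal sketch of how these pieces fit together, assuming pwpl.bst is somewhere TeX can find it and that paper.bib contains an entry with the hypothetical key halle1959:

```latex
% Minimal sketch: natbib with the pwpl bibliography style.
\documentclass[12pt]{article}
\usepackage{natbib}
\bibliographystyle{pwpl}  % assumes pwpl.bst is on the TeX search path

\begin{document}

% halle1959 is a hypothetical key; substitute one from your own .bib file.
\citet{halle1959} gives a textual citation, and a parenthetical one
looks like this \citep{halle1959}.

\bibliography{paper}  % reads paper.bib

\end{document}
```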

Compiling

I use make for compiling. This will be installed on most Linux computers. On Macintoshes, you can get it as part of the Developer Tools package, or with Xcode.

I type make to compile, make bib to refresh the bibliography, and make clean to remove all the temporary files. Here’s what a standard Makefile for paper.tex would look like for me.

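# commands
PDFTEX=xelatex -halt-on-error
BIBTEX=bibtex

# files
NAME=paper

.PHONY: all clean

all: $(NAME).pdf

# rebuild when the source, the bibliography, or any included PDF figure changes;
# filter-out keeps the target from being listed as its own prerequisite
$(NAME).pdf: $(NAME).tex $(NAME).bib $(filter-out $(NAME).pdf,$(wildcard *.pdf))
	$(PDFTEX) $(NAME).tex
	# bibtex takes the .aux basename, not the .tex file
	$(BIBTEX) $(NAME)
	$(PDFTEX) -interaction=batchmode -no-pdf $(NAME).tex
	$(PDFTEX) -interaction=batchmode $(NAME).tex

clean:
	latexmk -c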

There are a couple interesting things here. -halt-on-error kills a compile the second it goes bad: why wouldn’t you want to fix the problem right when it’s detected, since it won’t produce a full PDF anyways? Both -interaction=batchmode and -no-pdf shave off a considerable amount of compile time, but aren’t practical when debugging, and when producing a final PDF, respectively. I use latexmk -c, which reads the log files and removes temporary files but preserves the target PDF. For some reason, though, it doesn’t remove .bbl files.

Draft mode

Up until you’re done, start your file like so:

\documentclass[draft,12pt]{article}

This does two things. First, “overfull hboxes” are marked with a black line, so you can rewrite and fix them. Second, images aren’t rendered into the PDF, which saves time. It’s very easy to tell who does this and who doesn’t.
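The images disappear because graphicx also obeys the global draft option. If you want the overfull-box marks but still need to look at one particular figure, graphicx’s draft key can be switched off for that image. This is a sketch of a variation, not part of my usual setup, and mygraph.pdf is just a stand-in file name:

```latex
% Minimal sketch: draft mode globally, with one image still rendered.
\documentclass[draft,12pt]{article}
\usepackage{graphicx}

\begin{document}

% Drawn as an empty frame containing the file name, because of the global draft option.
\includegraphics{mygraph.pdf}

% Rendered normally: draft=false overrides draft mode for this image only.
\includegraphics[draft=false]{mygraph.pdf}

\end{document}
```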

On slides and posters

There are several TeX-based tools for making slides and posters. Beamer seems to be very popular for slides, but I find the default settings very ugly (gradients!) and cluttered (navigation buttons on every slide!). I use a very minimal Keynote style (Helvetica Neue, black on white). I’m becoming a bigger fan of handouts, since unlike slides or posters, the lengthy process of making a handout gets me so much closer to publication.