Python ellipses considered harmful

Python has a conventional object-oriented design, but it was slowly grafted onto the language, something which shows from time to time. Arguably, you see this in the convention that instance methods need self passed as their first argument, and class methods need cls as their first argument. Another place you see it is how Python does abstract classes. First, one can use definitions in the standard-library abc module, proposed in PEP 3119, to declare a class as abstract. But in practice most Pythonistas make a class abstract by declaring unimplemented instance methods. There are two conventional ways to do this, either with ellipses or by raising an exception, illustrated below.

# Elliptical version.
class AbstractCandyFactory:
    def make_candy(self, batch_size: int): ...

# Exception version.
class AbstractCandyFactory:
    def make_candy(self, batch_size: int):
        raise NotImplementedError

The latter is a bit more verbose, but there is actually a very good reason to prefer it to the former, elliptical version. With the exception version, if one forgets to implement make_candy, say in a concrete subclass like SnickersFactory(AbstractCandyFactory), an informative exception will be raised when make_candy is called on a SnickersFactory instance. However, in the elliptical form, the inherited method will be called, and of course will do nothing but return None, because its body is just an ellipsis. This will likely cause errors down the road, but they will not be nearly as easy to track down because there is nothing to directly link the issue to the failure to override this method. For this reason alone, I consider ellipses used to declare abstract instance methods harmful.
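To make the difference concrete, here is a minimal sketch. The class names and the describe helper are hypothetical, invented for this illustration. Each concrete subclass "forgets" to override make_candy. The third variant uses the abc module mentioned above, for comparison; with it, the mistake surfaces even earlier, when the subclass is instantiated.

```python
import abc


class EllipsisCandyFactory:
    """Abstract class declared with ellipses."""

    def make_candy(self, batch_size: int): ...


class RaisingCandyFactory:
    """Abstract class declared by raising an exception."""

    def make_candy(self, batch_size: int):
        raise NotImplementedError


class AbcCandyFactory(abc.ABC):
    """Abstract class declared with the abc module."""

    @abc.abstractmethod
    def make_candy(self, batch_size: int): ...


# Each concrete subclass below forgets to override make_candy.
class EllipsisSnickersFactory(EllipsisCandyFactory):
    pass


class RaisingSnickersFactory(RaisingCandyFactory):
    pass


class AbcSnickersFactory(AbcCandyFactory):
    pass


def describe(factory_cls) -> str:
    """Reports what happens when we instantiate and call make_candy."""
    try:
        factory = factory_cls()
    except TypeError:
        return "TypeError at instantiation"
    try:
        result = factory.make_candy(12)
    except NotImplementedError:
        return "NotImplementedError at call"
    return f"returned {result!r}"


print(describe(EllipsisSnickersFactory))  # returned None
print(describe(RaisingSnickersFactory))   # NotImplementedError at call
print(describe(AbcSnickersFactory))       # TypeError at instantiation
```

Only the elliptical version fails silently: the call succeeds and hands back None, which is exactly the kind of value that causes a confusing error somewhere far from the real mistake.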

Announcing UDTube

In collaboration with CUNY master’s program graduate Daniel Yakubov, we have recently open-sourced UDTube, our neural morphological analyzer. UDTube performs what is sometimes called morphological analysis in context: it provides morphological analyses—coarse POS tagging, more-detailed morphosyntactic tagging, and lemmatization—to whole sentences using nearby words as context.

The UDTube model, developed in Yakubov 2024, is quite simple: it uses a pre-trained Hugging Face encoder to compute subword embeddings. We then take the last few layers of these embeddings and mean-pool them, then mean-pool subword embeddings for those words which correspond to multiple subwords. The resulting encoding of the input is then fed to separate classifier heads for the different tasks (POS tagging, etc.). During training we fine-tune the pre-trained encoder in addition to fitting the classifier heads, and we make it possible to set separate optimizers, learning rates, and schedulers for the encoder and classifier modules.
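The two pooling steps can be sketched in a few lines of plain Python. This is an illustration only and not UDTube's actual code, which operates on PyTorch tensors; the function names, the last_k parameter, and the word_ids list (one word index per subword) are all assumptions made for this example, and special tokens are ignored for simplicity.

```python
from statistics import fmean

Vector = list[float]


def mean_pool(vectors) -> Vector:
    """Averages a non-empty sequence of equal-length vectors elementwise."""
    return [fmean(column) for column in zip(*vectors)]


def pool_layers(layers: list[list[Vector]], last_k: int) -> list[Vector]:
    """Mean-pools the last `last_k` layers, giving one vector per subword.

    `layers[l][i]` is the embedding of subword i at layer l.
    """
    if last_k < 1:
        raise ValueError("last_k must be at least 1")
    selected = layers[-last_k:]
    # zip(*selected) groups together the vectors for the same subword.
    return [mean_pool(per_subword) for per_subword in zip(*selected)]


def pool_subwords(subwords: list[Vector], word_ids: list[int]) -> list[Vector]:
    """Mean-pools the subword vectors belonging to the same word.

    `word_ids[i]` is the index of the word that subword i belongs to.
    """
    groups: dict[int, list[Vector]] = {}
    for vector, word_id in zip(subwords, word_ids):
        groups.setdefault(word_id, []).append(vector)
    return [mean_pool(groups[word_id]) for word_id in sorted(groups)]


# Toy input: 3 layers, 3 subwords, 2 dimensions.
layers = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # Not among the last 2 layers.
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
]
# The second word is split into two subwords.
word_ids = [0, 1, 1]

subword_vectors = pool_layers(layers, last_k=2)
word_vectors = pool_subwords(subword_vectors, word_ids)
print(word_vectors)  # [[2.0, 3.0], [5.0, 6.0]]
```

The result has one vector per word rather than per subword, which is the shape the per-word classifier heads need.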

UDTube is built atop PyTorch and Lightning, and its command-line interface is made much simpler by the use of LightningCLI, a module which handles most of the interface work. One can configure the entire thing using YAML configuration files. CUDA GPUs and Apple silicon Macs (M1 etc., via MPS) can be used to accelerate training and inference (and should work out of the box). We also provide scripts to perform hyperparameter tuning using Weights & Biases. We believe that this model, with appropriate tuning, is probably state-of-the-art for morphological analysis in context.

UDTube is available under an Apache 2.0 license on GitHub and on PyPI.

References

Yakubov, D. 2024. How do we learn what we cannot say? Master’s thesis, CUNY Graduate Center.

News from the east

I am a total sucker for cute content from East Asia. I loved to watch Pangzai do his little drinking tricks. I love to hear what the “netizens” are up to. I love the greasy little hippo. I love the horse archer raves. I even love the chow chows painted as pandas. It’s delightful. Is this propaganda? Maybe; certainly it’s embedded in a larger matrix of Western-oriented soft-power diplomacy. (That’s why we have so many Thai restaurants.) But I suppose I’m blessed to live in a time where you can get so much cute news from halfway across the world.

Our vocation

If you’re a linguist: well, why?

One thing that stands out about the life of the professional linguist is what Chomsky has called the responsibility of intellectuals, to “speak the truth and to expose lies”, in this case uncomfortable truths about language and its role in society. Certainly this responsibility (and privilege, as Chomsky also points out) is inspiration for many linguists. But other motives abound. I for one am more drawn to learning about (an admittedly narrow corner of) human nature than I am to speaking truth to power, and most likely would have ended up in some other area of social science had I not discovered the field. And there’s nothing wrong with a linguist who is most of all drawn to little logic puzzles, so long as these puzzles are ultimately grounded in those questions about human nature. (I do reject, categorically, those who say that linguists ought to be doing nothing more than “Word Sudoku” or “Wordle with more steps”. Maybe there are people who work solely in those modes, and if so I wish them a very happy alt-ac career transition.)

I think the truths about human nature uncovered by the epistemology-obsessed generativists, including those of the armchair variety, have something to say about the proper organization of society. But one is more likely to get such messages from sociolinguists. Sociolinguists correctly point out that we have unexamined, corrosive ideologies about language, languages, and their speakers that are mostly contrary to the liberal values most of us profess, and they certainly are well-positioned to speak these truths. That said, I do not agree with an often-implicit assumption that sociolinguistics is somehow a more noble vocation than other topics in the field. The “discourse” on this is often fought as a proxy war over hiring: e.g., one I’ve heard before is “Why doesn’t MIT’s linguistics faculty include a sociolinguist?” First off, it sort of does: it includes one of the world’s foremost creolists, who has written extensively about the role of creole studies in neocolonialism and white supremacy. Whether or not a creolist is a sociolinguist is probably more a matter of self-identity than one of observable fact, but there’s no question that creole studies has a lot to give to (but also a lot to answer for on) the problem of linguistic equality. Should the well-rounded linguist have studied sociolinguistics? Absolutely. But there are probably many other areas, topics, or even theories you think that any well-rounded linguist ought to have studied but which are not required or widely taught, and these rarely provoke such discourse.