I’m excited about our new article in the American Journal of Speech-Language Pathology (with Gerasimos Fergadiotis and Steven Bedrick) on automatic classification of paraphasias using basic natural language processing techniques.
Paraphasias are speech errors associated with aphasia. Roughly speaking, these errors may be phonologically similar to the target (dog for the target LOG) or dissimilar. They may also be semantically similar to the target (dog for the target CAT), or both phonologically and semantically similar (rat for the target CAT). They may be neologisms (tat for the target CAT). Finally, some paraphasias may be real words that are neither phonologically nor semantically similar to the target. The relative frequencies of these error types differ between people with aphasia. These frequencies can be measured in a confrontation naming task and, with complex and time-consuming manual error classification, used to create individualized profiles for treatment.
In the paper, we take archival data from a confrontation naming task and attempt to automate the classification of paraphasias. To quantify phonological similarity, we automate a series of baroque rules. To quantify semantic similarity, we use a computational model of semantic similarity (namely cosine similarity with word2vec embeddings). And, to identify neologisms, we use frequency in the SUBTLEX-US corpus. The results suggest that test scoring can in fact be automated with performance close to that of human annotators. With advances in speech recognition, it may soon be possible to develop a fully automated, computer-adaptive confrontation naming task!
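To make the general idea concrete, here is a minimal sketch of how the semantic and neologism checks might be combined. It is not the code from the paper. The toy vectors and counts stand in for real word2vec embeddings and SUBTLEX-US frequencies, and the similarity threshold, the category labels, and the function names are all assumptions made for illustration. The phonological rules are not reproduced here, so that judgment is simply passed in as a boolean.

```python
import math

# Toy stand-ins (assumptions): a real system would load word2vec
# embeddings and SUBTLEX-US frequency counts instead.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "rat": [0.7, 0.2, 0.2],
    "log": [0.0, 0.2, 0.9],
}
TOY_FREQUENCIES = {"cat": 100, "dog": 120, "rat": 20, "log": 30}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_real_word(response, frequencies=TOY_FREQUENCIES):
    """Treat a response as a real word if it is attested in the frequency list."""
    return frequencies.get(response, 0) > 0


def is_semantically_similar(response, target, vectors=TOY_VECTORS, threshold=0.5):
    """Compare embeddings; the 0.5 threshold is an arbitrary illustrative value."""
    if response not in vectors or target not in vectors:
        return False
    return cosine_similarity(vectors[response], vectors[target]) >= threshold


def classify(response, target, phonologically_similar):
    """Assign a hypothetical category label to a single naming error."""
    if not is_real_word(response):
        return "neologism"
    semantic = is_semantically_similar(response, target)
    if semantic and phonologically_similar:
        return "mixed"
    if semantic:
        return "semantic"
    if phonologically_similar:
        return "phonological"
    return "unrelated"


if __name__ == "__main__":
    for response, target, phon in [
        ("dog", "cat", False),
        ("rat", "cat", True),
        ("dog", "log", True),
        ("tat", "cat", True),
    ]:
        print(response, target, classify(response, target, phon))
```

In this sketch the real-word check comes first, so an unattested response is labeled a neologism no matter how close it sounds to the target. That ordering is a design choice of the example, not a claim about the scoring protocol.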