I have long been critical of naïve interpretations of probabilistic grammar. To me, the major motivation for this approach derives from a naïve (I’d say overly naïve) linking hypothesis mapping acceptability judgments, as elicited in Likert scale-style acceptability tasks, onto grammaticality. (See chapter 2 of my dissertation for a concrete argument against this.) On this interpretation, the probabilities are measures of well-formedness.
It occurs to me that there are a number of ontologically distinct interpretations of grammatical probabilities of the sort produced by “maxent”, i.e., logistic regression models.
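To make the object of discussion concrete, here is a minimal sketch, in the style of a maxent harmonic grammar, of how such a model turns weighted constraint violations into a probability distribution over candidate forms. The constraint names, weights, violation counts, and candidate forms below are all invented for illustration; the question at issue is what the resulting numbers are supposed to mean.

```python
import math

def maxent_probabilities(candidates, weights):
    """Map each candidate's constraint-violation counts to a probability.

    A candidate's harmony is the weighted sum of its violations; its
    probability is proportional to exp(-harmony), normalized over the
    candidate set (i.e., a log-linear / logistic regression model).
    """
    harmonies = {
        form: sum(weights[c] * v for c, v in violations.items())
        for form, violations in candidates.items()
    }
    z = sum(math.exp(-h) for h in harmonies.values())
    return {form: math.exp(-h) / z for form, h in harmonies.items()}

# Invented toy tableau: two output candidates for one input, with
# made-up violation counts for two made-up constraints.
weights = {"*COMPLEX-ONSET": 2.0, "MAX-IO": 1.0}
candidates = {
    "blik": {"*COMPLEX-ONSET": 1, "MAX-IO": 0},
    "bik":  {"*COMPLEX-ONSET": 0, "MAX-IO": 1},
}

print(maxent_probabilities(candidates, weights))
# e.g. {'blik': 0.268..., 'bik': 0.731...}
```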
For instance, at M100 this weekend, I heard Bruce Hayes talk about another use of maximum entropy models: scansion. In poetic meters, there is variation in, say, whether the caesura is masculine (after a stressed syllable) or feminine (after an unstressed syllable), and the probabilities reflect that.¹ However, I don’t think it makes sense to equate this with grammaticality, since we are talking about variation in highly self-conscious linguistic artifacts, and there is no reason to think one style of caesura is more grammatical than the other.²
And of course there is a third interpretation, in which the probabilities are production probabilities: they represent actual variation in production, whether within a single speaker or across speakers.
It is not obvious to me that these facts all ought to be modeled the same way, yet the maxent community seems comfortable assuming a single cognitive model to cover all three scenarios. To state the obvious, it makes no sense for a cognitive model to account for interspeaker variation, because there is no such thing as “interspeaker cognition”; there are just individual mental grammars.
Endnotes
1. This is a fabricated example, because Hayes and colleagues mostly study English meter (something I know nothing about), whereas I’m interested in Latin poetry. I imagine English poetry has caesurae too, but I’ve given it no thought yet.
2. I am not trying to say that we can’t study grammar with poetry. Separately, I note (as I think Paul Kiparsky did at the talk) that this model also assumes that the input text the poet is trying to fit to the meter plays no role in constraining what happens.