[continuing an argument from earlier…]
I think that intellectual diversity and academic freedom are good baseline values, but they are not so obviously positive values in pedagogy. Students genuinely look to their instructors for guidance about which approaches to pursue, and not all frameworks (etc.) are of equal value. Suppose that I subscribe to theory P and you to theory Q (≠ P). One of the following must then be true:
(1) P and Q are mere notational variants (i.e., weakly equivalent or better).
(2) P is “more right” than Q or Q is “more right” than P.
In the case of (1), the best pedagogical practice would probably be to continue to propagate whichever of {P, Q} is more widely used, more intellectually robust, etc. For instance, if Q is the more robust tradition, efforts to “port” insights and technologies from Q to P contribute little and are often quite difficult in practice. Of course this doesn’t mean that P and its practitioners should be suppressed or anything of the sort, but there is no strong imperative to transmit P to young scholars.
In the case of (2), the best practice is also to focus pedagogy on whichever of the two is “more right”. Of course it is often useful to teach the intellectual history, and it is not always clear which theory is out ahead, but it is imperative to make sure students are conversant in the most promising approaches.
In the case of computational syntactic formalisms, weak equivalences hold between minimalist grammars (MGs, as formalized by Stabler and colleagues) and most of the so-called alternative formalisms. It is also quite clear to me that insights largely flow from minimalism and friends (broadly construed) to the alternative formalisms, and not the other way around. Finally, efforts to “port” these insights to alternative formalisms have stalled, or perhaps are just many years behind the bleeding edge in syntactic theory. (1) therefore applies, and I simply don’t see a strong imperative to teach the alternative formalisms.
In the case of computational phonology, there is also an emerging consensus that harmonic grammar (HG) of the sort learned by “maxent” technologies has substantial pathologies compared to earlier formalisms, so that classic Optimality Theory (OT) is clearly “less wrong”. I am similarly sympathetic to arguments that global evaluation frameworks, including both HG and OT, are too constrained to handle opacity phenomena while overgenerating in other dimensions, and that these issues of expressivity are not shared by imperative (i.e., rule-based) and declarative (i.e., “weakly deterministic”) formalisms of the “Delaware school”. (2) thus applies here.
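To make the formal difference concrete, here is a toy sketch of cumulative (“ganging”) constraint interaction, which weighted evaluation permits and strict domination forbids; I won’t claim this is the whole story about the pathologies, but it is the clearest point of divergence. The constraint names, weights, and candidate violation profiles below are invented purely for illustration.

```python
# A toy sketch of cumulative ("ganging") constraint interaction.
# Constraint names (C1, C2), weights, and candidates are hypothetical.

weights = {"C1": 3.0, "C2": 2.0}   # C1 outweighs C2 one violation at a time
ranking = ["C1", "C2"]             # classic OT: C1 strictly dominates C2

# Violation profiles for two competing candidates.
candidates = {
    "cand_a": {"C1": 1, "C2": 0},  # one violation of the stronger constraint
    "cand_b": {"C1": 0, "C2": 2},  # two violations of the weaker constraint
}

def hg_winner(cands, w):
    """Harmonic grammar: the candidate with the lowest weighted violation sum wins."""
    return min(cands, key=lambda c: sum(w[k] * v for k, v in cands[c].items()))

def ot_winner(cands, ranking):
    """Classic OT: compare violation profiles lexicographically, top constraint first."""
    return min(cands, key=lambda c: tuple(cands[c][k] for k in ranking))

print(hg_winner(candidates, weights))   # cand_a: 1*3.0 = 3.0 beats 2*2.0 = 4.0
print(ot_winner(candidates, ranking))   # cand_b: zero violations of top-ranked C1
```

The two violations of the weaker constraint gang up to defeat a single violation of the stronger one under weighting, whereas under strict ranking no number of lower-ranked violations can ever do that.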
Of course I am not advocating for restrictions on academic freedom or speech in general; I make this argument only regarding best pedagogical practices, and I’m not sure I’m in the right here.
Thanks for the interesting post!
A couple of comments:
I think there is a lot of excluded ‘middle’ between (1) and (2). One option is that the formalisms are incomparable, in that P can derive data A and B while Q can derive B and C. More likely, however, is that different grammar formalisms derive different phenomena more succinctly, and it is hard to evaluate which theory is more parsimonious after all is said and done. And as I see it, finding the most succinct theory is what matters at the end of the day, Ockham’s razor and all.
As a concrete example, across-the-board (ATB) movement and right-node raising (RNR) are amazingly easy to express in CCG but require extensions to the standard tools of minimalism.
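To make the CCG side concrete, here is a minimal sketch, in toy Python of my own devising (no particular CCG implementation, semantics omitted), of the textbook type-raising-plus-composition treatment of an RNR sentence like “John likes, and Mary dislikes, beans”:

```python
# A minimal CCG sketch for right-node raising ("John likes, and Mary dislikes, beans").
# Toy category encoding; combinators restricted to exactly what the derivation needs.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cat:
    name: str = None      # set for atomic categories like S, NP
    result: "Cat" = None  # set for complex categories X/Y or X\Y
    slash: str = None
    arg: "Cat" = None

    def __repr__(self):
        return self.name if self.name else f"({self.result}{self.slash}{self.arg})"

S, NP = Cat(name="S"), Cat(name="NP")
def fwd(res, arg): return Cat(result=res, slash="/", arg=arg)   # X/Y
def bwd(res, arg): return Cat(result=res, slash="\\", arg=arg)  # X\Y

def apply_fwd(x, y):
    """Forward application (>): X/Y  Y  =>  X."""
    assert x.slash == "/" and x.arg == y
    return x.result

def type_raise(x, t=S):
    """Forward type-raising (>T): X  =>  T/(T\\X)."""
    return fwd(t, bwd(t, x))

def compose_fwd(x, y):
    """Forward composition (>B): X/Y  Y/Z  =>  X/Z."""
    assert x.slash == "/" and y.slash == "/" and x.arg == y.result
    return fwd(x.result, y.arg)

def coordinate(x, y):
    """Coordination: X conj X  =>  X, for like categories only."""
    assert x == y
    return x

tv = fwd(bwd(S, NP), NP)                          # (S\NP)/NP: "likes", "dislikes"
john_likes = compose_fwd(type_raise(NP), tv)      # S/NP
mary_dislikes = compose_fwd(type_raise(NP), tv)   # S/NP
both = coordinate(john_likes, mary_dislikes)      # "John likes and Mary dislikes": S/NP
print(apply_fwd(both, NP))                        # shared object "beans" (NP) -> S
```

The conjoinable S/NP constituent falls out of independently motivated combinators; nothing construction-specific has to be added.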
Regarding the equivalences you mentioned: IIRC, MGs are more powerful than CCGs and TAGs, not weakly equivalent to them. The latter can express at most 4 counting dependencies (e.g., a^n b^n c^n d^n), while MGs can express more (see e.g. Stabler 2011, “Computational perspectives on minimalism”).
Thanks for the comment. To your first point, I would say that non-veridical theories P and Q can indeed have non-overlapping coverage of the veridical facts {A, B, C}, but any veridical theory will have to cover all three, so my exclusion of the middle is intentional and rhetorical. That’s not to say you’re wrong…that can happen. I don’t have any current doubts about the explanatory adequacy of any of the formalisms I’m mentioning here, modulo technologies that exist in the more robust formalisms and haven’t yet been imported into the less robust ones but probably will be eventually…does that make sense? On your second point, I don’t necessarily agree that succinctness/parsimony is a good way to do theory comparison in grammar, but I am interested in how we would compare the complexity of ATB and RNR in these different theories; is there any work on that? (Such a method would probably have to be applied globally, would it not?) On your last point, I was being somewhat deliberately hasty regarding the exact equivalences, but I appreciate the correction.
Not sure I get your point about explanatory adequacy. Current theories are ‘adequate’ in the sense that they can derive *all* mildly context-sensitive patterns, of which natural language (apart from two or three debated exceptions) is a small subset. So in that sense one can be sure that every formalism could, at least in principle, derive everything we throw at it.
Which brings me again to my second point: given that we have many sledgehammers that are all guaranteed to smash every nut, the most important follow-up task, for me at least, is to find the smallest and most succinct theory that cracks only the natural-language nuts.
How else should one compare theories of grammar, or theories of anything, except by parsimony? The alternative sounds to me like giving up Ockham’s razor, or am I misunderstanding what you are getting at? And yes, ideally they would be compared globally, but to cut the task down into manageable pieces one could start with a small set of phenomena and extend it bit by bit, assuming naively that if a blow-up happens it will occur with the weird edge-case phenomena.
In order to compare how grammar formalisms deal with ATB/RNR, one could for starters count the number of rules they need to derive ATB phenomena. The version of the MG (parser) by Torr & Stabler 2016 that can do ATB counts 45 rules (according to their conclusion); the question then is how many rules a CCG would need. A much clearer case would of course be if some proposed additional mechanism (e.g. something like sideward movement) made a grammar a parallel MCFG, whereas an MCFG-equivalent formalism could derive the same data. I don’t know whether such a case exists, but it sounds like an interesting project.
I think we agree then that we’re dealing with theories that are clearly “powerful enough”.
I don’t agree that counting the number of rules is obviously informative about theory quality. Maybe it is, maybe it isn’t; maybe a CCG rule “costs” more or less than a rule in some other formalism. I don’t think we know how to measure simplicity across theories, which makes Occam’s razor useless (at the present state of knowledge) for the type of theory comparison you have in mind.
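To illustrate what I mean with a deliberately toy example (the fragment, labels, and counting conventions below are mine and purely hypothetical): the same coverage can be stated as phrase-structure rules plus a lexicon, or as lexical category assignments plus a handful of universal combinators, and the “rule count” you report depends entirely on which of those you decide to count.

```python
# A toy transitive-verb fragment (hypothetical), described two ways.
# What counts as a "rule" -- phrase-structure rules? lexical entries?
# universal combinators? -- decides which description looks smaller.

cfg_rules = [("S", ["NP", "VP"]),      # language-particular phrase-structure rules
             ("VP", ["V", "NP"])]
cfg_lexicon = {"Kim": "NP", "beans": "NP", "likes": "V"}

cg_lexicon = {"Kim": "NP", "beans": "NP", "likes": "(S\\NP)/NP"}
cg_combinators = ["forward application", "backward application"]  # universal, not language-particular

print(len(cfg_rules) + len(cfg_lexicon))      # 5, if rules and lexical entries both count
print(len(cg_lexicon) + len(cg_combinators))  # 5, or 3 if universal combinators are "free"
```

Counting everything, both descriptions come to five statements; treating the universal combinators as free, the categorial one drops to three, all of them lexical, while the CFG still carries two language-particular rules on top of its lexicon. That bookkeeping sensitivity is what makes me distrust raw rule counts across formalisms.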