In this assignment, you will build an English part-of-speech (POS) tagger using the multinomial averaged perceptron classifier and a greedy decoding scheme. The data consists of tokenized and (manually) tagged sentences from the Wall St. Journal portion of Penn Treebank II:
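To make the decoding scheme concrete, here is a minimal sketch of greedy decoding; the names (`greedy_decode`, `score`) are illustrative, not the skeleton's API. The decoder commits to the best-scoring tag at each position, conditioning only on tags it has already chosen, and never revisits a decision:

```python
def greedy_decode(words, tagset, score):
    """Greedy left-to-right decoding sketch.

    `score(words, i, prev_tags, tag)` returns a number (higher is
    better); in the assignment this would come from the perceptron's
    weights over emission and transition features.
    """
    tags = []
    for i in range(len(words)):
        # Commit to the single best tag given the hypothesis so far.
        best = max(tagset, key=lambda t: score(words, i, tags, t))
        tags.append(best)
    return tags
```

Unlike Viterbi decoding, this is not guaranteed to find the globally best tag sequence, but it is simple and fast, which is why it is the scheme used here.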
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
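Each line of the data files holds one whitespace-tokenized sentence of slash-delimited word/TAG pairs, as above. A minimal reader for this format might look like the following (`read_tagged_sentence` is a hypothetical helper, not part of the provided skeleton); splitting on the *last* slash is the safe default, since Treebank words may themselves contain (escaped) slashes:

```python
def read_tagged_sentence(line):
    """Split a line of word/TAG pairs into parallel word and tag lists."""
    pairs = [token.rsplit("/", 1) for token in line.split()]
    words = [w for w, t in pairs]
    tags = [t for w, t in pairs]
    return words, tags
```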
Most of the tagger is already laid out for you in
perceptron.py. You are welcome to use, modify, or ignore the suggested API. Your first assignment is to complete the tagger by implementing three methods:
efeats, which extracts the "emission" features for all tokens in the sentence (see skeleton implementation for full guidelines)
tfeats, which extracts the "transition" features for a single token given a list of preceding hypothesized tags (see skeleton implementation for full guidelines)
AveragedPerceptronTagger.evaluate, which computes the tagger's tag accuracy on held-out (tagged and tokenized) data. To be as clear as possible: all this method has to do is tag the held-out data, compare the predicted tags to the gold tags, and return the percentage of predicted tags that are correct.
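As a rough sketch of the shapes these three methods might take (the exact feature templates and API are specified in the skeleton; the feature names, the templates, and the `tagger.tag` call below are all assumptions made for illustration):

```python
def efeats(words):
    """Sketch of emission features: one feature list per token.

    Only the current word and its immediate neighbors are shown here;
    the skeleton specifies the full template set.
    """
    feats = []
    for i, word in enumerate(words):
        fs = ["w=" + word,
              "w-1=" + (words[i - 1] if i > 0 else "<S>"),
              "w+1=" + (words[i + 1] if i + 1 < len(words) else "</S>")]
        feats.append(fs)
    return feats


def tfeats(tags):
    """Sketch of transition features for one token, given the (up to
    `order`) preceding hypothesized tags, most recent last."""
    feats = []
    if tags:
        feats.append("t-1=" + tags[-1])
    if len(tags) > 1:
        feats.append("t-2,t-1=" + ",".join(tags[-2:]))
    return feats


def evaluate(tagger, tagged_sentences):
    """Tag held-out data and return the percentage of correct tags."""
    correct = total = 0
    for words, gold in tagged_sentences:
        hyp = tagger.tag(words)  # assumed tagging entry point
        correct += sum(h == g for h, g in zip(hyp, gold))
        total += len(gold)
    return 100.0 * correct / total
```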
Once you have implemented these three functions, test the tagger by training a trigram (i.e., order = 2) model using
trn_00-18.pos, which contains 38,219 sentences from the Wall St. Journal, and evaluating it on the held-out data in
dev_19-21.pos. Using this training data, the reference implementation completes 20 epochs of training in 90 minutes and achieves 96.90% accuracy on the development data.
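On a much smaller scale, the training loop (averaged multinomial perceptron updates driven by the decoder) can be sketched as follows. The inlined features, the conditioning on gold tag history, and the per-step weight averaging are all simplifications for illustration, not the skeleton's design:

```python
from collections import defaultdict
import random


def train(sentences, tagset, epochs=20, seed=0):
    """Toy averaged-perceptron training sketch.

    `sentences` is a list of (words, gold_tags) pairs. Features are a
    deliberately tiny stand-in (current word plus previous tag); the
    real assignment uses efeats/tfeats.
    """
    w = defaultdict(float)      # current weights: (feature, tag) -> weight
    total = defaultdict(float)  # running sum of weights, for averaging
    steps = 0
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(sentences)  # note: shuffles the caller's list in place
        for words, gold in sentences:
            prev = "<S>"
            for i, word in enumerate(words):
                feats = ["w=" + word, "t-1=" + prev]
                pred = max(tagset, key=lambda t: sum(w[(f, t)] for f in feats))
                if pred != gold[i]:
                    # Standard multiclass perceptron update.
                    for f in feats:
                        w[(f, gold[i])] += 1.0
                        w[(f, pred)] -= 1.0
                steps += 1
                # Accumulate for averaging; O(|w|) per token is fine for a
                # toy, but real implementations average lazily.
                for k, v in w.items():
                    total[k] += v
                prev = gold[i]  # simplification: condition on gold history
    return {k: v / steps for k, v in total.items()}
```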
What to turn in:
Tip: Manually inspect the output of your
efeats function for sanity before testing on the full data set.
There are doctests for
tfeats. To run these tests, execute
python -m doctest tagger.py; note that this command is silent if all tests pass.
Once you turn in your tagger, I will evaluate your program's performance on another held-out test set. Using
trn_00-18.pos for training data, the reference implementation achieves 96.84% accuracy on this test set.
Once you have a working tagger, your next assignment is to augment the feature set with at least one new feature that increases tag accuracy on the development set (
dev_19-21.pos). Some possibilities include:
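For instance, a word-suffix feature (offered here as one illustrative option; the function name and template are hypothetical) often helps with inflected forms that were unseen in training, e.g. an -ing suffix suggesting VBG:

```python
def suffix_feats(word, max_len=3):
    """Hypothetical extra emission features: the word's final k
    characters, for k = 1 .. max_len (capped at the word's length)."""
    return ["suf" + str(k) + "=" + word[-k:]
            for k in range(1, min(max_len, len(word)) + 1)]
```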
What to turn in:
Tip: Generate a confusion matrix for the original tagger and identify a set of tags which are often confused. Then consult the Penn Treebank tagging manual to learn more about the guidelines for these tags, and add features which you believe are relevant to disambiguating between these confused tags.
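A confusion matrix is easy to build from parallel gold and predicted tag sequences; `confusion_matrix` and `top_confusions` below are hypothetical helpers, not part of the skeleton. The most frequent off-diagonal (gold, predicted) pairs are the confusions worth targeting with new features:

```python
from collections import Counter


def confusion_matrix(gold_tags, pred_tags):
    """Count (gold, predicted) tag pairs over flattened tag sequences."""
    return Counter(zip(gold_tags, pred_tags))


def top_confusions(counts, n=10):
    """The n most frequent off-diagonal (gold != predicted) pairs."""
    return [(g, p, c) for (g, p), c in counts.most_common() if g != p][:n]
```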