MP4: Part-of-Speech Tagging

In this assignment, you will build an English part-of-speech (POS) tagger using the multinomial averaged perceptron classifier and a greedy decoding scheme. The data consists of tokenized and (manually) tagged sentences from the Wall St. Journal portion of Penn Treebank II:

Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, 
will/MD join/VB the/DT board/NN as/IN a/DT 
nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
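The skeleton may already provide a data reader; if not, and assuming each sentence in the .pos files sits on its own line as whitespace-separated token/tag pairs (check the data files to confirm), a minimal reader might look like the sketch below. The name read_tagged_corpus is invented here for illustration.

    def read_tagged_corpus(path):
        """Yield one (tokens, tags) pair per sentence in a .pos file."""
        with open(path) as source:
            for line in source:
                tokens, tags = [], []
                for pair in line.split():
                    # Split on the last slash, since a token may itself
                    # contain slashes; the tag never does.
                    token, _, tag = pair.rpartition("/")
                    tokens.append(token)
                    tags.append(tag)
                if tokens:
                    yield tokens, tags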

Finish the tagger

Most of the tagger is already laid out for you in tagger.py and perceptron.py. You are welcome to use, modify, or ignore the suggested API. Your first assignment is to complete the tagger by implementing three methods (a rough sketch of all three follows this list):

  1. efeats, which extracts the "emission" features for all tokens in the sentence (see skeleton implementation for full guidelines)
  2. tfeats, which extracts the "transition" features for a single token given a list of preceding hypothesized tags (see skeleton implementation for full guidelines)
  3. AveragedPerceptronTagger.evaluate, which computes the tagger's tag accuracy on held-out (tagged and tokenized) data. To be as clear as possible: all this method has to do is tag the held-out data, compare the predicted tags to the gold tags, and return the percentage of predicted tags that are correct.
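The skeleton's docstrings and doctests are authoritative for the exact signatures and feature-string formats; the following is only a rough sketch of the general shape, with the feature names and the tagger's tag method invented here for illustration.

    def efeats(tokens):
        """Sketch: one list of emission features per token."""
        feats = []
        for i, token in enumerate(tokens):
            f = ["b",                          # bias feature
                 "w=" + token.lower(),         # current word
                 "suf3=" + token[-3:]]         # crude suffix feature
            if i > 0:
                f.append("w-1=" + tokens[i - 1].lower())
            if i < len(tokens) - 1:
                f.append("w+1=" + tokens[i + 1].lower())
            feats.append(f)
        return feats

    def tfeats(tags):
        """Sketch: transition features from the preceding hypothesized tags
        (an empty history yields a sentence-initial feature)."""
        if not tags:
            return ["t-1=<S>"]
        feats = ["t-1=" + tags[-1]]
        if len(tags) > 1:
            feats.append("t-2,t-1=" + ",".join(tags[-2:]))
        return feats

    def evaluate(self, sentences):
        """Sketch (method of AveragedPerceptronTagger): tag accuracy on
        held-out (tokens, tags) pairs, as a percentage."""
        correct = total = 0
        for tokens, gold_tags in sentences:
            hypothesized = self.tag(tokens)    # assumed tagging method
            for hyp, ref in zip(hypothesized, gold_tags):
                if hyp == ref:
                    correct += 1
                total += 1
        return 100 * correct / total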

Once you have implemented these three functions, test the tagger by training a trigram (i.e., order = 2) model using trn_00-18.pos, which contains 38,219 sentences from the Wall St. Journal, and evaluating it on the held-out data in dev_19-21.pos. Using this training data, the reference implementation completes 20 epochs of training in 90 minutes and achieves 96.90% accuracy on dev_19-21.pos.
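Wired together with the hypothetical reader sketched above, a training-and-evaluation driver might look roughly like this; the constructor argument and the per-epoch train call are assumptions for illustration, not the skeleton's actual interface.

    train_data = list(read_tagged_corpus("trn_00-18.pos"))
    dev_data = list(read_tagged_corpus("dev_19-21.pos"))

    tagger = AveragedPerceptronTagger(order=2)    # trigram model (assumed signature)
    for epoch in range(1, 21):
        tagger.train(train_data)                  # one pass over the training data (assumed)
        print(epoch, tagger.evaluate(dev_data))   # accuracy on the held-out data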

What to turn in:

  1. The complete tagger program
  2. Some sample terminal input/output showing precisely how the program is used
  3. A paragraph describing your approach, mentioning any interesting or unexpected problems or bugs you ran into, as well as how you got around them

Tip: Manually inspect the output of your efeats function for sanity before testing on the full data set.
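For instance (using the hypothetical reader and the efeats sketch above):

    # Print the emission features for the first training sentence.
    tokens, tags = next(read_tagged_corpus("trn_00-18.pos"))
    for token, feats in zip(tokens, efeats(tokens)):
        print(token, feats)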

Tip: Use the provided unit tests while developing tfeats. To run these tests, execute python -m doctest tagger.py; note that this command is silent if all tests pass (add -v for verbose output).

Once you turn in your tagger, I will evaluate your program's performance on another held-out test set. Using trn_00-18.pos for training data, the reference implementation achieves 96.84% accuracy on this test set.

Improve the tagger

Once you have a working tagger, your next assignment is to augment the feature set with at least one new feature that increases tag accuracy on the development set (dev_19-21.pos). Some possibilities include:

  1. Features that are totally novel
  2. Features based on the conjunction of unigram features already present (one such conjunction is sketched after this list)
  3. Features used by some other tagger, but not this one
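As one illustration of the second option, a feature conjoining the previous hypothesized tag with a suffix of the current word is cheap to add. The helper and feature name below are invented for illustration, and where such a feature hooks into efeats or tfeats depends on your design.

    def conjoined_feat(tokens, i, prev_tags):
        """Sketch: conjunction of the previous hypothesized tag and a
        three-character suffix of the current word."""
        prev = prev_tags[-1] if prev_tags else "<S>"
        return "t-1,suf3=" + prev + "," + tokens[i][-3:]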

What to turn in:

  1. Your modified tagger script
  2. At least one paragraph describing the new feature or features, and how much it increases tag accuracy

Tip: Generate a confusion matrix for the original tagger (a minimal sketch follows), and identify a set of tags that are often confused. Then consult the Penn Treebank tagging manual to learn more about the guidelines for these tags, and add features that you believe are relevant to disambiguating between them.
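A confusion matrix is easy to accumulate from (gold tag, predicted tag) pairs; here is a minimal sketch, again assuming the hypothetical tag method and the dev_data variable from the driver above.

    from collections import Counter

    def confusion_matrix(tagger, sentences):
        """Count (gold tag, predicted tag) pairs on held-out data."""
        confusions = Counter()
        for tokens, gold_tags in sentences:
            for hyp, ref in zip(tagger.tag(tokens), gold_tags):
                confusions[ref, hyp] += 1
        return confusions

    # The largest off-diagonal counts are the confusions worth attacking.
    for (ref, hyp), count in confusion_matrix(tagger, dev_data).most_common(50):
        if ref != hyp:
            print(ref, hyp, count)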