MP8: Machine translation

Hansards are verbatim transcripts of parliamentary debates, named after the printer Thomas Curson Hansard. In countries with de jure bilingualism, these transcripts are usually made available in multiple languages. In this assignment, you will use the bilingual English-French hansards of the 36th Canadian Parliament. The Natural Language Group at the Information Sciences Institute of the University of Southern California extracted these files from the official HTML hansards, performed sentence tokenization, and then identified parallel English and French sentences using the GSA (Geometric Segment Alignment) algorithm.

Part 1: IBM Model 1 translation model

About a quarter of a million parallel sentence pairs from the Canadian hansards can be found in hansards.36.ca.e.tok.gz (English) and hansards.36.ca.f.tok.gz (French). These have been converted to UTF-8, tokenized, and case-folded. Your assignment is to implement IBM Model 1 translation table estimation and use it to build a French-to-English translation model. A skeleton implementation is provided in model1.py. After five training iterations, use the translation table to find the highest-probability English translation of each of the 50 "government-related" French words in fwords.txt.
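
For orientation, here is a minimal sketch of the EM loop behind IBM Model 1 translation table estimation, run on an invented three-sentence toy corpus rather than the hansards. Several things here are assumptions and not part of the handout: the names train_model1, best_translation, and NULL are hypothetical and need not match the interface in model1.py; the table is stored as t[f][e], read as P(e | f); and a NULL source token is prepended to every French sentence. Check the lecture notes for the direction and conventions expected in your submission.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical name for the empty source word


def train_model1(pairs, iterations=5):
    """Estimate t[f][e], read as P(e | f), from (french, english) token lists."""
    e_vocab = {e for _, e_sent in pairs for e in e_sent}
    uniform = 1.0 / len(e_vocab)
    # Before the first M-step, every lookup yields the same uniform value.
    t = defaultdict(lambda: defaultdict(lambda: uniform))

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))  # expected counts c(e, f)
        total = defaultdict(float)                       # expected counts c(f)

        # E-step: share each English token among the French tokens
        # (plus NULL) of the same sentence pair, in proportion to t.
        for f_sent, e_sent in pairs:
            f_sent = [NULL] + f_sent
            for e in e_sent:
                z = sum(t[f][e] for f in f_sent)
                for f in f_sent:
                    delta = t[f][e] / z
                    count[f][e] += delta
                    total[f] += delta

        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(lambda: defaultdict(float))
        for f, row in count.items():
            for e, c in row.items():
                t[f][e] = c / total[f]
    return t


def best_translation(t, f):
    """Return the English word maximising t[f][e], or None for unseen f."""
    if f not in t:
        return None
    return max(t[f], key=t[f].get)


# Invented toy corpus, for illustration only.
toy = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["une", "fleur"], ["a", "flower"]),
]
table = train_model1(toy, iterations=5)
for word in ["la", "maison", "fleur", "une"]:
    print(word, best_translation(table, word))
```

On the full corpus the same loop applies, but memory becomes the main concern: the table only needs entries for word pairs that actually co-occur in some sentence pair, which the nested dictionaries above provide for free.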

What to turn in:

  1. Your program/function for computing the Model 1 translation table
  2. Some sample terminal input/output showing precisely how your code is used
  3. The best translation for the 50 French words, displayed in some standard tabular format
  4. A paragraph describing your approach, mentioning any interesting or unexpected problems or bugs you ran into as well as how you got around them

Tip: As before, collections.defaultdict is your friend.
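
As a small illustration of that tip, the sketch below builds a nested co-occurrence table without any explicit key checks. The toy sentence pairs and the variable names are invented for this example. It also shows one pitfall worth knowing about: merely indexing a defaultdict with a missing key inserts that key, so read-only lookups are safer with .get().

```python
from collections import defaultdict

# Invented toy data: (french_tokens, english_tokens).
toy_pairs = [
    (["le", "sénat"], ["the", "senate"]),
    (["le", "gouvernement"], ["the", "government"]),
]

# Nested table: a missing outer key yields a fresh inner table,
# and a missing inner key yields 0.0.
cooc = defaultdict(lambda: defaultdict(float))
for f_sent, e_sent in toy_pairs:
    for f in f_sent:
        for e in e_sent:
            cooc[f][e] += 1.0

# Read-only lookup that does not grow the table.
safe = cooc.get("chambre", {}).get("house", 0.0)
size_after_get = len(cooc)

# Indexing a missing key silently inserts it.
unsafe = cooc["chambre"]["house"]
size_after_index = len(cooc)

print(cooc["le"]["the"], size_after_get, size_after_index)
```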

Bonus: Experiment with different numbers of training iterations: what seems to be a sweet spot?

Part 2: BLEU score

The text file e.tok contains one reference translation for each of the following (unnormalized) French sentences from the Canadian hansards:

Beaucoup perdront leur droit de toucher une pension à leur propre nom en raison du revenu de leur mari tandis que d'autres verront le montant de leur pension réduit.
La pension combinée créera un problème pour les femmes âgées parce qu'elle élimine tout ce à quoi peuvent prétendre certaines d'entre elles pour leurs achats personnels ou, par exemple, pour acheter des cadeaux pour leurs petits-enfants.
Le gouvernement a-t-il calculé les répercussions de la nouvelle prestation sur les femmes?
Si oui, les renseignements pourraient-ils être déposés au Sénat?
Par exemple, le ministre pourrait-il nous dire combien de femmes n'auront plus droit à une pension en raison de la prestation pour aîné(e)s, combien de femmes verront leurs prestations réduites et de combien ces prestations seront réduites, en moyenne?
L'honorable B. Alasdair Graham (leader du gouvernement):
Honorables sénateurs, il est évident que le vieillissement de notre population entraîne des augmentations de coûts pour le gouvernement.
Certains couples reçoivent des prestations plus élevées que d'autres même si leur revenu global est le même.
Le système est très complexe et cela constitue un fardeau pour les personnes âgées.
Le gouvernement essaie de protéger et de renforcer le système.

The above text is UTF-8, but it may render improperly in your browser. Do not be alarmed: the text itself is properly encoded.

These sentences were run through two automated French-to-English translation systems (Google Translate and SYSTRAN, both circa 2014), and then normalized to produce gtranslate.tok and systran.tok. Your assignment is to compute the BLEU score for each sentence, using the exact formulas given by Papineni et al. (2002) and the lecture notes, treating the sentences in e.tok as the reference translations. A skeleton implementation is provided in bleu.py. Then, compare the translations produced by the two systems to the English reference translations from e.tok.
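
The sketch below shows one way the pieces of BLEU fit together for a single candidate and a single reference: clipped (modified) n-gram precision for n = 1 to 4, a uniform-weight geometric mean, and the brevity penalty from Papineni et al. (2002). It rests on some assumptions: the function names are hypothetical and need not match bleu.py, no smoothing is applied (so any zero precision gives a score of 0), and the lecture notes may specify details that differ. The example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:  # candidate shorter than n tokens
        return 0.0
    # Each candidate n-gram is credited at most as often as it
    # appears in the reference.
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / total


def brevity_penalty(candidate, reference):
    """1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1.0 - r / c)


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:  # log(0) is undefined; score is 0
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, reference) * math.exp(log_mean)


# Invented example sentences, already tokenized and case-folded.
reference = "the cat is on the mat".split()
candidate = "the the the the the the the".split()
print(modified_precision(candidate, reference, 1))
print(bleu(candidate, reference))
print(bleu(reference, reference))
```

Because short sentences often have no matching 4-grams, unsmoothed sentence-level scores of exactly 0 are common; that is worth keeping in mind when comparing the two systems.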

What to turn in:

  1. Your program/function for computing BLEU scores
  2. Some sample terminal input/output showing precisely how your code is used
  3. The BLEU scores for each sentence for both systems, displayed in some standard tabular format
  4. A paragraph about whether the BLEU score results correspond to your intuitions or not
  5. A paragraph describing your approach, mentioning any interesting or unexpected problems or bugs you ran into as well as how you got around them