Your final project for this class should involve the creation of some speech or language technology. You should submit the project as a term paper that

  1. motivates the technology (why does it exist? why did you create it?) and
  2. evaluates the technology using standard evaluation procedures (i.e., held-out evaluation).
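As a rough illustration of what held-out evaluation involves, here is a minimal sketch in Python. The function names, the 80/10/10 split, and the use of word error rate as the metric are all assumptions made for this example, not requirements of the assignment; use whatever partitions and metric are standard for your task.

```python
import random


def split_data(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffles once with a fixed seed, then carves out train/dev/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_dev = int(len(shuffled) * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


def word_error_rate(gold, predicted):
    """Percentage of items whose prediction differs from the gold label."""
    assert len(gold) == len(predicted), "gold and predicted must align"
    errors = sum(g != p for g, p in zip(gold, predicted))
    return 100 * errors / len(gold)
```

The idea is to fit the model on `train`, make all design decisions (including hyperparameter choices) using `dev`, and report results on `test` only once, at the end.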

While you're welcome to submit data or code, you will not be graded on these. Rather, your project will be graded on:

  1. technical sophistication,
  2. how you motivate the technology in question, and
  3. how you evaluate the technology in question.

Some suggestions for the project are given below.

  1. Experiment with neural network grapheme-to-phoneme conversion, cf. HW2:
    1. Study the effects of morphological features or segmentations (e.g., in French or German)
    2. Address outstanding issues in WikiPron
    3. Compare LSTM and transformer models across several languages, cf. Gorman et al. (2020)
    4. Focus on one language and perform a detailed error analysis, cf. Ashby et al. (2021)
  2. Build a finite-state morphological analyzer
  3. Experiment with morphological generation, cf. HW3:
    1. Study the effects of inherent features (e.g., gender and animacy in Polish, gender in German, aspect in Russian)
    2. Compare LSTM and transformer models across several languages
    3. Focus on one language and perform a detailed error analysis
    4. Something to do with "unnamed morphological abstractness project"
  4. Conduct an ambitious grid, random, or black-box hyperparameter search for HW2 or HW3
  5. Extend FairSeq for:
    1. POS tagging
    2. Named entity recognition
    3. Morphological analysis/lemmatization
    4. Text classification
  6. Experiment with machine translation using FairSeq and data from the WMT shared tasks
  7. Experiment with neural network homograph disambiguation using data from Gorman et al. (2018)
  8. Experiment with neural network abbreviation expansion using data from Gorman et al. (2021)
  9. Build a speech recognizer using Kaldi and data from OpenSLR
  10. Use speech recognition to perform a sociolinguistics experiment
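For the hyperparameter tuning suggestion (item 4 above), random search can be sketched as follows. Everything here is hypothetical: the search space, the hyperparameter names, and the `objective` callable, which stands in for whatever routine trains a model with a given configuration and returns its development-set error rate.

```python
import random

# Hypothetical search space; replace with the hyperparameters of your model.
SEARCH_SPACE = {
    "hidden_size": [128, 256, 512],
    "dropout": [0.1, 0.2, 0.3],
    "learning_rate": [1e-4, 5e-4, 1e-3],
}


def sample_config(rng):
    """Draws one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(objective, n_trials, seed=0):
    """Returns the sampled configuration with the lowest objective value.

    `objective` is assumed to map a configuration dictionary to a
    development-set error rate (lower is better).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Grid search would instead enumerate every combination in the space (e.g., with `itertools.product`), which grows quickly as hyperparameters are added.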

If you're pursuing an idea not on the list above, you are strongly encouraged to submit a brief abstract to Kyle (either over email or at office hours) describing your concept. This will allow Kyle to ensure the project is both feasible and of appropriate scope.