This class introduces the statistical analysis of data from various areas of linguistics research. Topics covered include probability theory, descriptive statistics, non-parametric statistics, simple parametric tests, randomization techniques, linear regression, logistic regression, and mixed-effects models.
Students will learn to use the R statistical environment and a wide variety of methods for statistical inference, to follow best practices for reporting statistical results, and to critique statistical methods commonly used in the linguistics literature.
Readings will be assigned from two sources: Analyzing Linguistic Data: A Practical Introduction to Statistics using R (Baayen 2008) and Quantitative Methods in Linguistics (Johnson 2008). Readings are intended to provide additional background and detail for the lectures; students may therefore complete them either before or after the associated lecture, as they prefer.
Students are encouraged to bring a laptop to the practicum and are welcome to bring one to the lecture as well. Students are also invited to make use of the Computational Linguistics Laboratory (7400.13) for practice and assignments.
Some assignments will be "pencil-and-paper" exercises. Others will require the student to turn in valid R code to solve some given problem. Students will turn in assignments using GitHub Classroom.
Eighty percent of each student's grade will be derived from assignments and exams; the remaining 20% will be reserved for participation and attendance. Assignments must be submitted on time or will receive a grade of 0 (barring a documented emergency).
The instructor will attempt to provide all reasonable accommodations to students upon request. If you believe you are covered under the Americans With Disabilities Act, please direct accommodations requests to Matthew G. Schoengood, Vice President for Student Affairs.
Students are expected to attend all lectures and practica. The instructor reserves the right to tie grades to attendance records. The instructor and practicum leader are not responsible for reviewing materials missed due to absence.
In line with the Student Handbook policies on plagiarism, students are expected to complete their own work. The instructor reserves the right to refer violations of this policy to the Academic Integrity Officer.
For the sake of privacy, students are asked not to record lectures. Students are expected to be considerate of their peers and to treat them with respect during class discussions.
All dates are Wednesdays unless otherwise noted.
(Please note that this schedule is subject to change.)
Date | Assignment or exam | Topic | Materials | Readings
1/29 | | Experimental design; random variables | [Notes] | Baayen §1, Johnson §1
2/5 | | Notation; distributions; R | [Notes, Slides] | Johnson §2-2.3; Johnson 2014
2/12 | | Lincoln's Birthday (no class) | |
2/19 | | Null hypothesis significance testing; the binomial test | [Notes, Slides] | Baayen §3
2/26 | Homework 1 [solution] | One- and two-sample tests | [Slides] | Baayen §4-4.3.1; Johnson §3-3.1; Johnson §5-5.1
3/4 | Homework 2 [solution] | Effect size and power analysis | [Notes, Slides] |
3/11 | Homework 3 [solution] | Correlation analysis | [Audio, Slides] | Johnson §2.3
3/18 | | Instructional pause | |
3/25 | Homework 4 [solution] | Simple linear regression | [Slides] | Johnson §4-4.3
4/1 | Midterm exam [study guide, solution] | Linear regression 2 | [Slides] | Johnson §5.4-5.5.7
4/7 (Tuesday) | | Linear regression 3 | [Slides, Notes] |
4/8 | | Spring Recess (no class) | |
4/15 | | Spring Recess (no class) | |
4/22 | Homework 5 [solution] | Logistic regression | [Slides] |
4/29 | Homework 6 [solution] | Mixed effects regression 1 | [Slides] | Baayen §7
5/6 | Homework 7 [solution] | Mixed effects regression 2 | [Slides] | Johnson §4.4, §7.3-7.4
5/13 | Homework 8 [solution] | Data visualization; ggplot2 | [Slides] | Wickham 2009
5/15 (Friday) | | Reading Day | |
5/22 (Friday) | Final exam [study guide, solution] | | |