Natural Language Processing, CS322, SP19
- Instructor: Jack Hessel; jhessel@carleton.edu (CMC 324, Office Hours: Tu 11:30AM-12:30PM, F 3PM-4:30PM, and by appointment)
- Place/time: CMC 301; 2a (M/W 9:50-11AM, F 9:40-10:40AM)
- Textbook: “Speech and Language Processing” by Jurafsky and Martin (3rd edition: available online here; contact me if you’re having any trouble accessing)
- Class Piazza
- Syllabus
Description:
Natural languages (e.g., Chinese, English) enable humans to communicate, but, for the better part of history, computers have been left out of the conversation. Enabling machines to understand language is the goal of Natural Language Processing (NLP), and achieving this goal (or coming close) offers immense promise in fields like human-computer interaction, computational social science, and medicine (among others). However, language understanding is quite difficult: spoken/written language often encodes complex factors beyond literal meaning. In fact, NLP is so hard that it is sometimes called “AI-complete,” i.e., if you could build a machine that truly understands language, you could build a machine with intellectual capacity equal to that of a human.
This course will cover several topics in Natural Language Processing, with a particular focus on statistical methods that learn patterns automatically from corpora (in contrast to methods that rely on hand-designed features and rules). Topics will include language modeling, supervised learning with bag-of-words inputs, lexical/vector semantics, feed-forward neural networks (for language modeling), and recurrent neural networks (for sequence tagging).
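As a toy preview of the count-based statistical methods the course starts with (and that HW1 builds on), here is a minimal bigram language model sketch. The corpus, tokenization, and unsmoothed maximum-likelihood estimates are illustrative only; real language models need far more data and smoothing (J+M ch. 3 covers both).

```python
from collections import Counter

# A toy, pre-tokenized corpus; a real assignment would use a much larger one.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and unigrams.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 1 of the 4 "the" tokens is followed by "cat" -> 0.25
```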
Calendar
| Week | In-Class | Reading (for next lecture) | Assignments Out | Assignments Due |
|---|---|---|---|---|
| Week 1 (April 1) | Welcome; Why NLP?; slides | “I’m sorry Dave, I’m afraid I can’t do that…” Lee 2004; J+M 2.2, 2.3, 2.4 intro, 2.4.2-2.4.6 | | |
| | Statistical/Rule-based Discussion; Tokenization | J+M 3.intro, 3.1, 3.2 | HW1: N-Gram Language Models; and the data you need for HW1 | Group preferences; Getting to know you |
| | Tokenization, Stemming, Lemmatizing; Probability Refresher; Demo, Bayes Rule Video | J+M 3.3, 3.4 | | |
| Week 2 (April 8) | Language models | J+M 3.5, 3.6 | Groups for project assigned | |
| | Language models | None! Optionally, just for fun, you can listen to this podcast | | |
| | Language models + Whirlwind tour of project topics | J+M 4.intro, 4.1, 4.2, 4.3, 4.4 | | |
| Week 3 (April 15) | Language Models Wrapup + Text Classification | J+M 4.5, 4.6, 4.7 | HW2: Sentiment Classification and the data you need for HW2 | HW1, Project Topic Preferences |
| | Text Classification | J+M 4.8, 4.9; Read over/follow along in a terminal with Justin Johnson’s python/numpy/scipy intro | | |
| | Logistic Regression | J+M 5.intro, 5.1 | | |
| Week 4 (April 22) | Logistic Regression | J+M 5.2, 5.3, 5.4 | | Projects: Proposals |
| | Logistic Regression | J+M 5.5, 5.6, 5.7 | | |
| | Logistic Regression sklearn demo | J+M 6.intro, 6.1, 6.2, 6.3, 6.4 | | |
| Week 5 (April 29) | Neural Networks intro, Keras Demo | Tensorflow/keras introduction | | HW2 |
| | Classification Wrapup, Lexical and Vector Semantics | J+M 6.5, 6.6, 6.7; linear algebra cheatsheet; optionally, the linear SVM section of Wikipedia | HW3: Vector Semantics, and the data you need for HW3 | |
| | Lexical and Vector Semantics, Midterm Evaluations | J+M 6.8 – 6.12; Kirk Baker’s Truncated SVD; optional linear algebra review | | |
| Week 6 (May 6) | **Midterm Break** | | | |
| | Lexical and Vector Semantics | J+M 7.intro – 7.3 | | |
| | word embeddings from SVD demo, Neural Networks for Language Modeling, Exam Topics | J+M 7.4 – 7.6 | | Projects: Progress Report |
| Week 7 (May 13) | **Midterm Exam** | No reading! :) | | |
| | Neural Networks for Language Modeling | Review or catch up on J+M ch. 7 | | |
| | Neural Networks for Language Modeling; word2vec in keras demo | None! | | HW3 |
| Week 8 (May 20) | Project Updates from Teams Dependency Parse, Duplicate Questions; Recurrent Nets | J+M 8.1-8.3; J+M 9.intro, 9.1 | HW4: Neural Language Models; data for HW4 | |
| | Project Updates from Teams Paraphrase, Summarization, Constituency, Named Entity; Recurrent Nets | J+M 9.2, 9.3 (and review 9.1 and 9.intro) | | |
| | Project Updates from Teams Question Answering, Coreference, Inference; Recurrent Nets | J+M 9.4, 9.5 | | |
| Week 9 (May 27) | RNNs for sequence processing | “There is a Blind Spot in AI Research”: Crawford and Calo 2016; Daumé’s 2016 Proposal for NLP/ML ethics code; Optional: NLP for hackers blogpost about POS tagging in keras | | |
| | POS tagging wrapup, Ethics in Machine Learning: “can” versus “should” | Scan abstracts of (or read fully, at your discretion) current “hot” papers (and some that I just think are cool): GLUE Benchmarks; BERT; Multitask Extension of BERT; BERT rediscovers NLP pipeline | | |
| | A look ahead: where is NLP heading?; Ask Me Anything | None! Thanks for a great term, y’all!! | | HW4 |
| Week 10 (June 3) | Group Presentations | | | |
| | Guest Lecture from Zachary Levonian | | | Projects: Final Writeups (+code, etc.) at 5PM! |
Projects
The final project for this class is a research/review project about an NLP task of your choosing. Aside from researching the task (its history, its practical importance, etc.), you will: 1) download/explore a real dataset researchers use to build/evaluate models for your task; 2) implement two baselines for the task, and measure their performance; 3) measure the performance of a 3rd party API on your task; and 4) read and summarize a real NLP research paper.
NLP Tasks (and some starting-point links)
- Named Entity Recognition
- Automatic Text Summarization
- Coreference Resolution
- Dependency Parsing
- Constituency Parsing
- Natural Language Inference
- Paraphrase Detection, and data link
- Question Answering
- Duplicate Question Detection
Proposal
Your proposal should:
- Specify which task your group has been assigned, and what is interesting about it.
- Highlight datasets of interest — which dataset are you going to use? How are you going to get it?
- Talk about evaluation — how do researchers measure performance in your setting? How will you implement these performance metrics?
- Discuss what steps you will take before the in-person check-in.
- Highlight roadblocks you’ve encountered — or expect to encounter.
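The evaluation metrics bullet above is usually the easiest part to get started on: most tasks report some combination of accuracy, precision, recall, and F1, and these take only a few lines to implement yourselves. A minimal sketch (the function name and toy labels here are mine, not part of the assignment):

```python
def precision_recall_f1(gold, pred, positive=1):
    """Precision, recall, and F1 for one class, from parallel label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
print(precision_recall_f1(gold, pred))  # (0.666..., 0.666..., 0.666...)
```

Once you have a hand-rolled version, it's a good sanity check to compare it against sklearn's built-in metrics on the same labels.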
Code (will be turned in with the final writeup)
While different groups will have different code formats, your code should:
- Include a README, with instructions for how to run it (including what commands do what, and what datasets need to be downloaded).
- Load and preprocess the corpus of interest
- Have support for printing dataset statistics (e.g., number of documents, vocab size, etc.)
- Have support for computing standard evaluation metrics on your corpus of interest, e.g., accuracy, precision.
- Implement a very simple baseline for your task. For example, a constant prediction baseline would be a good candidate.
- Implement a smarter method for your task that you come up with. Your implementation should use a subset of: python, numpy, scipy, sklearn, and keras (i.e., here, you shouldn’t import some python library that accomplishes your specific task for you automatically in a single call). The only requirement is that this method is “smarter” than the very simple baseline; I don’t expect you to achieve state-of-the-art results! Good examples of slightly smarter methods are: n-gram overlap baselines, hand-designed rules, logistic regression, etc.
- Implement code that calls an off-the-shelf tool for your task/dataset. In NLP, it’s quite useful to know when to write your own code versus call someone else’s code. Thus, you will write code that calls an existing API to accomplish your task, and see how much better/worse it performs than your method.
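To make the "very simple baseline" requirement concrete, here is one possible shape for the constant-prediction baseline mentioned above, for a classification-style task: always predict the most frequent training label, then score the predictions. The labels and helper names are illustrative; adapt the idea to your own task's inputs and metrics.

```python
from collections import Counter

def majority_baseline(train_labels, test_size):
    """Constant-prediction baseline: always output the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * test_size

def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

train = ["pos", "pos", "neg", "pos"]          # toy training labels
test_gold = ["neg", "pos", "pos"]             # toy test labels
pred = majority_baseline(train, len(test_gold))
print(accuracy(test_gold, pred))  # 2 of the 3 test labels are "pos"
```

A baseline this simple is still informative: if your smarter method can't beat it, that usually signals a bug or a class-imbalanced dataset.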
Off-the-shelf tools for each task
Note: If you find a tool that you’d rather use elsewhere online, you are free to use it — just make sure to check with me first.
Note 2: I expect (and understand) that these resources vary somewhat in how easy they are to run (e.g., some have a relatively simple API, and others require messing around with code on GitHub).
- Named Entity Recognition
- Automatic Text Summarization
- Coreference Resolution
- Dependency Parsing
- Constituency Parsing
- Natural Language Inference
- Paraphrase Detection
- Question Answering
- Duplicate Question Detection
Progress Report
The goal of the progress report is to ensure that both you and I have a realistic expectation of how much work you will be able to do for the rest of the term. Your progress report should be typeset in LaTeX, and use the Association for Computational Linguistics template. The easiest way to access this template is to use Overleaf and share the project link between group members. Your progress report should contain the following sections, which purposefully mirror the sections of the final writeup:
- An introduction: what is your problem, why is it interesting?
- A related work section: each member in your group individually will read a research paper and write a 3-paragraph summary of that paper (what question does it tackle?; at a high-level, what methods does it use?; what are the results and conclusions?). The paper should tackle your group’s task (though it doesn’t need to be on exactly the same dataset). I expect this to be difficult, and that’s okay (in fact — it’s part of the point)! At the time of the progress report, you should have at least decided who will read what paper.
- A dataset section: describe the dataset you selected, and the statistics of the corpus.
- An evaluation section: what evaluation metrics are used for your task? How will you implement them, if you haven’t yet?
- An experiment section for your baseline: what simple baseline did you select? Have you run it on your dataset yet?
- An experiment section for your slightly-better-than-baseline: what method did you choose that improves performance over the baseline? Have you run it on your dataset yet?
- An experiment section for the external API’s performance: what external API have you chosen? Will it be easy to run this 3rd party code on your dataset? What roadblocks have you encountered?
I will be providing in-person feedback on your progress reports in the form of 15-minute individual group meetings: your entire group is required to attend this meeting. More information will be provided closer to the due date of the progress report.
Update Presentation
Towards the end of the term (weeks 7-8), one group will give a 5-minute update presentation at the beginning of class each day. The goal of these presentations is to explain your task to your classmates, highlight the progress you’ve made thus far, and describe some of the difficulties you’ve encountered.
Final Presentation
The final two days of the course are reserved for 12-15 minute group presentations. In addition to re-introducing your task to the group and explaining why it is cool/interesting/useful, you will present your final results, including your evaluations of your simple baseline, your slightly-smarter baseline, and the off-the-shelf tool.
Writeup
Your final writeups should be typeset in LaTeX, and use the Association for Computational Linguistics template. It is okay to have textual overlap with your progress report — in fact, that is the intention of making the sections (mostly) mirror each other. Your writeup should include:
- An introduction: what is your problem, why is it interesting?
- A related work section: each member in your group should read and write a summary of a research paper that tackles your group’s task.
- A dataset section: describe the dataset you selected, and the statistics of the corpus.
- An evaluation section: what evaluation metrics are used for your task? How did you implement them?
- An experiment section for your baseline: what simple baseline did you select, and how does it perform according to the metrics described in the evaluation section?
- An experiment section for your slightly-better-than-baseline: what method did you choose that improves performance over the baseline? How does it perform?
- An experiment section for the external API’s performance: how well does the off-the-shelf tool perform for this task?
- Shortcomings and Future Work: What did you aim to accomplish that you did not? What would you like to do in the future with this task?
- Conclusion: A summary of your main findings.
Additional Resources
- How to install python packages via pip. In addition, all of the lab machines should have all of the libraries you need for your homeworks (let me know if they do not!). If you’re having trouble installing, please talk to me.
- nltk; a natural language processing toolkit for python
- Justin Johnson’s introduction to python/numpy.
- Maria Antoniak’s list of resources for machine learning and data science.
- tqdm: a loading bar library for python.