POLS 559: Text as Data
John Wilkerson (firstname.lastname@example.org)
Class meets TH 1:30-4:20, Savery 164
Office Hours: M (1-3 or by appt), Smith 221C
This class introduces computational approaches for collecting, preparing and analyzing text as data. It is not a programming class but you will need to do some programming to complete assignments. The main goal of the class is to survey computational approaches that have essentially the same objectives as other quantitative studies – counting, scaling and grouping. Like any quantitative project, validity – are we really capturing what we are trying to measure? – is of central concern.
We begin by considering the goals and practice of non-computational content analysis. We then work through the stages of a typical text as data project - getting text, converting it to data, and analysis. The analysis component covers several approaches commonly used in the social sciences.
Finally, the main benefit of computational methods is the ability to scale up an analysis. Each participant designs and executes an original and ambitious project using the methods covered in the course.
There are no required books for this class. Many books on data science have been published recently and they are worth reading. However, none closely matches how this class is taught (there is no book about Quanteda, for example). Many of the books listed here are also available on-line.
The Coding Manual for Qualitative Researchers, by Saldaña (we’ll read a chapter but this is a very helpful book)
R for Data Science, by Wickham and Grolemund (Wickham is a leading R developer)
Data Visualization for Social Science, by Healy (how to present results using R)
Natural Language Processing with Python, by Bird, Klein, and Loper
Learning Python, by Lutz (If you are really interested in the Python programming language, this is the intro.)
Arguably the most important resource for learning a programming language is the internet. Most answers to your questions can be found with a Google search. There are also many helpful on-line tools.
- Participation (20%) – Class attendance and contributions to in class discussions and activities. Readings are to be completed by the listed date.
- Homework (30%) – Assignments are due Tuesday evening on Catalyst unless otherwise noted.
- Research Project (50%) – This is where you demonstrate what you have learned. We will be talking about potential projects right away. My office door is open! Proposal (Feb 14); Draft (Mar 5); Final project (Mar 15)