Aki-Juhani Kyröläinen (McMaster and Brock)

Wednesday May 19, 10:30am – 3:00pm MST
Friday May 21, 10:30am – 3:00pm MST
There will be a 30-minute break in the middle.

Free-form texts offer a unique perspective on how language use can reflect aspects of cognition. At the same time, the quantitative analysis of free-form texts presents multiple challenges for psycholinguistic research. In this course, participants will familiarize themselves with tools for preprocessing texts, including tokenization, lemmatization, and syntactic parsing based on Universal Dependencies. These preprocessing steps open up opportunities for further quantitative analysis of free-form texts. During the course, we will carry out quantitative semantic analysis of free-form texts using structural topic modeling. Finally, we will use topic modeling to generate features from the texts and combine them with machine learning. This allows us to model individual-level variables commonly used in psycholinguistic research, such as level of education, perceived loneliness, and the frequency of memory failures.

Course prerequisites: Participants should be comfortable using R independently and performing linguistic analysis of language data. The course will also make use of tidyverse packages, so prior knowledge of them is a bonus.