Corpus-linguistic text processing with R

Stefan Th. Gries (UC Santa Barbara & JLU Giessen)

This course will introduce basic corpus-linguistic analysis using the open-source programming language R. It presupposes that you have recent versions of R and RStudio installed and some general knowledge of R, in particular of data structures (especially vectors), functions, and arguments. The course will consist of two parts:

Part 1: two introductory/preparatory sessions
– 20 April 2021, 10:30 – 12:00 MDT: General programming aspects of R (incl. lists, loops, conditionals, and input/output)
– 27 April 2021, 10:30 – 12:00 MDT: Text/character string processing (esp. with regular expressions)

After each part, you should read the relevant sections in the course’s textbook: Gries, Stefan Th. 2016. Quantitative corpus linguistics with R. 2nd rev. & ext. ed. New York & London: Routledge, pp. 274.

Part 2: four corpus-linguistic applications
– 11 May 2021, 10:30 – 12:00 MDT: Elementary frequency lists
– 13 May 2021, 10:30 – 12:00 MDT: Computing dispersion measures
– 18 May 2021, 10:30 – 12:00 MDT: Concordancing and collocation
– 20 May 2021, 10:30 – 12:00 MDT: More ‘advanced’ aspects

Note: About three days before each course session, you will get an email with a small preparatory task that should facilitate the presentation of code in the session itself.

CCP

Centre for Comparative Psycholinguistics
