Stefan Th. Gries (UC Santa Barbara & JLU Giessen)
This course introduces basic corpus-linguistic analysis using the open-source programming language R. It presupposes that you have recent versions of R and RStudio installed and some general knowledge of R at this level (hyperlink to be provided later). The course consists of three parts:
Part 1: one introductory/preparatory session
- 12 June 2023, 08:30-10:15: text/character string processing (esp. with regular expressions)
Part 2: basic corpus-linguistic applications
- 12 June 2023, 10:30-12:00: frequency lists
- 12 June 2023, 13:00-14:45: dispersion measures
- 12 June 2023, 15:00-16:45: concordancing and collocation
Part 3: showcasing diverse applications
- 13 June 2023, 08:30-10:15: more than one dimension of information
- 13 June 2023, 10:30-12:00: combining the above with other functions/packages
Data and other files will be made available on my website.
Registration is capped at 25 students.