Stefan Th. Gries (UC Santa Barbara & JLU Giessen)

This course will introduce basic corpus-linguistic analysis using the open-source programming language R. It presupposes that you have recent versions of R and RStudio installed and that you have some general knowledge of R at this level (hyperlink to be provided later). The course will consist of three parts:

Part 1: one introductory/preparatory session

  • 12 June 2023, 08:30-10:15: text/character string processing (esp. with regular expressions)

Part 2: basic corpus-linguistic applications

  • 12 June 2023, 10:30-12:00: frequency lists
  • 12 June 2023, 13:00-14:45: dispersion measures
  • 12 June 2023, 15:00-16:45: concordancing and collocation

Part 3: showcasing diverse applications

  • 13 June 2023, 08:30-10:15: more than one dimension of information
  • 13 June 2023, 10:30-12:00: combining the above with other functions/packages

Data and other files will be made available on my website.

Registration is capped at 25 students.