ELL237: Big Data: Language & Digital Corpora

Convener(s): Dr Gabriel Ozon


General Information

Data-oriented approaches characteristic of the digital humanities have benefitted from the empirical footing provided by corpus linguistics (CL) techniques, which allow researchers to harness big data to answer research questions related to language and communication. CL investigates databases containing millions or billions of words of actual spoken and written language. These databases are used (i) to map current usage, (ii) to verify hypotheses from various sub-disciplines, and (iii) to obtain direct answers to questions old and new, by applying state-of-the-art corpus technology. This module introduces the issues involved in using big data in linguistic studies. It builds on knowledge acquired in other modules (Structure of English, Varieties of English, History of English, Sociolinguistics). Students will be introduced to the notion of the linguistic corpus and will be exposed to various digital language databases currently in the public domain.


With a dual focus on ‘why’ and ‘how to’ in quantitative language studies, this practical module will be delivered through a series of lectures and hands-on lab sessions. Weekly teaching typically comprises two parts. The first is a lecture introducing key concepts, theories and data analysis skills (outcomes 1, 2, 3). The second, the lab session, is designed as a hands-on event aiming (i) to expose students to some of the available digital language databases and (ii) to familiarise them with tools and techniques for manipulating data. Students may be asked to replicate a case study (either the one illustrated in the theoretical component or a different one) and to address a set research question or hypothesis, which will require them to obtain and manipulate relevant data (outcomes 3, 4, 5, 6, 7).


The assessment consists of (i) a 1,500-word essay that critically reviews either a corpus exploration tool or a corpus-based study (40%); and (ii) a 2,500-word project report based on students’ independent research (60%).

Contact Details

0114 222 8478


Information last changed: Friday 09th of February 2018 :: 01:52:56 PM (GMT)

Please note: This module may or may not run in any individual session. Please check with the course convener.


The University of Sheffield, Western Bank, Sheffield, S10 2TN, UK