Gaëtanelle Gilquin

From written product to writing process: A new direction in learner corpus research

The field of learner corpus research seeks to investigate learner language through the analysis of authentic language data available in electronic format (see Granger et al. 2015). For writing, the data typically take the form of texts produced by learners in the target language. The International Corpus of Learner English (Granger et al. 2020), for example, one of the earliest written learner corpora, is made up of essays as they were submitted by the learners who wrote them. In other words, most learner corpora represent the written product, i.e. the text in its final stage.

Although the writing process, i.e. the different steps leading to the final product, has been considered in L2 writing studies since the end of the 1970s (Matsuda 2003), “the influence of corpora in the study of writing processes has been limited” (Wärnsby et al. 2016: 198). Recently, however, some learner corpora have been compiled that give a glimpse of what the learner writing process may look like. This is the case of corpora that include several drafts of the same text, such as the Hanken Corpus of Academic Written English for Economics (Mäkinen & Hiltunen 2016) or the CityU Corpus of Essay Drafts of English Language Learners (Lee et al. 2015). This is also the case of the Marburg Corpus of Intermediate Learner English (Kreyer 2015), which shows some traces of revision in handwritten texts (deletions or insertions kept visible in the manuscripts).

Another learner corpus that gives access to the writing process, in addition to the written product, is the Process Corpus of English in Education (PROCEED; Gilquin 2022). This corpus comprises finished texts, like most learner corpora, but for each text it also includes a screencast video, recorded by means of OBS Studio (Jim & OBS Studio contributors 2012), which shows the screen activity from the beginning to the end of the writing task, as well as a keystroke log file, recorded by means of Inputlog (Leijten & Van Waes 2013), which represents all the keys struck on the keyboard at any point during the whole writing process.

It will be shown how such learner corpora can provide new insights into learner writing, making it possible to investigate aspects such as linearity in text production, writing fluency or the use of online resources. We will also see how the writing process can help explain certain features of the written product, and how the two could be jointly exploited in the description or assessment of L2 writing. More generally, it will be argued that learner corpus research integrating the process dimension can refine our views of learner language in unprecedented ways.


  • Gilquin, G. 2022. The Process Corpus of English in Education: Going beyond the written text. Research in Corpus Linguistics 10(1): 31-44.
  • Granger, S., M. Dupont, F. Meunier, H. Naets & M. Paquot. 2020. The International Corpus of Learner English. Version 3. Louvain-la-Neuve: Presses universitaires de Louvain.
  • Granger, S., G. Gilquin & F. Meunier (eds). 2015. The Cambridge Handbook of Learner Corpus Research. Cambridge: Cambridge University Press.
  • Jim & OBS Studio contributors. 2012. OBS Studio,
  • Kreyer, R. 2015. The Marburg Corpus of Intermediate Learner English (MILE). In M. Callies & S. Götz (eds) Learner Corpora in Language Testing and Assessment (pp. 13-34). Amsterdam: John Benjamins.
  • Lee, J., C. Y. Yeung, A. Zeldes, M. Reznicek, A. Lüdeling & J. Webster. 2015. CityU Corpus of Essay Drafts of English Language Learners: A corpus of textual revision in second language writing. Language Resources and Evaluation 49(3): 659-683.
  • Leijten, M. & L. Van Waes. 2013. Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication 30(3): 358-392.
  • Mäkinen, M. & T. Hiltunen. 2016. Creating a corpus of student writing in economics: Structure and representativeness. In M. J. López-Couso, B. Méndez-Naya, P. Núñez-Pertejo & I. M. Palacios-Martínez (eds) Corpus Linguistics on the Move: Exploring and Understanding English through Corpora (pp. 41-58). Leiden: Brill.
  • Matsuda, P. K. 2003. Second language writing in the twentieth century: A situated historical perspective. In B. Kroll (ed.) Exploring the Dynamics of Second Language Writing (pp. 15-34). Cambridge: Cambridge University Press.
  • Wärnsby, A., A. Kauppinen, A. Eriksson, M. Wiktorsson, E. Bick & L.-J. Olsson. 2016. Building interdisciplinary bridges. MUCH: The Malmö University-Chalmers Corpus of Academic Writing as a Process. In O. Timofeeva, A.-C. Gardner, A. Honkapohja & S. Chevalier (eds) New Approaches to English Linguistics: Building Bridges (pp. 197-211). Amsterdam: John Benjamins.

Gaëtanelle Gilquin is a Professor of English Language and Linguistics at the University of Louvain, Belgium, and a member of the Centre for English Corpus Linguistics. Her research interests include learner corpus research and cognitive linguistics. She is the author of Corpus, Cognition and Causative Constructions (2010), and one of the editors of Linking up Contrastive and Learner Corpus Research (2008), A Taste for Corpora (2011), The Cambridge Handbook of Learner Corpus Research (2015) and Applied Construction Grammar (2016). She is a co-founding member of the Learner Corpus Association and the coordinator of several corpus projects: the Louvain International Database of Spoken English Interlanguage (LINDSEI), the New Englishes Student Interviews corpus (NESSI) and the Process Corpus of English in Education (PROCEED). She is also the co-editor-in-chief of the book series Corpora and Language in Use and an associate editor of the Cambridge Elements in Corpus Linguistics series.