Part 1 of 2
by Ron Carter
In the first of a two-part blog entry, Prof. Ronald Carter of the University of Nottingham provides a brief introduction to corpora and corpus linguistics, exploring ways in which corpora are currently being used to inform language teaching and the development of teaching materials.
What is a corpus?
corpus noun (plural corpuses or corpora) the collection of a single writer’s work or of writing about a particular subject, or a large amount of written and sometimes spoken material collected to show the state of a language
Cambridge Advanced Learner’s Dictionary Third Edition (2008) Cambridge: Cambridge University Press
Many corpora these days run to millions of words. The British National Corpus (BNC), for example, consists of 100 million words of English. Its written part (90%) includes newspapers, magazines, journals, books, letters, memos, essays, etc. Its spoken part (10%) includes conversations, recorded in a way that achieves a demographic balance, as well as a range of spoken language from business or government meetings, radio shows, phone-ins, etc. These large collections of text are stored and read electronically, allowing researchers to use a variety of software to reveal different patterns of language that exist within the corpus.
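To give a rough sense of what such software does, here is a minimal sketch in Python of two common corpus queries: a word-frequency count and a simple concordance (each occurrence of a word shown with its neighbouring words). The sample text and the function names (tokenize, word_frequencies, concordance) are invented for illustration; they are not taken from the BNC or from any particular corpus tool, and real tools are considerably more sophisticated.

```python
import re
from collections import Counter

# Toy stand-in for a corpus (invented for this example); a real corpus
# such as the BNC holds millions of words.
SAMPLE_TEXT = (
    "The meeting starts at nine. The radio show starts at ten. "
    "She starts the meeting with a short letter."
)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def concordance(tokens, keyword, width=2):
    """List each occurrence of keyword with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


tokens = tokenize(SAMPLE_TEXT)
freqs = word_frequencies(tokens)

# Most frequent words in the sample
print(freqs.most_common(3))

# Every occurrence of "starts" with two words of context either side
for line in concordance(tokens, "starts"):
    print(line)
```

Even on three sentences, the concordance lines show "starts" followed by "at" plus a time in two of its three occurrences. Recurring patterns of this kind are what researchers look for across a full-sized corpus.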