A few words on corpus linguistics part 2

by Ron Carter

Part 2 of 2

In the second part of this two-part blog entry, Prof. Ronald Carter of the University of Nottingham looks in more detail at the kind of information corpora can reveal about the use of language and why this is so important for the development of language teaching materials.

A few words on corpus linguistics

Part 1 of 2 

by Ron Carter

In the first of a two-part blog entry, Prof. Ronald Carter of the University of Nottingham provides a brief introduction to corpora and corpus linguistics, exploring ways in which corpora are currently being used to inform language teaching and the development of teaching materials.

What is a corpus?

corpus noun (plural corpuses or corpora) the collection of a single writer’s work or of writing about a particular subject, or a large amount of written and sometimes spoken material collected to show the state of a language

Cambridge Advanced Learner’s Dictionary, Third Edition (2008). Cambridge: Cambridge University Press.

Many corpora these days run to millions of words. The British National Corpus (BNC), for example, consists of 100 million words of English. The written part (90%) includes newspapers, magazines, journals, books, letters, memos and essays, among other sources. The spoken part (10%) includes conversations, recorded in a way that achieves a demographic balance, as well as a range of spoken language from business or government meetings, radio shows, phone-ins and similar settings. These large collections of text are stored and read electronically, allowing researchers to employ a variety of software to reveal different patterns of language that exist within the corpus.
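For readers curious about what such software does, here is a minimal sketch in Python of two basic operations: counting how often each word occurs, and listing a word with its surrounding context (a concordance). The sample text is invented for illustration, and the function names `tokenize` and `concordance` are my own rather than those of any particular corpus tool. A real corpus would be read from files and would run to millions of words.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def concordance(tokens, keyword, width=3):
    """Return each occurrence of keyword with up to `width` tokens either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample text standing in for a corpus (an assumption for this sketch).
sample = (
    "I see what you mean. You see, the thing is, we never see them. "
    "See you later."
)

tokens = tokenize(sample)
frequencies = Counter(tokens)

# The most frequent words, then every occurrence of "see" in context.
print(frequencies.most_common(3))
for line in concordance(tokens, "see"):
    print(line)
```

Even on a few sentences, the concordance lines place the different uses of "see" side by side, which is the kind of pattern that becomes far more revealing across millions of words.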
