by Ron Carter
Part 2 of 2
In the second of this two-part blog entry, Prof. Ronald Carter of the University of Nottingham looks in more detail at the kind of information corpora can reveal about the use of language and why this is so important for the development of language teaching materials.
Here is a list of the top twenty most frequent three-word chunks from the Cambridge and Nottingham Business English Corpus (CANBEC) compared with a corpus of spoken academic English (ACAD). The numbers are occurrences per million words. The items in bold are discussed below.
|Rank||CANBEC||per m||Rank||Spoken ACAD||per m|
|1||I don’t know||642||1||a lot of||477|
|2||a lot of||563||2||I don’t know||469|
|3||at the moment||485||3||one of the||442|
|4||we need to||438||4||you can see||364|
|5||I don’t think||378||5||this is a||358|
|6||the end of||376||6||you have to||343|
|7||in terms of||243||7||this is the||338|
|8||a bit of||241||8||in terms of||300|
|9||be able to||237||9||a sort of||297|
|10||at the end||235||10||there is a||276|
|11||end of the||230||11||and this is||271|
|12||and I think||229||12||look at the||268|
|13||I think it’s||229||13||the end of||265|
|14||to do it||223||14||the sort of||265|
|15||we have to||208||15||at the end||253|
|16||have a look||196||16||you want to||253|
|17||I think we||194||17||you know the||250|
|18||you know the||192||18||do you think||247|
|19||a couple of||187||19||to do with||247|
|20||we’ve got a||184||20||and so on||239|
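As a rough illustration of how per-million figures like these can be produced, here is a minimal Python sketch. It is not the method used to build CANBEC or ACAD: the function names (`tokenize`, `chunk_rates`) and the sample text are hypothetical, and the tokenisation is deliberately simple. It also lets chunks run across sentence and speaker boundaries, which a real corpus tool would normally prevent.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Word-internal apostrophes (straight or curly) are kept,
    so "don't" counts as a single token.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def chunk_rates(text, n=3, top=20):
    """Return the `top` most frequent n-word chunks with their
    frequency normalised to occurrences per million words."""
    tokens = tokenize(text)
    if not tokens:
        return []
    # Slide a window of n tokens across the text and count each chunk.
    chunks = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(chunks)
    # Normalise raw counts by corpus size: count / total words * 1,000,000.
    scale = 1_000_000 / len(tokens)
    return [(" ".join(chunk), count * scale)
            for chunk, count in counts.most_common(top)]


# Hypothetical two-sentence sample, standing in for a real corpus.
sample = "I don't know if we need to do it. I don't know what we need to say."
for chunk, rate in chunk_rates(sample, top=2):
    print(f"{chunk}: {rate:.0f} per million")
```

Normalising to a common base of one million words is what makes the two columns comparable even though the corpora differ in size. On a sample this small the rates are meaninglessly large; the calculation only becomes informative on corpora of hundreds of thousands of words or more.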
I don’t know is high in both corpora, and in both cases it is frequently followed by reporting clauses beginning with if or a wh-word. A lot of, a couple of and sort of, all rather vague expressions, are also evident in both (though not all shown in the table), as is the specifying expression in terms of. So we have a mix of specific and vague expressions but overall there is more vagueness. The CANBEC list has four chunks involving think, perhaps reflecting the constant speculating and hedging in business negotiations. And I don’t know is often used when beginning to explore possibilities, allowing us not to reveal what we do know too openly. Vague language also allows us to hedge our bets when we speak with one another.
Both corpora have chunks that refer to looking at things (i.e. considering things), with ACAD also including you can see, a structure which mirrors the more direct one-to-one instruction typical of many academic contexts. CANBEC has a high occurrence of at the moment, perhaps suggesting the constant flux and change in business situations. The CANBEC list also brings together the high-frequency key words we and need (we need to at no. 4). This reflects the high incidence of statements of collective goals in spoken business English (mirroring the corporate mantra there’s no I in team), for need is often used in business requests and directives when we don’t want to sound too forceful.
We in CANBEC carries a wide range of references, from very broad corporate references to smaller, group references and to the individual speaker, who may use it to shelter behind corporate authority or responsibility or to avoid embarrassment for those present in, say, a meeting. The following examples illustrate this:
We need to have a close look at it and then you’re gonna check it over aren’t you?
We need to revisit that really.
We need to approach them to see if we can get the price down.
We need to figure it out about the server.
Here is a stretch of conversation from the corpus which illustrates one use of we as a reference to the individual speaker (with all names and places anonymised, of course).
[Meeting between a multinational car manufacturer and a British hydraulics company. They are discussing product development.]
Speaker 3: I mean ultimat… ultimately it’s your decision whether you want a…
Speaker 1: True. But er …
Speaker 3: a hard blow fuse if you like or a a resettable fuse.
Speaker 1: You’re right. But the thing is I mean we need to know what your rationale is. And if you say ‘We prefer to have a resettable one because we we know this is a problem’ then it will help Nigel to make that decision you see.
The relevance of corpora data
We have so far only scratched the surface of what a corpus can reveal, but this may give you a flavour of the possibilities open to course book, materials and dictionary writers. Clearly there are times when frequency lists and hard quantitative evidence of patterns of language from multi-million-word databases are valuable; and there are other times when it is important to look at language more qualitatively, exploring longer stretches of continuous language from the corpus. There are times when it helps for examples to be made up for the purposes of graded learning; and there are other times when it helps for there to be examples of real English. Any examples drawn from the CANBEC database will be authentic, produced in real contexts of use, and the comparison between different corpora allows the different linguistic fingerprints of different registers to emerge.
A corpus allows us to look at the language a little more objectively and to give evidence for our judgements and intuitions. Clearly, CANBEC needs to be supplemented by written business corpora from the Cambridge English Corpus, but this corpus both helps us understand better the differences and distinctions between spoken and written versions of the language and gives us insights and information that go beyond what intuition and conventional knowledge about the language can tell us.
Some text for this blog has been extracted from: O’Keeffe, A., McCarthy, M. and Carter, R. (2007) From Corpus to Classroom: Language use and language teaching. Cambridge: Cambridge University Press. For further reading on CANBEC see Handford, M. (2011) The Language of Business Meetings. Cambridge: Cambridge University Press.
Prof. Ronald Carter, School of English Studies, University of Nottingham
12 thoughts on “A few words on corpus linguistics part 2”
What strikes me, as an American, is the fact that the words and phrases that appear in these messages are very common on this side of the Atlantic as well; there is nothing distinctly “British” here. (The Queen might claim copyright on the pseudo-royal “we,” but I’ve heard it far too often from minor corporate executives.) You are clearly discussing world English.
Perhaps this reflects the dominance of American voices in popular media (film, TV, hip hop, etc.). There’s also the fact that American academics tend to dominate fields related to business management. (I’ve heard this from an educator in Curaçao.) Then there’s the fact that increased international travel and communications (those phone banks in India and the Philippines, for instance) are erasing regional differences, much as regional accents have faded (but not disappeared) in the US.
The UK, of course, is brilliant at protecting regional accents. I once visited Durham and had no idea what most people were saying. I’m sure, however, that they would quickly revert to your list in a business environment.
I have a new word:
(Tals)…..a technologically advanced living species navigating the ufos and including mission control…these species are a select group of individuals.
the tals have a great jump start in this Darwinian universe …..
alien…….its slang……meaning…..anything not from earth….alien microbe ……
extraterrestrial …..its slang….too long….meaning……anything not from earth…..
give the technologically advanced living species navigating the ufos a name thank you. Be advised that we are already using the word in the field…..
Being not just in criminal law, but a heavy hitter, you want to tell me about corpus. Let’s make a fun duel for the members if you are sure enough of Y O U lol
could be fun and I am sure the general membership took much away from your good but rather drab assistance.
God Bless – Keep Tryin
Chomsky sez corpora ar USELESS –
‘innit’ is ‘authentic’ – the definition of authenticity is suspect.
‘er’ is the sound in ERic.
why’r pepl insensibl?
And why can’t pepl say ‘one’ ie WUN
when they southernise or americanise accent ?
i do now beacuse so many covid viruses?
Do your systems track whether or not a word or phrase was used correctly, and should it count if it was not? And at what point (what frequency, I suppose, or maybe how many similarly incorrect uses) would a particular use come to be considered acceptable?
I’ve been thinking about this a lot lately, as I’ve spent a fair amount of this year of isolation contributing to and editing a blog that’s been developed over the last five or so years. Previous major contributors were American, British, and Brazilian, with varying levels of education in English (and with varying levels of translation services) — and I believe that they’re mostly younger than I am, as well (my assessment, based on their use of the language). It’s been by turns fascinating, somewhat horrifying, amusing, confusing, and — most importantly — time consuming.
I just wish I could do well with my vocabulary
Question: Are all these posted responses above mine responding to the blog post : “A few words on corpus linguistics part 2”? I am confused by some of them. 🙂 lol
English is an interesting language but yes, any language will have two or more meanings so it is very rich. Thanks for this article useful for me and everyone.