corpus linguistics as a research method · outline what a corpus is why we use corpora in...

45
Corpus linguistics as a research method 18 th Seminar on 12 th April 2016 Institute for Research in Digital Culture and Humanities Open University of Hong Kong Althea Ha Centre for Applied English Studies The University of Hong Kong

Upload: others

Post on 14-May-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Corpus linguistics

as a research method 18th Seminar on 12th April 2016

Institute for Research in Digital Culture and Humanities

Open University of Hong Kong

Althea Ha

Centre for Applied English Studies

The University of Hong Kong

Page 2: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Outline

What a corpus is

Why we use corpora in linguistic research

Different types of corpora

Considerations when using/building a corpus

Text analytical tools

A corpus-based lexical study – Academic Word List (Coxhead, 2000)

What corpus linguistics is 2 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 3: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

What a corpus is

A corpus is “a collection of pieces of language text in electronic form” (Sinclair, 2004, p. 19).

Text is “natural language used for communication, whether it is realised in speech or writing” (Biber & Conrad, 2009, p. 5).

3 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 4: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Why corpora in linguistic research

different aspects of linguistics such as lexis, registers, and genres

discipline-specific vocabulary

written vs. spoken

academic writing

new constructions in English

new words / phrases

new senses attached to existing words

4 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 5: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Why corpora in linguistic research

But how?

intuition?

existing prescriptive rules?

a number of authentic texts?

5 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 6: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Why corpora in linguistic research

A corpus is “a more reliable guide to language use than native speaker intuition is” (Hunston, 2002, p. 20).

“It is hard to imagine any area of vocabulary research into acquisition, processing, pedagogy, or assessment where the insights available from corpus analysis would not be valuable” (Schmitt, 2010, p.307).

6 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 7: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Why corpora in linguistic research

“Corpora and EAP are perfect

companions…corpora bring evidence of

typical patterning and salient features to

the study of academic discourse, providing

data which represent a speaker’s

experience of language in a restricted

domain” (Hyland, 2009, p. 317).

7 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 8: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Why corpora in linguistic research

corpora electronically processed in text

analytical tools

useful statistical information such as

number of word types, frequency, co-

occurrences

comparison between corpora

8 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 9: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Different types of corpora

general English

spoken English

national varieties of English

academic English

languages other than English

parallel

monolingual

(Schmitt, 2010) 9 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 10: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Major corpora available to date

British Naitonal Corpus (BNC) (100 million tokens)

Collins WordBanks Online (WordBanks) (553 million tokens)

Longman Corpus Network (330 million tokens)

Cambridge English Corpus (1.5 billion tokens)

Cambridge and Nottingham Corpus of Discourse in English (CANCODE)

Cambridge and Nottingham Spoken Business English (CANBEC)

Cambridge Corpus of Business English

Cambridge Corpus of Financial English

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 10

Page 11: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Considerations in using/building a

corpus Matching corpus data with the research

purpose is the most crucial consideration in

the design of a corpus ( Koester, 2009;

McEnery & Hardie, 2012).

11 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 12: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Purpose of your research

Criteria of the target corpus

Survey of existing corpora

Existing/self-built corpus

Categories of texts in the existing corpus

Structure and size of the self-built corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 12

Considerations in using/building a

corpus

Page 13: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Mode

Type

Domain

Language(s)/ Language varieties

Location

Date

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 13

Sinclair’s (2004) criteria

Page 14: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

What corpus to use

survey of existing corpora

accessible?

free / at a cost?

categorisation of texts

read descriptions and guidelines

built-in tools in the website

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 14

Page 15: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Sinclair’s (2004) steps towards a

representative corpus structural criteria principal corpus components;

available text types for each component

rank the text types in terms of importance

estimate the size of the corpus

number of text types

number of texts

total number of words

compare the self-built corpus with the original planned corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 15

Page 16: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

no perfect corpus

even though it is huge and carefully designed

never have exactly the same characteristics as

the language itself

as representative as possible

(Sinclair, 2004)

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 16

Caveat

Page 17: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Text analytical tools

17 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Existing web-based corpora

built-in tools in the website

search concordances of a word

search collocates of a word

search and compare the frequency of words and phrases in different genres

Self-built corpora

WordSmith Tools (Scott, 2012)

Range (Heatley, Nation, & Coxhead, 2002)

Page 18: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Text analytical tools

18 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Web-based mega corpus

Wordbanks at http://www.collins.co.uk/page/Wordbanks+Online

Web-based text analytical tool

VocabProfilers at http://www.lextutor.ca/vp/eng/

Desktop application

WordSmith Tools 6.0 (Scott, 2012)

Page 19: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 19

Page 20: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 20

Page 21: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 21

Page 22: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 22

Page 23: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 23

Page 24: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 24

Page 25: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 25

Page 26: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Text analytical tools

26 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 27: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

VocabProfilers

27 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 28: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Text analytical tools

28 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 29: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 29

Page 30: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 30

Page 31: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 31

Page 32: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Research purpose

to develop and evaluate a new academic word list

Factors considered in building the Academic

Corpus

Representation

Organization

Size

Word selection

32 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 33: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Representation

not only textbooks, but also a range of academic texts

158 journal articles (print)

51 edited journal articles (online)

43 complete university textbooks or course books

42 texts from the Learned and Scientific section of the Wellington Corpus of Written English (Bauer, 1993)

etc.

33 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 34: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Organization

4 disciplines

arts, commerce, law, science

28 subject areas

34 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 35: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Size

3.5 million running words

so as to identify 100 occurrences of a word

family

Coxhead referred to the data from Brown

Corpus (Francis & Kucera, 1982)

35 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 36: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000) Cohead, 2000, p. 220

36 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 37: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Word selection

What a word is

morphologically different words (e.g. –s and –

ed)

word types word families

“[a] word family was defined as a stem plus all

closely related affixed forms…” (Coxhead,

2000, p. 218)

37 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 38: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Taking “analyse” as an example

regular inflections

analysed, analysing, analyses

derivations

analyser, analysers, analysis, analyst, analysts,

analytic, analytical, analytically

American spelling

analyze, analyzed, analyzes, analyzing

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 38

Page 39: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Methods

Range (Heatley & Nation, 1996)

Criteria for a member of a word family

Specialised Occurrence

excluding 2000 most frequent words

Range

occurs at least 10 times in each discipline

occurs in 15 or more subject areas (out of 28)

Frequency

occurs at least 100 times in the Academic Corpus

39 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 40: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Academic Word List (Coxhead, 2000)

Results

570 word families

12% word coverage for commerce

9.3% word coverage for arts

9.4% word coverage for law

9.1% word coverage for science

Average 10% word coverage for academic texts

40 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 41: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Corpus linguistics

“Corpus linguistics is a research approach

that has developed over the past several

decades to support empirical investigations

of language variation and use, resulting in

research findings that have much greater

generalizability and validity than would

otherwise be feasible” (Biber, Reppen, &

Friginal, 2010, p. 548).

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 41

Page 42: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Corpus linguistics Corpus linguistics involves “dealing with some

set of machine-readable texts which is deemed an appropriate basis on which to study a specific set of research questions” (McEnery & Hardie, 2012, p. 1).

a methodological approach rather than a model of language (Biber, Conrad, & Reppen, 1998, p. 4)

42 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 43: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Corpus linguistics it is empirical, analyzing the actual patterns of use in

natural texts;

it utilizes a large and principled collection of natural texts, known as a corpus as the basis for analysis;

it makes extensive use of computers for analysis, employing both automatic and interactive techniques;

it depends on both quantitative and qualitative analytical techniques.

(Biber, Conrad, & Reppen, 1998, p. 4)

43 OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method

Page 44: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

List of References Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6(3), 1–27.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.

Biber, D. E., Reppen, R., & Friginal, E. (2010). Research in corpus linguistics. In R. B. Kaplan (Ed.), The Oxford Handbook of Applied Linguistics

(pp. 548-570). New York: Oxford University Press.

Cheng, W. (2012). Exploring corpus linguistics: Language in action. New York: Routledge.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.

Heatley, A., Nation, I.S.P. and Coxhead, A. (2002). Range and Frequency programmes. http://www.victoria.ac.nz/lals/about/staff/paul-nation

Hyland, K. (2009). Corpora and EAP: Specificity in disciplinary discourses. In Goźdź-Roszkowski (Ed.), Explorations across languages and corpora

(pp. 317-334). Frankfurt am Main: Peter Lang.

Koester, A. (2010). Building small specialised corpora. In A. O’Keeffe and M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp.

66-79). London: Routledge.

McEnery, T. & Hardie, A. (2012). Corpus linguistics. Cambridge: Cambridge University Press.

Schmitt, N. (2010). Research vocabulary: A vocabulary research manual. Basingstoke: Palgrave Macmillan.

Scott, M. (2012). WordSmith tools version 6. Liverpool: Lexical Analysis Software Ltd.

Sinclair, J. M. (2004). Corpus and text: Basic principles. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice. Oxford:

Oxbow Books. Retrieved from http://ota.ahds.ac.uk/documents/creating/dlc/chapter1.htm#section3

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 44

Page 45: Corpus linguistics as a research method · Outline What a corpus is Why we use corpora in linguistic research Different types of corpora Considerations when using/building a corpus

Thank you for your attention.

OUHK - RIDCH 18th Seminar (April 2016) - Corpus linguistics as a research method 45