Corpus linguistics: What is corpus linguistics?


Upload: destiny-espinoza

Post on 30-Dec-2015



Page 1: Corpus linguistics: What is corpus linguistics?

Corpus linguistics

What is corpus linguistics?

• Corpus linguistics uses large collections of spoken or written natural texts that are stored on computers.

• One of the major contributions of corpus linguistics is in exploring patterns of language use.
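As a rough illustration of what exploring patterns of language use can look like in practice, the sketch below counts word frequencies and prints keyword-in-context lines for one word. It is a minimal example, not a method taken from the slides: the sample text is invented, the tokenizer is deliberately crude, and the function names `tokenize` and `concordance` are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Toy sample text, invented for illustration only.
sample = (
    "The corpus is large. A large corpus shows how words are used. "
    "Researchers search the corpus for patterns."
)

tokens = tokenize(sample)
freq = Counter(tokens)

# Most frequent words, then each occurrence of "corpus" with its neighbours.
print(freq.most_common(3))
for line in concordance(tokens, "corpus"):
    print(line)
```

On a real corpus the same two views, a frequency list and a concordance, are usually the starting point for spotting which words occur often and which words they tend to occur with.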

Page 2

Corpus design and compilation:

A corpus is a large, principled collection of texts stored in electronic format.

There is no minimum size for a text collection to be considered a corpus; an early standard size, set by the creators of the Brown Corpus, was one million words.

Page 3

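Types of corpora

There are many corpora, such as:

1: LOB corpus (Lancaster-Oslo/Bergen Corpus)

2: COCA corpus (Corpus of Contemporary American English)

3: BNC corpus (British National Corpus)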

Page 4

Issues in corpus design:

One of the most important factors in corpus linguistics is the design of the corpus.

The composition of a corpus reflects the anticipated research goals.

A corpus used to explore lexical questions needs to be very large, so that it accurately represents a large number of words and the different senses, or meanings, that each word might have.
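One way to see why size matters for lexical questions is to check how many distinct words in a sample occur only once; a word seen once cannot show its different senses. The sketch below computes that share for a short, invented sentence. The function name `hapax_ratio` is chosen for this example only, and the tokenizer is a simplification.

```python
import re
from collections import Counter


def hapax_ratio(text):
    """Share of distinct word types that occur exactly once in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return 0.0
    once = sum(1 for c in counts.values() if c == 1)
    return once / len(counts)


# Invented example: "bank" appears twice, with two different senses.
sample = "the bank of the river and the bank that lends money"
print(hapax_ratio(sample))
```

In this toy sentence only "bank" and "the" repeat, so most word types give a single example at best; a much larger collection is needed before each sense of a word appears often enough to study.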

Page 5

Corpus compilation:

When creating a corpus, data collection involves obtaining or creating electronic versions of the target texts, then storing and organizing them.

Written corpora are far less labour-intensive to collect than spoken corpora.

Data collection for a written corpus may mean using a scanner and optical character recognition (OCR) software to scan paper documents into electronic text files.
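Once texts exist as electronic files, the storing and organizing step can be as simple as keeping them in one folder with an index. The sketch below assumes the texts are already plain `.txt` files (the scanning and OCR step is not shown); the folder layout, the JSON index, and the names `build_index` and `save_index` are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def build_index(corpus_dir):
    """Scan corpus_dir for .txt files and record the name and word count of each."""
    index = []
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        index.append({"file": path.name, "words": len(text.split())})
    return index


def save_index(index, out_path):
    """Write the index as JSON so the collection can be inspected later."""
    Path(out_path).write_text(json.dumps(index, indent=2), encoding="utf-8")


# Demo on a temporary folder with two invented text files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "letter01.txt").write_text("Dear editor, I write to you", encoding="utf-8")
    Path(tmp, "news01.txt").write_text("The council met on Monday", encoding="utf-8")
    demo_index = build_index(tmp)
    save_index(demo_index, Path(tmp, "index.json"))
    print(demo_index)
    print("total words:", sum(entry["words"] for entry in demo_index))
```

Keeping a running word total in the index also makes it easy to see how far the collection is from a target size.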

Page 6

Markup and annotation:

A simple corpus consists of raw text, with no additional information about the origin, authors, speakers, structure or content of the texts themselves.

Encoding some of this information in markup makes the corpus much richer and more useful, especially to researchers who were not involved in its compilation.

Structural markup refers to the use of codes in the text to identify structural features of the text
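A small example may make structural markup more concrete. The sketch below marks a heading and two paragraphs with XML-style tags and then reads that structure back with Python's standard XML parser. The tag names (`text`, `head`, `p`) and the `id` value are invented for illustration and are not taken from any particular corpus or encoding standard.

```python
import xml.etree.ElementTree as ET

# Invented sample: tags identify the text, its heading, and its paragraphs.
marked_up = """
<text id="sample01">
  <head>Corpus design</head>
  <p>A corpus is a principled collection of texts.</p>
  <p>Its composition reflects the research goals.</p>
</text>
"""

root = ET.fromstring(marked_up.strip())

# The markup lets a program tell the heading apart from the body paragraphs.
title = root.findtext("head")
paragraphs = [p.text for p in root.findall("p")]

print("text id:", root.get("id"))
print("heading:", title)
print("paragraphs:", len(paragraphs))
```

Without the tags, the same three lines would be plain text, and a later user of the corpus could not reliably tell which line was the heading.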