Paul Thompson, Applied Linguistics (p.a.thompson@reading.ac.uk)
Corpora: Resources for the study of language
160 lectures, 39 seminars
Transcripts, video and audio
199 XML files:
  Transcripts with detailed annotation
  Metadata included in header
160 lecture transcripts are tagged for part-of-speech (POS)
www.reading.ac.uk/AcaDepts/ll/base_corpus/
Funded by AHRB, Euralex, BALEAP and university sources
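As a rough illustration of what keeping metadata in the header of each XML file makes possible, the sketch below reads a header and counts the words in a transcript using only the Python standard library. The element names (`transcript`, `header`, `title`, `eventType`, `body`, `u`) and the sample text are hypothetical, invented for this example; the actual markup of the corpus files may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: all element and attribute names are invented
# for illustration and are not taken from the real corpus markup.
SAMPLE = """<transcript id="demo001">
  <header>
    <title>Sample lecture</title>
    <eventType>lecture</eventType>
  </header>
  <body>
    <u who="speaker1">today we will look at corpora</u>
    <u who="speaker1">and how they are built</u>
  </body>
</transcript>"""


def read_header(xml_text):
    """Return the header fields of one transcript as a dict."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    meta = {child.tag: (child.text or "").strip() for child in header}
    meta["id"] = root.get("id")
    return meta


def count_words(xml_text):
    """Count whitespace-separated tokens across all utterance elements."""
    root = ET.fromstring(xml_text)
    return sum(len((u.text or "").split()) for u in root.iter("u"))


meta = read_header(SAMPLE)
n_words = count_words(SAMPLE)
print(meta, n_words)
```

With the metadata in a predictable place, the same two functions could be run over every file in a directory to build a summary of the collection by event type.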
A corpus of assessed student writing at university level
Texts collected at the Universities of Warwick and Reading and at Oxford Brookes University
Funded by the Economic and Social Research Council (ESRC), grant RES-000-23-0800
6.5 million words
2,896 texts
2,761 assignments
XML files, POS-tagged
30+ disciplines
4 levels of study
Query interface: Sketch Engine
Commercial service: Applied Linguistics pays an annual subscription
| Level | Raw | Rel % |
|-------|-----|-------|
| 3     | 225 | 121.7 |
| 2     | 275 | 107.7 |
| 1     | 255 | 96.0  |
| PG    | 66  | 62.1  |
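The slide does not say how the Rel % column was derived. One plausible reading, assumed here, is that it compares the rate of hits per word within each level against the rate in the corpus as a whole, so that 100 means "as frequent as in the corpus overall". The sketch below shows that calculation on toy numbers; the function name, the category labels and all counts and sizes are hypothetical and are not the figures behind the table.

```python
def relative_percent(hits, sizes):
    """Relative frequency of each category against the whole corpus, in percent.

    hits  : dict mapping category -> raw number of hits
    sizes : dict mapping category -> number of words in that category
    A value above 100 means the item is more frequent in that category
    than in the corpus overall (under the assumption stated above).
    """
    overall_rate = sum(hits.values()) / sum(sizes.values())
    return {
        cat: round(100 * (hits[cat] / sizes[cat]) / overall_rate, 1)
        for cat in hits
    }


# Toy data, invented for illustration only
toy_hits = {"A": 30, "B": 10}
toy_sizes = {"A": 1000, "B": 1000}

result = relative_percent(toy_hits, toy_sizes)
print(result)
```

The point of normalising in this way is that raw counts alone can mislead: a level with fewer hits may simply contain fewer words.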
BASE: Linking audio and video to the transcripts, either online or on hard drives
Insertion of timestamp data into transcripts
  Example
Why?
  Access to temporal, spatial, paralinguistic, phonological information
  Studies of speech rate, for example
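To make the speech-rate point concrete, here is a minimal sketch of how timestamps stored alongside transcript text could be turned into a words-per-minute figure. The `Segment` structure (start time, end time, text) and the sample values are assumptions made for illustration; they are not the timestamp format proposed for BASE.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str     # transcribed words for this stretch of speech


def words_per_minute(segments):
    """Average speech rate over the timed segments, in words per minute."""
    words = sum(len(s.text.split()) for s in segments)
    seconds = sum(s.end - s.start for s in segments)
    if seconds <= 0:
        raise ValueError("segments must cover a positive duration")
    return words * 60 / seconds


# Hypothetical timed segments, invented for illustration
segments = [
    Segment(0.0, 4.0, "so today we are going to look at"),
    Segment(4.0, 6.0, "two spoken corpora"),
]

rate = words_per_minute(segments)
print(rate)
```

Because only the time inside the segments is summed, pauses between segments are excluded; whether to count them is an analytical choice that a real study would need to make explicit.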
Comparison between languages
Historical linguistics
Stylistics
Studies of language in use
Specialised language use [e.g., doctor-patient interactions]
Investigations of multimodality
PhD thesis corpus
  Electronic submission
Academic speech events
  Seminars, tutorials, etc.
Student use of computers in preparing assignments [video and text]
Reading and writing of undergraduates
Hosting corpus resources at Reading or another university, preferably on Linux servers, with customisable interfaces
  BASE, BAWE, and other corpora that Reading possesses
  For use by all departments at Reading and also elsewhere
  Varied levels of user access
Centralised support needed: lack of continuity with project staff