Multi-modal corpus design, construction and use
David Evans, Dawn Knight, Ronald Carter and Svenja Adolphs
BAAL 2007
6-8th September 2007, The University of Edinburgh
Introducing the Digital Record Project:
• 3-year research initiative, funded by the Economic and Social Research Council (ESRC)
• Part of an e-Social Science ‘Node’ based at The University of Nottingham
• Interdisciplinary project, involving staff from Psychology, Applied Linguistics and Computer Science
• Develop a multi-modal corpus of spoken interaction: the Nottingham Multi-Modal Corpus (NMMC)
The Nottingham Multi-Modal Corpus (NMMC):
Corpus data:
• 250,000 words
  • 125,000 words of 1-party data
  • 125,000 words of 2-party data
• Data in three different modes: textual, audio and video
Corpus tool-bench:
• Develop a reusable corpus tool (with appropriate linguistic software)
• Search lexical, prosodic and gestural features of spoken discourse
Key Methodological Issues:
1) Data collection and collation:
   Capturing, transcribing and aligning, and adding gesture to transcription
2) Tracking, defining and coding gestures of interest:
   Using specifically developed software to track and automatically encode gestures according to a pre-defined kinesic coding scheme
3) Representing the data in an easy-to-use interface for further analysis:
   Constructing an intelligent corpus database and associated software (including a text/gesture concordancer)
1a) Capturing data
Naturalistic data v. Usable video image
1b) Transcribing and aligning data
• All data is transcribed using CANCODE transcription conventions.
• Data is also time-stamped using Transana, linking the textual and audio streams:
¤<139851><$1> But if it's if it's utterly irrelevant then you're alright.
¤<143459><$2> Right.
¤<143793><$1> Do you see what I mean cos cos you're not there's no interfering factor then.
¤<147602><$2> Yeah so s=
¤<148138><$1> Erm so that sounds like it's okay.
¤<150144><$2> Okay.
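The time-stamped lines above can be read back into structured records with a few lines of code. The following is a minimal sketch, not part of the project's software: it assumes that the number after `¤` is a millisecond offset from the start of the recording and that the `$n` tag identifies the speaker. The names `Utterance` and `parse_transcript` are hypothetical.

```python
import re
from typing import NamedTuple


class Utterance(NamedTuple):
    start_ms: int  # time code; assumed to be milliseconds from the start
    speaker: str   # speaker tag as written in the transcript, e.g. "$1"
    text: str


# One utterance: ¤<time><$speaker> text, running up to the next ¤ or end of input.
UTTERANCE_RE = re.compile(r"¤<(\d+)><(\$\d+)>\s*([^¤]*)")


def parse_transcript(raw: str) -> list[Utterance]:
    """Split a time-stamped transcript into one record per utterance."""
    utterances = []
    for time_code, speaker, text in UTTERANCE_RE.findall(raw):
        # Collapse line breaks inside an utterance into single spaces.
        utterances.append(Utterance(int(time_code), speaker, " ".join(text.split())))
    return utterances


sample = (
    "¤<139851><$1> But if it's if it's utterly irrelevant then you're alright.\n"
    "¤<143459><$2> Right.¤<143793><$1> Do you see what I mean cos cos\n"
    "you're not there's no interfering factor then.\n"
)

for u in parse_transcript(sample):
    print(f"{u.start_ms / 1000:8.3f}s {u.speaker}: {u.text}")
```

Because the pattern runs up to the next `¤` rather than to the end of the line, it copes with utterances that share a line or wrap across lines.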
1c) Adding gesture to transcription
• Do you see what I mean cos cos you're not there's [no interfering factor] then.
(gesture phases: onset, stroke, retraction)
2a) Defining gestures of interest for codification
Figure 2: Division of the gesture space for transcription purposes. (From McNeill, 1992: 378)
2b) Coding gestures of interest
Figure 3: Computer image tracking applied to video
We have developed a 4-point coding scheme for hand movement:
1) Left hand moves to the left
2) Left hand moves to the right
3) Right hand moves to the left
4) Right hand moves to the right
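The scheme above could be applied to tracked hand positions roughly as follows. This is an illustrative sketch rather than the project's tracker: it assumes that the tracker yields one horizontal pixel coordinate per hand per video frame, that x increases towards the right of the image, and that a hypothetical `threshold` (in pixels) separates real movement from jitter.

```python
from typing import Optional

# Codes follow the numbering of the 4-point scheme listed above.
CODES = {
    ("left", "left"): 1,    # left hand moves to the left
    ("left", "right"): 2,   # left hand moves to the right
    ("right", "left"): 3,   # right hand moves to the left
    ("right", "right"): 4,  # right hand moves to the right
}


def code_movement(hand: str, x_prev: float, x_curr: float,
                  threshold: float = 5.0) -> Optional[int]:
    """Code one hand's movement between two frames.

    Returns None when the horizontal displacement is below the
    (hypothetical) threshold and so does not count as movement.
    """
    dx = x_curr - x_prev
    if abs(dx) < threshold:
        return None
    direction = "right" if dx > 0 else "left"
    return CODES[(hand, direction)]


def code_track(hand: str, xs: list[float], threshold: float = 5.0) -> list[Optional[int]]:
    """Code every pair of consecutive frames in a track of x-coordinates."""
    return [code_movement(hand, a, b, threshold) for a, b in zip(xs, xs[1:])]


# Invented x-coordinates for the left hand over four frames.
left_x = [100, 92, 91, 103]
print(code_track("left", left_x))  # [1, None, 2]
```

Whether "left" and "right" are defined from the camera's or the speaker's point of view is not stated on the slide, so the sign convention here is an assumption that would need checking against the tracker.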
2c) Turning raw data to corpus data
Figure 4: An Excel output generated by the tracker
3a) Requirements for MM corpus representation
Figure 5: Defining ‘3rd Generation’ corpus software (legend: 2nd Generation, 3rd Generation)
3b) Current shortcomings of corpus software
• Current tools tend to focus either on the management of data or upon the processes of coding and annotating previously collected data (examples include Transana, Anvil, NITE XML Workbench, ELAN)
• There does not appear to be a tool available to support the integration of these individual processes, supporting the research process from:
The ‘Record’ Phase
Organising Records
Analysing Data
Defining and Coding Data
3c) Introducing DRS: Basic user information
• The DRS (formerly ReplayTool) enables the replay and annotation of large quantities of time-based data.
• It allows for the simultaneous synchronized replay of multiple data sources, including videos, system log files and spatial data.
• Beyond the replay and annotation of such data sets, the DRS will also enable users to perform tasks on their data files that help organise their data sets.
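The idea of synchronized replay can be illustrated by merging several time-stamped streams into a single timeline. The sketch below is not how the DRS is implemented. It assumes only that each source is already sorted by time and that all sources share one clock in milliseconds. The `Event` type, the `synchronised` function and the gesture events are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    time_ms: int  # offset on a clock shared by all sources (assumed)
    source: str   # which data stream the event came from
    payload: str


def synchronised(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge several time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.time_ms)


transcript = [
    Event(139851, "transcript", "<$1> But if it's if it's utterly irrelevant then you're alright."),
    Event(143459, "transcript", "<$2> Right."),
]
# Invented gesture events, for illustration only.
gestures = [
    Event(140200, "gesture", "code 4"),
    Event(143500, "gesture", "code 1"),
]

for event in synchronised(transcript, gestures):
    print(event.time_ms, event.source, event.payload)
```

`heapq.merge` is lazy, so the streams are consumed step by step rather than loaded and sorted as a whole, which suits long recordings.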
3d) DRS: A real-time demo
Demonstrating the basic corpus tool-bench interface, for the representation and replay of individual sets of encoded data, and the concordance tool that has been developed as part of the tool to enable detailed linguistic enquiry:
http://www.mrl.nott.ac.uk/research/projects/dress/software/DRS/webstart/drs.jnlp
http://www.mrl.nott.ac.uk/research/projects/dress/software/DRS/replaytool.zip
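For readers unfamiliar with concordancing, the core of a keyword-in-context (KWIC) search over transcript text can be sketched in a few lines. This is a generic illustration, not the DRS concordancer: the `concordance` function and its `width` parameter are hypothetical, and tokenisation is a naive split on whitespace.

```python
def concordance(utterances, keyword, width=3):
    """Return (left context, hit, right context) for each match of keyword."""
    hits = []
    for utterance in utterances:
        tokens = utterance.split()
        for i, token in enumerate(tokens):
            # Compare case-insensitively, ignoring trailing punctuation.
            if token.lower().strip(".,?!") == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


utterances = [
    "But if it's if it's utterly irrelevant then you're alright.",
    "Do you see what I mean cos cos you're not there's no interfering factor then.",
]

for left, hit, right in concordance(utterances, "then"):
    print(f"{left:>30} | {hit} | {right}")
```

A text/gesture concordancer would additionally need each hit to carry its time code, so that the matching stretch of audio and video could be located.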
3f) Outlining ethical issues and concerns
• Defining ‘consent’
• Anonymisation in textual, audio and video data: the limitations of pixellisation
• Re-use and distribution problems
Contacts:
David Evans
Dawn Knight
Ronald Carter
Svenja Adolphs