Page 1: Data Collection in Nespole!

Data Collection in Nespole!

Goals, procedures and tools

Susanne Burger (Carnegie Mellon University)
Erica Costantini (University of Trieste)

Recent Advances in Speech Translation Systems

Page 2: Data Collection in Nespole!

Why data collection?

New Idea

Learning by Data

Speech Material:
• Domain, concept, vocabulary
• Style (human-machine conversation)
• Quality (robustness)
• ...

System Information (Dry Run):
• Stability
• Speed
• Bugs
• ...

Information about Users:
• Acceptance
• Usage
• Behavior
• Wish-list
• Problem solving
• ...

J.T. Hackos, J.C. Redish, User and Task Analysis for Interface Design, J. Wiley & Sons, 1998.

Page 3: Data Collection in Nespole!

Learning by Data

1 Mass data from scratch

Artificial scenario / environment / setup
Wizard of Oz

Cooperative user / actor

Data collection through use of a beta system with increasing realism

2 User-study data

Analysis • Development • Training • Testing • Evaluation

Beta system

Page 4: Data Collection in Nespole!

Data Collection: Planning

Who are the “Data Customers”?
Nespole!:
• ASR
• MT
• Synthesis
• Interface Development
• ...

Type of Collection?
Nespole!:
• Mass Data Collection
• Specific features
• User study

Customer Needs?
Nespole!:
• Audio / Video
• Transcription (levels of transcription)
• Segmentation

Time and Budget

Data Usage?
Nespole!:
• Analysis
• Development
• Training
• Testing
• Evaluation

Page 5: Data Collection in Nespole!

Mass-Data Collection: Showcase 1

Nespole! Data Collection

IDEA: NEgotiation through SPOken Language in E-commerce

Travel Scenario / H323 Setup
Monolingual

Cooperative Users

Travel + Multimodality
Beta System MT

Unseen Users

Multimodal Experiment

Nespole Showcase 1 System

Analysis • Development • Training • Testing • Evaluation

Page 6: Data Collection in Nespole!

Example: Mass-Data Collection (Showcase 1)
Monolingual data collection for system development

“Assembly Line”

Data Collection Procedure

• Scen./Topic
• Participants
• Environment
• Equipment
• Recording
• Data


Page 8: Data Collection in Nespole!

Scenarios

• Scenario: “story” about users, their work, their environment, how they do tasks, the tasks they need to do, and all combinations of these elements (*).

• Scenario in Nespole!: detailed description of:
– the customers’ features (age, marital status…);
– the destination of the travel;
– the objectives and preferences for the holiday (accommodation, sport activities, cultural events…)

J. M. Carroll, Ed., Scenario-Based Design: Envisioning Work and Technology in System Development, New York, J. Wiley & Sons, 1995.

Page 9: Data Collection in Nespole!

Scenarios

Scenarios in Nespole!

Showcase 1
1. winter holidays in Val di Fiemme
2. all-inclusive tourist package
3. summer vacation in a park
4. castle and lake tours
5. looking for folklore and brochures

Showcase 2a
All-inclusive tourist packages:
1. summer in a hotel or apartment
2. summer in a campsite
3. summer in a hotel or apartment for a family
4. summer in a campsite for a family
5. winter in a hotel or apartment

Showcase 2b
1. script 1: chest_pain_1
2. script 3: chest_pain_2
3. script 2: flu-like syndrome 1
4. script 4: flu-like syndrome 2
(versions 1 and 2 differ in personal data and symptom descriptions)

Page 10: Data Collection in Nespole!

Scenario example

Situation (Winter Holidays in Val di Fiemme):

• choose your vacation starting date (after December 10th) and how long you want to stay there (a weekend, 1 week, 2 weeks)

• you have 2 children (choose 2 ages between 2 and 11) and a wife/husband

• you want to travel by car and park it at the hotel

• you already know the road to Val di Fiemme

• you want accommodation in ** or *** hotels in Val di Fiemme with bed & breakfast

• choose two hotels among: Latemar in Molina, Bellavista in Cavalese, Excelsior in Cavalese, Lagorai in Cavalese, Belvedere in Panchia, Bellaria in Predazzo, Cimon in Predazzo, Erica in Tesero, Lucia in Tesero, Montanara in Ziano, Zanon in Ziano

• you want to practice a winter sport (choose your favorite winter sport among the following: downhill skiing, cross-country skiing/snowshoeing, ice skating, snowboarding)

Page 11: Data Collection in Nespole!

Scenario example

Things to ask for:

• prices and how far in advance to book

• types of ski-lifts nearby and their distance from the hotel

• existence of cross-country trails and ice skating areas

• details about favorite winter sport (exact location, prices, possibility of renting equipment)

• type of parking facilities for the car

• possibility of eating in the hotel and prices of dinner and late supper

• daycare and activities for children in the hotel

• special prices for children
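The scenario above leaves several choices open to the participant (start date, length of stay, children's ages, two hotels, one winter sport). As a rough illustration of that structure, the sketch below draws one possible instantiation at random. This is not part of the Nespole! procedure, where the participants made these choices themselves; the names `ScenarioInstance` and `draw_instance` and the default year are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

# Options copied from the "Winter Holidays in Val di Fiemme" scenario sheet.
HOTELS = [
    "Latemar in Molina", "Bellavista in Cavalese", "Excelsior in Cavalese",
    "Lagorai in Cavalese", "Belvedere in Panchia", "Bellaria in Predazzo",
    "Cimon in Predazzo", "Erica in Tesero", "Lucia in Tesero",
    "Montanara in Ziano", "Zanon in Ziano",
]
STAYS = ["a weekend", "1 week", "2 weeks"]
SPORTS = ["downhill skiing", "cross-country skiing/snowshoeing",
          "ice skating", "snowboarding"]


@dataclass
class ScenarioInstance:
    start_date: date
    stay: str
    child_ages: list
    hotels: list
    sport: str


def draw_instance(rng: random.Random, year: int = 2000) -> ScenarioInstance:
    """Fill in the open choices of the scenario (year is an assumption)."""
    # Any day strictly after December 10th, up to December 31st.
    start = date(year, 12, 10) + timedelta(days=rng.randint(1, 21))
    return ScenarioInstance(
        start_date=start,
        stay=rng.choice(STAYS),
        child_ages=sorted(rng.randint(2, 11) for _ in range(2)),
        hotels=rng.sample(HOTELS, 2),  # two distinct hotels
        sport=rng.choice(SPORTS),
    )


if __name__ == "__main__":
    print(draw_instance(random.Random(0)))
```

Keeping the fixed parts of the scenario (travel by car, ** or *** hotels, bed & breakfast) out of the data class reflects that they are the same for every participant.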

Page 12: Data Collection in Nespole!

Scenario definition in Nespole!

Example: Showcase 1

• analysis of 5000 e-mail messages (in four languages);

• clustering of the e-mails on the basis of the request type;

• selection of e-mails concerning requests that could be discussed in a phone call;

• construction of 21 scenarios;

• selection of 5 scenarios* among the 21 (done by the APT tourist board office manager)

* http://www.is.cs.cmu.edu/nespole/datacoll.html


Page 14: Data Collection in Nespole!

Participants

CUSTOMERS:

Language:                Fluent speaker
Age:                     Adults
Sex:                     We tried to balance M & F
Education:               University (students or more)
Knowledge in the field:  Half from speech labs and half from other labs or departments
Computer literacy:       Average-high
Recruitment:             Volunteers (invitation)
Reward:                  Non-paid
Other:                   Collaborative

AGENTS:

Italian professional agents working at Trentino tourist office APT


Page 16: Data Collection in Nespole!

Environment

• APT (agent’s site, Italy) records the English client via H323 connection and the Italian agent via headset
File .wav (stereo): H323 Eng. customer + Agent (local)

• CMU (client’s site, USA) records the Italian agent via H323 connection and the English client via headset
File .wav (stereo): H323 Agent + Eng. customer (local)


Page 18: Data Collection in Nespole!

Equipment

Hardware: PC Pentium 200 and up

Software: Windows NT or Win 98, Total Recorder, NetMeeting 3.01

Microphone: Headset or close microphone

Environment: Quiet office


Page 20: Data Collection in Nespole!

Recording Procedure (customer’s site)

Before the recording session
- Detailed background knowledge of the scenario
- Access to web pages
- On-line form (to learn more about the role)

During the recording session
- Signing a consent form and providing information about factors possibly affecting the spoken language
- Sitting in front of a computer, wearing a headset
- Pressing the call button in the NetMeeting window (when the customer feels ready)
- After 10 min the customer was urged to finish

Page 21: Data Collection in Nespole!

Recording: LTI Data Collection Database

Oracle database, accessible online, containing detailed information and descriptions about meetings recorded, demographics of the speakers, transcriptions and audio files

(currently two separate interfaces to enter data into and retrieve data from the database)


Page 23: Data Collection in Nespole!

(200 collected dialogues)

• 2 stereo wav files
• Spr protocol
• Rpr protocol
• video tapes

Page 24: Data Collection in Nespole!

File naming conventions

Why?
Confusion with parallel recordings; different types of files concerning the same recording; different languages, types of scenario, locations; stereo vs. mono files, etc.

Example from Nespole! file naming conventions

[dia_name] .[extension]

[language] [count] [scenario] [rec_location] [channel] .[extension]

language:      e = English, g = German, f = French, i = Italian
count:         000-999
scenario:      a = scen1, b = scen2, c = scen3, d = scen4, e = scen5
rec_location:  a = APT, g = Grenoble, i = IRST, k = Karlsruhe, p = Pittsburgh
channel:       1 = agent, 2 = customer
extension:     wav = audio, spr = speaker_info, rpr = recording protocol, trl = transcription, mar = time stamps
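A naming scheme like this can be decoded mechanically. The following is a minimal sketch, assuming the five fields are concatenated without separators and are all present; the function name `parse_name` and the example file name `e025ap1.wav` are made up for illustration and are not taken from the corpus.

```python
import re
from typing import NamedTuple

# Code tables transcribed from the slide.
LANGUAGES = {"e": "English", "g": "German", "f": "French", "i": "Italian"}
SCENARIOS = {"a": "scen1", "b": "scen2", "c": "scen3", "d": "scen4", "e": "scen5"}
LOCATIONS = {"a": "APT", "g": "Grenoble", "i": "IRST",
             "k": "Karlsruhe", "p": "Pittsburgh"}
CHANNELS = {"1": "agent", "2": "customer"}
EXTENSIONS = {"wav": "audio", "spr": "speaker_info",
              "rpr": "recording protocol", "trl": "transcription",
              "mar": "time stamps"}

# [language][count][scenario][rec_location][channel].[extension]
NAME_RE = re.compile(
    r"^([egfi])(\d{3})([a-e])([agikp])([12])\.(wav|spr|rpr|trl|mar)$"
)


class RecordingName(NamedTuple):
    language: str
    count: int
    scenario: str
    location: str
    channel: str
    kind: str


def parse_name(filename: str) -> RecordingName:
    """Decode a file name; raise ValueError if it does not fit the scheme."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid recording name: {filename!r}")
    lang, count, scen, loc, chan, ext = match.groups()
    return RecordingName(
        language=LANGUAGES[lang],
        count=int(count),
        scenario=SCENARIOS[scen],
        location=LOCATIONS[loc],
        channel=CHANNELS[chan],
        kind=EXTENSIONS[ext],
    )


if __name__ == "__main__":
    print(parse_name("e025ap1.wav"))  # hypothetical file name
```

Rejecting names that do not match, rather than guessing, is the point of such a check: it catches mislabeled parallel recordings before they enter the database.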

Page 25: Data Collection in Nespole!

Log data: recording protocol

FOR EACH DIALOGUE
• Recording Name
• Session No.
• Project Name
• Recording Type
• Recording Category
• Recording Topic
• Recording Description
• Recording Scenario
• Recording Date
• Recorded By
• Number of Speakers
• Number of Channels
• Comments

FOR EACH CHANNEL
• Coding (pcm, A-law, u-law)
• Number of bits (8/16)
• Byte order (little-endian, big-endian)
• Rate
• Mono or stereo
• Size in bytes
• Length in ms
• Medium type
• Medium brand
• Medium usage
• Medium ID
• Cable ID
• Mixer brand
• Mixer settings
• Channel
• Speakers
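Several of the per-channel fields (bits, rate, mono or stereo, size, length) can be read directly from a .wav header instead of being typed in by hand. The sketch below uses Python's standard `wave` module to do that; it is an illustration only, not the tool used in Nespole!, and the function name `channel_log` and the dictionary keys are assumptions.

```python
import wave


def channel_log(path: str) -> dict:
    """Collect the audio-format fields of the recording protocol from a WAV file."""
    with wave.open(path, "rb") as wav:
        n_channels = wav.getnchannels()
        sample_width = wav.getsampwidth()   # bytes per sample
        rate = wav.getframerate()           # frames per second
        n_frames = wav.getnframes()
        comp_type = wav.getcomptype()       # "NONE" means uncompressed PCM
    return {
        "coding": "pcm" if comp_type == "NONE" else comp_type,
        "bits": sample_width * 8,
        "byte_order": "little-endian",      # RIFF/WAV data is little-endian
        "rate": rate,
        "mono_or_stereo": "mono" if n_channels == 1 else "stereo",
        "size_in_bytes": n_frames * n_channels * sample_width,
        "length_ms": round(1000 * n_frames / rate),
    }
```

Fields such as medium, cable and mixer describe the physical setup and still have to be entered manually.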

Page 26: Data Collection in Nespole!

Log data: speaker protocol

MANDATORY DATA
• Native language
• Gender
• Date of birth
• Education
• Current occupation
• Area of residence during primary years of schooling (until age 12)

NON-MANDATORY DATA
• Last Name
• First Name
• Middle Initial
• User Name (only if applicable)
• Father's and Mother’s Native Language
• Accent/Dialect within Native Language (if any)
• Height (ft/in or cm)
• Weight (lbs or kg)
• Area of Birth
• Area of Longest Residence
• Right or Left Handed
• Smoker
• Medical Conditions which could Affect Speaker's Voice
• Speaker Comments
• Email Address
• Phone Number
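The split into mandatory and non-mandatory fields lends itself to a simple completeness check before a speaker record is stored. A minimal sketch follows, assuming records are plain dictionaries; the key names and the function `missing_mandatory` are hypothetical and do not reflect the actual database schema.

```python
# Mandatory fields of the speaker protocol (key names are assumptions).
MANDATORY = (
    "native_language",
    "gender",
    "date_of_birth",
    "education",
    "current_occupation",
    "primary_school_area",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory fields that are absent or empty in a speaker record."""
    return [field for field in MANDATORY if record.get(field) in (None, "")]


if __name__ == "__main__":
    speaker = {"native_language": "English", "gender": "F", "smoker": "no"}
    print(missing_mandatory(speaker))
```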

Page 27: Data Collection in Nespole!

Transcription process

Audio Data

Transcription Conventions

Transcription Tool

TRL Files
MAR Files
Voc Lists

...m054_1_0575_QXE_00: if it was , I don't know , in the beginning of the century , I would think so , but .

m054_5_0576_MTY_00: yeah , I mean , +/we d=/+ we don't know a lot <B> about anything .

m054_4_0577_ZMW_00: but +/even/+ I think even if they would have known a little bit more . <B> think about all these chicken farms or things like all this <B> +/k=/+ kind of really <B> terrible <B> behavior against animals , anyway . <B> so , +/I/+ +/I don't think/+ <B> <hes> +/I th=/+ I think as soon as some financial or land things or things like this <B> came into the game , <B> they don't think anymore <Laugh> about <B> animal behavior . this is +/ku=/+ just <B> <Noise> secondary% .

m054_3_0578_AAH_00: <hm>

m054_5_0579_MTY_00: right . <B>

m054_4_0580_ZMW_00: so , <B> this...


Page 29: Data Collection in Nespole!

Transcription (trl) Conventions

Verbmobil II:
- we are familiar with VMB and we have appropriate tools
- BAS Partitur format
- finite/closed system (parsing, filtering, converting)
- line oriented, no formats (one line per turn)
- turn oriented (turn IDs contain full identification)
- time stamps and trl are in different files linked by turn ID
- http://www.is.cs.cmu.edu/trl_conventions/

S. Burger, L. Besacier, P. Coletti, F. Metze and C. Morel, “The NESPOLE! VoIP Dialogue Database”, in Proc. of Eurospeech 2001, Aalborg, Denmark.

Page 31: Data Collection in Nespole!

Content

- words

- capitalization
- punctuation
- white space
- turn-end
- syntax

- non-grammatical phrases
- broken words
- interrupted words
- acoustically hard to understand

- pauses and breathing
- filled pauses
- acoustically not understandable
- human noise

- word tags

- elements

- rules

Orthography:
- orthographic rules as long as they are non-ambiguous
- no capitalization in case of initial sentence position
- vocabulary lists to keep vocabulary spelled the same

Page 32: Data Collection in Nespole!

<*tENG> Foreign Language Turn (JAP, GER, ..)

;.. global Comment

..'.. Apostrophe (reduced word)

..-.. (--) Hyphen (compound word)

$.. spelled Letter

~.. Name

#.. Number

*.. Neologism/Mispronunciation

<*XXX.. Foreign Word (FRA,ITA, ..)

...<L>.. / ..<Z>.. Lengthening

..% Poorly intelligible

..= Articulated Break-off

.._ Interruption of a Word, Left Fragment

_.. Interruption of a Word, Right Fragment

<T_>.. Technical Interruption of a Word, Beginning

..<_T> Technical Interruption of a Word, End

<*T> Technical interruption of a Turn

<*T>t Technical Break-off of a Turn

<!n ..> Comment on Pronunciation

. / ? / , Punctuation

+/.. Beginning of a Repetition/Correction

../+ End of a Repetition/Correction

-/.. Beginning of a False Start

../- End of a False Start

<B> / <A> Respiration

<uh> / <"ah> Filled Pause (Hesitation)

<uhm> / <"ahm> Filled Pause (Hesitation)

<hm> Filled Pause (Hesitation)

<hes> / <h"as> Filled Pause (Hesitation)

<%> Unidentifiable Sound Production

<Smack> / <Schmatzen> Nonverbal Articulatory Sound (sound: smacking)

<Swallow> / <Schlucken> Nonverbal Articulatory Sound (sound: swallowing)

<Throat> / <R"auspern> Nonverbal Articulatory Sound (sound: clearing one's throat)

<Cough> / <Husten> Nonverbal Articulatory Sound (sound: cough)

<Laugh> / <Lachen> Nonverbal Articulatory Sound (sound: laughing)

<Noise> / <Ger"ausch> Nonverbal Articulatory Sound (other sounds)

<#Click> / <#Klicken> Technical Noise

<#Ring> / <#Klingeln> Technical Noise

<#Knock> / <#Klopfen> Technical Noise

<#Mtouch> / <#Mikrobe> Technical Noise

<#Mwind> / <#Mikrowind> Technical Noise

<#Rustle> / <#Rascheln> Technical Noise

<#Squeak> / <#Quietschen> Technical Noise

<#> Technical Noise

<P> Pause during Speech

@n.. Active Interference by a Speaker

..n@ Passively Interfered Speaker

<@n.. Active Interference by Acoustic Events

..n@> Passive Interference of Acoustic Events

<:<..> .. Beginning of Noise Interference

..:> End of Noise Interference

<;..> Local Comment

!KEY!.. Code Word

<PP> Scenario Caused Pause
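Because the conventions form a closed system, annotated turns can be filtered automatically, for example to obtain plain word sequences. The sketch below handles only a small subset of the markers listed above (repetitions/corrections, false starts, angle-bracket tags, and the `%` and `=` word markers); it is an illustration, not the project's actual filter, and the function name `plain_text` is hypothetical.

```python
import re


def plain_text(turn: str) -> str:
    """Strip a subset of trl annotations from the text part of a turn."""
    text = re.sub(r"\+/.*?/\+", " ", turn)        # repetitions / corrections
    text = re.sub(r"-/.*?/-", " ", text)          # false starts
    text = re.sub(r"<[^>]*>", " ", text)          # <B>, <P>, <Laugh>, <hes>, ...
    text = re.sub(r"\S+=(?=\s|$)", " ", text)     # articulated break-offs (word=)
    text = text.replace("%", "")                  # "poorly intelligible" marker
    return " ".join(text.split())                 # normalize white space


if __name__ == "__main__":
    print(plain_text("yeah , I mean , +/we d=/+ we don't know a lot <B> about anything ."))
```

Markers with nested or paired syntax, such as noise interference spans or technical break-offs, would need dedicated rules and are deliberately left out here.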

 


Page 34: Data Collection in Nespole!

Transcription Tools

Why another tool?

Different requirements than before:
- Windows instead of Linux
- Meetings: multiparty transcription
- Transcribers from different backgrounds

At that time (over three years ago) there was no adequate transcription tool.

• We studied what the basic requirements would be.
• We asked transcribers what they would find convenient.
• We programmed a beta tool accordingly.
• We are still using this tool (and so are several other sites in the meantime).
• We call it TransEdit.

Page 35: Data Collection in Nespole!

TransEdit: transcription tool just for transcribers

• MFC program
• Windows text editor
• clickable buttons for transcription elements
• automatic turn naming and counting
• label editor
• parallel display of multiple audio signals
• easy turn segmentation
• lots of listen functions
• easy handling, no research functions
• “home work” but available for universities (write to: [email protected])


Page 38: Data Collection in Nespole!

; CDR: 00.00
; TRV: 00.00
; File: e025at
; Last changes made on 09/29/2000
; Transcriber: VLM
; Comments:
;
e025_1_0000_ITL_00: hello ? <P> can you hear me now ?

e025_2_0001_XYZABC_00: hello .

e025_1_0002_ITL_00: hello% . yeah% .

e025_2_0003_XYZABC_00: <uh> yes , I can .

e025_1_0004_ITL_00: yes , okay . <P> so ?

e025_2_0005_XYZABC_00: -/hi I would like/- <P> yes ?

e025_1_0006_ITL_00: yes , can you hear me now ?

e025_2_0007_XYZABC_00: <uh> yes , I can .

e025_1_0008_ITL_00: okay . <B> wonderful . <Laugh> <B> <P> <Smack> <B> so , can I help you ? <B>

e025_2_0009_XYZABC_00: -/all right I would like/- <uh> yes , madam . I would like to schedule a winter vacation <P> in the north of Italy .

e025_1_0010_ITL_00: <hm> <B>

e025_1_0011_ITL_00: yes . <B> would you like t= <*T>t

e025_1_0012_ITL_00: yes . would you like to come here% in summer or during winter ?

e025_2_0013_XYZABC_00: <uh> in winter please .
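Since the format is line oriented with one turn per line and a self-describing turn ID, a trl file like the one above can be read with a single regular expression. The sketch below is an assumption-laden illustration: the field names (`dialogue`, `channel`, `number`, `speaker`, `suffix`) are guesses from the visible ID pattern, not names defined by the conventions, and `parse_turn` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# Pattern guessed from IDs such as "e025_1_0000_ITL_00".
TURN_RE = re.compile(r"^([a-z]\d{3})_(\d)_(\d{4})_([A-Z]+)_(\d{2}):\s*(.*)$")


class Turn(NamedTuple):
    dialogue: str   # e.g. "e025"
    channel: int    # e.g. 1
    number: int     # running turn count
    speaker: str    # speaker code, e.g. "ITL"
    suffix: str     # trailing two digits; meaning not stated in the slides
    text: str       # annotated transcription of the turn


def parse_turn(line: str) -> Optional[Turn]:
    """Parse one trl line; return None for blank lines and ';' comments."""
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    match = TURN_RE.match(line)
    if match is None:
        raise ValueError(f"malformed turn line: {line!r}")
    dialogue, channel, number, speaker, suffix, text = match.groups()
    return Turn(dialogue, int(channel), int(number), speaker, suffix, text)


if __name__ == "__main__":
    print(parse_turn("e025_2_0001_XYZABC_00: hello ."))
```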

Page 39: Data Collection in Nespole!

Data transcription process

• automatic convention check
• close check and correction by another transcriber
• spell-checking
• marker file and trl file cross-check
• first pass transcription (but not rough ..)
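Two of these steps, the automatic convention check and the marker/trl cross-check, are easy to picture in code. The sketch below is a simplified stand-in for the real checks: it tests only the turn ID pattern and the balance of a few paired markers, and it models the marker file as a plain collection of turn IDs, which is an assumption since the MAR format is not shown here. The names `check_line` and `cross_check` are hypothetical.

```python
import re

# Turn ID pattern guessed from IDs such as "e025_1_0000_ITL_00".
TURN_RE = re.compile(r"^[a-z]\d{3}_\d_\d{4}_[A-Z]+_\d{2}: .+$")


def check_line(line: str) -> list:
    """Return a list of convention problems found in one trl line."""
    if line.startswith(";"):          # global comment lines are not checked
        return []
    problems = []
    if not TURN_RE.match(line):
        problems.append("malformed turn ID or missing text")
    # Paired markers must open and close equally often.
    for opener, closer in (("+/", "/+"), ("-/", "/-")):
        if line.count(opener) != line.count(closer):
            problems.append(f"unbalanced {opener} ... {closer}")
    if line.count("<") != line.count(">"):
        problems.append("unbalanced angle brackets")
    return problems


def cross_check(trl_ids, mar_ids):
    """Return turn IDs only in the trl file and only in the marker file."""
    trl_ids, mar_ids = set(trl_ids), set(mar_ids)
    return sorted(trl_ids - mar_ids), sorted(mar_ids - trl_ids)


if __name__ == "__main__":
    print(check_line("e025_2_0005_XYZABC_00: -/hi I would like <P> yes ?"))
```

Automatic checks of this kind catch formal slips cheaply, which leaves the second transcriber free to concentrate on what was actually said.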


Page 41: Data Collection in Nespole!

Following mass-data collection: Showcase 2a and 2b

                 Showcase 2a                  Showcase 2b
Domain           Tourism                      Medicine
Scenarios        5                            4
Multimodality    Yes                          Yes
Dialogues        66                           56
Participants     12 real APT agents;          3 doctors (in the doctor’s role)
                 16 simulated customers       and 7 doctors in the patient’s
                 per language                 role per language
Average length   15 min                       6 min

Page 42: Data Collection in Nespole!

Medical scenarios development

• Analysis of medical databases
• Definition of some scripts
• Pre-tests
• Scenarios
• Data collection

Doctors