Data Collection for the CHIL CLEAR 2007 Evaluation Campaign
N. Moreau 1, D. Mostefa 1, R. Stiefelhagen 2, S. Burger 3, K. Choukri 1
1 ELDA, 2 UKA-ISL, 3 CMU
E-mails: {moreau;mostefa;choukri}@elda.org, [email protected], [email protected]
Evaluations and Language resources Distribution Agency (ELDA)
www.elda.org
Plan
1. CHIL project
2. Evaluation campaigns
3. Data recordings
4. Annotations
5. Evaluation package
6. Conclusion
CHIL Project
• CHIL: Computers in the Human Interaction Loop
• Integrated project funded by the European Commission (FP6)
• January 2004 – August 2007
• 15 partners, 9 countries (ELDA responsible for data collection and evaluations)
• Multimodal and perceptual user interface technologies
• Context:
  – Real-life meetings (small meeting rooms)
  – Activities and interactions of attendees
CHIL evaluation campaigns
• June 2004: Dry run
• January 2005: Internal evaluation campaign
• February 2006: CLEAR 2006 campaign
• February 2007: CLEAR 2007 campaign
• CLEAR = Classification of Events, Activities and Relationships
  – Open to external participants
  – Supported by CHIL and NIST (VACE Program)
  – Co-organized with the NIST RT (Rich Transcription) evaluation
CLEAR 2007 evaluation campaign
• 9 technologies evaluated
  – Vision technologies
    • Face Detection and Tracking
    • Visual Person Tracking
    • Visual Person Identification
    • Head Pose Estimation
  – Acoustic technologies
    • Acoustic Person Tracking
    • Acoustic Speaker Identification
    • Acoustic Event Detection
  – Multimodal technologies
    • Multimodal Person Tracking
    • Multimodal Speaker Identification
CHIL Scenarios
• Non-interactive: lectures
• Interactive: seminars
CHIL Data Sets
CLEAR 2007 data collection:
– 25 highly interactive seminars
– Attendees: between 3 and 7
– Events: several presenters, discussions, coffee breaks, people entering / leaving the room, ...

Campaign     # Lectures   # Interactive Seminars
Internal     12           0
CLEAR 2006   34           15
CLEAR 2007   0            25
Recording setup
• 5 recording rooms
• Sensors (see the configuration sketch below):
  – Audio:
    • 64-channel microphone array
    • 4-channel T-shaped microphones
    • Table-top microphones
    • Close-talking microphones
  – Video:
    • 4 fixed corner cameras
    • 1 ceiling wide-angle camera
    • Pan-tilt-zoom (PTZ) cameras
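As a rough illustration of this sensor architecture, the inventory of one CHIL room could be captured in a small configuration structure. The sketch below is hypothetical Python; names and fields are assumptions, not an official CHIL configuration format.

# Hypothetical sketch of one CHIL room's sensor inventory, mirroring the
# list above; not an official CHIL configuration format.
from dataclasses import dataclass, field

@dataclass
class Sensor:
    kind: str       # e.g. "mic_array", "corner_camera"
    channels: int   # audio channels (1 for a camera)
    notes: str = ""

@dataclass
class RoomSetup:
    room_id: str
    sensors: list = field(default_factory=list)

room = RoomSetup(
    room_id="room-1",  # one of the 5 recording rooms
    sensors=[
        Sensor("mic_array", 64),
        Sensor("t_shaped_mic", 4, "4-channel T-shaped microphones"),
        Sensor("tabletop_mic", 1, "table-top microphones"),
        Sensor("close_talking_mic", 1, "worn by attendees"),
        Sensor("corner_camera", 1, "4 fixed corner cameras"),
        Sensor("ceiling_camera", 1, "wide-angle"),
        Sensor("ptz_camera", 1, "pan-tilt-zoom"),
    ],
)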
Camera Views
Quality Standards
• Recording of 25 seminars in 2007 (5 per CHIL room)
• Audio-visual clap at beginning and end
• Cameras (JPEG files at 15, 25 or 30 fps)
  – Max. desynchronisation = 200 ms
• Microphone array
  – Max. desynchronisation = 200 ms
• Other microphones (T-shape, table-top)
  – Max. desynchronisation = 50 ms
• If desynchronisation > max => recording to be remade (see the check sketched below)
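The synchronization rule above amounts to a simple per-sensor check: measure each stream's offset against the audio-visual clap and compare it to that sensor type's tolerance. A minimal sketch in Python; the function and timestamp names are hypothetical, not CHIL tooling.

# Minimal sketch of the synchronization check described above (hypothetical
# names): compare each sensor's clap timestamp to a reference and flag the
# recording if any offset exceeds its tolerance.

# Maximum tolerated desynchronisation per sensor type, in milliseconds.
MAX_DESYNC_MS = {
    "camera": 200,      # JPEG streams at 15, 25 or 30 fps
    "mic_array": 200,   # 64-channel microphone array
    "other_mic": 50,    # T-shaped and table-top microphones
}

def recording_is_valid(clap_times_ms, reference_ms):
    """clap_times_ms maps (sensor_name, sensor_type) to the time (ms) at
    which the audio-visual clap is observed in that sensor's stream."""
    for (name, sensor_type), t in clap_times_ms.items():
        desync = abs(t - reference_ms)
        if desync > MAX_DESYNC_MS[sensor_type]:
            print(f"{name}: desync {desync} ms > "
                  f"{MAX_DESYNC_MS[sensor_type]} ms => recording to be remade")
            return False
    return True

# Example: clap observed at t = 1000 ms on the reference channel.
ok = recording_is_valid(
    {("cam1", "camera"): 1150,
     ("array", "mic_array"): 1020,
     ("table_mic", "other_mic"): 1030},
    reference_ms=1000,
)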
Annotations
CLEAR 2007 annotations:
– Audio: transcriptions, acoustic events
– Video: facial features, head pose

Campaign     Development data   Evaluation data
Internal     2h 20              1h 40
CLEAR 2006   2h 30              3h 10
CLEAR 2007   2h 45              3h 25
Audio Annotations
• Orthographic transcriptions
  – 2 channels:
    • based on near-field recordings (close-talking microphones)
    • compared with one far-field recording
  – Speaker turns
  – Non-verbal events (laughs, pauses, ...)
  – See: S. Burger, “The CHIL RT07 Evaluation Data”
• Acoustic events (label set sketched below)
  – Based on one microphone array channel
  – 15 categories of sounds: speech, door slam, step, chair moving, cup jingle, applause, laugh, key jingle, cough, keyboard, phone, music, knock, paper wrapping, unknown
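For illustration, the 15 acoustic event categories can be treated as a closed label set against which annotations are validated. A minimal Python sketch; the record format and names are hypothetical, not the CHIL annotation format.

# Minimal sketch (hypothetical annotation format): the 15 acoustic event
# categories listed above as a closed label set, plus a simple record for
# one annotated event on the microphone-array channel.
from dataclasses import dataclass

ACOUSTIC_EVENTS = {
    "speech", "door_slam", "step", "chair_moving", "cup_jingle",
    "applause", "laugh", "key_jingle", "cough", "keyboard",
    "phone", "music", "knock", "paper_wrapping", "unknown",
}

@dataclass
class AcousticEvent:
    start_s: float   # event start time in seconds
    end_s: float     # event end time in seconds
    label: str       # must be one of ACOUSTIC_EVENTS

    def __post_init__(self):
        if self.label not in ACOUSTIC_EVENTS:
            raise ValueError(f"unknown acoustic event label: {self.label}")

event = AcousticEvent(start_s=12.4, end_s=13.1, label="door_slam")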
Video Annotations
• Facial features (face detection, person tracking)
  – Annotations every 1 second
  – All attendees
  – 4 camera views
  – Facial labels:
    • head centroid
    • left and right eyes
    • nose bridge
    • face bounding box
  – 2D head centroids → 3D “ground truth” (see the triangulation sketch below)
• Person Identification database
  – 28 persons to identify
  – Audio-visual excerpts for each person ID
  – Video labels every 200 ms
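Going from per-view 2D head centroids to a 3D “ground truth” position is, in essence, a multi-view triangulation problem. Below is a minimal sketch of one standard approach, linear (DLT) triangulation, assuming calibrated 3x4 projection matrices for the corner cameras; it illustrates the idea and is not necessarily the exact CHIL procedure.

# Minimal sketch of linear (DLT) triangulation: recover one 3D head position
# from its 2D centroids in several calibrated camera views. Assumes each
# camera is described by a 3x4 projection matrix P (an assumption here,
# not given in the slides).
import numpy as np

def triangulate(points_2d, projections):
    """points_2d: list of (u, v) centroids, one per camera view.
    projections: list of 3x4 projection matrices, in the same order.
    Returns the least-squares 3D point (x, y, z)."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: u*(P[2]·X) = P[0]·X, v*(P[2]·X) = P[1]·X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]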
Head Pose Data Set
• Persons captured with different head orientations
  – Standing in the middle of a CHIL room (ISL)
  – Captured by the 4 corner cameras
• Annotations:
  – Head bounding box
  – Head orientation: pan, tilt, roll (see the rotation sketch below)
• 10 persons for development
• 5 persons for evaluation
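For illustration, an annotated (pan, tilt, roll) triple can be composed into a head rotation matrix. The axis conventions and composition order below are assumptions (conventions vary between systems), not the official CHIL definition.

# Minimal sketch: convert an annotated (pan, tilt, roll) triple, in degrees,
# into a rotation matrix. Assumed convention: pan about the vertical axis,
# tilt about the lateral axis, roll about the viewing axis.
import numpy as np

def head_rotation(pan_deg, tilt_deg, roll_deg):
    p, t, r = np.radians([pan_deg, tilt_deg, roll_deg])
    Rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r),  np.cos(r), 0],
                   [0,          0,         1]])   # roll
    Rx = np.array([[1, 0,          0],
                   [0, np.cos(t), -np.sin(t)],
                   [0, np.sin(t),  np.cos(t)]])   # tilt
    Ry = np.array([[np.cos(p),  0, np.sin(p)],
                   [0,          1, 0],
                   [-np.sin(p), 0, np.cos(p)]])   # pan
    return Ry @ Rx @ Rz

R = head_rotation(pan_deg=30, tilt_deg=-10, roll_deg=5)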
Evaluation package
• The CLEAR 2007 evaluation package is publicly available through the ELRA catalog
• Enables external players to evaluate their systems offline
• For each of the evaluated technologies:
  – Data sets (development/evaluation)
  – Evaluation and scoring tools
  – Results of the official campaign
Conclusion
• 9 technologies evaluated during the 3rd CHIL evaluation campaign
• The CHIL 2007 evaluation package is available through the ELRA catalog: http://catalog.elra.info/
• For more on the evaluations, see:
  – CLEAR 2007: http://www.clear-evaluation.org/
  – RT 2007: http://www.nist.gov/speech/tests/rt/