
Uploaded by seokhwan-kim on 11-Apr-2017


Page 1: The Fourth Dialog State Tracking Challenge (DSTC4)

IWSDS 2016

The Fourth Dialog State Tracking Challenge (DSTC4)

Seokhwan Kim¹, Luis Fernando D’Haro¹, Rafael E. Banchs¹, Jason D. Williams², Matthew Henderson³

¹ Institute for Infocomm Research, ² Microsoft Research, ³ Google

Page 2: Dialogue State Tracking

• A key subtask in dialogue management

• To estimate the user’s goal as a dialogue progresses

Utterance (S = system, U = user) [Food, Area]

S Hello, How may I help you?
U I need a Persian restaurant in the south part of town. [Persian, South]

S What kind of food would you like?
U Persian. [Persian, South]

S I’m sorry but there is no restaurant serving persian food
U How about Portuguese food? [Portuguese, South]

S Peking restaurant is a nice place in the south of town.
U Is that Portuguese? [Portuguese, South]

S Nandos is a nice place in the south of town serving tasty Portuguese food.
U Alright. Whats the phone number? [Portuguese, South]

S The phone number of nandos is 01223 327908 .
U And the address? [Portuguese, South]

S Sure, nandos is on Cambridge Leisure Park Clifton Way.
U Thank you good bye. [Portuguese, South]
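As a rough illustration of what a tracker does with the dialogue above, here is a minimal Python sketch of a keyword-based tracker over the user turns. The mini-ontology, the `update_state` function and the "latest mention wins" update rule are assumptions made for this example; it is not presented as the method used in any of the challenges.

```python
import re

# Hypothetical mini-ontology for the restaurant example above
# (not an official DSTC2 resource).
ONTOLOGY = {
    "food": {"persian", "portuguese"},
    "area": {"north", "south", "east", "west", "centre"},
}


def update_state(state, user_utterance):
    """Return a new state in which any ontology value mentioned in the user
    turn overwrites the slot's previous value ("latest mention wins")."""
    tokens = set(re.findall(r"[a-z]+", user_utterance.lower()))
    new_state = dict(state)  # leave the previous turn's state untouched
    for slot, values in ONTOLOGY.items():
        mentioned = values & tokens
        if len(mentioned) == 1:  # skip turns naming two values for one slot
            new_state[slot] = mentioned.pop()
    return new_state


USER_TURNS = [
    "I need a Persian restaurant in the south part of town.",
    "Persian.",
    "How about Portuguese food?",
    "Is that Portuguese?",
    "Alright. Whats the phone number?",
    "And the address?",
    "Thank you good bye.",
]

state = {}
history = []
for turn in USER_TURNS:
    state = update_state(state, turn)
    history.append(state)
    print(f"{turn!r:60} -> {state}")
```

The Food slot switches from Persian to Portuguese at the third user turn while Area carries over unchanged, which is the changing-goal behaviour shown in the state column. A rule this simple would struggle with the verbose, noisy human-human dialogues that DSTC4 targets.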

Page 3: Previous Dialog State Tracking Challenges

• DSTC1 (Williams et al., SIGDIAL 2013)

– Human-machine dialogues on bus timetable search

– Collected with Let’s Go (CMU)

– Focused on the evaluation metrics for state tracking

• DSTC2 (Henderson et al., SIGDIAL 2014)

– Human-machine dialogues on restaurant search

– Collected with Cambridge University’s system

– Introduced changing user goals in a single dialogue session

• DSTC3 (Henderson et al., IEEE SLT 2014)

– Human-machine dialogues on tourist information search

– Collected with Cambridge University’s system

– Addressed adaptation from the DSTC2 domain to a new domain

Page 4: TourSG: Dataset for DSTC4

• Human-human dialogues

• Tourist information in Singapore

• Speakers

– Guide (3 actual tour guides from Singapore)

– Tourist (35 prospective tourists from the Philippines)

• Characteristics

– Goal-oriented dialogues

– Mixed-initiative dialogues

– Knowledge-based dialogues

– Multi-topic dialogues

– Verbose dialogues

– Noisy dialogues


Page 5: DSTC4: Timeline

Period                      Task
Mar 2012 – Oct 2012         Data collection and annotation
Sep 2014 – Dec 2014         Internal discussions
7 Dec 2014                  Challenge planning meeting at SLT 2014
Dec 2014 – Apr 2015         Labelling additional annotations and building resources for evaluation
15 Apr 2015 – 16 Aug 2015   Development phase of the main and pilot tasks of DSTC4
17 Aug 2015 – 31 Aug 2015   Evaluation phase of the main task of DSTC4
14 Sep 2015 – 16 Sep 2015   Evaluation phase of the pilot tasks of DSTC4
30 Sep 2015                 Paper submission deadline for IWSDS 2016

Page 6: Main Task: Dialogue State Tracking

• Motivation

– A single subject may be expressed over a series of turns

– Multiple topics are interlaced in a session

• Problem Definition

– Dialogue state tracking at the sub-dialogue level

– Focused on the most common topic categories

• Annotations

– Segmentation

– Topic Category

– Frame Structure for major topic categories

• Itinerary, Accommodation, Attraction, Food, Transportation
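The problem definition above can be made concrete with a small data structure. The Python sketch below is an illustration under stated assumptions, not the official DSTC4 data format: it assumes a frame is a topic category plus slots that may each hold several values, and that the state reported for an utterance accumulates every slot value mentioned since the start of its segment. The `Frame` class, the `track_segment` function and the per-utterance labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Segment-level state: a topic category plus slots that may hold
    several values at once (hypothetical representation)."""
    topic: str
    slots: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add(self, slot, value):
        self.slots[slot].add(value)

    def pairs(self):
        """Flatten the frame into a set of (slot, value) pairs."""
        return {(slot, value) for slot, values in self.slots.items() for value in values}


def track_segment(topic, utterance_labels):
    """Yield the frame after each utterance of one segment, accumulating every
    (slot, value) label seen since the segment started (assumed semantics)."""
    frame = Frame(topic)
    for labels in utterance_labels:
        for slot, value in labels:
            frame.add(slot, value)
        yield frozenset(frame.pairs())


# Hypothetical per-utterance labels, loosely modelled on the accommodation
# example on Page 7.
labels = [
    [("TYPE", "Hostel")],
    [("PRICERANGE", "Cheap")],
    [],
    [("NAME", "InnCrowd Backpackers Hostel")],
]
for i, state in enumerate(track_segment("ACCOMMODATION", labels)):
    print(i, sorted(state))
```

Storing each slot as a set lets one frame hold several values for the same slot, for example two attraction names discussed within a single segment.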


Page 7: Examples of Dialogue States

Tourist Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.

Guide Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?

Tourist Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.

Guide Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.

Tourist Yes. Yes. As we just gonna put our things there and then go out to take some pictures.

Guide Okay, um-

Tourist Hm.

Guide Let's try this one, okay?

Tourist Okay.

Guide It’s InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.

Tourist Um. Wow, that's good.

Guide Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.

Tourist Oh okay. That's- the price is reasonable actually. It's good.

TOPIC ACCOMMODATION

TYPE Hostel

PRICERANGE Cheap

TOPIC ACCOMMODATION

NAME InnCrowd Backpackers Hostel

Page 8: Examples of Dialogue States

Tourist So uh is it near the airport?

Guide Hm no. But you can get there easily by taking the trains from the airport. You just need to make a change in the train direction.

Tourist Hm okay. Because I have no idea at all about Singapore trains or transit. Uh how can I go to the train or to the transit from the airport? Is it just outside the airport?

Guide So when you reach the airport you go down to the basement.

Tourist Um. Okay.

Guide So you get your ticket, you pay your deposit. And I think at the airport they gave you a map. and- to give you an idea. So all this is free. And then you travel along the East line towards the West. Can you see Tanah Merah on the second stop?

Tourist Okay. Hm, Tanah Merah, yes.

Guide Okay. So that is where you change to go down to town to the West towards the West. And you go down to- I think the easiest way is to go to Outram Park.

Tourist Outrum Park.

Guide Yah.

Tourist Alright.

Guide So when you get up there, you take the line towards Little India. So it's one, two, three stops and you are at Little India.

Tourist Hm, okay.

TOPIC TRANSPORTATION

FROM Changi Airport

TO InnCrowd Backpackers Hostel

BY MRT

Page 9: Examples of Dialogue States

Guide So this is the place that you can go out and try street food. you can soak in the atmosphere. You would love taking your camera out because you can photograph the Indian garland makers, the fortune tellers. Uh it's full of life and culture. It's one of my favourite places.

Tourist Oh- Oh, great. Oh Yah. Is Little India is like a Indian community town or Indiantown?

Guide Yes. So there are Hindu temples there. You can photograph beautiful architecture and statues of the different Deities, the Hindu Deities.

Tourist Uh huh. Okay. Great. So other than Indiantowns, are there other uh nations town there or race town? What else?

Guide Okay. And then Chinatown you take the same line. Two, three stops down. So you'll get off at Chinatown, you are right in the heart of Chinatown. And in Chinatown we have uh also Bhuddist temple and Terrace temple also great for photography.

Tourist Uh Yes. Yes, okay. Okay, great. So we have Little India, then Chinatown. Other than that two, there are other kinds of town, right? Like uh- is there a um something like Vietnamese town or just the two of these?

Guide Not Vietnamese but there is uh Kampong Glam which you have to go by bus because- well actually you could go by train because you are young and healthy you can walk.

Tourist Hm. Yah, I like walking.

TOPIC ATTRACTION

NAME Little India

TOPIC ATTRACTION

TYPE Ethnic enclave

TOPIC ATTRACTION

NAME Chinatown

TOPIC ATTRACTION

TYPE Ethnic enclave

TOPIC ATTRACTION

NAME Kampong Glam

Page 10: Examples of Dialogue States

Tourist So what about other than street food, of course I have to eat my dinner. Wha~ where do you suggest me to eat my dinner? I also want to experience Singaporean delicacies or Singaporean dishes.

Guide do you like hot food? Do you like curries?

Tourist Curries?

Guide Yah.

Tourist Indian curries? What about Singaporean restaurants? Like they, you know, they offer Singaporean delicacies or Singaporean dishes? Do you have a Singaporean dishes in Singapore?

Guide Uh, Singaporean food is mostly try at the uh food courts. This is one I am recommending to you. It's at old market. It's Maxwell Road Food Centre.

Tourist Um. Road Food Centre.

Guide So it is at place called Maxwell Road which is in Chinatown. So if you take the train to Chinatown from where you are and you'd- It's near. You just walk there.

Tourist Okay, nice.

TOPIC FOOD

CUISINE Singaporean

TOPIC FOOD

DISH Curry

TOPIC FOOD

CUISINE Singaporean

TOPIC FOOD

TYPE_OF_PLACE Hawker centre

NAME Maxwell Road Food Centre

Page 11: Main Task: Evaluation

• Resources

– Data

• Training set: 14 dialogues with 12,759 utterances

• Development set: 6 dialogues with 4,812 utterances

• Test set: 9 dialogues with 7,848 utterances

– Ontology

– Evaluation scripts

– Baseline tracker

• Fuzzy string matching with the ontology entries

– CodaLab: Web-based Competition Platform

• Metrics

– Schedules

• Schedule 1: all turns are included

• Schedule 2: only the turns at the end of segments are included

– Metrics

• Frame Structure-level Accuracy

• Slot-level Precision/Recall/F-measure
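The baseline tracker is described only as fuzzy string matching against the ontology entries. A minimal Python sketch of that idea follows. It uses the standard library's `difflib.SequenceMatcher` as the similarity measure; the ontology fragment, the 0.8 threshold and the function names are assumptions for illustration, not details of the official baseline.

```python
import re
from difflib import SequenceMatcher

# Hypothetical ontology fragment: (topic, slot) -> candidate values.
ONTOLOGY = {
    ("ATTRACTION", "NAME"): ["Little India", "Chinatown", "Kampong Glam"],
    ("TRANSPORTATION", "BY"): ["MRT", "Bus", "Taxi"],
}
THRESHOLD = 0.8  # assumed similarity cut-off, not taken from the challenge


def best_window_ratio(value, words):
    """Highest SequenceMatcher ratio between the value and any run of
    utterance words that has the same word count as the value."""
    n = len(value.split())
    target = value.lower()
    best = 0.0
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n]).lower()
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return best


def fuzzy_match(utterance):
    """Return every (topic, slot, value) entry that fuzzily occurs in the utterance."""
    words = re.findall(r"[\w']+", utterance)
    return {
        (topic, slot, value)
        for (topic, slot), values in ONTOLOGY.items()
        for value in values
        if best_window_ratio(value, words) >= THRESHOLD
    }


print(fuzzy_match("So we have Little India, then Chinatown."))
print(fuzzy_match("Is Kampong Glamm far from the MRT?"))
```

Matching word windows rather than raw substrings lets a misspelling such as "Glamm" still reach the threshold while keeping short values like "Bus" from matching inside unrelated words.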


Page 12

                  Schedule 1                              | Schedule 2
Team      Entry   Accuracy  Precision Recall    F-measure | Accuracy  Precision Recall    F-measure
Baseline  0       0.0374    0.3589    0.1925    0.2506    | 0.0488    0.3750    0.2519    0.3014
1         0       0.0456    0.3876    0.3344    0.3591    | 0.0584    0.4384    0.3377    0.3815
1         1       0.0374    0.4214    0.2762    0.3336    | 0.0584    0.4384    0.3377    0.3815
1         2       0.0372    0.4173    0.2767    0.3328    | 0.0575    0.4362    0.3377    0.3807
1         3       0.0371    0.4179    0.2804    0.3356    | 0.0584    0.4384    0.3426    0.3846
2         0       0.0487    0.4079    0.2626    0.3195    | 0.0671    0.4280    0.3257    0.3699
2         1       0.0467    0.4481    0.2655    0.3335    | 0.0671    0.4674    0.3275    0.3851
2         2       0.0478    0.4523    0.2623    0.3320    | 0.0706    0.4679    0.3226    0.3819
2         3       0.0489    0.4440    0.2703    0.3361    | 0.0697    0.4634    0.3335    0.3878
3         0       0.1212    0.5393    0.4980    0.5178    | 0.1500    0.5569    0.5808    0.5686
3         1       0.1210    0.5449    0.4964    0.5196    | 0.1500    0.5619    0.5787    0.5702
3         2       0.1092    0.5304    0.5031    0.5164    | 0.1316    0.5437    0.5875    0.5648
3         3       0.1183    0.5780    0.4904    0.5306    | 0.1473    0.5898    0.5678    0.5786
4         0       0.0887    0.5280    0.3595    0.4278    | 0.1072    0.5354    0.4273    0.4753
4         1       0.0910    0.5314    0.3122    0.3933    | 0.1055    0.5325    0.3623    0.4312
4         2       0.1009    0.5583    0.3698    0.4449    | 0.1264    0.5666    0.4455    0.4988
4         3       0.1002    0.5545    0.3760    0.4481    | 0.1212    0.5642    0.4540    0.5031
5         0       0.0309    0.2980    0.2559    0.2754    | 0.0392    0.3344    0.2547    0.2892
5         1       0.0268    0.3405    0.2014    0.2531    | 0.0401    0.3584    0.2632    0.3035
5         2       0.0309    0.3039    0.2659    0.2836    | 0.0392    0.3398    0.2639    0.2971
6         0       0.0421    0.4175    0.2142    0.2831    | 0.0541    0.4380    0.2656    0.3307
6         1       0.0478    0.5516    0.2180    0.3125    | 0.0654    0.5857    0.2702    0.3698
6         2       0.0486    0.5623    0.2314    0.3279    | 0.0645    0.5941    0.2850    0.3852
7         0       0.0286    0.2768    0.1826    0.2200    | 0.0323    0.3054    0.2410    0.2694
7         1       0.0044    0.0085    0.0629    0.0150    | 0.0061    0.0109    0.0840    0.0194
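The metrics and schedules listed on Page 11 can be sketched in Python as follows. The sketch assumes each frame is a set of (slot, value) pairs, that frame-level accuracy means an exact match between the hypothesis and reference frames, and that slot-level counts are pooled over all scored utterances; the official DSTC4 scripts may differ in such details, and the `evaluate` helper and toy data are hypothetical. The last line checks that the baseline row's F-measure is the harmonic mean of its precision and recall.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(utterances, schedule):
    """Frame-level accuracy and slot-level P/R/F over (slot, value) pairs.

    Each utterance is a dict with 'ref' and 'hyp' sets of (slot, value) pairs
    and a boolean 'segment_end'. Schedule 1 scores every utterance; schedule 2
    scores only the last utterance of each segment. Counts are pooled over
    the scored utterances (assumed micro-averaging)."""
    if schedule not in (1, 2):
        raise ValueError("schedule must be 1 or 2")
    scored = [u for u in utterances if schedule == 1 or u["segment_end"]]
    if not scored:
        raise ValueError("no utterances to score")
    exact = sum(u["ref"] == u["hyp"] for u in scored)
    true_pos = sum(len(u["ref"] & u["hyp"]) for u in scored)
    n_hyp = sum(len(u["hyp"]) for u in scored)
    n_ref = sum(len(u["ref"]) for u in scored)
    precision = true_pos / n_hyp if n_hyp else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    return {
        "accuracy": exact / len(scored),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


# Hypothetical two-utterance segment.
utterances = [
    {"ref": {("TYPE", "Hostel")}, "hyp": {("TYPE", "Hostel")}, "segment_end": False},
    {"ref": {("TYPE", "Hostel"), ("PRICERANGE", "Cheap")},
     "hyp": {("TYPE", "Hostel")}, "segment_end": True},
]
print(evaluate(utterances, schedule=1))
print(evaluate(utterances, schedule=2))

# Baseline row of the results table: P = 0.3589, R = 0.1925.
print(round(f_measure(0.3589, 0.1925), 4))  # prints 0.2506
```

Schedule 2 ignores the partially built frames inside a segment, so a tracker that only gets the state right by the end of each segment scores better under it than under schedule 1.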

Page 13: Main Task: Results

Page 14: Main Task: Error Distribution

Page 15: Main Task: Ensemble Learning

                                 Schedule 1             | Schedule 2
                                 Accuracy   F-measure   | Accuracy   F-measure
Single best entry                0.1212     0.5306      | 0.1500     0.5786
Top 3 entries: union             0.1111 -   0.5147 -    | 0.1325 -   0.5619 -
Top 3 entries: intersection      0.1241 +   0.5344 +    | 0.1561 +   0.5861 +
Top 3 entries: majority voting   0.1172 -   0.5194 -    | 0.1421 -   0.5703
Top 5 entries: union             0.0980 -   0.5133 -    | 0.1107 -   0.5543 -
Top 5 entries: intersection      0.1157     0.4370 -    | 0.1369     0.5008 -
Top 5 entries: majority voting   0.1183 -   0.5210 -    | 0.1439     0.5711
Top 10 entries: union            0.0623 -   0.4719 -    | 0.0680 -   0.5014 -
Top 10 entries: intersection     0.0300 -   0.1816 -    | 0.0453 -   0.2275 -
Top 10 entries: majority voting  0.1268 +   0.4741 -    | 0.1456     0.5380 -
All entries: union               0.0077 -   0.1320 -    | 0.0078 -   0.1366 -
All entries: intersection        0.0132 -   0.0229 -    | 0.0192 -   0.0331 -
All entries: majority voting     0.0646 -   0.3535 -    | 0.0898 -   0.4135 -
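The table reports three combination rules without defining them. A minimal per-utterance sketch in Python, assuming each entry's output for an utterance is a set of (slot, value) pairs and that majority voting keeps a pair proposed by more than half of the combined entries, could look like this; the `combine` function and the example entries are hypothetical.

```python
from collections import Counter


def combine(frames, method):
    """Combine the frames (sets of (slot, value) pairs) that several trackers
    output for the same utterance."""
    frames = [set(f) for f in frames]
    if not frames:
        raise ValueError("need at least one frame")
    if method == "union":
        return set().union(*frames)
    if method == "intersection":
        return set.intersection(*frames)
    if method == "majority":
        # Keep a pair only if more than half of the trackers output it (assumed rule).
        counts = Counter(pair for f in frames for pair in f)
        return {pair for pair, c in counts.items() if c > len(frames) / 2}
    raise ValueError(f"unknown method: {method}")


# Hypothetical outputs of three entries for one utterance.
entry_a = {("NAME", "Little India"), ("TYPE", "Ethnic enclave")}
entry_b = {("NAME", "Little India"), ("TYPE", "Ethnic enclave")}
entry_c = {("NAME", "Little India"), ("NAME", "Chinatown")}
for method in ("union", "intersection", "majority"):
    print(method, sorted(combine([entry_a, entry_b, entry_c], method)))
```

By construction, the union can never have lower recall than any of its member entries, and the intersection can never have higher recall; majority voting sits between the two.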


Page 16: Pilot Tasks: Evaluation

• Tasks

– Spoken Language Understanding (SLU)

– Speech Act Prediction (SAP)

– Spoken Language Generation (SLG)

– End-to-end System (EES)

• Evaluation Metrics

– SLU and SAP

• Precision/Recall/F-measure

– SLG and EES

• BLEU

• AM-FM
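BLEU is listed as an SLG and EES metric but is not defined on the slide. The textbook corpus-level definition (Papineni et al., 2002) can be sketched in Python as follows. This simplified version assumes one tokenized reference per hypothesis, uniform weights up to 4-grams and no smoothing, so it is not guaranteed to reproduce the official DSTC4 evaluation script; the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: clipped n-gram precisions for n = 1..max_n, combined
    by a uniform geometric mean and multiplied by a brevity penalty.
    Simplified: one reference per hypothesis, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    clipped = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "nandos is on cambridge leisure park clifton way".split()
print(corpus_bleu([reference], [reference]))
print(corpus_bleu([reference[:6]], [reference]))
```

The second call shows the brevity penalty at work: every n-gram of the truncated output is correct, yet the score drops below 1 because the output is shorter than the reference.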


Page 17: Pilot Tasks: Evaluation

• Resources

– Data

• Training set: 14 dialogues with 12,759 utterances

• Development set: 6 dialogues with 4,812 utterances

• Test set: 6 dialogues with 5,615 utterances

– Ontology

– Evaluation Scripts

• Offline evaluation

• Web-based evaluation

• Web-based Evaluation

[Figure: web-based evaluation setup. Participant side: System and Web-server. Organizer side: Evaluation Script and Web-client. The two sides exchange JSON Messages.]

Page 18: Pilot Tasks: Results

• Participant

– SLU

• Team 3 (5 entries)

• Results

                 Speech Act                     | Semantic Tag
Speaker  Entry   Precision  Recall   F-measure  | Precision  Recall   F-measure
Guide    1       0.6287     0.5191   0.5687     | 0.5646     0.4886   0.5239
Guide    2       0.6330     0.5227   0.5726     | 0.5646     0.4886   0.5239
Guide    3       0.7451     0.6153   0.6740     | 0.5646     0.4886   0.5239
Guide    4       0.6314     0.5214   0.5712     | 0.5646     0.4886   0.5239
Guide    5       0.6762     0.5584   0.6117     | 0.5646     0.4886   0.5239
Tourist  1       0.3583     0.2977   0.3252     | 0.5741     0.4764   0.5207
Tourist  2       0.2931     0.2435   0.2660     | 0.5741     0.4764   0.5207
Tourist  3       0.5627     0.4675   0.5107     | 0.5741     0.4764   0.5207
Tourist  4       0.2939     0.2442   0.2668     | 0.5741     0.4764   0.5207
Tourist  5       0.5736     0.4766   0.5206     | 0.5741     0.4764   0.5207

Page 19: Conclusions

• DSTC4

– Main Task: Dialogue State Tracking

• Multi-topic, Mixed-initiative, Human-human conversations

• Tracking sub-dialogue segment-level state structures

• 24 entries from 7 participants

– Pilot Tasks

• SLU, SAP, SLG, EES

• Web-based evaluation

• 5 SLU entries from one participant

Page 20: Thank You