"sorry, i didn't get that!" - statistical learning from dialogues for intelligent...
TRANSCRIPT
1
DR YUN-NUNG (VIVIAN) CHEN H T T P V I V I A N C H E N I D VT W
Statistical Learning from Dialogues for Intelligence Assistants
Sorry I didnrsquot get that
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
My Background - Yun-Nung (Vivian) Chen 陳縕儂, http://vivianchen.idv.tw
National Taiwan University: Freshman 2005, BS 2009, MS 2011
Carnegie Mellon University: PhD 2015 - spoken dialogue systems, language understanding, user modeling, speech summarization, key term extraction, spoken term detection
Microsoft Research: Postdoc 2016
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant - Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
What are Intelligent Assistants?
Apple Siri (2011), Google Now (2012), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M (2015)
https://www.apple.com/ios/siri, https://www.google.com/landing/now, http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana, http://www.amazon.com/oc/echo
Why do we need them? Daily-life usage: weather, schedule, transportation, restaurant seeking.
Why do we need them?
- Get things done: e.g., set up an alarm/reminder, take a note
- Easy access to structured data, services, and apps: e.g., find docs/photos/restaurants
- Assist your daily schedule and routine: e.g., commute alerts to/from work
- Be more productive in managing your work and personal life
Why do companies care? Global digital statistics (January 2015):
- Global population: 7.21B
- Active internet users: 3.01B
- Active social media accounts: 2.08B
- Active unique mobile users: 3.65B
Device input is evolving toward speech as the more natural and convenient modality.
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data Back-end: data bases, services, and client signals
- Device/Service End-points: phone, PC, Xbox, web browser, messaging apps
- User Experience: "restaurant suggestions", "call taxi"
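The reactive-assistance chain named on the slide (ASR, LU, Dialog, LG, TTS) can be sketched as a pipeline of stages. This is a toy illustration of how the components connect, with stubbed stand-ins for each component, not any real assistant's implementation:

```python
# Schematic sketch of the reactive SDS pipeline: ASR -> LU -> Dialog -> LG -> TTS.
# Every component here is a toy stub; real systems use statistical models.

def asr(audio: str) -> str:
    """Automatic speech recognition: audio -> text transcript (stubbed as lowercasing)."""
    return audio.lower()

def lu(transcript: str) -> dict:
    """Language understanding: transcript -> semantic frame (toy keyword rules)."""
    if "taxi" in transcript:
        return {"intent": "call_taxi", "slots": {}}
    if "restaurant" in transcript:
        return {"intent": "find_restaurant", "slots": {}}
    return {"intent": "unknown", "slots": {}}

def dialog(frame: dict) -> dict:
    """Dialogue management: choose a system action given the semantic frame."""
    if frame["intent"] == "unknown":
        return {"action": "request_rephrase"}
    return {"action": "confirm", "intent": frame["intent"]}

def lg(action: dict) -> str:
    """Language generation: system action -> response text."""
    if action["action"] == "request_rephrase":
        return "Sorry, I didn't get that!"
    return f"Okay, working on: {action['intent']}"

def tts(text: str) -> str:
    """Text-to-speech: stubbed as identity over the response text."""
    return text

def respond(audio: str) -> str:
    """Run one user turn through the full reactive pipeline."""
    return tts(lg(dialog(lu(asr(audio)))))

print(respond("Call taxi"))   # -> Okay, working on: call_taxi
print(respond("Blah blah"))   # -> Sorry, I didn't get that!
```

The second call shows where the talk's title comes from: when language understanding fails, the dialogue manager can only ask the user to rephrase.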
Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions. They are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.). Good SDSs help users organize and access information conveniently.
JARVIS - Iron Man's personal assistant; Baymax - personal healthcare companion.
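The slide defines an SDS by the tasks it completes for the user. A concrete (hypothetical) example of the intermediate representation such systems operate on is a semantic frame; the domain, intent, and slot names below are illustrative, not taken from the talk:

```python
# Hypothetical semantic frame an SDS's language-understanding component
# might produce for a restaurant-seeking utterance. The dialogue manager
# would query a back-end data source with these slots and generate a
# spoken response from the results.
utterance = "find me a cheap taiwanese restaurant near NTU"
frame = {
    "domain": "restaurant",
    "intent": "find_restaurant",
    "slots": {
        "price": "cheap",
        "cuisine": "taiwanese",
        "location": "near NTU",
    },
}
assert frame["slots"]["cuisine"] == "taiwanese"
```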
What is Baymax's intelligence? Baymax is capable of carrying on a natural spoken dialogue and learning new knowledge to better understand and interact with people. (Big Hero 6 - video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.)
2
My Background Yun-Nung (Vivian) Chen 陳縕儂 httpvivianchenidvtw
National Taiwan University
2009
BS
2005
Freshman
2011
MS
2015
PhD
Carnegie Mellon University
spoken dialogue systemlanguage understanding
user modeling
speech summarizationkey term extraction
spoken term detection
Microsoft Research
2016
Postdoc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
3
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
4
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
5
Apple Siri
(2011)
Google Now
(2012)
Microsoft Cortana
(2014)
Amazon AlexaEcho
(2014)
httpswwwapplecomiossirihttpswwwgooglecomlandingnowhttpwwwwindowsphonecomen-ushow-towp8cortanameet-cortanahttpwwwamazoncomocecho
Facebook M
(2015)
What are Intelligent Assistants
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
6
Why do we need them Daily Life Usage
Weather Schedule Transportation Restaurant Seeking
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
7
Why do we need them Get things done
Eg set up alarmreminder take note Easy access to structured data services and apps
Eg find docsphotosrestaurants Assist your daily schedule and routine
Eg commute alerts tofrom work Be more productive in managing your work and personal life
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
3
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
4
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
5
Apple Siri
(2011)
Google Now
(2012)
Microsoft Cortana
(2014)
Amazon AlexaEcho
(2014)
httpswwwapplecomiossirihttpswwwgooglecomlandingnowhttpwwwwindowsphonecomen-ushow-towp8cortanameet-cortanahttpwwwamazoncomocecho
Facebook M
(2015)
What are Intelligent Assistants
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
6
Why do we need them Daily Life Usage
Weather Schedule Transportation Restaurant Seeking
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
7
Why do we need them Get things done
Eg set up alarmreminder take note Easy access to structured data services and apps
Eg find docsphotosrestaurants Assist your daily schedule and routine
Eg commute alerts tofrom work Be more productive in managing your work and personal life
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
4
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
5
Apple Siri
(2011)
Google Now
(2012)
Microsoft Cortana
(2014)
Amazon AlexaEcho
(2014)
httpswwwapplecomiossirihttpswwwgooglecomlandingnowhttpwwwwindowsphonecomen-ushow-towp8cortanameet-cortanahttpwwwamazoncomocecho
Facebook M
(2015)
What are Intelligent Assistants
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
6
Why do we need them Daily Life Usage
Weather Schedule Transportation Restaurant Seeking
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
7
Why do we need them Get things done
Eg set up alarmreminder take note Easy access to structured data services and apps
Eg find docsphotosrestaurants Assist your daily schedule and routine
Eg commute alerts tofrom work Be more productive in managing your work and personal life
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
5
Apple Siri
(2011)
Google Now
(2012)
Microsoft Cortana
(2014)
Amazon AlexaEcho
(2014)
httpswwwapplecomiossirihttpswwwgooglecomlandingnowhttpwwwwindowsphonecomen-ushow-towp8cortanameet-cortanahttpwwwamazoncomocecho
Facebook M
(2015)
What are Intelligent Assistants
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
6
Why do we need them Daily Life Usage
Weather Schedule Transportation Restaurant Seeking
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
7
Why do we need them Get things done
Eg set up alarmreminder take note Easy access to structured data services and apps
Eg find docsphotosrestaurants Assist your daily schedule and routine
Eg commute alerts tofrom work Be more productive in managing your work and personal life
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
6
Why do we need them Daily Life Usage
Weather Schedule Transportation Restaurant Seeking
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
7
Why do we need them Get things done
Eg set up alarmreminder take note Easy access to structured data services and apps
Eg find docsphotosrestaurants Assist your daily schedule and routine
Eg commute alerts tofrom work Be more productive in managing your work and personal life
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
7
Why do we need them Get things done
Eg set up alarmreminder take note Easy access to structured data services and apps
Eg find docsphotosrestaurants Assist your daily schedule and routine
Eg commute alerts tofrom work Be more productive in managing your work and personal life
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
8
Why do companies care Global Digital Statistics (2015 January)
Global Population
721B
Active Internet Users
301B
Active Social Media Accounts
208B
Active Unique Mobile Users
365B
The more natural and convenient input of the devices evolves towards speech
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
9
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
13
Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people
What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
10
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
11
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
12
Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions
Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)
Good SDSs assist users to organize and access information conveniently
Spoken Dialogue System (SDS)
JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion
14
SDS Architecture
ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation
(Pipeline: ASR → SLU → DM, backed by the domain ontology → NLG; SLU is the current bottleneck.)
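The pipeline above can be sketched as a composition of four stages. Every function below is a hypothetical stub standing in for a real ASR, SLU, DM, or NLG model, with hard-coded outputs for the running example of the talk:

```python
# Minimal sketch of the reactive SDS pipeline (ASR -> SLU -> DM -> NLG).
# All components are illustrative stubs, not real models.
def asr(audio):
    # a real recognizer would decode the audio signal
    return "find a cheap eating place for taiwanese food"

def slu(text):
    # a real SLU component would decode slots from the text
    return {"seeking": "find", "target": "eating place",
            "price": "cheap", "food": "taiwanese"}

def dm(frame):
    # dialogue management turns the semantic frame into a query
    return {"price": frame["price"], "food": frame["food"]}

def nlg(query):
    # generation verbalizes the system action
    return f"Cheap {query['food'].title()} eating places include ..."

def respond(audio):
    return nlg(dm(slu(asr(audio))))
```

The point of the sketch is only the staged structure: each component consumes the previous component's output, which is why an SLU error propagates to every later stage.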
15
Interaction Example
User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there.
Q: How does a dialogue system process this request?
16
SDS Process – Available Domain Ontology
User: find a cheap eating place for taiwanese food
Intelligent Agent: organized domain knowledge
(Ontology diagram: slots seeking, target, price, food linked by PREP_FOR, AMOD, and NN dependency relations.)
17
SDS Process – Available Domain Ontology
User: find a cheap eating place for taiwanese food
Intelligent Agent: organized domain knowledge
(Same ontology diagram; Ontology Induction discovers the semantic slots.)
18
SDS Process – Available Domain Ontology
User: find a cheap eating place for taiwanese food
Intelligent Agent: organized domain knowledge
(Same ontology diagram; Ontology Induction discovers the semantic slots, Structure Learning discovers the inter-slot relations.)
19
SDS Process – Spoken Language Understanding (SLU)
User: find a cheap eating place for taiwanese food
Intelligent Agent decodes the semantics: seeking="find", target="eating place", price="cheap", food="taiwanese"
20
SDS Process – Spoken Language Understanding (SLU)
User: find a cheap eating place for taiwanese food
Semantic Decoding maps the utterance to seeking="find", target="eating place", price="cheap", food="taiwanese"
21
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
Intelligent Agent forms the query: SELECT restaurant WHERE restaurant.price="cheap" AND restaurant.food="taiwanese"
22
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
Surface Form Derivation links natural-language surface forms to the query: SELECT restaurant WHERE restaurant.price="cheap" AND restaurant.food="taiwanese"
23
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
Intelligent Agent retrieves the results (Din Tai Fung, Boiling Point) and predicts the intent: navigation
24
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
Intent Prediction infers the follow-up intent (navigation) from the retrieved results (Din Tai Fung, Boiling Point)
25
SDS Process – Natural Language Generation (NLG)
User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there. (navigation)
26
Required Knowledge
User: find a cheap eating place for taiwanese food
Required domain-specific information: the ontology (seeking, target, price, food with PREP_FOR, AMOD, NN relations), the query SELECT restaurant WHERE restaurant.price="cheap" AND restaurant.food="taiwanese", and the predicted intent: navigation
27
Challenges for SDS
An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, fully unsupervised (in contrast to the prior focus), in order to handle open-domain requests, e.g. "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food".
28
Contributions
User: find a cheap eating place for taiwanese food
Ontology Induction (semantic slot), Structure Learning (inter-slot relation), and Surface Form Derivation (natural language) acquire the domain knowledge; Semantic Decoding and Intent Prediction map the utterance to the query SELECT restaurant WHERE restaurant.price="cheap" AND restaurant.food="asian food" and the predicted intent: navigation
29
Contributions
User: find a cheap eating place for taiwanese food
Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction
30
Contributions
User: find a cheap eating place for taiwanese food
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
Unlabelled collection of restaurant-asking conversations → Knowledge Acquisition → organized domain knowledge (slots seeking, target, food, price, quantity linked by PREP_FOR, NN, and AMOD relations)
Knowledge Acquisition covers Ontology Induction, Structure Learning, and Surface Form Derivation.
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
"can i have a cheap restaurant" → SLU component (backed by the organized domain knowledge) → price="cheap", target="restaurant", intent=navigation
SLU Modeling covers Semantic Decoding and Intent Prediction.
33
SDS Architecture – Contributions
(Pipeline: ASR → SLU → DM, backed by domain knowledge → NLG; Knowledge Acquisition and SLU Modeling address the current bottleneck, SLU.)
34
SDS Flowchart
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
35
SDS Flowchart – Semantic Decoding
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances. Output: semantic concepts included in each individual utterance.
(Framework: frame-semantic parsing on the unlabeled collection feeds Ontology Induction, which builds a semantic KG, and Structure Learning, which builds lexical and semantic KGs; the feature model (Fw, Fs) is combined with the knowledge graph propagation model (Rw: word relation model, Rs: slot relation model) in MF-SLU, SLU modeling by matrix factorization, which outputs the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".)
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistically semantic resource based on frame-semantics theory; words/phrases can be represented as frames, e.g. in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → frames capability, expensiveness, locale_by_use (slot candidates)
1st issue: differentiate domain-specific frames (good: expensiveness, locale_by_use) from generic frames (capability) for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
(Matrix from frame-semantic parsing: rows are utterances, columns are word observations and slot candidates. Train: utterance 1 "i would like a cheap restaurant" → words cheap, restaurant; slots expensiveness, locale_by_use. Utterance 2 "find a restaurant with chinese food" → words restaurant, food; slots locale_by_use, food. Test: "show me a list of cheap restaurants" → words cheap, restaurant; high scores for the domain-specific slots.)
Idea: increase the weights of domain-specific slots and decrease the weights of others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
(Slot induction: the word observation / slot candidate matrix over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" is multiplied by a word relation matrix from the Word Relation Model and a slot relation matrix from the Slot Relation Model; the knowledge graph connects words such as "i", "like" and slots such as capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring.)
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
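The propagation idea can be shown with a toy sketch (the slots come from the slide; the edge weights and scores are illustrative, not from the paper): multiplying slot scores by a relation matrix raises the scores of densely connected, domain-specific slots.

```python
# Toy score propagation over a slot relation matrix: slots sharing
# many strong edges (domain-specific ones) end up with higher scores.
slots = ["locale_by_use", "expensiveness", "capability"]
R = [
    [0.0, 0.8, 0.1],  # locale_by_use <-> expensiveness: strong relation
    [0.8, 0.0, 0.1],
    [0.1, 0.1, 0.0],  # capability is weakly connected (generic slot)
]

def propagate(R, score):
    # one propagation step: new_score = R @ score
    return [sum(w * s for w, s in zip(row, score)) for row in R]

# starting from uniform scores, the domain-specific slots dominate
scores = propagate(R, [1.0, 1.0, 1.0])
```

After one step, locale_by_use and expensiveness score about 0.9 each while capability stays near 0.2, which is the effect the slide describes.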
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances: "can i have a cheap restaurant" yields the dependency relations ccomp, nsubj, dobj, det, amod and the frames capability, expensiveness, locale_by_use.
Word-based lexical knowledge graph: nodes can, i, have, a, cheap, restaurant with word-to-word edges.
Slot-based semantic knowledge graph: nodes capability, locale_by_use, expensiveness with slot-to-slot edges.
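The word-based graph construction can be sketched as collecting dependency triples into adjacency lists. The head/dependent attachments below are an assumption for illustration; in a real system they come from the dependency parser.

```python
# Build the word-based lexical knowledge graph from dependency triples
# of "can i have a cheap restaurant". Attachments are assumed for
# illustration; in practice a dependency parser provides them.
triples = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

graph = {}
for head, rel, dep in triples:
    # undirected word-to-word edge labelled with the relation
    graph.setdefault(head, []).append((rel, dep))
    graph.setdefault(dep, []).append((rel, head))
```

The slot-based semantic graph is built the same way, with each word replaced by the frame it evokes.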
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: vectors for words such as "can" and "have", trained with dependency-based contexts from the parse (ccomp, nsubj, dobj, det, amod) of "can i have a cheap restaurant".
Dependency-based slot embeddings: vectors for slots such as expensiveness and capability, trained from the same parse with slots (capability, expensiveness, locale_by_use) in place of their evoking words.
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
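In the Levy and Goldberg scheme, each word's training contexts are its dependency neighbours tagged with the relation, with the inverse direction marked by "-1". A minimal sketch over two assumed triples:

```python
# Dependency-based contexts (Levy & Goldberg, 2014): pair each word
# with its dependency neighbours, labelled by the relation; mark the
# inverse direction with "-1". The triples are illustrative.
triples = [("have", "dobj", "restaurant"), ("restaurant", "amod", "cheap")]

pairs = []
for head, rel, dep in triples:
    pairs.append((head, f"{dep}/{rel}"))    # context of the head
    pairs.append((dep, f"{head}/{rel}-1"))  # inverse context of the dep
```

These (word, context) pairs replace the linear-window contexts of standard word2vec training, so the resulting embeddings capture syntactic rather than topical similarity.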
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
(Graph: word nodes w1–w7 connected to slot nodes s1, s2, s3.)
46
Knowledge Graph Propagation Model
(Slot induction: the word observation / slot candidate matrix is multiplied by the word relation matrix R_w^SD from the Word Relation Model and the slot relation matrix R_s^SD from the Slot Relation Model.)
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
Ontology Induction and Structure Learning feed the SLU feature matrices Fw and Fs.
(Matrix: train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" with observed words and slots; for the test utterance "show me a list of cheap restaurants", slot probabilities are estimated, including hidden semantics that were never observed.)
2nd issue: unobserved semantics may benefit understanding.
48
Reasoning with Matrix Factorization
(Feature model + knowledge graph propagation model: the word observation / slot candidate matrix, combined with the relation matrices R_w^SD and R_s^SD for slot induction, is completed so that slot probabilities are estimated for both observed and unobserved cells.)
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009): the decomposed matrices represent latent semantics for utterances and words/slots, respectively; the product of the two matrices fills in the probability of the hidden semantics:
M (|U| × (|W|+|S|)) ≈ U (|U| × d) × V (d × (|W|+|S|))
(Matrix: word observations and slot candidates over train/test utterances, with estimated probabilities filling the unobserved cells.)
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
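A toy sketch of the low-rank completion above, with plain SGD on squared error instead of the BPR objective used in the talk; the matrix sizes and observed cells are illustrative, not from the corpus.

```python
import random

# Toy rank-2 factorization completing a partially observed binary
# matrix (utterances x words+slots): M ~ U @ V^T with d = 2.
observed = {(0, 0): 1.0, (0, 1): 1.0, (1, 1): 1.0,
            (1, 2): 1.0, (2, 0): 1.0, (2, 1): 1.0}
n_rows, n_cols, d = 3, 3, 2

random.seed(0)
U = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_cols)]

def predict(i, j):
    return sum(U[i][k] * V[j][k] for k in range(d))

# SGD on squared error over the observed cells only
for _ in range(2000):
    for (i, j), v in observed.items():
        err = v - predict(i, j)
        for k in range(d):
            u_ik, v_jk = U[i][k], V[j][k]
            U[i][k] += 0.05 * err * v_jk
            V[j][k] += 0.05 * err * u_ik
```

After training, `predict` reproduces the observed cells, and unobserved cells such as (2, 2) get an estimate from the shared latent structure, which is the "fill in the probability of hidden semantics" step.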
50
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, observed semantics f⁺ should outrank unobserved semantics f⁻ (maximize Σ ln σ(f⁺ − f⁻)).
The objective is to learn a set of well-ranked semantic slots per utterance.
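The pairwise term of the BPR objective can be written as a loss on the score margin; the scores passed in below are toy model outputs, not real data.

```python
import math

# BPR pairwise objective (Rendle et al., 2009): an observed slot f+
# should be ranked above an unobserved slot f- for the same utterance.
def bpr_loss(f_pos, f_neg):
    # negative log-sigmoid of the margin; minimizing this maximizes
    # ln sigma(f+ - f-), i.e. pushes f+ above f-
    return -math.log(1.0 / (1.0 + math.exp(-(f_pos - f_neg))))
```

Note the loss only compares pairs: an unobserved slot is never forced toward 0, it is only asked to score below the observed ones, which is exactly the implicit-feedback assumption on the slide.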
51
Matrix Factorization SLU (MF-SLU)
Ontology Induction and Structure Learning feed the SLU feature matrices Fw and Fs.
(Matrix: train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", slot probabilities are estimated.)
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances. Output: semantic concepts included in each individual utterance.
(Framework recap: frame-semantic parsing, Ontology Induction and Structure Learning, feature model (Fw, Fs) and knowledge graph propagation model (Rw, Rs), MF-SLU → semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".)
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU Corpus – restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
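The MAP metric used throughout the experiments can be sketched as follows; the ranked slot lists are toy examples, not corpus results.

```python
# Mean average precision (MAP): per utterance, average the precision
# at the rank of each reference slot, then average across utterances.
def average_precision(ranked, relevant):
    hits, precisions = 0, []
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(per_utterance):
    # per_utterance: list of (ranked slot list, set of reference slots)
    total = sum(average_precision(r, rel) for r, rel in per_utterance)
    return total / len(per_utterance)
```

For example, ranking ["food", "area", "pricerange"] against the reference {"food", "pricerange"} gives AP = (1/1 + 2/3) / 2 ≈ 0.83; higher MAP means reference slots are ranked closer to the top.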
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.
Baseline SLU (MAP, ASR / Transcripts): Support Vector Machine 32.5 / 36.6; Multinomial Logistic Regression 34.0 / 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.
Baseline SLU (MAP, ASR / Transcripts): Support Vector Machine 32.5 / 36.6; Multinomial Logistic Regression 34.0 / 38.8
Proposed MF-SLU: Feature Model 37.6 / 45.3; Feature Model + Knowledge Graph Propagation 43.5 (+27.9%) / 53.4 (+37.6%), significantly better than the MLR with p < 0.05 in a t-test.
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.
Feature Model (MAP, ASR / Transcripts): 37.6 / 45.3. Feature + Knowledge Graph Propagation: Semantic relations 41.4 / 51.6; Dependency relations 41.6 / 49.0; All 43.5 (+15.7%) / 53.4 (+17.9%), significantly better than the MLR with p < 0.05 in a t-test.
In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
(Learned ontology: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring linked by PREP_FOR, NN, AMOD, and DOBJ dependencies; reference ontology: type, food, pricerange, task, area linked by DOBJ, AMOD, and PREP_IN, annotated with the most frequent syntactic dependencies.)
The automatically learned domain ontology aligns well with the reference one; the data-driven one is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
(Flowchart: Knowledge Acquisition – Ontology Induction, Structure Learning; SLU Modeling – Semantic Decoding, Intent Prediction.)
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation
60
SDS Flowchart – Intent Prediction
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for making requests about launching an app. Output: the apps supporting the required functionality.
Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request. Output: apps that are able to support the required functionality.
(Reasoning with feature-enriched MF: IR retrieves app candidates from app descriptions, e.g. Outlook "... your email, calendar, contacts ..." and Gmail "... check and send emails, msgs ..."; these descriptions, self-train utterances, and the test utterance "i would like to contact alex" form the rows of a matrix whose columns are word observations (contact, message, email), enriched semantics (communication), and intended apps (Gmail, Outlook, Skype); feature enrichment and MF fill in the scores for the intended apps.)
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction. Output: apps the user plans to launch.
Challenge: language ambiguity – 1) user preference, 2) app-level contexts (e.g. "send to vivian" in the previous turn, within an Email/Message/Communication app).
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction. Output: apps the user plans to launch.
(Reasoning with feature-enriched MF: train dialogues such as "take this photo" / "tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website" / "send an email to professor" → CHROME, EMAIL, plus the test dialogue "take a photo of this" / "send it to alice" → CAMERA, IM, form the rows of a matrix whose columns are lexical features (photo, check, camera, tell, send), behavior history (null, camera, chrome, email), and the intended apps; MF fills in the app scores.)
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised) vs. MF-SLU (ASR / Transcripts): Word Observation – LM 25.1 / 26.1.
Multi-Turn Interaction, Mean Average Precision (MAP), Multinomial Logistic Regression (supervised) vs. MF-SLU (ASR / Transcripts): Word Observation – MLR 52.1 / 55.5.
67
Experiments for Intent Prediction
Single-Turn Request, MAP (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU): Word Observation – 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%).
Multi-Turn Interaction, MAP (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU): Word Observation – 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (−0.2%).
Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request, MAP (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU): Word Observation – 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%). Word + Embedding-Based Semantics – LM 32.0; 33.3. Word + Type-Embedding-Based Semantics – LM 31.5; 32.9.
Multi-Turn Interaction, MAP (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU): Word Observation – 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (−0.2%). Word + Behavioral Patterns – MLR 53.9; 56.6.
Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request, MAP (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU): Word Observation – 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%). Word + Embedding-Based Semantics – 32.0 / 34.2 (+6.8%); 33.3 / 33.3 (−0.2%). Word + Type-Embedding-Based Semantics – 31.5 / 32.2 (+2.1%); 32.9 / 34.0 (+3.4%).
Multi-Turn Interaction, MAP (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU): Word Observation – 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (−0.2%). Word + Behavioral Patterns – 53.9 / 55.7 (+3.3%); 56.6 / 57.7 (+1.9%).
Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
(Flowchart: Knowledge Acquisition – Ontology Induction, Structure Learning; SLU Modeling – Semantic Decoding, Intent Prediction.)
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
73
Conclusions
The work shows the feasibility and the potential for improving generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: mine requests that are not covered by the current systems but that users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge passed from knowledge acquisition to SLU modeling.
75
[Figure: MF viewed as a neural network. Word sequence x (w1, w2, …, wd) → word vectors lw → convolutional layer lc (matrix Wc) → pooling → utterance vector lf → knowledge graph propagation layer lp (matrix Wp) → semantic projection (matrix Ws) → semantic layer y, yielding posterior probabilities P(S1 | U), P(S2 | U), …, P(Sn | U) for slot candidates S1…Sn and semantic relations R(U, Si)]
Towards Unsupervised Deep Learning
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
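This idea can be sketched numerically; all shapes, weights, and the nonlinearity below are illustrative assumptions, not the actual model:

```python
import numpy as np

# Illustrative sketch only: MF scoring as a single linear layer, then a
# deeper variant with a nonlinearity. Shapes and weights are made up.
rng = np.random.default_rng(2)
n_feat, d, n_slot = 10, 4, 6
x = rng.random(n_feat)  # utterance feature vector (word observations)

# One-layer "MF" view: scores = x @ W, with W factored as a rank-d product
W = rng.normal(size=(n_feat, d)) @ rng.normal(size=(d, n_slot))
shallow = x @ W  # slot scores from the one-layer model

# Deeper view: insert a nonlinear hidden layer between the factors
hidden = np.tanh(x @ rng.normal(size=(n_feat, d)))
deep = hidden @ rng.normal(size=(d, n_slot))  # slot scores from two layers
```

The factored W makes the one-layer view exactly the MF scoring function; replacing the first factor's output with a nonlinear hidden layer is what turns it into a deeper network.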
76
Take Home Message
Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI: from language to action; understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics: Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
15
Interaction Example
User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.
Q: How does a dialogue system process this request?
16
SDS Process – Available Domain Ontology
User: find a cheap eating place for taiwanese food
Intelligent Agent: organized domain knowledge
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
17
SDS Process – Available Domain Ontology
User: find a cheap eating place for taiwanese food
Intelligent Agent: organized domain knowledge
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
Ontology Induction (semantic slot)
18
SDS Process – Available Domain Ontology
User: find a cheap eating place for taiwanese food
Intelligent Agent: organized domain knowledge
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
19
SDS Process – Spoken Language Understanding (SLU)
User: find a cheap eating place for taiwanese food
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"
20
SDS Process – Spoken Language Understanding (SLU)
User: find a cheap eating place for taiwanese food
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"
Semantic Decoding
21
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
Intelligent Agent: SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
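The DM step above turns the decoded slot-value pairs into a database query. A minimal sketch (the helper, table, and column names are hypothetical, not the system's actual code):

```python
# Hypothetical helper: convert a decoded semantic frame into the SQL-style
# query shown above. Table and column names are illustrative only.
def frame_to_query(frame, table="restaurant"):
    conditions = " AND ".join(
        f'{table}.{slot}="{value}"' for slot, value in frame.items()
    )
    return f"SELECT * FROM {table} WHERE {conditions}"

frame = {"price": "cheap", "food": "taiwanese"}
query = frame_to_query(frame)
# query: SELECT * FROM restaurant WHERE restaurant.price="cheap" AND restaurant.food="taiwanese"
```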
22
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
Intelligent Agent: SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Surface Form Derivation (natural language)
23
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Intelligent Agent: Din Tai Fung, Boiling Point
Predicted intent: navigation
24
SDS Process – Dialogue Management (DM)
User: find a cheap eating place for taiwanese food
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Intelligent Agent: Din Tai Fung, Boiling Point
Predicted intent: navigation
Intent Prediction
25
SDS Process – Natural Language Generation (NLG)
User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. (navigation)
26
Required Knowledge
User: find a cheap eating place for taiwanese food
Required domain-specific information:
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Predicted intent: navigation
27
Challenges for SDS
An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge (prior focus) and then 2) create the data for SLU modeling, fully unsupervised, in order to handle open-domain requests, e.g.:
find a cheap eating place for asian food → seeking="find", target="eating place", price="cheap", food="asian food"
28
Contributions
User: find a cheap eating place for taiwanese food
[Ontology diagram: target connected to seeking (PREP_FOR), price (AMOD), and food (NN)]
SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"
Predicted intent: navigation
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding
Intent Prediction
29
Contributions
User: find a cheap eating place for taiwanese food
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
30
Contributions
User: find a cheap eating place for taiwanese food
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
Unlabelled collection (restaurant-asking conversations) → Knowledge Acquisition → Organized Domain Knowledge
[Ontology diagram: seeking, target, price, food, and quantity nodes linked by PREP_FOR, NN, and AMOD relations]
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
"can i have a cheap restaurant" + Organized Domain Knowledge → SLU Component → price="cheap", target="restaurant", intent=navigation
SLU Modeling: Semantic Decoding, Intent Prediction
33
SDS Architecture – Contributions
[Diagram: ASR → SLU → DM → NLG, with domain knowledge; SLU is the current bottleneck, addressed by Knowledge Acquisition and SLU Modeling]
34
SDS Flowchart
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
35
SDS Flowchart – Semantic Decoding
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Flow diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (feature model: Fw, Fs) and Structure Learning (word/slot relation models Rw, Rs over lexical and semantic knowledge graphs) → Knowledge Graph Propagation Model → MF-SLU (SLU modeling by matrix factorization) → semantic representation: target="restaurant", price="cheap"]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory, in which words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame while "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantic parser trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → evoked frames (slot candidates): capability, expensiveness, locale_by_use
1st Issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: word-observation × slot-candidate matrix. Rows are training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants"); columns are words (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food), with observed 1s from frame-semantic parsing and estimated slot weights (e.g., .97, .95)]
Idea: increase the weights of domain-specific slots and decrease the weights of the others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: the word-observation × slot-candidate matrix multiplied by a word relation matrix and a slot relation matrix for slot induction]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
[Graph: slot nodes capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring]
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Flow diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (feature model: Fw, Fs) and Structure Learning (word/slot relation models Rw, Rs over lexical and semantic knowledge graphs) → Knowledge Graph Propagation Model → MF-SLU (SLU modeling by matrix factorization) → semantic representation: target="restaurant", price="cheap"]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances: "can i have a cheap restaurant" (frames: capability, expensiveness, locale_by_use; dependencies: ccomp, nsubj, dobj, det, amod)
Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant)
Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness)
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings (e.g., vectors for "can" and "have") and dependency-based slot embeddings (e.g., vectors for "expensiveness" and "capability") are trained from the dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
[Graph: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges]
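The semantic-relation weights can be sketched as cosine similarity between embeddings; the vectors below are toy values standing in for trained dependency-based embeddings:

```python
import numpy as np

# Toy vectors standing in for dependency-based slot embeddings; the real
# embeddings come from training on dependency-parsed utterances.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = {
    "expensiveness": np.array([0.9, 0.1, 0.3]),
    "food":          np.array([0.8, 0.2, 0.4]),
    "capability":    np.array([0.1, 0.9, 0.0]),
}
# Semantically related slots get a large edge weight; a generic slot gets a small one
w_related = cosine(emb["expensiveness"], emb["food"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
```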
46
Knowledge Graph Propagation Model
[Figure: the word-observation × slot-candidate matrix multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) for slot induction]
Structure information is integrated to make the self-training data more reliable.
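The propagation step can be sketched as a single matrix multiplication; the relation matrix and initial scores below are toy values:

```python
import numpy as np

# Toy slot-relation (adjacency) matrix over four slot candidates:
# expensiveness, food, locale_by_use are densely inter-connected
# (domain-specific); capability is weakly connected (generic).
R = np.array([[0.0, 0.8, 0.7, 0.1],
              [0.8, 0.0, 0.6, 0.1],
              [0.7, 0.6, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.0]])
scores = np.array([0.5, 0.5, 0.5, 0.5])  # parser scores all slots equally

# One propagation step: keep each node's own score and add its neighbors'
propagated = (np.eye(4) + R) @ scores
# After propagation, the well-connected (domain-specific) slots outrank the generic one
```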
47
Semantic Decoding [ACL-IJCNLP'15]
2nd Issue: unobserved hidden semantics may benefit understanding.
[Figure: ontology induction and structure learning build the feature matrices Fw, Fs for SLU; the test utterance "show me a list of cheap restaurants" carries hidden semantics that are not directly observed]
48
Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model
[Figure: the word/slot observation matrix combined with the relation matrices R_w^(SD) and R_s^(SD); matrix factorization fills previously missing cells with estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05)]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of the hidden semantics:
M (|U| × (|W|+|S|)) ≈ U_f (|U| × d) × V_f (d × (|W|+|S|))
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
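The low-rank completion idea can be sketched with a truncated SVD, one way to obtain a rank-d approximation (the actual model instead learns the factors with BPR); values below are illustrative:

```python
import numpy as np

# Sketch of the low-rank assumption: a partially observed utterance-by-
# (word+slot) 0/1 matrix M is approximated by a rank-d product, whose
# entries score the hidden (unobserved) semantics. Values are illustrative.
rng = np.random.default_rng(0)
n_utt, n_feat, d = 4, 6, 2
M = (rng.random((n_utt, n_feat)) > 0.5).astype(float)

# Truncated SVD gives the best rank-d approximation in the least-squares sense
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = (U[:, :d] * s[:d]) @ Vt[:d, :]  # completed matrix with filled-in scores
```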
50
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u, maximize the sum of ln σ(f⁺ − f⁻) over observed facts f⁺ and unobserved facts f⁻.
The objective is to learn a set of well-ranked semantic slots per utterance.
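A minimal numeric sketch of the BPR-style update (dimensions, learning rate, and indices are illustrative, not the paper's implementation):

```python
import numpy as np

# Toy BPR-style training: push the score of an observed slot f+ above an
# unobserved slot f- for the same utterance.
rng = np.random.default_rng(1)
n_utt, n_feat, d = 5, 20, 8
U = rng.normal(scale=0.1, size=(n_utt, d))   # utterance latent factors
V = rng.normal(scale=0.1, size=(n_feat, d))  # word/slot latent factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(u, f_pos, f_neg, lr=0.05):
    """One SGD step ascending ln sigma(score(u, f+) - score(u, f-))."""
    x = U[u] @ (V[f_pos] - V[f_neg])
    g = sigmoid(-x)              # gradient coefficient of ln sigma(x)
    u_vec = U[u].copy()
    U[u]     += lr * g * (V[f_pos] - V[f_neg])
    V[f_pos] += lr * g * u_vec
    V[f_neg] -= lr * g * u_vec

for _ in range(200):
    bpr_step(u=0, f_pos=3, f_neg=7)

margin = U[0] @ V[3] - U[0] @ V[7]  # observed slot now outranks the unobserved one
```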
51
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction and structure learning build the feature matrices Fw, Fs; for the test utterance "show me a list of cheap restaurants", MF estimates slot probabilities (e.g., .97, .90, .95, .85)]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Flow diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (feature model: Fw, Fs) and Structure Learning (word/slot relation models Rw, Rs over lexical and semantic knowledge graphs) → Knowledge Graph Propagation Model → MF-SLU (SLU modeling by matrix factorization) → semantic representation: target="restaurant", price="cheap"]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2166 dialogues, 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
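The MAP metric can be sketched as follows; the rankings and gold slot sets below are toy values:

```python
# Toy computation of mean average precision (MAP) over per-utterance slot
# rankings; rankings and gold sets below are illustrative.
def average_precision(ranked, gold):
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in gold:
            hits += 1
            total += hits / i
    return total / len(gold) if gold else 0.0

rankings = [(["food", "area", "pricerange"], {"food", "pricerange"}),
            (["addr", "phone"], {"phone"})]
mean_ap = sum(average_precision(r, g) for r, g in rankings) / len(rankings)
# mean_ap = (5/6 + 1/2) / 2 = 2/3
```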
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results. (Significantly better than the MLR baseline with p < 0.05 in t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature Model + Knowledge Graph Propagation (Semantic) | 41.4 | 51.6
Feature Model + Knowledge Graph Propagation (Dependency) | 41.6 | 49.0
Feature Model + Knowledge Graph Propagation (All) | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (Significantly better than the MLR baseline with p < 0.05 in t-test.)
Experiments for Structure Learning: Relation Discovery Analysis
The discovered inter-slot relations connect important slot pairs.
[Induced ontology graph: locale_by_use linked to food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, NN, AMOD, and DOBJ relations]
[Reference ontology with the most frequent syntactic dependencies: type linked to food, pricerange, task, and area via NN, AMOD, DOBJ, and PREP_IN relations]
The automatically learned domain ontology aligns well with the reference one.
57
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
Ontology Induction and Structure Learning (Knowledge Acquisition) enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding (SLU Modeling) is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation
60
SDS Flowchart – Intent Prediction
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: feature-enriched matrix. Rows are app descriptions retrieved as candidates (e.g., Outlook: "your email calendar contacts…"; Gmail: "check and send emails msgs…"), self-train utterances, and the test utterance "i would like to contact alex"; columns are word observations (contact, message, email) enriched with semantics (communication) and intended apps (Gmail, Outlook, Skype), with estimated scores (e.g., .90, .85, .97, .95)]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts (e.g., "send to vivian" in the previous turn could mean Email or Message within the Communication category)
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched matrix. Rows are training dialogues ("take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL) and the test dialogue ("take a photo of this / send it to alice" → CAMERA, IM); columns combine lexical features (photo, check, camera, tell, send), behavior history (null, camera, chrome, email), and intended apps, with estimated scores (e.g., .85, .70, .95, .80, .55)]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP), with an LM-based IR model (unsupervised):

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | – | 26.1 | –

Multi-Turn Interaction, Mean Average Precision (MAP), with Multinomial Logistic Regression (supervised):

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | – | 55.5 | –
67
Experiments for Intent Prediction
Single-Turn Request (MAP):

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)

Multi-Turn Interaction (MAP):

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request (MAP):

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | – | 33.3 | –
Word + Type-Embedding-Based Semantics | 31.5 | – | 32.9 | –

Multi-Turn Interaction (MAP):

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | – | 56.6 | –

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request (MAP):

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)

Multi-Turn Interaction (MAP):

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Flowchart: Ontology Induction, Structure Learning (Knowledge Acquisition); Semantic Decoding, Intent Prediction (SLU Modeling)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Data Bases, Back-end Services, and Client Data Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions

The work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances,
better high-level intent prediction about follow-up behaviors.
Future Work

Apply the proposed technology to domain discovery: domains that are not covered by the current systems but that users are interested in can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.
Towards Unsupervised Deep Learning

(Architecture diagram: a word sequence x (w1, w2, ..., wd) is mapped to word vectors lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation produces the utterance vector lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic layer y (semantic projection matrix Ws) yield relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over the slot candidates S1, ..., Sn, each with its slot vector lf.)
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
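The observation above can be sketched directly: the MF score matrix is the product of two factor matrices, i.e. a single linear layer, and inserting a nonlinear hidden layer yields a deeper model of the same kind. All weights below are random, untrained stand-ins, purely for illustrating the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: |U| utterances, |W|+|S| word/slot features, latent size d.
n_utt, n_feat, d = 4, 6, 3

# MF as a one-layer "network": score matrix = U x V (purely linear).
U = rng.normal(size=(n_utt, d))
V = rng.normal(size=(d, n_feat))
scores_mf = U @ V

# Adding a nonlinear hidden layer turns the same scoring idea into a
# deeper model (W1, W2 are hypothetical, untrained weights).
W1 = rng.normal(size=(d, 8))
W2 = rng.normal(size=(8, n_feat))
hidden = np.tanh(U @ W1)       # nonlinear hidden layer
scores_deep = hidden @ W2

print(scores_mf.shape, scores_deep.shape)
```

Both models map utterances to scores over the word/slot columns; only the depth of the mapping changes.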
Take Home Message

Big data without annotations is available. The challenge is how to acquire and organize important knowledge, and how to further utilize it for applications.

Language understanding for AI maps language to action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
Q & A

THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
SDS Process – Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Organized Domain Knowledge
Intelligent Agent
SDS Process – Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Organized Domain Knowledge
Intelligent Agent
Ontology Induction (semantic slot)
SDS Process – Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Organized Domain Knowledge
Intelligent Agent
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
SDS Process – Spoken Language Understanding (SLU)

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Intelligent Agent
seeking="find", target="eating place", price="cheap", food="taiwanese"
SDS Process – Spoken Language Understanding (SLU)

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Intelligent Agent
seeking="find", target="eating place", price="cheap", food="taiwanese"
Semantic Decoding
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Intelligent Agent
SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)

Intelligent Agent
SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"
Surface Form Derivation (natural language)
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"
→ Din Tai Fung, Boiling Point
Predicted intent: navigation

Intelligent Agent
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"
→ Din Tai Fung, Boiling Point
Predicted intent: navigation

Intelligent Agent
Intent Prediction
SDS Process – Natural Language Generation (NLG)

User: "find a cheap eating place for taiwanese food"

Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there (navigation)."
Required Knowledge

User: "find a cheap eating place for taiwanese food"

Required Domain-Specific Information:
(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)
SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"
Predicted intent: navigation
Challenges for SDS

An SDS in a new domain requires:
1) a hand-crafted domain ontology;
2) utterances labelled with semantic representations;
3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

"find a cheap eating place for asian food"
seeking="find", target="eating place", price="cheap", food="asian food"

fully unsupervised
Prior Focus
Contributions

User: "find a cheap eating place for taiwanese food"

(Domain ontology graph over slot nodes seeking, target, price, food with relations PREP_FOR, AMOD, NN)
SELECT restaurant restaurant.price="cheap" restaurant.food="asian food"
Predicted intent: navigation

Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding
Intent Prediction
Contributions

User: "find a cheap eating place for taiwanese food"

Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
Contributions

User: "find a cheap eating place for taiwanese food"

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Unlabelled Collection: restaurant-asking conversations
→ Knowledge Acquisition →
Organized Domain Knowledge: an ontology over slots (seeking, target, price, food, quantity) with relations (PREP_FOR, NN, AMOD)

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

"can i have a cheap restaurant"
→ SLU Component (built via SLU Modeling with the Organized Domain Knowledge) →
price="cheap", target="restaurant", intent=navigation

SLU Modeling: Semantic Decoding, Intent Prediction
SDS Architecture – Contributions

(Pipeline: ASR → SLU → DM (with Domain knowledge) → NLG; Knowledge Acquisition and SLU Modeling target the current bottleneck.)
SDS Flowchart

Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart – Semantic Decoding

Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(System diagram: frame-semantic parsing over the unlabeled collection feeds Ontology Induction, which yields the feature model (Fw, Fs); Structure Learning over a lexical knowledge graph (word relation model Rw) and a semantic knowledge graph (slot relation model Rs) yields the knowledge graph propagation model; MF-SLU, i.e. SLU modeling by matrix factorization, combines them to produce the semantic representation.)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills its "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant"
Frames evoked: capability, expensiveness, locale_by_use; each frame is a slot candidate.

1st Issue: differentiate domain-specific frames (good candidates here: expensiveness, locale_by_use) from generic frames (capability) for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

(Matrix illustration: train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" are rows; observed words (cheap, restaurant, food) and induced slot candidates (expensiveness, locale_by_use, food) are columns; cells hold binary observations from frame-semantic parsing and estimated probabilities such as .97 and .95 for the test utterance.)

Idea: increase the weights of domain-specific slots and decrease the weights of the others.
1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model
Assumption: domain-specific words/slots have more dependencies to each other.

(Matrix illustration: the word observation / slot candidate matrix is multiplied by a word relation matrix and a slot relation matrix for slot induction; example slot nodes in the knowledge graph include locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, and capability.)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
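The propagation step can be sketched with a toy relation matrix: slot nodes linked in the knowledge graph reinforce each other under matrix multiplication, while an isolated generic node does not. The slot names, edges, and initial scores below are illustrative assumptions, not the actual learned matrices.

```python
import numpy as np

# Hypothetical slot candidates: "expensiveness" and "locale_by_use" are
# connected in the semantic knowledge graph; "capability" is isolated.
slots = ["expensiveness", "locale_by_use", "capability"]

# Slot relation matrix (self-loops plus one edge between the two
# domain-specific slots); weights would normally come from embeddings.
R = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Initial slot scores for one utterance (e.g. from frame-semantic parsing).
scores = np.array([0.6, 0.8, 0.5])

# One propagation step: each node accumulates its neighbors' scores.
propagated = R @ scores
for name, s in zip(slots, propagated):
    print(f"{name}: {s:.2f}")
```

The two connected domain-specific slots end up with higher scores than the isolated generic one, which is exactly the adaptation effect described on the slide.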
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(System diagram: frame-semantic parsing over the unlabeled collection feeds Ontology Induction, which yields the feature model (Fw, Fs); Structure Learning over a lexical knowledge graph (word relation model Rw) and a semantic knowledge graph (slot relation model Rs) yields the knowledge graph propagation model; MF-SLU, i.e. SLU modeling by matrix factorization, combines them to produce the semantic representation.)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Knowledge Graph Construction

Syntactic dependency parsing on utterances, e.g. "can i have a cheap restaurant" (dependencies: ccomp, nsubj, dobj, det, amod; evoked frames: capability, expensiveness, locale_by_use).

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by dependency-based edges.
Slot-based semantic knowledge graph: slot nodes (capability, expensiveness, locale_by_use) connected by dependency-based edges.
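A toy sketch of this construction, with hand-written dependency triples and word-to-frame assignments standing in for the parser outputs (in the talk, a syntactic parser and SEMAFOR produce these):

```python
from collections import defaultdict

# Assumed dependency triples for "can i have a cheap restaurant".
deps = [("have", "ccomp", "can"),
        ("have", "nsubj", "i"),
        ("have", "dobj", "restaurant"),
        ("restaurant", "det", "a"),
        ("restaurant", "amod", "cheap")]

# Assumed word-to-frame mapping from frame-semantic parsing.
frames = {"can": "capability", "cheap": "expensiveness",
          "restaurant": "locale_by_use"}

# Word-based lexical knowledge graph: words linked by dependencies.
word_graph = defaultdict(set)
# Slot-based semantic knowledge graph: slots linked when both endpoint
# words of a dependency evoke a frame (a simplification of the method).
slot_graph = defaultdict(set)

for head, rel, dep in deps:
    word_graph[head].add((dep, rel))
    if head in frames and dep in frames:
        slot_graph[frames[head]].add((frames[dep], rel))

print(dict(word_graph))
print(dict(slot_graph))
```

Here the amod dependency between "restaurant" and "cheap" induces a slot-level edge between locale_by_use and expensiveness.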
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Train dependency-based word embeddings (e.g. vectors for "can", "have") and dependency-based slot embeddings (e.g. vectors for "expensiveness", "capability") from the dependency-parsed utterances, e.g. "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
Edge Weight Measurement

Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

(Graph illustration: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges.)
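The semantic edge weights can be sketched as cosine similarity between embedding vectors; the random vectors below are stand-ins for actual trained dependency-based embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy slot embeddings (random stand-ins for Levy & Goldberg-style
# dependency-based embeddings).
rng = np.random.default_rng(1)
emb = {"expensiveness": rng.normal(size=50),
       "locale_by_use": rng.normal(size=50)}

# Semantic edge weight between two slot nodes = embedding similarity.
w = cosine(emb["expensiveness"], emb["locale_by_use"])
print(f"edge weight: {w:.3f}")
```

Dependency-relation edge weights would instead come from a dependency score over the same embeddings, following the same pattern.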
Knowledge Graph Propagation Model

(Matrix illustration: the word observation / slot candidate matrix for train and test utterances is multiplied by the word relation matrix Rw(SD) and the slot relation matrix Rs(SD) for slot induction.)

Structure information is integrated to make the self-training data more reliable.
Semantic Decoding [ACL-IJCNLP'15]

(Matrix illustration: Ontology Induction provides the feature matrices Fw and Fs and Structure Learning the relation matrices; train utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" are represented over observed words (cheap, restaurant, food) and induced slot candidates (expensiveness, locale_by_use, food), with estimated probabilities such as .97, .90, .95, .85 on the test row.)

2nd Issue: unobserved hidden semantics may benefit understanding.
Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

(Matrix illustration: the word observation / slot candidate matrix, multiplied by the word relation matrix Rw(SD) and the slot relation matrix Rs(SD), is completed by MF; missing cells receive estimated probabilities such as .97, .90, .95, .85, .93, .92, .98, .05.)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probabilities of the hidden semantics.

(Matrix illustration: the |U| × (|W|+|S|) observation matrix over words and slot candidates is approximated as the product of an |U| × d utterance matrix and a d × (|W|+|S|) word/slot matrix, filling missing cells with estimated probabilities such as .97, .90, .95, .85, .93, .92, .98, .05.)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
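As a rough illustration of the completion idea (not the paper's BPR-trained model), a low-rank factorization can be fit to only the observed cells of a toy utterance-by-feature matrix with plain gradient descent; the low-rank product then fills in the missing cells:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observation matrix: rows = utterances, columns = words + slot
# candidates; 1/0 = observed values, np.nan = unobserved hidden cells.
M = np.array([[1, 1, 0, 1, np.nan],
              [0, 1, 1, np.nan, 1],
              [1, 0, 0, 1, np.nan]], dtype=float)
mask = ~np.isnan(M)

d = 2                                              # latent dimension
P = rng.normal(scale=0.1, size=(M.shape[0], d))    # utterance factors
Q = rng.normal(scale=0.1, size=(M.shape[1], d))    # word/slot factors

# Gradient descent on squared error over observed cells only.
lr = 0.05
for _ in range(2000):
    E = np.where(mask, (P @ Q.T) - M, 0.0)         # error on observed cells
    P, Q = P - lr * (E @ Q), Q - lr * (E.T @ P)

completed = P @ Q.T        # low-rank product fills the missing cells
print(np.round(completed, 2))
```

The nan cells now hold real-valued scores induced purely by the low-rank structure, which is the "hidden semantics" effect the slide describes.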
Bayesian Personalized Ranking for MF

Model implicit feedback:
do not treat unobserved facts as negative samples (true or false);
give observed facts higher scores than unobserved facts.

Objective: for each utterance, maximize the sum of ln σ(f+ - f-) over pairs of an observed fact f+ and an unobserved fact f-.

The objective is to learn a set of well-ranked semantic slots per utterance.
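A minimal sketch of a BPR-style update for a single utterance, assuming toy random factor vectors (the slot names and dimensions are illustrative): each step performs gradient ascent on ln σ(f+ - f-), pushing the score of an observed fact above an unobserved one.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One utterance factor vector and three slot-candidate factor vectors.
d = 4
u = rng.normal(scale=0.1, size=d)
slots = {"expensiveness": rng.normal(scale=0.1, size=d),
         "locale_by_use": rng.normal(scale=0.1, size=d),
         "capability": rng.normal(scale=0.1, size=d)}
observed, unobserved = "expensiveness", "capability"

lr = 0.5
for _ in range(200):
    f_pos = u @ slots[observed]        # score of an observed fact
    f_neg = u @ slots[unobserved]      # score of an unobserved fact
    # Gradient ascent on ln sigmoid(f+ - f-): raise f+ above f-.
    g = 1.0 - sigmoid(f_pos - f_neg)
    u += lr * g * (slots[observed] - slots[unobserved])
    slots[observed] += lr * g * u
    slots[unobserved] -= lr * g * u

print(f"f+ = {u @ slots[observed]:.3f}, f- = {u @ slots[unobserved]:.3f}")
```

Note that the unobserved fact is never forced to score zero; it only has to rank below the observed one, which is exactly the implicit-feedback treatment on the slide.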
Matrix Factorization SLU (MF-SLU)

(Matrix illustration: Ontology Induction provides the feature matrices Fw and Fs and Structure Learning the relation matrices; given the test utterance "show me a list of cheap restaurants", the factorized model estimates slot-candidate probabilities such as .97, .90, .95, .85.)

MF-SLU can estimate probabilities for slot candidates given test utterances.
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(System diagram: frame-semantic parsing over the unlabeled collection feeds Ontology Induction, which yields the feature model (Fw, Fs); Structure Learning over a lexical knowledge graph (word relation model Rw) and a semantic knowledge graph (slot relation model Rs) yields the knowledge graph propagation model; MF-SLU, i.e. SLU modeling by matrix factorization, combines them to produce the semantic representation.)

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Experimental Setup

Dataset: Cambridge University SLU Corpus [Henderson et al., 2012]
Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
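The MAP metric can be made concrete with a small helper; the ranked slot lists and reference sets below are hypothetical, not taken from the corpus:

```python
def average_precision(ranked, relevant):
    """AP of one ranked list of slot candidates against reference slots."""
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i      # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical ranked slot candidates and reference slots per utterance.
utterances = [
    (["food", "area", "phone"], {"food", "area"}),
    (["price range", "addr", "task"], {"addr"}),
]

ap_scores = [average_precision(r, rel) for r, rel in utterances]
mean_ap = sum(ap_scores) / len(ap_scores)
print(f"MAP = {mean_ap:.3f}")   # prints MAP = 0.750
```

MAP rewards placing the correct slots near the top of each utterance's ranking, which matches the well-ranked-slots objective of the BPR training.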
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR    Transcripts
Baseline SLU: Support Vector Machine            32.5   36.6
Baseline SLU: Multinomial Logistic Regression   34.0   38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                       ASR             Transcripts
Baseline SLU: Support Vector Machine                           32.5            36.6
Baseline SLU: Multinomial Logistic Regression                  34.0            38.8
Proposed MF-SLU: Feature Model                                 37.6            45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation   43.5 (+27.9%)   53.4 (+37.6%)

(The improved results are significantly better than the MLR baseline with p < 0.05 in a t-test.)

The MF-SLU effectively models implicit information to decode semantics.
The structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                              ASR             Transcripts
Feature Model                                         37.6            45.3
Feature Model + KG Propagation (Semantic)             41.4            51.6
Feature Model + KG Propagation (Dependency)           41.6            49.0
Feature Model + KG Propagation (All)                  43.5 (+15.7%)   53.4 (+17.9%)

(The improved results are significantly better than the MLR baseline with p < 0.05 in a t-test.)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

(Induced ontology: slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring connected by dependencies such as PREP_FOR, NN, AMOD, and DOBJ. Reference ontology with the most frequent syntactic dependencies: type connected to food, price range, task, and area via DOBJ, AMOD, and PREP_IN.)

The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
Contributions of Semantic Decoding

(Flowchart: Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction.)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation
SDS Flowchart – Intent Prediction

Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

(Matrix illustration: the test utterance "i would like to contact alex" is enriched with semantics such as "communication"; IR over app descriptions, e.g. "...your email, calendar, contacts..." and "...check and send emails, msgs..." for Outlook and Gmail, retrieves app candidates and self-train utterances; observed words (contact, message, email) and intended apps (Gmail, Outlook, Skype) form the feature matrix, and reasoning with feature-enriched MF fills in scores such as .90, .85, .97, .95.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity due to 1) user preference and 2) app-level contexts. For example, "send to vivian" may mean Email or Message (Communication), depending on the previous turn.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

(Matrix illustration: train dialogues pair user utterances with intended apps, e.g. "take this photo" / "tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website" / "send an email to professor" → CHROME, EMAIL; lexical features (photo, check, camera, tell, send) and behavioral history (null, camera, chrome, email) form the feature matrix; for the test dialogue "take a photo of this" / "send it to alice", reasoning with feature-enriched MF fills in scores such as .85, .70, .95, .80, .55.)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix       ASR: LM   ASR: MF-SLU   Transcripts: LM   Transcripts: MF-SLU
Word Observation     25.1      -             26.1              -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix       ASR: MLR   ASR: MF-SLU   Transcripts: MLR   Transcripts: MF-SLU
Word Observation     52.1       -             55.5               -

LM: LM-Based IR Model (unsupervised); MLR: Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix   | ASR (LM / MF-SLU)    | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix   | ASR (MLR / MF-SLU)  | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix                        | ASR (LM / MF-SLU)    | Transcripts (LM / MF-SLU)
Word Observation                      | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0 / -             | 33.3 / -
Word + Type-Embedding-Based Semantics | 31.5 / -             | 32.9 / -

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix             | ASR (MLR / MF-SLU)  | Transcripts (MLR / MF-SLU)
Word Observation           | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / -            | 56.6 / -

Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix                        | ASR (LM / MF-SLU)    | Transcripts (LM / MF-SLU)
Word Observation                      | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix             | ASR (MLR / MF-SLU)  | Transcripts (MLR / MF-SLU)
Word Observation           | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction
The feature-enriched MF-SLU for intent prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Back-end: Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions
The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors
Future Work
- Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next domains to develop.
- Improve the proposed approach by handling uncertainty, e.g. recognition errors from ASR and unreliable knowledge from knowledge acquisition, in SLU modeling.
Towards Unsupervised Deep Learning
(Architecture figure: the word sequence x = w1 ... wd is mapped to word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation into an utterance vector lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws produce the semantic layer y, which is matched against slot vectors lf to yield relevance scores R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U) for the slot candidates.)
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
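A minimal sketch of the idea, with random stand-in weights (this is an assumed architecture for illustration, not the exact model from the talk): plain MF scores slot s for utterance u as an inner product, i.e. a one-layer linear network; inserting nonlinear layers between the utterance features and the latent space moves the same model toward a deeper network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, d, n_slots = 20, 8, 5

x = rng.random(n_feats)              # utterance feature vector
W_c = rng.normal(size=(d, n_feats))  # extra hidden ("convolution-like") layer
W_s = rng.normal(size=(d, d))        # semantic projection layer
S = rng.normal(size=(n_slots, d))    # slot vectors (the MF item factors)

h = np.tanh(W_c @ x)                 # hidden layer replaces raw features
y = np.tanh(W_s @ h)                 # semantic layer
scores = S @ y                       # one score R(U, S_i) per slot
p = 1 / (1 + np.exp(-scores))        # posterior-like P(S_i | U)
print(p)
```

Dropping `W_c` and `W_s` and using `x` directly recovers the one-layer MF scoring, which is what makes the deep extension natural.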
Take Home Message
- Big data is available without annotations. The challenge is how to acquire and organize important knowledge, and further utilize it for applications.
- Language understanding for AI: language → action, e.g. understand voice commands to control music, lights, etc., or teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
SDS Process – Available Domain Ontology
User
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
Organized Domain Knowledge
Intelligent Agent
Ontology Induction (semantic slot)
find a cheap eating place for taiwanese food
SDS Process – Available Domain Ontology
User
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
Organized Domain Knowledge
Intelligent Agent
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
find a cheap eating place for taiwanese food
SDS Process – Spoken Language Understanding (SLU)
User
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
Intelligent Agent
seeking="find", target="eating place", price="cheap", food="taiwanese"
find a cheap eating place for taiwanese food
find a cheap eating place for taiwanese food
SDS Process – Spoken Language Understanding (SLU)
User
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
Intelligent Agent
seeking="find", target="eating place", price="cheap", food="taiwanese"
Semantic Decoding
find a cheap eating place for taiwanese food
SDS Process – Dialogue Management (DM)
User
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Intelligent Agent
find a cheap eating place for taiwanese food
SDS Process – Dialogue Management (DM)
User
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Intelligent Agent
Surface Form Derivation (natural language)
SDS Process – Dialogue Management (DM)
User
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Din Tai Fung, Boiling Point
Predicted intent: navigation
Intelligent Agent
find a cheap eating place for taiwanese food
SDS Process – Dialogue Management (DM)
User
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Din Tai Fung, Boiling Point
Predicted intent: navigation
Intelligent Agent
Intent Prediction
find a cheap eating place for taiwanese food
SDS Process – Natural Language Generation (NLG)
User
Intelligent Agent
Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there. (navigation)
find a cheap eating place for taiwanese food
Required Knowledge
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Predicted intent: navigation
User
Required Domain-Specific Information
find a cheap eating place for taiwanese food
Challenges for SDS
An SDS in a new domain requires:
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations,
3) an SLU component for mapping utterances into semantic representations.
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.
seeking="find", target="eating place", price="cheap", food="asian food"
find a cheap eating place for asian food
fully unsupervised
Prior Focus
Contributions
(Ontology graph: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target)
SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"
Predicted intent: navigation
User: find a cheap eating place for taiwanese food
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
(natural language)
(inter-slot relation)
(semantic slot)
Contributions
User
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
find a cheap eating place for taiwanese food
Ontology Induction Structure Learning Surface Form Derivation
Semantic Decoding Intent Prediction
Contributions
User
Knowledge Acquisition SLU Modeling
find a cheap eating place for taiwanese food
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
Restaurant-asking conversations (unlabelled collection)
(Ontology graph: seeking, target, food, price, and quantity connected by PREP_FOR, NN, and AMOD relations)
Organized Domain Knowledge
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
Organized Domain Knowledge
SLU Component: "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation
SLU Modeling = Semantic Decoding + Intent Prediction
SDS Architecture – Contributions
(Pipeline figure: ASR → SLU → DM → NLG, backed by domain knowledge, the current bottleneck; Knowledge Acquisition and SLU Modeling target this bottleneck)
SDS Flowchart
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart – Semantic Decoding
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
(System figure: the SLU model maps "can I have a cheap restaurant" to target="restaurant", price="cheap". From an unlabeled collection, frame-semantic parsing and ontology induction build the feature model (Fw, Fs); structure learning builds the knowledge graph propagation model from a word relation model (Rw, lexical KG) and a slot relation model (Rs, semantic KG); MF-SLU, SLU modeling by matrix factorization, produces the semantic representation.)
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames, e.g. in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
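To make the frame/frame-element structure concrete, here is a sketch of the kind of output such a parser produces for the "low fat milk" example, as a plain data structure (the field names are illustrative, not SEMAFOR's actual output schema):

```python
# Hypothetical representation of one frame-semantic parse:
# the target word evokes a frame, and spans fill its frame elements.
parse = {
    "text": "low fat milk",
    "frames": [
        {
            "name": "food",                       # evoked frame
            "target": "milk",                     # frame-evoking word
            "elements": {"descriptor": "low fat"} # filled frame element
        }
    ],
}
print(parse["frames"][0]["elements"]["descriptor"])  # → low fat
```

In the ontology-induction step, each evoked frame (e.g. food, expensiveness) becomes a slot candidate, and the filler spans become its observed values.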
Ontology Induction [ASRU'13, SLT'14a]
can i have a cheap restaurant
Frame: capability
Frame: expensiveness
Frame: locale_by_use
1st Issue: differentiate domain-specific frames from generic frames for SDSs
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
slot candidate
Best Student Paper Award
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
(Matrix figure: word observations and slot candidates produced by frame-semantic parsing. Train: Utterance 1 "i would like a cheap restaurant" observes words cheap, restaurant and slots expensiveness, locale_by_use; Utterance 2 "find a restaurant with chinese food" observes words restaurant, food and slots locale_by_use, food. Test utterance "show me a list of cheap restaurants" receives estimated slot probabilities, e.g. 0.97 for expensiveness and 0.95 for locale_by_use.)
Idea: increase the weights of domain-specific slots and decrease the weights of the others.
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
(Matrix figure: a word relation matrix and a slot relation matrix multiply the word-observation / slot-candidate matrix for slot induction, over the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; graph nodes include locale_by_use, food, expensiveness, seeking, desiring, relational_quantity, and capability.)
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
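A minimal sketch of this propagation step (the relation weights and initial scores below are made-up illustrative numbers, not learned values): multiplying the slot-score vector by the slot relation matrix lets each slot inherit score from its neighbors, so well-connected domain-specific slots rise while weakly connected generic ones fall.

```python
import numpy as np

slots = ["locale_by_use", "expensiveness", "food", "capability"]

# Assumed relation weights: the three domain slots are strongly
# inter-connected; the generic "capability" slot is weakly connected.
R = np.array([
    [0.0, 0.8, 0.9, 0.1],
    [0.8, 0.0, 0.7, 0.1],
    [0.9, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])

# Initial slot-candidate scores (e.g. raw frame-parser frequencies);
# note "capability" starts as high as "locale_by_use".
scores = np.array([0.9, 0.8, 0.7, 0.9])

propagated = R @ scores   # one propagation step over the knowledge graph
print(dict(zip(slots, propagated.round(2))))
```

After the multiplication the generic slot ends up with the lowest score, which is exactly the re-weighting effect described above.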
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
(System figure: the SLU model maps "can I have a cheap restaurant" to target="restaurant", price="cheap". From an unlabeled collection, frame-semantic parsing and ontology induction build the feature model (Fw, Fs); structure learning builds the knowledge graph propagation model from a word relation model (Rw, lexical KG) and a slot relation model (Rs, semantic KG); MF-SLU, SLU modeling by matrix factorization, produces the semantic representation.)
Knowledge Graph Construction: syntactic dependency parsing on utterances.
(Parse figure: "can i have a cheap restaurant" with dependency relations ccomp, nsubj, dobj, det, amod; evoked slots capability, expensiveness, locale_by_use.)
Word-based lexical knowledge graph: nodes can, i, have, a, cheap, restaurant.
Slot-based semantic knowledge graph: nodes capability, expensiveness, locale_by_use.
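A minimal sketch of turning one dependency-parsed utterance into the word-based lexical knowledge graph (nodes are words, edges are typed dependencies). The parse tuples are written out by hand from the relations shown on the slide:

```python
# (head, relation, dependent) for "can i have a cheap restaurant"
deps = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

graph = {}
for head, rel, dep in deps:
    # store edges in both directions so the graph is undirected
    graph.setdefault(head, []).append((rel, dep))
    graph.setdefault(dep, []).append((rel, head))

print(sorted(graph["restaurant"]))
```

The slot-based semantic knowledge graph is built the same way, with the evoked slots (capability, expensiveness, locale_by_use) as nodes instead of the surface words.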
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: each word (e.g. can, have) is trained as a vector from its dependency contexts in "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).
Dependency-based slot embeddings: each slot (e.g. expensiveness, capability) is trained the same way from the slot-level parse (have, a, capability, expensiveness, locale_by_use).
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
Edge Weight Measurement: compute edge weights to represent relation importance.
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
(Graph figure: word nodes w1-w7 and slot nodes s1-s3 connected by weighted edges.)
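A minimal sketch of the semantic-relation weight: with dependency-based embeddings already trained, the edge between two slots (or two words) is weighted by the similarity of their vectors. The vectors here are random stand-ins, not real trained embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
emb = {name: rng.normal(size=16) for name in
       ("expensiveness", "locale_by_use", "capability")}

def cosine(a, b):
    """Cosine similarity, the usual choice for embedding similarity."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

weight = cosine(emb["expensiveness"], emb["locale_by_use"])
print(round(weight, 3))
```

The dependency-relation weights are computed analogously, but from the dependency scores between embeddings rather than plain cosine similarity.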
Knowledge Graph Propagation Model
(Matrix figure: the word relation matrix R_w^SD and the slot relation matrix R_s^SD multiply the word-observation / slot-candidate training matrix for slot induction.)
Structure information is integrated to make the self-training data more reliable
Semantic Decoding [ACL-IJCNLP'15]
(Matrix figure: ontology induction fills the observed word/slot cells for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", some cells, the hidden semantics, remain unobserved.)
2nd Issue: unobserved semantics may benefit understanding.
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
(Matrix figure: after factorization with the word and slot relation matrices R_w^SD and R_s^SD, the missing cells are filled with estimated probabilities, high values such as 0.93-0.98 for relevant words/slots and low values such as 0.05 for irrelevant ones.)
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of the hidden semantics:
the |U| × (|W| + |S|) observation matrix is approximated by the product of an |U| × d matrix and a d × (|W| + |S|) matrix,
where U is the utterance set, W the words, S the slots, and d the latent dimension.
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
Bayesian Personalized Ranking for MF
Model implicit feedback:
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: for each utterance u_x, maximize the sum of ln σ(f+ - f-) over pairs of an observed fact f+ and an unobserved fact f-.
The objective is to learn a set of well-ranked semantic slots per utterance.
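A minimal sketch of MF trained with the BPR objective (Rendle et al., 2009): for each utterance, a randomly sampled observed column f+ should score higher than a sampled unobserved column f-. The toy matrix, dimensions, and learning rate below are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_utts, n_cols, d, lr = 4, 6, 3, 0.1
observed = {0: [0, 4], 1: [1, 4], 2: [2, 5], 3: [0, 5]}  # observed cells

U = rng.normal(scale=0.1, size=(n_utts, d))   # utterance latent factors
V = rng.normal(scale=0.1, size=(n_cols, d))   # word/slot latent factors

for _ in range(200):
    for u, pos_cols in observed.items():
        pos = rng.choice(pos_cols)            # an observed fact f+
        neg = rng.integers(n_cols)            # candidate unobserved fact f-
        if neg in pos_cols:
            continue                          # keep only truly unobserved ones
        diff = V[pos] - V[neg]
        x = U[u] @ diff                       # f+ - f- score margin
        g = 1.0 / (1.0 + np.exp(x))           # gradient of ln sigmoid(x)
        uu = U[u].copy()
        U[u] += lr * g * diff                 # gradient ascent on ln σ(f+ - f-)
        V[pos] += lr * g * uu
        V[neg] -= lr * g * uu

scores = U @ V.T                              # completed matrix
print(np.round(scores, 2))
```

Unobserved cells are never forced toward zero; they are only ranked below observed ones, which is what lets the model assign meaningful probabilities to hidden semantics.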
Matrix Factorization SLU (MF-SLU)
(Matrix figure: the feature model Fw, Fs from ontology induction and the structure-learning matrices are factorized; for the test utterance "show me a list of cheap restaurants", the model estimates slot probabilities such as 0.97 and 0.95 for the domain-specific slots.)
MF-SLU can estimate probabilities for slot candidates given test utterances.
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
(System figure: the SLU model maps "can I have a cheap restaurant" to target="restaurant", price="cheap". From an unlabeled collection, frame-semantic parsing and ontology induction build the feature model (Fw, Fs); structure learning builds the knowledge graph propagation model from a word relation model (Rw, lexical KG) and a slot relation model (Rs, semantic KG); MF-SLU, SLU modeling by matrix factorization, produces the semantic representation.)
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Experimental Setup
Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
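For reference, the MAP metric used throughout the experiments can be sketched as follows, with toy rankings in place of the real system output:

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k      # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

rankings = [                       # (system ranking, reference slots)
    (["price", "food", "area"], {"price", "food"}),
    (["area", "price", "food"], {"price"}),
]
map_score = sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)
print(map_score)  # → 0.75
```

MAP rewards placing the correct slots near the top of each utterance's ranking, which matches the BPR training objective of well-ranked slots per utterance.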
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

(The result is significantly better than the MLR baseline with p < 0.05 in t-test.)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

(The result is significantly better than the MLR baseline with p < 0.05 in t-test.)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
(Figure: the induced ontology, with slots locale_by_use, food, expensiveness, seeking, desiring, and relational_quantity connected by PREP_FOR, NN, AMOD, and DOBJ relations, compared with the reference ontology, with slots type, food, pricerange, task, and area connected by the most frequent syntactic dependencies AMOD, DOBJ, and PREP_IN.)
The automatically learned domain ontology aligns well with the reference one
The data-driven one is more objective while expert-annotated one is more subjective
Contributions of Semantic Decoding
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart – Intent Prediction
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play
"please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
(Figure: reasoning with feature-enriched MF. For the test utterance "i would like to contact alex", feature enrichment adds the semantic feature "communication" (weight 0.90); the training matrix contains word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype), with rows from app descriptions retrieved by IR for app candidates, e.g. Outlook "your email calendar contacts" and Gmail "check and send emails msgs", plus self-train utterances; MF fills the unobserved cells with estimated probabilities such as 0.85-0.97.)
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched matrix factorization over multi-turn dialogues — rows pair user utterances with lexical features (photo, check, camera, tell, send, ...), behavior history (previously launched apps, e.g., null → CAMERA, CHROME → EMAIL), and intended-app labels; train dialogues: "take this photo / tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website / send an email to professor" → CHROME, EMAIL; the test dialogue "take a photo of this / send it to alice" is completed by reasoning with the feature-enriched MF (estimated probabilities such as .85, .70, .95, .80, .55)]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP):
  Word Observation — ASR: LM 25.1; Transcripts: LM 26.1
  (LM = LM-based IR model, unsupervised)
Multi-Turn Interaction, Mean Average Precision (MAP):
  Word Observation — ASR: MLR 52.1; Transcripts: MLR 55.5
  (MLR = multinomial logistic regression, supervised)
67
Experiments for Intent Prediction
Single-Turn Request (MAP):
  Word Observation — ASR: LM 25.1, MF-SLU 29.2 (+16.2%); Transcripts: LM 26.1, MF-SLU 30.4 (+16.4%)
Multi-Turn Interaction (MAP):
  Word Observation — ASR: MLR 52.1, MF-SLU 52.7 (+1.2%); Transcripts: MLR 55.5, MF-SLU 55.4 (−0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request (MAP):
  Word Observation — ASR: LM 25.1, MF-SLU 29.2 (+16.2%); Transcripts: LM 26.1, MF-SLU 30.4 (+16.4%)
  Word + Embedding-Based Semantics — ASR: LM 32.0; Transcripts: LM 33.3
  Word + Type-Embedding-Based Semantics — ASR: LM 31.5; Transcripts: LM 32.9
Multi-Turn Interaction (MAP):
  Word Observation — ASR: MLR 52.1, MF-SLU 52.7 (+1.2%); Transcripts: MLR 55.5, MF-SLU 55.4 (−0.2%)
  Word + Behavioral Patterns — ASR: MLR 53.9; Transcripts: MLR 56.6
Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request (MAP):
  Word Observation — ASR: LM 25.1, MF-SLU 29.2 (+16.2%); Transcripts: LM 26.1, MF-SLU 30.4 (+16.4%)
  Word + Embedding-Based Semantics — ASR: LM 32.0, MF-SLU 34.2 (+6.8%); Transcripts: LM 33.3, MF-SLU 33.3 (−0.2%)
  Word + Type-Embedding-Based Semantics — ASR: LM 31.5, MF-SLU 32.2 (+2.1%); Transcripts: LM 32.9, MF-SLU 34.0 (+3.4%)
Multi-Turn Interaction (MAP):
  Word Observation — ASR: MLR 52.1, MF-SLU 52.7 (+1.2%); Transcripts: MLR 55.5, MF-SLU 55.4 (−0.2%)
  Word + Behavioral Patterns — ASR: MLR 53.9, MF-SLU 55.7 (+3.3%); Transcripts: MLR 56.6, MF-SLU 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
The feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data Back-end: data bases, services, and client signals
Device/Service End-points: phone, PC, Xbox, web browser, messaging apps
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: find domains not covered by current systems that users are interested in, to guide which domains are developed next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
[Figure: a deep SLU architecture — word sequence x = w1 ... wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf → semantic layer y (semantic projection matrix Ws) → knowledge graph propagation layer lp (propagation matrix Wp) → relevance scores R(U, S1) ... R(U, Sn) and posteriors P(S1 | U) ... P(Sn | U) over slot candidates S1 ... Sn with slot vectors lf]
Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
76
Take-Home Message
Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., and teaching the assistant to let friends in by face recognition, etc.
Unsupervised or weakly supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A — THANKS FOR YOUR ATTENTION
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
18
SDS Process – Available Domain Ontology
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: organized domain knowledge — slots such as seeking, target, price, and food, connected by relations (PREP_FOR, AMOD, NN) — acquired via Ontology Induction (semantic slots) and Structure Learning (inter-slot relations)
19
SDS Process – Spoken Language Understanding (SLU)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"
20
SDS Process – Spoken Language Understanding (SLU)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese" — this mapping step is Semantic Decoding
21
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
22
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } — mapping between natural language and this query relies on Surface Form Derivation
23
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → results: Din Tai Fung, Boiling Point; predicted intent: navigation
24
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → results: Din Tai Fung, Boiling Point; predicted intent: navigation — estimating the follow-up behavior is Intent Prediction
25
SDS Process – Natural Language Generation (NLG)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there." (navigation)
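As an illustration of this generation step, here is a minimal template-based NLG sketch; the template, function, and argument names are assumptions, not the system's actual implementation:

```python
# Illustrative template-based NLG: surface the DM's results and the
# predicted intent as a system response. Template and slot names are
# assumptions for illustration only.

def generate(results, price, food, intent):
    reply = (f"{price.capitalize()} {food.capitalize()} eating places "
             f"include {', '.join(results)}, etc. What do you want to choose?")
    if intent == "navigation":
        reply += " I can help you go there."
    return reply

print(generate(["Din Tai Fung", "Boiling Point"], "cheap", "taiwanese",
               "navigation"))
```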
26
Required Knowledge
User: "find a cheap eating place for taiwanese food"
Required domain-specific information: the slot ontology (seeking, target, price, food with AMOD/NN/PREP_FOR relations), the database query (SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }), and the predicted intent (navigation)
27
Challenges for SDS
An SDS in a new domain requires 1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations — the prior focus of development. Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS, fully unsupervised, to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests, e.g., "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food".
28
Contributions
User: "find a cheap eating place for taiwanese food"
[Diagram: Ontology Induction (semantic slots) and Structure Learning (inter-slot relations) build the ontology; Surface Form Derivation (natural language) links it to the query SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }; Semantic Decoding and Intent Prediction produce the semantic form and the predicted intent (navigation)]
29
Contributions
User: "find a cheap eating place for taiwanese food"
[Diagram: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction]
30
Contributions
User: "find a cheap eating place for taiwanese food"
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning, Surface Form Derivation) and SLU Modeling (Semantic Decoding, Intent Prediction)]
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
From an unlabelled collection of restaurant-seeking conversations, knowledge acquisition produces organized domain knowledge: slots (seeking, target, price, food, quantity) connected by relations (PREP_FOR, NN, AMOD).
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
Given the organized domain knowledge, the SLU component maps "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation.
SLU Modeling = Semantic Decoding + Intent Prediction
33
SDS Architecture – Contributions
[Diagram: ASR → SLU (the current bottleneck) → DM → NLG, with domain knowledge feeding SLU; the contributions target Knowledge Acquisition and SLU Modeling]
34
SDS Flowchart
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
35
SDS Flowchart – Semantic Decoding
[Flowchart as above, highlighting Semantic Decoding]
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[System diagram: an unlabeled collection feeds Frame-Semantic Parsing; Ontology Induction (with a semantic knowledge graph) yields the feature model (Fw, Fs); Structure Learning (with lexical and semantic knowledge graphs) yields the knowledge graph propagation model (Rw: word relation model; Rs: slot relation model); MF-SLU — SLU modeling by matrix factorization — combines them into the SLU model, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory, in which words/phrases can be represented as frames — in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
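The "low fat milk" example can be written out as plain data to make the frame/target/element structure concrete; this mirrors SEMAFOR-style output conceptually, not its actual file format:

```python
# A frame-semantic parse as (frame, target, elements). The structure is
# illustrative; real SEMAFOR output uses its own JSON/XML schema.

parse = {
    "sentence": "low fat milk",
    "frames": [{
        "frame": "food",
        "target": "milk",                      # word evoking the frame
        "elements": {"descriptor": "low fat"}, # filled frame element
    }],
}

evoked = [f["frame"] for f in parse["frames"]]
print(evoked)  # ['food']
```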
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → evoked frames: capability ("can i"), expensiveness ("cheap"), locale_by_use ("restaurant") — each frame is a slot candidate, and expensiveness and locale_by_use are the good (domain-specific) ones.
1st issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: a binary matrix of word observations (cheap, restaurant, food) and slot candidates from frame-semantic parsing (expensiveness, locale_by_use, food) — train rows from Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food", plus a test row "show me a list of cheap restaurants" whose slot probabilities (e.g., .97, .95) are estimated]
Idea: increase the weights of domain-specific slots and decrease the weights of others.
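The matrix in this figure can be sketched as follows: each parsed utterance contributes binary features for its observed words and its evoked frames (slot candidates). The vocabularies and parses below are illustrative:

```python
# Sketch of the utterance-by-feature matrix for slot induction: columns
# are word observations followed by slot candidates; rows are utterances.

def matrix_rows(parsed, word_cols, slot_cols):
    cols = {c: i for i, c in enumerate(word_cols + slot_cols)}
    rows = []
    for words, frames in parsed:
        row = [0] * len(cols)
        for feat in words + frames:
            if feat in cols:
                row[cols[feat]] = 1
        rows.append(row)
    return rows

word_cols = ["cheap", "restaurant", "food"]
slot_cols = ["expensiveness", "locale_by_use", "food_frame"]
parsed = [
    (["cheap", "restaurant"], ["expensiveness", "locale_by_use"]),
    (["restaurant", "food"], ["locale_by_use", "food_frame"]),
]
print(matrix_rows(parsed, word_cols, slot_cols)[0])  # [1, 1, 0, 1, 1, 0]
```

At test time the slot columns are left empty and the model fills in their probabilities.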
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model — assumption: domain-specific words/slots have more dependencies to each other.
[Figure: the word relation matrix Rw and slot relation matrix Rs multiply the observation matrix (word observations: i, like, cheap, restaurant, food; slot candidates: capability, seeking, desiring, expensiveness, locale_by_use, food, relational_quantity) built from the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
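The propagation idea can be sketched as one random-walk-style update over a toy slot graph; the graph, edge weights, and damping factor are illustrative assumptions, not the thesis's actual matrices:

```python
# Sketch of score propagation on the slot knowledge graph: each slot's
# score is updated from its weighted neighbors, so densely connected
# (domain-specific) slots reinforce each other while isolated generic
# frames decay.

def propagate(scores, edges, alpha=0.9):
    """One step: s'(v) = (1 - alpha) * s(v) + alpha * weighted neighbor avg."""
    out = {}
    for node, s in scores.items():
        nbrs = edges.get(node, {})
        total = sum(nbrs.values()) or 1.0
        pulled = sum(w / total * scores[n] for n, w in nbrs.items())
        out[node] = (1 - alpha) * s + alpha * pulled
    return out

edges = {  # weighted slot-to-slot relations (toy values)
    "expensiveness": {"locale_by_use": 1.0},
    "locale_by_use": {"expensiveness": 0.5, "food": 0.5},
    "food": {"locale_by_use": 1.0},
    "capability": {},   # generic frame with no domain relations
}
scores = {slot: 1.0 for slot in edges}
scores = propagate(scores, edges)
print(scores["capability"] < scores["food"])  # True
```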
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Recap of the system diagram; this part: Structure Learning builds the knowledge graph propagation model (Rw: word relation model; Rs: slot relation model) from the lexical and semantic knowledge graphs]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances: "can i have a cheap restaurant" carries ccomp, nsubj, dobj, det, and amod edges, and evokes the frames capability, expensiveness, and locale_by_use.
Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by the dependency edges.
Slot-based semantic knowledge graph: slot nodes (capability, expensiveness, locale_by_use) connected by the corresponding edges.
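A minimal sketch of building the word-based lexical knowledge graph from the dependency parse of this utterance; the attachment details are typical parser output and are illustrative:

```python
# Build an undirected word graph from typed dependency edges for
# "can i have a cheap restaurant".

deps = [  # (head, relation, dependent) — illustrative attachments
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

graph = {}
for head, rel, dep in deps:
    graph.setdefault(head, []).append((dep, rel))
    graph.setdefault(dep, []).append((head, rel))  # undirected edges

print(sorted(n for n, _ in graph["restaurant"]))  # ['a', 'cheap', 'have']
```

The slot-based graph is built the same way over the frames evoked by each word.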
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: each word (e.g., "can", "have") is trained with its dependency-parse neighbors from "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod edges) as contexts.
Dependency-based slot embeddings: the same procedure over the slot-level parse (e.g., "expensiveness", "capability").
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
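The context-extraction step of dependency-based embeddings can be sketched as follows; the `word/rel` and `word/rel-1` context encoding follows the scheme of Levy and Goldberg (2014), with toy parse data:

```python
# Dependency-based contexts for skip-gram training: a word's contexts
# are its dependency neighbors tagged with the relation, using an
# inverse marker ("-1") on the head side.

def contexts(deps):
    pairs = []
    for head, rel, dep in deps:
        pairs.append((head, f"{dep}/{rel}"))    # head's context: dependent
        pairs.append((dep, f"{head}/{rel}-1"))  # dependent's context: head
    return pairs

deps = [("have", "dobj", "restaurant"), ("restaurant", "amod", "cheap")]
for word, ctx in contexts(deps):
    print(word, "->", ctx)
```

These (word, context) pairs then replace linear window contexts in skip-gram training.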
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
- slot-to-slot semantic relation: similarity between slot embeddings
- slot-to-slot dependency relation: dependency score between slot embeddings
- word-to-word semantic relation: similarity between word embeddings
- word-to-word dependency relation: dependency score between word embeddings
[Figure: a word graph (w1–w7) and a slot graph (s1–s3) with weighted edges]
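The similarity-based edge weights can be computed as cosine similarity between embedding vectors; the vectors below are toy values, not trained embeddings:

```python
# Score a semantic edge between two slots as the cosine similarity of
# their (dependency-based) embedding vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

expensiveness = [0.9, 0.1, 0.3]   # toy embeddings
locale_by_use = [0.8, 0.2, 0.4]
capability = [0.1, 0.9, 0.1]

# Domain slots sit closer together than a generic frame.
print(cosine(expensiveness, locale_by_use) > cosine(expensiveness, capability))  # True
```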
46
Knowledge Graph Propagation Model
[Figure: the word relation matrix Rw and slot relation matrix Rs are multiplied with the word-observation/slot-candidate matrix (cheap, restaurant, food × expensiveness, locale_by_use, food) for slot induction over train and test utterances]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
[Figure: Ontology Induction produces the feature matrices Fw, Fs and Structure Learning adds relations; the matrix over word observations and slot candidates (train: Utterance 1 "i would like a cheap restaurant", Utterance 2 "find a restaurant with chinese food"; test: "show me a list of cheap restaurants") contains estimated probabilities (e.g., .97, .90, .95, .85) as well as unfilled hidden semantics]
2nd issue: unobserved (hidden) semantics may benefit understanding.
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the word relation matrix Rw and slot relation matrix Rs multiply the feature matrix for slot induction; MF fills in the missing cells with probabilities such as .97, .90, .95, .85, .93, .92, .98, .05]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively; the product of the two matrices fills in the probability of the hidden semantics:
M (|U| × (|W|+|S|)) ≈ U (|U| × d) × V^T (d × (|W|+|S|))
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
50
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed feature score f⁺ and an unobserved feature score f⁻.
The objective is to learn a set of well-ranked semantic slots per utterance.
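A minimal pure-Python sketch of BPR-style learning for MF: each SGD step pulls an observed feature above a sampled unobserved one rather than treating the latter as a hard negative. Dimensions, learning rates, and data are illustrative assumptions:

```python
# Sketch of Bayesian Personalized Ranking (Rendle et al., 2009) for MF:
# maximize ln sigmoid(score(u, f_pos) - score(u, f_neg)) with L2
# regularization, via stochastic gradient ascent.
import math
import random

random.seed(0)
d, n_utt, n_feat = 4, 3, 6
U = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_utt)]
V = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_feat)]

def score(u, f):
    return sum(a * b for a, b in zip(U[u], V[f]))

def bpr_step(u, f_pos, f_neg, lr=0.05, reg=0.01):
    x = score(u, f_pos) - score(u, f_neg)   # pairwise margin
    g = 1.0 / (1.0 + math.exp(x))           # = sigmoid(-x), ascent weight
    for k in range(d):
        uk = U[u][k]
        U[u][k] += lr * (g * (V[f_pos][k] - V[f_neg][k]) - reg * uk)
        V[f_pos][k] += lr * (g * uk - reg * V[f_pos][k])
        V[f_neg][k] += lr * (-g * uk - reg * V[f_neg][k])

# Feature 0 is the only observed feature for utterance 0.
for _ in range(500):
    bpr_step(0, 0, random.choice([1, 2, 3, 4, 5]))

print(score(0, 0) > score(0, 3))
```

After training, the observed feature outranks the unobserved ones for that utterance, which is exactly the ranking the objective asks for.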
51
Matrix Factorization SLU (MF-SLU)
[Figure: Ontology Induction (Fw, Fs) and Structure Learning feed the factorized matrix over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", estimated slot probabilities (e.g., .97, .90, .95, .85) are filled in]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Recap of the full system diagram: Frame-Semantic Parsing → Ontology Induction (Fw, Fs) and Structure Learning (Rw, Rs) → MF-SLU → semantic representation]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU corpus — restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
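The MAP metric used throughout these experiments can be made concrete with a short sketch; the rankings and reference slots below are toy values, not the corpus's actual outputs:

```python
# Mean Average Precision over per-utterance slot rankings: slots are
# ranked by estimated probability, and relevance is judged against the
# reference slots for each utterance.

def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, 1):
        if slot in relevant:
            hits += 1
            total += hits / i        # precision at each relevant hit
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(examples):
    return sum(average_precision(r, rel) for r, rel in examples) / len(examples)

examples = [
    (["food", "area", "pricerange"], {"food", "pricerange"}),  # AP = 5/6
    (["phone", "addr"], {"addr"}),                             # AP = 1/2
]
print(round(mean_average_precision(examples), 3))  # 0.667
```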
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU corpus; metric: MAP of all estimated slot probabilities for all utterances.
Baseline SLU — Support Vector Machine: ASR 32.5, Transcripts 36.6; Multinomial Logistic Regression: ASR 34.0, Transcripts 38.8.
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU corpus; metric: MAP of all estimated slot probabilities for all utterances.
Baseline SLU — Support Vector Machine: ASR 32.5, Transcripts 36.6; Multinomial Logistic Regression: ASR 34.0, Transcripts 38.8.
Proposed MF-SLU — Feature Model: ASR 37.6, Transcripts 45.3; Feature Model + Knowledge Graph Propagation: ASR 43.5 (+27.9%), Transcripts 53.4 (+37.6%). Marked results are significantly better than the MLR baseline (p < 0.05, t-test).
The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU corpus; metric: MAP of all estimated slot probabilities for all utterances.
Feature Model: ASR 37.6, Transcripts 45.3. Feature + Knowledge Graph Propagation — semantic relations only: ASR 41.4, Transcripts 51.6; dependency relations only: ASR 41.6, Transcripts 49.0; all relations: ASR 43.5 (+15.7%), Transcripts 53.4 (+17.9%), significantly better than the MLR baseline (p < 0.05, t-test).
In the integrated structure information, both semantic and dependency relations are useful for understanding.
57
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the induced ontology — seeking linked via PREP_FOR to the target (locale_by_use), with desiring, relational_quantity, expensiveness, and food attached via DOBJ, AMOD, and NN edges — alongside the reference ontology (type, food, pricerange, task, area) annotated with the most frequent syntactic dependencies (DOBJ, AMOD, PREP_IN)]
The automatically learned domain ontology aligns well with the reference one; the data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction), highlighting Intent Prediction]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF. Training rows are app descriptions retrieved by IR as app candidates (Outlook: "… your email calendar contacts …"; Gmail: "… check and send emails, msgs …") and self-train utterances; the test row is the utterance "i would like to contact alex". Columns are word observations (contact, message, email, …), enriched semantic features (e.g., communication), and intended apps (Gmail, Outlook, Skype); feature enrichment fills in semantics before MF reasoning estimates probabilities for the test utterance.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. 1) User preference 2) App-level contexts
Example: "send to vivian" → Email? Message? (Communication); the previous turn helps disambiguate.
Idea: behavioral patterns in the history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF over dialogues. Training dialogues: "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL. Test dialogue: "take a photo of this / send it to alice" → CAMERA, IM. Columns combine lexical features (photo, check, camera, tell, send, …), behavior history (null, camera, chrome, …), and the intended apps.]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | – | 26.1 | –
LM-Based IR Model (unsupervised)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | – | 55.5 | –
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | – | 33.3 | –
Word + Type-Embedding-Based Semantics | 31.5 | – | 32.9 | –
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | – | 56.6 | –
Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
The feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions
The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances;
better high-level intent prediction about follow-up behaviors.
Future Work
Apply the proposed technology to domain discovery: find domains not covered by current systems but that users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.
Towards Unsupervised Deep Learning
[Figure: a convolutional network view of the model. A word sequence x (w1, w2, …, wd) is mapped to word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf, then through a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws to a semantic layer y, yielding relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates S1, …, Sn.]
Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning.
Take Home Message
Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Unsupervised or weakly-supervised methods will be the future trend.
Language understanding for AI: language → action; understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.
Deep language understanding is an emerging field.
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016
SDS Process – Spoken Language Understanding (SLU)
User → Intelligent Agent: "find a cheap eating place for taiwanese food"
[Ontology: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
seeking="find", target="eating place", price="cheap", food="taiwanese"
SDS Process – Spoken Language Understanding (SLU): Semantic Decoding
User → Intelligent Agent: "find a cheap eating place for taiwanese food"
[Ontology: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
seeking="find", target="eating place", price="cheap", food="taiwanese"
SDS Process – Dialogue Management (DM)
User → Intelligent Agent: "find a cheap eating place for taiwanese food"
[Ontology: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
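The DM step above maps the decoded semantic frame to a back-end query. A minimal sketch; `frame_to_query` is a hypothetical helper for illustration, not part of the described system:

```python
def frame_to_query(frame, domain="restaurant"):
    """Turn a decoded semantic frame into a SQL-like back-end query (sketch)."""
    conditions = [f'{domain}.{slot}="{value}"'
                  for slot, value in frame.items()
                  if slot not in ("seeking", "target")]  # control slots are not DB constraints
    return f"SELECT {domain} WHERE " + " AND ".join(conditions)

frame = {"seeking": "find", "target": "eating place",
         "price": "cheap", "food": "taiwanese"}
print(frame_to_query(frame))
# SELECT restaurant WHERE restaurant.price="cheap" AND restaurant.food="taiwanese"
```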
SDS Process – Dialogue Management (DM): Surface Form Derivation (natural language)
User → Intelligent Agent: "find a cheap eating place for taiwanese food"
[Ontology: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
SDS Process – Dialogue Management (DM)
User → Intelligent Agent: "find a cheap eating place for taiwanese food"
SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → Din Tai Fung, Boiling Point
Predicted intent: navigation
SDS Process – Dialogue Management (DM): Intent Prediction
User → Intelligent Agent: "find a cheap eating place for taiwanese food"
SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → Din Tai Fung, Boiling Point
Predicted intent: navigation
SDS Process – Natural Language Generation (NLG)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there." (navigation)
Required Knowledge
User: "find a cheap eating place for taiwanese food"
Required domain-specific information:
[Ontology: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
Predicted intent: navigation
Challenges for SDS
An SDS in a new domain requires: 1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations.
Manual work results in high cost, long duration, and poor scalability of system development. (Prior focus)
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests, fully unsupervised.
Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"
Contributions
User: "find a cheap eating place for taiwanese food"
Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding
Intent Prediction
[Ontology: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }
Predicted intent: navigation
Contributions
User: "find a cheap eating place for taiwanese food"
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
Contributions
User: "find a cheap eating place for taiwanese food"
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
Unlabelled collection: restaurant-asking conversations
[Figure: organized domain knowledge: a graph of induced slots (seeking, target, price, food, quantity) connected by relations such as PREP_FOR, NN, and AMOD.]
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
Organized domain knowledge → SLU component: "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation
SLU Modeling = Semantic Decoding + Intent Prediction
SDS Architecture – Contributions
[Pipeline: ASR → SLU → DM → NLG, backed by a Domain model; SLU is the current bottleneck.]
Knowledge Acquisition; SLU Modeling
SDS Flowchart
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
SDS Flowchart – Semantic Decoding
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Framework: an unlabeled collection is frame-semantically parsed; Ontology Induction builds a semantic KG (feature matrices Fw, Fs for the Feature Model), and Structure Learning builds a word relation model (Rw from a lexical KG) and a slot relation model (Rs from a semantic KG) for the Knowledge Graph Propagation Model; MF-SLU then performs SLU modeling by matrix factorization. The resulting SLU model maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistically-principled semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → frames: capability, expensiveness, locale_by_use (slot candidates)
1st issue: differentiate domain-specific frames from generic frames for SDSs (here, expensiveness and locale_by_use are good slot candidates, while capability is generic).
Das et al., "Frame-semantic parsing," Computational Linguistics, 2014
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Matrix: rows are training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food", …) and the test utterance "show me a list of cheap restaurants"; columns are word observations (cheap, restaurant, food, …) and slot candidates from frame-semantic parsing (expensiveness, locale_by_use, food, …). Observed cells are 1; the model estimates probabilities (e.g., .97, .95) for the test utterance's slots.]
Idea: increase the weights of domain-specific slots and decrease the weights of the others.
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Matrix: the word-slot observation matrix (train utterances "i would like a cheap restaurant", "find a restaurant with chinese food", and the test utterance "show me a list of cheap restaurants") is multiplied by a word relation matrix and a slot relation matrix; a knowledge graph connects words (i, like, …) and slots (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring).]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Framework: an unlabeled collection is frame-semantically parsed; Ontology Induction builds a semantic KG (feature matrices Fw, Fs for the Feature Model), and Structure Learning builds a word relation model (Rw from a lexical KG) and a slot relation model (Rs from a semantic KG) for the Knowledge Graph Propagation Model; MF-SLU then performs SLU modeling by matrix factorization. The resulting SLU model maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
Knowledge Graph Construction
Syntactic dependency parsing on utterances: "can i have a cheap restaurant" → ccomp(have, can), nsubj(have, i), dobj(have, restaurant), amod(restaurant, cheap), det(restaurant, a); frames: capability (can), expensiveness (cheap), locale_by_use (restaurant).
Word-based lexical knowledge graph: nodes are words (can, i, have, a, cheap, restaurant) linked by dependencies.
Slot-based semantic knowledge graph: nodes are slots (capability, locale_by_use, expensiveness).
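The construction above can be sketched in a few lines. The parse triples and word-to-slot mapping below follow the slide's example, while `build_graphs` itself is a simplified stand-in: it links all slots that co-occur in an utterance rather than scoring typed dependency paths as the full system does.

```python
from collections import defaultdict
from itertools import combinations

def build_graphs(parsed_utterances, word2slot):
    """Build word-based and slot-based knowledge graphs from dependency triples."""
    word_graph = defaultdict(set)   # word -> connected words (lexical KG)
    slot_graph = defaultdict(set)   # slot -> connected slots (semantic KG)
    for triples in parsed_utterances:
        words = set()
        for head, rel, dep in triples:
            word_graph[head].add(dep)
            word_graph[dep].add(head)
            words.update((head, dep))
        # project co-occurring words onto the slots they evoke
        slots = {word2slot[w] for w in words if w in word2slot}
        for s1, s2 in combinations(sorted(slots), 2):
            slot_graph[s1].add(s2)
            slot_graph[s2].add(s1)
    return word_graph, slot_graph

# "can i have a cheap restaurant" (triples follow the slide's parse)
parse = [("have", "ccomp", "can"), ("have", "nsubj", "i"),
         ("have", "dobj", "restaurant"), ("restaurant", "amod", "cheap"),
         ("restaurant", "det", "a")]
word2slot = {"can": "capability", "cheap": "expensiveness",
             "restaurant": "locale_by_use"}
wg, sg = build_graphs([parse], word2slot)
print(sorted(sg["locale_by_use"]))  # ['capability', 'expensiveness']
```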
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: e.g., vectors for "can" and "have", trained from the dependency contexts (ccomp, nsubj, dobj, amod, det) of "can i have a cheap restaurant".
Dependency-based slot embeddings: e.g., vectors for "expensiveness" and "capability", trained the same way over the slot-labelled parse.
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014
Edge Weight Measurement
Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
[Figure: a slot graph (s1, s2, s3) aligned with a word graph (w1–w7); edge weights combine the semantic and dependency scores.]
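A minimal sketch of the similarity-based weights, assuming embeddings are already trained; the 3-d vectors below are made-up toy values, not real dependency-based embeddings:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def edge_weight(emb_a, emb_b):
    """Weight an edge by the similarity of the two nodes' embeddings."""
    return max(0.0, cosine(emb_a, emb_b))  # clip negatives: graph weights stay non-negative

# toy 3-d embeddings (hypothetical values; real ones come from training
# dependency-based embeddings as in Levy and Goldberg, 2014)
emb = {"expensiveness": np.array([0.9, 0.1, 0.0]),
       "locale_by_use": np.array([0.8, 0.3, 0.1]),
       "capability":    np.array([0.1, 0.2, 0.9])}
w = edge_weight(emb["expensiveness"], emb["locale_by_use"])
print(round(w, 3))  # 0.963 — related slots get a strong edge
```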
Knowledge Graph Propagation Model
[Matrix: the word-slot observation matrix (train utterances over word observations cheap, restaurant, food, … and slot candidates expensiveness, locale_by_use, food, …) is multiplied by the word relation matrix Rw and the slot relation matrix Rs for slot induction.]
Structure information is integrated to make the self-training data more reliable.
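The propagation step can be sketched as multiplying the observation matrix by a row-normalized relation matrix; the numbers below are toy values, not the learned weights:

```python
import numpy as np

def propagate(F, R):
    """Smooth slot scores over the knowledge graph: F' = F @ R_norm,
    where R is a non-negative slot-to-slot relation (adjacency) matrix."""
    R = R + np.eye(len(R))                     # keep each node's own score
    R_norm = R / R.sum(axis=1, keepdims=True)  # row-normalize into transition weights
    return F @ R_norm

# columns: [expensiveness, locale_by_use, capability] (toy example)
F = np.array([[1.0, 1.0, 0.0]])   # one utterance with two observed slots
R = np.array([[0.0, 0.9, 0.1],    # expensiveness strongly related to locale_by_use
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
print(np.round(propagate(F, R), 2))  # [[0.93 0.93 0.15]]
```

The two observed domain slots keep high scores while the weakly connected generic slot receives only a small propagated score.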
Semantic Decoding [ACL-IJCNLP'15]
[Matrix: Ontology Induction produces feature matrices Fw, Fs and Structure Learning produces the relation models for the SLU component. Train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance "show me a list of cheap restaurants" are represented over word observations and slot candidates, with estimated probabilities (.97/.90, .95/.85) for the test slots.]
2nd issue: unobserved hidden semantics may benefit understanding.
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Matrix: the observation matrix, multiplied by the word relation matrix Rw and the slot relation matrix Rs, is completed by MF; missing cells receive estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of hidden semantics:
M (|U| × (|W|+|S|)) ≈ U (|U| × d) · V (d × (|W|+|S|))
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009
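A low-rank completion sketch: truncated SVD stands in here for the BPR-trained factorization used in the paper, and the toy matrix is illustrative only:

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix; zeros include unobserved-but-true facts.
M = np.array([[1, 1, 0, 1, 1, 0],
              [1, 0, 1, 1, 0, 1],
              [1, 1, 0, 1, 1, 0],
              [1, 0, 1, 0, 0, 1]], float)  # last utterance misses column 3

# Rank-d truncated SVD gives the low-rank approximation M ≈ U_d S_d V_d^T
U, s, Vt = np.linalg.svd(M, full_matrices=False)
d = 2
M_hat = (U[:, :d] * s[:d]) @ Vt[:d]

# the missing cell (row 3, col 3) is recovered with a substantial score,
# because row 3 resembles row 1, which has that column observed
print(M_hat[3, 3] > 0.3)
```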
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over observed facts f⁺ and unobserved facts f⁻, where σ is the sigmoid function.
The objective is to learn a set of well-ranked semantic slots per utterance.
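A sketch of BPR training for this setting: a generic implementation of Rendle et al.'s objective, not the paper's exact code, with toy dimensions and data:

```python
import numpy as np

def bpr_epoch(U, V, observed, lr=0.1, reg=0.01, rng=None):
    """One pass of Bayesian Personalized Ranking (Rendle et al., 2009):
    for each observed (utterance, feature) fact f+, sample an unobserved f-
    and ascend the gradient of ln sigma(score(f+) - score(f-))."""
    rng = rng or np.random.default_rng(0)
    for u, i in sorted(observed):
        j = int(rng.integers(V.shape[0]))       # sample a negative feature
        while (u, j) in observed:
            j = int(rng.integers(V.shape[0]))
        uu, vi, vj = U[u].copy(), V[i].copy(), V[j].copy()
        x = uu @ (vi - vj)                      # margin between observed and sampled fact
        g = 1.0 / (1.0 + np.exp(x))             # d/dx ln sigma(x) = sigma(-x)
        U[u] += lr * (g * (vi - vj) - reg * uu)
        V[i] += lr * (g * uu - reg * vi)
        V[j] += lr * (-g * uu - reg * vj)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 4))    # latent factors for 3 utterances
V = rng.normal(scale=0.1, size=(5, 4))    # latent factors for 5 word/slot features
observed = {(0, 0), (0, 2), (1, 1), (2, 2)}
for _ in range(500):
    bpr_epoch(U, V, observed)

# observed facts are ranked above unobserved ones for utterance 0
print(U[0] @ V[0] > U[0] @ V[3])
```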
Matrix Factorization SLU (MF-SLU)
[Matrix: Ontology Induction (Fw, Fs) and Structure Learning feed the factorized model; train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance "show me a list of cheap restaurants" are completed with slot probabilities (.97, .90, .95, .85).]
MF-SLU can estimate probabilities for slot candidates given test utterances.
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Framework: an unlabeled collection is frame-semantically parsed; Ontology Induction builds a semantic KG (feature matrices Fw, Fs for the Feature Model), and Structure Learning builds a word relation model (Rw from a lexical KG) and a slot relation model (Rs from a semantic KG) for the Knowledge Graph Propagation Model; MF-SLU then performs SLU modeling by matrix factorization. The resulting SLU model maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
Experimental Setup
Dataset: Cambridge University SLU Corpus: restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances. Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012
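The MAP metric can be sketched as follows; the ranked lists below are hypothetical, not the corpus's actual outputs:

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slots."""
    hits, score = 0, 0.0
    for k, slot in enumerate(ranked, 1):
        if slot in relevant:
            hits += 1
            score += hits / k   # precision at each relevant position
    return score / max(1, len(relevant))

def mean_average_precision(per_utterance):
    return sum(average_precision(r, rel) for r, rel in per_utterance) / len(per_utterance)

# slot candidates ranked by estimated probability vs. reference slots
data = [(["food", "pricerange", "area"], {"food", "area"}),
        (["type", "task"], {"task"})]
print(round(mean_average_precision(data), 3))  # 0.667
```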
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
(the result is significantly better than the MLR baseline, with p < 0.05 in a t-test)
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation (Semantic) | 41.4 | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6 | 49.0
Feature + Knowledge Graph Propagation (All) | 43.5 (+15.7%) | 53.4 (+17.9%)
(the result is significantly better than the MLR baseline, with p < 0.05 in a t-test)
In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the induced ontology (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring connected by PREP_FOR, NN, AMOD, DOBJ) next to the reference ontology with the most frequent syntactic dependencies (type, food, pricerange, task, area connected by DOBJ, AMOD, PREP_IN).]
The automatically learned domain ontology aligns well with the reference one.
The data-driven one is more objective, while the expert-annotated one is more subjective.
Contributions of Semantic Decoding
Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) then allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF. Training rows: utterances plus self-train utterances built from app descriptions via IR for app candidates (e.g. Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."); columns: word observations (contact, message, email, ...), enriched semantic features (e.g. communication), and intended apps (Gmail, Outlook, Skype, ...). For the test utterance "i would like to contact alex," MF fills in probabilities (e.g. 0.90) for the intended apps.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
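The "IR for app candidates" step can be sketched as a small idf-weighted retriever over app-description text; the app names, descriptions, and function names below are illustrative stand-ins, not the actual index used in the paper:

```python
import math
from collections import Counter

# Toy app-description index (illustrative; not the actual Google Play data).
APP_DESCRIPTIONS = {
    "Gmail":   "check and send emails msgs from your account",
    "Outlook": "manage your email calendar contacts",
    "Camera":  "take photos and record videos",
}

def idf_table(docs):
    """Inverse document frequency for every term in the collection."""
    df = Counter()
    for text in docs.values():
        df.update(set(text.lower().split()))
    n = len(docs)
    return {t: math.log((n + 1) / df[t]) for t in df}  # +1 keeps weights > 0

def retrieve_app_candidates(query, docs=APP_DESCRIPTIONS, k=2):
    """Rank apps by idf-weighted term overlap with the request."""
    idf = idf_table(docs)
    q = set(query.lower().split())
    scores = {name: sum(idf[t] for t in q & set(text.lower().split()))
              for name, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(retrieve_app_candidates("i would like to send an email"))
```

The retrieved apps' descriptions then provide the self-train rows of the matrix.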
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. Useful cues: 1) user preference, 2) app-level contexts.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
[Figure: "send to vivian" in the current turn is ambiguous among Email / Message / Communication apps; the previous turn disambiguates it.]
Idea: behavioral patterns in history can help intent prediction.
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF over dialogues. Training rows: dialogues with lexical features and behavior history, e.g. "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL. Columns: lexical features (photo, check, camera, tell, send, ...), behavior history (null, camera, chrome, email, ...), and intended apps. For the test dialogue "take a photo of this / send it to alice," MF fills in app probabilities (e.g. CAMERA 0.85, IM 0.70).]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
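One way to realize this idea is to enrich each turn's lexical features with app-level context from the interaction history; the feature names below are hypothetical, not the paper's exact scheme:

```python
from collections import Counter

def turn_features(utterance, app_history):
    """Combine lexical features with app-level context features.
    Feature naming here is illustrative, not the paper's exact scheme."""
    feats = Counter(f"word={w}" for w in utterance.lower().split())
    prev = app_history[-1] if app_history else "null"
    feats[f"prev_app={prev}"] = 1        # behavioral pattern: last launched app
    for app in set(app_history):
        feats[f"hist_app={app}"] = 1     # user preference over the whole history
    return feats

print(sorted(turn_features("send it to alice", ["CAMERA"])))
```

Rows built this way plug into the same feature-enriched MF as in the single-turn case.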
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | | 26.1 |
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | | 55.5 |
LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | | 33.3 |
Word + Type-Embedding-Based Semantics | 31.5 | | 32.9 |
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | | 56.6 |
Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
Feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
[Diagram: User Experience (e.g. "call taxi") on device/service end-points (phone, PC, Xbox, web browser, messaging apps) → Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (inferences, user modeling, suggestions), supported by a data back-end (databases, services and client signals).]
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions
The work shows the feasibility and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors
Future Work
Apply the proposed technology to domain discovery: find domains not covered by the current systems but of interest to users, to guide the next developed domains.
Improve the proposed approach by handling uncertainty: recognition errors from ASR (affecting SLU modeling) and unreliable knowledge (affecting knowledge acquisition).
Towards Unsupervised Deep Learning
[Figure: deep model for semantic decoding. Word sequence x = w1 w2 ... wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf; slot vectors lf pass through a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws to form the semantic layer y, yielding relations R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U) over slot candidates S1 ... Sn for utterance U.]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
Take Home Message
Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI maps language to action: understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
Q & A. Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
SDS Process – Spoken Language Understanding (SLU)
User: "find a cheap eating place for taiwanese food"
[Diagram: slot graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
Intelligent Agent (Semantic Decoding): seeking="find", target="eating place", price="cheap", food="taiwanese"
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
[Diagram: slot graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
[Diagram: slot graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } ← Surface Form Derivation (natural language)
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → results: Din Tai Fung, Boiling Point; predicted intent: navigation
SDS Process – Dialogue Management (DM)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → results: Din Tai Fung, Boiling Point; predicted intent: navigation ← Intent Prediction
SDS Process – Natural Language Generation (NLG)
User: "find a cheap eating place for taiwanese food"
Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there (navigation)."
Required Knowledge
User: "find a cheap eating place for taiwanese food"
Required domain-specific information:
- slot graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target
- SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
- predicted intent: navigation
Challenges for SDS
An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge (the prior focus) and then 2) create the data for SLU modeling, in order to handle open-domain requests, fully unsupervised.
e.g. "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"
Contributions
User: "find a cheap eating place for taiwanese food"
[Diagram: Ontology Induction (semantic slot) and Structure Learning (inter-slot relation) produce the slot graph (seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target); Surface Form Derivation (natural language); Semantic Decoding yields SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }; Intent Prediction yields predicted intent: navigation.]
Contributions
User: "find a cheap eating place for taiwanese food"
[Diagram: Ontology Induction → Structure Learning → Surface Form Derivation → Semantic Decoding → Intent Prediction]
Contributions
User: "find a cheap eating place for taiwanese food"
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
[Diagram: unlabelled collection of restaurant-asking conversations → Knowledge Acquisition → organized domain knowledge: slot graph over target, food, price, seeking, quantity with PREP_FOR, NN, AMOD edges.]
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
[Diagram: "can i have a cheap restaurant" + organized domain knowledge → SLU component → price="cheap", target="restaurant", intent=navigation]
SLU Modeling = Semantic Decoding + Intent Prediction
SDS Architecture – Contributions
[Diagram: ASR → SLU (current bottleneck) → DM → NLG, with domain knowledge feeding DM; the contributions target Knowledge Acquisition and SLU Modeling.]
SDS Flowchart
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
SDS Flowchart – Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". The model is built from an unlabeled collection: frame-semantic parsing and ontology induction (semantic KG) yield feature matrices Fw, Fs for the feature model; structure learning yields a word relation model Rw (lexical KG) and a slot relation model Rs (semantic KG) for the knowledge graph propagation model; MF-SLU (SLU modeling by matrix factorization) produces the semantic representation.]
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistically semantic resource based on frame-semantics theory; words/phrases can be represented as frames, e.g. in "low fat milk," "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → frames: capability (generic), expensiveness (domain-specific), locale_by_use (domain-specific); each frame is a slot candidate.
1st Issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: word-observation × slot-candidate matrix from frame-semantic parsing. Train: utterance 1 "i would like a cheap restaurant" → words {cheap, restaurant}, slot candidates {expensiveness, locale_by_use}; utterance 2 "find a restaurant with chinese food" → words {restaurant, food}, slot candidates {locale_by_use, food}. Test utterance "show me a list of cheap restaurants" → estimated slot probabilities (e.g. 0.97, 0.95).]
Idea: increase the weights of domain-specific slots and decrease the weights of others.
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: slot induction: the word-observation × slot-candidate matrix is multiplied by a word relation matrix and a slot relation matrix, over train utterances "i would like a cheap restaurant," "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants"; the slot knowledge graph contains capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring.]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: semantic decoding architecture: frame-semantic parsing + ontology induction (feature matrices Fw, Fs), structure learning (word/slot relation models Rw, Rs for knowledge graph propagation), and MF-SLU producing the semantic representation.]
Knowledge Graph Construction
Syntactic dependency parsing on utterances, e.g. "can i have a cheap restaurant" (dependencies: ccomp, nsubj, dobj, det, amod; frames: capability, expensiveness, locale_by_use).
- Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by dependency edges.
- Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness) connected by the projected dependencies.
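The two graphs can be built from dependency triples as sketched below; the parse is hardcoded here (a real system would obtain it from a dependency parser):

```python
from collections import defaultdict

# Hardcoded dependency parse of "can i have a cheap restaurant"
# (relations taken from the slide; a parser would produce these triples).
dependencies = [("have", "ccomp", "can"), ("have", "nsubj", "i"),
                ("have", "dobj", "restaurant"), ("restaurant", "det", "a"),
                ("restaurant", "amod", "cheap")]
# Frame-semantic parse: word -> evoked slot candidate (from the slide).
word_slot = {"can": "capability", "cheap": "expensiveness",
             "restaurant": "locale_by_use"}

def build_graphs(deps, word_slot):
    """Word-based lexical KG and slot-based semantic KG as adjacency sets."""
    word_graph, slot_graph = defaultdict(set), defaultdict(set)
    for head, _rel, dep in deps:
        word_graph[head].add(dep)
        word_graph[dep].add(head)
        hs, ds = word_slot.get(head), word_slot.get(dep)
        if hs and ds:                    # project the dependency onto slots
            slot_graph[hs].add(ds)
            slot_graph[ds].add(hs)
    return word_graph, slot_graph

wg, sg = build_graphs(dependencies, word_slot)
print(sorted(wg["have"]), dict(sg))
```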
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
[Figure: dependency-based word embeddings (e.g. for "can," "have") and dependency-based slot embeddings (e.g. for expensiveness, capability), trained from the dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
Edge Weight Measurement
Compute edge weights to represent relation importance:
- slot-to-slot semantic relation: similarity between slot embeddings
- slot-to-slot dependency relation: dependency score between slot embeddings
- word-to-word semantic relation: similarity between word embeddings
- word-to-word dependency relation: dependency score between word embeddings
[Figure: word graph (w1-w7) and slot graph (s1-s3) whose edge weights combine the semantic and dependency relations.]
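For instance, the slot-to-slot semantic weight is just the similarity of the two slot embeddings; the toy vectors below are made up for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy dependency-based slot embeddings (values made up for illustration).
slot_emb = {
    "expensiveness": [0.9, 0.1, 0.3],
    "locale_by_use": [0.8, 0.2, 0.4],
    "capability":    [0.1, 0.9, 0.2],
}

def semantic_edge_weight(s1, s2, emb=slot_emb):
    """Slot-to-slot semantic relation: similarity of the slot embeddings."""
    return cosine(emb[s1], emb[s2])

print(semantic_edge_weight("expensiveness", "locale_by_use"))
print(semantic_edge_weight("expensiveness", "capability"))
```

The two domain-specific slots end up closely connected, while the generic slot gets a weak edge.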
Knowledge Graph Propagation Model
[Figure: slot induction: word relation matrix Rw^SD × word-observation/slot-candidate matrix × slot relation matrix Rs^SD, over the train utterances and the test utterance.]
Structure information is integrated to make the self-training data more reliable
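The propagation itself amounts to multiplying the scores by a row-normalized relation matrix, so densely connected (domain-specific) slots end up with higher scores; the weights below are illustrative:

```python
slots = ["capability", "expensiveness", "locale_by_use"]
scores = [1.0, 1.0, 1.0]     # initial slot scores from frame-semantic parsing

# Slot relation matrix: the two domain-specific slots are strongly linked,
# the generic slot only weakly (weights are made up for illustration).
R = [[1.0, 0.1, 0.1],
     [0.1, 1.0, 0.9],
     [0.1, 0.9, 1.0]]

# Row-normalize, then propagate: new_score[j] = sum_i scores[i] * R[i][j]
norm = [[w / sum(row) for w in row] for row in R]
propagated = [sum(scores[i] * norm[i][j] for i in range(len(slots)))
              for j in range(len(slots))]

for slot, s in sorted(zip(slots, propagated), key=lambda p: -p[1]):
    print(f"{slot:15s} {s:.3f}")
```

After one multiplication the domain-specific slots outscore the generic one, which is exactly the adaptation effect described above.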
Semantic Decoding [ACL-IJCNLP'15]
[Figure: ontology induction and structure learning build the feature matrices (Fw, Fs) over train utterances "i would like a cheap restaurant," "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants"; the hidden semantics are unobserved matrix entries.]
2nd Issue: unobserved semantics may benefit understanding.
Reasoning with Matrix Factorization
[Figure: feature model + knowledge graph propagation model: word relation matrix Rw^SD × word-observation/slot-candidate matrix × slot relation matrix Rs^SD; MF fills the missing entries with probabilities (e.g. 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05).]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which makes it able to model hidden semantics and more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probability of hidden semantics.
[Figure: the |U| × (|W|+|S|) matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
Bayesian Personalized Ranking for MF
Model implicit feedback:
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: for each utterance u_x, maximize the sum over observed facts f+ and unobserved facts f- of ln σ(f+ - f-), where σ is the sigmoid function.
The objective is to learn a set of well-ranked semantic slots per utterance.
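A minimal sketch of BPR-trained MF on a toy utterance × (word + slot) matrix; the dimensions, learning rate, and data are illustrative, not the paper's settings:

```python
import math
import random

random.seed(0)

# Toy observed matrix: rows = utterances, columns = words + slot candidates.
# 1 = observed fact; 0 = unobserved, NOT assumed to be false.
M = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
n_rows, n_cols, d, lr = len(M), len(M[0]), 2, 0.05

# Latent factors for utterances (U) and words/slots (V).
U = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_cols)]

def score(u, f):
    """Inner product of latent vectors = predicted score of fact f."""
    return sum(U[u][k] * V[f][k] for k in range(d))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# BPR: per utterance, rank an observed fact f+ above an unobserved fact f-.
for _ in range(2000):
    u = random.randrange(n_rows)
    fp = random.choice([f for f in range(n_cols) if M[u][f] == 1])
    fn = random.choice([f for f in range(n_cols) if M[u][f] == 0])
    g = 1.0 - sigmoid(score(u, fp) - score(u, fn))  # d/dx ln sigma(x)
    for k in range(d):
        uk = U[u][k]
        U[u][k] += lr * g * (V[fp][k] - V[fn][k])
        V[fp][k] += lr * g * uk
        V[fn][k] -= lr * g * uk

print(score(0, 0) > score(0, 1), score(1, 1) > score(1, 0))
```

After training, each utterance's observed facts outrank its unobserved ones, which is exactly the well-ranked-slots objective.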
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction and structure learning feed the feature matrices (Fw, Fs) into MF; for the test utterance "show me a list of cheap restaurants," MF estimates slot probabilities (e.g. 0.97, 0.90, 0.95, 0.85).]
MF-SLU can estimate probabilities for slot candidates given test utterances
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: semantic decoding architecture: frame-semantic parsing + ontology induction (feature matrices Fw, Fs), structure learning (word/slot relation models Rw, Rs for knowledge graph propagation), and MF-SLU producing the semantic representation.]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Experimental Setup
Dataset: Cambridge University SLU Corpus (restaurant recommendation; WER = 37%; 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type).
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
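The MAP metric can be computed as below; the slot names and rankings are toy examples, not corpus data:

```python
def average_precision(ranked, relevant):
    """AP for one utterance: ranked slot list vs. gold slot set."""
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i          # precision at each relevant position
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(predictions, gold):
    """MAP over all utterances; predictions are slots ranked by probability."""
    aps = [average_precision(r, g) for r, g in zip(predictions, gold)]
    return sum(aps) / len(aps)

preds = [["food", "area", "pricerange"], ["pricerange", "task", "food"]]
gold  = [{"food", "pricerange"}, {"pricerange"}]
print(mean_average_precision(preds, gold))
```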
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.
Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.
Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
(the result is significantly better than the MLR baseline with p < 0.05 in t-test)
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.
Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)
(the result is significantly better than the MLR baseline with p < 0.05 in t-test)
In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the learned ontology (slots: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; relations: PREP_FOR, NN, AMOD, DOBJ) compared with the reference ontology with the most frequent syntactic dependencies (slots: type, task, area, food, pricerange; relations: DOBJ, AMOD, PREP_IN).]
The automatically learned domain ontology aligns well with the reference one
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play
Example: "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Matrix: app descriptions retrieved by IR as app candidates (e.g., Outlook: "… your email calendar contacts …", Gmail: "… check and send emails msgs …"), self-train utterances, and the test utterance "i would like to contact alex" form rows; word observations (contact, message, email), enriched semantics (communication), and intended apps (Gmail, Outlook, Skype) form columns; feature enrichment adds semantic features, and reasoning with feature-enriched MF fills in scores such as 0.90, 0.85, 0.97, 0.95]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
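The "IR for app candidates" step can be approximated by a simple lexical retrieval over app descriptions. A hedged sketch follows (the app names and description strings are invented, and the paper's actual retrieval model may differ):

```python
from collections import Counter
import math

# Hypothetical app descriptions; the paper retrieves candidates from
# real Google Play descriptions.
apps = {
    "Gmail":   "check and send emails msgs",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record videos",
}

def score(query, doc):
    """Word-overlap score weighted by inverse document frequency."""
    q, d = set(query.split()), set(doc.split())
    n = len(apps)
    df = Counter(w for text in apps.values() for w in set(text.split()))
    return sum(math.log(n / df[w]) for w in q & d)

query = "i would like to contact alex by email"
# Rank apps by overlap with the request; shared rare words count more.
ranked = sorted(apps, key=lambda a: score(query, apps[a]), reverse=True)
print(ranked[0])  # "Outlook" wins via the word "email"
```

Real systems would also handle morphology ("email" vs. "emails") and use a proper retrieval model rather than raw overlap.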
Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity from 1) user preference and 2) app-level contexts
Example: "send to vivian" (following the previous turn) could mean Email, Message, or another communication app.
Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Matrix: train dialogues pair user utterances with intended apps, e.g., "take this photo" / "tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website" / "send an email to professor" → CHROME, EMAIL, using lexical features (photo, check, camera, tell, send, IM) and behavior-history features (null, camera, chrome, email); the test dialogue "take a photo of this" / "send it to alice" (CAMERA, IM) is completed by reasoning with feature-enriched MF, with scores such as 0.85, 0.70, 0.95, 0.80, 0.55]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
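A minimal sketch of the feature-enrichment idea: the utterance's lexical features are concatenated with behavior-history features such as the previously launched app. The vocabulary and app names below are hypothetical, chosen to mirror the example dialogue:

```python
import numpy as np

vocab = ["take", "photo", "send", "check"]
app_history = ["null", "camera", "chrome"]

def featurize(words, prev_app):
    """Concatenate lexical features with a one-hot of the previous app,
    so the model can exploit app-level context."""
    lex = np.array([1.0 if w in words else 0.0 for w in vocab])
    beh = np.array([1.0 if prev_app == a else 0.0 for a in app_history])
    return np.concatenate([lex, beh])

# Turn 2 of "take a photo of this / send it to alice":
x = featurize({"send", "it", "to", "alice"}, prev_app="camera")
print(x)  # lexical "send" plus behavioral "camera" are active
```

In the paper these enriched rows populate the MF matrix alongside intended-app columns; here they are just feature vectors.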
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Word Observation – LM: 25.1 (ASR), 26.1 (Transcripts)
(LM = LM-based IR model, unsupervised)

Multi-Turn Interaction, Mean Average Precision (MAP):
Word Observation – MLR: 52.1 (ASR), 55.5 (Transcripts)
(MLR = multinomial logistic regression, supervised)
Experiments for Intent Prediction

Single-Turn Request, MAP:
Word Observation – ASR: LM 25.1 → MF-SLU 29.2 (+16.2%); Transcripts: LM 26.1 → MF-SLU 30.4 (+16.4%)

Multi-Turn Interaction, MAP:
Word Observation – ASR: MLR 52.1 → MF-SLU 52.7 (+1.2%); Transcripts: MLR 55.5 → MF-SLU 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction

Single-Turn Request, MAP:
Word Observation – ASR: LM 25.1 → MF-SLU 29.2 (+16.2%); Transcripts: LM 26.1 → MF-SLU 30.4 (+16.4%)
Word + Embedding-Based Semantics – LM: 32.0 (ASR), 33.3 (Transcripts)
Word + Type-Embedding-Based Semantics – LM: 31.5 (ASR), 32.9 (Transcripts)

Multi-Turn Interaction, MAP:
Word Observation – ASR: MLR 52.1 → MF-SLU 52.7 (+1.2%); Transcripts: MLR 55.5 → MF-SLU 55.4 (-0.2%)
Word + Behavioral Patterns – MLR: 53.9 (ASR), 56.6 (Transcripts)

Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction

Single-Turn Request, MAP:
Word Observation – ASR: LM 25.1 → MF-SLU 29.2 (+16.2%); Transcripts: LM 26.1 → MF-SLU 30.4 (+16.4%)
Word + Embedding-Based Semantics – ASR: LM 32.0 → MF-SLU 34.2 (+6.8%); Transcripts: LM 33.3 → MF-SLU 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics – ASR: LM 31.5 → MF-SLU 32.2 (+2.1%); Transcripts: LM 32.9 → MF-SLU 34.0 (+3.4%)

Multi-Turn Interaction, MAP:
Word Observation – ASR: MLR 52.1 → MF-SLU 52.7 (+1.2%); Transcripts: MLR 55.5 → MF-SLU 55.4 (-0.2%)
Word + Behavioral Patterns – ASR: MLR 53.9 → MF-SLU 55.7 (+3.3%); Transcripts: MLR 56.6 → MF-SLU 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction)]

The feature-enriched MF-SLU for Intent Prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture

[Diagram: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions) built on a data back-end (data bases, services, and client signals), delivered through device/service end-points (phone, PC, Xbox, web browser, messaging apps) to the user experience, e.g., "call taxi"]
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions

This work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
Future Work

Apply the proposed technology to domain discovery: domains that are not covered by the current systems but that users are interested in can guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition.
Towards Unsupervised Deep Learning

[Diagram: a convolutional architecture over the word sequence x = w1, w2, …, wd – word vectors lw, convolution matrix Wc and convolutional layer lc, a pooling operation producing the utterance vector lf, knowledge graph propagation matrix Wp and layer lp, and semantic projection matrix Ws with semantic layer y; the output gives posterior probabilities P(S1 | U), P(S2 | U), …, P(Sn | U) and scores R(U, S1), …, R(U, Sn) for slot candidates S1 … Sn given utterance U]

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
Take Home Message

Big data is available without annotations; the challenge is how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI maps language to action: understand voice commands to control music, lights, etc., or teach the system to let friends in by face recognition, etc.
Unsupervised or weakly supervised methods will be the future trend.
Deep language understanding is an emerging field.
Q & A – THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

[Diagram: the utterance is parsed into slots seeking ("find"), target ("eating place"), price ("cheap"), and food ("taiwanese") connected by PREP_FOR, AMOD, and NN relations; the intelligent agent forms SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}]
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

[Diagram (Surface Form Derivation, natural language): the utterance is parsed into slots seeking ("find"), target ("eating place"), price ("cheap"), and food ("taiwanese") connected by PREP_FOR, AMOD, and NN relations; the intelligent agent forms SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}]
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

[Diagram: the intelligent agent runs SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}, returning Din Tai Fung and Boiling Point; predicted intent: navigation]
SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

[Diagram (Intent Prediction): the intelligent agent runs SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}, returning Din Tai Fung and Boiling Point; predicted intent: navigation]
SDS Process – Natural Language Generation (NLG)

User: "find a cheap eating place for taiwanese food"
Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there." (navigation)
Required Knowledge

User: "find a cheap eating place for taiwanese food"

[Diagram: required domain-specific information – slots seeking, target, price, and food with PREP_FOR, AMOD, and NN relations; the query SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}; predicted intent: navigation]
Challenges for SDS

An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations
Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests in a fully unsupervised way (in contrast to the prior focus).

Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"
Contributions

User: "find a cheap eating place for taiwanese food"

[Diagram: the pipeline annotates the utterance via Surface Form Derivation (natural language), Ontology Induction (semantic slot), and Structure Learning (inter-slot relation) – slots seeking, target, price, and food with PREP_FOR, AMOD, and NN relations – then Semantic Decoding produces SELECT restaurant {restaurant.price="cheap", restaurant.food="asian food"} and Intent Prediction gives intent: navigation]
Contributions

User: "find a cheap eating place for taiwanese food"

[Diagram: Surface Form Derivation, Ontology Induction, Structure Learning, Semantic Decoding, and Intent Prediction operating on the utterance]
Contributions

User: "find a cheap eating place for taiwanese food"

[Diagram: Knowledge Acquisition groups Ontology Induction, Structure Learning, and Surface Form Derivation; SLU Modeling groups Semantic Decoding and Intent Prediction]
Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Diagram: an unlabelled collection of restaurant-asking conversations feeds Knowledge Acquisition, which outputs organized domain knowledge – slots seeking, target, price, food, and quantity linked by PREP_FOR, NN, and AMOD relations]

Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

[Diagram: "can i have a cheap restaurant" plus the organized domain knowledge feed the SLU component, which outputs price="cheap", target="restaurant", intent=navigation]

SLU Modeling = Semantic Decoding + Intent Prediction
SDS Architecture – Contributions

[Diagram: ASR → SLU → DM → NLG pipeline with domain knowledge; Knowledge Acquisition and SLU Modeling target the current bottleneck]
SDS Flowchart

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction)]
SDS Flowchart – Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction)]
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance
Example: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

[Diagram: an unlabeled collection goes through frame-semantic parsing; Ontology Induction builds a semantic KG and the feature model (Fw, Fs); Structure Learning builds the knowledge graph propagation model from a lexical KG (word relation model Rw) and a semantic KG (slot relation model Rs); their product feeds MF-SLU (SLU modeling by matrix factorization), which produces the semantic representation]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. Example: in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → frames: capability, expensiveness, locale_by_use (slot candidates)
1st issue: differentiate domain-specific frames (good candidates such as expensiveness and locale_by_use) from generic frames (such as capability) for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Matrix: the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" are parsed by frame-semantic parsing into binary word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities, e.g., 0.97 and 0.95]

Idea: increase the weights of domain-specific slots and decrease the weights of the others.
1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Matrix: the word/slot feature matrix (word observations: i, like, cheap, restaurant, food; slot candidates: capability, expensiveness, locale_by_use, food, seeking, relational_quantity, desiring) over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" is multiplied by a word relation matrix and a slot relation matrix for slot induction]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
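As a toy illustration of this propagation idea (the relation-matrix values below are invented, not taken from the paper), multiplying an utterance's word-observation vector by a word relation matrix boosts words that are strongly connected in the graph:

```python
import numpy as np

# Toy vocabulary: "cheap" and "restaurant" are domain words that are
# strongly linked by dependencies; "i" and "like" are generic.
words = ["i", "like", "cheap", "restaurant"]

# Word relation matrix R_w: symmetric edge weights (made-up numbers;
# the paper derives them from embedding similarity/dependency scores).
R_w = np.array([
    [1.0, 0.2, 0.0, 0.0],
    [0.2, 1.0, 0.1, 0.1],
    [0.0, 0.1, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])

# Binary word observations for the phrase "cheap restaurant".
f = np.array([0.0, 0.0, 1.0, 1.0])

# Propagation: observed words pass score to their neighbors, so
# strongly connected (domain-specific) words reinforce each other.
propagated = f @ R_w
print(propagated)
```

After the multiplication the domain words carry the highest scores, which is the effect the slide describes.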
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance
Example: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

[Diagram: an unlabeled collection goes through frame-semantic parsing; Ontology Induction builds a semantic KG and the feature model (Fw, Fs); Structure Learning builds the knowledge graph propagation model from a lexical KG (word relation model Rw) and a semantic KG (slot relation model Rs); their product feeds MF-SLU (SLU modeling by matrix factorization), which produces the semantic representation]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Knowledge Graph Construction

Syntactic dependency parsing on utterances:
[Parse: "can i have a cheap restaurant" with dependency relations ccomp, nsubj, det, amod, dobj; frames capability ("can"), expensiveness ("cheap"), locale_by_use ("restaurant")]

Word-based lexical knowledge graph: nodes are words (can, i, have, a, cheap, restaurant) with word-to-word edges.
Slot-based semantic knowledge graph: nodes are slots (capability, expensiveness, locale_by_use) with slot-to-slot edges.
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings: vectors for words such as "can" and "have", trained from the dependency contexts of "can i have a cheap restaurant" (ccomp, nsubj, det, amod, dobj).
Dependency-based slot embeddings: vectors for slots such as expensiveness and capability, trained from the slot-level sequence "capability have a expensiveness locale_by_use" with the same dependency relations.

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
Edge Weight Measurement

Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Diagram: a word graph (w1–w7) and a slot graph (s1–s3) whose edge weights combine both relation types]
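For instance, the semantic-relation weight between two slots can be the cosine similarity of their embeddings. A hedged sketch with made-up 4-dimensional vectors (real edge weights would come from the trained dependency-based embeddings above):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical slot embeddings (invented values for illustration).
expensiveness = np.array([0.9, 0.1, 0.3, 0.0])
pricerange    = np.array([0.8, 0.2, 0.4, 0.1])
capability    = np.array([0.0, 0.9, 0.1, 0.8])

# Semantically related slots get a heavier edge than unrelated ones.
w_related   = cosine(expensiveness, pricerange)
w_unrelated = cosine(expensiveness, capability)
assert w_related > w_unrelated
```

The dependency-relation weights would analogously score how often two nodes co-occur under a given dependency, rather than embedding similarity.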
Knowledge Graph Propagation Model

[Matrix: the word/slot feature matrix (word observations cheap, restaurant, food; slot candidates expensiveness, locale_by_use, food) over train and test utterances is multiplied by the word relation matrix Rw^(SD) and the slot relation matrix Rs^(SD), which combine the semantic (S) and dependency (D) relations, for slot induction]

Structure information is integrated to make the self-training data more reliable.
Semantic Decoding [ACL-IJCNLP'15]

[Matrix: Ontology Induction produces the feature matrices Fw and Fs over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; observed test slot probabilities, e.g., 0.97, 0.90, 0.95, 0.85; Structure Learning contributes the relation matrices for SLU]

2nd issue: unobserved hidden semantics may benefit understanding.
Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model:

[Matrix: the word/slot feature matrix times Rw^(SD) and Rs^(SD) for slot induction; MF fills the missing cells with estimated probabilities, e.g., observed test slots score 0.97, 0.90, 0.95, 0.85, while hidden cells receive scores such as 0.93, 0.92, 0.98, 0.05, 0.05]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which can model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively. The product of the two matrices fills in the probabilities of the hidden semantics:

the |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix.

[Matrix: word observations and slot candidates over train and test utterances, with observed entries (1s) and MF-estimated entries such as 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05, 0.05]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
Bayesian Personalized Ranking for MF

Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻, where σ is the sigmoid function.

The objective is to learn a set of well-ranked semantic slots per utterance.
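A minimal numpy sketch of BPR-style training on such a matrix (the toy data, dimensions, and hyperparameters below are made up; Rendle et al.'s formulation is the reference, and the paper's actual model is richer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary matrix: 3 utterances x 5 features (words + slot candidates).
# 1 = observed fact; 0 = unobserved (NOT assumed false under BPR).
M = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
])
n_u, n_f, d = M.shape[0], M.shape[1], 2

U = rng.normal(scale=0.1, size=(n_u, d))  # latent utterance factors
F = rng.normal(scale=0.1, size=(n_f, d))  # latent feature factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, reg = 0.05, 0.01
for _ in range(2000):
    u = rng.integers(n_u)
    pos = rng.choice(np.flatnonzero(M[u] == 1))  # observed fact f+
    neg = rng.choice(np.flatnonzero(M[u] == 0))  # unobserved fact f-
    x = U[u] @ (F[pos] - F[neg])                 # score difference
    g = sigmoid(-x)                              # d/dx of ln sigmoid(x)
    uu = U[u].copy()
    U[u]   += lr * (g * (F[pos] - F[neg]) - reg * uu)
    F[pos] += lr * (g * uu - reg * F[pos])
    F[neg] += lr * (-g * uu - reg * F[neg])

# After training, observed facts should outrank unobserved ones on average.
scores = U @ F.T
pos_mean = scores[M == 1].mean()
neg_mean = scores[M == 0].mean()
assert pos_mean > neg_mean
```

The pairwise update only enforces an ordering (observed above unobserved), which is exactly the "well-ranked slots per utterance" objective on the slide.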
Matrix Factorization SLU (MF-SLU)

[Matrix: Ontology Induction (Fw, Fs) and Structure Learning feed the SLU matrix over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; estimated test slot probabilities, e.g., 0.97, 0.90, 0.95, 0.85]

MF-SLU can estimate probabilities for slot candidates given test utterances.
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance
Example: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

[Diagram: an unlabeled collection goes through frame-semantic parsing; Ontology Induction builds a semantic KG and the feature model (Fw, Fs); Structure Learning builds the knowledge graph propagation model from a lexical KG (word relation model Rw) and a semantic KG (slot relation model Rs); their product feeds MF-SLU (SLU modeling by matrix factorization), which produces the semantic representation]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
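The MAP metric itself can be sketched as follows (a generic mean-average-precision implementation; the slot names in the example are illustrative, not from the corpus):

```python
def average_precision(ranked_slots, relevant):
    """AP for one utterance: slots ranked by estimated probability,
    scored against the reference slot set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked_slots, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

def mean_average_precision(predictions, references):
    """predictions: per-utterance slot lists ranked by probability;
    references: per-utterance gold slot sets (same order)."""
    aps = [average_precision(r, g) for r, g in zip(predictions, references)]
    return sum(aps) / len(aps)

ranked = [["expensiveness", "locale_by_use", "capability"],
          ["food", "capability", "locale_by_use"]]
gold = [{"expensiveness", "locale_by_use"}, {"food", "locale_by_use"}]
print(mean_average_precision(ranked, gold))  # ≈ 0.917
```

Every induced slot gets a probability, so the metric rewards ranking the reference slots near the top rather than hard classification.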
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU) | ASR | Transcripts
Support Vector Machine | 32.5 | 36.6
Multinomial Logistic Regression | 34.0 | 38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
(significantly better than the MLR baseline, p < 0.05, t-test)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation (Semantic) | 41.4 | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6 | 49.0
Feature + Knowledge Graph Propagation (All) | 43.5 (+15.7%) | 53.4 (+17.9%)
(significantly better than the MLR baseline, p < 0.05, t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
62
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification: popular domains in Google Play
"please dial a phone call to alex" → Skype, Hangout, etc.
Intent Prediction of Mobile Apps [SLT'14c]
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
63
Input: single-turn request
Output: apps that are able to support the required functionality
Intent Prediction – Single-Turn Request
(Figure: reasoning with feature-enriched MF. Word observations such as "contact", "message", "email" and enriched semantic features such as "communication" are paired with intended apps (Gmail, Outlook, Skype). Training rows come from app descriptions retrieved by IR over app candidates, e.g. Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ...", and from self-trained utterances; the test utterance "i would like to contact alex" is scored against the intended apps.)
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
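The "IR for app candidates" step can be sketched as a simple TF-IDF retrieval over app-store descriptions; the descriptions, tokenization, and weighting below are illustrative assumptions, not the exact setup used in the paper:

```python
import math
from collections import Counter

# Toy app-store descriptions (assumed for illustration)
apps = {
    "Outlook": "your email calendar contacts",
    "Gmail": "check and send emails msgs",
    "Camera": "take photos and record video",
}
query = "i would like to contact alex by email"

def tf(text):
    return Counter(text.split())

docs = {name: tf(desc) for name, desc in apps.items()}
vocab = {w for c in docs.values() for w in c} | set(query.split())
# smoothed inverse document frequency
idf = {w: math.log(len(apps) / (1 + sum(w in c for c in docs.values()))) + 1
       for w in vocab}

def vec(counts):
    return {w: n * idf[w] for w, n in counts.items()}

def cosine(a, b):
    num = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

q = vec(tf(query))
ranked = sorted(apps, key=lambda name: cosine(q, vec(docs[name])), reverse=True)
# "email" matches Outlook's description, so Outlook ranks first here
```

The retrieved candidates then become columns of the feature-enriched matrix described above.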
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity – 1) user preference; 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.
(Example: "send to vivian" could target Email, Message, or another communication app; the previous turn disambiguates.)
Idea: behavioral patterns in history can help intent prediction
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
(Figure: reasoning with feature-enriched MF over dialogues. Lexical features ("photo", "tell", "check", "send") and behavior-history features (e.g., null → CAMERA, CHROME → EMAIL) are paired with intended apps. Training dialogues include "take this photo / tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website / send an email to professor" (CHROME, EMAIL); the test dialogue "take a photo of this / send it to alice" is scored against the intended apps (CAMERA, IM).)
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
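Concretely, each training turn can be encoded as one row over concatenated lexical, behavior-history, and intended-app columns; the feature names below are illustrative, following the slide's example:

```python
import numpy as np

lexical = ["photo", "tell", "check", "send"]
history = ["null", "CAMERA", "CHROME"]          # previously launched app
apps = ["CAMERA", "IM", "CHROME", "EMAIL"]      # intended apps
columns = lexical + [f"prev={h}" for h in history] + apps

def encode_turn(words, prev_app, intended):
    """One row of the joint feature matrix for a single dialogue turn."""
    row = np.zeros(len(columns))
    for w in words:
        row[columns.index(w)] = 1.0
    row[columns.index(f"prev={prev_app}")] = 1.0
    for a in intended:
        row[columns.index(a)] = 1.0
    return row

# "take this photo" (no prior app), then "tell vivian ..." (after CAMERA)
M = np.vstack([
    encode_turn(["photo"], "null", ["CAMERA"]),
    encode_turn(["tell", "send"], "CAMERA", ["IM"]),
])
```

MF is then trained on this joint matrix, so the behavior-history columns can support intent prediction for ambiguous turns.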
66
Experiments for Intent Prediction

Single-Turn Request – Mean Average Precision (MAP), ASR / Transcripts:
- LM-based IR model (unsupervised), Word Observation: 25.1 / 26.1

Multi-Turn Interaction – Mean Average Precision (MAP), ASR / Transcripts:
- Multinomial Logistic Regression (supervised), Word Observation: 52.1 / 55.5
67
Experiments for Intent Prediction

Single-Turn Request – MAP, ASR / Transcripts (LM → MF-SLU):
- Word Observation: 25.1 → 29.2 (+16.2%) / 26.1 → 30.4 (+16.4%)

Multi-Turn Interaction – MAP, ASR / Transcripts (MLR → MF-SLU):
- Word Observation: 52.1 → 52.7 (+1.2%) / 55.5 → 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction

Single-Turn Request – MAP, ASR / Transcripts (LM → MF-SLU):
- Word Observation: 25.1 → 29.2 (+16.2%) / 26.1 → 30.4 (+16.4%)
- Word + Embedding-Based Semantics (LM): 32.0 / 33.3
- Word + Type-Embedding-Based Semantics (LM): 31.5 / 32.9

Multi-Turn Interaction – MAP, ASR / Transcripts (MLR → MF-SLU):
- Word Observation: 52.1 → 52.7 (+1.2%) / 55.5 → 55.4 (−0.2%)
- Word + Behavioral Patterns (MLR): 53.9 / 56.6

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction

Single-Turn Request – MAP, ASR / Transcripts (LM → MF-SLU):
- Word Observation: 25.1 → 29.2 (+16.2%) / 26.1 → 30.4 (+16.4%)
- Word + Embedding-Based Semantics: 32.0 → 34.2 (+6.8%) / 33.3 → 33.3 (−0.2%)
- Word + Type-Embedding-Based Semantics: 31.5 → 32.2 (+2.1%) / 32.9 → 34.0 (+3.4%)

Multi-Turn Interaction – MAP, ASR / Transcripts (MLR → MF-SLU):
- Word Observation: 52.1 → 52.7 (+1.2%) / 55.5 → 55.4 (−0.2%)
- Word + Behavioral Patterns: 53.9 → 55.7 (+3.3%) / 56.6 → 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction: the Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: inferences, user modeling, suggestions
Data: back-end data bases, services, and client signals
Device/Service end-points (phone, PC, Xbox, web browser, messaging apps)
User experience: "call taxi"
72
Outline Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
73
Conclusions: this work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.
74
Future Work
- Apply the proposed technology to domain discovery: domains not covered by current systems, but that users are interested in, can guide which domains to develop next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge in knowledge acquisition and SLU modeling.
75
(Figure: a convolutional architecture for slot ranking. A word sequence x = w1 w2 ... wd is mapped to word vectors lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation forms the utterance vector lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws produce the semantic layer y, yielding relevance scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1 ... Sn.)
Towards Unsupervised Deep Learning
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message: big data is available without annotations.
Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI:
language → action: understand voice to control music, lights, etc.; teach it to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A – Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
- Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.
22
find a cheap eating place for taiwanese food
SDS Process – Dialogue Management (DM)
User
(Ontology snippet: target, food, price, seeking; dependencies AMOD, NN, PREP_FOR)
SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}
Intelligent Agent
Surface Form Derivation (natural language)
23
SDS Process – Dialogue Management (DM)
User
SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}
Din Tai Fung, Boiling Point
Predicted intent: navigation
Intelligent Agent
find a cheap eating place for taiwanese food
24
SDS Process – Dialogue Management (DM)
User
SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}
Din Tai Fung, Boiling Point
Predicted intent: navigation
Intelligent Agent
Intent Prediction
find a cheap eating place for taiwanese food
25
SDS Process – Natural Language Generation (NLG)
User
Intelligent Agent
"Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there (navigation)."
find a cheap eating place for taiwanese food
26
Required Knowledge
(Ontology snippet: target, food, price, seeking; dependencies AMOD, NN, PREP_FOR)
SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}
Predicted intent: navigation
User
Required Domain-Specific Information
find a cheap eating place for taiwanese food
27
Challenges for SDS: an SDS in a new domain requires
1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations.
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.
seeking="find", target="eating place", price="cheap", food="asian food"
find a cheap eating place for asian food
fully unsupervised
Prior Focus
28
Contributions
(Ontology snippet: target, food, price, seeking; dependencies AMOD, NN, PREP_FOR)
SELECT restaurant {restaurant.price="cheap", restaurant.food="asian food"}
Predicted intent: navigation
find a cheap eating place for taiwanese food (User)
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
(natural language)
(inter-slot relation)
(semantic slot)
29
Contributions
User
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
find a cheap eating place for taiwanese food
30
Ontology Induction, Structure Learning, Surface Form Derivation
Semantic Decoding, Intent Prediction
Contributions
User
Knowledge Acquisition, SLU Modeling
find a cheap eating place for taiwanese food
31
Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
(Figure: from an unlabelled collection of restaurant-asking conversations, knowledge acquisition produces organized domain knowledge – slots target, food, price, seeking, and quantity, connected by dependencies PREP_FOR, NN, and AMOD.)
Knowledge Acquisition
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
32
SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
Organized domain knowledge → SLU component:
"can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation
SLU Modeling: Semantic Decoding, Intent Prediction
33
SDS Architecture – Contributions
(Figure: the ASR → SLU → DM → NLG pipeline with a domain knowledge component; Knowledge Acquisition and SLU Modeling target the current bottleneck.)
34
SDS Flowchart
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
35
SDS Flowchart – Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
36
Outline Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
SLU Model: "can I have a cheap restaurant" → target="restaurant", price="cheap"
(Flowchart: frame-semantic parsing over the unlabeled collection → ontology induction (feature matrices Fw, Fs for the feature model) + structure learning (word/slot relation matrices Rw, Rs, from lexical and semantic knowledge graphs, for the knowledge graph propagation model) → MF-SLU, SLU modeling by matrix factorization → semantic representation.)
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For example, in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → frames: capability, expensiveness, locale_by_use (slot candidates)
1st issue: differentiate domain-specific frames from generic frames for SDSs
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
(Figure: the word-observation / slot-candidate matrix built by frame-semantic parsing. Training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" activate words (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); the test utterance "show me a list of cheap restaurants" receives slot scores, e.g. 0.97 and 0.95.)
Idea: increase the weights of domain-specific slots and decrease the weights of others.
41
1st issue: how to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
(Figure: the word relation matrix and slot relation matrix multiply the word-observation / slot-candidate matrix for slot induction, over training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants", with slot candidates capability, locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring.)
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
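The effect of multiplying scores by a row-normalized relation matrix can be sketched as follows; the slots and relation weights are toy values, not learned ones:

```python
import numpy as np

# Toy illustration: a slot that was not observed in the utterance
# ("locale_by_use") receives a nonzero score because it is related to an
# observed slot ("expensiveness"), while an unrelated generic slot
# ("capability") stays at zero.
slots = ["capability", "expensiveness", "locale_by_use", "food"]
observed = np.array([0.0, 1.0, 0.0, 0.0])  # "cheap" evokes expensiveness

# relation weights, e.g. similarities between dependency-based slot
# embeddings (values are illustrative)
R = np.array([
    [1.0, 0.0, 0.0, 0.0],   # capability: no domain relations
    [0.0, 1.0, 0.7, 0.4],   # expensiveness ~ locale_by_use, food
    [0.0, 0.7, 1.0, 0.5],
    [0.0, 0.4, 0.5, 1.0],
])
R = R / R.sum(axis=1, keepdims=True)  # row-normalize

propagated = R @ observed
print(dict(zip(slots, propagated.round(3))))
```

The related domain slots gain score from their neighbors, while the isolated generic slot does not.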
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
SLU Model: "can I have a cheap restaurant" → target="restaurant", price="cheap"
(Flowchart: frame-semantic parsing → ontology induction (Fw, Fs; feature model) + structure learning (Rw, Rs; knowledge graph propagation model) → MF-SLU → semantic representation.)
43
Knowledge Graph Construction: syntactic dependency parsing on utterances.
(Figure: "can i have a cheap restaurant" parsed with dependencies ccomp, nsubj, dobj, det, and amod, evoking frames capability, expensiveness, and locale_by_use. From the parses, a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (nodes: capability, expensiveness, locale_by_use) are constructed.)
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
(Figure: dependency-based word embeddings, e.g. for "can" and "have", and dependency-based slot embeddings, e.g. for expensiveness and capability, are trained from dependency contexts in the parsed utterance "can i have a cheap restaurant".)
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014.
45
Edge Weight Measurement: compute edge weights to represent relation importance.
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
(Figure: a word graph over words w1–w7 and a slot graph over slots s1–s3, whose edge weights combine the semantic and dependency relations.)
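A semantic edge weight as similarity between slot embeddings can be sketched like this; the three-dimensional vectors are made up for illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy dependency-based slot embeddings (assumed values)
expensiveness = np.array([0.9, 0.1, 0.3])
pricerange = np.array([0.8, 0.2, 0.4])
capability = np.array([-0.2, 0.9, 0.1])

w_semantic = cosine(expensiveness, pricerange)  # related domain slots: high
w_generic = cosine(expensiveness, capability)   # unrelated generic slot: low
```

Edges between related domain slots thus receive large weights, which is what the propagation model exploits.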
46
Knowledge Graph Propagation Model
(Figure: the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) multiply the word-observation / slot-candidate training matrix for slot induction.)
Structure information is integrated to make the self-training data more reliable
47
(Figure: the word-observation / slot-candidate matrix with ontology induction (Fw, Fs) and structure learning; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; the test utterance "show me a list of cheap restaurants" contains hidden semantics that are not directly observed.)
2nd issue: unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLP'15]
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
(Figure: the word relation matrix R_w^(SD) and slot relation matrix R_s^(SD) are combined with the word-observation / slot-candidate matrix, and MF fills the missing cells with estimated probabilities.)
Idea: MF completes a partially-missing matrix under a low-rank latent-semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd issue: how to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of the hidden semantics: the |U| × (|W|+|S|) utterance-by-word/slot matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix.
(Figure: the word/slot matrix with observed 1s and estimated probabilities such as 0.97, 0.90, 0.95, and 0.85 filled in by the factorization.)
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.
50
Bayesian Personalized Ranking for MF: model implicit feedback.
- Do not treat unobserved facts as negative samples (true or false).
- Give observed facts higher scores than unobserved facts.
Objective: for each utterance x, maximize Σ ln σ( M(x, f+) − M(x, f−) ), where f+ ranges over observed facts and f− over unobserved facts.
The objective is to learn a set of well-ranked semantic slots per utterance.
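A stochastic BPR training loop for the utterance-by-word/slot matrix can be sketched as follows; the matrix, latent dimension, and hyperparameters are toy assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_utt, n_feat, d = 4, 6, 3              # utterances, words+slots, latent dim
M = np.zeros((n_utt, n_feat))           # implicit feedback: observed facts = 1
M[0, [0, 4]] = 1; M[1, [1, 4]] = 1; M[2, [2, 5]] = 1; M[3, [0, 5]] = 1

U = 0.1 * rng.standard_normal((n_utt, d))   # latent utterance factors
V = 0.1 * rng.standard_normal((n_feat, d))  # latent word/slot factors
lr, reg = 0.05, 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    x = rng.integers(n_utt)                          # sample an utterance
    pos = rng.choice(np.flatnonzero(M[x] == 1))      # observed fact f+
    neg = rng.choice(np.flatnonzero(M[x] == 0))      # unobserved fact f-
    u, vp, vn = U[x].copy(), V[pos].copy(), V[neg].copy()
    g = sigmoid(-(u @ (vp - vn)))    # gradient weight of ln sigma(f+ - f-)
    U[x] += lr * (g * (vp - vn) - reg * u)
    V[pos] += lr * (g * u - reg * vp)
    V[neg] += lr * (-g * u - reg * vn)

scores = U @ V.T   # observed facts should now outrank unobserved ones per row
```

After training, each row of `scores` ranks that utterance's observed facts above the unobserved ones on average, which is exactly the BPR criterion.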
51
(Figure: with ontology induction (Fw, Fs) and structure learning, the MF-SLU model is trained on utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food", and estimates slot probabilities, e.g. 0.97, 0.90, 0.95, and 0.85, for the test utterance "show me a list of cheap restaurants".)
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
SLU Model: "can I have a cheap restaurant" → target="restaurant", price="cheap"
(Flowchart: frame-semantic parsing → ontology induction (Fw, Fs; feature model) + structure learning (Rw, Rs; knowledge graph propagation model) → MF-SLU → semantic representation.)
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
53
Experimental Setup. Dataset: Cambridge University SLU Corpus
- Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012.
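The MAP metric itself can be sketched as follows (a standard average-precision computation over each utterance's ranked slot list; the slot names are illustrative):

```python
def average_precision(ranked, relevant):
    """AP of one ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    """Mean of per-utterance APs."""
    return sum(average_precision(r, g) for r, g in zip(rankings, gold)) / len(gold)

rankings = [["expensiveness", "capability", "locale_by_use"]]
gold = [{"expensiveness", "locale_by_use"}]
map_score = mean_average_precision(rankings, gold)  # (1/1 + 2/3) / 2
```

Slot probabilities from the model are sorted per utterance to produce each ranked list before scoring.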
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach (ASR / Transcripts):
- Baseline SLU, Support Vector Machine: 32.5 / 36.6
- Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

Approach (ASR / Transcripts):
- Baseline SLU, Support Vector Machine: 32.5 / 36.6
- Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8
- Proposed MF-SLU, Feature Model: 37.6 / 45.3
- Proposed MF-SLU, Feature Model + Knowledge Graph Propagation: 43.5 (+27.9%) / 53.4 (+37.6%)

* significantly better than the MLR baseline (p < 0.05, t-test)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach (ASR / Transcripts):
- Feature Model: 37.6 / 45.3
- Feature + Knowledge Graph Propagation, Semantic: 41.4 / 51.6
- Feature + Knowledge Graph Propagation, Dependency: 41.6 / 49.0
- Feature + Knowledge Graph Propagation, All: 43.5 (+15.7%) / 53.4 (+17.9%)

* significantly better than the MLR baseline (p < 0.05, t-test)
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification: popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLT'14c]
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction – Single-Turn Request
[Figure: reasoning with feature-enriched MF. The test utterance "i would like to contact alex" is enriched with semantics such as "communication" (0.90); IR over app descriptions ("... your email calendar contacts ..." for Outlook, "... check and send emails msgs ..." for Gmail) retrieves app candidates and self-train utterances, and the matrix of word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...) is completed by MF]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
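The matrix-completion idea behind the feature-enriched MF can be sketched in a few lines; this is a toy illustration, assuming an invented utterance-by-(word + app) binary matrix and BPR-style updates (observed cells should outrank unobserved cells in the same row), not the thesis' implementation:

```python
import numpy as np

# Toy joint matrix: 3 utterances over word features and intended apps.
# Observed cells are 1; unobserved cells are missing, not negative.
features = ["contact", "email", "photo", "Gmail", "Outlook", "Camera"]
observed = {(0, 0), (0, 3),   # "contact ..." -> Gmail
            (1, 1), (1, 4),   # "email ..."   -> Outlook
            (2, 2), (2, 5)}   # "photo ..."   -> Camera

rng = np.random.default_rng(0)
d = 4                                               # latent dimension
U = rng.normal(scale=0.1, size=(3, d))              # utterance factors
V = rng.normal(scale=0.1, size=(len(features), d))  # word/app factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

cells = sorted(observed)
for _ in range(3000):
    u, i = cells[rng.integers(len(cells))]    # an observed cell
    j = int(rng.integers(len(features)))      # a sampled column
    if (u, j) in observed:
        continue
    g = sigmoid(-(U[u] @ (V[i] - V[j])))      # BPR gradient scale
    uu = U[u].copy()
    U[u] += 0.05 * (g * (V[i] - V[j]) - 0.01 * U[u])
    V[i] += 0.05 * (g * uu - 0.01 * V[i])
    V[j] += 0.05 * (-g * uu - 0.01 * V[j])

scores = U @ V.T  # completed matrix: scores for hidden semantics
```

After training, `scores` ranks every feature per utterance, so an unobserved but semantically related app can still receive a high score for a new request.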
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge: language ambiguity, addressed with 1) user preference and 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Idea: behavioral patterns in the history can help intent prediction
[Example: "send to vivian" in the current turn is ambiguous between Email and Message (Communication); the previous turn provides the disambiguating context]
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input multi-turn interaction
Output apps the user plans to launch
[Figure: reasoning with feature-enriched MF over dialogues. Train dialogues such as "take this photo / tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website / send an email to professor" (CHROME, EMAIL) fill a matrix of lexical features (photo, check, camera, tell, send, ...), behavior history (null, camera, chrome, chrome email, ...), and intended apps; for the test dialogue "take a photo of this / send it to alice" the completed matrix predicts CAMERA then IM]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
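The feature enrichment itself is plain feature construction; a hypothetical sketch of one matrix row that concatenates lexical observations from the current utterance with the app-level behavior history of previous turns (vocabulary and app list invented for illustration):

```python
VOCAB = ["photo", "send", "check", "grades"]  # lexical features
APPS = ["CAMERA", "IM", "CHROME", "EMAIL"]    # behavior-history features

def feature_row(utterance: str, history: set) -> list:
    """One row of the feature-enriched matrix for a dialogue turn."""
    words = set(utterance.lower().split())
    lexical = [1 if w in words else 0 for w in VOCAB]
    behavioral = [1 if a in history else 0 for a in APPS]
    return lexical + behavioral

# "send it to alice" is lexically ambiguous between IM and EMAIL;
# the CAMERA in the history is the contextual cue that MF can exploit.
row = feature_row("send it to alice", history={"CAMERA"})
print(row)  # [0, 1, 0, 0, 1, 0, 0, 0]
```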
66
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    |             | 26.1            |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     |             | 55.5             |

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)
Experiments for Intent Prediction
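Both tables report Mean Average Precision over ranked app predictions; a small sketch of the metric (toy predictions, not the thesis' data):

```python
def average_precision(ranked: list, relevant: set) -> float:
    """Average the precision at each rank where a relevant app appears."""
    hits, total = 0, 0.0
    for k, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)

def mean_average_precision(predictions, gold) -> float:
    return sum(average_precision(r, g)
               for r, g in zip(predictions, gold)) / len(gold)

preds = [["Gmail", "Camera", "Skype"], ["Camera", "Chrome", "IM"]]
gold = [{"Gmail", "Skype"}, {"IM"}]
print(round(mean_average_precision(preds, gold), 3))  # 0.583
```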
67
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU  | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2) | 26.1            | 30.4 (+16.4)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2) | 55.5             | 55.4 (-0.2)
Modeling hidden semantics helps intent prediction, especially for noisy data
Experiments for Intent Prediction
68
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU  | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2) | 26.1            | 30.4 (+16.4)
Word + Embedding-Based Semantics      | 32.0    |              | 33.3            |
Word + Type-Embedding-Based Semantics | 31.5    |              | 32.9            |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2) | 55.5             | 55.4 (-0.2)
Word + Behavioral Patterns | 53.9     |             | 56.6             |
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
69
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU  | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2) | 26.1            | 30.4 (+16.4)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8)  | 33.3            | 33.3 (-0.2)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1)  | 32.9            | 34.0 (+3.4)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2) | 55.5             | 55.4 (-0.2)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3) | 56.6             | 57.7 (+1.9)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
70
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data: Back-end Databases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant – Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges & Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility and the potential of improving the generalization, maintenance efficiency, and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, in order to guide the next domains to develop
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition both affect SLU modeling
75
Towards Unsupervised Deep Learning
[Figure: a deep architecture for slot estimation. A word sequence x = w1 w2 ... wd is mapped by word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation into utterance and slot vectors lf; a knowledge graph propagation layer lp (matrix Wp) and a semantic layer y (semantic projection matrix Ws) then yield relation scores R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U) over slot candidates S1 ... Sn]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning
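A shape-level sketch of that view, using random untrained weights purely for illustration: plain MF is a single bilinear layer over the utterance-by-feature matrix, and one added nonlinearity gives a deeper model over the same matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_feat, d, h = 3, 6, 4, 8

U = rng.normal(size=(n_utt, d))    # utterance embeddings
V1 = rng.normal(size=(n_feat, d))  # feature embeddings (MF view)
W = rng.normal(size=(d, h))        # added hidden layer
V2 = rng.normal(size=(n_feat, h))  # feature embeddings (deep view)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

one_layer = sigmoid(U @ V1.T)            # MF as a one-layer network
deeper = sigmoid(np.tanh(U @ W) @ V2.T)  # MF with one extra layer

print(one_layer.shape, deeper.shape)  # (3, 6) (3, 6)
```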
76
Take Home Message
Available: big data without annotations
Challenge: how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
Language as action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
Intent Prediction - Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: a feature-enriched matrix over word observations (e.g., "contact", "message", "email") and intended apps (Gmail, Outlook, Skype). Rows come from app descriptions (retrieved by IR for app candidates, e.g., Outlook "your email calendar contacts", Gmail "check and send emails msgs"), self-train utterances, and the test utterance "i would like to contact alex"; feature enrichment adds semantic classes such as "communication". Reasoning with feature-enriched MF fills in app probabilities (e.g., 0.90, 0.85, 0.97, 0.95) for the test utterance.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
Intent Prediction - Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. Useful cues: 1) user preference, 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
[Figure: "send to vivian" is ambiguous between Email and Message (Communication); the previous turn provides app-level context.]
Idea: behavioral patterns in the history can help intent prediction
Intent Prediction - Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: a feature-enriched matrix over user utterances and intended apps. Train dialogues pair lexical features ("photo", "check", "camera", "tell", "send") and behavior history (e.g., null → camera, chrome → email) with intended apps: "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL. Reasoning with feature-enriched MF scores apps (e.g., 0.85, 0.70, 0.95, 0.80, 0.55) for the test dialogue "take a photo of this / send it to alice".]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
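To make the idea concrete, one row of such a feature-enriched matrix can be sketched as below. The function and feature names are hypothetical illustrations, not the paper's implementation: lexical observations from the current utterance are combined with app-level behavioral context from earlier turns.

```python
def build_feature_row(utterance_words, app_history, vocab):
    """One row of a feature-enriched matrix: lexical observations
    plus app-level behavioral context from earlier turns."""
    row = {}
    for w in utterance_words:
        if w in vocab:           # keep only in-vocabulary lexical features
            row["word:" + w] = 1
    for app in app_history:      # apps launched in previous turns
        row["history:" + app] = 1
    return row

# "take a photo of this / send it to alice": second turn, CAMERA in history
row = build_feature_row(["send", "it", "to", "alice"], ["camera"],
                        vocab={"send", "photo", "alice", "take"})
# -> {"word:send": 1, "word:alice": 1, "history:camera": 1}
```

MF then reasons over these joint rows, so the "camera" context can raise the score of IM apps for the ambiguous "send it to alice".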
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
| Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU) |
| Word Observation | 25.1 | | 26.1 | |
LM-Based IR Model (unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)
| Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU) |
| Word Observation | 52.1 | | 55.5 | |
Multinomial Logistic Regression (supervised)
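MAP, the metric reported in these experiments, averages per-utterance average precision over the ranked app list. A minimal sketch, assuming binary relevance (the app names are toy examples):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: average of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    """MAP over all utterances."""
    aps = [average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)

# gold apps for "i would like to contact alex": Skype and Hangout (toy example)
ap = average_precision(["Skype", "Gmail", "Hangout"], {"Skype", "Hangout"})
# -> (1/1 + 2/3) / 2
```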
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
| Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU) |
| Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%) |

Multi-Turn Interaction: Mean Average Precision (MAP)
| Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU) |
| Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%) |

Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
| Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU) |
| Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%) |
| Word + Embedding-Based Semantics | 32.0 | | 33.3 | |
| Word + Type-Embedding-Based Semantics | 31.5 | | 32.9 | |

Multi-Turn Interaction: Mean Average Precision (MAP)
| Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU) |
| Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%) |
| Word + Behavioral Patterns | 53.9 | | 56.6 | |

Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
| Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU) |
| Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%) |
| Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (-0.2%) |
| Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%) |

Multi-Turn Interaction: Mean Average Precision (MAP)
| Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU) |
| Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%) |
| Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%) |

Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction); Intent Prediction highlighted]
The Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant - Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions: the work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors
Future Work
Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, in order to guide the next domains to develop.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition.
Towards Unsupervised Deep Learning
[Figure: a convolutional architecture for SLU - word sequence x = w1, w2, ..., wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf → semantic layer y (semantic projection matrix Ws) → knowledge graph propagation layer lp (propagation matrix Wp) → semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn with slot vectors lf]
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
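The "one more layer" idea can be sketched in a few lines. The dimensions and weight matrices below (W1, W_h, W_o) are toy stand-ins for the convolution and propagation matrices in the figure, purely conceptual rather than the actual model:

```python
import numpy as np

rng = np.random.default_rng(1)
d_word, d_hidden, n_slots = 5, 4, 3

x = rng.random(d_word)                 # pooled utterance features (toy)

# MF view: a single linear layer mapping features to slot scores
W1 = rng.normal(size=(n_slots, d_word))
mf_scores = W1 @ x

# "Deeper" view: insert a nonlinear hidden layer between features and slots
W_h = rng.normal(size=(d_hidden, d_word))
W_o = rng.normal(size=(n_slots, d_hidden))
deep_scores = W_o @ np.tanh(W_h @ x)
```

Both produce one score per slot candidate; the deeper variant can capture interactions a single linear factorization cannot.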
Take Home Message
Big data is available w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc., or teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
Q & A - Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
SDS Process - Dialogue Management (DM)
User
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Din Tai Fung, Boiling Point
Predicted intent: navigation
Intelligent Agent
Intent Prediction
find a cheap eating place for taiwanese food
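The DM lookup above can be sketched as a tiny slot-to-query renderer. The function name and output format are illustrative assumptions; a real system would issue parameterized queries against its backend rather than build strings:

```python
def slots_to_query(table, slots):
    """Render decoded semantic slots as a SQL-like lookup string
    (illustrative only; real systems should parameterize queries)."""
    conds = " AND ".join(f'{table}.{k}="{v}"' for k, v in sorted(slots.items()))
    return f"SELECT {table} WHERE {conds}"

q = slots_to_query("restaurant", {"price": "cheap", "food": "taiwanese"})
# -> SELECT restaurant WHERE restaurant.food="taiwanese" AND restaurant.price="cheap"
```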
SDS Process - Natural Language Generation (NLG)
User
Intelligent Agent
Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).
find a cheap eating place for taiwanese food
Required Knowledge
[Figure: required domain-specific knowledge - semantic slots seeking, target, food, price with dependencies (PREP_FOR, AMOD, NN)]
SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"
Predicted intent: navigation
User
Required Domain-Specific Information
find a cheap eating place for taiwanese food
Challenges for SDS: an SDS in a new domain requires
1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations.
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.
seeking="find", target="eating place", price="cheap", food="asian food"
find a cheap eating place for asian food
fully unsupervised
Prior Focus
Contributions
[Figure: User utterance "find a cheap eating place for taiwanese food" (natural language) → Ontology Induction (semantic slot), Structure Learning (inter-slot relation), Surface Form Derivation → Semantic Decoding and Intent Prediction → SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"; predicted intent: navigation]
Contributions
User
Ontology Induction
Structure Learning
Surface Form Derivation
Semantic Decoding
Intent Prediction
find a cheap eating place for taiwanese food
Contributions
[Figure: the User utterance "find a cheap eating place for taiwanese food" is processed by Knowledge Acquisition (Ontology Induction, Structure Learning, Surface Form Derivation) and SLU Modeling (Semantic Decoding, Intent Prediction)]
Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
[Figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge: slots (seeking, target, food, price, quantity) connected by PREP_FOR, NN, and AMOD dependencies]
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
[Figure: the organized domain knowledge feeds the SLU component, which maps "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation]
SLU Modeling = Semantic Decoding + Intent Prediction
SDS Architecture - Contributions
[Figure: the SDS pipeline (ASR, SLU, DM, NLG, Domain); Knowledge Acquisition and SLU Modeling address the current bottleneck]
SDS Flowchart
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
SDS Flowchart - Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction); Semantic Decoding highlighted]
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant - Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: "can I have a cheap restaurant" → frame-semantic parsing over an unlabeled collection → ontology induction (feature models Fw, Fs) and structure learning over lexical/semantic knowledge graphs (word/slot relation models Rw, Rs; knowledge graph propagation) → MF-SLU (SLU modeling by matrix factorization) → semantic representation target="restaurant", price="cheap"]
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
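The "low fat milk" example can be written down as plain data. The structure below is an illustrative sketch, not SEMAFOR's actual output format:

```python
# Illustrative frame-semantic annotation for "low fat milk"
annotation = {
    "text": "low fat milk",
    "frames": [
        {"frame": "food",                       # evoked by the target word
         "target": "milk",
         "elements": {"descriptor": "low fat"}},  # frame element filler
    ],
}

def evoked_frames(ann):
    """List the frames evoked in one annotated phrase."""
    return [f["frame"] for f in ann["frames"]]
```

Downstream, each evoked frame becomes a slot candidate for ontology induction.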
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use (slot candidates).
1st Issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: word observation / slot candidate matrix. Train utterance 1 "i would like a cheap restaurant" (expensiveness, locale_by_use) and train utterance 2 "find a restaurant with chinese food" (locale_by_use, food) are filled by frame-semantic parsing; the test utterance "show me a list of cheap restaurants" receives slot probabilities (e.g., 0.97, 0.95).]
Idea: increase the weights of domain-specific slots and decrease the weights of the others.
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: word relation matrix × (word observation / slot candidate matrix) × slot relation matrix for slot induction over the train utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants". Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots (locale_by_use, food, expensiveness) obtain higher scores after matrix multiplication than generic ones (capability, seeking, desiring, relational_quantity).]
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: the same pipeline as above - frame-semantic parsing, ontology induction (Fw, Fs), knowledge graph propagation (Rw, Rs), MF-SLU.]
Knowledge Graph Construction: syntactic dependency parsing on utterances
[Figure: "can i have a cheap restaurant" parsed with ccomp, nsubj, dobj, det, amod dependencies and frames capability, expensiveness, locale_by_use; from the parse, a word-based lexical knowledge graph (can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (capability, locale_by_use, expensiveness) are built.]
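The two-graph construction can be sketched with a plain adjacency map. The dependency edges follow the parse shown above; the slot-graph edge set is illustrative:

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency map from (head, dependent, relation) triples."""
    graph = defaultdict(set)
    for head, dep, rel in edges:
        graph[head].add((dep, rel))
        graph[dep].add((head, rel))
    return graph

# dependency edges for "can i have a cheap restaurant"
edges = [("have", "can", "ccomp"), ("have", "i", "nsubj"),
         ("have", "restaurant", "dobj"), ("restaurant", "a", "det"),
         ("restaurant", "cheap", "amod")]
lexical_kg = build_graph(edges)

# slot-based graph over the evoked frames (edge set is illustrative)
semantic_kg = build_graph([("capability", "locale_by_use", "dep"),
                           ("locale_by_use", "expensiveness", "dep")])
```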
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
[Figure: dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant".]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
Edge Weight Measurement: compute edge weights to represent relation importance
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
[Figure: a word graph (w1-w7) and a slot graph (s1-s3) whose edge weights combine semantic and dependency relations]
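The semantic edge weights above can be sketched as cosine similarity between embeddings. The vectors below are toy values, purely illustrative:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_edge_weight(emb, a, b):
    """Semantic relation weight = similarity between (slot or word) embeddings."""
    return cosine(emb[a], emb[b])

# toy dependency-based slot embeddings (values are illustrative)
emb = {"expensiveness": np.array([0.9, 0.1]),
       "locale_by_use": np.array([0.8, 0.2]),
       "capability":    np.array([0.1, 0.9])}
w = semantic_edge_weight(emb, "expensiveness", "locale_by_use")
```

Here the two domain-specific slots get a heavier edge than the pair involving the generic slot, which is exactly what the propagation step exploits.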
Knowledge Graph Propagation Model
[Figure: word relation matrix R^w_SD × (word observation / slot candidate matrix) × slot relation matrix R^s_SD for slot induction over train and test utterances]
Structure information is integrated to make the self-training data more reliable.
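The slot-side propagation can be sketched as one matrix product. The numbers are toy values, and the actual model also multiplies a word relation matrix on the left:

```python
import numpy as np

# Toy observation matrix F: 2 utterances x 3 slot candidates
# (columns: locale_by_use, expensiveness, capability)
F = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

# Row-normalized slot relation matrix Rs: the two domain-specific slots
# are connected; the generic slot is isolated (values illustrative).
Rs = np.array([[0.6, 0.4, 0.0],
               [0.4, 0.6, 0.0],
               [0.0, 0.0, 1.0]])

propagated = F @ Rs  # scores flow between related slots
# utterance 2 never observed "expensiveness", but it now inherits score
# from the related "locale_by_use"
```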
Semantic Decoding [ACL-IJCNLP'15]
2nd Issue: unobserved hidden semantics may benefit understanding
[Figure: ontology induction (Fw, Fs) and structure learning feed the SLU matrix; the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" have observed words/slots, while the test utterance "show me a list of cheap restaurants" has hidden semantics beyond its observed entries (e.g., 0.97, 0.90, 0.95, 0.85).]
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the combined matrix (word observations, slot candidates, relation matrices R^w_SD, R^s_SD) is completed by MF, filling missing cells with probabilities (e.g., 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05) for slot induction.]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probabilities of the hidden semantics.
[Figure: the |U| × (|W| + |S|) observation matrix is factorized as |U| × (|W| + |S|) ≈ (|U| × d) · (d × (|W| + |S|)) with latent dimension d.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
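The low-rank completion idea can be sketched with a tiny SGD factorization. The matrix, dimensions, and squared-error objective below are toy assumptions; the actual model optimizes a BPR ranking objective rather than squared error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance x (word/slot) matrix; np.nan marks an unobserved cell.
M = np.array([[1.0, 1.0, 0.0],
              [1.0, np.nan, 0.0],
              [0.0, 0.0, 1.0]])
mask = ~np.isnan(M)
d = 2  # latent dimension

U = rng.normal(scale=0.1, size=(M.shape[0], d))  # utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], d))  # word/slot factors

# SGD on squared error over observed entries only: M ~= U @ V.T
for _ in range(2000):
    for i, j in zip(*np.where(mask)):
        err = M[i, j] - U[i] @ V[j]
        U[i], V[j] = U[i] + 0.05 * err * V[j], V[j] + 0.05 * err * U[i]

missing = U[1] @ V[1]  # estimated score for the hidden cell
```

Because utterance 2 shares its observed pattern with utterance 1, the low-rank factors place them close together, so the missing cell is scored high rather than treated as zero.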
Bayesian Personalized Ranking for MF: model implicit feedback
Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved ones.
Objective: for each utterance u_x, maximize ln σ(f+ - f-) over pairs of an observed fact f+ and an unobserved fact f-.
The objective is to learn a set of well-ranked semantic slots per utterance.
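One stochastic step on this pairwise objective can be sketched as follows. The learning rate and vectors are toy values, and regularization is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_update(u, v_pos, v_neg, lr=0.05):
    """One SGD step on ln sigma(f+ - f-): push the observed fact's score
    f+ = u . v_pos above the unobserved fact's score f- = u . v_neg."""
    x = u @ v_pos - u @ v_neg
    g = sigmoid(-x)                      # d/dx ln sigma(x) = sigma(-x)
    u_new = u + lr * g * (v_pos - v_neg)
    v_pos_new = v_pos + lr * g * u
    v_neg_new = v_neg - lr * g * u
    return u_new, v_pos_new, v_neg_new

u = np.array([0.1, 0.2])
v_pos, v_neg = np.array([0.2, 0.1]), np.array([0.3, 0.3])
before = u @ v_pos - u @ v_neg
u, v_pos, v_neg = bpr_update(u, v_pos, v_neg)
after = u @ v_pos - u @ v_neg            # margin f+ - f- has increased
```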
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction (Fw, Fs) and structure learning feed the MF model; for the test utterance "show me a list of cheap restaurants", MF-SLU estimates probabilities (e.g., 0.97, 0.90, 0.95, 0.85) for its slot candidates.]
MF-SLU can estimate probabilities for slot candidates given test utterances.
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: the same pipeline as above - frame-semantic parsing, ontology induction (Fw, Fs), knowledge graph propagation (Rw, Rs), MF-SLU.]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
Experimental Setup
Dataset: Cambridge University SLU Corpus - restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
| Approach | ASR | Transcripts |
| Baseline SLU: Support Vector Machine | 32.5 | 36.6 |
| Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8 |
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
| Approach | ASR | Transcripts |
| Baseline SLU: Support Vector Machine | 32.5 | 36.6 |
| Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8 |
| Proposed MF-SLU: Feature Model | 37.6 | 45.3 |
| Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%) |
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
| Approach | ASR | Transcripts |
| Feature Model | 37.6 | 45.3 |
| Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6 |
| Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0 |
| Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%) |
In the integrated structure information, both semantic and dependency relations are useful for understanding.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the automatically learned ontology (seeking, desiring, target, locale_by_use, food, expensiveness, relational_quantity linked by PREP_FOR, NN, AMOD, DOBJ) compared with the reference ontology annotated with the most frequent syntactic dependencies (task, type, food, pricerange, area linked by DOBJ, AMOD, PREP_IN).]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge: language ambiguity, addressed by 1) user preference and 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
[Figure: the request "send to vivian" is ambiguous among communication apps (Email, Message); the app launched in the previous turn helps disambiguate.]
Idea: behavioral patterns in the interaction history can help intent prediction.
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input multi-turn interaction
Output apps the user plans to launch
[Figure: feature-enriched matrix factorization over multi-turn dialogues. Rows: training dialogues, e.g., "take this photo" / "tell vivian this is me in the lab" with intended apps CAMERA, IM, and "check my grades on website" / "send an email to professor" with CHROME, EMAIL, plus the test dialogue "take a photo of this" / "send it to alice". Columns: lexical features (photo, check, camera, tell, send), behavior-history features (null, camera, chrome, email), and intended apps. Observed cells are 1; MF fills the rest with estimated probabilities (e.g., 0.85, 0.70, 0.95, 0.80, 0.55).]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
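A small sketch of how such behavioral-pattern features could be constructed (hypothetical feature names; the actual ICMI'15 feature set is richer): each turn combines its lexical cues with the app launched in the previous turn.

```python
# Turn a multi-turn app-launch dialogue into feature dictionaries that
# combine lexical cues with app-level context (the previous turn's app).

def featurize(dialogue):
    """dialogue: list of (utterance, launched_app) turns."""
    rows = []
    prev_app = "null"  # no behavior history before the first turn
    for utterance, app in dialogue:
        feats = {f"word={w}": 1 for w in utterance.lower().split()}
        feats[f"prev_app={prev_app}"] = 1  # behavioral-pattern feature
        rows.append((feats, app))
        prev_app = app
    return rows

rows = featurize([
    ("take a photo of this", "CAMERA"),
    ("send it to alice", "IM"),
])
# The second turn carries prev_app=CAMERA, which disambiguates
# "send it" toward a messaging app rather than, say, email.
print(rows[1][0]["prev_app=CAMERA"])
```

These sparse feature rows are exactly the kind of input the feature-enriched matrix in the figure is built from.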
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix   | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    |             | 26.1            |
(LM: LM-based IR model, unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     |             | 55.5             |
(MLR: multinomial logistic regression, supervised)
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix   | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix   | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    |               | 33.3            |
Word + Type-Embedding-Based Semantics | 31.5    |               | 32.9            |

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
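The evaluation metric used throughout these tables can be sketched directly (a standard MAP computation; the app names below are made-up examples, not from the corpus):

```python
# Mean average precision (MAP) over ranked candidates per utterance.

def average_precision(ranked, relevant):
    """AP for one utterance: ranked list of candidates, set of gold items."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at each relevant rank
    return score / max(len(relevant), 1)

def mean_average_precision(results):
    """results: list of (ranked_candidates, gold_set) pairs."""
    return sum(average_precision(r, g) for r, g in results) / len(results)

map_score = mean_average_precision([
    (["Gmail", "Camera", "Skype"], {"Gmail", "Skype"}),  # AP = (1/1 + 2/3)/2
    (["Maps", "Chrome"], {"Chrome"}),                    # AP = 1/2
])
print(round(map_score, 3))  # 0.667
```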
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]
Contributions of Intent Prediction: the feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data: back-end data bases, services, and client signals
- Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
- User Experience: "call taxi"
Outline Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
Conclusions: The work shows the feasibility and the potential for improving generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances;
- better high-level intent prediction about follow-up behaviors.
Future Work: Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, in order to guide which domains to develop next.
Improve the proposed approach by handling the uncertainty
[Diagram: two sources of uncertainty feed SLU modeling: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.]
Towards Unsupervised Deep Learning
[Figure: a deep SLU architecture: a word sequence x (w1, w2, ..., wd) is mapped to word vectors lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation produces an utterance vector lf; a semantic projection matrix Ws yields the semantic layer y, and a knowledge graph propagation layer lp (matrix Wp) produces relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn.]
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
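A purely illustrative numpy sketch of that remark (random weights and assumed shapes, not the proposed model): MF scores slots through a single linear map, and inserting a nonlinearity between the factor matrices yields a deeper network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_slots, d = 20, 5, 8
x = rng.random(n_words)            # bag-of-words utterance vector

# MF view: utterance latent vector = one linear map of the features;
# slot scores = inner products with slot latent vectors (one "layer").
W_in = rng.standard_normal((d, n_words))
slot_vecs = rng.standard_normal((n_slots, d))
scores_mf = slot_vecs @ (W_in @ x)

# Deeper view: add a nonlinear hidden layer between the two maps.
W_h = rng.standard_normal((d, d))
h = np.tanh(W_h @ np.tanh(W_in @ x))
scores_deep = slot_vecs @ h

print(scores_mf.shape, scores_deep.shape)  # (5,) (5,)
```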
Take Home Message
- Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
- Language understanding for AI: from language to action, e.g., understanding voice commands to control music, lights, etc., or teaching the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
Q & A. Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
SDS Process ndash Natural Language Generation (NLG)
User
Intelligent Agent
Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there (navigation).
find a cheap eating place for taiwanese food
Required Knowledge
[Ontology fragment: seeking PREP_FOR target; price AMOD target; food NN target]
SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
Predicted intent: navigation
User
Required Domain-Specific Information
find a cheap eating place for taiwanese food
Challenges for SDS: An SDS in a new domain requires
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations,
3) an SLU component for mapping utterances into semantic representations.
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.
seeking="find", target="eating place", price="cheap", food="asian food"
find a cheap eating place for asian food
fully unsupervised
Prior Focus
Contributions
[Figure: for the user utterance "find a cheap eating place for taiwanese food", Ontology Induction supplies semantic slots, Structure Learning supplies inter-slot relations (seeking PREP_FOR target; price AMOD, food NN), and Surface Form Derivation supplies natural-language variants; Semantic Decoding maps the utterance to SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }, and Intent Prediction outputs the predicted intent: navigation.]
Contributions (User: "find a cheap eating place for taiwanese food")
[Figure: pipeline components: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction.]
Contributions (User: "find a cheap eating place for taiwanese food")
[Figure: Knowledge Acquisition covers Ontology Induction, Structure Learning, and Surface Form Derivation; SLU Modeling covers Semantic Decoding and Intent Prediction.]
Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
[Figure: from an unlabelled collection of restaurant-asking conversations, Knowledge Acquisition (Ontology Induction, Structure Learning, Surface Form Derivation) produces organized domain knowledge: concepts such as seeking, target, food, price, and quantity, connected by PREP_FOR, NN, and AMOD relations.]
SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
[Figure: the organized domain knowledge feeds an SLU component that maps "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation. SLU Modeling comprises Semantic Decoding and Intent Prediction.]
SDS Architecture – Contributions
[Diagram: the SDS pipeline (ASR, SLU, DM, NLG) with its domain knowledge; Knowledge Acquisition and SLU Modeling target the current bottleneck.]
SDS Flowchart
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]
SDS Flowchart – Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction), with Semantic Decoding highlighted.]
Outline Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances. Output: semantic concepts included in each individual utterance.
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: frame-semantic parsing over an unlabeled collection drives Ontology Induction (feature model: Fw, Fs) and Structure Learning over lexical and semantic knowledge graphs (word and slot relation models: Rw, Rs, for knowledge graph propagation); the MF-SLU model (SLU modeling by matrix factorization) maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., for "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
Example: for "can i have a cheap restaurant", frame-semantic parsing evokes Frame: capability, Frame: expensiveness, and Frame: locale_by_use; each frame is a slot candidate, but only some (e.g., expensiveness, locale_by_use) are good domain-specific slots.
1st issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: an utterance-by-feature matrix for slot induction: word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food) from frame-semantic parsing; training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") have observed entries, and the test utterance ("show me a list of cheap restaurants") receives estimated slot probabilities (e.g., 0.97, 0.95).]
Idea: increase the weights of domain-specific slots and decrease the weights of others.
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: the word relation matrix and slot relation matrix multiply the word-observation / slot-candidate matrix for slot induction; in the slot knowledge graph, generic frames such as capability and desiring are weakly connected, while domain frames such as locale_by_use, food, expensiveness, seeking, and relational_quantity are densely connected.]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
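A toy sketch of the propagation intuition (illustrative adjacency and a damped, PageRank-style update, not the paper's exact formulation): multiplying scores by a row-normalized relation matrix boosts slots that are well connected in the domain knowledge graph.

```python
import numpy as np

slots = ["capability", "expensiveness", "locale_by_use", "food"]

# Adjacency of a slot-based semantic knowledge graph (illustrative):
# domain slots are densely inter-connected; the generic slot is not.
A = np.array([
    [0, 1, 0, 0],   # capability
    [1, 0, 1, 1],   # expensiveness
    [0, 1, 0, 1],   # locale_by_use
    [0, 1, 1, 0],   # food
], dtype=float)
R = A / A.sum(axis=1, keepdims=True)   # row-normalized relation matrix

scores = np.ones(4)                    # initial (uniform) slot scores
for _ in range(10):                    # propagate scores to neighbors
    scores = 0.9 * (R.T @ scores) + 0.1   # damped update

print(scores[1] > scores[0])  # domain slots outrank the generic one
```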
Semantic Decoding [ACL-IJCNLP'15] (overview figure repeated)
Input: user utterances. Output: semantic concepts included in each individual utterance; the MF-SLU model performs SLU modeling by matrix factorization over the induced ontology and learned structure.
Knowledge Graph Construction: syntactic dependency parsing on utterances.
Example: "can i have a cheap restaurant" with dependency relations ccomp, nsubj, dobj, det, and amod, and evoked frames capability, expensiveness, and locale_by_use.
[Figure: a word-based lexical knowledge graph over {can, i, have, a, cheap, restaurant} and a slot-based semantic knowledge graph over {capability, expensiveness, locale_by_use}.]
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: each word is represented by its dependency contexts in parsed utterances such as "can i have a cheap restaurant" (e.g., embeddings for can, have).
Dependency-based slot embeddings: analogously, each slot (e.g., expensiveness, capability) is represented by its dependency contexts in the frame-annotated parse.
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
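A sketch of how dependency-based contexts differ from linear window contexts (hand-coded arcs standing in for a parser's output): each (head, relation, modifier) arc yields one context pair for the head and one inverse-relation pair for the modifier, which would then feed a word2vec-style trainer.

```python
# Hand-coded dependency arcs for "can i have a cheap restaurant"
# as (head, relation, modifier); a real system gets these from a parser.
arcs = [
    ("have", "nsubj", "i"),
    ("can", "ccomp", "have"),
    ("restaurant", "amod", "cheap"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
]

def dependency_contexts(arcs):
    """(word, context) training pairs in the Levy-Goldberg style."""
    pairs = []
    for head, rel, mod in arcs:
        pairs.append((head, f"{mod}/{rel}"))     # head sees its modifier
        pairs.append((mod, f"{head}/{rel}-1"))   # modifier sees head (inverse)
    return pairs

pairs = dependency_contexts(arcs)
print(("cheap", "restaurant/amod-1") in pairs)  # True
```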
Edge Weight Measurement: compute edge weights to represent relation importance.
- Slot-to-slot semantic relation: similarity between slot embeddings.
- Slot-to-slot dependency relation: dependency score between slot embeddings.
- Word-to-word semantic relation: similarity between word embeddings.
- Word-to-word dependency relation: dependency score between word embeddings.
[Figure: a word knowledge graph (w1-w7) and a slot knowledge graph (s1-s3) with weighted edges.]
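The similarity-based weights above can be sketched as cosine similarity between embeddings (toy, untrained vectors used purely for illustration):

```python
import numpy as np

# Toy embeddings: edge weight between two knowledge-graph nodes as the
# cosine similarity of their (here, hand-made) embedding vectors.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.3]),
    "food":          np.array([0.8, 0.2, 0.4]),
    "capability":    np.array([0.1, 0.9, 0.0]),
}

def edge_weight(a, b):
    va, vb = emb[a], emb[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Domain slots end up with a heavier edge than domain-generic pairs.
print(edge_weight("expensiveness", "food") > edge_weight("expensiveness", "capability"))
```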
Knowledge Graph Propagation Model
[Figure: the word relation matrix Rw and slot relation matrix Rs multiply the word-observation / slot-candidate matrix (train and test utterances) for slot induction.]
Structure information is integrated to make the self-training data more reliable
Semantic Decoding [ACL-IJCNLP'15]
[Figure: Ontology Induction and Structure Learning produce the feature matrices Fw and Fs for SLU; training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") are observed, while the test utterance "show me a list of cheap restaurants" contains hidden semantics.]
2nd issue: unobserved (hidden) semantics may benefit understanding.
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: reasoning with matrix factorization over the word relation and slot relation matrices; unobserved cells of the word-observation / slot-candidate matrix are filled with estimated probabilities (e.g., 0.97, 0.90, 0.95, 0.85).]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probabilities of the hidden semantics.
[Figure: the |U| x (|W|+|S|) observation matrix is approximated by the product of a |U| x d matrix and a d x (|W|+|S|) matrix.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
Bayesian Personalized Ranking for MF: model implicit feedback.
- Do not treat unobserved facts as negative samples (true or false).
- Give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, maximize the sum of ln σ(f+ - f-) over pairs of an observed fact f+ and an unobserved fact f-.
The objective is to learn a set of well-ranked semantic slots per utterance.
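An illustrative BPR-style stochastic update (toy dimensions, learning rate, and data; not the paper's training code): sample an observed fact f+ and an unobserved fact f- for the same utterance, then ascend the gradient of ln σ(f+ - f-).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_utts, n_feats = 4, 3, 6
U = rng.normal(scale=0.1, size=(n_utts, d))   # utterance latent vectors
V = rng.normal(scale=0.1, size=(n_feats, d))  # word/slot latent vectors

observed = {0: [1, 2], 1: [0, 2], 2: [3, 4]}  # observed feature ids per utterance

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.05
for _ in range(2000):
    u = rng.integers(n_utts)
    i = rng.choice(observed[u])            # observed fact f+
    j = rng.integers(n_feats)
    if j in observed[u]:
        continue                           # need an unobserved fact f-
    x = U[u] @ V[i] - U[u] @ V[j]          # f+ - f-
    g = sigmoid(-x)                        # gradient scale of ln sigmoid(x)
    U[u] += lr * g * (V[i] - V[j])
    V[i] += lr * g * U[u]
    V[j] -= lr * g * U[u]

# Observed facts should now be ranked above the never-observed feature 5.
obs = np.mean([U[u] @ V[i] for u, ids in observed.items() for i in ids])
un = np.mean([U[u] @ V[5] for u in range(n_utts)])
print(obs > un)
```

Unlike a squared-error loss over 0/1 labels, this pairwise objective never asserts that an unobserved fact is false, matching the bullet points above.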
Matrix Factorization SLU (MF-SLU)
[Figure: Ontology Induction and Structure Learning feed the feature matrices Fw and Fs; for the test utterance "show me a list of cheap restaurants", the model estimates slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85).]
MF-SLU can estimate probabilities for slot candidates given test utterances
Semantic Decoding [ACL-IJCNLP'15] (overview figure repeated)
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Experimental Setup
Dataset: Cambridge University SLU Corpus (restaurant recommendation; WER = 37%): 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                                     | ASR            | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5           | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0           | 38.8
Proposed MF-SLU: Feature Model                               | 37.6           | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5* (+27.9%) | 53.4* (+37.6%)
*: significantly better than the MLR baseline (p < 0.05, t-test)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                            | ASR            | Transcripts
Feature Model                                       | 37.6           | 45.3
Feature + Knowledge Graph Propagation (Semantic)    | 41.4           | 51.6
Feature + Knowledge Graph Propagation (Dependency)  | 41.6           | 49.0
Feature + Knowledge Graph Propagation (All)         | 43.5* (+15.7%) | 53.4* (+17.9%)
*: significantly better than the MLR baseline (p < 0.05, t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discovered inter-slot relations connect important slot pairs.
[Figure: the learned ontology (seeking, locale_by_use, food, expensiveness, relational_quantity, desiring, linked by PREP_FOR, NN, AMOD, and DOBJ edges) beside the reference ontology (task, type, area, food, pricerange, linked by the most frequent syntactic dependencies: DOBJ, AMOD, PREP_IN).]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
Contributions of Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" -> SLU Model -> price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" -> SLU Model -> restaurant="legume", time="tonight", intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Slide figure: feature-enriched matrix for multi-turn dialogues. Training dialogues such as "take this photo" / "tell vivian this is me in the lab" (CAMERA, then IM) and "check my grades on website" / "send an email to professor" (CHROME, then EMAIL) contribute lexical features (photo, check, camera, tell, send, …) and behavior-history features (null, camera, chrome, …); for the test dialogue "take a photo of this" / "send it to alice", MF predicts the intended apps (CAMERA, IM) with probabilities such as 0.85 and 0.70.]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised):
  Word Observation: ASR 25.1 (LM); Transcripts 26.1 (LM)
Multi-Turn Interaction, Mean Average Precision (MAP), multinomial logistic regression (supervised):
  Word Observation: ASR 52.1 (MLR); Transcripts 55.5 (MLR)
67
Experiments for Intent Prediction
Single-Turn Request, MAP:
  Word Observation: ASR 25.1 (LM) → 29.2 (MF-SLU, +16.2%); Transcripts 26.1 → 30.4 (+16.4%)
Multi-Turn Interaction, MAP:
  Word Observation: ASR 52.1 (MLR) → 52.7 (MF-SLU, +1.2%); Transcripts 55.5 → 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request, MAP:
  Word Observation: ASR 25.1 (LM) → 29.2 (MF-SLU, +16.2%); Transcripts 26.1 → 30.4 (+16.4%)
  Word + Embedding-Based Semantics: ASR 32.0 (LM); Transcripts 33.3 (LM)
  Word + Type-Embedding-Based Semantics: ASR 31.5 (LM); Transcripts 32.9 (LM)
Multi-Turn Interaction, MAP:
  Word Observation: ASR 52.1 (MLR) → 52.7 (MF-SLU, +1.2%); Transcripts 55.5 → 55.4 (-0.2%)
  Word + Behavioral Patterns: ASR 53.9 (MLR); Transcripts 56.6 (MLR)
Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request, MAP:
  Word Observation: ASR 25.1 (LM) → 29.2 (MF-SLU, +16.2%); Transcripts 26.1 → 30.4 (+16.4%)
  Word + Embedding-Based Semantics: ASR 32.0 → 34.2 (+6.8%); Transcripts 33.3 → 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics: ASR 31.5 → 32.2 (+2.1%); Transcripts 32.9 → 34.0 (+3.4%)
Multi-Turn Interaction, MAP:
  Word Observation: ASR 52.1 (MLR) → 52.7 (MF-SLU, +1.2%); Transcripts 55.5 → 55.4 (-0.2%)
  Word + Behavioral Patterns: ASR 53.9 → 55.7 (+3.3%); Transcripts 56.6 → 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
70
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction: the feature-enriched MF-SLU for intent prediction is able to
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions: This work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: identify domains that are not covered by current systems but that users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.
75
Towards Unsupervised Deep Learning
[Slide figure: a convolutional architecture for slot scoring. A word sequence x (w1, w2, …, wd) is mapped to word vectors lw, passed through a convolution matrix Wc to a convolutional layer lc, pooled into an utterance vector lf, projected by a semantic projection matrix Ws to a semantic layer y, then through a knowledge graph propagation matrix Wp to a propagation layer lp, yielding relevance scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) for the slot candidates S1, …, Sn.]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
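As a minimal numpy sketch of the architecture on this slide (all shapes and weight values here are invented for illustration; the propagation matrix is left as an identity placeholder rather than a learned knowledge-graph matrix):

```python
import numpy as np

# Forward pass: word vectors -> convolution -> pooling -> semantic
# projection -> knowledge graph propagation -> slot posteriors.
rng = np.random.default_rng(0)
d_w, n_words, d_c, n_slots = 8, 5, 6, 4
x  = rng.normal(size=(n_words, d_w))    # word vectors l_w for w1..wd
Wc = rng.normal(size=(d_w, d_c))        # convolution matrix Wc (window = 1 here)
Ws = rng.normal(size=(d_c, n_slots))    # semantic projection matrix Ws
Wp = np.eye(n_slots)                    # propagation matrix Wp (identity placeholder)

lc = np.maximum(x @ Wc, 0)              # convolutional layer l_c with ReLU
lf = lc.max(axis=0)                     # pooling operation -> utterance vector l_f
y  = lf @ Ws                            # semantic layer y
lp = y @ Wp                             # knowledge graph propagation layer l_p
p  = 1 / (1 + np.exp(-lp))              # P(S_i | U) for each slot candidate
```

With one linear layer and no propagation this reduces to the MF scoring function, which is the sense in which MF is a one-layer special case.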
76
Take-Home Messages
Big data is available without annotations.
Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or teaching the system to let friends in via face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
26
Required Knowledge
[Slide figure: for the user utterance "find a cheap eating place for taiwanese food", the required domain-specific information includes the domain ontology (slots seeking, target, food, price linked by AMOD, NN, and PREP_FOR relations), the semantic form SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }, and the predicted intent: navigation.]
27
Challenges for SDS: an SDS in a new domain requires
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations, and
3) an SLU component for mapping utterances into semantic representations (the prior focus).
This manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, fully unsupervised, in order to handle open-domain requests.
Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"
28
Contributions
[Slide figure: for the user utterance "find a cheap eating place for taiwanese food", the contributions are placed on the pipeline: Ontology Induction (semantic slot), Structure Learning (inter-slot relation), Surface Form Derivation (natural language), Semantic Decoding, and Intent Prediction, producing SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" } and the predicted intent: navigation.]
29
Contributions
[Slide figure: the pipeline for the user utterance "find a cheap eating place for taiwanese food": Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, and Intent Prediction.]
30
Contributions
[Slide figure: the same pipeline grouped into Knowledge Acquisition (Ontology Induction, Structure Learning, Surface Form Derivation) and SLU Modeling (Semantic Decoding, Intent Prediction), for the utterance "find a cheap eating place for taiwanese food".]
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
[Slide figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge: a graph of slots (seeking, target, food, price, quantity) connected by PREP_FOR, NN, and AMOD relations.]
Knowledge Acquisition covers Ontology Induction, Structure Learning, and Surface Form Derivation.
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
[Slide figure: the organized domain knowledge feeds an SLU component that maps "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation.]
SLU Modeling covers Semantic Decoding and Intent Prediction.
33
SDS Architecture – Contributions
[Slide figure: the SDS pipeline (ASR, SLU with the domain ontology, DM, NLG); the SLU component is the current bottleneck, addressed by the proposed Knowledge Acquisition and SLU Modeling.]
34
SDS Flowchart
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
35
SDS Flowchart – Semantic Decoding
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Slide figure: an unlabeled collection is processed by frame-semantic parsing; Ontology Induction yields the feature matrices Fw (words) and Fs (slots) for the feature model, and Structure Learning over a lexical and a semantic knowledge graph yields the relation matrices Rw and Rs for the knowledge graph propagation model; together they drive MF-SLU (SLU modeling by matrix factorization), which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory, in which words/phrases can be represented as frames; e.g. in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
Example: for "can i have a cheap restaurant", the parser returns the frames (slot candidates) capability, expensiveness, and locale_by_use; only the latter two are good domain-specific slots.
1st issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Slide figure: a matrix over word observations and slot candidates built by frame-semantic parsing: training utterances "i would like a cheap restaurant" (cheap, restaurant; expensiveness, locale_by_use) and "find a restaurant with chinese food" (restaurant, food; locale_by_use, food), plus the test utterance "show me a list of cheap restaurants" with estimated slot probabilities such as 0.97 and 0.95.]
Idea: increase the weights of domain-specific slots and decrease the weights of the others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Slide figure: the word relation matrix Rw and the slot relation matrix Rs multiply the word-observation/slot-candidate matrix, so the training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants") are connected through a knowledge graph over slots such as locale_by_use, food, expensiveness, seeking, desiring, relational_quantity, and capability.]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
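The propagation idea can be sketched as random-walk smoothing over a row-normalized relation matrix (a minimal sketch: the toy graph, edge weights, and the mixing parameter alpha below are invented for illustration and are not the paper's exact formulation):

```python
import numpy as np

def propagate(scores, R, alpha=0.5, steps=10):
    """Smooth initial slot scores over a symmetric relation-weight matrix R."""
    P = R / R.sum(axis=1, keepdims=True)          # row-normalize edge weights
    s = scores.copy()
    for _ in range(steps):
        s = (1 - alpha) * scores + alpha * (P @ s)  # keep evidence, add neighbors
    return s

# Toy graph: slots 0 and 1 are strongly connected (domain-specific pair),
# slot 2 is weakly connected (generic); slot 1 starts with no direct evidence.
R = np.array([[0.1, 1.0, 0.1],
              [1.0, 0.1, 0.1],
              [0.1, 0.1, 0.1]])
scores = np.array([1.0, 0.0, 0.0])
out = propagate(scores, R)
```

After propagation, the unobserved-but-connected slot 1 outscores the generic slot 2, which is the intended effect of the relation matrices.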
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Slide figure: the same pipeline: frame-semantic parsing, Ontology Induction (Fw, Fs) for the feature model, Structure Learning (Rw, Rs) for the knowledge graph propagation model, and MF-SLU producing the semantic representation target="restaurant", price="cheap" for "can I have a cheap restaurant".]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
43
Knowledge Graph Construction: syntactic dependency parsing on utterances.
[Slide figure: the parse of "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj), with evoked slots capability, expensiveness, and locale_by_use, induces a word-based lexical knowledge graph over {can, i, have, a, cheap, restaurant} and a slot-based semantic knowledge graph over {capability, expensiveness, locale_by_use}.]
44
Edge Weight Measurement: slot/word embeddings training (Levy and Goldberg, 2014)
[Slide figure: dependency-based word embeddings (e.g. for "can" and "have") and dependency-based slot embeddings (e.g. for expensiveness and capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant".]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement: compute edge weights to represent relation importance.
Slot-to-slot semantic relation: similarity between slot embeddings.
Slot-to-slot dependency relation: dependency score between slot embeddings.
Word-to-word semantic relation: similarity between word embeddings.
Word-to-word dependency relation: dependency score between word embeddings.
[Slide figure: a word graph (w1 to w7) and a slot graph (s1 to s3), each combining both relation types.]
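For the semantic relations, a standard choice is cosine similarity between the two nodes' embeddings; a minimal sketch (the embedding values below are made up for illustration, not trained vectors):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy slot embeddings: two domain slots close together, one generic slot apart.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0]),
    "food":          np.array([0.8, 0.3, 0.1]),
    "capability":    np.array([0.0, 0.1, 0.9]),   # generic frame, far away
}
w_semantic = cosine(emb["expensiveness"], emb["food"])  # edge weight for (s1, s2)
```

The resulting weights populate the slot-to-slot (and analogously word-to-word) relation matrices used by the propagation model.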
46
Knowledge Graph Propagation Model
[Slide figure: the word relation matrix Rw(SD) and the slot relation matrix Rs(SD) multiply the word-observation/slot-candidate training and test matrices for slot induction.]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
[Slide figure: Ontology Induction feeds Fw and Fs into the SLU matrix together with Structure Learning; for the test utterance "show me a list of cheap restaurants", the estimated probabilities (e.g. 0.97, 0.90, 0.95, 0.85) cover hidden semantics beyond the directly observed slots.]
2nd issue: unobserved hidden semantics may benefit understanding.
48
Reasoning with Matrix Factorization
[Slide figure: the feature model plus the knowledge graph propagation model (Rw(SD), Rs(SD)); in the word-observation/slot-candidate matrix, observed cells are 1s and MF fills the remaining cells with probabilities such as 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05.]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of hidden semantics:
  M (|U| × (|W|+|S|)) ≈ U (|U| × d) · V (d × (|W|+|S|))
[Slide figure: the word-observation/slot-candidate matrix with observed 1s and MF-estimated probabilities (0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05).]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
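To illustrate the low-rank completion idea only (not the paper's exact model, which trains the factors with BPR), here is a rank-d alternating projection in numpy: observed cells are held fixed and missing cells are repeatedly re-estimated from the best rank-d fit.

```python
import numpy as np

def low_rank_complete(M, d, iters=200):
    """Fill missing cells (NaN) of M with a rank-d approximation."""
    observed = ~np.isnan(M)
    X = np.where(observed, M, 0.0)           # initialize missing cells at 0
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = (U[:, :d] * s[:d]) @ Vt[:d]  # best rank-d approximation
        X = np.where(observed, M, X_low)     # keep observed values fixed
    return X

# Toy example: two utterances share the word "cheap"; the second utterance's
# "expensiveness" slot is unobserved (NaN) but should be inferred.
M = np.array([[1.0, 1.0, 1.0],       # u1: "cheap", "restaurant", expensiveness
              [1.0, 0.0, np.nan]])   # u2: "cheap" observed, slot unknown
completed = low_rank_complete(M, d=1)
```

The missing cell ends up with a positive score because the low-rank assumption ties it to the utterance that shares the same word, which is exactly the "hidden semantics" effect described above.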
50
Bayesian Personalized Ranking for MF: model implicit feedback.
Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u, rank observed slots f+ above unobserved slots f-, i.e. maximize Σ ln σ(f+ - f-) over pairs (f+, f-).
The objective is to learn a set of well-ranked semantic slots per utterance.
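A hedged sketch of one BPR-style stochastic update in the spirit of Rendle et al. (2009); the latent dimensionality, learning rate, and in-place update order are simplifications, and regularization is omitted:

```python
import numpy as np

def bpr_step(U, V, u, s_pos, s_neg, lr=0.1):
    """Ascend the gradient of ln sigmoid(f(u,s+) - f(u,s-)) in place."""
    x = U[u] @ (V[s_pos] - V[s_neg])          # score margin f+ - f-
    g = 1.0 / (1.0 + np.exp(x))               # sigmoid(-x) = d/dx ln sigmoid(x)
    U[u]     += lr * g * (V[s_pos] - V[s_neg])
    V[s_pos] += lr * g * U[u]
    V[s_neg] -= lr * g * U[u]

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(1, 4))   # latent utterance vectors
V = rng.normal(scale=0.1, size=(2, 4))   # latent slot vectors: 0 observed, 1 unobserved
for _ in range(100):
    bpr_step(U, V, u=0, s_pos=0, s_neg=1)
```

After training, the observed slot scores higher than the unobserved one for that utterance, which is the ranking property the objective demands.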
51
Matrix Factorization SLU (MF-SLU)
[Slide figure: Ontology Induction (Fw, Fs) and Structure Learning feed the factorized matrix; for the test utterance "show me a list of cheap restaurants", MF-SLU estimates slot probabilities such as 0.97, 0.90, 0.95, 0.85.]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Slide figure: the full pipeline again: frame-semantic parsing, Ontology Induction (Fw, Fs), Structure Learning (Rw, Rs), and MF-SLU producing target="restaurant", price="cheap" for "can I have a cheap restaurant".]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup. Dataset: Cambridge University SLU corpus
Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
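The MAP metric used throughout the experiments is the standard mean of per-utterance average precision over the ranked slot list; a minimal sketch (the slot names in the usage comment are illustrative):

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: precision at each relevant rank, averaged."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    """Mean of per-utterance AP over the whole corpus."""
    aps = [average_precision(r, g) for r, g in zip(rankings, gold)]
    return sum(aps) / len(aps)

# e.g. ranked slots for one utterance vs. its reference slots:
# average_precision(["expensiveness", "food", "locale_by_use"],
#                   {"expensiveness", "locale_by_use"})
```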
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU corpus; metric: MAP of all estimated slot probabilities for all utterances.
Baseline SLU (ASR / Transcripts):
  Support Vector Machine: 32.5 / 36.6
  Multinomial Logistic Regression: 34.0 / 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU corpus; metric: MAP of all estimated slot probabilities for all utterances.
Approach (ASR / Transcripts):
  Baseline SLU: Support Vector Machine 32.5 / 36.6; Multinomial Logistic Regression 34.0 / 38.8
  Proposed MF-SLU: Feature Model 37.6 / 45.3; Feature Model + Knowledge Graph Propagation 43.5* (+27.9%) / 53.4* (+37.6%)
* significantly better than the MLR baseline (p < 0.05, t-test)
The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU corpus; metric: MAP of all estimated slot probabilities for all utterances.
Approach (ASR / Transcripts):
  Feature Model: 37.6 / 45.3
  Feature + Knowledge Graph Propagation: semantic relations only 41.4 / 51.6; dependency relations only 41.6 / 49.0; all 43.5* (+15.7%) / 53.4* (+17.9%)
* significantly better than the MLR baseline (p < 0.05, t-test)
In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discovered inter-slot relations connect important slot pairs.
[Slide figure: the induced ontology (locale_by_use, food, expensiveness, seeking, desiring, relational_quantity linked by PREP_FOR, NN, AMOD, and DOBJ) next to the reference ontology with the most frequent syntactic dependencies (type, food, pricerange, task, area linked by DOBJ, AMOD, and PREP_IN).]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents:
"can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" → restaurant="legume", time="tonight", intent=reservation
60
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart – Intent Prediction
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: a single-turn request
Output: apps that are able to support the required functionality
[Slide figure: feature-enriched matrix for "i would like to contact alex": word observations (contact, message, email, …) and intended apps (Gmail, Outlook, Skype) retrieved by IR over app descriptions (Outlook: "… your email calendar contacts …"; Gmail: "… check and send emails, msgs …"), with feature enrichment adding semantics such as "communication"; MF fills unobserved cells with probabilities such as 0.90, 0.85, 0.97, 0.95.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
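The "IR for app candidates" step on this slide can be sketched as a query-likelihood language-model scorer over app descriptions (a minimal sketch with add-one smoothing; the descriptions and the query below are invented/paraphrased for illustration, not the actual corpus):

```python
import math
from collections import Counter

def lm_score(query, doc_tokens, vocab_size):
    """log P(query | doc LM) with add-one (Laplace) smoothing."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in query)

apps = {
    "Outlook": "your email calendar contacts".split(),
    "Gmail":   "check and send emails msgs".split(),
    "Camera":  "take photos and record video".split(),
}
query = "record a video".split()
vocab = set(query).union(*(set(d) for d in apps.values()))
ranked = sorted(apps, key=lambda a: lm_score(query, apps[a], len(vocab)),
                reverse=True)
```

The top-ranked apps then become the column candidates of the feature-enriched matrix.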
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Experiments for Intent Prediction (3)

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | --            | 33.3            | --
Word + Type-Embedding-Based Semantics | 31.5    | --            | 32.9            | --

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | --           | 56.6             | --

Semantic enrichment provides rich cues that improve performance.
69
Experiments for Intent Prediction (4)

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
(Flowchart: Ontology Induction and Structure Learning form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling)
The feature-enriched MF-SLU for intent prediction is able to
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
This work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: identify domains that are not covered by current systems but that users are interested in, in order to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
(Architecture: the word sequence x = w1 w2 ... wd is mapped to word vectors l_w; a convolution matrix Wc produces the convolutional layer l_c; a pooling operation yields the utterance vector l_f alongside slot vectors l_f for slot candidates S1 ... Sn; the knowledge graph propagation matrix Wp gives the propagation layer l_p with semantic relations R(U, S1) ... R(U, Sn); the semantic projection matrix Ws produces the semantic layer y with posterior probabilities P(S1 | U) ... P(Sn | U))
Treating MF as a one-layer neural network, we can add more layers to the model, moving towards unsupervised deep learning.
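To make the stacking idea concrete, here is a toy forward pass through the layers named above; all dimensions and the randomly initialized weights are assumptions for illustration, not the model's actual configuration:

```python
import numpy as np

# Toy forward pass: convolution over word vectors -> max pooling ->
# knowledge graph propagation layer -> semantic layer of slot scores.
rng = np.random.default_rng(0)
d, n_words, n_slots, conv_dim = 8, 6, 5, 16

x  = rng.normal(size=(n_words, d))          # word vectors l_w for the sequence
Wc = rng.normal(size=(3 * d, conv_dim))     # convolution matrix over trigrams

windows = [np.concatenate([x[i - 1], x[i], x[i + 1]]) for i in range(1, n_words - 1)]
lc = np.tanh(np.stack(windows) @ Wc)        # convolutional layer l_c
lf = lc.max(axis=0)                         # pooling -> utterance vector l_f

Wp = rng.normal(size=(conv_dim, conv_dim))  # knowledge graph propagation matrix
Ws = rng.normal(size=(conv_dim, n_slots))   # semantic projection matrix
lp = np.tanh(lf @ Wp)                       # propagation layer l_p
y  = 1.0 / (1.0 + np.exp(-(lp @ Ws)))       # semantic layer: P(slot | utterance)
```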
76
Take Home Message
Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or being taught to let friends in via face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A: Thanks for your attention!
References:
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
27
Challenges for SDS
An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations
Manual work results in high cost, long duration, and poor scalability of system development.
The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests. (Prior work focused on the hand-crafted pipeline above; this work aims to be fully unsupervised.)
Example: "find a cheap eating place for asian food" maps to seeking="find", target="eating place", price="cheap", food="asian food".
28
Contributions
User: "find a cheap eating place for taiwanese food" (natural language)
Ontology Induction (semantic slot): seeking, target, food, price
Structure Learning (inter-slot relation): seeking -PREP_FOR-> target; food, price -AMOD/NN-> target
Surface Form Derivation
Semantic Decoding: SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }
Intent Prediction: predicted intent = navigation
29
Contributions
User: "find a cheap eating place for taiwanese food"
Modules: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction
30
Contributions
User: "find a cheap eating place for taiwanese food"
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
Unlabelled collection (restaurant-asking conversations) -> Knowledge Acquisition -> organized domain knowledge
(Induced ontology: seeking, target, food, price, quantity, connected by relations such as PREP_FOR, NN, AMOD)
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
"can i have a cheap restaurant" + organized domain knowledge -> SLU component -> price="cheap", target="restaurant", intent=navigation
SLU Modeling = Semantic Decoding + Intent Prediction
33
SDS Architecture – Contributions
(Pipeline: ASR, SLU, DM, NLG over the domain knowledge; SLU and the domain knowledge are the current bottleneck, targeted by Knowledge Acquisition and SLU Modeling.)
34
SDS Flowchart
Ontology Induction, Structure Learning -> Knowledge Acquisition
Semantic Decoding, Intent Prediction -> SLU Modeling
35
SDS Flowchart – Semantic Decoding
Ontology Induction, Structure Learning -> Knowledge Acquisition
Semantic Decoding, Intent Prediction -> SLU Modeling
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances. Output: semantic concepts included in each individual utterance.
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model: "can I have a cheap restaurant" is decoded as target="restaurant", price="cheap".
(Pipeline: an unlabeled collection goes through frame-semantic parsing; Ontology Induction over a semantic KG yields the feature model (Fw, Fs); a word relation model Rw over a lexical KG and a slot relation model Rs over a semantic KG, obtained by Structure Learning, form the knowledge graph propagation model; MF-SLU then performs SLU modeling by matrix factorization to produce the semantic representation.)
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For example, in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
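For illustration, a frame-semantic parse like the FrameNet example above might be represented as follows; the dictionary schema here is an assumption for exposition, not SEMAFOR's actual output format:

```python
# Illustration only: the field names below are assumptions, not SEMAFOR's
# real output schema. A parse of "low fat milk" in the spirit of the
# FrameNet example above, plus slot-candidate extraction from it.

parse = {
    "tokens": ["low", "fat", "milk"],
    "frames": [
        {"name": "food", "target": "milk",
         "elements": [{"name": "descriptor", "span": "low fat"}]},
    ],
}

def slot_candidates(parse):
    """Treat every evoked frame as a candidate semantic slot."""
    return [f["name"] for f in parse["frames"]]

print(slot_candidates(parse))  # ['food']
```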
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use (slot candidates).
1st Issue: differentiate domain-specific frames (good slot candidates, e.g., expensiveness, locale_by_use) from generic frames (e.g., capability) for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
Word Observation / Slot Candidate matrix (from frame-semantic parsing):
Train: Utterance 1 "i would like a cheap restaurant" has words {cheap, restaurant} and slots {expensiveness, locale_by_use}
Train: Utterance 2 "find a restaurant with chinese food" has words {restaurant, food} and slots {locale_by_use, food}
Test: "show me a list of cheap restaurants" receives estimated slot scores, e.g., expensiveness .97, locale_by_use .95
Idea: increase the weights of domain-specific slots and decrease the weights of the others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
(Word Relation Model: word relation matrix; Slot Relation Model: slot relation matrix; both applied to the word observation / slot candidate matrix built from Utterance 1 "i would like a cheap restaurant", Utterance 2 "find a restaurant with chinese food", and the test utterance "show me a list of cheap restaurants".)
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
(Slot graph: domain-specific nodes locale_by_use, food, expensiveness versus generic nodes capability, seeking, desiring, relational_quantity.)
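The propagation idea can be sketched numerically; the slot graph and weights below are toy values, not learned ones:

```python
import numpy as np

# Toy sketch: multiplying initial scores by the slot relation matrix lets
# densely connected, domain-specific slots reinforce each other, while a
# weakly connected generic slot ("capability") receives little mass.
slots = ["locale_by_use", "expensiveness", "food", "capability"]
R = np.array([                 # symmetric relation weights (invented)
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
scores = np.ones(len(slots))   # uniform scores before propagation
propagated = R @ scores        # higher for well-connected slots
```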
42
Semantic Decoding [ACL-IJCNLP'15] (2)
Input: user utterances. Output: semantic concepts included in each individual utterance.
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model: "can I have a cheap restaurant" is decoded as target="restaurant", price="cheap".
(Same pipeline, now focusing on the knowledge graph propagation model: the word relation model Rw over the lexical KG and the slot relation model Rs over the semantic KG, learned by Structure Learning.)
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances, e.g., "can i have a cheap restaurant" (frames: capability, expensiveness, locale_by_use) with dependencies ccomp, nsubj, dobj, det, amod.
Word-based lexical knowledge graph: word nodes {can, i, have, a, cheap, restaurant} connected by dependency edges.
Slot-based semantic knowledge graph: slot nodes {capability, locale_by_use, expensiveness} connected by the corresponding relations.
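The construction above can be sketched as follows, with the dependency parse of the example utterance hard-coded (in practice a syntactic parser would supply it):

```python
# Typed dependencies become edges of the word-based lexical knowledge graph.
# The parse of "can i have a cheap restaurant" is hard-coded for illustration.

dependencies = [           # (head, relation, dependent)
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "amod", "cheap"),
    ("restaurant", "det", "a"),
]

graph = {}                 # undirected adjacency: word -> {neighbor: relation}
for head, rel, dep in dependencies:
    graph.setdefault(head, {})[dep] = rel
    graph.setdefault(dep, {})[head] = rel

print(sorted(graph["restaurant"]))  # ['a', 'cheap', 'have']
```

The slot-based semantic knowledge graph is built the same way, after mapping each word span to the frame it evokes.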
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: for "can i have a cheap restaurant" with dependencies ccomp, nsubj, dobj, det, amod, a word's contexts are its dependency neighbors, e.g., can = ..., have = ...
Dependency-based slot embeddings: the same idea applied over the slot sequence (capability, expensiveness, locale_by_use), e.g., expensiveness = ..., capability = ...
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
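The context extraction behind dependency-based embeddings (Levy and Goldberg, 2014) can be sketched as below; each word's training contexts are its dependency neighbors labelled with the relation, with an inverse marker for the head direction:

```python
# Sketch of dependency-context extraction: the resulting (word, context)
# pairs would feed a word2vec-style training procedure.

def dependency_contexts(dependencies):
    pairs = []             # (word, context) training pairs
    for head, rel, dep in dependencies:
        pairs.append((head, f"{dep}/{rel}"))
        pairs.append((dep, f"{head}/{rel}-1"))   # inverse direction for the head
    return pairs

pairs = dependency_contexts([("have", "dobj", "restaurant"),
                             ("restaurant", "amod", "cheap")])
print(pairs[0])  # ('have', 'restaurant/dobj')
```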
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
(Graph illustration: word nodes w1-w7 and slot nodes s1-s3 with combined semantic + dependency edge weights)
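A minimal sketch of the similarity-based edge weights above, using cosine similarity over toy embedding vectors (the vectors are invented for illustration):

```python
import numpy as np

# The semantic edge weight between two nodes is the cosine similarity of
# their (word or slot) embeddings; toy 3-dimensional vectors stand in for
# the trained dependency-based embeddings.

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

expensiveness = np.array([0.9, 0.1, 0.3])
food          = np.array([0.8, 0.2, 0.4])
capability    = np.array([-0.1, 0.9, -0.2])
```

With these values, the domain-related pair (expensiveness, food) gets a heavier edge than (expensiveness, capability).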
46
Knowledge Graph Propagation Model
(The word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) multiply the word observation / slot candidate training matrix for slot induction.)
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15] (3)
(Ontology Induction supplies Fw and Fs, and Structure Learning the relation structure, for the word observation / slot candidate matrix over Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", estimated scores such as .97/.90 and .95/.85 reveal hidden semantics.)
2nd Issue: unobserved semantics may benefit understanding.
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
(The word relation matrix R_w^(SD) and slot relation matrix R_s^(SD) are combined with the word observation / slot candidate matrix for slot induction; MF fills the unobserved cells with graded scores, e.g., .97, .90, .95, .85, .93, .92, .98, .05, .05.)
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots respectively; the product of the two matrices fills in the probabilities of the hidden semantics. Written with factors F and G:
M(|U| x (|W|+|S|)) ≈ F(|U| x d) x G(d x (|W|+|S|))
where |U| is the number of utterances, |W| the number of words, |S| the number of slots, and d the latent dimension.
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
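A minimal numeric sketch of this factorization (latent dimension, learning rate, and iteration count are illustrative). For simplicity it fits squared error on every cell, treating unobserved cells as zeros; the BPR objective on the next slide is designed to avoid exactly that assumption:

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix with observed 1s; the low-rank
# reconstruction F @ G fills all cells with graded scores.
M = np.array([[1.0, 1.0, 1.0, 0.0],    # utt 1: cheap, restaurant, expensiveness
              [0.0, 1.0, 0.0, 1.0]])   # utt 2: restaurant, food
rng = np.random.default_rng(0)
d = 2                                  # latent dimension
F = rng.normal(scale=0.1, size=(M.shape[0], d))   # |U| x d
G = rng.normal(scale=0.1, size=(d, M.shape[1]))   # d x (|W|+|S|)

for _ in range(2000):                  # joint gradient descent on ||M - FG||^2
    err = M - F @ G
    F, G = F + 0.05 * err @ G.T, G + 0.05 * F.T @ err

reconstruction = F @ G                 # approximates M; cells are graded scores
```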
50
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, rank every observed slot f+ above every unobserved slot f-, i.e., maximize the sum over pairs (f+, f-) of ln σ(θ(u_x, f+) - θ(u_x, f-)).
The objective is to learn a set of well-ranked semantic slots per utterance.
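One BPR-style update can be sketched as follows (dimensions, learning rate, and initialization are illustrative): for an utterance x, an observed slot f+, and a sampled unobserved slot f-, gradient ascent on ln σ(θ(x, f+) - θ(x, f-)) widens the score gap:

```python
import numpy as np

# Sketch of BPR updates on one (utterance, f+, f-) triple; scores are dot
# products of latent vectors, as in the matrix factorization above.
rng = np.random.default_rng(1)
d = 4
u  = rng.normal(scale=0.1, size=d)       # latent vector of utterance x
vp = rng.normal(scale=0.1, size=d)       # latent vector of observed slot f+
vn = rng.normal(scale=0.1, size=d)       # latent vector of unobserved slot f-

def gap(u, vp, vn):
    return float(u @ vp - u @ vn)        # theta(x, f+) - theta(x, f-)

before = gap(u, vp, vn)
for _ in range(100):
    g = 1.0 / (1.0 + np.exp(gap(u, vp, vn)))   # sigma(-gap): gradient of ln sigma(gap)
    u  += 0.1 * g * (vp - vn)                  # ascend the ranking objective
    vp += 0.1 * g * u
    vn -= 0.1 * g * u
after = gap(u, vp, vn)
```

Each step strictly increases the gap, so observed slots end up ranked above unobserved ones.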
51
Matrix Factorization SLU (MF-SLU)
(Ontology Induction supplies Fw and Fs, and Structure Learning the relation structure, over the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", MF-SLU fills in slot scores such as .97/.90 and .95/.85.)
MF-SLU can estimate probabilities of slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15] (4)
Input: user utterances. Output: semantic concepts included in each individual utterance.
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model: "can I have a cheap restaurant" is decoded as target="restaurant", price="cheap".
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
53
Experimental Setup
Dataset: Cambridge University SLU corpus, restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
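The evaluation metric can be sketched directly; `ranked` orders slots by estimated probability and `gold` is the reference slot set for an utterance (toy data):

```python
# Mean Average Precision over per-utterance slot rankings.

def average_precision(ranked, gold):
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in gold:
            hits += 1
            score += hits / i          # precision at each relevant rank
    return score / max(len(gold), 1)

def mean_average_precision(examples):
    return sum(average_precision(r, g) for r, g in examples) / len(examples)

examples = [
    (["expensiveness", "locale_by_use", "capability"], {"expensiveness", "locale_by_use"}),
    (["capability", "food"], {"food"}),
]
print(mean_average_precision(examples))  # 0.75
```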
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
Dataset: Cambridge University SLU corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                            | ASR            | Transcripts
Baseline SLU: Support Vector Machine                | 32.5           | 36.6
Baseline SLU: Multinomial Logistic Regression       | 34.0           | 38.8
MF-SLU: Feature Model                               | 37.6           | 45.3
MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%)* | 53.4 (+37.6%)*

* significantly better than the MLR baseline (p < 0.05, t-test)
The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                           | ASR            | Transcripts
Feature Model                                      | 37.6           | 45.3
Feature + Knowledge Graph Propagation (Semantic)   | 41.4           | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6           | 49.0
Feature + Knowledge Graph Propagation (All)        | 43.5 (+15.7%)* | 53.4 (+17.9%)*

* significantly better than the MLR baseline (p < 0.05, t-test)
In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discovered inter-slot relations connect important slot pairs. Compared with the reference ontology (annotated with the most frequent syntactic dependencies), the induced ontology links locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via relations such as PREP_FOR, NN, AMOD, and DOBJ, while the reference links type, food, pricerange, task, and area via DOBJ, AMOD, and PREP_IN.
The automatically learned domain ontology aligns well with the reference one; the data-driven ontology is more objective, while the expert-annotated one is more subjective.
57
58
Contributions of Semantic Decoding
(Flowchart: Ontology Induction and Structure Learning form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling)
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" is decoded as price="cheap", target="restaurant", intent=navigation.
SLU Model: "i plan to dine in legume tonight" is decoded as restaurant="legume", time="tonight", intent=reservation.
60
SDS Flowchart – Intent Prediction
Ontology Induction, Structure Learning -> Knowledge Acquisition
Semantic Decoding, Intent Prediction -> SLU Modeling
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app. Output: the apps supporting the required functionality.
Intent identification covers popular domains in Google Play, e.g., "please dial a phone call to alex" is served by Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: a single-turn request. Output: apps that are able to support the required functionality.
Test utterance: "i would like to contact alex"; word observations such as {contact} are enriched with semantics (e.g., communication ≈ .90) and matched against intended apps {Gmail, Outlook, Skype, ...}.
Training data is self-trained from app descriptions retrieved by IR for app candidates, e.g., Outlook: "... your email calendar contacts ..."; Gmail: "... check and send emails, msgs ...".
Reasoning with feature-enriched MF fills in graded scores (e.g., .90, .85, .97, .95) for unobserved features.
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
28
Contributions
User: "find a cheap eating place for taiwanese food"
[Figure: processing flow for the example utterance: Ontology Induction (semantic slot: target, food, price), Structure Learning (inter-slot relations: AMOD, NN, PREP_FOR; seeking), and Surface Form Derivation (natural language) feed Semantic Decoding, which outputs SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }, and Intent Prediction, which outputs the predicted intent: navigation]
29
Contributions
User: "find a cheap eating place for taiwanese food"
[Figure: the same flow: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction]
30
Contributions
User: "find a cheap eating place for taiwanese food"
[Figure: the five modules grouped into two problems: Knowledge Acquisition (Ontology Induction, Structure Learning, Surface Form Derivation) and SLU Modeling (Semantic Decoding, Intent Prediction)]
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
[Figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge: concepts such as target, food, price, seeking, quantity linked by relations like AMOD, NN, PREP_FOR]
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
[Figure: the SLU component uses the organized domain knowledge to map "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation]
SLU Modeling = Semantic Decoding + Intent Prediction
33
SDS Architecture – Contributions
[Figure: the SDS pipeline (ASR, SLU, DM, NLG) with its domain ontology; Knowledge Acquisition and SLU Modeling target the SLU component, the current bottleneck]
34
SDS Flowchart
[Figure: Ontology Induction and Structure Learning form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling]
35
SDS Flowchart – Semantic Decoding
[Figure: the same flowchart with the Semantic Decoding component highlighted]
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: MF-SLU framework: frame-semantic parsing over an unlabeled collection feeds Ontology Induction (feature model: Fw, Fs) and Structure Learning (knowledge graph propagation model: word relation model Rw from a lexical KG, slot relation model Rs from a semantic KG); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g., "can I have a cheap restaurant" maps to target="restaurant", price="cheap"]
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames, e.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" evokes Frame: capability, Frame: expensiveness, Frame: locale_by_use; each frame is a slot candidate, and the domain-specific frames (expensiveness, locale_by_use) are the good candidates.
1st Issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: utterance-by-feature matrix built by frame-semantic parsing: train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", test utterance "show me a list of cheap restaurants"; columns are word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food), with binary entries for training and estimated probabilities (e.g., .97, .95) at test time]
Idea: increase weights of domain-specific slots and decrease weights of others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: the word-observation/slot-candidate matrix (train: "i would like a cheap restaurant", "find a restaurant with chinese food"; test: "show me a list of cheap restaurants") is multiplied by a word relation matrix and a slot relation matrix derived from the knowledge graphs, whose slot nodes include capability, locale_by_use, food, expensiveness, seeking, desiring, relational_quantity]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
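The propagation step above can be sketched numerically. This is a toy illustration with made-up relation weights, not the model's learned values: multiplying observation scores by a relation matrix lets strongly connected (domain-specific) slots reinforce each other.

```python
# Toy knowledge-graph score propagation: domain-specific slots that are
# strongly related end up with higher scores after one propagation step.

FEATURES = ["expensiveness", "locale_by_use", "capability"]

# Illustrative relation weights between slots (e.g., embedding similarity).
# The two domain-specific slots are strongly connected to each other,
# while the generic slot "capability" is only weakly connected.
R = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

def propagate(scores, relation):
    """One propagation step: each feature accumulates its neighbors' scores."""
    n = len(scores)
    return [sum(scores[j] * relation[j][i] for j in range(n)) for i in range(n)]

# Initial observation: all three slot candidates fire once in an utterance.
scores = [1.0, 1.0, 1.0]
propagated = propagate(scores, R)
print(dict(zip(FEATURES, propagated)))  # domain-specific slots now outscore the generic one
```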
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: the MF-SLU framework again, highlighting Structure Learning: the word relation model Rw (from the lexical KG) and the slot relation model Rs (from the semantic KG) form the knowledge graph propagation model]
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances, e.g., "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj) with frames capability, expensiveness, locale_by_use.
Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by word-to-word relations.
Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness) connected by slot-to-slot relations.
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: each word (e.g., "can", "have") is embedded using its syntactic dependency contexts from the parsed utterances.
Dependency-based slot embeddings: each slot (e.g., "expensiveness", "capability") is likewise embedded using the dependency-parsed utterance, e.g., "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj).
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
[Figure: a word graph (w1 through w7) and a slot graph (s1 through s3) whose edges carry the combined weights]
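The semantic edge weights above can be sketched with cosine similarity between embeddings. The 3-dimensional vectors below are hypothetical; real dependency-based embeddings are learned from a parsed corpus as in Levy and Goldberg (2014).

```python
# Toy edge-weight computation between slot embeddings: cosine similarity
# serves as the semantic-relation weight on a knowledge-graph edge.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical slot embeddings (illustrative values only).
expensiveness = [0.9, 0.2, 0.1]
locale_by_use = [0.8, 0.3, 0.2]
capability    = [0.1, 0.1, 0.9]

# The two domain-specific slots get a heavier edge between them than either
# has with the generic slot.
w_domain  = cosine(expensiveness, locale_by_use)
w_generic = cosine(expensiveness, capability)
print(w_domain > w_generic)  # True
```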
46
Knowledge Graph Propagation Model
[Figure: the word-observation/slot-candidate matrix (train and test utterances) is multiplied by the weighted word relation matrix Rw^(SD) and slot relation matrix Rs^(SD) for slot induction]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
[Figure: with ontology induction (Fw, Fs) and structure learning, the matrix contains train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; slot probabilities (e.g., .97, .90, .95, .85) are estimated at test time, but some semantics remain unobserved (hidden semantics)]
2nd Issue: unobserved semantics may benefit understanding.
48
Feature Model + Knowledge Graph Propagation Model
Reasoning with Matrix Factorization
[Figure: the same feature matrix with the relation matrices Rw^(SD) and Rs^(SD); matrix factorization fills in the missing cells with estimated probabilities for slot induction]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which models hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots respectively.
The product of the two matrices fills in the probability of hidden semantics: the |U| × (|W|+|S|) feature matrix is approximated as the product of a |U| × d matrix and a d × (|W|+|S|) matrix.
[Figure: the word-observation/slot-candidate matrix over the train and test utterances, with observed cells (1) and inferred probabilities, factorized into the two low-rank matrices]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
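The matrix-completion idea can be sketched with a tiny low-rank factorization. This is illustrative only: the data is made up, and the sketch minimizes squared error on observed cells with gradient descent, whereas the talk's MF-SLU is trained with the ranking objective on the next slide.

```python
# Toy matrix completion by low-rank factorization: the low-rank structure
# lets the model infer a value for the unobserved cell.
import random

random.seed(0)

# Rows: utterances; columns: word observations + slot candidates.
# None marks an unobserved cell whose value we want to infer.
M = [
    [1.0, 1.0, None],   # utterance whose slot column is unobserved
    [1.0, 1.0, 1.0],    # similar utterance with the slot observed
]
n_rows, n_cols, d = len(M), len(M[0]), 2

P = [[random.uniform(0.0, 0.1) for _ in range(d)] for _ in range(n_rows)]
Q = [[random.uniform(0.0, 0.1) for _ in range(d)] for _ in range(n_cols)]

def predict(i, j):
    """Dot product of the latent row and column factors."""
    return sum(P[i][k] * Q[j][k] for k in range(d))

for _ in range(2000):  # SGD over observed cells only
    for i in range(n_rows):
        for j in range(n_cols):
            if M[i][j] is None:
                continue
            err = M[i][j] - predict(i, j)
            for k in range(d):
                P[i][k] += 0.05 * err * Q[j][k]
                Q[j][k] += 0.05 * err * P[i][k]

# The missing slot probability is filled in by the shared latent structure.
print(round(predict(0, 2), 2))
```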
50
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: maximize Σ_{u,x} ln σ(f⁺ − f⁻), i.e., for each utterance u the score f⁺ of an observed fact should exceed the score f⁻ of an unobserved fact.
The objective is to learn a set of well-ranked semantic slots per utterance.
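One BPR-style gradient step can be sketched as follows (after Rendle et al., 2009): push the score of an observed (utterance, slot) fact above an unobserved one. The vectors and learning rate are illustrative, and for simplicity the updated utterance vector is reused within the step.

```python
# Minimal BPR update sketch: stochastic ascent on ln sigmoid(f+ - f-).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_step(u, pos, neg, lr=0.1):
    """One SGD step for the utterance factor u, an observed fact's factor
    pos, and an unobserved fact's factor neg; returns the margin before
    the update."""
    f_pos = sum(a * b for a, b in zip(u, pos))
    f_neg = sum(a * b for a, b in zip(u, neg))
    g = 1.0 - sigmoid(f_pos - f_neg)   # gradient scale of ln sigmoid(f+ - f-)
    u[:]   = [a + lr * g * (p - n) for a, p, n in zip(u, pos, neg)]
    pos[:] = [p + lr * g * a for a, p in zip(pos, u)]
    neg[:] = [n - lr * g * a for a, n in zip(neg, u)]
    return f_pos - f_neg

u, pos, neg = [0.1, 0.1], [0.1, 0.0], [0.0, 0.1]
margins = [bpr_step(u, pos, neg) for _ in range(200)]
# The ranking margin f+ - f- grows as training proceeds.
print(margins[-1] > margins[0])
```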
51
Matrix Factorization SLU (MF-SLU)
[Figure: with ontology induction (Fw, Fs) and structure learning, MF fills in the slot-candidate cells for the test utterance "show me a list of cheap restaurants" (train: "i would like a cheap restaurant", "find a restaurant with chinese food")]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: the full MF-SLU framework: frame-semantic parsing, ontology induction (Fw, Fs), structure learning (Rw, Rs), and MF-based SLU modeling producing the semantic representation]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
53
Experimental Setup
Dataset: Cambridge University SLU Corpus; restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
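The evaluation metric can be made concrete with a small sketch: average precision is computed over each utterance's ranked slot list against the reference slots, then averaged into MAP. The slot lists below are toy examples.

```python
# Sketch of mean average precision (MAP) over ranked slot lists.

def average_precision(ranked, relevant):
    """AP of one ranked slot list against the set of reference slots."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i        # precision at each relevant position
    return score / len(relevant) if relevant else 0.0

# Hypothetical ranked slot candidates for two utterances.
utt1 = average_precision(["pricerange", "food", "area"], {"pricerange", "food"})
utt2 = average_precision(["task", "food"], {"food"})
map_score = (utt1 + utt2) / 2
print(map_score)  # 0.75
```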
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
(results significantly better than the MLR baseline with p < 0.05 in t-test)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation (Semantic) | 41.4 | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6 | 49.0
Feature + Knowledge Graph Propagation (All) | 43.5 (+15.7%) | 53.4 (+17.9%)
(results significantly better than the MLR baseline with p < 0.05 in t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
57
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the automatically learned ontology (locale_by_use, food, expensiveness, seeking, desiring, relational_quantity linked by AMOD, NN, DOBJ, PREP_FOR) next to the reference ontology with the most frequent syntactic dependencies (type, food, pricerange, task, area linked by AMOD, DOBJ, PREP_IN)]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Figure: flowchart: Ontology Induction and Structure Learning (Knowledge Acquisition) feed Semantic Decoding and Intent Prediction (SLU Modeling)]
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" maps to price="cheap", target="restaurant", intent=navigation
SLU Model: "i plan to dine in legume tonight" maps to restaurant="legume", time="tonight", intent=reservation
60
SDS Flowchart – Intent Prediction
[Figure: the flowchart with the Intent Prediction component highlighted: Ontology Induction, Structure Learning (Knowledge Acquisition); Semantic Decoding, Intent Prediction (SLU Modeling)]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" maps to Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: feature-enriched matrix: an IR step retrieves app-description candidates ("... your email calendar contacts ...": Outlook; "... check and send emails msgs ...": Gmail), which join self-train utterances and the test utterance "i would like to contact alex" as rows; columns are word observations (contact, message, email), enriched semantics (communication), and intended apps (Gmail, Outlook, Skype); reasoning with feature-enriched MF fills in probabilities such as .90, .85, .97, .95]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity: 1) user preference, 2) app-level contexts (e.g., "send to vivian" in the previous turn could map to Email or Message within the Communication group)
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched matrix over dialogues: train dialogue "take this photo" / "tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website" / "send an email to professor" (CHROME, EMAIL); test dialogue "take a photo of this" / "send it to alice" (CAMERA, IM); columns are lexical features (photo, check, camera, tell, send), behavior history (null, camera, chrome, email), and intended apps, with inferred probabilities such as .85, .70, .95, .80, .55 from reasoning with feature-enriched MF]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
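The feature-enrichment step for multi-turn interaction can be sketched as building an observation matrix whose columns combine lexical features with the previous turn's app (behavior history). The turns and feature names below are made up for the sketch.

```python
# Toy feature-enriched observation matrix: each row is one turn; columns are
# word features plus a behavior-history feature for the previous app.

def featurize(turns):
    """turns: list of (utterance, previous_app, intended_app)."""
    vocab = sorted({w for utt, _, _ in turns for w in utt.split()})
    apps = sorted({app for _, _, app in turns} | {prev for _, prev, _ in turns})
    columns = [("word", w) for w in vocab] + [("history", a) for a in apps]
    rows = []
    for utt, prev, _ in turns:
        words = set(utt.split())
        rows.append([1 if (kind == "word" and val in words) or
                          (kind == "history" and val == prev)
                     else 0
                     for kind, val in columns])
    return columns, rows

turns = [
    ("take this photo", "null", "CAMERA"),
    ("send it to alice", "CAMERA", "IM"),
]
columns, rows = featurize(turns)
# The second turn carries a history=CAMERA feature that helps disambiguate
# the otherwise ambiguous verb "send".
print(rows[1][columns.index(("history", "CAMERA"))])  # 1
```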
66
Experiments for Intent Prediction
Baselines: LM-based IR model (unsupervised) for single-turn requests; multinomial logistic regression (MLR, supervised) for multi-turn interaction.

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | – | 26.1 | –

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | – | 55.5 | –
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | – | 33.3 | –
Word + Type-Embedding-Based Semantics | 31.5 | – | 32.9 | –

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | – | 56.6 | –

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Figure: the flowchart with Intent Prediction highlighted: Ontology Induction, Structure Learning (Knowledge Acquisition); Semantic Decoding, Intent Prediction (SLU Modeling)]
Feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Back-end: Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: find domains that current systems do not cover but users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
[Figure: convolutional architecture for semantic decoding: a word sequence x (w1, w2, ..., wd) is mapped through word vectors lw, a convolutional layer lc (convolution matrix Wc), a pooling operation, an utterance vector lf, a knowledge graph propagation layer lp (propagation matrix Wp), and a semantic layer y (semantic projection matrix Ws); slot candidates S1 ... Sn with slot vectors lf are scored by relevance R(U, Si) to yield posterior probabilities P(Si | U)]
Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
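The "MF as a one-layer network" view can be sketched as follows. The weights below are illustrative: the MF score p_u · q_s is a single linear layer, and inserting a nonlinear hidden layer moves toward the deeper architecture described on this slide.

```python
# Sketch: an MF prediction as a one-layer net, and the same prediction with
# one extra hidden layer (illustrative weights, not a trained model).
import math

def one_layer(p_u, q_s):
    """MF score as a single linear layer followed by a sigmoid."""
    return 1 / (1 + math.exp(-sum(a * b for a, b in zip(p_u, q_s))))

def two_layer(x, W1, w2):
    """The same input passed through one hidden tanh layer first."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    return 1 / (1 + math.exp(-sum(a * b for a, b in zip(w2, hidden))))

p_u, q_s = [0.5, -0.2], [0.8, 0.1]
W1, w2 = [[0.5, -0.2], [0.3, 0.7]], [0.8, 0.1]
print(one_layer(p_u, q_s), two_layer(p_u, W1, w2))
```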
76
Take Home Message
Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI, from language to action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
29
Contributions
User: "find a cheap eating place for taiwanese food"
Ontology Induction | Structure Learning | Surface Form Derivation | Semantic Decoding | Intent Prediction
30
Contributions
User: "find a cheap eating place for taiwanese food"
Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction
31
Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?
[Diagram: unlabelled collection of restaurant-asking conversations → Knowledge Acquisition → organized domain knowledge: an ontology with slots (target, food, price, seeking, quantity) linked by dependency relations (PREP_FOR, NN, AMOD)]
Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
32
SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?
[Diagram: "can i have a cheap restaurant" + organized domain knowledge → SLU Component → price="cheap", target="restaurant", intent=navigation]
SLU Modeling = Semantic Decoding + Intent Prediction
33
SDS Architecture – Contributions
[Pipeline: ASR → SLU → DM → NLG, backed by domain knowledge; SLU is the current bottleneck]
Knowledge Acquisition | SLU Modeling
34
SDS Flowchart
[Flowchart: Ontology Induction → Structure Learning (Knowledge Acquisition); Semantic Decoding → Intent Prediction (SLU Modeling)]
35
SDS Flowchart – Semantic Decoding
[Flowchart: Ontology Induction → Structure Learning (Knowledge Acquisition); Semantic Decoding → Intent Prediction (SLU Modeling), with Semantic Decoding as the focus of this part]
36
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (feature model: Fw, Fs; semantic KG) × Structure Learning (knowledge graph propagation model: Rw, Rs; word and slot relation models over lexical and semantic KGs) → MF-SLU: SLU modeling by matrix factorization → semantic representation: target="restaurant", price="cheap"]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. Example: in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant"
Frames evoked (slot candidates): capability, expensiveness, locale_by_use
1st Issue: differentiate domain-specific frames from generic frames for SDSs
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Matrix illustration — rows: Train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food", …) and a Test utterance ("show me a list of cheap restaurants"); columns: word observations (cheap, restaurant, food, …) and slot candidates from frame-semantic parsing (expensiveness, locale_by_use, food, …); observed cells are 1, and the test row gets estimated slot probabilities (e.g. .97, .95)]
Idea: increase weights of domain-specific slots and decrease weights of the others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model — assumption: domain-specific words/slots have more dependencies to each other.
[Matrix illustration: word relation matrix × (word observation | slot candidate) matrix × slot relation matrix, over Train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and Test utterance "show me a list of cheap restaurants", with slot candidates capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
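The propagation step described above can be sketched numerically: row-normalized relation matrices spread each observed feature's score to its graph neighbors, so densely connected (domain-specific) nodes gain weight. A toy sketch with invented matrix values:

```python
import numpy as np

# Toy word-word relation matrix from the lexical knowledge graph
# (symmetric edge weights; values are invented for illustration).
R_w = np.array([[0.0, 0.8, 0.1],
                [0.8, 0.0, 0.7],
                [0.1, 0.7, 0.0]])

# Row-normalize so each node distributes its score over its neighbors.
R_norm = R_w / R_w.sum(axis=1, keepdims=True)

# One utterance's binary word observations (words 0 and 2 observed).
F_w = np.array([[1.0, 0.0, 1.0]])

# One propagation step: scores flow along graph edges, so the
# well-connected middle word ends up with the highest weight.
propagated = F_w @ R_norm
```

The same multiplication applied on the slot side (with the slot relation matrix) boosts domain-specific slot candidates in exactly the way the slide describes.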
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (feature model: Fw, Fs; semantic KG) × Structure Learning (knowledge graph propagation model: Rw, Rs; word and slot relation models over lexical and semantic KGs) → MF-SLU: SLU modeling by matrix factorization → semantic representation: target="restaurant", price="cheap"]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances, e.g. "can i have a cheap restaurant", parsed with dependencies ccomp, nsubj, dobj, det, amod; the words evoke slots capability, expensiveness, locale_by_use.
Word-based lexical knowledge graph: nodes can, i, have, a, cheap, restaurant.
Slot-based semantic knowledge graph: nodes capability, locale_by_use, expensiveness.
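The two graphs described above can be assembled from dependency triples plus a word-to-slot mapping. A minimal sketch — the triples and the mapping are hand-written stand-ins for parser and frame-semantic-parser output:

```python
from collections import defaultdict

# Hand-written dependency triples (head, relation, dependent) standing in
# for parser output on "can i have a cheap restaurant".
deps = [("have", "ccomp", "can"),
        ("have", "nsubj", "i"),
        ("have", "dobj", "restaurant"),
        ("restaurant", "det", "a"),
        ("restaurant", "amod", "cheap")]

# Words evoking slot candidates (from frame-semantic parsing).
word2slot = {"can": "capability",
             "cheap": "expensiveness",
             "restaurant": "locale_by_use"}

word_graph = defaultdict(set)  # word-based lexical knowledge graph
slot_graph = defaultdict(set)  # slot-based semantic knowledge graph

for head, rel, dep in deps:
    word_graph[head].add(dep)
    word_graph[dep].add(head)
    # Project word-word dependencies onto slot-slot edges.
    if head in word2slot and dep in word2slot:
        slot_graph[word2slot[head]].add(word2slot[dep])
        slot_graph[word2slot[dep]].add(word2slot[head])
```

Edges in both graphs would then be weighted by the embedding-based measures of the following slides.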
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings (e.g. vectors for "can", "have") and dependency-based slot embeddings (e.g. vectors for expensiveness, capability) are trained from the dependency-parsed utterances ("can i have a cheap restaurant": ccomp, nsubj, dobj, det, amod).
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
[Graph illustration: word nodes w1–w7 and slot nodes s1–s3 with weighted edges combining both relation types]
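The semantic-relation weights above are similarities between embeddings; a common choice is cosine similarity. A sketch under that assumption (the tiny embedding vectors are invented):

```python
import numpy as np

def cosine(u, v):
    """Semantic relation weight: cosine similarity of two embeddings."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Tiny, invented dependency-based slot embeddings.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.3]),
    "locale_by_use": np.array([0.8, 0.2, 0.4]),
    "capability":    np.array([0.1, 0.9, 0.2]),
}

# Slot-to-slot semantic edge weights for the knowledge graph.
slots = list(emb)
edges = {(s1, s2): cosine(emb[s1], emb[s2])
         for i, s1 in enumerate(slots) for s2 in slots[i + 1:]}
```

With embeddings like these, related domain slots (expensiveness, locale_by_use) receive a heavier edge than an unrelated pair, which is what the propagation model relies on.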
46
Knowledge Graph Propagation Model
[Matrix illustration: word relation matrix R_w^(SD) × (word observation | slot candidate) training matrix × slot relation matrix R_s^(SD), over word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food) for slot induction]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
[Matrix illustration: Ontology Induction features Fw, Fs feed the SLU matrix together with Structure Learning; Train: Utterance 1 "i would like a cheap restaurant", Utterance 2 "find a restaurant with chinese food"; Test: "show me a list of cheap restaurants", whose row carries estimated probabilities (e.g. .97, .90, .95, .85) plus hidden semantics]
2nd Issue: unobserved semantics may benefit understanding.
48
Feature Model + Knowledge Graph Propagation Model
Reasoning with Matrix Factorization
[Matrix illustration: word relation matrix R_w^(SD) × (word observation | slot candidate) matrix × slot relation matrix R_s^(SD); after slot induction, MF fills previously empty cells with graded scores (e.g. .97, .90, .95, .85, .93, .92, .98, .05)]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probability of the hidden semantics.
[Matrix illustration: the |U| × (|W|+|S|) observation matrix (word observations and slot candidates over train and test utterances) ≈ (|U| × d matrix) × (d × (|W|+|S|) matrix)]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
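The low-rank completion above can be illustrated with a truncated SVD, which yields the best rank-d factors of a toy observation matrix (the matrix and d are invented; this is an illustration of the decomposition, not the paper's training procedure):

```python
import numpy as np

# Observed utterance-by-(word+slot) matrix with missing (zero) cells.
M = np.array([[1, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 1]], dtype=float)

d = 2  # latent dimensionality

# Truncated SVD gives rank-d factors: M ≈ (|U| x d) @ (d x (|W|+|S|)).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :d] * s[:d] @ Vt[:d, :]

# The reconstruction fills unobserved cells with graded scores — the
# "probability of hidden semantics" sketched on the slide above.
```

Utterances with similar observed patterns end up sharing latent factors, so a slot observed in one utterance leaks a nonzero score into similar utterances where it was unobserved.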
50
Bayesian Personalized Ranking for MF
Model implicit feedback:
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻), where f⁺ scores an observed fact and f⁻ an unobserved one; the objective is to learn a set of well-ranked semantic slots per utterance.
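A single stochastic-ascent step on the ln σ(f⁺ − f⁻) objective above can be sketched as follows (sizes, learning rate, and sampling are invented; real BPR also adds regularization and samples pairs at random):

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_feat, d, lr = 5, 8, 3, 0.05

U = rng.normal(scale=0.1, size=(n_utt, d))   # utterance latent factors
V = rng.normal(scale=0.1, size=(n_feat, d))  # word/slot latent factors

def bpr_step(u, pos, neg):
    """One ascent step on ln sigma(f+ - f-) for utterance u, observed
    feature `pos`, and unobserved feature `neg`."""
    x = U[u] @ (V[pos] - V[neg])   # f+ - f-
    g = 1.0 / (1.0 + np.exp(x))    # sigma(-x): gradient scale of ln sigma(x)
    U[u] += lr * g * (V[pos] - V[neg])
    V[pos] += lr * g * U[u]
    V[neg] -= lr * g * U[u]

# Repeated updates push the observed feature above the unobserved one.
for _ in range(200):
    bpr_step(0, pos=2, neg=5)
```

After training, the ranking f⁺ > f⁻ holds for the sampled pair, which is exactly the "well-ranked slots per utterance" criterion rather than a 0/1 classification of unobserved cells.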
51
Matrix Factorization SLU (MF-SLU)
[Matrix illustration: Ontology Induction features Fw, Fs with Structure Learning; Train: Utterance 1 "i would like a cheap restaurant", Utterance 2 "find a restaurant with chinese food"; Test: "show me a list of cheap restaurants" with estimated slot probabilities (e.g. .97, .90, .95, .85)]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (feature model: Fw, Fs; semantic KG) × Structure Learning (knowledge graph propagation model: Rw, Rs; word and slot relation models over lexical and semantic KGs) → MF-SLU: SLU modeling by matrix factorization → semantic representation: target="restaurant", price="cheap"]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU Corpus — restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
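The MAP metric used above averages, over utterances, the average precision of the induced slots ranked by estimated probability against the reference slots. A compact sketch (the toy ranked lists and gold sets are invented):

```python
def average_precision(ranked, relevant):
    """AP of one utterance: `ranked` slots sorted by estimated
    probability, `relevant` the set of reference slots."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    aps = [average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)

# Toy check: induced slots ranked by probability vs. reference slots.
ranked = [["food", "area", "pricerange"], ["type", "task", "food"]]
gold = [{"food", "pricerange"}, {"task"}]
print(round(mean_average_precision(ranked, gold), 3))  # → 0.667
```

Higher MAP means the correct slots sit nearer the top of each utterance's ranking, which is the quantity reported in the result tables that follow.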
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. (Results marked with relative gains are significantly better than the MLR baseline, p < 0.05 in a t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (Results marked with relative gains are significantly better than the MLR baseline, p < 0.05 in a t-test.)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Diagram — induced ontology: slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring linked by PREP_FOR, NN, AMOD, DOBJ dependencies; reference ontology (annotated with the most frequent syntactic dependencies): type, food, pricerange, task, area linked by DOBJ, AMOD, PREP_IN]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Flowchart: Ontology Induction → Structure Learning (Knowledge Acquisition); Semantic Decoding → Intent Prediction (SLU Modeling)]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
[Flowchart: Ontology Induction → Structure Learning (Knowledge Acquisition); Semantic Decoding → Intent Prediction (SLU Modeling), with Intent Prediction as the focus of this part]
61
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Matrix illustration: Test utterance "i would like to contact alex" with word observations (contact, message, email, …) and intended apps (Gmail, Outlook, Skype, …); self-train utterances are retrieved via IR over app descriptions (e.g. "… your email calendar contacts …" → Outlook, "… check and send emails msgs …" → Gmail); feature enrichment adds semantic features (e.g. communication: .90), and reasoning with feature-enriched MF fills in app probabilities (e.g. .90, .85, .97, .95)]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity — 1) user preference, 2) app-level contexts
Example: "send to vivian" → Email? Message? (Communication); idea: behavioral patterns in history (the previous turn) can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Matrix illustration: lexical features (photo, check, camera, tell, send, …) and behavior-history features (null, camera, chrome, email) against intended apps; Train dialogues: "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on websites / send an email to professor" → CHROME, EMAIL; Test dialogue: "take a photo of this / send it to alice" → CAMERA, IM, with probabilities (e.g. .85, .70, .95, .80, .55) filled by reasoning with feature-enriched MF]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
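One way to realize the feature-enriched matrix above is to concatenate lexical features, previous-turn app (behavior history) features, and intended-app columns into a single row per turn. A hypothetical sketch — the vocabularies and helper are invented, not taken from the paper:

```python
import numpy as np

# Invented vocabularies for lexical features, behavior history, and apps.
words = ["take", "photo", "send", "check", "tell"]
history = ["null", "camera", "chrome", "email"]
apps = ["CAMERA", "IM", "CHROME", "EMAIL"]
cols = words + ["hist:" + h for h in history] + apps

def featurize(utterance, prev_app, intended_apps):
    """One matrix row: lexical features + previous-turn app + intended apps."""
    row = np.zeros(len(cols))
    for w in utterance.split():
        if w in words:
            row[cols.index(w)] = 1.0
    row[cols.index("hist:" + prev_app)] = 1.0
    for app in intended_apps:
        row[cols.index(app)] = 1.0
    return row

# Turn 2 of the test dialogue: "send it to alice" after using the camera.
row = featurize("send it to alice", "camera", ["IM"])
```

Stacking such rows for all turns yields the matrix on which the same MF machinery from the semantic-decoding part can reason jointly over words, behavior history, and apps.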
66
Experiments for Intent Prediction
Single-Turn Request — Mean Average Precision (MAP), LM-based IR model (unsupervised):

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / – | 26.1 / –

Multi-Turn Interaction — Mean Average Precision (MAP), Multinomial Logistic Regression (supervised):

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / – | 55.5 / –
67
Experiments for Intent Prediction
Single-Turn Request — Mean Average Precision (MAP):

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction — Mean Average Precision (MAP):

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request — Mean Average Precision (MAP):

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Multi-Turn Interaction — Mean Average Precision (MAP):

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request — Mean Average Precision (MAP):

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (−0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction — Mean Average Precision (MAP):

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Flowchart: Ontology Induction → Structure Learning (Knowledge Acquisition); Semantic Decoding → Intent Prediction (SLU Modeling)]
Feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data Back-end: Data Bases, Services and Client Signals
- Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
- User Experience: "call taxi"
72
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
73
Conclusions
- The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
- Apply the proposed technology to domain discovery: domains not covered by the current systems but of interest to users can guide which domains to develop next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition, both propagate into SLU modeling.
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
30
Ontology Induction Structure Learning Surface Form Derivation
Semantic Decoding Intent Prediction
ContributionsUser
Knowledge Acquisition SLU Modeling
find a cheap eating place for taiwanese food
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
31
Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge, a graph of induced slots (target, food, price, seeking, quantity) connected by dependency relations (PREP_FOR, NN, AMOD).]

Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation
32
SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

[Figure: the SLU component combines the organized domain knowledge with the utterance "can i have a cheap restaurant" to produce price="cheap", target="restaurant", intent=navigation.]

SLU Modeling = Semantic Decoding + Intent Prediction
33
SDS Architecture – Contributions

[Figure: the SDS pipeline (ASR → SLU → DM → NLG, backed by domain knowledge); SLU is the current bottleneck, addressed by Knowledge Acquisition and SLU Modeling.]
34
SDS Flowchart

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]
35
SDS Flowchart – Semantic Decoding

[Same flowchart, with the Semantic Decoding component highlighted.]
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance

[Framework figure: frame-semantic parsing over an unlabeled collection feeds ontology induction (feature model, with word/slot feature matrices Fw and Fs) and structure learning over a lexical and a semantic knowledge graph (knowledge graph propagation model, with word and slot relation models Rw and Rs); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the "descriptor" frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Example: parsing "can i have a cheap restaurant" evokes the frames "capability", "expensiveness", and "locale by use"; each frame is a slot candidate, but only "expensiveness" and "locale by use" are good domain-specific slots.]

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: frame-semantic parsing turns utterances into a binary word-observation / slot-candidate matrix; rows are training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and a test utterance ("show me a list of cheap restaurants"); columns are words (cheap, restaurant, food) plus slot candidates (expensiveness, locale_by_use, food).]

Idea: increase the weights of domain-specific slots and decrease the weights of others
41
1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word-observation / slot-candidate matrix is multiplied by a word relation matrix and a slot relation matrix; in the slot knowledge graph, domain-specific nodes (locale_by_use, food, expensiveness) are densely connected, while generic nodes (capability, seeking, desiring, relational_quantity) are not.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
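The propagation idea can be sketched numerically. This is a toy example: the slot names mirror the slide, but the edge weights and the mixing coefficient are invented. Densely connected domain-specific slots reinforce each other, while an isolated generic slot decays:

```python
import numpy as np

# Toy slot relation matrix (row-normalized edge weights); only the
# connectivity mirrors the slide -- the numbers are made up.
slots = ["locale_by_use", "expensiveness", "food", "capability"]
R = np.array([
    [0.0, 0.5, 0.5, 0.0],   # locale_by_use <-> expensiveness, food
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],   # generic "capability" has no strong dependencies
])
scores = np.ones(len(slots))  # initial slot-candidate scores

# Each step mixes a node's score with its neighbors' scores.
for _ in range(5):
    scores = 0.5 * scores + 0.5 * (R @ scores)

print({s: round(float(v), 2) for s, v in zip(slots, scores)})
# → {'locale_by_use': 1.0, 'expensiveness': 1.0, 'food': 1.0, 'capability': 0.03}
```

The connected trio keeps its score while the isolated node loses half per step, which is the qualitative effect the relation matrices are meant to produce.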
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance

[Framework figure repeated, now highlighting the knowledge graph propagation model: structure learning over the lexical and semantic knowledge graphs yields the word and slot relation models Rw and Rs.]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
43
Knowledge Graph Construction: syntactic dependency parsing on utterances

[Example: the dependency parse of "can i have a cheap restaurant" (relations ccomp, nsubj, dobj, det, amod) induces a word-based lexical knowledge graph over the words (can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph over the evoked slots (capability, expensiveness, locale_by_use).]
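As a sketch, the two graphs can be built from dependency triples in plain Python. The triples below are hard-coded stand-ins for parser output (the deck does not name a parser, and real attachments may differ), and the word→slot mapping is the one shown on the slide:

```python
from collections import defaultdict

# Hard-coded dependency triples (relation, head, dependent) approximating
# the parse of "can i have a cheap restaurant" on the slide.
deps = [
    ("ccomp", "can", "have"),
    ("nsubj", "have", "i"),
    ("dobj", "have", "restaurant"),
    ("det", "restaurant", "a"),
    ("amod", "restaurant", "cheap"),
]

# Slots evoked by words, from frame-semantic parsing.
word2slot = {"can": "capability", "cheap": "expensiveness",
             "restaurant": "locale_by_use"}

# Word-based lexical knowledge graph: undirected typed edges between words.
lexical_kg = defaultdict(set)
for rel, head, dep in deps:
    lexical_kg[head].add((rel, dep))
    lexical_kg[dep].add((rel, head))

# Slot-based semantic knowledge graph: project word-level edges onto slots.
semantic_kg = {(word2slot[h], word2slot[d])
               for _, h, d in deps if h in word2slot and d in word2slot}
print(sorted(semantic_kg))  # → [('locale_by_use', 'expensiveness')]
```

Only direct dependencies are projected here; the full model also connects slots whose words are linked through longer paths.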
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings are trained from the dependency-parsed utterance "can i have a cheap restaurant" (e.g. vectors for "can" and "have"); dependency-based slot embeddings are trained analogously over the evoked slot sequence (e.g. vectors for "expensiveness" and "capability").]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement: compute edge weights to represent relation importance
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

[Figure: a word graph (w1–w7) and a slot graph (s1–s3) whose edge weights combine the semantic and dependency measurements.]
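A minimal sketch of the semantic edge weight, assuming made-up 3-dimensional slot embeddings (the real vectors come from the dependency-based training above):

```python
import numpy as np

def cosine(u, v):
    """Semantic edge weight = cosine similarity between two embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented slot embeddings purely for illustration.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.2]),
    "locale_by_use": np.array([0.8, 0.2, 0.1]),
    "capability":    np.array([0.1, 0.9, 0.3]),
}

# Domain-specific slots end up closer to each other than to generic ones.
w_dom = cosine(emb["expensiveness"], emb["locale_by_use"])
w_gen = cosine(emb["expensiveness"], emb["capability"])
print(round(w_dom, 3), round(w_gen, 3))  # → 0.987 0.271
```

The dependency-score variant would replace cosine similarity with a score derived from how often the two nodes co-occur in dependency relations.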
46
Knowledge Graph Propagation Model

[Figure: the training matrix (word observations + slot candidates) is multiplied by the word relation matrix R_w^SD and the slot relation matrix R_s^SD, each combining semantic and dependency relations, for slot induction.]

Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]

[Figure: ontology induction and structure learning produce the feature matrices (Fw, Fs) for SLU; for the test utterance "show me a list of cheap restaurants", the slots are hidden semantics that are never directly observed.]

2nd Issue: unobserved semantics may benefit understanding
48
Feature Model + Knowledge Graph Propagation Model: Reasoning with Matrix Factorization

[Figure: the word/slot relation matrices are combined with the word-observation / slot-candidate matrix; matrix factorization then fills the missing slot cells of the test utterance with graded probabilities.]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probability of hidden semantics: the matrix of size |U| × (|W|+|S|) is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
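The low-rank completion idea can be sketched with a truncated SVD on a toy matrix that mirrors the running example (the numbers are illustrative, and the actual model is learned with BPR rather than SVD):

```python
import numpy as np

# Rows: two training utterances + one test utterance.
# Columns: words [cheap, restaurant, food] + slots [expensiveness,
# locale_by_use, food]; 1 = observed, 0 = unobserved (not negative).
M = np.array([
    [1, 1, 0, 1, 1, 0],   # "i would like a cheap restaurant"
    [0, 1, 1, 0, 1, 1],   # "find a restaurant with chinese food"
    [1, 1, 0, 0, 0, 0],   # test: "show me a list of cheap restaurants"
], dtype=float)

# Rank-d approximation M ~= (|U| x d) @ (d x (|W|+|S|)).
d = 2
u, s, vt = np.linalg.svd(M, full_matrices=False)
M_hat = u[:, :d] @ np.diag(s[:d]) @ vt[:d, :]

# The hidden slots of the test utterance receive graded scores:
# "expensiveness" (col 3) scores high, the unrelated "food" slot stays low.
print(np.round(M_hat[2], 2))
```

Because the test row shares the words "cheap" and "restaurant" with the first training row, the completed row inherits that row's slot pattern.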
50
Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts

Objective: maximize Σ_u Σ_{f+, f−} ln σ(f+ − f−), where σ is the sigmoid, f+ ranges over the model scores of observed facts and f− over the scores of unobserved facts for utterance u.

The objective is to learn a set of well-ranked semantic slots per utterance.
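A minimal BPR-style stochastic update on a toy observation matrix; the hyperparameters, dimensions, and data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy implicit-feedback matrix: rows = utterances, cols = words + slots.
M = np.array([[1, 1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [1, 1, 0, 0, 0, 0]], dtype=float)
n_u, n_f, d = M.shape[0], M.shape[1], 2
U = rng.normal(scale=0.1, size=(n_u, d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(n_f, d))   # latent word/slot factors
lr, reg = 0.05, 0.01

for _ in range(3000):
    u = rng.integers(n_u)
    fp = rng.choice(np.where(M[u] == 1)[0])   # observed fact f+
    fn = rng.choice(np.where(M[u] == 0)[0])   # unobserved fact f-
    diff = V[fp] - V[fn]
    g = sigmoid(-(U[u] @ diff))               # gradient of -ln sigmoid(f+ - f-)
    uu = U[u].copy()
    U[u] += lr * (g * diff - reg * U[u])
    V[fp] += lr * (g * uu - reg * V[fp])
    V[fn] += lr * (-g * uu - reg * V[fn])

scores = U @ V.T   # observed facts now rank above unobserved ones per row
print(np.round(scores[0], 2))
```

Note that nothing forces scores to be 0 or 1; BPR only optimizes the ranking of observed facts over unobserved ones, which matches the "well-ranked slots" objective on the slide.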
51
Matrix Factorization SLU (MF-SLU)

[Figure: the factorized model fills in graded probabilities for the slot candidates of the test utterance "show me a list of cheap restaurants".]

MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance

[Framework figure repeated: frame-semantic parsing, ontology induction, structure learning, and MF-SLU together produce the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances (using a mapping table between induced and reference slots)

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
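The metric can be made concrete with a short sketch of average precision for one utterance; the slot names come from the corpus, but the ranking and reference set here are invented (MAP is then the mean of this value over all utterances):

```python
def average_precision(ranked, relevant):
    """AP of a ranked slot-candidate list against the reference slots."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, 1):
        if slot in relevant:
            hits += 1
            score += hits / i   # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

# Hypothetical example: slots ranked by estimated probability.
ranked = ["pricerange", "food", "task", "area"]   # system ranking
relevant = {"pricerange", "area"}                 # reference annotation
print(round(average_precision(ranked, relevant), 3))  # → 0.75
```

Here the hits at ranks 1 and 4 give precisions 1.0 and 0.5, averaging to 0.75.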
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

MAP (%) on ASR / Transcripts:
Baseline SLU, Support Vector Machine: 32.5 / 36.6
Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

MAP (%) on ASR / Transcripts:
Baseline SLU, Support Vector Machine: 32.5 / 36.6
Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8
Proposed MF-SLU, Feature Model: 37.6 / 45.3
Proposed MF-SLU, Feature Model + Knowledge Graph Propagation: 43.5* (+27.9%) / 53.4* (+37.6%)

*: significantly better than the MLR baseline (p < 0.05, t-test)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

MAP (%) on ASR / Transcripts:
Feature Model: 37.6 / 45.3
Feature + Knowledge Graph Propagation, semantic relations only: 41.4 / 51.6
Feature + Knowledge Graph Propagation, dependency relations only: 41.6 / 49.0
Feature + Knowledge Graph Propagation, all relations: 43.5* (+15.7%) / 53.4* (+17.9%)

*: significantly better than the MLR baseline (p < 0.05, t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
57
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.

[Figure: the learned ontology (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, linked by PREP_FOR, NN, AMOD, DOBJ) shown beside the reference ontology (type, food, pricerange, task, area, linked by DOBJ, AMOD, PREP_IN), annotated with the most frequent syntactic dependencies.]

The automatically learned domain ontology aligns well with the reference one; the data-driven one is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding

[Flowchart repeated, with the contributed components highlighted.]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
- "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
- "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction

[Same flowchart, with the Intent Prediction component highlighted.]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification across popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF; IR over app descriptions (e.g. Gmail: "check and send emails, msgs", Outlook: "your email, calendar, contacts") yields app candidates and self-train utterances, and the test utterance "i would like to contact alex" is enriched with semantics (communication) so that intended apps (Gmail, Outlook, Skype, email) receive graded scores.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
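The "IR for app candidates" step can be sketched as simple bag-of-words retrieval over app descriptions; the apps and description strings below are made up, whereas the actual system retrieves descriptions from Google Play:

```python
import math
from collections import Counter

# Hypothetical app descriptions (stand-ins for Google Play text).
apps = {
    "Gmail":   "check and send emails messages email inbox",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record videos",
}

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(utterance, k=2):
    """Rank apps by similarity between the request and their descriptions."""
    q = tf_vector(utterance)
    return sorted(apps, key=lambda a: cosine(q, tf_vector(apps[a])),
                  reverse=True)[:k]

print(retrieve("i would like to send an email to alex"))  # → ['Gmail', 'Outlook']
```

A production system would use IDF weighting and the semantic enrichment described above rather than raw term overlap.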
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, from 1) user preference and 2) app-level contexts; e.g. "send to vivian" may mean Email or Message (Communication) depending on the previous turn.
Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF over lexical features, intended apps, and behavior history; training dialogues such as "take this photo / tell vivian this is me in the lab" → CAMERA, IM and "check my grades on websites / send an email to professor" → CHROME, EMAIL let the model score the test dialogue "take a photo of this / send it to alice" → CAMERA, IM.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised) vs. MF-SLU, on ASR / Transcripts:
- Word Observation: LM 25.1 / 26.1

Multi-Turn Interaction, Mean Average Precision (MAP), Multinomial Logistic Regression (supervised) vs. MF-SLU, on ASR / Transcripts:
- Word Observation: MLR 52.1 / 55.5
67
Experiments for Intent Prediction
Single-Turn Request, MAP, on ASR / Transcripts:
- Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) / LM 26.1 → MF-SLU 30.4 (+16.4%)

Multi-Turn Interaction, MAP, on ASR / Transcripts:
- Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) / MLR 55.5 → MF-SLU 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request, MAP, on ASR / Transcripts:
- Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) / LM 26.1 → MF-SLU 30.4 (+16.4%)
- Word + Embedding-Based Semantics: LM 32.0 / LM 33.3
- Word + Type-Embedding-Based Semantics: LM 31.5 / LM 32.9

Multi-Turn Interaction, MAP, on ASR / Transcripts:
- Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) / MLR 55.5 → MF-SLU 55.4 (-0.2%)
- Word + Behavioral Patterns: MLR 53.9 / MLR 56.6

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request, MAP, on ASR / Transcripts:
- Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) / LM 26.1 → MF-SLU 30.4 (+16.4%)
- Word + Embedding-Based Semantics: 32.0 → 34.2 (+6.8%) / 33.3 → 33.3 (-0.2%)
- Word + Type-Embedding-Based Semantics: 31.5 → 32.2 (+2.1%) / 32.9 → 34.0 (+3.4%)

Multi-Turn Interaction, MAP, on ASR / Transcripts:
- Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) / MLR 55.5 → MF-SLU 55.4 (-0.2%)
- Word + Behavioral Patterns: 53.9 → 55.7 (+3.3%) / 56.6 → 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction

[Flowchart repeated, with the contributed components highlighted.]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture

[Figure: reactive assistance (ASR, LU, Dialog, LG, TTS) and proactive assistance (inferences, user modeling, suggestions) built on a data back-end (data bases, services, and client signals), serving device/service end-points (phone, PC, Xbox, web browser, messaging apps); user experience example: "call taxi".]
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
- The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
- Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning

[Architecture figure: a word sequence x = w1, w2, …, wd passes through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; a semantic projection matrix Ws yields the semantic layer y, and a knowledge graph propagation layer lp (propagation matrix Wp) combines it with slot vectors to output the relation scores R(U, S1), …, R(U, Sn), i.e. the posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates.]

Treating MF as a one-layer neural net, we can add more layers in the model, towards unsupervised deep learning.
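That equivalence can be sketched directly (the dimensions here are invented): the two factor matrices act as an embedding layer and a linear output layer with no nonlinearity in between, which is exactly where additional layers could be inserted:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_feat, d = 3, 6, 2

U = rng.normal(size=(n_utt, d))    # utterance factors = embedding weights
V = rng.normal(size=(n_feat, d))   # word/slot factors = output-layer weights

# Forward pass of the "one-layer net" for utterance 0:
one_hot = np.eye(n_utt)[0]   # one-hot utterance input
hidden = one_hot @ U         # embedding lookup -> latent vector U[0]
scores = hidden @ V.T        # linear decoding -> word/slot scores

# Identical to the corresponding row of the MF prediction matrix:
assert np.allclose(scores, (U @ V.T)[0])
print(scores.shape)  # → (6,)
```

Inserting nonlinear hidden layers between the lookup and the decoding step turns this shallow model into the deeper architecture sketched on the slide.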
76
Take Home Message
- Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications. Unsupervised or weakly-supervised methods will be the future trend.
- Language understanding for AI: language → action, e.g. understanding voice commands to control music, lights, etc., or teaching a system to let friends in via face recognition. Deep language understanding is an emerging field.
77
Q & A — Thanks for your attention!

- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
31
Knowledge Acquisition1) Given unlabelled conversations how can a system automatically
induce and organize domain-specific concepts
Restaurant Asking
Conversations
target
foodprice
seeking
quantity
PREP_FOR
PREP_FOR
NN AMOD
AMODAMOD
Organized Domain Knowledge
Unlabelled Collection
Knowledge Acquisition
Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
32
SLU Modeling2) With the automatically acquired knowledge how can a system
understand utterance semantics and user intents
Organized Domain
Knowledge
price=ldquocheaprdquo target=ldquorestaurantrdquointent=navigation
SLU Modeling
SLU Component
ldquocan i have a cheap restaurantrdquo
SLU Modeling Semantic Decoding Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
33
SDS Architecture ndash Contributions
DomainDMASR SLU
NLG
Knowledge Acquisition SLU Modeling
current bottleneck
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
34
SDS Flowchart
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
35
SDS Flowchart ndash Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
36
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
37
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
38
[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing
FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame
ldquolow fatrdquo fills the descriptor frame element
SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated
FrameNet sentences
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
39
Ontology Induction [ASRUrsquo13 SLTrsquo14a]
can i have a cheap restaurant
Frame capability
Frame expensiveness
Frame locale by use
1st Issue differentiate domain-specific frames from generic frames for SDSs
GoodGood
Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014
slot candidate
Best Student Paper Award
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: word-observation/slot-candidate matrix built by frame-semantic parsing; rows are utterances (Train: "i would like a cheap restaurant", "find a restaurant with chinese food"; Test: "show me a list of cheap restaurants"), columns are word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); observed entries are 1s, and test-row slot scores such as .97 and .95 are estimated]
Idea: increase the weights of domain-specific slots and decrease the weights of others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies with each other.
[Figure: the word relation matrix and slot relation matrix multiply the word-observation/slot-candidate matrix for slot induction; in the knowledge graph, word nodes (i, like, cheap, restaurant, ...) and slot nodes (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) are linked; the train/test utterances are "i would like a cheap restaurant", "find a restaurant with chinese food", and "show me a list of cheap restaurants"]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
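A minimal sketch of this propagation idea (not the paper's exact formulation; all slot names, initial scores, and relation weights below are illustrative):

```python
# Toy sketch: blend each slot's score with relation-weighted neighbor
# scores. Slots strongly related to other confident slots gain weight,
# while weakly connected generic slots fall behind.

def propagate(scores, relation, alpha=0.5):
    """One propagation step: mix each node's own score with the
    relation-weighted scores of its neighbors."""
    n = len(scores)
    return [
        (1 - alpha) * scores[i]
        + alpha * sum(relation[i][j] * scores[j] for j in range(n))
        for i in range(n)
    ]

# expensiveness and locale_by_use are domain-specific and strongly
# related; capability is generic and only weakly connected.
slots = ["expensiveness", "locale_by_use", "capability"]
initial = [0.8, 0.7, 0.6]
relation = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
updated = propagate(initial, relation)
# After one step, the generic slot's score drops relative to the
# domain-specific slots.
```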
42
Semantic Decoding [ACL-IJCNLP'15] (pipeline recap)
Input: user utterances
Output: semantic concepts included in each individual utterance
[Same pipeline diagram as slide 37]
43
Knowledge Graph Construction: syntactic dependency parsing on utterances
[Figure: dependency parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod) with frames capability, expensiveness, and locale_by_use; from it, a word-based lexical knowledge graph over can, i, have, a, cheap, restaurant and a slot-based semantic knowledge graph over capability, locale_by_use, expensiveness are constructed]
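A small sketch of how the word-based lexical graph could be built from dependency edges. The parse below is hardcoded from the slide's example (head/dependent directions are illustrative); a real pipeline would obtain it from a syntactic dependency parser:

```python
from collections import defaultdict

# (head, dependent, relation) edges for "can i have a cheap restaurant";
# relation labels follow the slide, directions are illustrative.
parse = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

# Word-based lexical knowledge graph: undirected word-word edges.
word_graph = defaultdict(set)
for head, dep, _rel in parse:
    word_graph[head].add(dep)
    word_graph[dep].add(head)
```

The slot-based semantic graph can be built the same way after replacing each word with the slot (frame) it evokes.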
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
[Figure: dependency-based word embeddings are trained from the parsed utterance "can i have a cheap restaurant" (dependency contexts for "can", "have", ...), and dependency-based slot embeddings from the same parse with words replaced by their slots (capability, expensiveness, locale_by_use)]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement: compute edge weights to represent relation importance
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
[Figure: a combined graph with word nodes w1-w7 and slot nodes s1-s3, whose edge weights sum the semantic and dependency relation scores]
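A toy sketch of the semantic-relation edge weight: the similarity between two embeddings. The 3-d vectors below are made up for illustration; real edge weights would come from the trained dependency-based embeddings above:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Illustrative slot embeddings (not trained values).
slot_vecs = {
    "expensiveness": [0.9, 0.1, 0.0],
    "food":          [0.8, 0.3, 0.1],
    "capability":    [0.0, 0.2, 0.9],
}

# Related domain slots get a heavy edge; unrelated ones a light edge.
w_related = cosine(slot_vecs["expensiveness"], slot_vecs["food"])
w_unrelated = cosine(slot_vecs["expensiveness"], slot_vecs["capability"])
```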
46
Knowledge Graph Propagation Model
[Figure: the word relation matrix R_w^(SD) and slot relation matrix R_s^(SD) multiply the word-observation/slot-candidate matrix for slot induction]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
[Figure: ontology induction provides F_w and F_s, and structure learning feeds the SLU model; in the matrix, the test utterance "show me a list of cheap restaurants" has observed words but hidden semantics: slots that are not directly observed]
2nd Issue: unobserved semantics may benefit understanding
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the relation matrices R_w^(SD) and R_s^(SD) multiply the word-observation/slot-candidate matrix; MF fills in probabilities for both observed and hidden entries (e.g. .97, .90, .95, .85, .93, .92, .98, .05)]
Idea: MF completes a partially-missing matrix under a low-rank latent semantics assumption, which models hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probabilities of the hidden semantics.
[Figure: the |U| × (|W|+|S|) observation matrix is factorized as the product of a |U| × d utterance matrix and a d × (|W|+|S|) word/slot matrix]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
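A self-contained sketch of rank-d completion by SGD on a toy observation matrix (not the paper's utterance-by-word/slot matrix). Two cells are marked 0 here just to anchor the two latent blocks in this squared-loss toy; the BPR objective on the next slide avoids treating unobserved cells as negatives:

```python
import random

random.seed(0)
rows, cols, d = 4, 4, 2

# Observed cells: 1s plus two anchoring 0s. Cell (3, 3) is hidden, but
# row 3 behaves like row 2, so its low-rank score should come out high.
observed = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0,
            (2, 2): 1.0, (2, 3): 1.0, (3, 2): 1.0,
            (0, 2): 0.0, (2, 0): 0.0}

U = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(d)]

def predict(i, j):
    """Score of cell (i, j) from the product of the two factors."""
    return sum(U[i][k] * V[k][j] for k in range(d))

lr, reg = 0.1, 0.01
for _ in range(2000):
    for (i, j), y in observed.items():
        err = y - predict(i, j)
        for k in range(d):
            u, v = U[i][k], V[k][j]
            U[i][k] += lr * (err * v - reg * u)
            V[k][j] += lr * (err * u - reg * v)

hidden_score = predict(3, 3)  # hidden cell consistent with the structure
cross_score = predict(0, 3)   # hidden cell from a different latent block
```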
50
Bayesian Personalized Ranking for MF: model implicit feedback
Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻
The objective is to learn a set of well-ranked semantic slots per utterance.
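A minimal sketch of the BPR idea on toy scores: only push each observed fact f⁺ above each unobserved fact f⁻ by ascending the gradient of ln σ(f⁺ − f⁻). The per-slot score parameters here stand in for the MF model's predictions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(1)
scores = [random.uniform(-0.1, 0.1) for _ in range(4)]  # one utterance, 4 slots
observed, unobserved = [0, 1], [2, 3]                   # slots seen vs. not seen

lr = 0.1
for _ in range(500):
    for p in observed:
        for n in unobserved:
            # d/d(margin) of ln sigmoid(f_plus - f_minus)
            g = 1.0 - sigmoid(scores[p] - scores[n])
            scores[p] += lr * g
            scores[n] -= lr * g

# Observed slots are now ranked above unobserved ones; unobserved slots
# were never forced to an absolute score of 0.
```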
51
[Figure: ontology induction provides F_w and F_s, and structure learning feeds the SLU model; for the test utterance "show me a list of cheap restaurants", slot probabilities (e.g. .97, .90, .95, .85) are estimated]
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
52
Semantic Decoding [ACL-IJCNLP'15] (pipeline recap)
Input: user utterances
Output: semantic concepts included in each individual utterance
[Same pipeline diagram as slide 37]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
53
Experimental Setup
Dataset: Cambridge University SLU Corpus
Restaurant recommendation (WER = 37%); 2,166 dialogues (15,453 utterances); dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
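A small sketch of the evaluation metric: average precision of each ranked slot list against the reference slots, averaged over utterances (MAP). The rankings below are toy examples, not corpus results:

```python
def average_precision(ranked, relevant):
    """Precision at each relevant hit, averaged over the reference set."""
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings):
    """MAP over (ranked list, reference set) pairs, one per utterance."""
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)

rankings = [
    (["food", "pricerange", "area"], {"food", "pricerange"}),  # AP = 1.0
    (["area", "food"], {"food"}),                              # AP = 0.5
]
map_score = mean_average_precision(rankings)  # 0.75
```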
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
The result is significantly better than the MLR baseline with p < 0.05 in a t-test.
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic relations | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency relations | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All relations | 43.5 (+15.7%) | 53.4 (+17.9%)
The result is significantly better than the MLR baseline with p < 0.05 in a t-test.
In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs, compared against the reference ontology annotated with the most frequent syntactic dependencies.
[Figure: the induced ontology links locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, NN, AMOD, and DOBJ edges; the reference ontology links type, food, pricerange, task, and area via DOBJ, AMOD, and PREP_IN edges]
The automatically learned domain ontology aligns well with the reference one.
57
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight", intent=reservation
60
SDS Flowchart – Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play
"please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: a single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF. Rows are the test utterance "i would like to contact alex" and self-train utterances retrieved (IR) from app descriptions (Outlook: "... your email calendar contacts ..."; Gmail: "... check and send emails msgs ..."); columns are word observations (contact, message, email), enriched semantic features (communication, weight .90), and intended apps (Gmail, Outlook, Skype), whose entries receive probabilities such as .90, .85, .97, .95]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
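A toy sketch of the feature-enrichment step: the utterance's binary word observations are merged with weighted semantic features before factorization. The "semantic:communication" feature name and its 0.9 weight are illustrative, echoing the slide's example rather than the paper's exact features:

```python
def enrich(words, semantic_features):
    """Build one feature row: binary word observations plus weighted
    enriched semantic features."""
    row = {w: 1.0 for w in words}
    for feat, weight in semantic_features.items():
        # Keep the stronger evidence if a feature appears twice.
        row[feat] = max(row.get(feat, 0.0), weight)
    return row

utterance = ["i", "would", "like", "to", "contact", "alex"]
row = enrich(utterance, {"semantic:communication": 0.9})
# The resulting row feeds the MF model alongside the intended-app columns.
```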
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
[Figure: "send to vivian" is ambiguous between Email and Message (both Communication apps); the previous turn helps disambiguate]
Idea: behavioral patterns in history can help intent prediction.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF. Train dialogues ("take this photo" / "tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website" / "send an email to professor" → CHROME) and a test dialogue ("take a photo of this" / "send it to alice" → CAMERA, IM) are rows; columns combine lexical features (photo, check, camera, tell, IM, send), behavior-history features (null, camera, chrome, email), and intended apps, with probabilities such as .85, .70, .95, .80, .55]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –
(LM: LM-based IR model, unsupervised)
Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –
(MLR: multinomial logistic regression, supervised)
67
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –
Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –
Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)
Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: mine domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.
75
Towards Unsupervised Deep Learning
[Figure: a convolutional architecture for SLU. The word sequence x (w1, w2, ..., wd) is mapped to word vectors l_w, a convolutional layer l_c (convolution matrix W_c), and a pooling operation producing the utterance vector l_f; slot vectors l_f pass through a knowledge graph propagation layer l_p (propagation matrix W_p) and a semantic layer y (semantic projection matrix W_s), yielding relevance scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1|U), ..., P(Sn|U) over slot candidates S1, ..., Sn for utterance U]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
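A toy sketch of this "MF as one layer" view: the factorized model scores slots with a single linear map of the utterance's word observations, and inserting extra nonlinear layers turns it into a deep network. All weights below are illustrative:

```python
def dense(x, W):
    """Fully connected layer: returns W x (no bias term)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

x = [1.0, 0.0, 1.0]  # word observations for one utterance

# One-layer (MF-like) scoring: a single linear projection to slot scores,
# where the weight matrix plays the role of the factor product.
W_mf = [[0.5, 0.1, 0.2],
        [0.0, 0.3, 0.7]]
shallow_scores = dense(x, W_mf)

# Deep variant: insert a hidden layer with a nonlinearity.
W1 = [[0.5, 0.1, 0.2],
      [0.0, 0.3, 0.7]]
W2 = [[0.6, 0.4],
      [0.1, 0.2]]
deep_scores = dense(relu(dense(x, W1)), W2)
```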
76
Take Home Message
Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding is key for AI: mapping language to action, e.g. understanding voice commands to control music, lights, etc., or teaching the system to let friends in by face recognition.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of the NIPS Workshop on SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
37
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
38
[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing
FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame
ldquolow fatrdquo fills the descriptor frame element
SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated
FrameNet sentences
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
39
Ontology Induction [ASRUrsquo13 SLTrsquo14a]
can i have a cheap restaurant
Frame capability
Frame expensiveness
Frame locale by use
1st Issue differentiate domain-specific frames from generic frames for SDSs
GoodGood
Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014
slot candidate
Best Student Paper Award
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
40
1
Utterance 1i would like a cheap restaurant Train
hellip hellip
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1 Test
1 97 95
Frame Semantic Parsing
show me a list of cheap restaurantsTest Utterance
Word Observation Slot Candidate
Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award
Idea increase weights of domain-specific slots and decrease weights of others
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
41
1st Issue How to adapt generic slots to a domain-specific setting
Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot CandidateTrain
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
Slot Induction
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication
i like
1 1
capability
1
locale_by_use
food expensiveness
seeking
relational_quantitydesiring
Utterance 1i would like a cheap restaurant
hellip hellip
find a restaurant with chinese foodUtterance 2
show me a list of cheap restaurantsTest Utterance
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
42
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
43
Knowledge Graph Construction Syntactic dependency parsing on utterances
ccomp
amoddobjnsubj det
can i have a cheap restaurantcapability expensiveness locale_by_use
Word-based lexical knowledge graph
Slot-based semantic knowledge graph
restaurantcan
have
i
acheap
w
w
capabilitylocale_by_use expensiveness
s
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)
can = have =
expensiveness = capability =
can i have a cheap restaurant
ccomp
amoddobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amoddobjnsubj det
Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
45
Edge Weight Measurement Compute edge weights to represent relation importance
Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings
+
+
w1
w2
w3
w4
w5
w6
w7
s2
s1 s3
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
46
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
Slot Induction
Knowledge Graph Propagation Model119877119908
119878119863
119877119904119878119863
Structure information is integrated to make the self-training data more reliable
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
47
Semantic Decoding [ACL-IJCNLP'15]
[Figure: ontology induction builds the SLU feature matrices (Fw, Fs) from training utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", the matrix also contains hidden semantics not directly observed]
2nd Issue: unobserved semantics may benefit understanding
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model (word relation matrix R_w^SD, slot relation matrix R_s^SD)
[Figure: for slot induction, MF fills the missing cells of the word-observation/slot-candidate matrix with graded scores (e.g., .97, .90, .95, .85) instead of leaving them empty]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of the hidden semantics.
[Figure: the |U| x (|W|+|S|) observation matrix is approximated by the product of a |U| x d matrix and a d x (|W|+|S|) matrix]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.
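The low-rank completion idea can be sketched directly: factor a partially observed binary matrix into two small matrices with SGD and read graded scores off the reconstruction. The dimensions, learning rate, and data below are toy assumptions:

```python
import random

random.seed(0)

# Observed (row, column) facts of a toy 2 x 3 matrix; cell (0, 2) is
# unobserved and should receive a graded score after completion.
observed = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0}
n_rows, n_cols, d = 2, 3, 2  # d = latent dimension

U = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_cols)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

lr = 0.05
for _ in range(4000):  # SGD over the observed cells only
    for (i, j), target in observed.items():
        err = target - dot(U[i], V[j])
        for k in range(d):
            u_old = U[i][k]
            U[i][k] += lr * err * V[j][k]
            V[j][k] += lr * err * u_old

reconstructed = dot(U[0], V[0])   # observed cell, should end up near 1
hidden_score = dot(U[0], V[2])    # unobserved cell gets a graded score
```

Unobserved cells are never forced to zero during training; their scores emerge from the shared latent factors.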
50
Bayesian Personalized Ranking for MF: Model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: for each utterance u_x, maximize the sum of ln sigma(f+ - f-) over pairs of an observed fact f+ and an unobserved fact f-.
The objective is to learn a set of well-ranked semantic slots per utterance.
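The pairwise objective can be written down directly; this sketch shows the per-pair loss (the negated ln sigma term), with scores as plain floats:

```python
import math

def bpr_pair_loss(f_pos, f_neg):
    """Negative ln sigmoid(f+ - f-): small when the observed fact f+
    outscores the unobserved fact f-, so minimizing it ranks observed
    facts above unobserved ones without calling the latter false."""
    return -math.log(1.0 / (1.0 + math.exp(-(f_pos - f_neg))))

well_ranked = bpr_pair_loss(2.0, 0.0)   # observed fact clearly on top
tied = bpr_pair_loss(0.0, 0.0)          # no separation yet
mis_ranked = bpr_pair_loss(0.0, 2.0)    # unobserved fact on top
```

The loss only cares about the score gap, which is exactly the implicit-feedback assumption above.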
51
Matrix Factorization SLU (MF-SLU)
[Figure: the same ontology-induction/SLU matrix as before; for the test utterance "show me a list of cheap restaurants", MF assigns graded scores (e.g., .97, .90, .95, .85) to the slot candidates]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
[Figure: "can I have a cheap restaurant" goes through frame-semantic parsing over an unlabeled collection; ontology induction (Fw, Fs) feeds the feature model, and structure learning over the lexical and semantic knowledge graphs (Rw, Rs) feeds the knowledge graph propagation model; MF-SLU (SLU modeling by matrix factorization) outputs the semantic representation target="restaurant", price="cheap"]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
53
Experimental Setup
Dataset: Cambridge University SLU Corpus: restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012.
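The MAP metric averages, over utterances, the average precision of each utterance's ranked slot list. A small sketch (the slot names are from the corpus; the ranking is made up):

```python
def average_precision(ranked_slots, reference_slots):
    """Average precision of one utterance's ranked slot candidates
    against its reference slots."""
    hits, total = 0, 0.0
    for k, slot in enumerate(ranked_slots, start=1):
        if slot in reference_slots:
            hits += 1
            total += hits / k          # precision at each hit
    return total / max(len(reference_slots), 1)

# One utterance: slot candidates ranked by estimated probability.
ap = average_precision(["food", "area", "pricerange"],
                       {"food", "pricerange"})
# MAP is the mean of these AP values over all utterances.
```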
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

(+x%): relative improvement over the MLR baseline; the result is significantly better than the MLR with p < 0.05 in a t-test.
The MF-SLU effectively models implicit information to decode semantics.
The structure information further improves the results.
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

(+x%): relative improvement over the Feature Model; the result is significantly better than the MLR with p < 0.05 in a t-test.
In the integrated structure information, both semantic and dependency relations are useful for understanding.
57
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the induced ontology alongside the reference ontology with the most frequent syntactic dependencies; induced slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) connected by PREP_FOR, NN, AMOD, and DOBJ edges; reference slots (type, food, price range, task, area) connected by DOBJ, AMOD, and PREP_IN edges]
The automatically learned domain ontology aligns well with the reference one.
The data-driven one is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Flowchart: Ontology Induction and Structure Learning (knowledge acquisition) feed Semantic Decoding and Intent Prediction (SLU modeling)]
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" -> SLU Model -> price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" -> SLU Model -> restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
[Flowchart: Ontology Induction and Structure Learning (knowledge acquisition); Semantic Decoding and Intent Prediction (SLU modeling)]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" -> Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF. The utterance "i would like to contact alex" is enriched with semantics (e.g., communication, score .90); IR retrieves app candidates from app descriptions (e.g., Outlook: "your email calendar contacts", Gmail: "check and send emails msgs"); a word-observation/intended-app matrix over words such as contact, message, email and apps such as Gmail, Outlook, Skype is built from self-train utterances and completed for test utterances with graded scores]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity; 1) user preference, 2) app-level contexts
Idea: behavioral patterns in history can help intent prediction, e.g., "send to vivian" in the previous turn suggests Email, Message, or other Communication apps.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF. The matrix combines lexical features (photo, check, camera, IM, tell, send), behavior history (null, camera, chrome, email), and intended apps; training dialogues include "take this photo" / "tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website" / "send an email to professor" (CHROME, EMAIL); the test dialogue "take a photo of this" / "send it to alice" is completed with graded scores]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
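The "feature-enriched" matrix rows can be sketched as the concatenation of lexical and behavioral indicator features; the vocabularies below are tiny stand-ins for the real lexicon and app inventory:

```python
# Tiny stand-in feature sets (the real ones cover the full lexicon and
# the app inventory).
vocab = ["photo", "send", "check"]
apps = ["CAMERA", "IM", "CHROME"]

def featurize(words, app_history):
    """One matrix row: lexical indicators followed by behavioral
    indicators (apps launched earlier in the dialogue), so MF can
    learn relations across both kinds of evidence."""
    lexical = [1.0 if w in words else 0.0 for w in vocab]
    behavior = [1.0 if a in app_history else 0.0 for a in apps]
    return lexical + behavior

# Words of the test dialogue after taking a photo with CAMERA:
row = featurize({"send", "photo"}, {"CAMERA"})
```

Because both feature kinds live in one row, the factorization can learn, e.g., that a CAMERA launch followed by "send" predicts an IM app.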
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | -           | 26.1            | -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | -           | 55.5             | -

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | -             | 33.3            | -
Word + Type-Embedding-Based Semantics | 31.5    | -             | 32.9            | -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | -            | 56.6             | -

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Flowchart: Ontology Induction and Structure Learning (knowledge acquisition); Semantic Decoding and Intent Prediction (SLU modeling)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Back-end: Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility and the potential for improving generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide the next developed domains.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge in knowledge acquisition, both of which affect SLU modeling.
75
Towards Unsupervised Deep Learning
[Figure: a deep scoring network over an utterance U and slot candidates S1, ..., Sn; the word sequence x (w1, w2, ..., wd) passes through a word vector layer lw, a convolutional layer lc (convolution matrix Wc) with a pooling operation into utterance and slot vectors lf, a semantic layer y (semantic projection matrix Ws), and a knowledge graph propagation layer lp (propagation matrix Wp), producing relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U)]
Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
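As a sketch of that idea: MF's bilinear score can be replaced by a scorer with one nonlinear hidden layer over the same inputs. The sizes and random weights below are illustrative, not the architecture on the slide:

```python
import math
import random

random.seed(1)

def mf_score(u_vec, s_vec):
    """MF scores an (utterance, slot) pair with a single bilinear
    layer: f(u, s) = u . s."""
    return sum(a * b for a, b in zip(u_vec, s_vec))

def deep_score(u_vec, s_vec, W1, w2):
    """Same inputs, with one extra tanh hidden layer: a small neural
    net instead of a one-layer factorization."""
    x = u_vec + s_vec  # concatenate utterance and slot vectors
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in W1]
    return sum(w * h for w, h in zip(w2, hidden))

u = [random.gauss(0, 1) for _ in range(4)]
s = [random.gauss(0, 1) for _ in range(4)]
W1 = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
w2 = [random.gauss(0, 1) for _ in range(6)]

shallow = mf_score(u, s)
deep = deep_score(u, s, W1, w2)
```

The same ranking objective can then be applied to the deeper scorer, keeping the training unsupervised.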
76
Take Home Message
Big data without annotations is available; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI: language -> action, e.g., understand voice to control music, lights, etc., or teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.
Knowledge Graph Construction Syntactic dependency parsing on utterances
ccomp
amoddobjnsubj det
can i have a cheap restaurantcapability expensiveness locale_by_use
Word-based lexical knowledge graph
Slot-based semantic knowledge graph
restaurantcan
have
i
acheap
w
w
capabilitylocale_by_use expensiveness
s
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)
can = have =
expensiveness = capability =
can i have a cheap restaurant
ccomp
amoddobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amoddobjnsubj det
Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
45
Edge Weight Measurement Compute edge weights to represent relation importance
Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings
+
+
w1
w2
w3
w4
w5
w6
w7
s2
s1 s3
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
46
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
Slot Induction
Knowledge Graph Propagation Model119877119908
119878119863
119877119904119878119863
Structure information is integrated to make the self-training data more reliable
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
47
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance hidden semantics
2nd Issue unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLPrsquo15]
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
48
Reasoning with Matrix Factorization
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
9790 9585
93 929805 05
Slot Induction
Feature Model + Knowledge Graph Propagation Model
119877119908119878119863
119877119904119878119863
Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
49
2nd Issue How to model the unobserved hidden semantics
Matrix Factorization (MF) (Rendle et al 2009)
The decomposed matrices represent latent semantics for utterances and wordsslots respectively
The product of two matrices fills the probability of hidden semantics
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
9790 9585
93 929805 05
|119932|
|119934|+|119930|
asymp|119932|times119941 119941times (|119934|+|119930|)times
Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
50
Bayesian Personalized Ranking for MF Model implicit feedback
not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts
Objective
1
119891 +iquest iquest119891 minus119891 minus
The objective is to learn a set of well-ranked semantic slots per utterance
119906119909
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
51
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
52
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
53
Experimental Setup Dataset Cambridge University SLU Corpus
Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type
Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots
Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

Approach                                        ASR             Transcripts
Baseline SLU: Support Vector Machine            32.5            36.6
Baseline SLU: Multinomial Logistic Regression   34.0            38.8
Proposed MF-SLU: Feature Model                  37.6            45.3
Proposed MF-SLU: Feature Model +
  Knowledge Graph Propagation                   43.5 (+27.9%)   53.4 (+37.6%)

The result is significantly better than the MLR with p < 0.05 in t-test.
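A minimal sketch of the MF idea with a BPR-style implicit-feedback objective, as in the cited Rendle et al. approach: observed entries should merely outrank unobserved ones, which are never treated as hard negatives. The toy matrix, latent dimension, and learning rate are arbitrary assumptions, and regularization is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance-by-(word+slot) matrix: 1 = observed, 0 = unobserved (not negative)
M = np.array([[1, 0, 1, 0],
              [0, 1, 1, 1],
              [1, 1, 0, 0]], dtype=float)
n_utt, n_feat, d = M.shape[0], M.shape[1], 2
U = rng.normal(scale=0.1, size=(n_utt, d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(n_feat, d))  # latent word/slot factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# BPR SGD: sample an observed (pos) and unobserved (neg) entry per row,
# push the observed one to score higher
for _ in range(2000):
    u = rng.integers(n_utt)
    pos = rng.choice(np.flatnonzero(M[u] == 1))
    neg = rng.choice(np.flatnonzero(M[u] == 0))
    du = V[pos] - V[neg]
    g = sigmoid(-(U[u] @ du))  # gradient weight of the ranking loss
    uu = U[u].copy()
    U[u] += 0.05 * g * du
    V[pos] += 0.05 * g * uu
    V[neg] -= 0.05 * g * uu

scores = U @ V.T  # completed matrix: hidden semantics receive graded scores
```

After training, every cell has a score, so unobserved but plausible slots can surface with nonzero probability.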
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach                                  ASR             Transcripts
Feature Model                             37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic                                41.4            51.6
  Dependency                              41.6            49.0
  All                                     43.5 (+15.7%)   53.4 (+17.9%)

The result is significantly better than the MLR with p < 0.05 in t-test.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure] The induced ontology (frames: locale_by_use, food, expensiveness, seeking, relational_quantity, desiring; dependency edges: NN, AMOD, DOBJ, PREP_FOR) is shown next to the reference ontology with the most frequent syntactic dependencies (slots: type, food, pricerange, task, area; dependency edges: AMOD, DOBJ, PREP_IN).
The automatically learned domain ontology aligns well with the reference one
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
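One way to picture the propagation step behind these relations is a row-normalized relation matrix multiplying the slot scores, so a weakly observed but well-connected domain slot gains support from its neighbors. The slot set and edge weights below are toy values, not the learned graph:

```python
import numpy as np

slots = ["food", "pricerange", "area", "capability"]

# Toy inter-slot relation weights (domain slots strongly connected,
# the generic frame "capability" only weakly connected)
R = np.array([[1.0, 0.8, 0.6, 0.1],
              [0.8, 1.0, 0.5, 0.1],
              [0.6, 0.5, 1.0, 0.1],
              [0.1, 0.1, 0.1, 1.0]])
R = R / R.sum(axis=1, keepdims=True)  # row-normalize so scores stay bounded

scores = np.array([0.9, 0.7, 0.2, 0.8])  # initial slot scores for an utterance
propagated = R @ scores                  # neighbors in the graph share evidence
```

Here "area" starts low (0.2) but rises after propagation because its neighbors "food" and "pricerange" are strong, while the isolated generic slot "capability" gains nothing from the graph.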
Contributions of Semantic Decoding
[Flowchart: knowledge acquisition (ontology induction, structure learning) feeds SLU modeling (semantic decoding, intent prediction)]
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" -> price="cheap", target="restaurant"; intent=navigation
SLU Model: "i plan to dine in legume tonight" -> restaurant="legume", time="tonight"; intent=reservation
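The two-level distinction above can be summarized as a simple output schema (an illustrative structure built from the slide's examples, not the system's actual data format):

```python
from dataclasses import dataclass

@dataclass
class SLUResult:
    """Two levels of understanding for one utterance (illustrative schema)."""
    slots: dict   # low-level semantic concepts
    intent: str   # high-level user intent / follow-up behavior

utt1 = SLUResult(slots={"price": "cheap", "target": "restaurant"},
                 intent="navigation")
utt2 = SLUResult(slots={"restaurant": "legume", "time": "tonight"},
                 intent="reservation")
```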
SDS Flowchart – Intent Prediction
[Flowchart: knowledge acquisition (ontology induction, structure learning) feeds SLU modeling (semantic decoding, intent prediction)]
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" -> Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Matrix diagram] The test utterance "i would like to contact alex" is enriched with the semantic feature "communication". Word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...) form a feature-enriched matrix; IR over app descriptions (e.g., Outlook: "your email calendar contacts"; Gmail: "check and send emails msgs") selects app candidates and self-train utterances, and reasoning is performed with feature-enriched MF.
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
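The retrieval step for app candidates can be sketched as a smoothed unigram language model scoring each app description against the request. The two email descriptions echo the slide; the Camera entry, the query, and the smoothing constant are illustrative assumptions:

```python
import math
from collections import Counter

apps = {
    "Outlook": "your email calendar contacts",
    "Gmail": "check and send emails msgs",
    "Camera": "take photos and record videos",  # hypothetical non-match
}

def lm_score(query, doc, alpha=0.1):
    """log P(query | doc) under an additively smoothed unigram LM."""
    tokens = doc.lower().split()
    counts = Counter(tokens)
    vocab = set(tokens) | set(query.lower().split())
    denom = len(tokens) + alpha * len(vocab)
    return sum(math.log((counts[w] + alpha) / denom)
               for w in query.lower().split())

query = "send an email to alex"
ranked = sorted(apps, key=lambda a: lm_score(query, apps[a]), reverse=True)
# Outlook ranks first (shares "email" with the request); Camera ranks last
```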
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. 1) user preference; 2) app-level contexts
Example: "send to vivian" may target Email, Message, or another communication app; the previous turn helps disambiguate.
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Matrix diagram] Train dialogue: "take this photo" -> CAMERA; "tell vivian this is me in the lab" -> IM; "check my grades on website" -> CHROME; "send an email to professor" -> EMAIL. Test dialogue: "take a photo of this" -> CAMERA; "send it to alice" -> IM. Lexical features (photo, check, camera, tell, send, ...) and behavior history (null, camera, chrome, ...) form a feature-enriched matrix over user utterances and intended apps; reasoning is performed with feature-enriched MF.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
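One plausible encoding of the feature-enriched rows combines word observations with the previously launched app. The lexicon, app list, and utterances follow the slide; the binary encoding itself is a sketch, not the paper's exact feature scheme:

```python
def feature_row(utterance, prev_app, lexicon, apps):
    """One binary row: word observations plus the previously launched app."""
    words = set(utterance.lower().split())
    lexical = [1 if w in words else 0 for w in lexicon]
    behavior = [1 if a == prev_app else 0 for a in apps]
    return lexical + behavior

LEXICON = ["photo", "check", "camera", "tell", "send"]
APPS = ["camera", "chrome", "email"]

rows = [
    feature_row("take this photo", None, LEXICON, APPS),                        # -> CAMERA
    feature_row("tell vivian this is me in the lab", "camera", LEXICON, APPS),  # -> IM
    feature_row("send it to alice", "camera", LEXICON, APPS),                   # -> IM
]
```

The last two rows share the app-level context feature (previous app = camera), which is exactly the signal the lexical features alone cannot provide for an ambiguous "send it to alice".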
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR (LM / MF-SLU)   Transcripts (LM / MF-SLU)
Word Observation   25.1 / --           26.1 / --

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR (MLR / MF-SLU)  Transcripts (MLR / MF-SLU)
Word Observation   52.1 / --           55.5 / --

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation   25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR (MLR / MF-SLU)     Transcripts (MLR / MF-SLU)
Word Observation   52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / --              33.3 / --
Word + Type-Embedding-Based Semantics   31.5 / --              32.9 / --

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR (MLR / MF-SLU)    Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / --             56.6 / --

Semantic enrichment provides rich cues to improve performance.
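The embedding-based enrichment can be sketched as adding the nearest neighbors of an observed word, by cosine similarity, as extra features. The vectors below are made up for illustration; the cited SLT 2014 work derives them from neural word embeddings:

```python
import numpy as np

# Toy embeddings (hypothetical values, communication words clustered together)
emb = {
    "contact": np.array([0.90, 0.10, 0.00]),
    "email":   np.array([0.80, 0.20, 0.10]),
    "message": np.array([0.85, 0.15, 0.05]),
    "photo":   np.array([0.00, 0.90, 0.30]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def enrich(word, k=2):
    """Add the k nearest words in embedding space as extra features."""
    neighbors = sorted((w for w in emb if w != word),
                       key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return neighbors[:k]

print(enrich("contact"))  # → ['message', 'email']
```

An observed "contact" thus also activates "message" and "email" features, while the unrelated "photo" stays inactive.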
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR (MLR / MF-SLU)    Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)   56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction
[Flowchart: knowledge acquisition (ontology induction, structure learning) feeds SLU modeling (semantic decoding, intent prediction)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions
The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
Future Work
Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next developed domains.
Improve the proposed approach by handling the uncertainty: recognition errors from ASR in SLU modeling, and unreliable knowledge in knowledge acquisition.
Towards Unsupervised Deep Learning
[Architecture diagram] A word sequence x (w1, w2, ..., wd) is embedded into word vectors lw, passed through a convolution matrix Wc into a convolutional layer lc, and pooled into an utterance vector lf (slot vectors lf are built analogously for slot candidates S1, ..., Sn). A semantic projection matrix Ws produces the semantic layer y, and a knowledge graph propagation matrix Wp produces layer lp, yielding semantic relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) for utterance U.
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
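This observation can be made concrete: the MF prediction for an utterance is a single linear layer applied to a one-hot utterance index, and inserting a nonlinearity with an extra weight matrix yields a deeper model. This is a sketch with random toy factors, not the architecture in the figure:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_feat, d = 3, 5, 2
U = rng.normal(size=(n_utt, d))   # utterance latent factors
V = rng.normal(size=(n_feat, d))  # word/slot latent factors

# MF prediction for utterance u is a single linear layer on a one-hot input
u = 1
one_hot = np.eye(n_utt)[u]
mf_scores = U[u] @ V.T
net_scores = (one_hot @ U) @ V.T  # identical computation written as a network

# Adding a hidden nonlinearity and another weight matrix deepens the model
W2 = rng.normal(size=(d, d))
deep_scores = np.tanh((one_hot @ U) @ W2) @ V.T
```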
Take Home Message
Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI: language -> action, e.g., understand voice commands to control music, lights, etc., or teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
Q & A. THANKS FOR YOUR ATTENTION!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
53
Experimental Setup Dataset Cambridge University SLU Corpus
Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type
Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots
Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
Approach ASR Transcripts
Baseline SLU
Support Vector Machine 325 366Multinomial Logistic Regression 340 388
Proposed MF-SLU
Feature Model 376 453
Feature Model +Knowledge Graph Propagation
435
(+279)534
(+376)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. Cues: 1) user preference; 2) app-level contexts.
Example: "send to vivian" could mean Email or Message (Communication), depending on the previous turn.
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF. Train dialogues pair user utterances with intended apps, e.g., "take this photo / tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website / send an email to professor" → CHROME, EMAIL. Features combine lexical observations (photo, check, camera, IM, tell, send, …) with behavior history (null, camera; chrome, email; …); the test dialogue "take a photo of this / send it to alice …" is scored against the intended-app columns.]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
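The feature enrichment above can be sketched in a few lines. This is a hypothetical illustration (the feature names are examples, not the paper's actual feature set): lexical observations and app-level behavior history are concatenated into one vector that the MF model then factorizes.

```python
# Hypothetical vocabularies of lexical features and behavior-history features.
lexical_vocab = ["photo", "check", "camera", "tell", "send"]
behavior_vocab = ["null", "camera", "chrome", "email"]

def featurize(words, history):
    # One-hot concatenation: lexical observations followed by app history.
    lex = [1.0 if w in words else 0.0 for w in lexical_vocab]
    beh = [1.0 if a in history else 0.0 for a in behavior_vocab]
    return lex + beh

# Second turn of the test dialogue: "send it to alice" after using CAMERA.
x = featurize({"send"}, {"camera"})
```

The behavior-history half of the vector is what lets the model distinguish "send it" after CAMERA (likely IM) from "send it" after CHROME (likely EMAIL).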
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix    | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation  | 25.1 / -         | 26.1 / -
(LM: LM-based IR model, unsupervised)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix    | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation  | 52.1 / -          | 55.5 / -
(MLR: multinomial logistic regression, supervised)
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix    | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation  | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix    | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation  | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / -             | 33.3 / -
Word + Type-Embedding-Based Semantics  | 31.5 / -             | 32.9 / -
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix                         | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation                       | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns             | 53.9 / -             | 56.6 / -
Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)
Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix                         | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation                       | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns             | 53.9 / 55.7 (+3.3%)  | 56.6 / 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains are developed next.
Improve the proposed approach by handling uncertainty in SLU: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
[Figure: a convolutional architecture for SLU. A word sequence x (w1, w2, …, wd) is mapped to word vectors lw, a convolutional layer lc (convolution matrix Wc), a pooling operation, and an utterance vector lf; a semantic layer y (semantic projection matrix Ws) and a knowledge graph propagation layer lp (propagation matrix Wp) then score the utterance against each slot vector to produce R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U).]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
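A rough sketch of that view (random toy weights, not a trained model): MF scoring is a single linear layer that maps utterance features to slot scores through a d-dimensional bottleneck, so the implied score matrix has rank at most d; deep variants would insert nonlinear layers in between.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, d, n_slots = 10, 4, 3     # feature count, latent dim, slot count

# MF viewed as a one-layer linear network: utterance features are projected
# into a d-dimensional latent space, then scored against slot-side factors.
W_in = rng.normal(size=(d, n_feats))    # utterance-side factors
W_out = rng.normal(size=(n_slots, d))   # slot-side factors

x = np.zeros(n_feats)
x[[1, 4]] = 1.0                         # bag-of-features utterance vector
scores = W_out @ (W_in @ x)             # one slot score per candidate
```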
76
Take Home Message
Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI turns language into action: understand voice commands to control music, lights, etc., or teach the assistant to let friends in via face recognition.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A
THANKS FOR YOUR ATTENTION!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
35
SDS Flowchart – Semantic Decoding
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]
36
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: from an unlabeled collection, frame-semantic parsing and ontology induction over a semantic KG produce the feature model (Fw, Fs); structure learning with a word relation model (lexical KG) and a slot relation model (semantic KG) produces the knowledge graph propagation model (Rw, Rs); MF-SLU (SLU modeling by matrix factorization) combines both into the SLU model, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. Example: in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the "descriptor" frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
"can i have a cheap restaurant" → Frame: capability; Frame: expensiveness (good slot candidate); Frame: locale_by_use (good slot candidate)
1st Issue: differentiate domain-specific frames from generic frames for SDSs.
Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Figure: a word-observation / slot-candidate matrix built by frame-semantic parsing. Train utterance 1 "i would like a cheap restaurant" activates the words cheap, restaurant and the slots expensiveness, locale_by_use; train utterance 2 "find a restaurant with chinese food" activates restaurant, food and locale_by_use, food; the test utterance "show me a list of cheap restaurants" receives slot scores (e.g., 0.97, 0.95).]
Idea: increase the weights of domain-specific slots and decrease the weights of others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: the word relation matrix (word relation model) and slot relation matrix (slot relation model) multiply the word-observation / slot-candidate matrix for slot induction. Train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" support slots such as expensiveness, locale_by_use, and food, while generic slots (capability, seeking, desiring, relational_quantity) connect to fewer domain words; test utterance: "show me a list of cheap restaurants".]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
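The propagation step can be sketched numerically. This toy example uses made-up relation weights (not learned ones): multiplying observed slot scores by a row-normalized slot relation matrix boosts slots that are well connected to other active slots, exactly the behavior the assumption above predicts.

```python
import numpy as np

# Toy slot relation matrix over (expensiveness, locale_by_use, capability);
# weights are illustrative: the two domain-specific slots are strongly
# related, while the generic "capability" slot is isolated.
R_s = np.array([
    [1.0, 0.8, 0.1],   # expensiveness
    [0.8, 1.0, 0.1],   # locale_by_use
    [0.1, 0.1, 1.0],   # capability
])
R_s = R_s / R_s.sum(axis=1, keepdims=True)  # row-normalize before propagating

# Observed slot scores for one utterance: each slot was parsed once.
f_s = np.array([1.0, 1.0, 1.0])

# One propagation step: slots with active, well-connected neighbors gain score.
g_s = f_s @ R_s
```

After propagation the two domain-specific slots outrank the generic one even though all three started with the same observed score.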
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: from an unlabeled collection, frame-semantic parsing and ontology induction over a semantic KG produce the feature model (Fw, Fs); structure learning with a word relation model (lexical KG) and a slot relation model (semantic KG) produces the knowledge graph propagation model (Rw, Rs); MF-SLU (SLU modeling by matrix factorization) combines both into the SLU model, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
43
Knowledge Graph Construction
Syntactic dependency parsing on utterances: "can i have a cheap restaurant" (capability, expensiveness, locale_by_use) with dependencies ccomp, nsubj, dobj, det, amod.
Word-based lexical knowledge graph: word nodes {can, i, have, a, cheap, restaurant} connected by word-word edges.
Slot-based semantic knowledge graph: slot nodes {capability, locale_by_use, expensiveness} connected by slot-slot edges.
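As a minimal, dict-based sketch (the dependency triples are hard-coded here for illustration; a real system would obtain them from a dependency parser), both graphs can be built by walking the typed dependencies of a parsed utterance:

```python
# Hypothetical dependency triples for "can i have a cheap restaurant":
# (head, dependent, relation), following the ccomp/nsubj/dobj/det/amod edges.
deps = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

# Word-based lexical knowledge graph: word nodes linked by typed edges.
word_graph = {}
for head, dep, rel in deps:
    word_graph.setdefault(head, []).append((dep, rel))

# Slot-based semantic knowledge graph: map words to the slots they evoke
# and inherit the connecting edges (cheap -> expensiveness, etc.).
word2slot = {"cheap": "expensiveness", "restaurant": "locale_by_use", "can": "capability"}
slot_edges = {(word2slot[h], word2slot[d])
              for h, d, _ in deps if h in word2slot and d in word2slot}
```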
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings: e.g., vectors for "can" and "have", trained from the dependency contexts (ccomp, nsubj, dobj, det, amod) of "can i have a cheap restaurant".
Dependency-based slot embeddings: e.g., vectors for "expensiveness" and "capability", trained the same way with slots standing in for their fillers ("have a" + capability, expensiveness, locale_by_use).
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement
Compute edge weights to represent relation importance:
Slot-to-slot semantic relation: similarity between slot embeddings.
Slot-to-slot dependency relation: dependency score between slot embeddings.
Word-to-word semantic relation: similarity between word embeddings.
Word-to-word dependency relation: dependency score between word embeddings.
[Figure: a word graph (w1–w7) and a slot graph (s1–s3) whose edge weights combine the semantic and dependency relations.]
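The similarity-based weights can be sketched as follows. The vectors are toy values standing in for the trained dependency-based embeddings, so the numbers are illustrative only: the semantic edge weight between two slot nodes is the cosine similarity of their embeddings.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy slot embeddings (made-up values, not trained vectors).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.2]),
    "locale_by_use": np.array([0.8, 0.3, 0.1]),
    "capability":    np.array([0.1, 0.9, 0.4]),
}

# Semantic-relation edge weight between two slot nodes.
w_semantic = cosine(emb["expensiveness"], emb["locale_by_use"])
```

With vectors like these, the two domain-specific slots receive a heavier edge than a domain-specific/generic pair, which is what lets propagation favor them.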
46
Knowledge Graph Propagation Model
[Figure: the word relation matrix Rw(SD) (word relation model) and slot relation matrix Rs(SD) (slot relation model) multiply the word-observation / slot-candidate matrix (slot induction) over train and test utterances with slots expensiveness, locale_by_use, and food.]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
2nd Issue: unobserved semantics may benefit understanding.
[Figure: ontology induction produces the feature model (Fw, Fs), combined with structure learning, for SLU. In the word-observation / slot-candidate matrix, train utterance 1 "i would like a cheap restaurant" and utterance 2 "find a restaurant with chinese food" have observed cells; the test utterance "show me a list of cheap restaurants" carries hidden semantics to be estimated.]
48
Feature Model + Knowledge Graph Propagation Model
Reasoning with Matrix Factorization
[Figure: the word relation matrix Rw(SD) and slot relation matrix Rs(SD) multiply the word-observation / slot-candidate matrix; slot induction fills the missing cells with estimated probabilities such as 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05.]
Idea: MF completes a partially-missing matrix under a low-rank latent semantics assumption, which makes it able to model hidden semantics and more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of hidden semantics.
[Figure: the |U| x (|W|+|S|) word-observation / slot-candidate matrix is approximated by a |U| x d matrix times a d x (|W|+|S|) matrix.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
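The low-rank completion idea can be illustrated with a plain truncated SVD on a toy observation matrix (the rows and columns here are illustrative, and the actual model is trained with BPR rather than SVD): the hidden cell receives a clearly nonzero score because its utterance resembles another row where that slot is observed.

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix; columns: cheap, restaurant,
# expensiveness, locale_by_use.  1 = observed, 0 = unobserved (maybe true).
M = np.array([
    [1., 1., 1., 0.],   # "... cheap restaurant": locale_by_use is hidden
    [0., 1., 0., 1.],   # "... restaurant with chinese food"
    [1., 1., 1., 1.],   # a fully observed utterance
])

# Rank-d approximation M ~ U_d S_d V_d^T fills in the missing cells.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
d = 2
M_hat = (U[:, :d] * s[:d]) @ Vt[:d, :]

hidden_score = M_hat[0, 3]   # estimated score of the hidden slot
```

Row 0 shares its observed pattern with row 2, so the reconstruction assigns its missing locale_by_use cell a higher score than an unrelated missing cell such as "cheap" for row 1.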
50
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance x, maximize the probability that an observed fact f+ outranks an unobserved fact f-.
The objective is to learn a set of well-ranked semantic slots per utterance.
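The pairwise objective can be written down directly. This toy function (with made-up scores) shows the BPR form, maximizing ln sigmoid(f+ - f-): observed slots only need to outrank unobserved ones, rather than treating the unobserved ones as hard negatives.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_pair_logprob(f_pos, f_neg):
    # Log-probability that an observed fact f+ outranks an unobserved fact
    # f-; BPR maximizes the sum of these terms over sampled (f+, f-) pairs.
    return math.log(sigmoid(f_pos - f_neg))

good = bpr_pair_logprob(2.0, -1.0)   # observed slot correctly ranked higher
bad = bpr_pair_logprob(-1.0, 2.0)    # ranking violated: large penalty
```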
51
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction produces the feature model (Fw, Fs), combined with structure learning, for SLU. Train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" are observed; for the test utterance "show me a list of cheap restaurants", the model estimates slot-candidate probabilities such as 0.97, 0.90, 0.95, 0.85.]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: from an unlabeled collection, frame-semantic parsing and ontology induction over a semantic KG produce the feature model (Fw, Fs); structure learning with a word relation model (lexical KG) and a slot relation model (semantic KG) produces the knowledge graph propagation model (Rw, Rs); MF-SLU (SLU modeling by matrix factorization) combines both into the SLU model, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
53
Experimental Setup
Dataset: Cambridge University SLU Corpus (restaurant recommendation; WER = 37%): 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
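The MAP metric used throughout these experiments can be sketched directly (the slot names below are just examples): each utterance contributes the average precision of its ranked slot list, and MAP is the mean over utterances.

```python
def average_precision(ranked, relevant):
    # Precision at each rank where a reference slot appears, averaged
    # over the number of reference slots.
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)

def mean_average_precision(runs):
    # runs: iterable of (ranked slot list, set of reference slots) pairs.
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ap = average_precision(["food", "area", "pricerange"], {"food", "pricerange"})
```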
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach                                       | ASR  | Transcripts
Baseline SLU: Support Vector Machine           | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression  | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach                                                      | ASR           | Transcripts
Baseline SLU: Support Vector Machine                          | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                 | 34.0          | 38.8
Proposed MF-SLU: Feature Model                                | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation  | 43.5 (+27.9%) | 53.4 (+37.6%)
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
(Results are significantly better than the MLR baseline with p < 0.05 in a t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach                                            | ASR           | Transcripts
Feature Model                                       | 37.6          | 45.3
Feature + Knowledge Graph Propagation (Semantic)    | 41.4          | 51.6
Feature + Knowledge Graph Propagation (Dependency)  | 41.6          | 49.0
Feature + Knowledge Graph Propagation (All)         | 43.5 (+15.7%) | 53.4 (+17.9%)
In the integrated structure information, both semantic and dependency relations are useful for understanding.
(Results are significantly better than the MLR baseline with p < 0.05 in a t-test.)
57
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the induced ontology (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, with edges such as PREP_FOR, NN, AMOD, DOBJ) beside the reference ontology with the most frequent syntactic dependencies (type, food, pricerange, task, area, with edges AMOD, DOBJ, PREP_IN).]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions

The work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances;
- better high-level intent prediction about follow-up behaviors.
74
Future Work

Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.
75
[Figure: proposed deep architecture. Word sequence x (w1, w2, ..., wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling → utterance vector lf; together with slot vectors lf, a knowledge graph propagation layer lp (propagation matrix Wp) feeds the semantic layer y (semantic projection matrix Ws); the model outputs relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn]
Towards Unsupervised Deep Learning
Treating MF as a one-layer neural network, we can add more layers to the model, moving towards unsupervised deep learning.
76
Take Home Message

Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI maps language to action: understand voice to control music, lights, etc., or teach the system to let friends in by face recognition, etc.
Unsupervised or weakly supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
36
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
37
Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

[Diagram: an unlabeled collection is processed by frame-semantic parsing; Ontology Induction over a semantic KG yields the Feature Model (Fw, Fs); Structure Learning over a lexical KG and a semantic KG yields the Knowledge Graph Propagation Model (Rw, Rs: word and slot relation models); MF-SLU (SLU modeling by matrix factorization) combines both into the semantic representation, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap"]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory, where words/phrases can be represented as frames. For example, in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
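A frame-semantic parse like the one above can be pictured as (word, frame) pairs. A minimal sketch, with a hand-written toy lexicon standing in for SEMAFOR (the real parser is a trained statistical model, not a dictionary lookup):

```python
# Toy stand-in for a frame-semantic parse: map frame-evoking words to
# their frame names via a hand-written lexicon (hypothetical entries).
LEXICON = {
    "cheap": "expensiveness",
    "restaurant": "locale_by_use",
    "can": "capability",
}

def parse_frames(utterance):
    """Return (word, frame) pairs for every frame-evoking word."""
    return [(w, LEXICON[w]) for w in utterance.split() if w in LEXICON]

print(parse_frames("can i have a cheap restaurant"))
# → [('can', 'capability'), ('cheap', 'expensiveness'), ('restaurant', 'locale_by_use')]
```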
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use (slot candidates).

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.
40
Ontology Induction [ASRU'13, SLT'14a] (2) (Best Student Paper Award)

[Matrix illustration: word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food) for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; after frame-semantic parsing, the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities such as .97 and .95]

Idea: increase the weights of domain-specific slots and decrease the weights of others.
41
1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model (word relation matrix) × Slot Relation Model (slot relation matrix)

[Matrix illustration: word observations (i, like, cheap, restaurant, food) and slot candidates (capability, locale_by_use, expensiveness, food, seeking, relational_quantity, desiring) for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
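The propagation step can be sketched numerically: multiplying an observation vector by a relation (adjacency) matrix raises the scores of well-connected, in-domain nodes. All weights below are invented for illustration.

```python
words = ["cheap", "restaurant", "food", "i"]
# R[i][j]: hypothetical knowledge-graph edge weight between words i and j.
R = [
    [1.0, 0.8, 0.6, 0.0],  # cheap
    [0.8, 1.0, 0.9, 0.1],  # restaurant
    [0.6, 0.9, 1.0, 0.0],  # food
    [0.0, 0.1, 0.0, 1.0],  # i (generic word, weakly connected)
]
obs = [1.0, 1.0, 0.0, 1.0]  # words observed in "i ... cheap restaurant"

# One propagation step: each node collects the scores of its neighbors.
scores = [sum(R[i][j] * obs[j] for j in range(len(obs)))
          for i in range(len(words))]
# "food" now outscores the generic "i" even though it was never observed.
```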
42
Semantic Decoding [ACL-IJCNLP'15] (continued)

Input: user utterances
Output: semantic concepts included in each individual utterance

[Diagram: an unlabeled collection is processed by frame-semantic parsing; Ontology Induction over a semantic KG yields the Feature Model (Fw, Fs); Structure Learning over a lexical KG and a semantic KG yields the Knowledge Graph Propagation Model (Rw, Rs: word and slot relation models); MF-SLU combines both into the semantic representation, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap"]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
43
Knowledge Graph Construction: syntactic dependency parsing on utterances

"can i have a cheap restaurant" (dependencies: ccomp, nsubj, dobj, det, amod; evoked frames: capability, expensiveness, locale_by_use)

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by dependency edges.
Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness) connected by slot relations.
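The word graph can be assembled directly from dependency edges. A small sketch with hypothetical parse output (relation labels are illustrative, not taken from a real parser run):

```python
# Hypothetical dependency edges for "can i have a cheap restaurant":
# (head, dependent, relation), mirroring the parse shown above.
deps = [
    ("have", "can", "aux"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

# Word-based lexical knowledge graph as an undirected adjacency map.
graph = {}
for head, dep, _ in deps:
    graph.setdefault(head, set()).add(dep)
    graph.setdefault(dep, set()).add(head)
```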
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings (e.g., for "can" and "have") and dependency-based slot embeddings (e.g., for "expensiveness" and "capability") are trained from the dependency-parsed utterances, such as "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
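Dependency-based embeddings differ from window-based word2vec only in how the (word, context) training pairs are generated: contexts are dependency neighbors tagged with the relation, with the inverse relation marked for the head side. A sketch over hypothetical edges:

```python
# Hypothetical dependency edges: (head, dependent, relation).
deps = [("have", "restaurant", "dobj"), ("restaurant", "cheap", "amod")]

pairs = []
for head, dep, rel in deps:
    pairs.append((head, f"{dep}/{rel}"))    # head sees the dependent as context
    pairs.append((dep, f"{head}/{rel}-1"))  # dependent sees the head, inverse-marked

# These pairs would then replace window contexts in a skip-gram trainer.
```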
45
Edge Weight Measurement: compute edge weights to represent relation importance.
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

[Figure: a graph with word nodes w1-w7 and slot nodes s1-s3, whose edge weights combine the relation scores above]
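The similarity-based weights can be computed as cosine similarity between the trained embeddings. A small sketch with made-up 3-dimensional vectors (real embeddings come from the training step above):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical slot embeddings, chosen so that the two food-domain slots
# point in a similar direction while the generic slot does not.
emb = {
    "expensiveness": [0.9, 0.1, 0.2],
    "locale_by_use": [0.8, 0.2, 0.3],
    "capability":    [0.1, 0.9, 0.1],
}
w_sem = cosine(emb["expensiveness"], emb["locale_by_use"])  # strong edge
w_gen = cosine(emb["expensiveness"], emb["capability"])     # weak edge
```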
46
Knowledge Graph Propagation Model: R_w^(SD) and R_s^(SD)

Word Relation Model (word relation matrix) × Slot Relation Model (slot relation matrix)

[Matrix illustration: the word/slot observation matrix for slot induction over the training and test utterances]

Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]: hidden semantics

Ontology Induction feeds the SLU model (Fw, Fs) together with Structure Learning.

[Matrix illustration: the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" provide observed word/slot features; for the test utterance "show me a list of cheap restaurants", probabilities such as .97, .90, .95, and .85 are estimated]

2nd Issue: unobserved semantics may benefit understanding.
48
Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model: R_w^(SD) and R_s^(SD)

Word Relation Model (word relation matrix) × Slot Relation Model (slot relation matrix)

[Matrix illustration: the word/slot observation matrix for slot induction, with estimated probabilities such as .97, .90, .95, .85, .93, .92, .98, and .05 filled in for the test utterance]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which makes it able to model hidden semantics and more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probabilities of the hidden semantics:

  M(|U| × (|W| + |S|)) ≈ U(|U| × d) × V(d × (|W| + |S|))

[Matrix illustration: observed word/slot features with reconstructed probabilities for the test utterance]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
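A minimal numeric sketch of the completion step, with a toy 2×3 matrix and plain SGD (the actual model is trained with the BPR objective on the next slide):

```python
# Toy low-rank completion: two utterances x three features, two cells unobserved.
M = [[1.0, 1.0, None],
     [None, 1.0, 1.0]]
d = 2  # latent dimension
U = [[0.1, 0.2], [0.2, 0.1]]             # utterance factors (2 x d)
V = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2]]   # feature factors (d x 3)

def pred(i, j):
    """Reconstructed score for cell (i, j) from the factor product."""
    return sum(U[i][k] * V[k][j] for k in range(d))

for _ in range(2000):  # SGD on observed cells only
    for i in range(2):
        for j in range(3):
            if M[i][j] is None:
                continue
            err = M[i][j] - pred(i, j)
            for k in range(d):
                U[i][k] += 0.05 * err * V[k][j]
                V[k][j] += 0.05 * err * U[i][k]

# Observed cells are reconstructed; the same product also scores missing cells.
```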
50
Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false);
- give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize Σ ln σ(f+ - f-), where f+ is the score of an observed slot x and f- the score of an unobserved one.
The objective is to learn a set of well-ranked semantic slots per utterance.
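The pairwise objective can be written as a tiny loss function; a sketch with toy scores (not the full learner, which would backpropagate this loss into the MF factors):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_loss(f_pos, f_neg):
    """Negative log-likelihood of ranking an observed slot above an unobserved one."""
    return -math.log(sigmoid(f_pos - f_neg))

# A model that scores the observed slot higher is rewarded with a lower loss.
```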
51
Matrix Factorization SLU (MF-SLU)

Ontology Induction feeds the SLU model (Fw, Fs) together with Structure Learning.

[Matrix illustration: given the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities such as .97, .90, .95, and .85]

MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]: summary

Input: user utterances
Output: semantic concepts included in each individual utterance

[Diagram: an unlabeled collection is processed by frame-semantic parsing; Ontology Induction over a semantic KG yields the Feature Model (Fw, Fs); Structure Learning over a lexical KG and a semantic KG yields the Knowledge Graph Propagation Model (Rw, Rs); MF-SLU combines both into the semantic representation, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup

Dataset: Cambridge University SLU Corpus (Henderson et al., 2012): restaurant recommendation, WER = 37%, 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
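The MAP metric averages, over all utterances, the average precision of the ranked slot list against that utterance's reference slots. A sketch:

```python
def average_precision(ranked_slots, reference):
    """AP of one utterance's ranked slot list against its reference slot set."""
    hits, ap = 0, 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in reference:
            hits += 1
            ap += hits / rank
    return ap / len(reference) if reference else 0.0

def mean_average_precision(ranked_lists, references):
    """MAP: mean of per-utterance AP values."""
    return sum(average_precision(r, g)
               for r, g in zip(ranked_lists, references)) / len(references)
```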
54
Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                       |  ASR  | Transcripts
Baseline SLU: Support Vector Machine           |  32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression  |  34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation (2)

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                         |  ASR           | Transcripts
Baseline SLU: Support Vector Machine             |  32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression    |  34.0          | 38.8
Proposed MF-SLU: Feature Model                   |  37.6          | 45.3
Proposed MF-SLU: Feature Model + KG Propagation  |  43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                               |  ASR           | Transcripts
Feature Model                          |  37.6          | 45.3
Feature + KG Propagation (Semantic)    |  41.4          | 51.6
Feature + KG Propagation (Dependency)  |  41.6          | 49.0
Feature + KG Propagation (All)         |  43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology connects locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring through dependencies such as PREP_FOR, NN, AMOD, and DOBJ; the reference ontology connects type, food, pricerange, task, and area through AMOD, DOBJ, and PREP_IN]

The automatically learned domain ontology aligns well with the reference ontology, which is annotated with the most frequent syntactic dependencies. The data-driven ontology is more objective, while the expert-annotated one is more subjective.
57
58
Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant" → intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight" → intent=reservation
60
SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
61
Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Matrix illustration: word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) for the test utterance "i would like to contact alex"; feature enrichment adds the semantic class "communication", and IR over app descriptions ("your email calendar contacts" for Outlook, "check and send emails msgs" for Gmail) supplies self-training utterances; reasoning with feature-enriched MF fills in estimated scores such as .90, .85, .97, and .95]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
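The IR step that retrieves app candidates from their store descriptions can be sketched as simple word-overlap ranking (the Outlook and Gmail descriptions come from the slide; the Camera entry and the scoring scheme are hypothetical stand-ins for a real retrieval model):

```python
apps = {
    "Outlook": "your email calendar contacts",
    "Gmail": "check and send emails msgs",
    "Camera": "take photos and record videos",  # hypothetical extra entry
}

def rank_apps(utterance):
    """Rank apps by word overlap between the request and each description."""
    words = set(utterance.lower().split())
    score = {app: len(words & set(desc.split())) for app, desc in apps.items()}
    return sorted(apps, key=lambda a: score[a], reverse=True)
```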
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, from 1) user preference and 2) app-level contexts.

Example: "send to vivian" could be handled by Email, Message, or another communication app; the previous turn disambiguates.
Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)

Input: multi-turn interaction
Output: apps the user plans to launch

[Matrix illustration: lexical features (photo, check, camera, tell, send) and intended apps for training dialogues such as "take this photo / tell vivian this is me in the lab" (CAMERA → IM) and "check my grades on website / send an email to professor" (CHROME → an email app), plus behavior-history features (null, camera, chrome); reasoning with feature-enriched MF fills in scores such as .85, .70, .95, .80, and .55 for the test dialogue "take a photo of this / send it to alice" (CAMERA → IM)]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
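The app-level context signal can be sketched as an extra additive feature: the previously launched app contributes weight to apps that typically follow it. All weights below are invented for illustration:

```python
# Hypothetical learned weights: word -> app and previous-app -> app.
word_w = {"send": {"IM": 0.6, "Email": 0.6}, "photo": {"Camera": 0.9}}
ctx_w = {"Camera": {"IM": 0.5}}  # photos are often shared via IM afterwards

def score(app, words, prev_app):
    """Additive intent score: lexical evidence plus behavioral context."""
    s = sum(word_w.get(w, {}).get(app, 0.0) for w in words)
    return s + ctx_w.get(prev_app, {}).get(app, 0.0)

# "send it to alice" is ambiguous between IM and Email on words alone;
# the Camera context from the previous turn breaks the tie toward IM.
```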
66
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    |  ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation  |  25.1    |             | 26.1            |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    |  ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  |  52.1     |             | 55.5             |

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised).
67
Experiments for Intent Prediction (2)

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    |  ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation  |  25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    |  ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  |  52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction (3)

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          |  ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                        |  25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics        |  32.0    |               | 33.3            |
Word + Type-Embedding-Based Semantics   |  31.5    |               | 32.9            |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             |  ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           |  52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns |  53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
-
37
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
[Pipeline diagram: an unlabeled collection of utterances (e.g., "can I have a cheap restaurant") passes through frame-semantic parsing; ontology induction yields the feature model (Fw, Fs), and structure learning over lexical/semantic knowledge graphs yields the knowledge graph propagation model (word relation model Rw, slot relation model Rs); MF-SLU combines both by matrix factorization to produce the semantic representation, e.g., target="restaurant", price="cheap"]
38
Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]
FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.
SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences
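To make the data structure concrete, here is a hypothetical, simplified rendering of a frame-semantic parse for the slide's "low fat milk" example; the dict layout and the `frame_names` helper are illustrative only, not SEMAFOR's actual output format.

```python
# Hypothetical, simplified frame-semantic parse of "low fat milk".
# Real SEMAFOR output is richer; this only illustrates the
# frame / frame-element structure described on the slide.
parse = {
    "tokens": ["low", "fat", "milk"],
    "frames": [
        {
            "name": "food",        # frame evoked by "milk"
            "target": (2, 3),      # token span of the frame-evoking word
            "elements": [
                {"name": "descriptor", "span": (0, 2)},  # "low fat"
            ],
        }
    ],
}

def frame_names(parse):
    """Return the names of all frames evoked in a parsed utterance."""
    return [f["name"] for f in parse["frames"]]
```

In the ontology-induction step that follows, each such frame name becomes a slot candidate.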
39
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
Example: "can i have a cheap restaurant" parses into the frames capability ("can i have"), expensiveness ("cheap"), and locale_by_use ("restaurant"); each frame is a slot candidate.
1st Issue: differentiate domain-specific frames (expensiveness, locale_by_use) from generic frames (capability) for SDSs
Das et al., "Frame-semantic parsing," Computational Linguistics, 2014
40
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
[Matrix figure: rows are utterances (train: "i would like a cheap restaurant", "find a restaurant with chinese food"; test: "show me a list of cheap restaurants"); columns are word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food), marked 1 when observed by frame-semantic parsing; the test row holds estimated slot probabilities]
Idea: increase weights of domain-specific slots and decrease weights of others
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: the word/slot observation matrix for the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" is multiplied by a word relation matrix and a slot relation matrix derived from the knowledge graph (nodes such as capability, locale_by_use, food, expensiveness, seeking, desiring, relational_quantity)]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
[Pipeline diagram repeated, highlighting the knowledge graph propagation model (word relation model Rw, slot relation model Rs) within the overall MF-SLU flow]
43
Knowledge Graph Construction: syntactic dependency parsing on utterances
[Figure: dependency parse of "can i have a cheap restaurant" (nsubj, dobj, det, amod, ccomp) with the evoked frames capability, expensiveness, and locale_by_use; a word-based lexical knowledge graph connects the words (can, i, have, a, cheap, restaurant), and a slot-based semantic knowledge graph connects the slots (capability, locale_by_use, expensiveness)]
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
[Figure: dependency-based word embeddings are trained from the dependency parse of "can i have a cheap restaurant" (vectors for words such as can, have), and dependency-based slot embeddings are trained analogously over the slot sequence (vectors for slots such as expensiveness, capability)]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014
45
Edge Weight Measurement: compute edge weights to represent relation importance
Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
[Figure: a lexical knowledge graph over words w1-w7 and a semantic knowledge graph over slots s1-s3, where the semantic and dependency edge weights between each pair are summed into a single edge weight]
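As a minimal sketch of the similarity-based weights, cosine similarity between embeddings can serve as the semantic-relation edge weight; the embedding values below are toy numbers, not trained vectors.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embeddings, used as an edge weight."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy dependency-based slot embeddings (illustrative values only).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0]),
    "food":          np.array([0.8, 0.3, 0.1]),
    "capability":    np.array([0.0, 0.2, 0.9]),
}

# Semantic-relation edge weights on the slot knowledge graph: the two
# domain-specific slots end up far more strongly connected than the
# domain-specific/generic pair.
w_domain  = cosine(emb["expensiveness"], emb["food"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
```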
46
Knowledge Graph Propagation Model
[Figure: the word/slot observation matrix for slot induction is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD)]
Structure information is integrated to make the self-training data more reliable
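A toy sketch of this propagation step follows; the relation weights are made up, and the point is only that multiplying by the relation matrix boosts densely connected, domain-specific nodes.

```python
import numpy as np

# Toy slot-relation matrix: slots 0 and 1 (say, expensiveness and
# locale_by_use) are strongly related to each other; slot 2 (a generic
# slot such as capability) only relates to itself. Weights are
# illustrative, not learned.
R = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

scores = np.array([1.0, 1.0, 1.0])  # initial slot-candidate scores
propagated = R @ scores             # one propagation (multiplication) step

# Domain-specific slots now outscore the generic one (approx. [1.9, 1.9, 1.0]).
```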
47
Semantic Decoding [ACL-IJCNLP'15]
[Figure: ontology induction (Fw, Fs) and structure learning feed the SLU matrix; for the test utterance "show me a list of cheap restaurants", some semantics remain unobserved (hidden semantics)]
2nd Issue: unobserved semantics may benefit understanding
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the word/slot observation matrix, integrated with the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD), is completed so that previously empty cells receive estimated probabilities]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which makes it able to model hidden semantics and more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of hidden semantics.
[Figure: the |U| × (|W|+|S|) matrix of word observations and slot candidates is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, so that empty test cells receive estimated probabilities]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009
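The completion idea can be sketched with a plain truncated SVD; note the paper trains the factors with BPR instead, and the matrix contents below are toy values.

```python
import numpy as np

# Toy utterance-by-(word + slot) matrix. Columns: three word
# observations followed by two slot candidates; 1 = observed.
# The third row is a test utterance whose slot entries are
# unobserved and should be recovered.
M = np.array([
    [1, 0, 1, 1, 0],   # train utterance, its slot (column 3) observed
    [0, 1, 1, 0, 1],   # train utterance, its slot (column 4) observed
    [1, 0, 1, 0, 0],   # test utterance: words only, slots hidden
], dtype=float)

# Rank-d approximation M ~ (|U| x d) @ (d x (|W|+|S|)), as on the slide.
d = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
completed = (U[:, :d] * s[:d]) @ Vt[:d, :]

# The test row now scores the slot shared with the similar training
# utterance (column 3) well above the unrelated slot (column 4).
```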
50
Bayesian Personalized Ranking for MF: model implicit feedback
Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.
Objective: maximize Σ ln σ(f⁺ − f⁻), where f⁺ is the score of an observed slot and f⁻ the score of an unobserved slot for utterance u_x
The objective is to learn a set of well-ranked semantic slots per utterance.
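A sketch of one BPR-style stochastic update under these assumptions; the latent vectors and learning rate are arbitrary, and the step is simply gradient ascent on the log-sigmoid of the pairwise margin.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(u, v_pos, v_neg, lr=0.05):
    """One pairwise update: raise the margin u.v_pos - u.v_neg.

    Unobserved facts are not pushed toward 0 (not treated as negatives);
    observed facts are only required to score higher than unobserved ones.
    """
    margin = u @ v_pos - u @ v_neg
    g = lr * sigmoid(-margin)   # gradient weight of ln sigmoid(margin)
    du, dp, dn = g * (v_pos - v_neg), g * u, -g * u
    u += du
    v_pos += dp
    v_neg += dn
    return margin

# Toy latent vectors for one utterance, one observed slot, one unobserved.
rng = np.random.default_rng(0)
u, v_pos, v_neg = rng.normal(scale=0.1, size=(3, 4))
first = bpr_step(u, v_pos, v_neg)
for _ in range(200):
    last = bpr_step(u, v_pos, v_neg)
```

After a few hundred updates the observed slot scores above the unobserved one, which is exactly the ranking the objective asks for.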
51
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction and structure learning feed the factorized matrix; the test-utterance row now carries estimated probabilities for its slot candidates]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
[Pipeline diagram repeated]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
53
Experimental Setup. Dataset: Cambridge University SLU Corpus
Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT 2012
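A standard sketch of the metric (the helper names are mine): each utterance's induced slots are ranked by estimated probability and scored against the reference slots, and MAP averages the per-utterance average precision.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked slot list against reference slots."""
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)

def mean_average_precision(ranked_lists, references):
    """MAP: mean of per-utterance average precision."""
    aps = [average_precision(r, set(ref))
           for r, ref in zip(ranked_lists, references)]
    return sum(aps) / len(aps)

# Slots ranked by estimated probability for one utterance vs. reference:
# hits at ranks 1 and 3 give AP = (1/1 + 2/3) / 2.
score = mean_average_precision(
    [["pricerange", "area", "food"]],   # induced ranking
    [["pricerange", "food"]],           # reference slots
)
```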
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                       | ASR  | Transcripts
Baseline SLU: Support Vector Machine           | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression  | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                            | ASR            | Transcripts
Baseline SLU: Support Vector Machine                | 32.5           | 36.6
Baseline SLU: Multinomial Logistic Regression       | 34.0           | 38.8
Proposed MF-SLU: Feature Model                      | 37.6           | 45.3
Proposed MF-SLU: Feature Model + KG Propagation     | 43.5 (+27.9%)* | 53.4 (+37.6%)*

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
* the result is significantly better than the MLR with p < 0.05 in t-test
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                  | ASR            | Transcripts
Feature Model                             | 37.6           | 45.3
Feature + KG Propagation (Semantic)       | 41.4           | 51.6
Feature + KG Propagation (Dependency)     | 41.6           | 49.0
Feature + KG Propagation (All)            | 43.5 (+15.7%)* | 53.4 (+17.9%)*

In the integrated structure information, both semantic and dependency relations are useful for understanding.
* the result is significantly better than the MLR with p < 0.05 in t-test
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the induced ontology (locale_by_use, food, expensiveness, seeking, desiring, relational_quantity, connected by dependencies such as AMOD, NN, DOBJ, PREP_FOR) shown beside the reference ontology (type, food, pricerange, task, area, connected by AMOD, DOBJ, PREP_IN), annotated with the most frequent syntactic dependencies]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
Example: "can i have a cheap restaurant" (price="cheap", target="restaurant") implies intent=navigation
Example: "i plan to dine in legume tonight" (restaurant="legume", time="tonight") implies intent=reservation
60
SDS Flowchart – Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted]
61
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent Identification: popular domains in Google Play
Example: "please dial a phone call to alex" maps to Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
63
Input: single-turn request
Output: apps that are able to support the required functionality
Intent Prediction – Single-Turn Request
[Figure: feature-enriched matrix. Rows are self-train utterances built from app descriptions retrieved by IR for app candidates (e.g., Outlook: "your email calendar contacts"; Gmail: "check and send emails, msgs") plus the test utterance "i would like to contact alex"; columns cover word observations (contact, message, email), enriched semantics (communication), and intended apps (Gmail, Outlook, Skype); reasoning with feature-enriched MF fills estimated probabilities]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
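One way to picture the enrichment is a matrix row that concatenates word observations, soft semantic scores, and app labels; the vocabularies and the `enrich` helper below are hypothetical, chosen only to show the layout.

```python
import numpy as np

# Hypothetical column blocks of the feature-enriched matrix.
vocab    = ["contact", "message", "email"]      # word observations
semantic = ["communication"]                    # enriched semantics
apps     = ["Gmail", "Outlook", "Skype"]        # intended apps

def enrich(words, inferred, labeled_apps):
    """Build one matrix row: words + soft semantic scores + app labels."""
    row = np.zeros(len(vocab) + len(semantic) + len(apps))
    for w in words:
        row[vocab.index(w)] = 1.0
    for name, prob in inferred.items():         # soft score, e.g. 0.9
        row[len(vocab) + semantic.index(name)] = prob
    for app in labeled_apps:
        row[len(vocab) + len(semantic) + apps.index(app)] = 1.0
    return row

# "i would like to contact alex", with inferred communication semantics
# and Skype as an intended app.
row = enrich(["contact"], {"communication": 0.9}, ["Skype"])
```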
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts
Example: "send to vivian" could mean Email or Message (Communication); the previous turn helps disambiguate
Idea: behavioral patterns in history can help intent prediction
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched matrix over dialogues. Train dialogues: "take this photo / tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website / send an email to professor" (CHROME, EMAIL); test dialogue: "take a photo of this / send it to alice". Columns cover lexical features (photo, check, camera, tell, send, IM), behavior history (null, camera, chrome, email), and the intended apps; reasoning with feature-enriched MF fills estimated probabilities]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP), baseline = LM-based IR model (unsupervised)
Feature Matrix    | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation  | 25.1    |             | 26.1            |

Multi-Turn Interaction: Mean Average Precision (MAP), baseline = Multinomial Logistic Regression (supervised)
Feature Matrix    | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  | 52.1     |             | 55.5             |
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix    | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation  | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix    | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0    |               | 33.3            |
Word + Type-Embedding-Based Semantics  | 31.5    |               | 32.9            |

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix                         | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation                       | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns             | 53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix                         | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation                       | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns             | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience, e.g., "call taxi")
Proactive Assistance: Inferences, User Modeling, Suggestions
Back-end: Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
72
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: find domains not covered by current systems but of interest to users, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.
75
Towards Unsupervised Deep Learning
[Network diagram: the word sequence x = w1, w2, ..., wd is mapped to word vectors l_w; a convolutional layer l_c (convolution matrix Wc) with a pooling operation yields the utterance vector and slot vectors l_f; a knowledge graph propagation layer l_p (propagation matrix Wp) and a semantic projection matrix Ws produce the semantic layer y, scoring R(U, S_1), ..., R(U, S_n) to output P(S_1 | U), ..., P(S_n | U) for slot candidates S_1, ..., S_n given utterance U]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
76
Take Home Message
Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI: mapping language to action, e.g., understanding voice commands to control music, lights, etc., or teaching the assistant to let friends in by face recognition.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
38
[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing
FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame
ldquolow fatrdquo fills the descriptor frame element
SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated
FrameNet sentences
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
39
Ontology Induction [ASRUrsquo13 SLTrsquo14a]
can i have a cheap restaurant
Frame capability
Frame expensiveness
Frame locale by use
1st Issue differentiate domain-specific frames from generic frames for SDSs
GoodGood
Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014
slot candidate
Best Student Paper Award
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
40
1
Utterance 1i would like a cheap restaurant Train
hellip hellip
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1 Test
1 97 95
Frame Semantic Parsing
show me a list of cheap restaurantsTest Utterance
Word Observation Slot Candidate
Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award
Idea increase weights of domain-specific slots and decrease weights of others
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
41
1st Issue How to adapt generic slots to a domain-specific setting
Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot CandidateTrain
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
Slot Induction
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication
i like
1 1
capability
1
locale_by_use
food expensiveness
seeking
relational_quantitydesiring
Utterance 1i would like a cheap restaurant
hellip hellip
find a restaurant with chinese foodUtterance 2
show me a list of cheap restaurantsTest Utterance
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
42
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
43
Knowledge Graph Construction Syntactic dependency parsing on utterances
ccomp
amoddobjnsubj det
can i have a cheap restaurantcapability expensiveness locale_by_use
Word-based lexical knowledge graph
Slot-based semantic knowledge graph
restaurantcan
have
i
acheap
w
w
capabilitylocale_by_use expensiveness
s
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)
can = have =
expensiveness = capability =
can i have a cheap restaurant
ccomp
amoddobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amoddobjnsubj det
Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
45
Edge Weight Measurement Compute edge weights to represent relation importance
Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings
+
+
w1
w2
w3
w4
w5
w6
w7
s2
s1 s3
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
46
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
Slot Induction
Knowledge Graph Propagation Model119877119908
119878119863
119877119904119878119863
Structure information is integrated to make the self-training data more reliable
Semantic Decoding [ACL-IJCNLP'15]
2nd Issue: unobserved semantics may benefit understanding
[Matrix figure: ontology induction provides observed word/slot features (F_w, F_s) for training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; the test utterance "show me a list of cheap restaurants" carries hidden semantics that are not directly observed]
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model (word relation matrix R_w^SD and slot relation matrix R_s^SD; slot induction)
[Matrix figure: MF fills the unobserved cells of the word-observation / slot-candidate matrix with probabilities such as .97, .90, .95, .85]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probability of hidden semantics.
[Matrix figure: the |U| x (|W|+|S|) observation matrix is approximated as |U| x d times d x (|W|+|S|)]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
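A minimal low-rank completion sketch, using truncated SVD as a stand-in for the MF model described here (the toy matrix and dimensions are invented):

```python
import numpy as np

# Toy utterance-by-(word+slot) observation matrix; the last row is a test
# utterance whose slot cells (last two columns) are unobserved (0).
M = np.array([[1., 0., 1., 1., 0.],
              [0., 1., 1., 0., 1.],
              [1., 0., 1., 0., 0.]])

d = 2  # latent dimension
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = (U[:, :d] * s[:d]) @ Vt[:d]   # rank-d completion of M

# Unobserved cells now carry scores inferred from the latent structure:
# the test utterance resembles row 0, so its cell for column 3 rises.
```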
Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: for each utterance u_x, maximize the sum of ln sigma(f+ - f-) over observed facts f+ and unobserved facts f-
The objective is to learn a set of well-ranked semantic slots per utterance.
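A hedged sketch of one BPR gradient step (vector sizes and hyperparameters are illustrative; the real model ranks observed slots above unobserved ones per utterance):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(u, v_pos, v_neg, lr=0.05, reg=0.01):
    """One stochastic ascent step on ln sigma(f+ - f-) with L2
    regularization, where f = <utterance vector, fact vector>."""
    g = sigmoid(-(u @ v_pos - u @ v_neg))   # how wrong the ranking still is
    du = g * (v_pos - v_neg) - reg * u
    dvp = g * u - reg * v_pos
    dvn = -g * u - reg * v_neg
    return u + lr * du, v_pos + lr * dvp, v_neg + lr * dvn

rng = np.random.default_rng(0)
u, v_pos, v_neg = (rng.normal(0, 0.1, 5) for _ in range(3))
gap_before = u @ v_pos - u @ v_neg
for _ in range(200):
    u, v_pos, v_neg = bpr_step(u, v_pos, v_neg)
gap_after = u @ v_pos - u @ v_neg   # observed fact now outscores unobserved
```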
Matrix Factorization SLU (MF-SLU)
[Matrix figure: ontology induction (F_w, F_s) and structure learning feed the MF model; for the test utterance "show me a list of cheap restaurants", MF fills slot-candidate probabilities such as .97, .90, .95, .85]
MF-SLU can estimate probabilities for slot candidates given test utterances.
Semantic Decoding [ACL-IJCNLP'15]
- Input: user utterances
- Output: semantic concepts included in each individual utterance
Example: "can I have a cheap restaurant" → target="restaurant", price="cheap"
Pipeline over an unlabeled collection: frame-semantic parsing → ontology induction (F_w, F_s; feature model) → knowledge graph propagation (word relation model over a lexical KG and slot relation model over a semantic KG: R_w, R_s) → structure learning → MF-SLU: SLU modeling by matrix factorization → semantic representation
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
Experimental Setup
- Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
- Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
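For reference, MAP over utterances can be computed as below (the standard definition, not code from the paper):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: ranked slot candidates vs. gold slot set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, gold_lists):
    aps = [average_precision(r, g) for r, g in zip(ranked_lists, gold_lists)]
    return sum(aps) / len(aps)

# One utterance: gold slots hit at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
map_score = mean_average_precision(
    [["expensiveness", "food", "locale_by_use"]],
    [{"expensiveness", "locale_by_use"}])   # ≈ 0.833
```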
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU)            ASR    Transcripts
Support Vector Machine             32.5   36.6
Multinomial Logistic Regression    34.0   38.8
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances

Approach                                          ASR             Transcripts
Baseline SLU: Support Vector Machine              32.5            36.6
Baseline SLU: Multinomial Logistic Regression     34.0            38.8
Proposed MF-SLU: Feature Model                    37.6            45.3
Proposed MF-SLU: Feature Model + KG Propagation   43.5 (+27.9%)   53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
(the result is significantly better than the MLR with p < 0.05 in t-test)
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances

Approach                                    ASR             Transcripts
Feature Model                               37.6            45.3
Feature + KG Propagation (Semantic)         41.4            51.6
Feature + KG Propagation (Dependency)       41.6            49.0
Feature + KG Propagation (All)              43.5 (+15.7%)   53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
(the result is significantly better than the MLR with p < 0.05 in t-test)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: induced ontology with slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring linked by typed dependencies (NN, AMOD, DOBJ, PREP_FOR); reference ontology with slots type, food, price range, task, area linked by the most frequent syntactic dependencies (AMOD, DOBJ, PREP_IN)]
The automatically learned domain ontology aligns well with the reference one. The data-driven one is more objective, while the expert-annotated one is more subjective.
Contributions of Semantic Decoding
[Flowchart: Ontology Induction and Structure Learning (Knowledge Acquisition) feed Semantic Decoding and Intent Prediction (SLU Modeling)]
- Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.
- MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
- "can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation
- "i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation
SDS Flowchart – Intent Prediction
[Flowchart: Ontology Induction and Structure Learning (Knowledge Acquisition); Semantic Decoding and Intent Prediction (SLU Modeling)]
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
- Input: spoken utterances making requests about launching an app
- Output: the apps supporting the required functionality
- Intent identification across popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request
- Input: single-turn request
- Output: apps that are able to support the required functionality
[Matrix figure: for the utterance "i would like to contact alex", word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype) form the matrix; IR over app descriptions (e.g. Outlook: "your email calendar contacts...", Gmail: "check and send emails msgs...") yields self-trained utterances; feature enrichment adds semantics such as "communication"; reasoning with feature-enriched MF fills probabilities such as .90, .85, .97, .95]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
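The enrichment idea can be sketched as mapping utterance words to nearby semantic types in embedding space; the tiny embedding table below is hypothetical:

```python
import numpy as np

# Hypothetical 2-d embeddings; in the talk, neural word embeddings link
# utterance words to app-related semantic types such as "communication".
emb = {
    "contact":       np.array([0.9, 0.1]),
    "communication": np.array([0.8, 0.2]),
    "photo":         np.array([0.1, 0.9]),
}

def enrich(word, types, k=1):
    """Return the k semantic types closest to `word` by cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sorted(types, key=lambda t: -cos(emb[word], emb[t]))[:k]

best = enrich("contact", ["communication", "photo"])   # -> ["communication"]
```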
Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Input: multi-turn interaction
- Output: apps the user plans to launch
- Challenge: language ambiguity: 1) user preference; 2) app-level contexts (e.g. "send to vivian" after a previous turn may map to Email, Message, or another Communication app)
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Input: multi-turn interaction
- Output: apps the user plans to launch
[Matrix figure: train dialogues combine lexical features (photo, check, camera, tell, send, ...) with behavior history (null, camera, chrome, email): "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on website" / "send an email to professor" → CHROME, EMAIL; for the test dialogue "take a photo of this" / "send it to alice" → CAMERA, IM, reasoning with feature-enriched MF fills probabilities such as .85, .70, .95, .80, .55]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
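A minimal sketch of how a turn's features might combine lexical cues with the previously launched app (the feature names are invented for illustration):

```python
def turn_features(words, prev_app):
    """Represent one turn as lexical features plus the previously launched
    app, so a model can learn transitions such as CAMERA -> IM."""
    feats = {f"w:{w}": 1.0 for w in words}
    feats[f"prev:{prev_app}"] = 1.0
    return feats

t1 = turn_features(["take", "photo"], prev_app="null")
t2 = turn_features(["send", "it", "to", "alice"], prev_app="CAMERA")
```

With such rows, the app columns are filled by the same matrix factorization machinery as before.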
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP); LM = LM-based IR model (unsupervised):
  Word Observation: LM 25.1 (ASR), 26.1 (Transcripts)
Multi-Turn Interaction, Mean Average Precision (MAP); MLR = Multinomial Logistic Regression (supervised):
  Word Observation: MLR 52.1 (ASR), 55.5 (Transcripts)
Experiments for Intent Prediction
Single-Turn Request (MAP):
  Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 → MF-SLU 30.4 (+16.4%) on Transcripts
Multi-Turn Interaction (MAP):
  Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 → MF-SLU 55.4 (-0.2%) on Transcripts
Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction
Single-Turn Request (MAP):
  Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) on ASR; 26.1 → 30.4 (+16.4%) on Transcripts
  Word + Embedding-Based Semantics: LM 32.0 (ASR), 33.3 (Transcripts)
  Word + Type-Embedding-Based Semantics: LM 31.5 (ASR), 32.9 (Transcripts)
Multi-Turn Interaction (MAP):
  Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) on ASR; 55.5 → 55.4 (-0.2%) on Transcripts
  Word + Behavioral Patterns: MLR 53.9 (ASR), 56.6 (Transcripts)
Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction
Single-Turn Request (MAP):
  Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) on ASR; 26.1 → 30.4 (+16.4%) on Transcripts
  Word + Embedding-Based Semantics: 32.0 → 34.2 (+6.8%) on ASR; 33.3 → 33.3 (-0.2%) on Transcripts
  Word + Type-Embedding-Based Semantics: 31.5 → 32.2 (+2.1%) on ASR; 32.9 → 34.0 (+3.4%) on Transcripts
Multi-Turn Interaction (MAP):
  Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) on ASR; 55.5 → 55.4 (-0.2%) on Transcripts
  Word + Behavioral Patterns: 53.9 → 55.7 (+3.3%) on ASR; 56.6 → 57.7 (+1.9%) on Transcripts
Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction
[Flowchart: Ontology Induction and Structure Learning (Knowledge Acquisition); Semantic Decoding and Intent Prediction (SLU Modeling)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience: "call taxi")
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data Back-end: Data Bases, Services and Client Signals
- Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
Conclusions
- The work shows the feasibility and the potential for improving generalization, maintenance efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
Future Work
- Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next domains to develop.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.
Towards Unsupervised Deep Learning
[Architecture figure: word sequence x = w1, w2, ..., wd → word vectors l_w → convolutional layer l_c (convolution matrix W_c) with pooling → utterance vector l_f and slot vectors l_f → semantic layer y (semantic projection matrix W_s) → knowledge graph propagation layer l_p (propagation matrix W_p) → relation scores R(U, S1), R(U, S2), ..., R(U, Sn) → posterior probabilities P(S1 | U), P(S2 | U), ..., P(Sn | U) over slot candidates S1 ... Sn for utterance U]
Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
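Viewing MF's prediction as a single layer can be sketched as follows (illustrative dimensions; a deeper model would stack nonlinear layers before this one):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, n_slots = 8, 4
rng = np.random.default_rng(0)
u = rng.normal(size=d)               # latent utterance vector
W = rng.normal(size=(n_slots, d))    # rows = slot latent vectors

# MF prediction = one linear layer plus a sigmoid over all slot candidates
p = sigmoid(W @ u)                   # P(slot | utterance) for each slot
```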
Take Home Message
- Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
- Language understanding for AI turns language into action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.
- Unsupervised or weakly-supervised methods will be the future trend.
- Deep language understanding is an emerging field.
Q & A -- Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
43
Knowledge Graph Construction Syntactic dependency parsing on utterances
ccomp
amoddobjnsubj det
can i have a cheap restaurantcapability expensiveness locale_by_use
Word-based lexical knowledge graph
Slot-based semantic knowledge graph
restaurantcan
have
i
acheap
w
w
capabilitylocale_by_use expensiveness
s
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)
can = have =
expensiveness = capability =
can i have a cheap restaurant
ccomp
amoddobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amoddobjnsubj det
Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
45
Edge Weight Measurement Compute edge weights to represent relation importance
Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings
+
+
w1
w2
w3
w4
w5
w6
w7
s2
s1 s3
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
46
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
Slot Induction
Knowledge Graph Propagation Model119877119908
119878119863
119877119904119878119863
Structure information is integrated to make the self-training data more reliable
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
47
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance hidden semantics
2nd Issue unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLPrsquo15]
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
48
Reasoning with Matrix Factorization
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
9790 9585
93 929805 05
Slot Induction
Feature Model + Knowledge Graph Propagation Model
119877119908119878119863
119877119904119878119863
Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
49
2nd Issue How to model the unobserved hidden semantics
Matrix Factorization (MF) (Rendle et al 2009)
The decomposed matrices represent latent semantics for utterances and wordsslots respectively
The product of two matrices fills the probability of hidden semantics
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
9790 9585
93 929805 05
|119932|
|119934|+|119930|
asymp|119932|times119941 119941times (|119934|+|119930|)times
Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
50
Bayesian Personalized Ranking for MF Model implicit feedback
not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts
Objective
1
119891 +iquest iquest119891 minus119891 minus
The objective is to learn a set of well-ranked semantic slots per utterance
119906119909
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
51
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
52
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
53
Experimental Setup Dataset Cambridge University SLU Corpus
Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type
Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots
Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
Approach ASR Transcripts
Baseline SLU
Support Vector Machine 325 366Multinomial Logistic Regression 340 388
Proposed MF-SLU
Feature Model 376 453
Feature Model +Knowledge Graph Propagation
435
(+279)534
(+376)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents
price="cheap", target="restaurant"
SLU Model
"can i have a cheap restaurant"
intent=navigation
restaurant="legume", time="tonight"
SLU Model
"i plan to dine in legume tonight"
intent=reservation
60
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart – Intent Prediction
61
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
62
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent Identification: popular domains in Google Play
"please dial a phone call to alex"
Skype, Hangout, etc.
Intent Prediction of Mobile Apps [SLT'14c]
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Input: single-turn request
Output: apps that are able to support the required functionality
Intent Prediction – Single-Turn Request
[Figure: feature-enriched MF for a single-turn request. The test utterance "i would like to contact alex" is enriched with semantics (e.g., communication, 0.90); IR retrieves app candidates from app descriptions (Outlook: "your email calendar contacts", Gmail: "check and send emails, msgs"); word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) form a matrix over self-train and test utterances for reasoning with feature-enriched MF, with estimated scores such as 0.90, 0.85, 0.97, 0.95.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity: 1) user preference; 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
[Figure: for "send to vivian", the previous turn's app helps disambiguate among Email / Message / Communication apps.]
Idea: behavioral patterns in history can help intent prediction.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched MF for a test dialogue "take a photo of this" / "send it to alice" (CAMERA, IM). Train dialogues pair user utterances with intended apps (e.g., "take this photo" / "tell vivian this is me in the lab" for CAMERA, IM; "check my grades on website" / "send an email to professor" for CHROME, EMAIL), using lexical features (photo, check, camera, tell, send) and behavior-history features (null, camera, chrome, chrome+email), with estimated scores such as 0.85, 0.70, 0.95, 0.80, 0.55.]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
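The joint matrix described above can be sketched as follows; the turns, feature names, and apps are illustrative toy values, not the AppDialogue corpus.

```python
# Build a binary turn-by-feature matrix joining lexical features,
# behavior-history features (the previously launched app), and the
# intended app, as input to a feature-enriched MF model.
turns = [
    {"words": ["take", "photo"],       "history": [],         "app": "CAMERA"},
    {"words": ["send", "it", "alice"], "history": ["CAMERA"], "app": "IM"},
    {"words": ["check", "grades"],     "history": [],         "app": "CHROME"},
]

features = sorted({w for t in turns for w in t["words"]}
                  | {"hist:" + h for t in turns for h in t["history"]}
                  | {"app:" + t["app"] for t in turns})
index = {f: i for i, f in enumerate(features)}

matrix = []
for t in turns:
    row = [0] * len(features)
    for f in t["words"] + ["hist:" + h for h in t["history"]] + ["app:" + t["app"]]:
        row[index[f]] = 1
    matrix.append(row)

print(len(matrix), "x", len(features))
```

Factorizing this matrix lets the model associate history features like `hist:CAMERA` with likely follow-up apps.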
66
Single-Turn Request: Mean Average Precision (MAP), LM-based IR model (unsupervised) as baseline
  Feature Matrix       ASR (LM / MF-SLU)    Transcripts (LM / MF-SLU)
  Word Observation     25.1 / -             26.1 / -

Multi-Turn Interaction: Mean Average Precision (MAP), multinomial logistic regression (supervised) as baseline
  Feature Matrix       ASR (MLR / MF-SLU)   Transcripts (MLR / MF-SLU)
  Word Observation     52.1 / -             55.5 / -
Experiments for Intent Prediction
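The unsupervised LM-based IR baseline scores apps by the likelihood of the request under each app description's unigram language model; a minimal sketch with toy descriptions and illustrative add-one smoothing:

```python
import math

# Unigram query-likelihood retrieval over app descriptions.
apps = {
    "Outlook": "your email calendar contacts".split(),
    "Gmail": "check and send emails msgs".split(),
    "Camera": "take photos and record videos".split(),
}
vocab = {w for words in apps.values() for w in words}

def score(query, words):
    # log P(query | app) with add-one smoothing over the app's unigram LM
    return sum(math.log((words.count(w) + 1) / (len(words) + len(vocab)))
               for w in query)

query = "send email".split()
ranked = sorted(apps, key=lambda a: score(query, apps[a]), reverse=True)
print(ranked[0])
```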
67
Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix       ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
  Word Observation     25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix       ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
  Word Observation     52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
68
Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix                           ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
  Word Observation                         25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / -                 33.3 / -
  Word + Type-Embedding-Based Semantics    31.5 / -                 32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix                           ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
  Word Observation                         52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns               53.9 / -                 56.6 / -
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
69
Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix                           ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
  Word Observation                         25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)      33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)      32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix                           ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
  Word Observation                         52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns               53.9 / 55.7 (+3.3%)      56.6 / 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
70
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
73
Conclusions: The work shows the feasibility of and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
74
Future Work: Apply the proposed technology to domain discovery.
Domains not covered by the current systems but that users are interested in can guide the next domains to develop.
Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge (from knowledge acquisition).
75
[Figure: a convolutional architecture in which an utterance U and slot candidates S1 … Sn are encoded from the word sequence x = w1 w2 … wd via word vectors lw, a convolutional layer lc (convolution matrix Wc), a pooling operation yielding utterance/slot vectors lf, a knowledge graph propagation layer lp (matrix Wp), and a semantic layer y (semantic projection matrix Ws), producing relevance scores R(U, S1), R(U, S2), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U).]
Towards Unsupervised Deep Learning
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
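The one-layer view can be made concrete: an MF score is a single dot product of latent vectors, and "adding layers" means transforming the utterance vector through nonlinearities first. All vectors and weights below are illustrative, untrained toy values.

```python
import math

# MF prediction as a one-layer "network": score = u . s
u = [0.2, 0.8]          # utterance latent vector (toy values)
s = [0.5, 0.4]          # slot latent vector (toy values)
mf_score = sum(a * b for a, b in zip(u, s))

# Adding a nonlinear hidden layer over the utterance vector before scoring.
W = [[0.3, -0.2], [0.1, 0.6]]
h = [math.tanh(sum(w * a for w, a in zip(row, u))) for row in W]
deep_score = sum(a * b for a, b in zip(h, s))
print(round(mf_score, 2), round(deep_score, 2))
```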
76
Take Home Message: available big data w/o annotations
Challenge: how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI: language → action, e.g., understand voice to control music, lights, etc.; teach it to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A: Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
40
[Figure: a word-observation / slot-candidate matrix built by frame-semantic parsing. Train utterances "i would like a cheap restaurant" (cheap, restaurant; expensiveness, locale_by_use) and "find a restaurant with chinese food" (restaurant, food; locale_by_use, food) have observed 1s; the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities (e.g., 0.97, 0.95).]
Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)
Idea: increase weights of domain-specific slots and decrease weights of others
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
Word Relation Model / Slot Relation Model: the word relation matrix and slot relation matrix are multiplied into the word-observation / slot-candidate matrix for slot induction.
[Figure: matrix over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants", with word features (i, like, cheap, restaurant, food) and slot candidates (capability, desiring, expensiveness, locale_by_use, food, seeking, relational_quantity).]
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
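The propagation step above can be sketched as one matrix-vector product; the slot names are from the slides, while the relation weights and observed scores are made-up toy values.

```python
# Propagating slot scores over a knowledge graph: multiplying by a
# row-normalized slot relation matrix lets well-connected (domain-specific)
# slots reinforce each other and spread score to unobserved neighbors.
slots = ["expensiveness", "locale_by_use", "food", "capability"]
R = [[1.0, 0.8, 0.6, 0.1],
     [0.8, 1.0, 0.7, 0.1],
     [0.6, 0.7, 1.0, 0.1],
     [0.1, 0.1, 0.1, 1.0]]
# row-normalize so each slot distributes a unit of score to its neighbors
R = [[v / sum(row) for v in row] for row in R]

scores = [1.0, 1.0, 0.0, 1.0]   # observed slot indicators for one utterance
propagated = [sum(r * s for r, s in zip(row, scores)) for row in R]
print([round(p, 2) for p in propagated])
```

Note how "food", unobserved in this utterance, picks up score from its strongly related neighbors.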
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model
target="restaurant", price="cheap"
"can I have a cheap restaurant"
Frame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology Induction (Fw, Fs)
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU: SLU Modeling by Matrix Factorization
Semantic Representation
43
Knowledge Graph Construction: syntactic dependency parsing on utterances
[Figure: the dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj), whose words evoke the frames capability, expensiveness, and locale_by_use, yields a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (nodes: capability, expensiveness, locale_by_use).]
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
[Figure: word vectors (e.g., for "can", "have") and slot vectors (e.g., for expensiveness, capability) are trained from the dependency-based contexts (nsubj, ccomp, det, amod, dobj) of the parsed utterance "can i have a cheap restaurant".]
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
45
Edge Weight Measurement: compute edge weights to represent relation importance
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
[Figure: a lexical graph over words w1–w7 and a semantic graph over slots s1–s3, whose edge weights combine the semantic and dependency scores.]
46
Knowledge Graph Propagation Model
[Figure: the word relation matrix Rw and slot relation matrix Rs multiply the word-observation / slot-candidate matrix (words: cheap, restaurant, food; slots: expensiveness, locale_by_use, food) for slot induction over train and test utterances.]
Structure information is integrated to make the self-training data more reliable
47
[Figure: Ontology Induction (Fw, Fs) and Structure Learning feed the SLU matrix over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants", whose estimated probabilities (e.g., 0.97, 0.90, 0.95, 0.85) cover hidden semantics.]
2nd Issue: unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLP'15]
48
Reasoning with Matrix Factorization
[Figure: feature model + knowledge graph propagation model. After factorization, the partially observed word-observation / slot-candidate matrix is completed with probabilities for hidden cells (e.g., 0.93, 0.92, 0.98, 0.05) in addition to the estimated test scores (0.97, 0.90, 0.95, 0.85).]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and wordsslots respectively
The product of two matrices fills the probability of hidden semantics
[Figure: the |U| × (|W|+|S|) matrix of word observations and slot candidates is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, filling hidden cells with probabilities such as 0.93, 0.92, 0.98, 0.05 alongside the observed 1s and test estimates 0.97, 0.90, 0.95, 0.85.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
50
Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts
Objective: maximize Σ_{u,x} ln σ(f⁺ − f⁻), where f⁺ is the score of an observed fact for utterance u and f⁻ the score of an unobserved one.
The objective is to learn a set of well-ranked semantic slots per utterance.
51
[Figure: the trained model (Ontology Induction: Fw, Fs; Structure Learning) over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" estimates slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85) for the test utterance "show me a list of cheap restaurants".]
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
SLU Model: target="restaurant", price="cheap" for "can I have a cheap restaurant"
[Flowchart as on the earlier Semantic Decoding slide: Frame-Semantic Parsing over an unlabeled collection feeds Ontology Induction (Fw, Fs) into the Feature Model; Word/Slot Relation Models (Rw, Rs) built from the lexical and semantic KGs via Structure Learning feed the Knowledge Graph Propagation Model; MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
53
Experimental Setup
Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
Approach                              ASR     Transcripts
Baseline SLU:
  Support Vector Machine              32.5    36.6
  Multinomial Logistic Regression     34.0    38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
Approach                                        ASR              Transcripts
Baseline SLU:
  Support Vector Machine                        32.5             36.6
  Multinomial Logistic Regression               34.0             38.8
Proposed MF-SLU:
  Feature Model                                 37.6             45.3
  Feature Model + Knowledge Graph Propagation   43.5 (+27.9%)    53.4 (+37.6%)
(the result is significantly better than the MLR baseline with p < 0.05 in a t-test)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
41
1st Issue: How to adapt generic slots to a domain-specific setting?
Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.
[Figure: a word observation / slot candidate matrix is multiplied by the word relation matrix and the slot relation matrix. Training rows come from Utterance 1, "i would like a cheap restaurant" (words: cheap, restaurant; slots: expensiveness, locale_by_use), and Utterance 2, "find a restaurant with chinese food" (words: restaurant, food; slots: locale_by_use, food); the test row comes from the test utterance "show me a list of cheap restaurants". Other induced slots shown include capability, seeking, relational_quantity, and desiring.]
Slot Induction
Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
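The propagation step described above can be sketched numerically. This is a minimal illustration: the vocabulary, observation rows, and relation weights are invented for the slide's example, not taken from the actual system.

```python
import numpy as np

# Toy vocabulary mirroring the slide's restaurant example (assumed).
words = ["cheap", "restaurant", "food", "show", "list"]

# Binary word-observation matrix: one row per utterance.
F_w = np.array([
    [1, 1, 0, 0, 0],  # "i would like a cheap restaurant"
    [0, 1, 1, 0, 0],  # "find a restaurant with chinese food"
    [1, 1, 0, 1, 1],  # "show me a list of cheap restaurants"
], dtype=float)

# Word relation matrix: affinities between graph neighbors
# (illustrative weights, not learned values).
R_w = np.eye(len(words))
R_w[0, 1] = R_w[1, 0] = 0.8  # cheap <-> restaurant
R_w[1, 2] = R_w[2, 1] = 0.6  # restaurant <-> food

# Propagation: multiplying by the relation matrix lets each node pass part
# of its score to its neighbors, boosting domain-specific words.
propagated = F_w @ R_w
```

After the multiplication, "food" receives a nonzero score in the first utterance even though it was never observed there, which is exactly the effect the slide describes.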
42
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Flowchart: an unlabeled collection of utterances (e.g., "can I have a cheap restaurant") is frame-semantically parsed; ontology induction builds the feature model (Fw, Fs) from a semantic KG, structure learning builds the knowledge graph propagation model (Rw, Rs) from lexical and semantic KGs, and MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g., target="restaurant", price="cheap".]
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
43
Knowledge Graph Construction: syntactic dependency parsing on utterances.
[Figure: the parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, and amod arcs; frames: capability, expensiveness, locale_by_use) yields a word-based lexical knowledge graph over the words and a slot-based semantic knowledge graph over the slots capability, locale_by_use, and expensiveness.]
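A minimal sketch of this construction, assuming the hand-written dependency parse shown on the slide; the graph representation and the word-to-frame mapping here are simplified stand-ins for the parser output.

```python
from collections import defaultdict

# Dependency edges for "can i have a cheap restaurant"
# (head, dependent, relation), mirroring the arcs drawn on the slide.
edges = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

# Word-based lexical knowledge graph: undirected edges between words.
word_graph = defaultdict(set)
for head, dep, _ in edges:
    word_graph[head].add(dep)
    word_graph[dep].add(head)

# Slot-based semantic knowledge graph: here slots are connected only when
# their trigger words are directly linked; the real system also follows
# longer dependency paths. Frame labels come from the slide.
word2slot = {"can": "capability", "cheap": "expensiveness",
             "restaurant": "locale_by_use"}
slot_graph = defaultdict(set)
for head, dep, _ in edges:
    if head in word2slot and dep in word2slot:
        slot_graph[word2slot[head]].add(word2slot[dep])
        slot_graph[word2slot[dep]].add(word2slot[head])
```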
44
Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
Dependency-based word embeddings and dependency-based slot embeddings are trained from the parsed utterances, so that each word (e.g., "can", "have") and each slot (e.g., expensiveness, capability) in "can i have a cheap restaurant" receives a vector from its dependency contexts.
Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014.
45
Edge Weight Measurement: compute edge weights to represent relation importance.
Slot-to-slot semantic relation: similarity between slot embeddings.
Slot-to-slot dependency relation: dependency score between slot embeddings.
Word-to-word semantic relation: similarity between word embeddings.
Word-to-word dependency relation: dependency score between word embeddings.
[Figure: a word graph over w1–w7 and a slot graph over s1–s3, whose edges combine the semantic and dependency weights.]
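The similarity-based weights can be illustrated as follows. The vectors are toy values standing in for dependency-based embeddings; they are not trained with the method above.

```python
import numpy as np

def cosine(u, v):
    # Similarity between two embedding vectors, used as an edge weight.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d vectors standing in for dependency-based slot embeddings
# (illustrative values only).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.2]),
    "locale_by_use": np.array([0.8, 0.2, 0.3]),
    "capability":    np.array([0.1, 0.9, 0.1]),
}

# Semantic slot-to-slot edge weight = similarity of the slot embeddings.
weight = cosine(emb["expensiveness"], emb["locale_by_use"])
```

With these values the two restaurant-domain slots get a heavier edge than an unrelated pair, which is the behavior the weighting is meant to capture.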
46
Knowledge Graph Propagation Model
[Figure: the train/test word observation / slot candidate matrix (words: cheap, restaurant, food, ...; slots: expensiveness, locale_by_use, food, ...) is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) for slot induction.]
Structure information is integrated to make the self-training data more reliable.
47
Semantic Decoding [ACL-IJCNLP'15]
[Figure: ontology induction feeds the feature matrices Fw and Fs into SLU, together with structure learning; training rows come from Utterance 1, "i would like a cheap restaurant", and Utterance 2, "find a restaurant with chinese food", while the test utterance "show me a list of cheap restaurants" carries hidden semantics (estimated values such as .97, .90, .95, .85).]
2nd Issue: unobserved semantics may benefit understanding.
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the word observation / slot candidate matrix, multiplied by the relation matrices R_w^(SD) and R_s^(SD), is completed for slot induction; missing cells receive estimated probabilities such as .97, .90, .95, .85, .93, .92, .98.]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots respectively.
The product of the two matrices fills in the probability of the hidden semantics:
M (|U| × (|W|+|S|)) ≈ U (|U| × d) × V (d × (|W|+|S|))
[Figure: the train/test word observation / slot candidate matrix with estimated probabilities filled into its previously missing cells.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.
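A small numerical sketch of the completion idea, using plain gradient-descent matrix factorization on a toy observation matrix. The real model factorizes the feature and propagation matrices above; the dimensions, learning rate, and cell values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Utterance-by-(word+slot) matrix: 1/0 = observed, NaN = hidden. The last
# row is a test utterance whose slot columns (3 and 4) are unobserved.
M = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, np.nan, np.nan],
])
mask = ~np.isnan(M)
target = np.nan_to_num(M)

d = 2                                    # latent dimension
U = rng.normal(scale=0.1, size=(3, d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(5, d))   # latent word/slot factors

# Gradient descent on squared error over observed cells only.
for _ in range(3000):
    E = np.where(mask, target - U @ V.T, 0.0)
    U_new = U + 0.05 * (E @ V - 0.01 * U)
    V_new = V + 0.05 * (E.T @ U - 0.01 * V)
    U, V = U_new, V_new

pred = U @ V.T  # hidden cells now hold estimated values
```

Because the test row's observed part matches the first training row, the low-rank assumption pushes its hidden slot cells toward 1, mirroring how the unobserved slot of the test utterance gets a high probability on the slide.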
50
Bayesian Personalized Ranking for MF: model implicit feedback.
Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, maximize the sum of ln σ(f+ − f−) over pairs of an observed fact f+ and an unobserved fact f−.
The objective is to learn a set of well-ranked semantic slots per utterance.
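The ranking objective can be illustrated through its per-pair loss. This is a generic BPR-style sketch, not the paper's exact formulation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bpr_loss(score_pos, score_neg):
    # Minimizing -ln sigma(f+ - f-) pushes an observed fact f+ above an
    # unobserved fact f-, without claiming that f- is actually false.
    return -math.log(sigmoid(score_pos - score_neg))
```

The loss shrinks as the observed fact outranks the unobserved one, and grows when the order is reversed, which is exactly the "well-ranked slots" criterion stated above.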
51
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction feeds Fw and Fs into SLU with structure learning; the matrix over the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" is completed with estimated slot probabilities such as .97, .90, .95, .85.]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Flowchart: the same pipeline as above: frame-semantic parsing over the unlabeled collection, ontology induction (Fw, Fs), structure learning (Rw, Rs), and MF-SLU produce the semantic representation target="restaurant", price="cheap" for "can I have a cheap restaurant".]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
53
Experimental Setup. Dataset: Cambridge University SLU Corpus.
Restaurant recommendation (WER = 37%); 2,166 dialogues with 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012.
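The MAP metric used in these experiments can be computed as follows; this is a standard definition sketch, and the slot names in the usage line are illustrative.

```python
def average_precision(ranked_slots, reference_slots):
    # AP for one utterance: precision at each rank where a reference slot
    # is retrieved, averaged over the reference slots.
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in reference_slots:
            hits += 1
            total += hits / rank
    return total / max(len(reference_slots), 1)

def mean_average_precision(rankings, references):
    pairs = list(zip(rankings, references))
    return sum(average_precision(r, ref) for r, ref in pairs) / len(pairs)

# Example: two reference slots, retrieved at ranks 1 and 3.
ap = average_precision(["food", "area", "pricerange"], {"food", "pricerange"})
```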
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the reference ontology with the most frequent syntactic dependencies: induced slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) linked by dependencies such as PREP_FOR, PREP_IN, NN, AMOD, and DOBJ, aligned with reference slots (type, food, price range, task, area).]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction).]
Ontology induction and structure learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for semantic decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.
Example: "can i have a cheap restaurant" → SLU model → price="cheap", target="restaurant"; intent=navigation.
Example: "i plan to dine in legume tonight" → SLU model → restaurant="legume", time="tonight"; intent=reservation.
60
SDS Flowchart – Intent Prediction
[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction), with intent prediction highlighted.]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: for the test utterance "i would like to contact alex", IR retrieves app candidates from app descriptions (App Desc, e.g., Outlook: "your email calendar contacts"; Gmail: "check and send emails msgs") to build self-train utterances; the word observation / intended app matrix over words (contact, message, email, ...) and apps (Gmail, Outlook, Skype, ...) is enriched with semantic features (e.g., communication: .90) and completed by reasoning with feature-enriched MF (estimated scores such as .85, .97, .95).]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
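The retrieval of app candidates can be sketched with a simple word-overlap score. This is a simplified stand-in for the LM-based IR model; the Outlook and Gmail snippets appear on the slide, while the Skype description and the query wording are invented for the example.

```python
# Hypothetical app-description index (Outlook/Gmail text from the slide,
# Skype invented for illustration).
descriptions = {
    "Outlook": "send email and manage your calendar contacts",
    "Gmail": "check and send emails msgs",
    "Skype": "voice and video calls with messaging",
}

def overlap_score(query, description):
    # Fraction of query words that appear in the description.
    q, d = set(query.split()), set(description.split())
    return len(q & d) / len(q)

query = "i would like to send an email to alex"
ranked = sorted(descriptions,
                key=lambda app: overlap_score(query, descriptions[app]),
                reverse=True)
```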
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, from 1) user preference and 2) app-level contexts. For example, "send to vivian" could mean Email or Message (Communication), depending on the previous turn.
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: training dialogues pair user utterances with intended apps, e.g., "take this photo" / "tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website" / "send an email to professor" → CHROME, EMAIL; lexical features (photo, check, camera, tell, send, ...) and behavior-history features (null, camera, chrome, email) form a feature matrix, and reasoning with feature-enriched MF estimates app probabilities (e.g., .85, .70, .95, .80, .55) for the test dialogue "take a photo of this" / "send it to alice" → CAMERA, IM.]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
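Feature enrichment with behavioral context can be sketched as below: the lexical features of the current turn are joined with the app launched in the previous turn. The feature names and the dictionary representation are illustrative, not the paper's exact design.

```python
def featurize(utterance, previous_app):
    # Sparse feature vector: bag-of-words features for the current turn,
    # plus one behavior-history feature for the previously launched app.
    feats = {f"word={w}": 1.0 for w in utterance.split()}
    feats[f"prev_app={previous_app}"] = 1.0
    return feats

# Second turn of the test dialogue from the slide: "send it to alice"
# arriving right after the CAMERA app was used.
turn = featurize("send it to alice", "CAMERA")
```

The joint vector lets the model learn, for example, that "send it" after CAMERA tends to mean an IM app rather than Email.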
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP), Word Observation features: LM-based IR model (unsupervised): 25.1 on ASR, 26.1 on transcripts.
Multi-Turn Interaction, Mean Average Precision (MAP), Word Observation features: Multinomial Logistic Regression (supervised): 52.1 on ASR, 55.5 on transcripts.
67
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix   | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix   | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    |               | 33.3            |
Word + Type-Embedding-Based Semantics | 31.5    |               | 32.9            |

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction), with intent prediction highlighted.]
Feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
[Diagram: the user experience (e.g., "call taxi") reaches device/service end-points (phone, PC, Xbox, web browser, messaging apps); reactive assistance runs the ASR → LU → Dialog → LG → TTS pipeline, while proactive assistance performs inferences, user modeling, and suggestions, both backed by data bases, back-end services, and client signals.]
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next domains to develop.
Improve the proposed approach by handling the uncertainty in SLU modeling: recognition errors from ASR and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
[Figure: a convolutional architecture: the word sequence x (w1, w2, ..., wd) is mapped to word vectors lw; a convolutional layer lc (convolution matrix Wc) and a pooling operation produce the utterance vector lf and slot vectors lf; a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws yield the semantic layer y, scoring R(U, S1), ..., R(U, Sn) and the posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates.]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
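The stacking idea can be sketched as a tiny forward pass: one projection playing the role of the MF layer, plus one added layer producing per-slot probabilities. Sizes and weights are random placeholders, not the proposed architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random(8)             # bag-of-words features of one utterance
W1 = rng.normal(size=(8, 4))  # first projection (the MF-like layer)
W2 = rng.normal(size=(4, 3))  # an added layer, one output per slot

h = np.maximum(x @ W1, 0.0)   # ReLU hidden representation
p = sigmoid(h @ W2)           # scores in (0, 1), one per slot candidate
```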
76
Take Home Message
Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or being taught to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A. Thanks for your attention!
References:
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award).
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics (Matrix Factorization)
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
42
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
43
Knowledge Graph Construction Syntactic dependency parsing on utterances
ccomp
amoddobjnsubj det
can i have a cheap restaurantcapability expensiveness locale_by_use
Word-based lexical knowledge graph
Slot-based semantic knowledge graph
restaurantcan
have
i
acheap
w
w
capabilitylocale_by_use expensiveness
s
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)
can = have =
expensiveness = capability =
can i have a cheap restaurant
ccomp
amoddobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amoddobjnsubj det
Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
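The dependency-based training scheme above replaces linear word windows with syntactic neighbours. A minimal sketch of how the (word, context) training pairs are derived from a parse, following Levy and Goldberg's context definition; the hand-written parse below is the talk's example utterance, and the pair format is illustrative:

```python
# Hand-written dependency triples for "can i have a cheap restaurant"
# (head, relation, dependent); this replaces a real parser's output.
deps = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

def dependency_contexts(parsed):
    """Yield (word, context) pairs for skip-gram-style embedding training:
    each word's contexts are its syntactic neighbours, labelled with the
    dependency relation ('-1' marks the inverse direction)."""
    pairs = []
    for head, rel, dep in parsed:
        pairs.append((head, f"{dep}/{rel}"))      # head sees its dependent
        pairs.append((dep, f"{head}/{rel}-1"))    # dependent sees its head
    return pairs

pairs = dependency_contexts(deps)
```

With these pairs, "cheap" trains against the context "restaurant/amod-1" rather than against whatever happens to sit nearby in the sentence, which is what makes the resulting embeddings syntax-aware.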
45
Edge Weight Measurement: compute edge weights to represent relation importance
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

[Diagram: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges combining the two relation types.]
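The semantic-relation weights in the list above are plain embedding similarities. A hedged sketch with made-up 3-dimensional vectors (the talk's actual embeddings are dependency-based and much higher-dimensional):

```python
import numpy as np

# Toy slot embeddings; values are invented for illustration only.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0]),
    "pricerange":    np.array([0.8, 0.2, 0.1]),
    "food":          np.array([0.0, 0.9, 0.3]),
}

def cosine(a, b):
    """Cosine similarity, used as the semantic-relation edge weight."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Edge weight for the slot-to-slot semantic relation:
w_sem = cosine(emb["expensiveness"], emb["pricerange"])
```

Related slots (expensiveness, pricerange) get a heavy edge, unrelated ones (expensiveness, food) a light one, so propagation over the graph favours semantically coherent neighbours.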
46
Knowledge Graph Propagation Model

[Matrix diagram: the utterance-by-feature matrix from slot induction (word observations: cheap, restaurant, food; slot candidates: expensiveness, locale_by_use, food; binary train and test cells) is multiplied by a word relation matrix R_w and a slot relation matrix R_s derived from the knowledge graphs.]
Structure information is integrated to make the self-training data more reliable
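The propagation step can be pictured as a single matrix product: observed feature mass flows to related slots through the relation matrix. A toy sketch with invented matrices (not the paper's learned weights):

```python
import numpy as np

# Utterance-by-slot-candidate observations (two utterances, three slots).
F = np.array([[1., 0., 0.],
              [0., 1., 0.]])

# Slot-to-slot relation weights from the semantic knowledge graph:
# slots 0 and 1 are strongly related, slot 2 is isolated.
R_s = np.array([[1., .8, 0.],
                [.8, 1., 0.],
                [0., 0., 1.]])

# One propagation step: related slots receive mass from observed ones.
propagated = F @ R_s
```

Here utterance 0 only observed slot 0, but after propagation slot 1 gets score 0.8 while the unrelated slot 2 stays at 0, which is how structure information makes the self-training data more reliable.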
47
Semantic Decoding [ACL-IJCNLP'15]

[Matrix diagram: ontology induction (SLU features Fw, Fs) and structure learning fill slot-candidate columns (expensiveness, locale_by_use, food) for training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", observed cells are 1 and the remaining cells hold estimated probabilities (e.g., .97, .90, .95, .85) for the hidden semantics.]

2nd issue: unobserved semantics may benefit understanding.
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model

[Matrix diagram: the feature model (word observations: cheap, restaurant, food; slot candidates from slot induction: expensiveness, locale_by_use, food) is combined with the knowledge graph propagation model (word relation matrix R_w, slot relation matrix R_s); train cells are 1s, and test cells hold estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

Idea: MF completes a partially missing matrix under a low-rank latent-semantics assumption, which models hidden semantics and is more robust to noisy data.
49
2nd issue: how to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probabilities of the hidden semantics.
[Matrix diagram: the utterance-by-(word + slot) observation matrix, with train 1s and estimated test probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05), is factorized as M(|U| × (|W|+|S|)) ≈ U(|U| × d) · V(d × (|W|+|S|)), where d is the latent dimension.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
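The low-rank completion idea can be sketched with plain gradient descent on a toy matrix. This is an illustrative sketch only: the data, latent dimension, learning rate, and squared-error loss are assumptions for the demo, not the talk's implementation (which uses a BPR objective, shown on the next slide):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy utterance-by-feature matrix: 1 = observed fact, 0 = unobserved
# (unobserved is NOT negative -- we only fit the observed cells).
M = np.array([[1., 1., 0., 1.],
              [1., 0., 1., 1.],
              [1., 1., 0., 0.]])
d = 2                                              # latent dimension
U = rng.normal(scale=0.1, size=(M.shape[0], d))    # utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], d))    # word/slot factors

obs = M > 0                                        # mask of observed cells
for _ in range(3000):
    E = (U @ V.T - M) * obs        # error restricted to observed entries
    U -= 0.05 * (E @ V)            # gradient step on utterance factors
    V -= 0.05 * (E.T @ U)          # gradient step on word/slot factors

completed = U @ V.T                # product fills in hidden-semantics scores
```

After fitting, the product reproduces the observed 1s and also assigns scores to the blank cells, which is exactly the "fill the probability of hidden semantics" behaviour described above.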
50
Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts

Objective: maximize Σ_u Σ_(f+, f−) ln σ(f+ − f−), where f+ is the model score of an observed fact and f− the score of an unobserved fact for utterance u.
The objective is to learn a set of well-ranked semantic slots per utterance.
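The pairwise objective above can be written out directly. A hedged sketch of the BPR-style criterion (function and slot names are illustrative; a real implementation would optimize this with stochastic gradient ascent over sampled pairs):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_objective(scores, observed, unobserved):
    """Sum of ln sigma(f+ - f-) over pairs of observed and unobserved
    slots for one utterance; larger is better."""
    total = 0.0
    for fp in observed:
        for fn in unobserved:
            total += np.log(sigmoid(scores[fp] - scores[fn]))
    return total

# Toy model scores for one utterance's slot candidates.
scores = {"expensiveness": 0.9, "locale_by_use": 0.8, "food": 0.1}
value = bpr_objective(scores, ["expensiveness", "locale_by_use"], ["food"])
```

The objective rewards a model only for ranking each observed slot above each unobserved one, so unobserved cells are never forced to zero, matching the "implicit feedback" bullet above.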
51
[Matrix diagram: as on the previous slides, ontology induction (Fw, Fs) and structure learning build the feature matrix over training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", MF fills estimated slot probabilities (e.g., .97, .90, .95, .85).]
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model: "can I have a cheap restaurant" → target="restaurant", price="cheap"

[Architecture diagram, repeated: frame-semantic parsing over an unlabeled collection; ontology induction (feature model Fw, Fs); knowledge graph propagation model (word relation model Rw over a lexical KG, slot relation model Rs over semantic KGs) from structure learning; MF-SLU, SLU modeling by matrix factorization, producing the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
53
Experimental Setup
Dataset: Cambridge University SLU Corpus (Henderson et al., 2012)
- restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
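The MAP metric used throughout these experiments averages, over utterances, the average precision of each ranked slot list against the reference slots. A small self-contained sketch (the toy rankings are invented):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i          # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    return sum(average_precision(r, ref)
               for r, ref in zip(rankings, references)) / len(rankings)

# Toy example: two utterances with induced slot rankings.
rankings = [["price", "area", "food"], ["food", "phone"]]
references = [{"price", "food"}, {"food"}]
score = mean_average_precision(rankings, references)
```

For the first utterance AP = (1/1 + 2/3) / 2 ≈ 0.833 and for the second AP = 1.0, so MAP ≈ 0.917; the tables that follow report this quantity as a percentage.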
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU)         | ASR  | Transcripts
Support Vector Machine          | 32.5 | 36.6
Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                       | ASR          | Transcripts
Baseline SLU: Support Vector Machine                           | 32.5         | 36.6
Baseline SLU: Multinomial Logistic Regression                  | 34.0         | 38.8
Proposed MF-SLU: Feature Model                                 | 37.6         | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation   | 43.5 (+27.9) | 53.4 (+37.6)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results. (The result is significantly better than the MLR with p < 0.05 in a t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                           | ASR          | Transcripts
Feature Model                                      | 37.6         | 45.3
Feature + Knowledge Graph Propagation (Semantic)   | 41.4         | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6         | 49.0
Feature + Knowledge Graph Propagation (All)        | 43.5 (+15.7) | 53.4 (+17.9)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (The result is significantly better than the MLR with p < 0.05 in a t-test.)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.

[Diagram: the induced ontology links slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via dependencies (PREP_FOR, NN, AMOD, DOBJ), shown beside the reference ontology linking type, food, pricerange, task, and area with the most frequent syntactic dependencies (AMOD, DOBJ, PREP_IN).]

The automatically learned domain ontology aligns well with the reference one.
57
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction); this part targets Intent Prediction.]
61
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality
Intent identification: popular domains in Google Play
Example: "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: a single-turn request
Output: apps that are able to support the required functionality

[Matrix diagram: for the test utterance "i would like to contact alex", IR retrieves app-candidate descriptions (e.g., Gmail: "check and send emails, msgs"; Outlook: "your email, calendar, contacts") as self-train utterances; feature enrichment adds semantics such as "communication"; word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) form the matrix, and reasoning with feature-enriched MF fills probabilities (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
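The first stage of the single-turn pipeline retrieves app candidates whose store descriptions match the request. A bag-of-words retrieval sketch under stated assumptions: the descriptions and scoring are hypothetical stand-ins for the real Google Play text and the talk's IR model:

```python
# Hypothetical app descriptions standing in for Google Play store text.
descriptions = {
    "Gmail":   "check and send emails and messages",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record video",
}

def retrieve(utterance, k=2):
    """Rank apps by word overlap between the request and each description;
    a real system would use an LM-based IR model instead."""
    words = set(utterance.split())
    scored = {app: len(words & set(desc.split()))
              for app, desc in descriptions.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

apps = retrieve("i would like to send an email to alex")
```

The retrieved candidates then populate the "self-train utterance" rows of the matrix, where MF reasoning refines the ranking.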
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: a multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity (1. user preference; 2. app-level contexts)
Example: "send to vivian", given the previous turn as context, may mean Email, Message, or another Communication app.
Idea: behavioral patterns in the history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
Input: a multi-turn interaction
Output: apps the user plans to launch

[Matrix diagram: lexical features (photo, check, camera, tell, send) and behavior-history features (null, camera, chrome, chrome email) jointly predict the intended app per turn. Training dialogues: "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL. For the test dialogue "take a photo of this / send it to alice", reasoning with feature-enriched MF estimates probabilities (e.g., .85, .70, .95, .80, .55) over intended apps.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
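The feature enrichment described above can be sketched as simple feature concatenation: a turn is represented by its words plus app-level context from the behavior history, so the same utterance can map to different apps for different users. Feature names below are illustrative, not the paper's exact scheme:

```python
def featurize(words, previous_apps):
    """Represent one turn by lexical features plus behavior-history
    features; both feed the feature-enriched MF model's columns."""
    feats = {f"w:{w}": 1 for w in words}                 # lexical features
    feats.update({f"hist:{a}": 1 for a in previous_apps})  # app-level context
    return feats

# "send it to alice" after a CAMERA turn: history disambiguates the intent.
turn = featurize(["send", "it", "to", "alice"], previous_apps=["camera"])
```

With `hist:camera` active, the ambiguous verb "send" is pulled toward the IM app rather than Email, which is the behavioral-pattern effect the slide illustrates.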
66
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised):
Feature Matrix   | ASR (LM) | Transcripts (LM)
Word Observation | 25.1     | 26.1

Multi-Turn Interaction, Mean Average Precision (MAP), Multinomial Logistic Regression (supervised):
Feature Matrix   | ASR (MLR) | Transcripts (MLR)
Word Observation | 52.1      | 55.5
67
Experiments for Intent Prediction (2)

Single-Turn Request, MAP:
Feature Matrix   | ASR (LM / MF-SLU)   | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2) | 26.1 / 30.4 (+16.4)

Multi-Turn Interaction, MAP:
Feature Matrix   | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2) | 55.5 / 55.4 (-0.2)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction (3)

Single-Turn Request, MAP:
Feature Matrix                        | ASR (LM / MF-SLU)   | Transcripts (LM / MF-SLU)
Word Observation                      | 25.1 / 29.2 (+16.2) | 26.1 / 30.4 (+16.4)
Word + Embedding-Based Semantics      | 32.0 / -            | 33.3 / -
Word + Type-Embedding-Based Semantics | 31.5 / -            | 32.9 / -

Multi-Turn Interaction, MAP:
Feature Matrix             | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation           | 52.1 / 52.7 (+1.2) | 55.5 / 55.4 (-0.2)
Word + Behavioral Patterns | 53.9 / -           | 56.6 / -

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction (4)

Single-Turn Request, MAP:
Feature Matrix                        | ASR (LM / MF-SLU)   | Transcripts (LM / MF-SLU)
Word Observation                      | 25.1 / 29.2 (+16.2) | 26.1 / 30.4 (+16.4)
Word + Embedding-Based Semantics      | 32.0 / 34.2 (+6.8)  | 33.3 / 33.3 (-0.2)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1)  | 32.9 / 34.0 (+3.4)

Multi-Turn Interaction, MAP:
Feature Matrix             | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation           | 52.1 / 52.7 (+1.2) | 55.5 / 55.4 (-0.2)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3) | 56.6 / 57.7 (+1.9)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience, e.g., "call taxi")
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Back-end: Data Bases, Services, and Client Signals
- Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
72
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
73
Conclusions
- The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
- Apply the proposed technology to domain discovery: identify domains not covered by current systems but of interest to users, to guide which domains are developed next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.
75
Towards Unsupervised Deep Learning

[Network diagram: a word sequence x (w1, w2, ..., wd) maps to word vectors l_w; a convolution matrix W_c yields the convolutional layer l_c; a pooling operation yields the utterance vector l_f (and slot vectors l_f for slot candidates S1, S2, ..., Sn); a semantic projection matrix W_s yields the semantic layer y; a knowledge graph propagation matrix W_p yields the propagation layer l_p; relation scores R(U, S1), ..., R(U, Sn) produce posteriors P(S1 | U), ..., P(Sn | U).]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
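The "MF as a one-layer net" view can be made concrete: the MF prediction is a single linear map from utterance features to slot scores, and stacking a nonlinearity plus another layer moves toward the deeper architecture sketched on this slide. This is a conceptual sketch with random toy weights, not the talk's exact model:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)            # utterance feature vector

# One-layer view of MF: x @ (U-factors . V-factors) is just x @ W.
# Adding a hidden layer with a nonlinearity gives a deeper model:
W1 = rng.normal(size=(4, 8))      # first projection (like MF's latent factors)
W2 = rng.normal(size=(8, 3))      # second layer toward slot scores

h = np.maximum(0, x @ W1)         # hidden layer with ReLU
scores = h @ W2                   # pre-sigmoid scores for 3 slot candidates
```

Dropping the ReLU collapses the two layers back into a single matrix, recovering plain MF; the nonlinearity is what the extra depth buys.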
76
Take Home Message
- Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
- Language understanding for AI, from language to action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A: THANKS FOR YOUR ATTENTION
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
43
Knowledge Graph Construction Syntactic dependency parsing on utterances
ccomp
amoddobjnsubj det
can i have a cheap restaurantcapability expensiveness locale_by_use
Word-based lexical knowledge graph
Slot-based semantic knowledge graph
restaurantcan
have
i
acheap
w
w
capabilitylocale_by_use expensiveness
s
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
44
Dependency-based word embeddings
Dependency-based slot embeddings
Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)
can = have =
expensiveness = capability =
can i have a cheap restaurant
ccomp
amoddobjnsubj det
have acapability expensiveness locale_by_use
ccomp
amoddobjnsubj det
Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
45
Edge Weight Measurement Compute edge weights to represent relation importance
Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings
+
+
w1
w2
w3
w4
w5
w6
w7
s2
s1 s3
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
46
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
Slot Induction
Knowledge Graph Propagation Model119877119908
119878119863
119877119904119878119863
Structure information is integrated to make the self-training data more reliable
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
47
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance hidden semantics
2nd Issue unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLPrsquo15]
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
48
Reasoning with Matrix Factorization
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
9790 9585
93 929805 05
Slot Induction
Feature Model + Knowledge Graph Propagation Model
119877119908119878119863
119877119904119878119863
Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
49
2nd Issue How to model the unobserved hidden semantics
Matrix Factorization (MF) (Rendle et al 2009)
The decomposed matrices represent latent semantics for utterances and wordsslots respectively
The product of two matrices fills the probability of hidden semantics
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
9790 9585
93 929805 05
|119932|
|119934|+|119930|
asymp|119932|times119941 119941times (|119934|+|119930|)times
Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
50
Bayesian Personalized Ranking for MF Model implicit feedback
not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts
Objective
1
119891 +iquest iquest119891 minus119891 minus
The objective is to learn a set of well-ranked semantic slots per utterance
119906119909
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
51
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
52
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
53
Experimental Setup Dataset Cambridge University SLU Corpus
Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type
Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots
Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
Approach ASR Transcripts
Baseline SLU
Support Vector Machine 325 366Multinomial Logistic Regression 340 388
Proposed MF-SLU
Feature Model 376 453
Feature Model +Knowledge Graph Propagation
435
(+279)534
(+376)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF. App descriptions ("... your email calendar contacts ...", "... check and send emails, msgs ..." for Outlook and Gmail) are used by IR to retrieve app candidates and self-train utterances; feature enrichment adds semantic features such as "communication"; the resulting word-observation / intended-app matrix links the test utterance, Utterance 1 "i would like to contact alex", and words such as contact, message, and email to intended apps (Gmail, Outlook, Skype) with graded scores.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
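To make the matrix-factorization reasoning above concrete, here is a minimal sketch on a hypothetical toy matrix (not the paper's data or model): a low-rank reconstruction assigns graded scores to unobserved utterance-app cells, which can then be ranked per utterance.

```python
import numpy as np

# Toy utterance-by-feature matrix: columns 0-2 are word features,
# columns 3-4 are intended-app features (hypothetical data, 1 = observed fact).
M = np.array([
    [1, 1, 0, 1, 0],   # training utterance with its intended app observed
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],   # test utterance: app columns are unobserved
], dtype=float)

d = 2  # latent dimension (low-rank assumption)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
scores = (U[:, :d] * s[:d]) @ Vt[:d]   # rank-d reconstruction of the matrix

# Rank the app columns (3 and 4) for the test utterance (row 3):
# unobserved cells now carry graded scores instead of hard zeros.
app_scores = scores[3, 3:]
best_app = int(np.argmax(app_scores))
```

The full method trains the factors with implicit-feedback ranking rather than SVD, but the shape of the idea is the same: the product of two low-rank matrices fills in the missing cells.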
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. Helpful cues: 1) user preference, 2) app-level contexts (e.g., "send to vivian" may mean Email or Message within the Communication domain, depending on the previous turn).
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF over dialogues. Training dialogues pair user utterances with intended apps and behavior history, e.g., "take this photo" / "tell vivian this is me in the lab" -> CAMERA, IM; "check my grades on website" / "send an email to professor" -> CHROME, EMAIL. Lexical features (photo, check, camera, tell, send, ...) and behavior-history features (null, camera, chrome, email, ...) are filled in with graded scores. Test dialogue: "take a photo of this" / "send it to alice" -> CAMERA, IM.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
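A tiny sketch of how such feature enrichment might be assembled per dialogue turn (the helper and feature-name scheme are illustrative, not from the released corpus): lexical observations and app-level behavioral history land in the same row of the observation matrix, so the factorization can learn inference relations between them.

```python
# Build one row of a feature-enriched observation matrix for a turn,
# combining lexical features with behavioral-history features.
# featurize() and the "word="/"prev_app=" naming are hypothetical.
def featurize(turn_words, previous_apps):
    row = {}
    for w in turn_words:
        row["word=" + w] = 1.0          # lexical observation
    for app in previous_apps:
        row["prev_app=" + app] = 1.0    # app-level context from history
    return row

row = featurize(["send", "it", "to", "alice"], ["CAMERA"])
print(sorted(row))   # both feature families share one representation
```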
66
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP), baseline LM-Based IR Model (unsupervised):
  Word Observation: LM 25.1 (ASR), 26.1 (Transcripts)

Multi-Turn Interaction, Mean Average Precision (MAP), baseline Multinomial Logistic Regression (supervised):
  Word Observation: MLR 52.1 (ASR), 55.5 (Transcripts)
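The MAP numbers in these tables average, over utterances, the average precision of each ranked app list. A small self-contained implementation of the metric, run on toy predictions (not the experimental data):

```python
def average_precision(ranked, relevant):
    """AP of one ranked candidate list against a set of relevant items."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i          # precision at each hit position
    return total / max(len(relevant), 1)

def mean_average_precision(rankings, references):
    """MAP over all utterances."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, references)]
    return sum(aps) / len(aps)

rankings = [["Skype", "Chrome", "Hangout"], ["Gmail", "Outlook"]]
references = [{"Skype", "Hangout"}, {"Outlook"}]
print(round(mean_average_precision(rankings, references), 3))  # 0.667
```

On this toy data the two per-utterance APs are (1 + 2/3)/2 = 0.833 and 1/2 = 0.5, so MAP = 0.667.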
67
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
  Word Observation: LM 25.1 -> MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 -> MF-SLU 30.4 (+16.4%) on Transcripts

Multi-Turn Interaction, Mean Average Precision (MAP):
  Word Observation: MLR 52.1 -> MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 -> MF-SLU 55.4 (-0.2%) on Transcripts

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
  Word Observation: LM 25.1 -> MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 -> MF-SLU 30.4 (+16.4%) on Transcripts
  Word + Embedding-Based Semantics: LM 32.0 (ASR), 33.3 (Transcripts)
  Word + Type-Embedding-Based Semantics: LM 31.5 (ASR), 32.9 (Transcripts)

Multi-Turn Interaction, Mean Average Precision (MAP):
  Word Observation: MLR 52.1 -> MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 -> MF-SLU 55.4 (-0.2%) on Transcripts
  Word + Behavioral Patterns: MLR 53.9 (ASR), 56.6 (Transcripts)

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
  Word Observation: LM 25.1 -> MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 -> MF-SLU 30.4 (+16.4%) on Transcripts
  Word + Embedding-Based Semantics: LM 32.0 -> MF-SLU 34.2 (+6.8%) on ASR; LM 33.3 -> MF-SLU 33.3 (-0.2%) on Transcripts
  Word + Type-Embedding-Based Semantics: LM 31.5 -> MF-SLU 32.2 (+2.1%) on ASR; LM 32.9 -> MF-SLU 34.0 (+3.4%) on Transcripts

Multi-Turn Interaction, Mean Average Precision (MAP):
  Word Observation: MLR 52.1 -> MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 -> MF-SLU 55.4 (-0.2%) on Transcripts
  Word + Behavioral Patterns: MLR 53.9 -> MF-SLU 55.7 (+3.3%) on ASR; MLR 56.6 -> MF-SLU 57.7 (+1.9%) on Transcripts

Intent prediction can benefit from both hidden information and low-level semantics.
70
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling

Contributions of Intent Prediction: the feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility and the potential of improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- Better semantic representations for individual utterances
- Better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: identify domains that current systems do not cover but that users are interested in, to guide which domains are developed next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR in SLU modeling, and unreliable knowledge in knowledge acquisition.
75
Towards Unsupervised Deep Learning

[Figure: the MF model redrawn as a network. A word sequence x (w1, w2, ..., wd) maps to word vectors l_w; a convolutional layer l_c (convolution matrix Wc) and a pooling operation produce an utterance vector l_f; slot candidates S1, ..., Sn have slot vectors l_f; a knowledge graph propagation layer l_p (propagation matrix Wp) and a semantic layer y (semantic projection matrix Ws) yield semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) for an utterance U.]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
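A shape-level sketch of that view, with all dimensions and matrices chosen arbitrarily for illustration (the real model would use a window-based convolution and trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_w, d_c, n_slots = 6, 8, 5, 4        # words, word dim, conv dim, slot candidates

X  = rng.normal(size=(T, d_w))           # word vectors l_w for word sequence x
Wc = rng.normal(size=(d_w, d_c))         # convolution matrix Wc (window size 1 here)
lc = np.tanh(X @ Wc)                     # convolutional layer l_c
lf = lc.max(axis=0)                      # pooling operation -> utterance vector l_f
Wp = rng.normal(size=(d_c, d_c))         # knowledge graph propagation matrix Wp
lp = lf @ Wp                             # knowledge graph propagation layer l_p
Ws = rng.normal(size=(d_c, n_slots))     # semantic projection matrix Ws
y  = lp @ Ws                             # semantic layer y: one score per slot
P  = 1.0 / (1.0 + np.exp(-y))            # scores squashed into (0, 1), read as P(S_i | U)
```

Each stage is differentiable, which is what makes stacking further layers on top of the one-layer MF view straightforward.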
76
Take Home Message
Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI: language -> action, e.g., understand voice commands to control music, lights, etc., or teach the assistant to let friends in via face recognition.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A. THANKS FOR YOUR ATTENTION!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
45
Edge Weight Measurement: compute edge weights to represent relation importance
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
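As a rough sketch of how such similarity-based edge weights could be computed: the embedding values below are made up for illustration only; the actual system trains slot/word embeddings on the corpus.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy slot embeddings (hypothetical values, for illustration only).
emb = {
    "expensiveness": [0.9, 0.1, 0.3],
    "food":          [0.8, 0.2, 0.4],
    "locale_by_use": [0.1, 0.9, 0.2],
}

# Edge weight between two slot nodes = similarity of their embeddings.
w_food_exp = cosine(emb["food"], emb["expensiveness"])
w_food_loc = cosine(emb["food"], emb["locale_by_use"])
```

With these toy vectors, the food–expensiveness edge comes out heavier than the food–locale_by_use edge, which is the kind of signal the propagation model exploits.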
(Figure: a knowledge graph whose word nodes w1–w7 and slot nodes s1–s3 are linked by the weighted edges described above.)
46
Knowledge Graph Propagation Model
(Figure: the word-observation / slot-candidate training matrix, with words such as "cheap" and "restaurant" and slot candidates food, expensiveness, locale_by_use, is multiplied by a word relation matrix R_w and a slot relation matrix R_s derived from the knowledge graphs.)
Structure information is integrated to make the self-training data more reliable.
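To make the propagation idea concrete, here is a toy sketch (not the paper's exact formulation) of smoothing slot scores over a weighted relation graph; all scores and edge weights below are invented.

```python
def propagate(scores, relation, alpha=0.5):
    """Smooth node scores over a weighted relation graph.

    scores:   {node: score} initial estimates (e.g., slot probabilities)
    relation: {(a, b): weight} symmetric edge weights
    Each node's score is mixed with the weighted average of its neighbours,
    a simplified stand-in for the knowledge graph propagation model.
    """
    nodes = list(scores)
    out = {}
    for n in nodes:
        num, den = 0.0, 0.0
        for m in nodes:
            w = relation.get((n, m), relation.get((m, n), 0.0))
            num += w * scores[m]
            den += w
        neighbour = num / den if den else scores[n]
        out[n] = (1 - alpha) * scores[n] + alpha * neighbour
    return out

scores = {"food": 0.9, "expensiveness": 0.1, "locale_by_use": 0.2}
rel = {("food", "expensiveness"): 0.98, ("food", "locale_by_use"): 0.1}
new = propagate(scores, rel)
```

A strongly related slot ("expensiveness") is pulled toward its neighbour's score, which is how structure makes noisy self-training labels more reliable.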
47
Semantic Decoding [ACL-IJCNLP'15]
(Figure: ontology induction produces the feature matrices F_w, F_s that feed the SLU model together with structure learning. Training utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food" fill observed word and slot cells; the test utterance "show me a list of cheap restaurants" leaves its slot cells empty as hidden semantics.)
2nd Issue: unobserved semantics may benefit understanding.
48
Reasoning with Matrix Factorization
Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model
(Figure: the same word-observation / slot-candidate matrix, now with estimated probabilities such as .97, .90, .95, .85 filled into previously empty cells after factorization with the word relation matrix R_w and slot relation matrix R_s.)
Idea: MF completes a partially-missing matrix under a low-rank latent semantics assumption, which lets it model hidden semantics and makes it more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probability of the hidden semantics.
The |U| × (|W|+|S|) matrix M is approximated by the product of two low-rank factors: M ≈ U (|U| × d) · V (d × (|W|+|S|)).
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.
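The completion step can be illustrated with a tiny pure-Python factorization. This is a squared-error SGD sketch rather than the BPR objective the paper actually optimizes, and the matrix below is a made-up miniature of the utterance-by-word/slot matrix.

```python
import random

def mf_complete(matrix, rank=2, steps=2000, lr=0.1, reg=0.01, seed=0):
    """Factorize a partially observed matrix (None = unobserved cell).

    Learns M ~= U x V (|U| x d times d x (|W|+|S|)) by SGD on the observed
    cells; the product then scores the unobserved cells.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)] for _ in range(rank)]
    obs = [(i, j, v) for i, row in enumerate(matrix)
           for j, v in enumerate(row) if v is not None]
    for _ in range(steps):
        i, j, v = obs[rng.randrange(len(obs))]
        pred = sum(U[i][k] * V[k][j] for k in range(rank))
        err = v - pred
        for k in range(rank):
            u, w = U[i][k], V[k][j]
            U[i][k] += lr * (err * w - reg * u)
            V[k][j] += lr * (err * u - reg * w)
    return [[sum(U[i][k] * V[k][j] for k in range(rank)) for j in range(n_cols)]
            for i in range(n_rows)]

# Columns: [cheap, restaurant, food_word, expensiveness_slot]
M = [
    [1, 1, 0, None],   # "a cheap restaurant": expensiveness slot unobserved
    [1, 1, 0, 1],      # a similar utterance with the slot observed
    [0, 1, 1, 0],      # "restaurant with chinese food": no expensiveness
]
scores = mf_complete(M)
```

Because row 0 shares its observed pattern with row 1, the factorization assigns it a high score for the hidden expensiveness cell while leaving row 2's low.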
50
Bayesian Personalized Ranking for MF: model implicit feedback
- Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
- Objective: for each utterance u, maximize ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.
The objective is to learn a set of well-ranked semantic slots per utterance.
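A minimal sketch of one BPR gradient step under this pairwise objective; the vector sizes and initial values are arbitrary toy choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_step(u, v_pos, v_neg, lr=0.05, reg=0.01):
    """One BPR update: push the observed fact's score f+ = u.v_pos above the
    unobserved fact's score f- = u.v_neg, ascending on ln sigma(f+ - f-)."""
    f_pos = sum(a * b for a, b in zip(u, v_pos))
    f_neg = sum(a * b for a, b in zip(u, v_neg))
    g = 1.0 - sigmoid(f_pos - f_neg)   # d/dx ln sigma(x) = 1 - sigma(x)
    for k in range(len(u)):
        du = g * (v_pos[k] - v_neg[k]) - reg * u[k]
        dp = g * u[k] - reg * v_pos[k]
        dn = -g * u[k] - reg * v_neg[k]
        u[k] += lr * du
        v_pos[k] += lr * dp
        v_neg[k] += lr * dn
    return f_pos - f_neg   # margin before this update

u, vp, vn = [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]
margins = [bpr_step(u, vp, vn) for _ in range(200)]
```

Repeated updates widen the margin between the observed and unobserved fact, which is exactly the ranking behaviour the slide describes.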
51
Matrix Factorization SLU (MF-SLU)
(Figure: the same training matrix as before; after factorization, the test utterance "show me a list of cheap restaurants" receives probability estimates for its slot candidates.)
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
(Figure: frame-semantic parsing over an unlabeled collection feeds ontology induction (feature model F_w, F_s) and structure learning over semantic and lexical knowledge graphs (word and slot relation models R_w, R_s); MF-SLU combines both for SLU modeling, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".)
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
53
Experimental Setup
Dataset: Cambridge University SLU Corpus
- Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT 2012.
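For reference, the MAP metric over per-utterance slot rankings can be computed as follows; the slot names and rankings below are illustrative, not taken from the corpus.

```python
def average_precision(ranked, relevant):
    """Average precision of one utterance's ranked slot list
    against its set of reference slots."""
    hits, score = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    # MAP = mean of per-utterance average precisions.
    return sum(average_precision(r, s)
               for r, s in zip(ranked_lists, relevant_sets)) / len(ranked_lists)

ranked = [["food", "area", "pricerange"], ["addr", "phone"]]
gold = [{"food", "pricerange"}, {"phone"}]
map_score = mean_average_precision(ranked, gold)  # (5/6 + 1/2) / 2 = 2/3
```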
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU)         | ASR  | Transcripts
Support Vector Machine          | 32.5 | 36.6
Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
(Figure: the reference ontology with the most frequent syntactic dependencies; induced slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring map to reference slots type, food, pricerange, task, and area, linked by PREP_FOR, PREP_IN, NN, AMOD, and DOBJ edges.)
The automatically learned domain ontology aligns well with the reference one; the data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
(Diagram: Ontology Induction and Structure Learning under Knowledge Acquisition; Semantic Decoding and Intent Prediction under SLU Modeling.)
- Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.
- MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.
- "can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
- "i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
(Diagram: Ontology Induction, Structure Learning, Semantic Decoding, and Intent Prediction grouped under Knowledge Acquisition and SLU Modeling, now highlighting Intent Prediction.)
61
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
63
Intent Prediction – Single-Turn Request
Input: a single-turn request
Output: apps that are able to support the required functionality
(Figure: an IR step retrieves app candidates from app-store descriptions, e.g. Outlook "your email calendar contacts" and Gmail "check and send emails, msgs"; feature enrichment adds semantics such as "communication" to the utterance "i would like to contact alex"; reasoning with feature-enriched MF fills in scores such as .90, .85, .97, .95 linking words like contact, message, and email to intended apps Gmail, Outlook, and Skype.)
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
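The IR step for app candidates might look like the following toy bag-of-words retrieval over store descriptions; the app names and description snippets echo the slide but are abbreviated stand-ins.

```python
import math
from collections import Counter

def retrieve_apps(utterance, app_descriptions):
    """Rank apps by cosine similarity between bag-of-words vectors of the
    spoken request and each app's store description (toy IR step)."""
    def vec(text):
        return Counter(text.lower().split())
    def cos(a, b):
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        if not dot:
            return 0.0
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb)
    q = vec(utterance)
    return sorted(app_descriptions,
                  key=lambda app: cos(q, vec(app_descriptions[app])),
                  reverse=True)

apps = {
    "Gmail":   "check and send emails and messages",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record video",
}
ranking = retrieve_apps("send an email to alex", apps)
```

Communication apps surface above the unrelated Camera app; the retrieved candidates then seed the feature-enriched MF model.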
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: a multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, from 1) user preference and 2) app-level contexts; e.g. "send to vivian" following a previous camera turn maps to Email/Message (Communication).
Idea: behavioral patterns in the history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: a multi-turn interaction
Output: apps the user plans to launch
(Figure: training dialogues pair user utterances with intended apps and behavior history, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL. At test time, "take a photo of this / send it to alice" is scored with lexical features plus behavior-history features, yielding estimates such as .95, .85, .80, .70, .55.)
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
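A sketch of the behaviour-enriched feature vector: lexical one-hot features concatenated with app-level context features. The vocabulary and app inventory are toy stand-ins for the real feature space.

```python
def build_features(utterance_words, previous_apps, vocab, app_inventory):
    """Concatenate lexical one-hot features with app-level context features,
    a toy version of the behaviour-enriched feature matrix."""
    lexical = [1 if w in utterance_words else 0 for w in vocab]
    behaviour = [1 if a in previous_apps else 0 for a in app_inventory]
    return lexical + behaviour

vocab = ["send", "photo", "email", "grades"]
apps = ["CAMERA", "CHROME", "IM", "EMAIL"]

# "send it to alice" after a camera turn: the CAMERA context feature helps
# disambiguate "send" toward IM rather than EMAIL.
x = build_features({"send"}, {"CAMERA"}, vocab, apps)
```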
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    |             | 26.1            |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     |             | 55.5             |

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised).
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    |               | 33.3            |
Word + Type-Embedding-Based Semantics | 31.5    |               | 32.9            |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
(Diagram: Ontology Induction, Structure Learning, Semantic Decoding, and Intent Prediction under Knowledge Acquisition and SLU Modeling.)
Feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience, e.g. "call taxi")
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data Back-end: Data Bases, Services and Client Signals
- Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
72
Outline: Intelligent Assistant
What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
- The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
- Apply the proposed technology to domain discovery: find domains not covered by the current systems that users are nevertheless interested in, to guide which domains to develop next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
(Figure: a network that maps the word sequence x of an utterance U through word vectors l_w, a convolution matrix W_c and convolutional layer l_c, and a pooling operation into an utterance vector l_f; a semantic projection matrix W_s and a knowledge graph propagation matrix W_p then produce the semantic layer y, which outputs P(S1 | U), ..., P(Sn | U) via R(U, S1), ..., R(U, Sn) over the slot candidates.)
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
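Under that view, adding a layer turns the linear factorization into a small nonlinear scorer. The sketch below uses random toy weights rather than anything trained, purely to show the shape of the extension.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W):
    # W is a list of rows; each row produces one output unit.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def slot_scores(x, W1, W2):
    """Two-layer scorer. Plain MF corresponds to collapsing W2 @ W1 into a
    single linear map; the nonlinearity in between is what makes it 'deep'."""
    hidden = relu(dense(x, W1))
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, W2)]

random.seed(0)
x = [1, 0, 1]   # toy utterance feature vector
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
probs = slot_scores(x, W1, W2)   # P(S1 | U), P(S2 | U) for two slot candidates
```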
76
Take Home Message
- Big data is available without annotations; the challenge is how to acquire and organize important knowledge, and further utilize it for applications.
- Language understanding for AI is language-to-action: understand voice commands to control music, lights, etc., or teach the system to let friends in via face recognition.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A: THANKS FOR YOUR ATTENTION
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award).
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
46
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
Slot Induction
Knowledge Graph Propagation Model119877119908
119878119863
119877119904119878119863
Structure information is integrated to make the self-training data more reliable
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
47
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance hidden semantics
2nd Issue unobserved semantics may benefit understanding
Semantic Decoding [ACL-IJCNLPrsquo15]
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
48
Reasoning with Matrix Factorization
Word Relation Model Slot Relation Model
word relation matrix
slot relation matrix
times
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test1
1
9790 9585
93 929805 05
Slot Induction
Feature Model + Knowledge Graph Propagation Model
119877119908119878119863
119877119904119878119863
Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
49
2nd Issue How to model the unobserved hidden semantics
Matrix Factorization (MF) (Rendle et al 2009)
The decomposed matrices represent latent semantics for utterances and wordsslots respectively
The product of two matrices fills the probability of hidden semantics
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
9790 9585
93 929805 05
|119932|
|119934|+|119930|
asymp|119932|times119941 119941times (|119934|+|119930|)times
Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
50
Bayesian Personalized Ranking for MF Model implicit feedback
not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts
Objective
1
119891 +iquest iquest119891 minus119891 minus
The objective is to learn a set of well-ranked semantic slots per utterance
119906119909
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
51
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
52
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
53
Experimental Setup Dataset Cambridge University SLU Corpus
Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type
Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots
Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
Approach ASR Transcripts
Baseline SLU
Support Vector Machine 325 366Multinomial Logistic Regression 340 388
Proposed MF-SLU
Feature Model 376 453
Feature Model +Knowledge Graph Propagation
435
(+279)534
(+376)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Intent Prediction of Mobile Apps [SLT'14c]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification: popular domains in Google Play
e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF – rows are app descriptions retrieved by IR as app candidates (e.g., Outlook: "… your email calendar contacts …", Gmail: "… check and send emails msgs …"), self-train utterances, and the test utterance "i would like to contact alex"; columns are word observations (contact, message, email), enriched semantic features (e.g., "communication"), and intended apps (Gmail, Outlook, Skype); observed cells are 1s and test-time predictions are probabilities (e.g., .90, .85, .97, .95).]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge: language ambiguity – disambiguation cues include 1) user preference and 2) app-level contexts
[Figure: example – the previous turn "send to vivian" maps to Email or Message (Communication apps)]
Idea: behavioral patterns in history can help intent prediction
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF over dialogues – rows are training dialogues (e.g., "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL) and the test dialogue "take a photo of this / send it to alice"; columns are lexical features (photo, check, camera, tell, send), behavior-history features (null, camera, chrome, email), and intended apps; observed cells are 1s and test-time predictions are probabilities (e.g., .85, .70, .95, .80, .55).]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
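As a concrete illustration of such a feature matrix (lexical features, behavior-history features, and intended apps per turn), a minimal construction might look like this; the turns and feature names are hypothetical stand-ins for the slide's example:

```python
import numpy as np

# Hypothetical multi-turn dialogue: each turn has words, the previously
# launched app (behavior history), and the intended app (observed at train time)
turns = [
    {"words": ["take", "a", "photo", "of", "this"], "prev": None,     "app": "CAMERA"},
    {"words": ["send", "it", "to", "alice"],        "prev": "CAMERA", "app": "IM"},
]

vocab = sorted({w for t in turns for w in t["words"]})
behaviors = sorted({"prev=" + (t["prev"] or "null") for t in turns})
apps = sorted({t["app"] for t in turns})
columns = vocab + behaviors + apps  # word block | behavior block | app block

X = np.zeros((len(turns), len(columns)))
for i, t in enumerate(turns):
    for w in t["words"]:
        X[i, columns.index(w)] = 1                             # lexical features
    X[i, columns.index("prev=" + (t["prev"] or "null"))] = 1   # behavior history
    X[i, columns.index(t["app"])] = 1                          # intended app (train-time label)

print(X.shape)  # one row per turn, one column per feature
```

At test time the intended-app cells of new dialogue rows are left empty, and the factorized model fills in their probabilities.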
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP) – LM-based IR model (unsupervised)

Feature Matrix          ASR              Transcripts
                        LM     MF-SLU    LM     MF-SLU
Word Observation        25.1   -         26.1   -

Multi-Turn Interaction: Mean Average Precision (MAP) – Multinomial Logistic Regression (supervised)

Feature Matrix          ASR              Transcripts
                        MLR    MF-SLU    MLR    MF-SLU
Word Observation        52.1   -         55.5   -
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix          ASR                     Transcripts
                        LM     MF-SLU           LM     MF-SLU
Word Observation        25.1   29.2 (+16.2%)    26.1   30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix          ASR                     Transcripts
                        MLR    MF-SLU           MLR    MF-SLU
Word Observation        52.1   52.7 (+1.2%)     55.5   55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR                     Transcripts
                                        LM     MF-SLU           LM     MF-SLU
Word Observation                        25.1   29.2 (+16.2%)    26.1   30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0   -                33.3   -
Word + Type-Embedding-Based Semantics   31.5   -                32.9   -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR                     Transcripts
                                        MLR    MF-SLU           MLR    MF-SLU
Word Observation                        52.1   52.7 (+1.2%)     55.5   55.4 (-0.2%)
Word + Behavioral Patterns              53.9   -                56.6   -

Semantic enrichment provides rich cues to improve performance
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR                     Transcripts
                                        LM     MF-SLU           LM     MF-SLU
Word Observation                        25.1   29.2 (+16.2%)    26.1   30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0   34.2 (+6.8%)     33.3   33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5   32.2 (+2.1%)     32.9   34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR                     Transcripts
                                        MLR    MF-SLU           MLR    MF-SLU
Word Observation                        52.1   52.7 (+1.2%)     55.5   55.4 (-0.2%)
Word + Behavioral Patterns              53.9   55.7 (+3.3%)     56.6   57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics
70
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: back-end data bases, services, and client signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next
Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition
75
Towards Unsupervised Deep Learning
[Figure: network view of MF-SLU – word sequence x = w1 … wd → word vectors l_w → convolutional layer l_c (convolution matrix Wc) → pooling operation → utterance vector l_f; together with slot vectors l_f, a semantic layer y (semantic projection matrix Ws) feeds a knowledge graph propagation layer l_p (propagation matrix Wp), yielding relevance scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U) over slot candidates S1 … Sn.]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning
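A forward pass of such a deepened model can be sketched as follows; all dimensions, the random weights, and the identity propagation matrix are illustrative placeholders rather than the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative dimensions (assumptions, not the paper's)
V, d_emb, d_conv, n_slots, seq_len = 50, 16, 32, 10, 7
x = rng.integers(0, V, size=seq_len)      # word sequence w1..wd (word ids)
E = rng.normal(size=(V, d_emb))           # word vectors l_w
Wc = rng.normal(size=(d_emb, d_conv))     # convolution matrix Wc (window = 1 sketch)
Ws = rng.normal(size=(d_conv, n_slots))   # semantic projection matrix Ws
Wp = np.eye(n_slots)                      # knowledge graph propagation matrix Wp (placeholder)

lc = relu(E[x] @ Wc)                      # convolutional layer l_c
lf = lc.max(axis=0)                       # pooling -> utterance vector l_f
y = lf @ Ws                               # semantic layer y
lp = y @ Wp                               # knowledge graph propagation layer l_p
p = 1.0 / (1.0 + np.exp(-lp))             # P(S_i | U) for the slot candidates
print(p.round(3))
```

The one-layer MF corresponds to the final linear scoring step; the convolution and pooling layers are the extra depth the slide alludes to.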
76
Take Home Message
Available: big data without annotations
Challenge: how to acquire and organize important knowledge, and further utilize it for applications
Language understanding for AI
language → action: understand voice to control music, lights, etc.; teach it to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and …)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
47
Semantic Decoding [ACL-IJCNLP'15] (2)
[Figure: feature matrix built from Ontology Induction (Fw, Fs) and Structure Learning – rows are training utterances (Utterance 1 "i would like a cheap restaurant", Utterance 2 "find a restaurant with chinese food") and a test utterance ("show me a list of cheap restaurants", which carries hidden semantics); columns are word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); observed cells are 1s and test-time slot predictions are probabilities (e.g., .97, .90, .95, .85).]
2nd Issue: unobserved semantics may benefit understanding
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the slot-induction feature matrix (word observations and slot candidates over training and test utterances) is augmented with a word relation model (word relation matrix R_w) and a slot relation model (slot relation matrix R_s); observed train cells are 1s and test-time predictions are probabilities.]
Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data
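The low-rank completion idea can be sketched with a truncated SVD on a toy version of the matrix above; note the actual MF-SLU is trained with ranking-based MF rather than SVD, and the column layout here is an assumption:

```python
import numpy as np

# Rows: two training utterances and one test utterance.
# Columns: word observations (cheap, restaurant, food) followed by
# slot candidates (expensiveness, locale_by_use, food); the slot cells of the
# test utterance are unobserved (0) and should be recovered.
M = np.array([
    [1, 1, 0, 1, 1, 0],  # "i would like a cheap restaurant"
    [0, 1, 1, 0, 1, 1],  # "find a restaurant with chinese food"
    [1, 1, 0, 0, 0, 0],  # test: "show me a list of cheap restaurants"
], dtype=float)

# Low-rank approximation M ≈ A @ B with A: |U| x d and B: d x (|W|+|S|)
d = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :d] * s[:d]
B = Vt[:d, :]
M_hat = A @ B

# The reconstructed test row assigns scores to the unobserved slot cells
print(np.round(M_hat[2], 2))
```

The rank-d truncation is the best rank-d approximation in Frobenius norm, which is what lets correlated rows (similar utterances) fill in each other's missing cells.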
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively
The product of the two matrices fills in the probabilities of the hidden semantics: the |U| × (|W|+|S|) observation matrix is approximated as the product of a |U| × d matrix and a d × (|W|+|S|) matrix
[Figure: the word-observation / slot-candidate matrix over train and test utterances, factorized into a |U| × d utterance matrix and a d × (|W|+|S|) word/slot matrix.]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
50
Bayesian Personalized Ranking for MF Model implicit feedback
not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts
Objective
1
119891 +iquest iquest119891 minus119891 minus
The objective is to learn a set of well-ranked semantic slots per utterance
119906119909
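A minimal stochastic-gradient sketch of this ranking objective, with toy sizes, hyperparameters, and the observed-slot sets as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_utt, n_slot, d = 3, 4, 2
A = rng.normal(scale=0.1, size=(n_utt, d))   # utterance factors
B = rng.normal(scale=0.1, size=(n_slot, d))  # slot factors
observed = {0: {0, 1}, 1: {2}, 2: {1, 3}}    # observed slots per utterance

lr, reg = 0.05, 0.01
for _ in range(5000):
    u = int(rng.integers(n_utt))
    pos = int(rng.choice(list(observed[u])))                              # f+ (observed)
    neg = int(rng.choice([s for s in range(n_slot) if s not in observed[u]]))  # f- (unobserved)
    a = A[u].copy()
    # ascend the gradient of ln sigma(f+ - f-) with f = <utterance factor, slot factor>
    g = sigmoid(-(a @ (B[pos] - B[neg])))
    A[u] += lr * (g * (B[pos] - B[neg]) - reg * A[u])
    B[pos] += lr * (g * a - reg * B[pos])
    B[neg] += lr * (-g * a - reg * B[neg])

scores = A @ B.T
print(scores[0])  # observed slots of utterance 0 should now outrank unobserved ones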
51
Matrix Factorization SLU (MF-SLU)
[Figure: the same Ontology Induction feature matrix as before (word observations: cheap, restaurant, food; slot candidates: expensiveness, locale_by_use, food) over the two training utterances; for the test utterance "show me a list of cheap restaurants", MF fills in slot probabilities (e.g., .97, .90, .95, .85).]
MF-SLU can estimate probabilities for slot candidates given test utterances
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
e.g., "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"
[Flowchart: an unlabeled collection goes through Frame-Semantic Parsing for Ontology Induction (feature model: Fw, Fs), and through semantic and lexical knowledge graphs for Structure Learning (knowledge graph propagation model: word relation model Rw, slot relation model Rs); MF-SLU (SLU modeling by matrix factorization) combines both to produce the semantic representation.]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU Corpus – restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
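For concreteness, MAP over ranked slot lists can be computed as follows; the utterances and slots below are hypothetical:

```python
def average_precision(ranked_slots, reference):
    """Precision averaged at the rank of each reference slot."""
    hits, precision_sum = 0, 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in reference:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(reference) if reference else 0.0

def mean_average_precision(ranked_lists, references):
    """MAP: mean of the per-utterance average precisions."""
    aps = [average_precision(r, g) for r, g in zip(ranked_lists, references)]
    return sum(aps) / len(aps)

# Slot candidates sorted by estimated probability, and gold reference slots
ranked = [["food", "pricerange", "area"],   # utterance 1
          ["area", "task", "food"]]         # utterance 2
gold = [{"food", "pricerange"}, {"task"}]
print(mean_average_precision(ranked, gold))  # (1.0 + 0.5) / 2 = 0.75
```

Since the SLU output is a ranked list of slot probabilities per utterance, MAP rewards models that push the correct slots toward the top of each list.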
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                  ASR    Transcripts
Baseline SLU
  Support Vector Machine                  32.5   36.6
  Multinomial Logistic Regression         34.0   38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR             Transcripts
Baseline SLU
  Support Vector Machine                        32.5            36.6
  Multinomial Logistic Regression               34.0            38.8
Proposed MF-SLU
  Feature Model                                 37.6            45.3
  Feature Model + Knowledge Graph Propagation   43.5 (+27.9%)*  53.4 (+37.6%)*

*: the result is significantly better than the MLR baseline with p < 0.05 in a t-test

The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                ASR             Transcripts
Feature Model                           37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic                              41.4            51.6
  Dependency                            41.6            49.0
  All                                   43.5 (+15.7%)*  53.4 (+17.9%)*

*: the result is significantly better than the MLR baseline with p < 0.05 in a t-test

In the integrated structure information, both semantic and dependency relations are useful for understanding
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
48
Reasoning with Matrix Factorization
Feature Model + Knowledge Graph Propagation Model
[Figure: the utterance-by-(word + slot) matrix over training and test utterances (words such as "cheap", "restaurant", "food"; slot candidates such as expensiveness, locale_by_use, food) is multiplied by a word relation matrix R_w^SD and a slot relation matrix R_s^SD from slot induction, filling in estimated slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85) for the test utterance.]
Idea: MF completes a partially-missing matrix based on a low-rank latent-semantics assumption, which can model hidden semantics and is more robust to noisy data.
49
2nd Issue: How to model the unobserved hidden semantics?
Matrix Factorization (MF) (Rendle et al., 2009)
- The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
- The product of the two matrices fills in the probabilities of the hidden semantics.
[Figure: the |U| × (|W|+|S|) observation matrix (word observations and slot candidates over training and test utterances) is approximated by a low-rank product: M ≈ (|U| × d) · (d × (|W|+|S|)).]
Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
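To make the low-rank completion idea above concrete, here is a minimal sketch (not the authors' code; the utterances, columns, and hyperparameters are illustrative) that completes a small utterance-by-(word + slot) matrix with a rank-d factorization trained by gradient descent on observed cells only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns: words [cheap, restaurant, chinese] then slot candidates
# [expensiveness, locale_by_use, food]; rows: two training utterances
# and one test utterance whose slot cells are unobserved.
M = np.array([
    [1, 1, 0, 1, 1, 0],   # "i would like a cheap restaurant"
    [0, 1, 1, 0, 1, 1],   # "find a restaurant with chinese food"
    [1, 1, 0, 0, 0, 0],   # test: "show me a list of cheap restaurants"
], dtype=float)
observed = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # the test utterance's slot cells are missing
], dtype=float)

d, lr, reg = 2, 0.05, 0.01
U_lat = 0.1 * rng.standard_normal((M.shape[0], d))   # utterance latent factors
V_lat = 0.1 * rng.standard_normal((d, M.shape[1]))   # word/slot latent factors
for _ in range(2000):
    E = (M - U_lat @ V_lat) * observed               # error on observed cells only
    U_lat += lr * (E @ V_lat.T - reg * U_lat)
    V_lat += lr * (U_lat.T @ E - reg * V_lat)

pred = U_lat @ V_lat
# The latent factors transfer evidence from similar training utterances, so
# the test row's "expensiveness" cell scores well above its "food" cell.
```

Because the test utterance shares its word pattern with the first training utterance, the low-rank assumption pushes its unobserved expensiveness cell toward 1, illustrating how MF infers hidden semantics.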
50
Bayesian Personalized Ranking for MF
Model implicit feedback:
- do not treat unobserved facts as negative samples (true or false);
- give observed facts higher scores than unobserved facts.
Objective: maximize Σ_{u_x} ln σ(f⁺ − f⁻), i.e., for each utterance u_x, each observed fact f⁺ should score higher than each unobserved fact f⁻ (σ is the sigmoid function).
The objective is to learn a set of well-ranked semantic slots per utterance.
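A tiny sketch of the BPR idea for one utterance (illustrative toy data, not the talk's setup): instead of treating unobserved slots as negatives, each observed slot f⁺ is trained to out-score each unobserved slot f⁻ through the ln σ(f⁺ − f⁻) objective:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_slots, d, lr = 5, 3, 0.1
u = 0.1 * rng.standard_normal(d)             # latent vector of the utterance
V = 0.1 * rng.standard_normal((n_slots, d))  # latent vectors of slot candidates
observed = [0, 2]                            # slots seen with this utterance
unobserved = [1, 3, 4]

for _ in range(300):
    for p in observed:
        for n in unobserved:
            x = u @ (V[p] - V[n])            # f+ - f-
            g = 1.0 - sigmoid(x)             # gradient of ln sigmoid(x)
            du = g * (V[p] - V[n])           # ascent directions
            V[p] += lr * g * u
            V[n] -= lr * g * u
            u += lr * du

scores = V @ u
# After training, every observed slot is ranked above every unobserved one,
# which is exactly the "well-ranked slots per utterance" objective.
```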
51
Matrix Factorization SLU (MF-SLU)
[Figure: ontology induction supplies word features Fw and slot features Fs for SLU, and structure learning supplies relation matrices; training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and a test utterance ("show me a list of cheap restaurants") form a word-observation / slot-candidate matrix, which is factorized to estimate slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85).]
MF-SLU can estimate probabilities for slot candidates given test utterances.
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
[Figure: an unlabeled collection feeds frame-semantic parsing and ontology induction (feature model: Fw, Fs) together with knowledge graph propagation over a semantic KG and a lexical KG (word relation model Rw, slot relation model Rs, structure learning); MF-SLU (SLU modeling by matrix factorization) combines them to decode "can I have a cheap restaurant" into the semantic representation target="restaurant", price="cheap".]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
53
Experimental Setup
Dataset: Cambridge University SLU Corpus
- Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances (using the mapping table between induced and reference slots).
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
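The evaluation metric can be sketched as follows (a hypothetical mini-example, not the corpus itself): average precision of each utterance's slot ranking under the estimated probabilities, averaged over utterances to give MAP:

```python
def average_precision(scores, relevant):
    """AP of a slot ranking: scores maps slot -> estimated probability."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i          # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0

# Two toy utterances: (estimated slot probabilities, reference slot set).
utts = [
    ({"food": 0.9, "area": 0.2, "pricerange": 0.7}, {"food", "pricerange"}),
    ({"food": 0.6, "area": 0.5, "pricerange": 0.1}, {"area"}),
]
map_score = sum(average_precision(s, r) for s, r in utts) / len(utts)
print(map_score)  # the first utterance scores AP 1.0, the second 0.5
```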
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR    Transcripts
Baseline SLU: Support Vector Machine            32.5   36.6
Baseline SLU: Multinomial Logistic Regression   34.0   38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR            Transcripts
Baseline SLU: Support Vector Machine            32.5           36.6
Baseline SLU: Multinomial Logistic Regression   34.0           38.8
Proposed MF-SLU: Feature Model                  37.6           45.3
Proposed MF-SLU: Feature Model +
  Knowledge Graph Propagation                   43.5 (+27.9%)  53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics.
The structure information further improves the results.
* The result is significantly better than the MLR baseline (p < 0.05, t-test).
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                 ASR            Transcripts
Feature Model                            37.6           45.3
Feature + KG Propagation: Semantic       41.4           51.6
Feature + KG Propagation: Dependency     41.6           49.0
Feature + KG Propagation: All            43.5 (+15.7%)  53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
* The result is significantly better than the MLR baseline (p < 0.05, t-test).
57
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the reference ontology with the most frequent syntactic dependencies: induced slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) linked by dependency relations (AMOD, NN, DOBJ, PREP_FOR, PREP_IN), aligned with the reference slots type, food, pricerange, task, and area.]
The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction).]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents:
- "can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
- "i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]
61
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF: IR retrieves app candidates from app descriptions ("check and send emails, msgs ..." → Gmail; "your email, calendar, contacts ..." → Outlook); self-train utterances and the test utterance ("i would like to contact alex") form a word-observation / intended-app matrix (words such as contact, message, email; apps such as Gmail, Outlook, Skype), enriched with semantic features (e.g., communication); MF fills in app probabilities (e.g., 0.90, 0.85, 0.97, 0.95).]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
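The "IR for app candidates" step can be sketched with a smoothed unigram language model that scores each app description by the likelihood of the request (a hedged illustration: these app names, descriptions, and the smoothing constant are assumptions, not the talk's corpus or exact model):

```python
import math
from collections import Counter

# Toy app descriptions standing in for Google Play metadata.
apps = {
    "Gmail":   "check and send emails and messages",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record video",
}
vocab = {w for desc in apps.values() for w in desc.split()}

def log_likelihood(query, doc, alpha=0.1):
    """Additively smoothed unigram log-likelihood of the query under doc."""
    words = doc.split()
    counts, n = Counter(words), len(words)
    return sum(
        math.log((counts[w] + alpha) / (n + alpha * len(vocab)))
        for w in query.split()
    )

query = "send an email"
ranking = sorted(apps, key=lambda a: log_likelihood(query, apps[a]), reverse=True)
# Communication apps rank above the unrelated Camera app.
```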
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity: 1) user preference; 2) app-level contexts.
Example: "send to vivian" could mean Email or Message (communication apps); the previous turn helps disambiguate.
Idea: behavioral patterns in history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF: training dialogues ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL) and a test dialogue ("take a photo of this", "send it to alice") form a matrix of lexical features (photo, check, camera, tell, send, ...), behavior-history features (null, camera, chrome, email), and intended apps; MF fills in app probabilities (e.g., 0.85, 0.70, 0.95, 0.80, 0.55).]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
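The feature enrichment above amounts to concatenating lexical observations with behavior-history features as extra columns of the matrix. A minimal sketch, assuming a toy vocabulary and app inventory (not the corpus's actual feature set):

```python
# Toy feature spaces standing in for the matrix columns of the figure.
vocab = ["take", "photo", "send", "check"]
app_set = ["CAMERA", "IM", "CHROME", "EMAIL"]

def featurize(utterance, previous_apps):
    """One row of the feature-enriched matrix: lexical + behavioral columns."""
    lexical = [1 if w in utterance.split() else 0 for w in vocab]
    behavior = [1 if a in previous_apps else 0 for a in app_set]
    return lexical + behavior

row1 = featurize("take a photo of this", previous_apps=[])
row2 = featurize("send it to alice", previous_apps=["CAMERA"])
# The second turn's row now carries the CAMERA context, which helps
# disambiguate "send it" toward IM rather than, say, EMAIL.
```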
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)
Baseline: LM-based IR model (unsupervised)

Feature Matrix     ASR (LM / MF-SLU)   Transcripts (LM / MF-SLU)
Word Observation   25.1 / –            26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)
Baseline: Multinomial Logistic Regression (supervised)

Feature Matrix     ASR (MLR / MF-SLU)  Transcripts (MLR / MF-SLU)
Word Observation   52.1 / –            55.5 / –
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
Word Observation   25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
Word Observation   52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / –               33.3 / –
Word + Type-Embedding-Based Semantics   31.5 / –               32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR (MLR / MF-SLU)    Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / –              56.6 / –

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR (MLR / MF-SLU)    Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)   56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]
Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
[Diagram: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions) sit between the user experience (e.g., "call taxi") on device/service end-points (phone, PC, Xbox, web browser, messaging apps) and the back-end data bases, services, and client signals.]
72
Outline
- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
73
Conclusions
The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances;
- better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery:
- find domains not covered by the current systems but that users are interested in;
- guide which domains to develop next.
Improve the proposed approach by handling uncertainty:
- recognition errors from ASR;
- unreliable acquired knowledge.
75
Towards Unsupervised Deep Learning
[Figure: a convolutional architecture: the word sequence x (w1, w2, ..., wd) is mapped through word vectors lw, a convolution matrix Wc and convolutional layer lc, then pooled into an utterance vector lf; slot candidates S1, S2, ..., Sn have slot vectors lf; a semantic projection matrix Ws produces the semantic layer y, and a knowledge graph propagation matrix Wp produces the propagation layer lp, yielding relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U).]
Treating MF as a one-layer neural net, we can add more layers to the model, towards unsupervised deep learning.
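The "MF as a one-layer net" view can be illustrated schematically (a toy sketch with made-up dimensions and random weights; the slide's actual architecture adds convolution, pooling, and knowledge graph propagation layers):

```python
import numpy as np

rng = np.random.default_rng(2)

def mf_score(x, W):
    # One-layer view of MF scoring: slot scores are a linear map of features.
    return W @ x

def deeper_score(x, W1, W2):
    # "Adding layers": a nonlinear hidden layer before the slot scoring map.
    return W2 @ np.tanh(W1 @ x)

n_feat, d, n_slots = 6, 3, 4
x = rng.random(n_feat)                      # utterance feature vector
W = rng.standard_normal((n_slots, n_feat))  # MF's single linear layer
W1 = rng.standard_normal((d, n_feat))       # hidden layer weights
W2 = rng.standard_normal((n_slots, d))      # output layer weights

shallow, deep = mf_score(x, W), deeper_score(x, W1, W2)
# Both produce one score per slot candidate; the deeper model simply
# composes more transformations before the final scoring layer.
```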
76
Take Home Message
Available big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI: language → action, e.g., understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A: THANKS FOR YOUR ATTENTION
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
49
2nd Issue How to model the unobserved hidden semantics
Matrix Factorization (MF) (Rendle et al 2009)
The decomposed matrices represent latent semantics for utterances and wordsslots respectively
The product of two matrices fills the probability of hidden semantics
1
Word Observation Slot Candidate
Train
cheap restaurant foodexpensiveness
1
locale_by_use
11
1 1
food
1 1
1 Test
1
1
9790 9585
93 929805 05
|119932|
|119934|+|119930|
asymp|119932|times119941 119941times (|119934|+|119930|)times
Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
50
Bayesian Personalized Ranking for MF Model implicit feedback
not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts
Objective
1
119891 +iquest iquest119891 minus119891 minus
The objective is to learn a set of well-ranked semantic slots per utterance
119906119909
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
51
Ontology Induction
SLUFw Fs
Structure Learning
times
1
Utterance 1i would like a cheap restaurant
Word Observation Slot Candidate
Train
hellip
cheap restaurant foodexpensiveness
1
locale_by_use
11
find a restaurant with chinese foodUtterance 2
1 1
food
1 1
1
Test1 9790 9585
Ontology Induction
show me a list of cheap restaurantsTest Utterance
Matrix Factorization SLU (MF-SLU)
MF-SLU can estimate probabilities for slot candidates given test utterances
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
52
Semantic Decoding [ACL-IJCNLPrsquo15]
Input user utterances
Output semantic concepts included in each individual utterance
Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015
SLU Model
target=ldquorestaurantrdquoprice=ldquocheaprdquo
ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing
Unlabeled Collection
Semantic KG
Ontology InductionFw Fs
Feature Model
Rw
Rs
Knowledge Graph Propagation Model
Word Relation Model
Lexical KG
Slot Relation Model
Structure Learning
times
Semantic KG
MF-SLU SLU Modeling by Matrix Factorization
Semantic Representation
Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
53
Experimental Setup Dataset Cambridge University SLU Corpus
Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type
Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots
Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
Approach ASR Transcripts
Baseline SLU
Support Vector Machine 325 366Multinomial Logistic Regression 340 388
Proposed MF-SLU
Feature Model 376 453
Feature Model +Knowledge Graph Propagation
435
(+279)534
(+376)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
Experiments for Intent Prediction (2)
Single-Turn Request – Mean Average Precision (MAP):

  Feature Matrix      ASR                   Transcripts
                      LM    MF-SLU          LM    MF-SLU
  Word Observation    25.1  29.2 (+16.2%)   26.1  30.4 (+16.4%)

Multi-Turn Interaction – Mean Average Precision (MAP):

  Feature Matrix      ASR                   Transcripts
                      MLR   MF-SLU          MLR   MF-SLU
  Word Observation    52.1  52.7 (+1.2%)    55.5  55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
Experiments for Intent Prediction (3)
Single-Turn Request – Mean Average Precision (MAP):

  Feature Matrix                          ASR                   Transcripts
                                          LM    MF-SLU          LM    MF-SLU
  Word Observation                        25.1  29.2 (+16.2%)   26.1  30.4 (+16.4%)
  Word + Embedding-Based Semantics        32.0                  33.3
  Word + Type-Embedding-Based Semantics   31.5                  32.9

Multi-Turn Interaction – Mean Average Precision (MAP):

  Feature Matrix               ASR                   Transcripts
                               MLR   MF-SLU          MLR   MF-SLU
  Word Observation             52.1  52.7 (+1.2%)    55.5  55.4 (-0.2%)
  Word + Behavioral Patterns   53.9                  56.6

Semantic enrichment provides rich cues to improve performance.
Experiments for Intent Prediction (4)
Single-Turn Request – Mean Average Precision (MAP):

  Feature Matrix                          ASR                   Transcripts
                                          LM    MF-SLU          LM    MF-SLU
  Word Observation                        25.1  29.2 (+16.2%)   26.1  30.4 (+16.4%)
  Word + Embedding-Based Semantics        32.0  34.2 (+6.8%)    33.3  33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics   31.5  32.2 (+2.1%)    32.9  34.0 (+3.4%)

Multi-Turn Interaction – Mean Average Precision (MAP):

  Feature Matrix               ASR                   Transcripts
                               MLR   MF-SLU          MLR   MF-SLU
  Word Observation             52.1  52.7 (+1.2%)    55.5  55.4 (-0.2%)
  Word + Behavioral Patterns   53.9  55.7 (+3.3%)    56.6  57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
Contributions of Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]
The feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Conclusions
The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances;
- better high-level intent prediction about follow-up behaviors.
Future Work
Apply the proposed technology to domain discovery: find domains that are not covered by current systems but that users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition and SLU modeling.
Towards Unsupervised Deep Learning
[Figure: the MF model drawn as a neural network. A word sequence x = w1 w2 ... wd is mapped to word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation that yields an utterance vector lf; each slot candidate S1 ... Sn has a slot vector lf. A knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws form the semantic layer y, which outputs relevance scores R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U) for the semantic relations.]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
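The one-layer view can be illustrated with a tiny NumPy sketch; all dimensions and weights here are made up for illustration, not taken from the model above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_slots, k = 6, 3, 4

x = rng.integers(0, 2, n_words).astype(float)  # bag-of-words utterance vector

W_u = rng.normal(size=(n_words, k))  # word -> latent utterance factors
V = rng.normal(size=(n_slots, k))    # latent slot factors

# One-layer "MF" view: a single linear map, then a sigmoid
shallow = 1 / (1 + np.exp(-(x @ W_u) @ V.T))  # one score per slot candidate

# Deeper variant: stack a nonlinear hidden layer between the two factor maps
H = rng.normal(size=(k, k))
deep = 1 / (1 + np.exp(-np.tanh(x @ W_u @ H) @ V.T))

print(shallow.shape, deep.shape)
```

The shallow and deep variants produce the same kind of output (a probability per slot candidate), which is what makes the layer-stacking step natural.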
Take Home Message
Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI: mapping language to action, e.g., understanding voice commands to control music, lights, etc., or being taught to let friends in by face recognition.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
Bayesian Personalized Ranking for MF
Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.
Objective: for each utterance u_x, with observed facts f⁺ and unobserved facts f⁻, maximize Σ ln σ(θ(u_x, f⁺) − θ(u_x, f⁻)).
The objective is to learn a set of well-ranked semantic slots per utterance.
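A minimal sketch of this pairwise ranking objective; the function names are mine, not the paper's.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def bpr_loss(theta_pos, theta_neg):
    """Negative log-likelihood that an observed fact f+ (score theta_pos)
    outranks an unobserved fact f- (score theta_neg)."""
    return -math.log(sigmoid(theta_pos - theta_neg))

# A larger margin between observed and unobserved scores means a smaller loss
print(bpr_loss(2.0, -1.0) < bpr_loss(0.5, 0.0))  # True
```

Minimizing this loss over sampled (f⁺, f⁻) pairs is equivalent to maximizing the objective above, without ever asserting that an unobserved fact is false.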
Matrix Factorization SLU (MF-SLU)
[Figure: the learned matrix. Rows are utterances; columns are word observations (e.g., cheap, restaurant, food) and slot candidates from ontology induction (expensiveness, locale_by_use, food), with structure learning relating the slot columns (F_w × F_s). Train: Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food" have their observed words and induced slots marked with 1s. Test: for "show me a list of cheap restaurants", the factorized model fills in slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85).]
MF-SLU can estimate probabilities for slot candidates given test utterances
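As an illustrative toy (the utterances echo the slide, but the matrix values and scores are invented, and plain SVD stands in for the learned factorization), a low-rank reconstruction of the observation matrix assigns scores to slots that were never explicitly observed:

```python
import numpy as np

# Rows: utterances; columns: word/slot features (0/1 observations).
M = np.array([
    [1, 1, 0, 1, 0],   # "cheap restaurant", expensiveness slot observed
    [0, 1, 1, 0, 1],   # "restaurant with chinese food", food slot observed
    [1, 1, 0, 0, 0],   # test utterance: slot columns unobserved (zeros)
], dtype=float)

# Rank-2 reconstruction: the low-rank structure fills in hidden entries
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
R = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# The test row now scores the slot column (index 3) of the similar training
# utterance well above the unrelated slot columns.
print(R[2].round(2))
```

The same completion effect is what lets MF-SLU output probabilities for slot candidates it never saw attached to the test utterance.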
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
[Figure: an unlabeled collection feeds frame-semantic parsing and ontology induction (feature model, F_w × F_s), plus word and slot relation models built from a lexical KG and semantic KGs via structure learning (knowledge graph propagation model, R_w, R_s). Together they train the MF-SLU (SLU modeling by matrix factorization), which maps an utterance such as "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
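The knowledge graph propagation step can be sketched as follows; the relation weights are invented for illustration, not taken from the learned graphs.

```python
import numpy as np

F = np.array([[1.0, 0.0, 0.0]])   # one utterance; only the first slot observed
R = np.array([[1.0, 0.8, 0.1],    # slot-slot relation weights from the KGs
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
R_norm = R / R.sum(axis=1, keepdims=True)  # row-normalized propagation matrix

F_prop = F @ R_norm  # the observation spreads to strongly related slots
print(F_prop.round(3))
```

After propagation, slots strongly related to an observed slot receive part of its mass, which is how the graph structure enriches the raw feature matrix before factorization.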
Experimental Setup
Dataset: Cambridge University SLU Corpus – restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.
Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                             ASR    Transcripts
  Baseline SLU:
    Support Vector Machine             32.5   36.6
    Multinomial Logistic Regression    34.0   38.8
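The MAP metric used throughout these tables can be sketched as follows; the slot names in the usage example are illustrative.

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    return sum(average_precision(r, ref)
               for r, ref in zip(rankings, references)) / len(rankings)

rankings = [["food", "area", "pricerange"], ["area", "food"]]
references = [{"food", "pricerange"}, {"food"}]
print(mean_average_precision(rankings, references))  # (5/6 + 1/2) / 2 = 2/3
```

Each utterance contributes one AP over its ranked slot candidates, and the mean over all utterances gives the reported MAP.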
Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                             ASR              Transcripts
  Baseline SLU:
    Support Vector Machine             32.5             36.6
    Multinomial Logistic Regression    34.0             38.8
  Proposed MF-SLU:
    Feature Model                      37.6             45.3
    Feature Model +
      Knowledge Graph Propagation     *43.5 (+27.9%)   *53.4 (+37.6%)

  (* significantly better than the MLR baseline with p < 0.05 in t-test)

The MF-SLU effectively models implicit information to decode semantics. The structure information further improves the results.
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                            ASR              Transcripts
  Feature Model                       37.6             45.3
  Feature + Knowledge Graph Propagation:
    Semantic                          41.4             51.6
    Dependency                        41.6             49.0
    All                              *43.5 (+15.7%)   *53.4 (+17.9%)

  (* significantly better than the MLR baseline with p < 0.05 in t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
[Figure: the reference ontology with the most frequent syntactic dependencies. Nodes are induced slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) aligned with reference slots (type, food, pricerange, task, area); edges carry dependency labels such as PREP_FOR, PREP_IN, NN, AMOD, and DOBJ.]
The automatically learned domain ontology aligns well with the reference one
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
Contributions of Semantic Decoding
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction), with Semantic Decoding highlighted.]
Ontology induction and structure learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for semantic decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) then allow systems to model implicit semantics for better understanding.
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.
"can i have a cheap restaurant" → SLU model → price="cheap", target="restaurant" → intent=navigation
"i plan to dine in legume tonight" → SLU model → restaurant="legume", time="tonight" → intent=reservation
SDS Flowchart – Intent Prediction
[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: feature-enriched matrix. Rows are training app descriptions (e.g., Outlook: "... your email calendar contacts ..."; Gmail: "... check and send emails msgs ..."), self-train utterances, and the test utterance "i would like to contact alex". Columns are word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype). IR over the app descriptions proposes app candidates, feature enrichment adds semantics (e.g., communication, 0.90), and reasoning with the feature-enriched MF scores the intended apps (e.g., 0.90, 0.85, 0.97, 0.95).]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
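The unsupervised "IR for app candidates" step above can be sketched as a smoothed unigram language model over app-store descriptions; the app names and texts here are invented, not the paper's data.

```python
import math
from collections import Counter

apps = {
    "Gmail": "check and send emails and messages",
    "Camera": "take photos and record video",
}

def lm_score(query_words, desc, alpha=0.1):
    """Additively smoothed unigram LM log-likelihood of the query."""
    words = desc.split()
    counts = Counter(words)
    vocab = len(set(words)) + 1
    return sum(math.log((counts[w] + alpha) / (len(words) + alpha * vocab))
               for w in query_words)

query = ["send", "emails"]
ranked = sorted(apps, key=lambda a: lm_score(query, apps[a]), reverse=True)
print(ranked[0])  # Gmail
```

Ranking apps by how well their descriptions generate the request needs no labeled utterances, which is why this retrieval step can bootstrap the matrix.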
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
52
Semantic Decoding [ACL-IJCNLP'15]
Input: user utterances
Output: semantic concepts included in each individual utterance
Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
[Figure: MF-SLU framework. An SLU model maps "can I have a cheap restaurant" to target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection feeds Ontology Induction, which yields the feature model (word features Fw, slot features Fs), and Structure Learning, which yields the knowledge graph propagation model (word relation model Rw from a lexical KG, slot relation model Rs from a semantic KG); MF-SLU then performs SLU modeling by matrix factorization to produce the semantic representation.]
Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
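As a rough illustration of the matrix-factorization step (not the paper's exact MF-SLU model, which is trained with Bayesian personalized ranking and the knowledge graph propagation described above), the utterance-by-feature matrix can be factorized into low-rank factors whose product scores the unobserved cells; all data below is hypothetical:

```python
import random

random.seed(0)

# Toy utterance-by-feature matrix: rows = utterances, columns = observed
# words/slots. 1 = observed, None = hidden (to be inferred by the model).
M = [
    [1, 1, None, 1],
    [1, None, 1, None],
    [None, 1, 1, 1],
]
rows, cols, k = len(M), len(M[0]), 2  # k = latent dimension

U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(cols)]

def predict(i, j):
    return sum(U[i][f] * V[j][f] for f in range(k))

# Plain SGD on squared error over the observed cells only; the hidden
# cells then receive scores from the learned low-rank structure.
for _ in range(2000):
    for i in range(rows):
        for j in range(cols):
            if M[i][j] is None:
                continue
            err = M[i][j] - predict(i, j)
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] += 0.05 * (err * v - 0.01 * u)
                V[j][f] += 0.05 * (err * u - 0.01 * v)

scores = [[predict(i, j) for j in range(cols)] for i in range(rows)]
```

The real model replaces the squared-error objective with a ranking objective and adds the relation models Rw and Rs as extra feature blocks.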
53
Experimental Setup
Dataset: Cambridge University SLU Corpus
Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type
Metric: MAP of all estimated slot probabilities over all utterances; a mapping table aligns induced and reference slots
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012
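The evaluation metric can be sketched as follows: mean average precision over per-utterance slot rankings (a generic MAP implementation; the exact evaluation script is not shown in the slides, and the example data is made up):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: `ranked` is the slot list sorted by estimated
    probability (descending); `relevant` is the set of reference slots."""
    hits, score = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(utterances):
    """`utterances` is a list of (ranked_slots, reference_slots) pairs."""
    return sum(average_precision(r, rel) for r, rel in utterances) / len(utterances)

# Hypothetical example: two utterances with ranked slot hypotheses.
map_score = mean_average_precision([
    (["food", "area", "pricerange"], {"food", "pricerange"}),
    (["task", "area"], {"area"}),
])
```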
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)
The result is significantly better than the MLR baseline (p < 0.05, t-test)
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances
In the integrated structure information, both semantic and dependency relations are useful for understanding

Approach                                            | ASR           | Transcripts
Feature Model                                       | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic     | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency   | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All          | 43.5 (+15.7%) | 53.4 (+17.9%)
The result is significantly better than the MLR baseline (p < 0.05, t-test)
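The significance footnote on these tables corresponds to a t-test over per-utterance scores; a minimal sketch of the paired t statistic (the per-utterance AP values below are illustrative, not the actual experimental data):

```python
import math

def paired_t(xs, ys):
    """Paired t statistic over per-utterance metric values of two systems."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)  # degrees of freedom = n - 1

# Hypothetical per-utterance AP values for MF-SLU vs. the MLR baseline.
mf  = [0.52, 0.61, 0.47, 0.58, 0.55, 0.60]
mlr = [0.40, 0.50, 0.45, 0.44, 0.42, 0.49]
t = paired_t(mf, mlr)  # compare against the critical value for df = 5
```

With six utterance pairs, significance at p < 0.05 (two-tailed) requires |t| above about 2.571.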
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies:
[Figure: induced ontology. Slot nodes (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, price range, task, area) linked by their most frequent syntactic dependencies (PREP_FOR, PREP_IN, NN, AMOD, DOBJ).]
The automatically learned domain ontology aligns well with the reference one
57
The data-driven ontology is more objective, while the expert-annotated one is more subjective
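The inter-slot relation weights behind this analysis come from a random walk on the slot knowledge graph (see the NAACL-HLT 2015 paper in the reference list); a minimal power-iteration sketch over a toy graph with hypothetical edge weights:

```python
# Toy slot graph: adjacency weights between four slots (hypothetical).
slots = ["food", "pricerange", "area", "task"]
W = [
    [0.0, 0.6, 0.3, 0.1],
    [0.5, 0.0, 0.4, 0.1],
    [0.3, 0.4, 0.0, 0.3],
    [0.2, 0.2, 0.6, 0.0],
]

def random_walk_scores(W, restart=0.15, iters=100):
    n = len(W)
    # Column-normalize to obtain transition probabilities.
    col = [sum(W[i][j] for i in range(n)) for j in range(n)]
    P = [[W[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]
    r = [1.0 / n] * n  # uniform start distribution
    for _ in range(iters):
        r = [restart / n + (1 - restart) * sum(P[i][j] * r[j] for j in range(n))
             for i in range(n)]
    return r  # stationary importance of each slot

scores = random_walk_scores(W)
```

Slots (and slot pairs) with high stationary scores are kept as the learned ontology structure.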
58
Contributions of Semantic Decoding
[Figure: SDS flowchart. Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction).]
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to: 1) unify the automatically acquired knowledge; 2) adapt to a domain-specific setting; and 3) allow systems to model implicit semantics for better understanding
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation
60
SDS Flowchart – Intent Prediction
[Figure: SDS flowchart. Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play
"please dial a phone call to alex" can be served by Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: feature-enriched MF for single-turn requests. Rows: training app descriptions retrieved by IR as candidates (e.g., Outlook "... your email calendar contacts ...", Gmail "... check and send emails, msgs ..."), self-train utterances, and the test utterance "i would like to contact alex". Columns: word observations (contact, message, email, ...), enriched semantic features (communication), and intended apps (Gmail, Outlook, Skype). Feature enrichment adds semantics to the utterance, and reasoning with feature-enriched MF fills the unobserved cells with probabilities.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
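The "IR for app candidates" step retrieves apps whose store descriptions match the request; a minimal sketch using TF-IDF cosine similarity (the descriptions below are invented stand-ins for Google Play data, not the actual corpus):

```python
import math
from collections import Counter

# Hypothetical app-store descriptions.
apps = {
    "Gmail":   "check and send emails and msgs from your account",
    "Outlook": "manage your email calendar contacts in one place",
    "Camera":  "take photos and record videos with filters",
}

n = len(apps)
df = Counter(w for text in apps.values() for w in set(text.split()))
idf = {w: math.log(n / c) for w, c in df.items()}

def vectorize(text):
    # TF-IDF weights; words unseen in the app corpus are dropped.
    tf = Counter(w for w in text.split() if w in idf)
    return {w: c * idf[w] for w, c in tf.items()}

def cosine(a, b):
    dot = sum(wt * b[w] for w, wt in a.items() if w in b)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

doc_vecs = {name: vectorize(text) for name, text in apps.items()}
query = vectorize("send an email to alex")
ranked = sorted(apps, key=lambda a: cosine(query, doc_vecs[a]), reverse=True)
```

The retrieved candidates then become rows of the MF matrix above.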
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity; cues: 1) user preference, 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com
"send to vivian": Email? Message? (both Communication apps)
Idea: behavioral patterns in the history (e.g., the previous turn) can help intent prediction
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched MF with behavioral patterns. Training dialogues pair user utterances with intended apps ("take this photo" / "tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website" / "send an email to professor" → CHROME, email app). Features combine lexical observations (photo, check, camera, tell, send, ...) with behavior history (null, camera, chrome, email). At test time, reasoning with feature-enriched MF scores apps for the dialogue "take a photo of this" / "send it to alice" → CAMERA, IM.]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
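A minimal sketch of how app-level context can be folded in: interpolate a lexical score with a transition prior learned from the user's app-launch history (toy counts and a toy lexicon, not the AppDialogue.com corpus or the paper's MF model):

```python
from collections import Counter, defaultdict

# Toy launch history for one user: (previous app, next app) pairs.
history = [("CAMERA", "IM"), ("CAMERA", "IM"), ("CHROME", "EMAIL"),
           ("CAMERA", "GALLERY"), ("CHROME", "EMAIL")]

transitions = defaultdict(Counter)
for prev, nxt in history:
    transitions[prev][nxt] += 1

# Hypothetical lexical cue table mapping words to apps.
lexicon = {"send": {"IM", "EMAIL"}, "photo": {"CAMERA"}, "grades": {"CHROME"}}

def score(utterance, prev_app, app, alpha=0.5):
    lex = sum(1 for w in utterance.split() if app in lexicon.get(w, ()))
    total = sum(transitions[prev_app].values()) or 1
    behav = transitions[prev_app][app] / total
    return alpha * lex + (1 - alpha) * behav  # interpolate the two cues

apps = ["IM", "EMAIL", "CAMERA", "CHROME", "GALLERY"]
best = max(apps, key=lambda a: score("send it to alice", "CAMERA", a))
```

Here the word "send" alone is ambiguous between IM and EMAIL, and the previous app (CAMERA vs. CHROME) tips the decision, which is the disambiguation effect the slide describes.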
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP), LM-based IR model (unsupervised)

Feature Matrix   | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | -           | 26.1            | -

Multi-Turn Interaction: Mean Average Precision (MAP), Multinomial Logistic Regression (supervised)

Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | -           | 55.5             | -
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | -             | 33.3            | -
Word + Type-Embedding-Based Semantics | 31.5    | -             | 32.9            | -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | -            | 56.6             | -

Semantic enrichment provides rich cues to improve performance
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics
70
Contributions of Intent Prediction
[Figure: SDS flowchart. Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]
Feature-Enriched MF-SLU for Intent Prediction is able to: 1) unify the knowledge at different levels; 2) learn inference relations between various features; and 3) create personalized models by leveraging contextual behaviors
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Databases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: identify domains not covered by the current systems but of interest to users, to guide which domains to develop next
Improve the proposed approach by handling uncertainty: recognition errors from ASR feeding SLU modeling, and unreliable knowledge from knowledge acquisition
75
Towards Unsupervised Deep Learning
[Figure: the MF model drawn as a network. A word sequence x = w1 w2 ... wd maps to word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation yielding the utterance vector lf. Slot candidates S1 ... Sn with slot vectors lf pass through the knowledge graph propagation layer lp (propagation matrix Wp) and the semantic projection matrix Ws to the semantic layer y, producing relevance scores R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U).]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning
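A toy numeric sketch of that idea (dimensions and weights are hypothetical; the convolution and knowledge graph propagation layers of the figure are omitted): the MF prediction is a single linear map from features to slot scores, and inserting a hidden layer with a nonlinearity turns it into a deeper scorer:

```python
import math
import random

random.seed(0)
d_feat, d_hid, n_slots = 6, 4, 3  # toy dimensions

def mat(r, c):
    return [[random.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

x = [1, 0, 1, 0, 0, 1]  # bag-of-features utterance vector

# One layer (MF view): slot scores are a single linear map of the features.
V = mat(n_slots, d_feat)
mf_scores = matvec(V, x)

# Two layers: insert a hidden representation with a tanh nonlinearity.
W1, W2 = mat(d_hid, d_feat), mat(n_slots, d_hid)
hidden = [math.tanh(h) for h in matvec(W1, x)]
deep_scores = matvec(W2, hidden)

probs = [1 / (1 + math.exp(-s)) for s in deep_scores]  # P(slot | utterance)
```

Both variants emit one score per slot candidate; the deeper version can capture feature interactions the bilinear MF form cannot.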
76
Take Home Message
Big data is available without annotations
Challenge: how to acquire and organize important knowledge, and further utilize it for applications
Language understanding for AI maps language to action: understand voice commands to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A
THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016
53
Experimental Setup
Dataset: Cambridge University SLU Corpus
- Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- Dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type
Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots
Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
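The MAP metric above averages, over utterances, the precision accumulated at each correctly ranked slot. A minimal reference implementation (illustrative; not the evaluation script used in the talk):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: precision accumulated at each relevant hit."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(all_ranked, all_relevant):
    """MAP: mean of per-utterance average precisions."""
    aps = [average_precision(r, rel) for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

# Slots ranked by estimated probability vs. the reference slots for one utterance.
ap = average_precision(["food", "area", "phone"], {"food", "phone"})
print(round(ap, 3))  # (1/1 + 2/3) / 2 = 0.833
```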
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
54
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                       | ASR  | Transcripts
  Baseline SLU: Support Vector Machine           | 32.5 | 36.6
  Baseline SLU: Multinomial Logistic Regression  | 34.0 | 38.8
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                                      | ASR           | Transcripts
  Baseline SLU: Support Vector Machine                          | 32.5          | 36.6
  Baseline SLU: Multinomial Logistic Regression                 | 34.0          | 38.8
  Proposed MF-SLU: Feature Model                                | 37.6          | 45.3
  Proposed MF-SLU: Feature Model + Knowledge Graph Propagation  | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.
(The result is significantly better than the MLR baseline, with p < 0.05 in a t-test.)
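The MF-SLU rows above are trained from implicit feedback with the Bayesian Personalized Ranking objective named earlier in the deck: for an utterance u, an observed slot i should score higher than an unobserved slot j. A toy sketch of one BPR-style SGD update; the dimensions, learning rate, and regularizer are illustrative assumptions, not the talk's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_slot, k = 5, 4, 3
U = rng.normal(scale=0.1, size=(n_utt, k))   # utterance factors
V = rng.normal(scale=0.1, size=(n_slot, k))  # slot factors

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One stochastic ascent step on ln sigma(x_uij), pushing slot i above slot j."""
    x_uij = U[u] @ V[i] - U[u] @ V[j]        # pairwise score difference
    g = 1.0 / (1.0 + np.exp(x_uij))          # sigma(-x): gradient scale of ln sigma(x)
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * U[u] - reg * V[i])
    V[j] += lr * (-g * U[u] - reg * V[j])

before = U[0] @ V[1] - U[0] @ V[2]
for _ in range(100):
    bpr_step(0, 1, 2)                        # slot 1 observed for utterance 0, slot 2 not
after = U[0] @ V[1] - U[0] @ V[2]
print(after > before)                        # the observed slot now ranks higher
```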
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                             | ASR           | Transcripts
  Feature Model                                        | 37.6          | 45.3
  Feature + Knowledge Graph Propagation (Semantic)     | 41.4          | 51.6
  Feature + Knowledge Graph Propagation (Dependency)   | 41.6          | 49.0
  Feature + Knowledge Graph Propagation (All)          | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
(The result is significantly better than the MLR baseline, with p < 0.05 in a t-test.)
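The "All" row integrates both relation types into one propagation step. A hedged sketch of the idea: combine semantic and dependency relation matrices into a row-normalized walk matrix and smooth raw per-slot scores over it. The slot set, combination weights, and normalization below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

slots = ["food", "pricerange", "area", "task"]

# Toy adjacency matrices for the two relation types over the slots above.
semantic   = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)
dependency = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]], float)

W = semantic + dependency                    # integrate both relation types
W = W / W.sum(axis=1, keepdims=True)         # row-normalize into a walk matrix

scores = np.array([0.9, 0.1, 0.0, 0.2])      # raw per-slot evidence for one utterance
alpha = 0.5                                  # propagation strength (assumed)
propagated = (1 - alpha) * scores + alpha * W @ scores
print(np.round(propagated, 3))
```

Related slots reinforce each other: "pricerange" gains mass from its strongly evidenced neighbor "food".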
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
The reference ontology is shown with the most frequent syntactic dependencies.

[Figure: induced domain ontology. Generic slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) map to reference slots (type, food, pricerange, area, task) and are linked by dependency relations such as PREP_FOR, PREP_IN, NN, AMOD, and DOBJ.]

The automatically learned domain ontology aligns well with the reference one. The data-driven ontology is more objective, while the expert-annotated one is more subjective.
57
58
Contributions of Semantic Decoding

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation
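The two examples above can be written out as the SLU output structure the slide describes, with low-level slots plus a high-level intent per utterance (illustrative data structure only):

```python
# Slot-value pairs (low-level semantics) plus a predicted intent (high-level).
utterances = {
    "can i have a cheap restaurant": {
        "slots": {"price": "cheap", "target": "restaurant"},
        "intent": "navigation",
    },
    "i plan to dine in legume tonight": {
        "slots": {"restaurant": "legume", "time": "tonight"},
        "intent": "reservation",
    },
}
print(utterances["can i have a cheap restaurant"]["intent"])  # navigation
```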
60
SDS Flowchart – Intent Prediction

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play
Example: "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: feature-enriched matrix for reasoning with MF. Training rows come from app descriptions retrieved by IR as app candidates (e.g., Gmail: "check and send emails, msgs"; Outlook: "your email, calendar, contacts") and self-train utterances. The test utterance "i would like to contact alex" is enriched with the semantic feature "communication". Columns are word observations (contact, message, email, …) and intended apps (Gmail, Outlook, Skype, …).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
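The figure's feature-enriched matrix can be mimicked with a toy low-rank completion. Everything below (the feature set, the app list, the row contents, and the SVD-based fill-in) is an illustrative assumption; the talk's MF-SLU is trained with BPR rather than SVD:

```python
import numpy as np

# Columns: word/semantic features, then app labels.
features = ["contact", "email", "message", "communication"]
apps = ["Gmail", "Outlook", "Skype"]

# 1 = observed feature or known intended app; 0 = unobserved (hidden).
M = np.array([
    [0, 1, 0, 1, 0, 1, 0],  # app-description row ("your email, calendar, contacts") -> Outlook
    [0, 1, 1, 1, 1, 0, 0],  # app-description row ("check and send emails, msgs")     -> Gmail
    [1, 0, 0, 1, 0, 0, 0],  # test utterance "i would like to contact alex" (apps unknown)
], dtype=float)

# Truncated-SVD reconstruction fills in scores for the unobserved app cells.
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

test_scores = dict(zip(apps, M_hat[-1, len(features):]))
ranked = sorted(test_scores, key=test_scores.get, reverse=True)
print(ranked)
```

The shared "communication" feature lets the test row borrow app evidence from the training rows even though its app cells were never observed.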
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. 1) User preference; 2) app-level contexts
Idea: behavioral patterns in the history can help intent prediction.
Example (previous turn): "send to vivian" → Email, Message (Communication)
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: feature-enriched matrix for reasoning with MF over dialogues. Training dialogues pair user utterances with intended apps (e.g., "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL). Columns include lexical features (photo, check, camera, tell, send, …) and behavior history (null, camera, chrome, email). The test dialogue "take a photo of this / send it to alice" is predicted as CAMERA, IM.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
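The behavioral-pattern idea amounts to adding the previously launched app to each turn's feature vector before factorization. A minimal sketch of that feature construction; the feature-name scheme is an assumption for illustration:

```python
# Enrich each turn's lexical features with the previous turn's launched app
# ("behavior history"), as in the figure above.
def turn_features(words, prev_app):
    feats = {f"w:{w}": 1 for w in words}          # lexical observations
    feats[f"prev_app:{prev_app or 'null'}"] = 1   # behavioral context
    return feats

# (utterance, previous app, intended app) per turn of one dialogue.
dialogue = [("take a photo of this", None, "CAMERA"),
            ("send it to alice", "CAMERA", "IM")]
rows = [turn_features(u.split(), prev) for u, prev, _ in dialogue]
print(sorted(rows[1]))  # ['prev_app:CAMERA', 'w:alice', 'w:it', 'w:send', 'w:to']
```

With "prev_app:CAMERA" present, the ambiguous "send it to alice" can be disambiguated toward a messaging app rather than, say, email.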
66
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix    | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
  Word Observation  | 25.1    | -           | 26.1            | -

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix    | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation  | 52.1     | -           | 55.5             | -

LM = LM-based IR model (unsupervised); MLR = multinomial logistic regression (supervised).
67
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix    | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
  Word Observation  | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix    | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation  | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy (ASR) data.
68
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
  Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
  Word + Embedding-Based Semantics       | 32.0    | -             | 33.3            | -
  Word + Type-Embedding-Based Semantics  | 31.5    | -             | 32.9            | -

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix              | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation            | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
  Word + Behavioral Patterns  | 53.9     | -            | 56.6             | -

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
  Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
  Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
  Word + Embedding-Based Semantics       | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics  | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
  Feature Matrix              | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation            | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
  Word + Behavioral Patterns  | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
- Reactive Assistance: ASR, LU, Dialog, LG, TTS
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data: Back-end Data Bases, Services, and Client Signals
- Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
- User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of, and potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances;
- better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: domains not yet covered by current systems but of interest to users can guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition, in SLU modeling.
75
Towards Unsupervised Deep Learning

[Figure: network view of the model. A word sequence x = w1 w2 … wd feeds word vectors l_w; a convolution matrix W_c produces the convolutional layer l_c; a pooling operation yields the utterance vector l_f and slot vectors l_f for slot candidates S1 … Sn; a knowledge graph propagation matrix W_p gives the propagation layer l_p; a semantic projection matrix W_s maps to the semantic layer y, producing relation scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U).]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
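The one-layer view can be made concrete: MF's reconstruction U @ V is a single linear layer, and inserting a nonlinear hidden layer keeps the same input/output interface while adding depth. A minimal sketch; the shapes and the tanh choice are illustrative assumptions, not the slide's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_utt, n_slot, k = 6, 4, 3

U = rng.normal(size=(n_utt, k))   # utterance factors (layer input)
V = rng.normal(size=(k, n_slot))  # slot factors (output weights)
W = rng.normal(size=(k, k))       # extra hidden-layer weights

mf_scores = U @ V                 # one-layer (plain MF) view
deep_scores = np.tanh(U @ W) @ V  # same interface, one layer deeper

print(mf_scores.shape, deep_scores.shape)  # (6, 4) (6, 4)
```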
76
Take Home Message
Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.
Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or teaching the system to let friends in via face recognition.
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A. THANKS FOR YOUR ATTENTION.
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
54
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
Approach ASR TranscriptsBaseline
SLUSupport Vector Machine 325 366
Multinomial Logistic Regression 340 388
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
55
Experiments of Semantic DecodingQuality of Semantics Estimation
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
The MF-SLU effectively models implicit information to decode semantics
The structure information further improves the results
Approach ASR Transcripts
Baseline SLU
Support Vector Machine 325 366Multinomial Logistic Regression 340 388
Proposed MF-SLU
Feature Model 376 453
Feature Model +Knowledge Graph Propagation
435
(+279)534
(+376)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
55
Experiments of Semantic Decoding: Quality of Semantics Estimation
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression (MLR) | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. The best result is significantly better than the MLR baseline (p < 0.05, t-test).
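For reference, the MAP metric used in these tables can be computed as below. This is a generic sketch (the function names are mine, not the authors' evaluation code): each utterance contributes the average precision of its ranked slot-candidate list.

```python
import numpy as np

def average_precision(scores, gold):
    """AP for one utterance: rank slot candidates by estimated
    probability and average the precision at each gold slot's rank."""
    order = np.argsort(scores)[::-1]          # best-scored slot first
    hits, precisions = 0, []
    for rank, idx in enumerate(order, start=1):
        if idx in gold:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(score_lists, gold_sets):
    """MAP over all utterances."""
    return float(np.mean([average_precision(s, g)
                          for s, g in zip(score_lists, gold_sets)]))
```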
56
Experiments of Semantic Decoding: Effectiveness of Relations
Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. The best result is significantly better than the MLR baseline (p < 0.05, t-test).
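The relative gains reported in parentheses can be reproduced from the absolute MAP scores; a trivial helper (the name is mine) under the assumption that the gains are relative, not absolute, differences:

```python
def relative_gain(base, new):
    """Percentage improvement of `new` over `base`, rounded to one decimal."""
    return round(100.0 * (new - base) / base, 1)

# e.g. Feature Model -> Feature + Knowledge Graph Propagation (All):
#   ASR: 37.6 -> 43.5, Transcripts: 45.3 -> 53.4
```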
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs.
The reference ontology with the most frequent syntactic dependencies.
[Figure: induced ontology graph. Induced slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) are mapped to reference slots (type, food, pricerange, task, area) and linked by dependency relations such as AMOD, NN, DOBJ, PREP_FOR, and PREP_IN.]
The automatically learned domain ontology aligns well with the reference one. The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Diagram: Ontology Induction and Structure Learning form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.
SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation
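To make the distinction concrete, here is a deliberately toy rule, standing in for the learned model, that maps the low-level slots above to a high-level intent (the rule itself is my illustration, not the proposed method):

```python
def predict_intent(slots):
    """Toy stand-in for an intent model over decoded slot values."""
    if "restaurant" in slots and "time" in slots:
        return "reservation"   # user has picked a place and plans an action
    if slots.get("target") == "restaurant":
        return "navigation"    # user is still searching for a place
    return "unknown"

predict_intent({"price": "cheap", "target": "restaurant"})   # navigation
predict_intent({"restaurant": "legume", "time": "tonight"})  # reservation
```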
60
SDS Flowchart – Intent Prediction
[Diagram: SDS flowchart highlighting Intent Prediction alongside Ontology Induction, Structure Learning, and Semantic Decoding (the Knowledge Acquisition and SLU Modeling components)]
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request
Input: single-turn request
Output: apps that are able to support the required functionality
[Figure: feature-enriched matrix. Rows: app descriptions retrieved by IR for app candidates (e.g., Outlook "... your email, calendar, contacts ...", Gmail "... check and send emails, msgs ..."), self-train utterances, and the test utterance "i would like to contact alex". Columns: word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...). Feature enrichment adds semantic features such as "communication"; reasoning with feature-enriched MF fills unobserved cells with scores such as 0.90.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
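The "reasoning with feature-enriched MF" step can be sketched with a low-rank factorization over a toy binary utterance-by-feature matrix. Everything here (the data, the rank, and plain SVD instead of the Bayesian Personalized Ranking training used in the talk) is a simplified assumption for illustration:

```python
import numpy as np

cols = ["contact", "email", "message", "Gmail", "Outlook", "Skype"]
X = np.array([
    [0, 1, 0, 1, 0, 0],  # Gmail description:   "... email ..."
    [0, 1, 1, 0, 1, 0],  # Outlook description: "... emails, msgs ..."
    [1, 0, 0, 1, 0, 1],  # self-train utterance: "contact" -> Gmail, Skype
    [1, 0, 0, 0, 0, 0],  # test: "i would like to contact alex" (apps unobserved)
], dtype=float)

# Rank-2 reconstruction fills the unobserved app cells with real-valued scores.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = (U[:, :2] * s[:2]) @ Vt[:2]
test_scores = dict(zip(cols, X_hat[3]))
# Apps co-occurring with "contact" now outscore unrelated ones for the test row.
```

The SVD here is only a stand-in for the learned factorization; the point is that unobserved (utterance, app) cells acquire graded scores from shared features.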
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts (e.g., "send to vivian" after the previous turn could mean Email or Message, both Communication apps).
Idea: behavioral patterns in the history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched matrix over dialogues. Train dialogues: "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL. Test dialogue: "take a photo of this / send it to alice" → CAMERA, IM. Columns: lexical features (photo, check, camera, IM, tell, send), behavior-history features (null, camera, chrome, chrome+email), and intended apps; MF reasoning fills unobserved cells with scores such as 0.85 and 0.70.]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
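A sketch of how one dialogue turn could be flattened into a row of the matrix above, over lexical features, a behavior-history feature (the previously launched app), and intended-app labels. The encoding details (feature naming, binary values) are my assumptions:

```python
def turn_to_row(words, prev_app, intended_apps, vocab):
    """One dialogue turn -> sparse feature row (feature name -> 1)."""
    row = {f"w:{w}": 1 for w in words if w in vocab}  # lexical features
    row[f"b:{prev_app or 'null'}"] = 1                # app-level context
    for app in intended_apps:                         # labels (train time only)
        row[f"app:{app}"] = 1
    return row

vocab = {"take", "photo", "tell", "send", "check", "email"}
dialogue = [
    (["take", "this", "photo"], None, ["CAMERA"]),
    (["tell", "vivian", "this", "is", "me"], "camera", ["IM"]),
]
rows = [turn_to_row(w, prev, apps, vocab) for w, prev, apps in dialogue]
```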
66
Experiments for Intent Prediction
Single-Turn Request, Mean Average Precision (MAP); baseline: LM-Based IR Model (unsupervised)
Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / – | 26.1 / –
Multi-Turn Interaction, Mean Average Precision (MAP); baseline: Multinomial Logistic Regression (supervised)
Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / – | 55.5 / –
67
Experiments for Intent Prediction (2)
Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction (3)
Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –
Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –
Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction (4)
Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)
Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)
Intent prediction can benefit from both hidden information and low-level semantics.
70
Contributions of Intent Prediction
[Diagram: Ontology Induction and Structure Learning form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling]
Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
User Experience: "call taxi"
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: mine the domains that current systems do not cover but users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.
75
Towards Unsupervised Deep Learning
[Figure: MF-SLU redrawn as a neural network. Word sequence x = w1, w2, ..., wd → word vectors l_w → convolutional layer l_c (convolution matrix W_c) → pooling operation → utterance vector l_f and slot vectors l_f for slot candidates S1, ..., Sn → semantic projection matrix W_s → knowledge graph propagation layer l_p (propagation matrix W_p) → semantic layer y, yielding relevance scores R(U, S_i) and posterior probabilities P(S_i | U).]
Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning.
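A minimal numpy forward pass using the slide's layer names (l_w, l_c, l_f, W_c, W_s, W_p, y). The dimensions, the width-1 convolution, ReLU, max pooling, identity propagation, and sigmoid output are all my assumptions for the sketch, not the talk's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_words, n_slots, k = 8, 5, 3, 8  # embedding dim, utterance length, #slots, #filters

lw = rng.normal(size=(n_words, d))   # word vectors l_w for x = w1 ... wd
Wc = rng.normal(size=(d, k))         # convolution matrix W_c (window = 1 here)
lc = np.maximum(lw @ Wc, 0.0)        # convolutional layer l_c (ReLU)
lf = lc.max(axis=0)                  # pooling operation -> utterance vector l_f
Ws = rng.normal(size=(k, n_slots))   # semantic projection matrix W_s
Wp = np.eye(n_slots)                 # KG propagation matrix W_p (identity = no propagation)
y = lf @ Ws @ Wp                     # semantic layer y: relevance R(U, S_i)
p = 1.0 / (1.0 + np.exp(-y))         # P(S_i | U)
```

With more stacked layers (or a learned W_p), the same skeleton becomes the deeper, still unsupervised, model the slide gestures at.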
76
Take Home Message
Big data is available without annotations. The challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI maps language to action: understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend. Deep language understanding is an emerging field.
77
Q & A -- THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
56
Experiments of Semantic DecodingEffectiveness of Relations
Dataset Cambridge University SLU Corpus
Metric MAP of all estimated slot probabilities for all utterances
In the integrated structure information both semantic and dependency relations are useful for understanding
Approach ASR Transcripts
Feature Model 376 453
Feature + Knowledge Graph Propagation
Semantic 414 516
Dependency 416 490
All 435 (+157) 534 (+179)
the result is significantly better than the MLR with p lt 005 in t-test
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Experiments for Structure LearningRelation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
locale_by_use
food expensiveness
seeking
relational_quantity
PREP_FOR
PREP_FOR
NN AMOD
AMOD
AMODdesiring
DOBJ
type
food pricerange
DOBJ
AMOD AMOD
AMOD
taskarea
PREP_IN
The automatically learned domain ontology aligns well with the reference one
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven one is more objective while expert-annotated one is more subjective
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
Experiments for Structure Learning: Relation Discovery Analysis
Discover inter-slot relations connecting important slot pairs
The reference ontology with the most frequent syntactic dependencies
[Figure: induced restaurant-domain ontology. Slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, pricerange, task, area) connected by the most frequent syntactic dependencies between them (NN, AMOD, DOBJ, PREP_FOR, PREP_IN).]
The automatically learned domain ontology aligns well with the reference one
SORRY, I DIDN'T GET THAT! -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57
The data-driven ontology is more objective, while the expert-annotated one is more subjective.
58
Contributions of Semantic Decoding
[Diagram: contributions. Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction]
Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.
MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.
59
Low- and High-Level Understanding
Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents
"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant" → intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight" → intent=reservation
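The two levels of understanding can be made concrete in code. A toy sketch, with keyword rules standing in for the learned SLU model (all names and rules here are illustrative, not the thesis's implementation):

```python
# Toy two-level understanding: low-level slot-value pairs plus a
# high-level intent, using keyword rules in place of a trained model.
def understand(utterance):
    if "cheap" in utterance and "restaurant" in utterance:
        return {"slots": {"price": "cheap", "target": "restaurant"},
                "intent": "navigation"}
    if "dine in" in utterance:
        return {"slots": {"restaurant": "legume", "time": "tonight"},
                "intent": "reservation"}
    return {"slots": {}, "intent": None}

print(understand("can i have a cheap restaurant")["intent"])  # navigation
```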
60
[Diagram: SDS flowchart. Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction]
SDS Flowchart – Intent Prediction
61
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
62
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]
Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent Identification: popular domains in Google Play
"please dial a phone call to alex" → Skype, Hangout, etc.
Intent Prediction of Mobile Apps [SLT'14c]
Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Input: single-turn request
Output: apps that are able to support the required functionality
Intent Prediction – Single-Turn Request
[Figure: feature-enriched matrix factorization. Rows are app descriptions (e.g. Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails, msgs ..."), self-train utterances, and test utterances (e.g. Utterance 1: "i would like to contact alex", enriched with the semantic type "communication"); columns are word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...). Observed cells are 1; IR over app descriptions proposes app candidates, feature enrichment adds semantic cues, and MF fills in scores (e.g. 0.90) for unobserved cells at test time.]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
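The reasoning step can be sketched as plain matrix factorization. The thesis trains MF with Bayesian Personalized Ranking; this toy version substitutes a simple squared-loss gradient update to show the shape of the computation, and all dimensions, features, and data below are made up for illustration:

```python
import numpy as np

# Rows: utterances/descriptions; columns: word features + intended-app
# features. Observed cells are 1; MF scores the unobserved cells.
rng = np.random.default_rng(0)
M = np.zeros((4, 6))
M[0, [0, 4]] = 1          # e.g. word "contact" + app "Skype"
M[1, [1, 4]] = 1          # word "message" + app "Skype"
M[2, [0, 5]] = 1          # word "contact" + app "Gmail"
M[3, [2, 3]] = 1          # word "email"   + app "Outlook"

k, lr, reg = 3, 0.05, 0.01
U = rng.normal(scale=0.1, size=(M.shape[0], k))   # utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], k))   # feature factors
for _ in range(1000):     # full-batch gradient descent on squared loss
    E = M - U @ V.T       # residual over all cells (toy objective)
    U, V = U + lr * (E @ V - reg * U), V + lr * (E.T @ U - reg * V)

scores = U @ V.T          # dense scores, including unobserved cells
```

After training, observed cells score higher than unobserved ones on average; at test time the same factorization fills in intended-app scores for new utterance rows.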
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. Cues: 1) user preference, 2) app-level contexts
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
"send to vivian" (previous turn) → Email? Message? (Communication)
Idea: behavioral patterns in history can help intent prediction
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: feature-enriched MF over dialogue turns. Rows are training dialogues (e.g. "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL) and test dialogues ("take a photo of this / send it to alice" → CAMERA, IM); columns are lexical features (photo, check, camera, tell, send, ...), behavior-history features (null, camera, chrome, email, ...), and intended apps. Observed cells are 1; MF fills in scores (e.g. 0.85) for the test dialogue's intended apps.]
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
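A minimal sketch of how such behavior-history features might be built (the feature naming here is hypothetical; the actual feature design is described in the ICMI'15 paper):

```python
# Join each turn's words with the app launched at the previous turn,
# so "send it to alice" after CAMERA can be disambiguated toward IM.
def turn_features(words, prev_app):
    feats = {f"word:{w}" for w in words}
    feats.add(f"prev_app:{prev_app or 'null'}")  # 'null' on the first turn
    return feats

dialogue = [("take a photo of this".split(), None),   # intended: CAMERA
            ("send it to alice".split(), "camera")]   # intended: IM
featurized = [turn_features(w, p) for w, p in dialogue]
```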
66
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix        ASR: LM / MF-SLU    Transcripts: LM / MF-SLU
Word Observation      25.1 / -            26.1 / -
(LM = LM-Based IR Model, unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix        ASR: MLR / MF-SLU   Transcripts: MLR / MF-SLU
Word Observation      52.1 / -            55.5 / -
(MLR = Multinomial Logistic Regression, supervised)
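Both tables report Mean Average Precision. As a reminder of the metric (a generic definition, not the exact evaluation script), with made-up app rankings:

```python
# MAP: for each request, average the precision at the rank of every
# relevant app; then average over all requests.
def average_precision(ranked, relevant):
    hits, precisions = 0, []
    for rank, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [(["skype", "maps", "gmail"], {"skype", "gmail"}),   # AP = 5/6
        (["chrome", "camera"], {"camera"})]                 # AP = 1/2
print(round(mean_average_precision(runs), 3))  # 0.667
```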
67
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix        ASR: LM / MF-SLU         Transcripts: LM / MF-SLU
Word Observation      25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix        ASR: MLR / MF-SLU        Transcripts: MLR / MF-SLU
Word Observation      52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                           ASR: LM / MF-SLU         Transcripts: LM / MF-SLU
Word Observation                         25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics         32.0 / -                 33.3 / -
Word + Type-Embedding-Based Semantics    31.5 / -                 32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix                ASR: MLR / MF-SLU     Transcripts: MLR / MF-SLU
Word Observation              52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns    53.9 / -              56.6 / -

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                           ASR: LM / MF-SLU         Transcripts: LM / MF-SLU
Word Observation                         25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)      33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)      32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix                ASR: MLR / MF-SLU     Transcripts: MLR / MF-SLU
Word Observation              52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns    53.9 / 55.7 (+3.3%)   56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
[Diagram: contributions. Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction]
Contributions of Intent Prediction
Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- Better semantic representations for individual utterances
- Better high-level intent prediction about follow-up behaviors
74
Future Work
Apply the proposed technology to domain discovery: find domains that current systems do not cover but users are interested in, to guide which domains to develop next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.
75
Towards Unsupervised Deep Learning
[Figure: the MF model unrolled as a network. Word sequence x = w1 ... wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf and slot vectors lf for slot candidates S1 ... Sn → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws) → semantic relation scores R(U, S1) ... R(U, Sn) → posterior probabilities P(S1 | U) ... P(Sn | U).]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
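The layer stack described on this slide (word vectors → convolution → pooling → knowledge-graph propagation → semantic layer → posteriors) can be sketched as a single forward pass; all matrices below are random placeholders rather than learned parameters, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
d, emb, h, n_slots = 5, 8, 6, 3        # words, embedding dim, hidden, slots

lw = rng.normal(size=(d, emb))         # word vectors for w1..wd
Wc = rng.normal(size=(3 * emb, h))     # convolution over trigram windows
windows = np.stack([lw[i:i + 3].ravel() for i in range(d - 2)])
lc = np.tanh(windows @ Wc)             # convolutional layer lc
lf = lc.max(axis=0)                    # pooling -> utterance vector lf

Wp = rng.normal(size=(h, h))           # knowledge graph propagation Wp
lp = np.tanh(Wp @ lf)                  # propagation layer lp
Ws = rng.normal(size=(n_slots, h))     # semantic projection Ws
y = Ws @ lp                            # relation scores R(U, S_i)

p = np.exp(y) / np.exp(y).sum()        # posteriors P(S_i | U)
```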
76
Take Home Message
Big data is available without annotations. The challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI
language → action: understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A
THANKS FOR YOUR ATTENTION!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
58
Contributions of Semantic Decoding
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge
MF-SLU for Semantic Decoding is able to1) unify the automatically
acquired knowledge2) adapt to a domain-
specific setting 3) and then allows
systems to model implicit semantics for better understanding
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
59
Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)
The follow-up behaviors usually correspond to user intents
price=ldquocheaprdquo target=ldquorestaurantrdquo
SLU Model
ldquocan i have a cheap restaurantrdquo
intent=navigation
restaurant=ldquolegumerdquo time=ldquotonightrdquo
SLU Model
ldquoi plan to dine in legume tonightrdquo
intent=reservation
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Intent Prediction – Single-Turn Request
Input: a single-turn request
Output: apps that are able to support the required functionality
[Figure: reasoning with feature-enriched MF. Rows of the matrix are app descriptions retrieved by IR as app candidates (e.g., Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."), self-train utterances, and test utterances (Utterance 1: "i would like to contact alex"); columns are word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...). Feature enrichment adds semantic concepts such as "communication", and MF fills unobserved cells with scores (e.g., 0.90, 0.85, 0.97, 0.95).]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
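A minimal sketch of the matrix-factorization step: factorize a tiny binary utterance-by-feature matrix and let the low-rank reconstruction score the unobserved cells. This uses a plain squared-error objective rather than the Bayesian personalized ranking used in the actual MF-SLU, and the matrix values, dimensions, and hyperparameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance-by-feature matrix: rows = app descriptions and utterances,
# columns = word observations + intended apps; 1 = observed, 0 = unknown.
M = np.array([
    [1, 1, 0, 1, 0],   # "... your email calendar contacts ..." (Outlook-like)
    [1, 0, 1, 0, 1],   # "... check and send emails msgs ..."   (Gmail-like)
    [1, 0, 0, 0, 0],   # test utterance: only one word feature observed
], dtype=float)

k, lr, reg = 2, 0.05, 0.01          # latent dim, learning rate, L2 weight
U = 0.1 * rng.standard_normal((M.shape[0], k))
V = 0.1 * rng.standard_normal((M.shape[1], k))

for _ in range(500):                 # SGD over the observed (positive) cells
    for i, j in zip(*np.nonzero(M)):
        err = M[i, j] - U[i] @ V[j]
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

scores = U @ V.T                     # filled-in matrix: hidden semantics
```

Row 2 plays the role of a test utterance: its unobserved app columns receive scores induced from the latent space shared with the training rows.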
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity. Cues for resolving it: 1) user preference and 2) app-level contexts.
Example: "send to vivian" could mean Email or Message (both communication apps); the previous turn helps disambiguate.
Idea: behavioral patterns in the history can help intent prediction.
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]
Input: multi-turn interaction
Output: apps the user plans to launch
[Figure: reasoning with feature-enriched MF over dialogues. Rows are turns of train dialogues ("take this photo" / "tell vivian this is me in the lab"; "check my grades on website" / "send an email to professor") and of a test dialogue ("take a photo of this" / "send it to alice" / ...); columns cover lexical features (photo, check, camera, tell, send, ...), behavior history (null, camera, chrome, chrome email, ...), and the intended app (CAMERA, IM, CHROME, ...). MF predicts scores (e.g., 0.85, 0.70, 0.95, 0.80, 0.55) for unseen turn-app pairs.]
Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
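The app-level context can be folded in as extra features on each turn. The sketch below is a hypothetical flat encoding (function and feature names are mine), not the paper's exact feature scheme:

```python
def enrich_with_history(words, history):
    """Build a sparse feature dict from the current turn's words plus
    the apps launched in previous turns (app-level context)."""
    feats = {f"word={w}": 1.0 for w in words}
    for app in history:
        feats[f"prev_app={app}"] = 1.0
    return feats

# First turn: no history. Second turn: the camera app was just used,
# which helps disambiguate "send it to alice" toward an IM app.
turn1 = enrich_with_history(["take", "a", "photo", "of", "this"], history=[])
turn2 = enrich_with_history(["send", "it", "to", "alice"], history=["camera"])
```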
66
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: LM | MF-SLU | Transcripts: LM | MF-SLU
  Word Observation                       | 25.1    |        | 26.1            |

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: MLR | MF-SLU | Transcripts: MLR | MF-SLU
  Word Observation                       | 52.1     |        | 55.5             |

Baselines: the LM-based IR model (unsupervised) for single-turn requests and multinomial logistic regression (supervised) for multi-turn interactions.
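Both setups are scored with mean average precision over the ranked app list for each utterance; a minimal implementation of the metric (function names are mine):

```python
def average_precision(ranked_apps, gold_apps):
    """AP for one utterance: precision at each rank where a gold app
    appears, averaged over the number of gold apps."""
    hits, total = 0, 0.0
    for rank, app in enumerate(ranked_apps, start=1):
        if app in gold_apps:
            hits += 1
            total += hits / rank
    return total / len(gold_apps) if gold_apps else 0.0

def mean_average_precision(all_ranked, all_gold):
    """MAP: the mean of AP over all utterances."""
    aps = [average_precision(r, g) for r, g in zip(all_ranked, all_gold)]
    return sum(aps) / len(aps)

map_score = mean_average_precision(
    [["Skype", "Gmail", "Camera"], ["Chrome", "Maps"]],
    [{"Skype", "Camera"}, {"Maps"}],
)  # AP = 5/6 and 1/2, so MAP = 2/3
```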
67
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: LM | MF-SLU        | Transcripts: LM | MF-SLU
  Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: MLR | MF-SLU       | Transcripts: MLR | MF-SLU
  Word Observation                       | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: LM | MF-SLU        | Transcripts: LM | MF-SLU
  Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
  Word + Embedding-Based Semantics       | 32.0    |               | 33.3            |
  Word + Type-Embedding-Based Semantics  | 31.5    |               | 32.9            |

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: MLR | MF-SLU       | Transcripts: MLR | MF-SLU
  Word Observation                       | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
  Word + Behavioral Patterns             | 53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.
69
Experiments for Intent Prediction
Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: LM | MF-SLU        | Transcripts: LM | MF-SLU
  Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
  Word + Embedding-Based Semantics       | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics  | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix                         | ASR: MLR | MF-SLU       | Transcripts: MLR | MF-SLU
  Word Observation                       | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
  Word + Behavioral Patterns             | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
70
Ontology Induction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction
The feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Back-end: Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
72
Outline
Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work
73
Conclusions
The work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains are developed next.
Improve the proposed approach by handling uncertainty: recognition errors from ASR in SLU modeling, and unreliable knowledge in knowledge acquisition.
75
Towards Unsupervised Deep Learning
[Figure: model architecture. A word sequence x = w1 w2 ... wd is mapped to word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; a semantic projection matrix Ws produces the semantic layer y, and a knowledge graph propagation layer lp (propagation matrix Wp) yields semantic relevance scores R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U) over slot candidates S1 ... Sn.]
Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
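The "MF as a one-layer neural net" view can be sketched in a few lines: the MF score is a row embedding followed by a linear output layer, and going deeper simply inserts nonlinear hidden layers in between. Shapes and random values below are illustrative only, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_rows, n_cols, k, h = 4, 6, 3, 5   # utterances, features, latent dim, hidden dim

# One-layer view of MF: row embedding U, column weights V, score = U @ V.T.
U = rng.standard_normal((n_rows, k))
V = rng.standard_normal((n_cols, k))
shallow_scores = U @ V.T            # exactly the MF prediction matrix

# "Adding more layers": a nonlinearity and a hidden layer between the
# embedding and the output turn the same scorer into a deeper network.
W1 = rng.standard_normal((k, h))
W2 = rng.standard_normal((h, n_cols))
deep_scores = np.tanh(U @ W1) @ W2
```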
76
Take Home Message
Available: big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
Language understanding for AI maps language to action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.
Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.
77
Q & A. Thanks for your attention!
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
60
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
SDS Flowchart ndash Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
61
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
62
[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]
Input spoken utterances for making requests about launching an app
Output the apps supporting the required functionality
Intent Identification popular domains in Google Play
please dial a phone call to alex
Skype Hangout etc
Intent Prediction of Mobile Apps [SLTrsquo14c]
Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions
- This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
  - better semantic representations for individual utterances
  - better high-level intent prediction about follow-up behaviors
74
Future Work
- Apply the proposed technology to domain discovery: find domains that are not covered by current systems but that users are interested in, to guide which domains are developed next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge (for knowledge acquisition).
75
Towards Unsupervised Deep Learning

(Diagram: a word sequence x = w1, w2, …, wd is mapped through a word vector layer lw, a convolutional layer lc with convolution matrix Wc, and a pooling operation into an utterance vector lf; slot candidates have slot vectors lf; a knowledge graph propagation layer lp with matrix Wp and a semantic projection matrix Ws produce the semantic layer y, yielding semantic relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) for an utterance U over slot candidates S1, …, Sn)

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
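To make the "MF as a one-layer neural net" view concrete: a matrix factorization score for an (utterance, slot) pair is a linear projection followed by a dot product with a slot vector, so nonlinear layers can be inserted naturally in between. A minimal sketch with arbitrary random weights; the dimensions and layer names only loosely mirror the figure and this is not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d_feat, d_latent, n_slots = 8, 4, 3

x = rng.random(d_feat)                        # utterance features (e.g. word observations)
Ws = rng.standard_normal((d_feat, d_latent))  # semantic projection (utterance factor)
V = rng.standard_normal((n_slots, d_latent))  # slot vectors (slot factors)

# One-layer (MF) view: score R(U, S) = dot(projected utterance, slot vector)
mf_scores = (x @ Ws) @ V.T

# Deeper view: insert a nonlinear hidden layer (cf. the convolutional layer lc)
Wc = rng.standard_normal((d_feat, d_latent))
deep_scores = np.tanh(x @ Wc) @ V.T

# Posterior over slot candidates via softmax
p = np.exp(deep_scores) / np.exp(deep_scores).sum()
print(mf_scores.shape, p.round(3))
```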
76
Take Home Message
- Big data is available without annotations. The challenge is how to acquire and organize important knowledge, and how to further utilize it for applications. Unsupervised or weakly-supervised methods will be the future trend.
- Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or being taught to let friends in via face recognition. Deep language understanding is an emerging field.
77
Q & A
Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
62
Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

- Input: spoken utterances making requests about launching an app (e.g., "please dial a phone call to alex")
- Output: the apps supporting the required functionality (e.g., Skype, Hangout, etc.)
- Intent identification over popular domains in Google Play

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
63
Intent Prediction – Single-Turn Request

- Input: a single-turn request, e.g. Utterance 1: "i would like to contact alex"
- Output: apps that are able to support the required functionality

(Diagram: a matrix whose columns are word observations (contact, email, message, …) and intended apps (Gmail, Outlook, Skype, …); training rows are self-train utterances built by IR over app descriptions (e.g. Outlook: "… your email calendar contacts …"; Gmail: "… check and send emails msgs …"); feature enrichment adds semantic types such as "communication" to the test utterance; reasoning with feature-enriched MF fills in scores (e.g. 0.90) for unobserved cells.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
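The reasoning step on this slide, predicting scores for unobserved (utterance, app) cells from the observed 1s, is low-rank matrix completion. Below is a deliberately tiny sketch using pointwise SGD on squared loss; the thesis work instead optimizes a Bayesian Personalized Ranking objective, and the matrix and feature names here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns: word/semantic features followed by intended apps.
# 1/0 = observed cells, np.nan = hidden cells to predict.
cols = ["contact", "email", "communication", "Gmail", "Outlook", "Skype"]
M = np.array([
    [1, 0, 1, 1, 1, 0],                  # self-train utterance (from app descriptions)
    [0, 1, 1, 1, 0, 0],                  # self-train utterance
    [1, 0, 1, np.nan, np.nan, np.nan],   # test: "i would like to contact alex"
])

k = 2                                    # latent dimension
U = 0.1 * rng.standard_normal((M.shape[0], k))   # utterance factors
V = 0.1 * rng.standard_normal((M.shape[1], k))   # feature/app factors

observed = [(i, j) for i in range(M.shape[0])
            for j in range(M.shape[1]) if not np.isnan(M[i, j])]

for _ in range(5000):                    # SGD over observed cells only
    i, j = observed[rng.integers(len(observed))]
    err = M[i, j] - U[i] @ V[j]
    U[i] += 0.05 * err * V[j]
    V[j] += 0.05 * err * U[i]

scores = U[2] @ V.T                      # filled-in row for the test utterance
print({c: round(float(s), 2) for c, s in zip(cols, scores)})
```

Because the test row's observed features match the first self-train utterance, its hidden app cells end up scored close to that row's apps.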
64
Intent Prediction – Multi-Turn Interaction [ICMI'15]

- Input: multi-turn interaction
- Output: apps the user plans to launch
- Challenge: language ambiguity. An utterance like "send to vivian" could target Email or Message (both Communication apps); disambiguation cues include 1) user preference and 2) app-level contexts from the previous turn.
- Idea: behavioral patterns in the history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
65
Intent Prediction – Multi-Turn Interaction [ICMI'15]

- Input: multi-turn interaction
- Output: apps the user plans to launch

(Diagram: a matrix over lexical features (photo, check, camera, IM, tell, send, …), behavior-history features (null, camera, chrome, chrome+email, …), and intended-app columns; training dialogues include "take this photo" / "tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website" / "send an email to professor" (CHROME, EMAIL); the test dialogue "take a photo of this" / "send it to alice" (CAMERA, IM) has its unobserved cells filled in (e.g. 0.85, 0.95) by reasoning with feature-enriched MF.)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
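The rows of the matrix sketched in this slide concatenate three binary feature families: lexical observations from the current utterance, behavioral history (apps launched in previous turns), and the intended app, which lives in the same matrix so it can be predicted at test time. One possible encoding of a single turn (the feature naming scheme is illustrative, not from the paper):

```python
def encode_turn(words, app_history, intended_apps):
    """One sparse binary matrix row over three feature families."""
    row = {}
    for w in words:
        row[f"word={w}"] = 1            # lexical observations
    for app in app_history:
        row[f"history={app}"] = 1       # app-level context from earlier turns
    for app in intended_apps:
        row[f"app={app}"] = 1           # label columns, hidden at test time
    return row

# Second turn of the test dialogue: "send it to alice" after using the camera
turn = encode_turn(
    words=["send", "it", "to", "alice"],
    app_history=["camera"],
    intended_apps=["IM"],
)
print(sorted(turn))
```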
66
Experiments for Intent Prediction

Baselines: an LM-based IR model (unsupervised) for single-turn requests, and multinomial logistic regression (MLR, supervised) for multi-turn interactions.

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    | ASR: LM | Transcripts: LM
Word Observation  | 25.1    | 26.1

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    | ASR: MLR | Transcripts: MLR
Word Observation  | 52.1     | 55.5
67
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation  | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation  | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
68
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / –             | 33.3 / –
Word + Type-Embedding-Based Semantics  | 31.5 / –             | 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix              | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation            | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9 / –             | 56.6 / –

Semantic enrichment provides rich cues to improve performance.
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
63
Input single-turn request
Output apps that are able to support the required functionality
Intent Prediction ndash Single-Turn Request
1
Enriched Semantics
communication
90
1
1
Utterance 1 i would like to contact alex
Word Observation Intended App
hellip hellip
contact message Gmail Outlook Skypeemail
Test
90
Reasoning with Feature-Enriched MF
Train
hellip your email calendar contactshellip
hellip check and send emails msgs hellip
Outlook
Gmail
IR for app candidates
App Desc
Self-Train Utterance
Test Utterance
1
1
1
1
1
1
1
1 1
1
1 90 85 97 95
FeatureEnrichment
Utterance 1 i would like to contact alexhellip
1
1
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
64
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
Challenge language ambiguity1) User preference2) App-level contexts
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
send to vivianvs
Email MessageCommunication
Idea Behavioral patterns in history can help intent prediction
previous turn
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
66–69
Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP, %). Baseline: LM-based IR model (unsupervised); relative improvement in parentheses.

Feature Matrix                         | ASR: LM / MF-SLU      | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%)  | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / 34.2 (+6.8%)   | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5 / 32.2 (+2.1%)   | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP, %). Baseline: multinomial logistic regression (supervised); relative improvement in parentheses.

Feature Matrix                         | ASR: MLR / MF-SLU     | Transcripts: MLR / MF-SLU
Word Observation                       | 52.1 / 52.7 (+1.2%)   | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns             | 53.9 / 55.7 (+3.3%)   | 56.6 / 57.7 (+1.9%)

- Modeling hidden semantics helps intent prediction, especially for noisy data.
- Semantic enrichment provides rich cues to improve performance.
- Intent prediction can benefit from both hidden information and low-level semantics.
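MAP, the metric reported in these experiments, averages per-utterance average precision over ranked app predictions. A minimal sketch with toy data (not the evaluation sets above):

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank where a relevant app appears."""
    hits, precisions = 0, []
    for k, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    """MAP over paired (ranked prediction list, gold app set) examples."""
    return sum(average_precision(r, g) for r, g in zip(rankings, gold)) / len(rankings)

# Toy example: two utterances with ranked app predictions.
rankings = [["CAMERA", "IM", "CHROME"],   # gold: {CAMERA, IM} -> AP = 1.0
            ["EMAIL", "CHROME", "IM"]]    # gold: {CHROME}     -> AP = 0.5
gold = [{"CAMERA", "IM"}, {"CHROME"}]
print(mean_average_precision(rankings, gold))  # → 0.75
```

Ranking-based scoring like this is what lets the unsupervised MF-SLU be compared directly against the supervised MLR baseline in the tables above.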
70
Contributions of Intent Prediction
[Diagram: knowledge acquisition (ontology induction, structure learning) feeding SLU modeling (semantic decoding, intent prediction).]
The feature-enriched MF-SLU for intent prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features, and
3) create personalized models by leveraging contextual behaviors.
71
Personal Intelligent Architecture
- Reactive assistance: ASR, LU, Dialog, LG, TTS
- Proactive assistance: inferences, user modeling, suggestions
- Data: back-end databases, services, and client signals
- Device/service end-points: phone, PC, Xbox, web browser, messaging apps
- User experience: "call taxi"
72
Outline
- Intelligent Assistant: what are they? why do we need them? why do companies care?
- Reactive Assistant – Spoken Dialogue System (SDS): pipeline architecture; current challenges & overview; contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
73
Conclusions
- The work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.
74
Future Work
- Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge (for knowledge acquisition).
75
Towards Unsupervised Deep Learning
[Figure: the MF model drawn as a neural network. A word sequence x = w1 w2 ... wd is mapped to word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation, yielding an utterance vector lf; slot candidates S1 ... Sn have slot vectors lf. A semantic projection matrix Ws produces the semantic layer y, and a knowledge graph propagation layer lp (propagation matrix Wp) yields semantic relations R(U, S1) ... R(U, Sn), from which posterior probabilities P(S1 | U) ... P(Sn | U) are computed for utterance U.]
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
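The idea above — MF as a one-layer linear network whose reconstruction path can be deepened — can be sketched as follows. Shapes and weights are illustrative only, not the architecture in the figure:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utts, n_feats, k = 4, 6, 3

# MF view: the observed matrix is reconstructed as U @ V, i.e. latent
# utterance factors U pushed through a single linear weight layer V.
U = rng.normal(size=(n_utts, k))
V = rng.normal(size=(k, n_feats))
shallow = U @ V                      # one layer, purely linear

# Deepening: insert a hidden layer with a nonlinearity between the latent
# factors and the output, a step toward an unsupervised deep model.
# (Weights here are random and untrained; only the structure matters.)
W1 = rng.normal(size=(k, 8))
W2 = rng.normal(size=(8, n_feats))
deep = np.tanh(U @ W1) @ W2          # two-layer reconstruction

assert shallow.shape == deep.shape == (n_utts, n_feats)
```

Both variants produce a full reconstruction of the utterance-by-feature matrix; the deep one simply trades the single linear map for a learned nonlinear composition.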
76
Take Home Message
- Big data is available without annotations. The challenge: how to acquire and organize important knowledge, and further utilize it for applications.
- Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or teaching the system to let friends in by face recognition, etc.
- Unsupervised or weakly-supervised methods will be the future trend.
- Deep language understanding is an emerging field.
77
Q & A. Thanks for your attention!
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and …
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix …
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
65
Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
Input multi-turn interaction
Output apps the user plans to launch
1
Lexical Intended Appphoto check camera IMtell
take this phototell vivian this is me in the lab
CAMERA
IMTrainDialogue
check my grades on websitesend an email to professor
hellip
CHROME
send
Behavior History
null camera
85
take a photo of thissend it to alice
CAMERA
IM
hellip
1
1
1 1
1
1 70
chrome
1
1
1
1
1
1
chrome email
11
1
1
95
80 55
User UtteranceIntended
App
Reasoning with Feature-Enriched MF
Test Dialogue
take a photo of thissend it to alicehellip
Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom
The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
66
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 261
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 555
LM-Based IR Model (unsupervised)
Multinomial Logistic Regression (supervised)
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
67
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)
Modeling hidden semantics helps intent prediction especially for noisy data
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
Contributions of Intent Prediction

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Feature-Enriched MF-SLU for intent prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features, and
3) create personalized models by leveraging contextual behaviors.
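The feature-enriched MF-SLU above factorizes an utterance-by-feature matrix in which unobserved cells may be hidden semantics rather than true negatives; the deck's "Bayesian Personalized Ranking for MF" slide motivates training by ranking observed features above unobserved ones. A minimal toy sketch of that idea (the matrix, dimensions, learning rate, and regularizer below are illustrative assumptions, not the talk's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (utterance x feature) matrix: 1 = observed feature (word/semantic/behavior),
# 0 = unobserved -- possibly a hidden semantic rather than a true negative.
M = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]], dtype=float)

k, lr, reg = 2, 0.05, 0.01  # latent dim, learning rate, L2 weight (assumed)
U = rng.normal(scale=0.1, size=(M.shape[0], k))  # utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], k))  # feature factors

# BPR-style updates: for each utterance, push an observed feature's score
# above an unobserved one's by ascending the gradient of log sigmoid(margin).
for _ in range(2000):
    u = rng.integers(M.shape[0])
    pos = rng.choice(np.flatnonzero(M[u] == 1))
    neg = rng.choice(np.flatnonzero(M[u] == 0))
    x = U[u] @ (V[pos] - V[neg])      # ranking margin
    g = 1.0 / (1.0 + np.exp(x))       # sigmoid(-x), the gradient scale
    du = g * (V[pos] - V[neg]) - reg * U[u]
    dp = g * U[u] - reg * V[pos]
    dn = -g * U[u] - reg * V[neg]
    U[u] += lr * du
    V[pos] += lr * dp
    V[neg] += lr * dn

scores = U @ V.T  # higher score = feature more plausible, even if unobserved
```

After training, unobserved cells receive graded scores, which is how the model can surface implicit semantics instead of treating every missing feature as absent.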
Personal Intelligent Architecture

- Reactive Assistance: ASR, LU, Dialog, LG, TTS
- Proactive Assistance: inferences, user modeling, suggestions
- Data: back-end databases, services, and client signals
- Device/Service end-points: phone, PC, Xbox, web browser, messaging apps
- User experience: e.g., "call taxi"
Outline

- Intelligent Assistant: What are they? Why do we need them? Why do companies care?
- Reactive Assistant - Spoken Dialogue System (SDS): pipeline architecture, current challenges & overview, contributions
- Semantic Decoding
- Intent Prediction
- Conclusions & Future Work
Conclusions

- The work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.
- The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
- The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding:
  - better semantic representations for individual utterances
  - better high-level intent prediction about follow-up behaviors
Future Work

- Apply the proposed technology to domain discovery: identify domains that are not covered by the current systems but that users are interested in, to guide which domains are developed next.
- Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.
Towards Unsupervised Deep Learning

[Architecture figure: word sequence x = w1, w2, ..., wd -> word vectors lw -> convolutional layer lc (convolution matrix Wc) -> pooling operation -> utterance vector lf and slot vectors lf -> semantic layer y (semantic projection matrix Ws) -> knowledge graph propagation layer lp (propagation matrix Wp) -> semantic relations R(U, S1), ..., R(U, Sn) -> posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates]

Treating MF as a one-layer neural network, we can add more layers to the model, moving towards unsupervised deep learning.
Take Home Message

- Big data is available without annotations; the challenge is how to acquire and organize important knowledge, and further utilize it for applications.
- Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or being taught to let friends in by face recognition.
- Unsupervised or weakly supervised methods will be the future trend.
- Deep language understanding is an emerging field.
Q & A -- THANKS FOR YOUR ATTENTION!

References:
- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process - Available Domain Ontology
- SDS Process - Available Domain Ontology (2)
- SDS Process - Available Domain Ontology (3)
- SDS Process - Spoken Language Understanding (SLU)
- SDS Process - Spoken Language Understanding (SLU) (2)
- SDS Process - Dialogue Management (DM)
- SDS Process - Dialogue Management (DM) (2)
- SDS Process - Dialogue Management (DM) (3)
- SDS Process - Dialogue Management (DM) (4)
- SDS Process - Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture - Contributions
- SDS Flowchart
- SDS Flowchart - Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and ...)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics: Matrix ...
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart - Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction - Single-Turn Request
- Intent Prediction - Multi-Turn Interaction [ICMI'15]
- Intent Prediction - Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
68
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566
Semantic enrichment provides rich cues to improve performance
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
69
Single-Turn Request Mean Average Precision (MAP)
Multi-Turn Interaction Mean Average Precision (MAP)
Feature MatrixASR Transcripts
LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)
Feature MatrixASR Transcripts
MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)
Intent prediction can benefit from both hidden information and low-level semantics
Experiments for Intent Prediction
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
70
OntologyInduction
Structure Learning
Semantic Decoding
Intent Prediction
Knowledge Acquisition
SLU Modeling
Contributions of Intent Prediction Feature-Enriched MF-SLU for
Intent Prediction is able to1) unify the knowledge at
different levels2) learn inference relations
between various features
3) and create personalized models by leveraging contextual behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
71
Personal Intelligent Architecture
Reactive Assistance
ASR LU Dialog LG TTS
Proactive Assistance
Inferences User Modeling Suggestions
Data Back-end Data
Bases Services and Client Signals
DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)
User Experienceldquocall taxirdquo
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions: This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding:
Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors
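As a concrete illustration of the matrix-factorization view of SLU above, here is a minimal numpy sketch that completes a sparse utterance-by-slot matrix. The matrix, latent dimension, and squared-error objective are all illustrative simplifications: the talk's MF-SLU actually trains with Bayesian Personalized Ranking, not least squares.

```python
import numpy as np

# Hypothetical utterance-by-slot matrix: 1 = slot observed in the
# utterance, 0 = unobserved (shape, rank, and data are illustrative,
# not taken from the talk's experiments).
R = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
observed = R == 1          # fit only the observed (positive) cells

rng = np.random.default_rng(0)
k = 2                      # latent dimension
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # utterance factors
S = rng.normal(scale=0.1, size=(R.shape[1], k))   # slot factors

lr, reg = 0.1, 0.01
for _ in range(500):
    # Squared error on observed cells only; zero gradient elsewhere.
    err = np.where(observed, R - U @ S.T, 0.0)
    U, S = U + lr * (err @ S - reg * U), S + lr * (err.T @ U - reg * S)

# Unobserved cells now receive scores induced by the shared latent
# structure -- a stand-in for the "implicit semantics" idea.
scores = U @ S.T
```

The point of the sketch is the mechanism: the unobserved (utterance, slot) cells are never trained directly, yet the low-rank factors still assign them scores.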
74
Future Work: Apply the proposed technology to domain discovery
Discover domains that are not covered by current systems but that users are interested in, to guide which domains to develop next
Improve the proposed approach by handling uncertainty
(Diagram: recognition errors from ASR feed into SLU modeling; unreliable knowledge feeds into knowledge acquisition)
75
Towards Unsupervised Deep Learning
(Architecture diagram: a word sequence x = w1, w2, ..., wd is embedded as word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; together with slot vectors lf, a semantic projection matrix Ws produces the semantic layer y, and a knowledge graph propagation layer lp (matrix Wp) yields posterior probabilities P(S1 | U), ..., P(Sn | U) and semantic relations R(U, S1), ..., R(U, Sn) between the utterance U and the slot candidates S1, ..., Sn.)
Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
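Under that framing, a hedged numpy sketch: the sigmoid output layer below stands in for the one-layer MF scoring function, and the hidden tanh layer shows what "adding more layers" looks like. All shapes and data are synthetic, and supervised cross-entropy is used here purely to keep the illustration short.

```python
import numpy as np

# Toy shapes and data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n_utt, n_feat, n_slot, n_hidden = 32, 8, 5, 16
X = rng.random((n_utt, n_feat))                        # utterance features
Y = (rng.random((n_utt, n_slot)) > 0.5).astype(float)  # slot indicators

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-layer view of MF: predictions would be sigmoid(X @ W), with W
# playing the role of the factorized parameter matrix. "Adding layers"
# means inserting a hidden nonlinearity between X and the slot outputs:
W1 = rng.normal(scale=0.1, size=(n_feat, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_slot))

lr, losses = 0.3, []
for _ in range(300):
    H = np.tanh(X @ W1)                  # hidden layer
    P = sigmoid(H @ W2)                  # predicted slot probabilities
    losses.append(-np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P)))
    G = (P - Y) / n_utt                  # gradient wrt output logits
    gW1 = X.T @ ((G @ W2.T) * (1 - H ** 2))  # backprop through tanh
    gW2 = H.T @ G
    W1 -= lr * gW1
    W2 -= lr * gW2
```

The same latent factors that MF learns become the weights of the first layer; stacking a nonlinearity on top is the step the slide gestures at.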
76
Take Home Message: Big data is available without annotations
Challenge: how to acquire and organize important knowledge, and further utilize it for applications
Language understanding for AI: from language to action (understanding voice to control music, lights, etc.; teaching the system to let friends in by face recognition, etc.)
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q & A -- THANKS FOR YOUR ATTENTION
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.
72
Outline Intelligent Assistant
What are they Why do we need them Why do companies care
Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions
Semantic Decoding
Intent Prediction
Conclusions amp Future Work
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
73
Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs
The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding
Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
74
Future Work Apply the proposed technology to domain discovery
not covered by the current systems but users are interested in guide the next developed domains
Improve the proposed approach by handling the uncertainty
SLUSLUModelingASR Knowledge
Acquisitionrecognition
errorsunreliable knowledge
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
75
d d d
U S1 S2
P(S1 | U) P(S2 | U)
hellip
Semantic RelationPosterior Probability
Utterance
Slot Candidate
hellip
w1 w2 wdWord Sequence x
Word Vector lw
Pooling Operation
R(U S1) R(U S2)
Knowledge Graph Propagation Matrix Wp
Semantic Projection Matrix Ws
Semantic Layer y
Knowledge Graph Propagation Layer lp
d
Sn
P(Sn | U)
Utterance Vector lf
hellip
R(U Sn)
Slot Vector lf
Convolution Matrix Wc
Convolutional Layer lc
Towards Unsupervised Deep Learning
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligence Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants
- Why do we need them
- Why do we need them (2)
- Why do companies care
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymaxrsquos intelligence
- SDS Architecture
- Interaction Example
- SDS Process ndash Available Domain Ontology
- SDS Process ndash Available Domain Ontology (2)
- SDS Process ndash Available Domain Ontology (3)
- SDS Process ndash Spoken Language Understanding (SLU)
- SDS Process ndash Spoken Language Understanding (SLU) (2)
- SDS Process ndash Dialogue Management (DM)
- SDS Process ndash Dialogue Management (DM) (2)
- SDS Process ndash Dialogue Management (DM) (3)
- SDS Process ndash Dialogue Management (DM) (4)
- SDS Process ndash Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture ndash Contributions
- SDS Flowchart
- SDS Flowchart ndash Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLPrsquo15]
- Frame-Semantic Parsing
- Ontology Induction [ASRUrsquo13 SLTrsquo14a]
- Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
- 1st Issue How to adapt generic slots to a domain-specific sett
- Semantic Decoding [ACL-IJCNLPrsquo15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement SlotWord Embeddings Training (Levy and
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLPrsquo15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue How to model the unobserved hidden semantics Matrix
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLPrsquo15] (4)
- Experimental Setup
- Experiments of Semantic Decoding Quality of Semantics Estimatio
- Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
- Experiments of Semantic Decoding Effectiveness of Relations
- Experiments for Structure Learning Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart ndash Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLTrsquo14c]
- Intent Prediction ndash Single-Turn Request
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
- Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q amp A
-
76
Take Home Message Available big data wo annotations
Challenge how to acquire and organize important knowledge and further utilize it for applications
Language understanding for AI
language action understand voice to control music lights etc teach to let friends in by face recognition etc
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
Unsupervised or weakly-supervised methods will be the future trend
Deep language understanding is an emerging field
77
Q amp ATHANKS FOR YOUR ATTENTIONS
bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)
bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016
SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
- Statistical Learning from Dialogues for Intelligent Assistants
- My Background
- Outline
- Outline (2)
- What are Intelligent Assistants?
- Why do we need them?
- Why do we need them? (2)
- Why do companies care?
- Personal Intelligent Architecture
- Personal Intelligent Architecture (2)
- Outline (3)
- Spoken Dialogue System (SDS)
- What is Baymax's intelligence?
- SDS Architecture
- Interaction Example
- SDS Process – Available Domain Ontology
- SDS Process – Available Domain Ontology (2)
- SDS Process – Available Domain Ontology (3)
- SDS Process – Spoken Language Understanding (SLU)
- SDS Process – Spoken Language Understanding (SLU) (2)
- SDS Process – Dialogue Management (DM)
- SDS Process – Dialogue Management (DM) (2)
- SDS Process – Dialogue Management (DM) (3)
- SDS Process – Dialogue Management (DM) (4)
- SDS Process – Natural Language Generation (NLG)
- Required Knowledge
- Challenges for SDS
- Contributions
- Contributions (2)
- Contributions (3)
- Knowledge Acquisition
- SLU Modeling
- SDS Architecture – Contributions
- SDS Flowchart
- SDS Flowchart – Semantic Decoding
- Outline (4)
- Semantic Decoding [ACL-IJCNLP'15]
- Frame-Semantic Parsing
- Ontology Induction [ASRU'13, SLT'14a]
- Ontology Induction [ASRU'13, SLT'14a] (2)
- 1st Issue: How to adapt generic slots to a domain-specific setting
- Semantic Decoding [ACL-IJCNLP'15] (2)
- Knowledge Graph Construction
- Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
- Edge Weight Measurement
- Knowledge Graph Propagation Model
- Semantic Decoding [ACL-IJCNLP'15] (3)
- Feature Model + Knowledge Graph Propagation Model
- 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
- Bayesian Personalized Ranking for MF
- Matrix Factorization SLU (MF-SLU)
- Semantic Decoding [ACL-IJCNLP'15] (4)
- Experimental Setup
- Experiments of Semantic Decoding: Quality of Semantics Estimation
- Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
- Experiments of Semantic Decoding: Effectiveness of Relations
- Experiments for Structure Learning: Relation Discovery Analysis
- Contributions of Semantic Decoding
- Low- and High-Level Understanding
- SDS Flowchart – Intent Prediction
- Outline (5)
- Intent Prediction of Mobile Apps [SLT'14c]
- Intent Prediction – Single-Turn Request
- Intent Prediction – Multi-Turn Interaction [ICMI'15]
- Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
- Experiments for Intent Prediction
- Experiments for Intent Prediction (2)
- Experiments for Intent Prediction (3)
- Experiments for Intent Prediction (4)
- Contributions of Intent Prediction
- Personal Intelligent Architecture (3)
- Outline (6)
- Conclusions
- Future Work
- Towards Unsupervised Deep Learning
- Take Home Message
- Q & A
77
Q & A
THANKS FOR YOUR ATTENTION