deep learning for human language processing
![Page 1: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/1.jpg)
Deep Learning for Human Language Processing
HUNG-YI LEE
李宏毅
![Page 2: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/2.jpg)
What is this course about?
• 深度學習與人類語言處理 (Deep Learning for Human Language Processing)
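Human Computer
Understand what people say
Understand the sentences people write
Write sentences people can understand
Say things people can understand
Deep learning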
![Page 3: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/3.jpg)
What is this course about?
• 深度學習與人類語言處理 (Deep Learning for Human Language Processing)
• 自然語言處理 (Natural Language Processing, NLP)
• A language that has developed naturally in use (e.g. Chinese, English)
• As contrasted with an artificial language (e.g. Java, Python)
• Natural Language can be Speech or Text
This course could also be called "Deep Learning for Natural Language Processing"
Why not???
![Page 4: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/4.jpg)
What is this course about?
• In this course, Text vs. Speech = 5 : 5
• Most NLP textbooks and courses focus mainly on text (Text vs. Speech = 9 : 1)
• Speech processing is NOT only speech recognition.
• Only 56% of languages have a written form (Ethnologue, 21st edition)
• We don't always know if the existing writing systems are widely used.
That is why this course is called "Deep Learning for Human Language Processing"
![Page 5: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/5.jpg)
Human Language Processing is popular!
Google Duplex (2018)
IBM Project Debater (2019)
![Page 6: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/6.jpg)
Human Language is complex
audio: 1 second has 16K sample points. Each point has 256 possible values.
Ref: https://thejohnfox.com/long-sentences/
Ref: https://en.wikipedia.org/wiki/Longest_English_sentence
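To make the audio figures on this slide concrete, here is a minimal sketch of the arithmetic, assuming "16K sample points" means 16,000 samples per second and "256 possible values" means 8 bits per sample. The function names are hypothetical and only serve the illustration.

```python
import math

SAMPLE_RATE_HZ = 16_000   # "16K sample points" per second (from the slide)
LEVELS_PER_SAMPLE = 256   # "256 possible values" per point, i.e. 8 bits


def bits_per_second() -> int:
    """Raw size of one second of such audio, in bits."""
    return SAMPLE_RATE_HZ * int(math.log2(LEVELS_PER_SAMPLE))


def log10_distinct_clips(seconds: float = 1.0) -> float:
    """log10 of the number of distinct clips, 256 ** (16000 * seconds)."""
    n_samples = round(SAMPLE_RATE_HZ * seconds)
    return n_samples * math.log10(LEVELS_PER_SAMPLE)


if __name__ == "__main__":
    print(f"{bits_per_second()} bits per second")
    print(f"roughly 10^{log10_distinct_clips():.0f} distinct one-second clips")
```

The count is reported as a base-10 exponent because the number itself is far too large to be useful in any other form.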
![Page 7: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/7.jpg)
The ancient Greek philosopher Heraclitus (赫拉克利特)
source: https://vocus.cc/davidlai1988/5cdef255fd89780001f11f6c
![Page 8: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/8.jpg)
你好 (Hello) 你好 (Hello)
你好 (Hello) 你好 (Hello)
Likewise, no one can say the same utterance twice
![Page 9: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/9.jpg)
Human Language is complex
audio: 1 second has 16K sample points. Each point has 256 possible values.
text
Ref: https://thejohnfox.com/long-sentences/
Ref: https://en.wikipedia.org/wiki/Longest_English_sentence
The Language Instinct: How the Mind Creates Language (Steven Arthur Pinker)
William Faulkner, “Absalom, Absalom!”: “Just exactly like Father if Father had known ……” (1289 words)
Faulkner wrote, “Just exactly like Father …”
Pinker said Faulkner wrote, “Just exactly like Father …”
Who cares that Pinker said Faulkner wrote, “Just exactly like Father …”
Jonathan Coe's The Rotters' Club has a sentence with 13,955 words (2014)
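The Faulkner and Pinker lines above illustrate that any sentence can be embedded in a longer one, so there is no longest sentence. A minimal sketch of that embedding follows; `embed` is a hypothetical helper, and the prefixes are the ones quoted on the slide.

```python
def embed(core: str, prefixes: list[str]) -> list[str]:
    """Return the core sentence followed by each successively longer embedding."""
    sentences = [core]
    for prefix in prefixes:
        # Each new sentence wraps the previous one, so it is always longer.
        sentences.append(prefix + sentences[-1])
    return sentences


if __name__ == "__main__":
    core = '"Just exactly like Father ..."'
    for sentence in embed(core, ["Faulkner wrote, ", "Pinker said ", "Who cares that "]):
        print(sentence)
```

Because another prefix can always be appended, the set of possible sentences is unbounded, which is the point the slide makes about text.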
![Page 10: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/10.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
![Page 11: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/11.jpg)
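What is the model? Model = Deep Network
There is no problem that "just brute-force train it" (硬 train 一發) cannot solve.
If there is one …
it only means you do not have enough training data and GPUs.
When you hit a problem, just brute-force train a deep network on it.
The story behind 硬 train 一發: https://youtu.be/F1vek6ULo9w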
![Page 12: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/12.jpg)
After "just brute-force train it" (硬 train 一發):
the next step for human language processing
![Page 13: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/13.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
Speech Recognition
![Page 14: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/14.jpg)
Automatic Speech Recognition (ASR)
Traditional Speech Recognition: 2GB
[Block diagram, from 數位語音處理概論 (Introduction to Digital Speech Processing), Chapter 1]
Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence
Speech Corpora → Acoustic Model Training → Acoustic Models
Text Corpora → Language Model Construction → Language Model
Lexicon
End-to-end: 80MB
https://ai.googleblog.com/2019/03/an-all-neural-on-device-speech.html
It is not the seq2seq you know!
![Page 15: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/15.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
Text-to-Speech Synthesis
![Page 16: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/16.jpg)
(Keiichi Tokuda, keynote, INTERSPEECH’19)
TTS is end-to-end
![Page 17: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/17.jpg)
All the problems solved?
高雄發大財 (Kaohsiung gets rich), 我現在要出征 (I am now going off to battle)
發財 (get rich) repeated four times
發財 once
發財 three times
發財 twice
It has happened in real applications ……
![Page 18: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/18.jpg)
蘋果仁 channel: https://www.youtube.com/watch?v=EwbTlnUkctM
This problem was found in 2018.02. It has already been fixed.
![Page 19: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/19.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
![Page 20: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/20.jpg)
Speech Separation
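• Cocktail party effect (雞尾酒會效應)
Thanks to students 孫凡耕 and 施順耀 for providing the experimental results
Two people speaking at the same time
Speaker 1
Speaker 2
The results above do not even use a Fourier transform; they come purely from brute-force deep learning training (硬 train 一發)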
Model
![Page 21: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/21.jpg)
Voice Conversion
![Page 22: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/22.jpg)
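To just brute-force train it (硬 train 一發), you need ……
Speaker A | Speaker B
How are you? | How are you?
Good morning | Good morning
Is it possible to ……
Speaker A | Speaker B
天氣真好 (The weather is really nice) | How are you?
再見囉 (Goodbye) | Good morning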
Speakers A and B are talking about completely different things.
![Page 23: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/23.jpg)
Unsupervised Voice Conversion
• Only one utterance from each speaker (one-shot learning)
Speaker A Speaker B
A → B
Aragaki Yui (新垣結衣)
Thanks to student 解正平 for providing the experimental results
B → A
![Page 24: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/24.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
![Page 25: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/25.jpg)
Input Audio, Output Class
Speaker Recognition: Model → which speaker?
Keyword Spotting: Model → keyword or not?
![Page 26: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/26.jpg)
Wake up words
• 2017.01, in Dallas, Texas
• A six-year-old asked her Amazon Echo “can you play dollhouse with me and get me a dollhouse?”
• The device ordered a KidKraft Sparkle Mansion dollhouse.
• TV station CW-6 in San Diego, California, was doing a morning news segment
• Anchor Jim Patton said, “I love the little girl saying, ‘Alexa ordered me a dollhouse.’ ” ……
https://www.foxnews.com/tech/6-year-old-accidentally-orders-high-end-treats-with-amazons-alexa
https://www.theverge.com/2017/1/7/14200210/amazon-alexa-tech-news-anchor-order-dollhouse
![Page 27: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/27.jpg)
Wake up words
2017.04
![Page 28: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/28.jpg)
Fermachado123 is the username of Burger King’s marketing chief
![Page 29: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/29.jpg)
![Page 30: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/30.jpg)
![Page 31: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/31.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
![Page 32: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/32.jpg)
BERT and its good friends
![Page 33: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/33.jpg)
Progress beyond imagination …
https://github.com/thunlp/PLMpapers
2018.03
2018.10
2019.02
2019.04
2019.06
2019.07
The BERT family is flourishing
![Page 34: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/34.jpg)
Source of image: https://huaban.com/pins/1714071707/
ELMo (94M)
BERT (340M)
GPT-2 (1542M)
The models are becoming larger and larger …
![Page 35: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/35.jpg)
GPT-2
Megatron (8G)
T5 (11G)
Turing NLG (17G)
The models are becoming larger and larger …
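To put the growth on these two slides into numbers, the sketch below compares the quoted sizes. It assumes "M" means million parameters and "G" means billion parameters, and that each parameter is stored as a 4-byte float; neither the storage format nor the helper names come from the slides.

```python
# Parameter counts as quoted on the slides (M read as million, G as billion).
PARAMS = {
    "ELMo": 94_000_000,
    "BERT": 340_000_000,
    "GPT-2": 1_542_000_000,
    "Megatron": 8_000_000_000,
    "T5": 11_000_000_000,
    "Turing NLG": 17_000_000_000,
}

BYTES_PER_PARAM = 4  # assumption: 32-bit floats, not stated on the slides


def growth_vs(baseline: str, name: str) -> float:
    """How many times larger `name` is than `baseline`."""
    return PARAMS[name] / PARAMS[baseline]


def weight_gigabytes(name: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return PARAMS[name] * BYTES_PER_PARAM / 1e9


if __name__ == "__main__":
    for name in PARAMS:
        print(f"{name}: {growth_vs('ELMo', name):.1f}x ELMo, "
              f"about {weight_gigabytes(name):.1f} GB of weights")
```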
![Page 36: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/36.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
![Page 37: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/37.jpg)
Text Generation
I have a dream
I have a dream
Autoregressive
Non-autoregressive
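One way to picture the two generation styles named on this slide is the toy sketch below. It is not a real model: the lookup tables, token names, and function names are hypothetical stand-ins for trained networks, and the autoregressive step looks only at the last token for brevity, whereas a real model would condition on the whole prefix.

```python
from typing import Callable, List

BOS, EOS = "<bos>", "<eos>"

# Hypothetical toy tables standing in for trained networks.
TOY_NEXT = {BOS: "I", "I": "have", "have": "a", "a": "dream", "dream": EOS}
TOY_BY_POSITION = ["I", "have", "a", "dream"]


def toy_next_token(prefix: List[str]) -> str:
    """Toy autoregressive step: predicts from what was generated so far."""
    return TOY_NEXT[prefix[-1] if prefix else BOS]


def toy_token_at(position: int) -> str:
    """Toy non-autoregressive step: depends on the position alone."""
    return TOY_BY_POSITION[position]


def generate_autoregressive(step: Callable[[List[str]], str],
                            max_len: int = 10) -> List[str]:
    """Generate one token at a time, feeding each output back as input."""
    tokens: List[str] = []
    for _ in range(max_len):
        token = step(tokens)
        if token == EOS:
            break
        tokens.append(token)
    return tokens


def generate_non_autoregressive(step: Callable[[int], str],
                                length: int) -> List[str]:
    """Generate every position independently; the calls could run in parallel."""
    return [step(i) for i in range(length)]


if __name__ == "__main__":
    print(generate_autoregressive(toy_next_token))
    print(generate_non_autoregressive(toy_token_at, 4))
```

Note that the non-autoregressive version needs the output length up front, while the autoregressive version stops when it produces the end token.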
![Page 38: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/38.jpg)
One slide for this course
[Diagram: six "Model" boxes; two of them output a class]
![Page 39: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/39.jpg)
So many applications …
Translation: Hello → Model → 你好
Summarization: Model → summary
Chat-bot: How are you? → Model → I’m good.
Question Answering: Q → Model → Ans
![Page 40: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/40.jpg)
So many applications …
• Even syntactic parsing … Model
![Page 41: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/41.jpg)
So many applications …
Translation: Hello → Model → 你好
Summarization: Model → summary
Chat-bot: How are you? → Model → I’m good.
Question Answering: Q → Model → Ans
I will not go through all the applications, because you would get bored.
![Page 42: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/42.jpg)
There is more ……
[Diagram: six "Model" boxes; two of them output a class]
![Page 43: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/43.jpg)
Meta learning = Learn to learn
Learning task 1
Learning task 2
……
Learning task 100
I can learn task 101 better because I have learned some learning skills
Be a better learner, learn with little paired data
Bengali
Tagalog
Zulu
Tamil
![Page 44: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/44.jpg)
Learning from Unpaired Data
Image Style Transfer: image → image
Voice Conversion: audio → audio
Speech Recognition: audio → text
Summarization: document → summary
Unsupervised Translation: Language 1 → Language 2
![Page 45: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/45.jpg)
Knowledge Graph
Image: https://www.pngfuel.com/free-png/cxpcq
Model
![Page 46: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/46.jpg)
Adversarial Attack
• Speech
• Anti-spoofing systems (detecting synthesized speech) are easy to fool. [Liu, et al., ASRU, 2019]
• Speech recognition is easy to fool. [Lea Schonherr, et al., NDSS, 2019]
• NLP
[Eric Wallace, et al., EMNLP 2019]
![Page 47: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/47.jpg)
Explainable AI
This is a “cat”.
Because …
Q: How high is Everest?
The answer is “8848 meters”.
Because …
![Page 48: Deep Learning for Human Language Processing](https://reader030.vdocuments.site/reader030/viewer/2022012801/61bd098a61276e740b0eb63a/html5/thumbnails/48.jpg)
That’s all for this course
[Diagram: six "Model" boxes; two of them output a class]