20190517 Laboratory Research Introduction (English Version), FY2019
TRANSCRIPT
Augmented Human Communication Lab
2019/5/20
Speech Translation, Neural Machine Translation
Brain Analysis
Spoken Dialog, Multi-modal Dialog
Why don’t you join our lab!
I’m looking for a lab.
Information Retrieval
QA System, Multi-modal
Multi-language ASR, Speech Synthesis, Deep Speech Chain
Deep Neural Network
Affective Computing, Emotion and Environment Recognition
Prof. Satoshi Nakamura
Assis. Prof. Koichiro Yoshino
WEB Information Processing
Toward enhancement of human communication abilities
Toward enhancement of human communication abilities, AHC lab is studying multilingual speech translation, dialog systems, user-adaptive super-human automatic speech recognition/synthesis, and brain analysis related to human communication. We have also been managing the Data Science Center since 2017.
Assoc. Prof. Katsuhito Sudoh
Research Assoc. Prof. Sakriani Sakti
Assis. Prof. Hiroki Tanaka
Goal-oriented Dialog, Non goal-oriented Dialog
Incongruity measurement, Prediction of feeling
Early Detection of Dementia, Communication Support Dialog
Research Assoc. Prof. Keiji Yasuda
Visiting Assoc. Prof. Yu Suzuki
Professor Satoshi Nakamura
World-wide
Visiting Assoc. Professor Yu Suzuki
Assistant Professor Koichiro Yoshino
Spoken Dialog System, Dialog Control
Semantic Analysis, Language Understanding, Knowledge Extraction, Information Retrieval
Research Associate Professor Sakriani Sakti
Speech Recognition, Multilingual SR
Cognitive Communication
Graphical Models
Machine Translation
Speech Translation, Natural Language Processing, Machine Learning
Assistant Professor Hiroki Tanaka
Communication Aid, Cognitive Information Processing
Associate Professor Katsuhito Sudoh
Speech Translation, Speech Recognition
Dialog Control, Cognitive Communication
Big Data Analysis
Research Associate Professor Keiji Yasuda
Human Resources Tech, Educational Tech
Artificial Intelligence in Healthcare
Lab Members: 6 Faculty, 17 PhD Students, 27 Master Students
Speech Processing: D: 8; M: 6
Spoken Dialog: D: 5; M: 6
Cognitive Communication: D: 1; M: 6
NLP, STS Translation: D: 3; M: 9
Graduation Ceremony
Lab Research Boot Camp
Lab Ski Camp
Alumni Party
Professor Satoshi Nakamura
Background
1981.4-1994.3 Sharp Corp. Central Research Labs
1986-1989 ATR Interpreting Telephony Res. Labs.
1994.4-2000.3 Associate Prof., Nara Institute of Science and Technology
2000.4 Advanced Telecommunication Research International (ATR): Vice President of ATR; Director of Spoken Language Communication Labs.; ATR Fellow
2006.4 National Institute of Information and Communication Tech. (NICT): Director, MASTAR Project; Director, KCCC Research Center; Director General, Keihanna Research Laboratories
Dec. 2003 Honorarprofessor of University Karlsruhe, Germany
Apr. 2011 Prof. at Nara Institute of Science and Technology
Spoken Language Communication Research Laboratories
History of Speech Translation Research In Japan
Fundamentals
Read Speech
• Syntactically correct
• Clear utterance
• Limited domain
Ex. “Conference Registration”
Daily Conversation
• Standard expression
• Unclear utterance
• Limited domain
Ex. “Hotel Reservation”
Wider and Real Domain
• Wider and real domain: “International Travel”
• Realistic expressions
• Noisy speech
• J-E, J-C speech translation
1986 1992 1999 2006
Rule-based Technology: Hand-made
Corpus-based Technology: Large-scale corpus + Machine learning
2008 ATR, NICT
A-STAR
+ More languages for translation
• Multilateral translation for 8 Asian languages
• Network-based S2ST
2010
• 21 multilateral text translation
C-STAR
• Multilateral translation for 7 world languages
IWSLT
• Evaluation Campaign of S2S technologies
2011
VoiceTra
NAIST
ATR ATR
Riken AIP Tourism Information Analytics Team
IoT2H: (Internet of Things to Human)
2019/5/20 Satoshi NAKAMURA@AHC, NAIST
IoT2H is a technology to bridge the Internet of Things and human beings.
What’s happening?
IoT, Social information
To Human
Output in language
Congestion factor for tourism spots
Shopping, Hospital
Bus
Train
Restaurant
Temples
Beacon
Kinkakuji Temple is now crowded
Chat bot
Hotel
Tourism Information in Kyoto
Idea development of Deep Learning
Image captioning: image2cap!
Real-time
Assoc. Professor Katsuhito Sudoh
Background
2000 Bachelor of Engineering, Kyoto University
2002 Master of Informatics, Kyoto University
2015 Ph.D. (Informatics), Kyoto University
2002-2017 NTT Communication Science Laboratories
2017- Associate Professor, Graduate School of Information Science, NAIST
Machine Translation, Spoken Language Processing
I went Nara last night at noon
Information extraction from speech (using recognition “confidence”)
長尺矩形のオイルストレーナ74が溝条72aに略鉛直姿勢で嵌合される。
A long rectangular oil strainer 74 is fitted within the grooves 72a in a substantially vertical posture.
Translation with accurate word order (re-ordering)
Translation of technical terms
Toward a Language Barrier-free Future!
Translation & Language Understanding
Evaluation of Natural Language Generation (Katsuhito Sudoh)
NAISTへようこそ!! Welcome to NAIST!!
Machine Translation
For better translation
・Shorter processing time
・Accurate translation
・Multi-lingual translation
etc.
We are now working on these problems to make the world language barrier-free!
High-quality Chat Response (Ryo Nakamura)
Neural Machine Translation in Real-time (Katsuki Chousa)
Semantic Sentence Encoding (Yoichi Ishibashi)
Reduce time delay to translate
俄罗斯可以宣布胜利了
Russia could claim a victory of sorts
Semantic Automatic Evaluation of Translation (Kosuke Takahashi)
Reference, Translation, Original
Russia can declare victory
Evaluate the meanings of translations
Encode the sentences as the same vector
端子は互いに接触しないように配置されている
Terminals are placed not to be in contact with each other.
It’s fluent but gives wrong meaning...
IT’S POSSIBLY MISUNDERSTOOD
Focusing on risks of Misunderstanding
Style transfer for natural language (Kosuke Futamata)
Apply arbitrary stylistic features.
The chicken was delicious.
The chicken was terrible.
Style(Positive)
Style(Negative)
Past Research:
・Simultaneous optimization of speech recognition and machine translation
・Translating normal style into honorific style
・Small and accurate translation models
・Evaluation of simultaneous speech translation systems
・Machine translation error analysis
・Pivot translation strategies
・Automated programming
・Multilingual machine translation
・Code efficiency prediction based on OJS data
relation between nouns, domain knowledge: Toritani, Kinami
relation between verbs, preference, how to say: hit, swing
Assistant Professor Koichiro Yoshino
Background
2009 Bachelor of Arts in Environmental Information, Keio University
2014 Ph.D. (Informatics), Graduate School of Informatics, Kyoto University
2014- JSPS Research Fellow (PD)
2015- Assistant Professor, Graduate School of Information Science, NAIST
Did Toritani hit a home run?
Toritani who got the start in 1st line-up hit 2 doubles.
[Figure: question "Toritani hit" (Toritani = subject); focus → retrieve from news text. Web Text: "Toritani who got the start in 1st line-up" (subject), "hit", "2 doubles" (object).]
Recognition → Understanding → Management → Generation → Action
ASR, Para-linguistic recognition (SP, CC)
NLP techniques (NL)
TTS, Action & behavior generation (SP, CC)
What is understanding?
• Materialization of utterances
• Dialogue act tagging
• Knowledge acquisition
• Knowledge extension
• Relations between events
PRESTO: incremental processing and knowledge acquisition
affective computing
Spoken Dialogue Group
- Toward cooperative systems through interactions -
How do we realize systems?
• Decision making with reinforcement learning
• Using a variety of information: arguments, deception, emotion, task completion, entrainment, contradictions, etc.
• Algorithm of reinforcement learning
• Evaluations of dialogue systems
How do systems have effects?
• Generate responses according to manager decisions, contexts, personality, etc.
• Image and interaction
• End-to-end systems with distillation
Confirm?
Use emotion?
Ask a question?
There are 3 Italian restaurants at Ikoma. Do you have any preference?
Kiyomizu-temple is very crowded due to the high season
AIP: Touristic information analysis
AP Yoshino
D 品川 D 杉山
M Mai M1
M 隆辻
D 河野
D Tung M 浅井
M1
D 村瀬 M 池内
M 田中 M1
Italian near by Ikoma
DA: question
Domain: restaurant
{ type=Italian, … }
Obs.
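The slide mentions decision making with reinforcement learning, with candidate actions such as confirming or asking a question. The following is only a minimal sketch of that idea using tabular Q-learning on a toy dialogue task. The states, rewards, action names, and hyperparameters are all hypothetical and are not taken from the lab's systems.

```python
import random

# Hypothetical toy task: the system learns when to ask, confirm, or answer.
STATES = ["unknown", "uncertain", "known"]      # what the system knows about the user's goal
ACTIONS = ["ask_question", "confirm", "answer"]

def step(state, action):
    """Return (next_state, reward, done) for the toy dialogue environment."""
    if action == "answer":
        # Answering ends the dialogue; it only pays off once the goal is known.
        reward = {"unknown": -10.0, "uncertain": -5.0, "known": 10.0}[state]
        return state, reward, True
    if state == "unknown" and action == "ask_question":
        return "uncertain", -1.0, False
    if state == "uncertain" and action == "confirm":
        return "known", -1.0, False
    # Any other turn costs time without adding information.
    return state, -1.0, False

def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "unknown"
        for _ in range(20):  # cap the number of turns per dialogue
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            if done:
                break
            state = next_state
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

policy = greedy_policy(train())
```

In this toy setting the learned policy maps each knowledge state to one dialogue action. A real dialogue manager would work with much richer observations, such as the dialogue act, domain, and slot values shown in the frame above.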
Assistant Professor Hiroki Tanaka
Background
Bachelor of Engineering,
Ph.D. (Engineering), NAIST
Cognitive Communication Group
Assistant Prof. Hiroki Tanaka
Assessment and training of social communication skills (based on cognitive behavior therapy: SST)
・Task: speaking, listening, small talk
・Feedback regarding eye gaze, speech and image
・Now tested in clinics
Human - human, Human - machine
Communication
Automatic assessment / Feedback (Medical and educational system)
Estimation of Cognitive & Psychological States
Current and Ongoing Research
Automated social skills training
EEG measurement during simultaneous translation
Tourism Information Analysis using Tensor Decomposition
D3 Haruko Yagura
Predicting Objective Speech Quality Score
Various modalities
EEG Face / Eye Voice
Previous Research Topics
• Speech recognition using EEG signals
• Detection of dementia from responses
• Prediction of depressive tendency from lifestyle
Anomalous Sentence Detection using EEG
Measuring Empathy from EEG Signals
M1 Ivan Halim P.
Application: Objective quality measurement of synthesized speech
M2 Taiki Kinoshita
Empathy
Inter-Brain Synchronization
EEG
Statistical analysis
Measuring empathy
Application: Evaluation of human-machine empathy
Taro eats an apple
Taro runs an apple
Speech EEG Prediction
Correct
Incorrect
Predicting whether a spoken sentence is correct or incorrect from EEG, using a machine learning model
Application: Evaluation of machine outputs, adaptive dialogue system
M2 Shunnosuke Motomura
M2 Motoi Kubo
Apply tensor decomposition to a variety of (i.e., high-dimensional) tourism information and analyze trends in tourists' routes and popular spots.
[Figure: Tourism data → Tensor decomposition (one axis: Location) → sum of components (+ ・・・ +) → Trend of migration pathway, popular spot]
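To give a rough idea of what decomposing a tourism tensor can look like, here is a minimal sketch that extracts a single rank-1 component from a 3-way tensor with alternating updates (higher-order power iteration). The slide shows a sum of several components, so this is a simplification. The axis meanings and the toy counts are hypothetical and are not taken from the lab's data.

```python
import math

def rank1_cp(tensor, iters=50):
    """Rank-1 approximation of a 3-way tensor (nested lists) by alternating updates.

    Returns (weight, a, b, c) so that tensor[i][j][k] is approximated by
    weight * a[i] * b[j] * c[k], with a, b, c of unit length.
    """
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    # Start from uniform positive vectors (a simple, deterministic choice).
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else v

    weight = 0.0
    for _ in range(iters):
        # Update each factor while holding the other two fixed.
        a = normalize([sum(tensor[i][j][k] * b[j] * c[k]
                           for j in range(J) for k in range(K)) for i in range(I)])
        b = normalize([sum(tensor[i][j][k] * a[i] * c[k]
                           for i in range(I) for k in range(K)) for j in range(J)])
        c_raw = [sum(tensor[i][j][k] * a[i] * b[j]
                     for i in range(I) for j in range(J)) for k in range(K)]
        weight = math.sqrt(sum(x * x for x in c_raw))
        c = normalize(c_raw)
    return weight, a, b, c

# Hypothetical visitor counts indexed as [location][time slot][day type].
location_profile = [3.0, 1.0]
time_profile = [1.0, 2.0]
day_profile = [2.0, 2.0, 1.0]
counts = [[[l * t * d for d in day_profile] for t in time_profile]
          for l in location_profile]

weight, loc, slot, day = rank1_cp(counts)
```

Each factor vector describes one axis of the trend. Here `loc` shows that the first location draws three times the visitors of the second. A full analysis would fit several such components at once, for example with CP decomposition via alternating least squares.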
Education
2005-2008 Doctorate degree (Dr.-Ing) in Engineering Science, University of Ulm, GERMANY
2000-2002 Master degree (MSc) in Communication Technology, University of Ulm, GERMANY
1995-1999 Bachelor degree (BSc) in Informatics, Bandung Institute of Technology, INDONESIA
Work Experience
2018 – Research Assoc. Professor, Augmented Human Communication Labs, NAIST, JAPAN
Research Scientist, RIKEN Advanced Intelligence Project AIP, JAPAN
2011 – 2017 Assistant Professor, Augmented Human Communication Labs, NAIST, JAPAN
2009 – 2011 Visiting Professor, Faculty of Computer Science, University of Indonesia, INDONESIA
2006 – 2011 Expert Researcher, Spoken Language Communication Research Groups, NICT, JAPAN
2003 – 2009 Research Engineer - Researcher, Spoken Language Communication Research Labs, ATR, JAPAN
2001-2002 Masterarbeit, Speech Understanding Dept, Daimler Chrysler Research Center, GERMANY
1999-2000 Junior Software Consultant, Sumarno Pabotingi Associate, INDONESIA
Research Assoc. Prof. Sakriani Sakti
Speech Processing Research and Applications
~ Let’s make a machine that can hear and speak like a human ~
Multi-modal Paralinguistic Recognition & Modelling
Michael Heck (AIP-OB) [Multimodal representation learning]
Speech Recognition and Synthesis
Speech-to-speech Translation
Including translation of paralinguistic information such as emphasis, intonation, and pitch
End-to-End Wav-to-Text ASR with Deep Learning
Machine Speech Chain
Emotion Analysis and Deception Detection
Real-time Text-to-speech Synthesis
Have a nice day!
Sahoko Nakayama (DC) [Code-switching Speech Chain]
Kazuki Tsunematsu (MC) [Speech Prediction]
Yanagita Tomoya (DC) [Incremental TTS]
Nurul Fitria Lubis (D-OB) [Social-Affective Dialogues]
Do Quoc Truong (D-OB) [Paralinguistic Speech-to-Speech Translation]
Takamoto Kano (DC) [Direct End-to-end Speech-to-speech Translation]
Johanes Effendi The (DC) [Multi-modal Translation]
Multilingual ASR for Speech Translation
Real-time Speech Recognition using Video
Recognize speech and output a text!
Zero Resource Speech Challenge
ꦱꦸꦒꦌꦚꦁ
Incorporating Human Cognition into ASR
Andros Tjandra (DC) [Machine Speech Chain]
Marco Vetter (DC) [Lexical Discovery]
Wu Bin (DC) [Zero Resource ASR]
Tourism Information Analytics
Fan Yang (DC) [Scene Recognition]
Mayuko Okamoto (MC) [Entrainment TTS]
Sashi Novitasari (MC) [Incremental ASR]