
Page 1:

Deep Learning for NLP @Noah: progress, challenges and opportunities

Zhengdong Lu, Noah’s Ark Lab, Huawei

Page 2:

Noah’s Ark Lab

•  Research Areas
   –  Machine Learning
   –  Data Mining
   –  Speech and Language Processing
   –  Information and Knowledge Management
   –  Intelligent Systems
   –  Human Computer Interaction
•  Founded in 2012
•  Based in Hong Kong & Shenzhen
•  Connections with research institutes in mainland China
•  Researchers and student interns from Tsinghua, PKU, ICT-CAS, etc.
•  Collaboration with ..

Page 3:

Natural Language Processing (NLP)

Our goal is to build
•  A system that can understand you, act on your instructions, answer your questions, and talk back to you. That involves technologies of
   –  Question answering
   –  Machine translation
   –  Natural language dialogue
   –  Cross-modal communication
•  It is almost A.I. It is cool. It is hard.

Page 4:

The Road Map of DL4NLP (my personal view)

[Roadmap diagram with nodes: Representation; Classification; Matching; Generation; Machine Translation; Natural Language Dialogue]

Page 5:

The Road Map of DL4NLP (my personal view)

[Roadmap diagram, extended with: Reasoning; Symbolic A.I.; External Memory; Knowledge Representation, in addition to the nodes above]

Page 6:

The DeepLearners @Noah

[Photos of the researchers and student interns, in descending order of fidelity to their real look]

Page 7:

Several Threads of Work (1)

Deep Matching Models:
•  Models addressing different aspects of objects and their matching (see the sketch below)
•  Applied to retrieval-based dialogue, image retrieval, and machine translation, with state-of-the-art performance

[Model diagrams: DeepMatch_topic, DeepMatch_tree, DeepMatch_CNN, the M.T. application, and the image search application]
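To make the matching idea concrete, here is a minimal numpy sketch. All names, dimensions, and the pooling choice are illustrative assumptions, not the lab's actual DeepMatch models: score a text pair from its word-word interaction matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, DIM = 1000, 64                          # toy vocabulary and embedding sizes
    E = rng.normal(scale=0.1, size=(VOCAB, DIM))   # word embedding table

    def match_score(x_ids, y_ids):
        """Interaction-based matching: cosine word-word similarities, pooled."""
        X, Y = E[x_ids], E[y_ids]          # (len_x, DIM), (len_y, DIM) copies
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)
        M = X @ Y.T                        # interaction matrix (len_x, len_y)
        return M.max(axis=1).mean()        # best match per word, averaged

    # usage: rank two candidate responses against a post
    post, cand_a, cand_b = [3, 17, 256], [3, 17, 900], [512, 44, 7]
    print(match_score(post, cand_a), match_score(post, cand_b))

A trained model would replace the random embeddings and the fixed pooling with learned layers; the papers in the references add topic, syntax-tree, and convolutional structure on top of this basic interaction.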

Page 8:

Several Threads of Work (2)

Convolutional architecture for text modeling
•  “Guided” encoding of the source sentence in traditional M.T. (1 BLEU improvement over the BBN model); a sketch of the convolutional ingredient follows below
•  Sentence generation, and even an end-to-end neural M.T. system (similar to LSTM on generation)
•  Multi-resolution representation of sentences for short-text classification (state-of-the-art performance)

[Model diagrams: generative CNN (ACL-15); self-adaptive sentence model (IJCAI-15); encoding the source sentence for machine translation (ACL-15)]
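A hedged sketch of the shared convolutional ingredient (the single layer and all shapes are assumptions for illustration; the published models stack several such layers): slide a window over the word embeddings, apply filters, and max-pool over time to get a fixed-length sentence vector.

    import numpy as np

    rng = np.random.default_rng(1)
    DIM, WIN, FEAT = 64, 3, 128           # embedding size, window width, feature maps
    W = rng.normal(scale=0.1, size=(WIN * DIM, FEAT))  # convolution filters
    b = np.zeros(FEAT)

    def encode(word_vecs):
        """One convolution layer over time, ReLU, then max-pooling."""
        n = len(word_vecs) - WIN + 1
        windows = np.stack([word_vecs[i:i + WIN].ravel() for i in range(n)])
        feats = np.maximum(windows @ W + b, 0.0)       # (n, FEAT)
        return feats.max(axis=0)                       # fixed-length sentence vector

    sentence = rng.normal(size=(10, DIM))              # ten toy word embeddings
    print(encode(sentence).shape)                      # (128,)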

Page 9:

Several Threads of Work (3)

Neural Dialogue Models
•  World’s first purely neural dialogue model
•  Models that can “understand” what you said and generate an appropriate response (see the sketch below)
•  Better response quality than SMT-based and retrieval-based models
•  Moving towards multi-turn and knowledge-powered dialogue

[Diagram: Neural Responding Machine]
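A minimal encoder-decoder sketch of the generation side (untrained, with toy sizes; the actual Neural Responding Machine uses gated units and an attention-like scheme, omitted here): encode the post into a vector, then greedily decode a response from it.

    import numpy as np

    rng = np.random.default_rng(2)
    V, H = 50, 32                         # toy vocabulary and hidden sizes
    Emb = rng.normal(scale=0.1, size=(V, H))
    W_enc = rng.normal(scale=0.1, size=(2 * H, H))
    W_dec = rng.normal(scale=0.1, size=(2 * H, H))
    W_out = rng.normal(scale=0.1, size=(H, V))

    def step(W, h, x):
        """One plain RNN transition on the concatenated state and input."""
        return np.tanh(np.concatenate([h, x]) @ W)

    def respond(post_ids, max_len=10, bos=1, eos=2):
        h = np.zeros(H)
        for i in post_ids:                # encode the post
            h = step(W_enc, h, Emb[i])
        out, tok = [], bos
        for _ in range(max_len):          # greedy decoding of the response
            h = step(W_dec, h, Emb[tok])
            tok = int(np.argmax(h @ W_out))
            if tok == eos:
                break
            out.append(tok)
        return out

    print(respond([5, 9, 13]))            # arbitrary tokens, since weights are random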

Page 10:

Several Threads of Work (4)

Memory-based Deep Architecture
•  Flexible intermediate representation of a sequence as content in a memory
•  Novel deep architecture based on recursive read-write between multiple memories
•  “Generalizes” the Neural Turing Machine for sequence-to-sequence modeling
•  Performance comparable to Moses with a small parameter set on the zh-en task

Page 11:

Several Threads of Work (5)

Neural Reasoning System and Neural Symbolic A.I.
•  Teach the neural network the rules/grammar
•  Purely neural net-based reasoning system that can infer over multiple supporting facts (state-of-the-art performance)

[Diagrams: Rule Assimilation & Neural Reasoner]

Some cool stuff here

Page 12:

 

Memory-based Deep Architectures for Machine Translation

Page 13:

{NN + Memory} for NLP
•  External memory has recently been added to the loop of end-to-end neural learning, but is still very much toyish:
   –  Neural Turing Machine
   –  Memory Network
   –  Dynamic Memory Network
   –  Neural Stack
   –  Neural Transducer
•  We are the first to apply it to neural machine translation, with remarkable improvement

Page 14:

Memory Read-Write as a Nonlinear Transformation
•  Memory (with its content) as a more flexible way to represent sequences (e.g., NL sentences)
•  A system that reads from an R-memory and writes to a W-memory defines a nonlinear transformation between the two representations (see the sketch below)
•  Controller + read heads + write head (as in the Neural Turing Machine)
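A rough sketch of a content-based (C-addressing) read and an NTM-style erase/add write; the sharpness parameter and all shapes are illustrative assumptions, not the architecture's actual configuration.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def read(memory, key, beta=5.0):
        """C-addressing read: attend to cells whose content resembles the key."""
        sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        w = softmax(beta * sim)           # read weights over the cells
        return w @ memory                 # read vector for the controller

    def write(memory, w, erase, add):
        """NTM-style write: erase under the write weights, then add new content."""
        memory = memory * (1.0 - np.outer(w, erase))
        return memory + np.outer(w, add)

    rng = np.random.default_rng(3)
    R = rng.normal(size=(8, 16))          # R-memory: 8 cells of width 16
    r = read(R, rng.normal(size=16))      # one read step
    print(r.shape)                        # (16,)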

Page 15:

Stacking them together …

•  We can stack them together to form a deep architecture that performs layer-by-layer transformation of sequences

Page 16:

Neural Transformation Machine (NTram)

•  We can stack them together to form a deep architecture that performs layer-by-layer transformation of sequences (see the sketch below)
•  Takes a source-language sentence and gradually transforms it into a target-language sentence
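Reusing the read/write sketch from the memory slide (Page 14), stacking could look as follows; the fixed per-layer keys and the L-addressing write are illustrative choices, not the published NTram configuration.

    def layer(mem_in, n_cells, rng):
        """One transformation layer: C-addressing reads, L-addressing writes."""
        width = mem_in.shape[1]
        keys = rng.normal(size=(n_cells, width))      # toy, fixed read keys
        mem_out = np.zeros((n_cells, width))
        for j in range(n_cells):
            r = read(mem_in, keys[j])                 # read from the lower memory
            w = np.eye(n_cells)[j]                    # write location: cell j
            mem_out = write(mem_out, w, np.zeros(width), np.tanh(r))
        return mem_out

    M = R                                             # start from the toy R-memory
    for n in (8, 6, 4):                               # layer-by-layer transformation
        M = layer(M, n, rng)
    print(M.shape)                                    # (4, 16)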

Page 17:

“Types” of W-R transformations
•  Different W-R combinations define different types of nonlinear transformation
•  The stacked layers in a deep RNN are a special case

[Excerpt from the NTram paper:]

…jointly by design (in specifying the strategy and the implementation details) and the later supervised learning (in tuning the parameters). We argue that the memory offers more representational flexibility for encoding sequences with complicated structures, and the transformation introduced above provides more modeling flexibility for sequence-to-sequence learning.

As the most “conventional” special case, if we use L-addressing for both reading and writing, we actually get the familiar structure of units found in an RNN with stacked layers [16]. Indeed, as illustrated in the figure to the right of the text, this read-write strategy will invoke a relatively local dependency based on the original spatial order in the R-memory. It is not hard to show that we can recover some of the deep RNN models in [16] after stacking layers of read-write operations like this.

The C-addressing, however, be it for reading or writing, offers a means for major re-ordering of the cells, while H-addressing can add to it the spatial structure of the lower-layer memory. Memory with a designed inner structure gives more representational flexibility for sequences than a fixed-length vector, especially when coupled with an appropriate reading strategy in composing the memory of the next layer in a deep architecture, as shown later. On the other hand, the learned memory-based representation is in general less universal than the fixed-length representation, since it typically needs a particular reading strategy to decode the information.

In this paper, we consider four types of transformations induced by combinations of the read and write addressing strategies, listed pictorially in Figure 2. Notice that 1) we only include one combination with C-addressing for writing, since it is computationally expensive to optimize when combined with C-addressing reading (see Section 3.2 for some analysis), and 2) for one particular read-write strategy there is still a fair amount of implementation detail to be specified, which is omitted due to the space limit. One can easily design different read/write strategies, for example a particular way of H-addressing for writing.

Figure 2: Examples of read-write strategies.

3 NTram: Stacking Them Together

We stack the transformations together to form a deep architecture for sequence-to-sequence learning (named NTram), in a way analogous to the layers in DNNs. The aim of NTram is to learn a representation of the sequence better suited to the task (e.g., machine translation) through layer-by-layer transformations. Just as in a DNN, we expect that stacking relatively simple transformations can greatly enhance the expressive power and the efficiency of NTram, especially in handling translation between languages with vastly different syntactic structures (e.g., Chinese and English).


Page 18:

Architectures in the NTram family

•  Different architectures, with different combinations of nonlinear transformations
•  RNNsearch as a special case

[Architecture diagrams]

Page 19:

NTram as a neural machine translator

•  Performance comparable to Moses with a small number of parameters (compared to Google’s brute-force version)

[Results table; slightly out-of-date]

Page 20:

 

Neural Network-based Reasoning

Page 21:

Neural Reasoning

•  The ambition is to build a neural reasoning system with the rigor of logic and the flexibility of human language
•  Currently, we only have rather naïve models

[Diagram: End-to-end Memory Net from FB AI Research]

Page 22:

Neural Reasoning (cont’d)

•  Task: reasoning with multiple supporting facts
•  The answer can be cast as a classification problem with pre-determined classes (see the sketch below)
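A hedged sketch of the classification view (the attention pooling and all sizes are assumptions for illustration, not the actual Neural Reasoner): encode the supporting facts and the question, pool the relevant facts, and pick an answer from a fixed class set.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(4)
    DIM, N_CLASSES = 32, 20               # classes = pre-determined candidate answers
    W_cls = rng.normal(scale=0.1, size=(DIM, N_CLASSES))

    def answer(fact_vecs, q_vec):
        """Attend over the facts with the question, then classify the answer."""
        att = softmax(fact_vecs @ q_vec)  # which facts support the question
        evidence = att @ fact_vecs        # pooled evidence vector
        return int(np.argmax((evidence + q_vec) @ W_cls))

    facts = rng.normal(size=(5, DIM))     # five encoded supporting facts
    question = rng.normal(size=DIM)
    print(answer(facts, question))        # index of the predicted answer class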

Page 23:

Neural Reasoning (cont’d)

Results
•  Ours is far better than other neural models with end-to-end learning
•  Actually, ours is even better than models with strong supervision, when there is enough training data

Page 24:

Future Steps

Page 25:

Ongoing and future research

•  Rule and reasoning:
   –  Teach the neural network the rules
   –  Neural reasoning with a “hybrid” mechanism
•  Dialogue:
   –  Multi-turn dialogue, dialogue with knowledge
   –  Neural reinforcement learning for dialogue with a purpose
•  Machine translation:
   –  M.T. as a classification problem (with a CNN-based neural system)
   –  Deep memory-based M.T. system
•  Natural language understanding / semantic parsing:
   –  Game-theoretic end-to-end learning for NL understanding

Page 26:

References

•  Self-Adaptive Hierarchical Sentence Model. H. Zhao, Z. Lu, P. Poupart. IJCAI 2015.
•  Syntax-based Deep Matching of Short Texts. M. Wang, Z. Lu, H. Li, Q. Liu. IJCAI 2015.
•  A Deep Architecture for Matching Short Texts. Z. Lu, H. Li. NIPS 2013.
•  Convolutional Neural Network Architectures for Matching Natural Language Sentences. B. Hu, Z. Lu, H. Li, Q. Chen. NIPS 2014.
•  Multimodal Convolutional Neural Networks for Matching Image and Sentence. L. Ma, Z. Lu, L. Shang, H. Li. arXiv:1504.06063.
•  Neural Transformation Machine: A New Architecture for Sequence-to-Sequence Learning. F. Meng, Z. Lu, H. Li, Q. Liu. arXiv:1504.06442.
•  genCNN: A Convolutional Architecture for Word Sequence Prediction. M. Wang, Z. Lu, H. Li, W. Jiang, Q. Liu. ACL-IJCNLP 2015.
•  Encoding Source Language with Convolutional Neural Network for Machine Translation. F. Meng, Z. Lu, M. Wang, H. Li, W. Jiang, Q. Liu. ACL-IJCNLP 2015.
•  Context-Dependent Translation Selection Using Convolutional Neural Network. Z. Tu, B. Hu, Z. Lu, H. Li. ACL-IJCNLP 2015.
•  Neural Responding Machine for Short-Text Conversation. L. Shang, Z. Lu, H. Li. ACL-IJCNLP 2015.
•  Towards Neural Network-based Reasoning. B. Peng, Z. Lu, H. Li, K.-F. Wong. To appear.

Page 27:

Thanks. Questions?