Post on 09-Apr-2022

Page 1: Named Entity Recognition at Scale with Deep Learning

Named Entity Recognition at Scale with Deep Learning

Sijun He @SijunHe
#TwitterCortex at #ODSCWest

Page 2: Named Entity Recognition at Scale with Deep Learning

Introduction

Sijun He @SijunHe

ML Engineer II, Twitter Cortex

Page 3: Named Entity Recognition at Scale with Deep Learning

Agenda

1. NER on Tweets
2. Data
3. Model
4. Confidence Estimation
5. System Overview


Page 5: Named Entity Recognition at Scale with Deep Learning

Named Entity Recognition (NER) on Tweets

Entity types: Person, Location, Organization, Product, Other

Page 6: Named Entity Recognition at Scale with Deep Learning

Application of NER: Trends


Page 7: Named Entity Recognition at Scale with Deep Learning

Application of NER: Event Detection

[Fedoryszak et al., 2019]

Page 8: Named Entity Recognition at Scale with Deep Learning

Application of NER: User Interest

Last Engagements

Twitter (9), US (9), China (7), HK (7), Google (3), Linkedin (3), Stanford CoreNLP (2), Jeremy Lin (2), Manchester United (1)

Legend (entity types): Person, Location, Organization, Product, Other

Page 9: Named Entity Recognition at Scale with Deep Learning

Why in-house NER?

● Strategic: Gauge of information extraction and content understanding at Twitter

● Unique linguistic features of tweets
  ○ Limited context due to brevity
  ○ Abbreviations
  ○ Typos
  ○ Informal language
  ○ Temporality
  ○ ...

● Cost of third-party cloud APIs at production volume

Page 10: Named Entity Recognition at Scale with Deep Learning

Example of NER on Tweet

[Figure: the same tweet tagged by three systems. Panels: Google Natural Language API, Our Model, SpaCy (Open-source)]


Page 12: Named Entity Recognition at Scale with Deep Learning

Generating Training Data

Data Cleaning

● Convert character-level labels into token-level labels to train the NER model

● Regular removal of deleted tweets (GDPR)

Sampling

● Stratified sampling based on tweet engagement

● Sample over a long period of time to capture temporal signal
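Stratified sampling by engagement can be sketched as follows. This is a minimal illustration, not the pipeline described in the talk: the bucket thresholds, the `engagements` field, and the function names are all hypothetical.

```python
import random
from collections import defaultdict


def engagement_bucket(engagements):
    """Hypothetical strata; the thresholds are illustrative only."""
    if engagements < 10:
        return "low"
    if engagements < 1000:
        return "medium"
    return "high"


def stratified_sample(tweets, per_bucket, seed=0):
    """Draw up to `per_bucket` tweets from every engagement stratum."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for tweet in tweets:
        buckets[engagement_bucket(tweet["engagements"])].append(tweet)
    sample = []
    for name in sorted(buckets):
        items = buckets[name]
        sample.extend(rng.sample(items, min(per_bucket, len(items))))
    return sample


# Toy data: most tweets have little engagement, a few have a lot.
tweets = [
    {"id": i, "engagements": e}
    for i, e in enumerate([0, 3, 5, 50, 200, 5000, 12000, 7])
]
sample = stratified_sample(tweets, per_bucket=2)
```

Sampling a fixed number per stratum keeps highly engaged tweets represented even though they are rare in a uniform sample.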

Labeling

● Character-based labeling on a crowdsourced labeling platform
  ○ Person
  ○ Location
  ○ Organization
  ○ Product
  ○ Other
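Turning character-level annotations into token-level labels might look like the sketch below. It assumes whitespace tokenization and annotations given as `(start, end, label)` character offsets with an exclusive end; both are assumptions made for illustration, and real tweets would need a tokenizer that handles hashtags, mentions, and punctuation.

```python
import re


def char_spans_to_bio(text, spans):
    """Map character spans (start, end, label) to one BIO tag per token."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):  # assumption: whitespace tokens
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            if tok_start < end and tok_end > start:  # token overlaps the span
                # The first overlapping token opens the entity, the rest continue it.
                prefix = "B" if tok_start <= start else "I"
                tag = f"{prefix}-{label}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


text = "John lives in San Jose"
spans = [(0, 4, "Per"), (14, 22, "Loc")]  # "John" and "San Jose"
tokens, tags = char_spans_to_bio(text, spans)
print(list(zip(tokens, tags)))
```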


Page 14: Named Entity Recognition at Scale with Deep Learning

NER Model Setup

Input:        John lives in San Jose

Model output: B-Per O O B-Loc I-Loc

B - Beginning token of an entity
I - Inside token of an entity
O - Not an entity
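To make the tagging scheme concrete, here is a small sketch that reads a B/I/O tag sequence back into entities. The function name is hypothetical, and treating a stray `I-` tag as the start of a new entity is a design choice of this sketch, not something stated in the talk.

```python
def bio_to_entities(tokens, tags):
    """Group tokens into (entity text, entity type) pairs using B/I/O tags."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts here (a stray I- tag is treated like B-).
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I":
            current_tokens.append(token)  # continue the open entity
        else:
            # "O" closes any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = ["John", "lives", "in", "San", "Jose"]
tags = ["B-Per", "O", "O", "B-Loc", "I-Loc"]
entities = bio_to_entities(tokens, tags)
print(entities)
```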

Page 15: Named Entity Recognition at Scale with Deep Learning

Model Architectures

● Conditional Random Field [Lafferty et al., 2001]
● Deep Learning Architectures [Li et al., 2018]
● Fine-tuned Language Models [Devlin et al., 2019]

Page 16: Named Entity Recognition at Scale with Deep Learning

Conditional Random Field (CRF)

[Lafferty et al., 2001]

Observed State: John lives in San Jose .

Hidden State:   B-Per O O B-Loc I-Loc O

● Discriminative analog to the Hidden Markov Model (HMM)
● Models local context with a transition matrix

Page 17: Named Entity Recognition at Scale with Deep Learning

CRF Transition Matrix

[Figure: CRF transition matrix. Axes: From (tag), To (tag)]
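How a transition matrix shapes decoding can be shown with a small Viterbi sketch. All scores below are made up for illustration: the large negative O to I-Loc entry stands in for a transition a trained CRF would learn to penalize, and the tag set is reduced to three tags.

```python
def viterbi(emissions, transitions, tags):
    """Best tag path under per-token emission scores and tag-to-tag transition scores.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emission[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path]


tags = ["O", "B-Loc", "I-Loc"]
# Illustrative scores for the tokens "in San Jose".
emissions = [
    [2.0, 0.0, 0.0],  # in
    [0.0, 1.0, 1.5],  # San: emissions alone slightly prefer I-Loc
    [0.0, 0.0, 2.0],  # Jose
]
transitions = [
    [0.0, 0.0, -1e4],  # from O: I-Loc may not follow O
    [0.0, 0.0, 1.0],   # from B-Loc
    [0.0, 0.0, 1.0],   # from I-Loc
]
greedy = [tags[max(range(3), key=lambda j: e[j])] for e in emissions]
decoded = viterbi(emissions, transitions, tags)
print(greedy)
print(decoded)
```

Picking the best tag per token independently yields an entity that starts with `I-Loc`; the transition scores steer the decoded path to a well-formed `B-Loc I-Loc` sequence.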

Page 18: Named Entity Recognition at Scale with Deep Learning

Deep Learning Architectures

[Li et al., 2018]

Input Layer: Word Embedding, Character Embedding, Hand-crafted Features...

Context Layer: CNN, RNN, LSTM, Transformer, Attention...

Decode Layer: MLP+Softmax, CRF...

Page 19: Named Entity Recognition at Scale with Deep Learning

Char-BiLSTM-CRF

Input Layer: Word Representation, Character Representation, Other Features

Context Layer: Bidirectional LSTM

Decode Layer: CRF

Page 20: Named Entity Recognition at Scale with Deep Learning

Character Representations

[Li et al., 2018]

Page 21: Named Entity Recognition at Scale with Deep Learning

Decoder

[Li et al., 2018]

Page 22: Named Entity Recognition at Scale with Deep Learning

Fine-tuning Pre-trained LM (e.g. BERT)

[Figure: Fine-tuning] [Devlin et al., 2019]

Page 23: Named Entity Recognition at Scale with Deep Learning

Performance on CoNLL 2003

Source: NLP Progress

Model Type    Performance (F1)
CRF           ~0.85
BiLSTM-CRF    ~0.92
BERT large    ~0.93
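For readers unfamiliar with how an NER F1 score is computed, the sketch below assumes exact-match, entity-level scoring, which is the usual convention for CoNLL 2003. The function names are hypothetical and the example sentence is not taken from the benchmark.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO sequence (end exclusive)."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype and start is not None
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (i, label) if prefix in ("B", "I") else (None, None)
    return spans


def entity_f1(gold_tags, pred_tags):
    """F1 where a prediction counts only if boundaries and type both match."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-Per", "O", "O", "B-Loc", "I-Loc"]
pred = ["B-Per", "O", "O", "B-Loc", "O"]  # location boundary is wrong
score = entity_f1(gold, pred)
print(score)
```

Under exact matching, the truncated location earns no partial credit: one of two predicted entities is correct and one of two gold entities is found.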


Page 25: Named Entity Recognition at Scale with Deep Learning

Confidence Estimation

Input: Sijun He is in San Jose, CA

NER Model output:
Sijun He is in San Jose , CA
B-Per I-Per O O B-Loc I-Loc I-Loc I-Loc

Confidence Estimation output:
Entity          Type        Confidence
Sijun He        Person      0.99
San Jose, CA    Location    0.97

Page 26: Named Entity Recognition at Scale with Deep Learning

Confidence Estimation

Input: San Jose is in California

NER Model output:
San     B-Loc    0.9
Jose    I-Loc    0.6

● Softmax decoder computes a confidence for each token
● CRF decoder only computes the confidence for the whole sentence
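Per-token confidence from a softmax decoder can be sketched as below. The logits are invented for illustration and are not outputs of the model in the talk; the confidence of a token is taken to be the probability of its highest-scoring tag.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's tag scores."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(token_logits, tags):
    """Return (predicted tag, probability of that tag) for every token."""
    results = []
    for logits in token_logits:
        probs = softmax(logits)
        best = max(range(len(tags)), key=lambda j: probs[j])
        results.append((tags[best], probs[best]))
    return results


tags = ["O", "B-Loc", "I-Loc"]
token_logits = [
    [0.0, 3.0, 0.5],  # illustrative scores for "San"
    [0.5, 0.0, 1.5],  # illustrative scores for "Jose"
]
results = token_confidences(token_logits, tags)
print(results)
```

These are confidences for individual tokens, so an application that needs one score per entity still has to combine them somehow, which is the gap the CRF-based approach addresses.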

Page 27: Named Entity Recognition at Scale with Deep Learning

Confidence Estimation with CRF

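[Culotta et al., 2004]

[Figure: two tag lattices over the tokens "Jane Doe went to Paris ." with tag rows B, I, O. Panels: Total Likelihood, Constrained Total Likelihood]

Entity: Jane Doe
Constraints: (Jane, B), (Doe, I)

Constrained Forward-Backward Algorithm

● Total Likelihood: find the total likelihood of all possible sequences, a.k.a. the normalizer
● Constrained Total Likelihood: the same sum, restricted to sequences that satisfy the constraints
● Compute the marginal probability of the entity as the ratio of the two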
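The entity confidence can be sketched with a forward pass run twice: once over all tag paths and once over only the paths that satisfy the constraints. This is a minimal pure-Python sketch in log space, assuming a linear-chain CRF with made-up emission and transition scores; only the forward half of the algorithm is needed for this ratio, and the function and variable names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """log(sum(exp(v))) computed stably; returns -inf if every value is -inf."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_z(emissions, transitions, constraints=None):
    """Log of the total score of all tag paths consistent with `constraints`.

    constraints maps a token position to the tag index required there.
    """
    constraints = constraints or {}
    n_tags = len(transitions)

    def allowed(t, j):
        return t not in constraints or constraints[t] == j

    alpha = [emissions[0][j] if allowed(0, j) else NEG_INF for j in range(n_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            if allowed(t, j)
            else NEG_INF
            for j in range(n_tags)
        ]
    return logsumexp(alpha)


# Tag order: B, I, O. All scores are illustrative.
emissions = [
    [2.0, 0.0, 0.5],  # Jane
    [0.5, 2.0, 0.5],  # Doe
    [0.0, 0.0, 2.0],  # went
    [0.0, 0.0, 2.0],  # to
    [1.5, 0.0, 1.0],  # Paris
    [0.0, 0.0, 2.0],  # .
]
transitions = [
    [-0.5, 1.0, 0.0],  # from B
    [0.0, 0.5, 0.0],   # from I
    [0.0, -5.0, 0.0],  # from O (O -> I is heavily penalized)
]
log_z = forward_log_z(emissions, transitions)
# Constraints from the slide: (Jane, B), (Doe, I).
log_z_constrained = forward_log_z(emissions, transitions, {0: 0, 1: 1})
confidence = math.exp(log_z_constrained - log_z)
print(confidence)
```

The sketch constrains only the entity's own tokens, as on the slide; whether to also constrain the token after the entity so that it cannot continue with `I` is a choice left open here.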


Page 29: Named Entity Recognition at Scale with Deep Learning

System Overview

[Diagram: Model Endpoint Proxy in front of per-language models: English NER, Spanish NER, Japanese NER, ...]

Page 30: Named Entity Recognition at Scale with Deep Learning

System Overview

[Diagram components: Tweet Creation, Model Endpoint, Cache, Scribe, HDFS, Online Clients, Offline Clients. Edge labels: Put, Read, Cache miss]

System Read RPS        120k rps
Model Inference RPS    10k rps
Model Latency p99      20 ms
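One way to read the Put, Read, and Cache miss labels is as a cache in front of the model: entities are computed once when a tweet is created, reads are served from the cache, and a miss falls back to the model. The sketch below illustrates that pattern only. The class, its methods, and the toy "model" are hypothetical stand-ins, and the exact data flow in the production system is an assumption.

```python
class NerService:
    """Toy read-through cache in front of a model (all names are hypothetical)."""

    def __init__(self, model_fn):
        self.model_fn = model_fn  # stand-in for the model endpoint
        self.cache = {}           # stand-in for the distributed cache
        self.inference_calls = 0

    def _infer(self, text):
        self.inference_calls += 1
        return self.model_fn(text)

    def on_tweet_created(self, tweet_id, text):
        """Write path: run the model once and Put the result in the cache."""
        self.cache[tweet_id] = self._infer(text)

    def read(self, tweet_id, fetch_text):
        """Read path: serve from the cache, call the model only on a miss."""
        if tweet_id not in self.cache:
            self.cache[tweet_id] = self._infer(fetch_text(tweet_id))
        return self.cache[tweet_id]


def toy_model(text):
    """Not an NER model: returns capitalized words so the example runs."""
    return [word for word in text.split() if word[:1].isupper()]


texts = {1: "Sijun He is in San Jose", 2: "Jane Doe went to Paris"}
service = NerService(toy_model)
service.on_tweet_created(1, texts[1])
service.read(1, texts.get)           # cache hit
service.read(1, texts.get)           # cache hit
missed = service.read(2, texts.get)  # cache miss, falls back to the model
print(service.inference_calls)
```

Under this reading, the gap between 120k reads per second and 10k inferences per second is absorbed by the cache: if every miss triggered one inference, at most 10k of the 120k reads per second (about 8%) could be misses.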

Page 31: Named Entity Recognition at Scale with Deep Learning

Named Entities in External Articles

● One of the core pieces of public conversation on Twitter
● Run NER on each article's title and short snippet
● Significant upside in entity signal coverage

[Example tweet: No Named Entities in Tweet]

Named Entities in the Linked Article:
● Brunswick, GA
● Detroit
● Lions
● Georgia

Legend (entity types): Person, Location, Organization, Product, Other

Page 32: Named Entity Recognition at Scale with Deep Learning

Future Work

● Language-specific Model Architecture
● Multilingual Model
● Active Learning for Data Efficiency

Page 33: Named Entity Recognition at Scale with Deep Learning

Reference

● Mateusz Fedoryszak, Brent Frederick, Vijay Rajaram and Changtao Zhong, Real-time Event Detection on Social Data Streams, KDD 2019
● John Lafferty, Andrew McCallum and Fernando C.N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001
● Jing Li, Aixin Sun, Jianglei Han and Chenliang Li, A Survey on Deep Learning for Named Entity Recognition
● Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL-HLT 2019
● NLP Progress
● Aron Culotta and Andrew McCallum, Confidence Estimation for Information Extraction, HLT-NAACL 2004

Page 34: Named Entity Recognition at Scale with Deep Learning

#ThankYou


We are hiring ML Researchers and Engineers! [email protected]