Text Mining Lab (Summer 2017) - Word Vector Representation

Post on 21-Jan-2018

Summer 2017

Elvis Saravia

PhD, Information Systems and Applications

ellfae@gmail.com

Github username: omarsar

Questions: sli.do (#Z217)

● Knowledge Discovery (KDD) Process

ConceptNet

Motel = [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Hotel = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]

One-hot representation
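
The two vectors above can be rebuilt in a few lines to see why one-hot encoding says nothing about word similarity. This is a minimal sketch: the vocabulary size (16) and the positions of the 1s are read off the slide, while the helper name `one_hot` is made up for illustration.

```python
import numpy as np

VOCAB_SIZE = 16  # length of the vectors shown on the slide


def one_hot(index, size=VOCAB_SIZE):
    """Return a vector of zeros with a single 1 at position `index`."""
    vec = np.zeros(size, dtype=int)
    vec[index] = 1
    return vec


# 0-based positions of the 1s, as they appear on the slide.
motel = one_hot(8)
hotel = one_hot(5)

# Two different one-hot vectors are always orthogonal, so the dot
# product between "motel" and "hotel" is 0 even though the words
# are closely related in meaning.
print(motel @ hotel)  # 0
print(motel @ motel)  # 1
```

Any pair of distinct words gets the same score of 0, which is the limitation that distributed representations are meant to address.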

hotel = [0.728 0.234 -0.23 0.223]

Distributed representation (low-dimensional vector)
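
With dense vectors, similarity between words becomes a graded quantity. The sketch below compares the `hotel` vector from the slide against two other vectors using cosine similarity. Only `hotel` comes from the slide; the `motel` and `banana` values are invented for illustration and do not come from any trained model.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, a value in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


hotel = np.array([0.728, 0.234, -0.23, 0.223])     # from the slide
motel = np.array([0.701, 0.198, -0.19, 0.250])     # hypothetical values
banana = np.array([-0.412, 0.655, 0.310, -0.120])  # hypothetical values

sim_hotel_motel = cosine_similarity(hotel, motel)
sim_hotel_banana = cosine_similarity(hotel, banana)

# With these made-up numbers, "motel" points in nearly the same
# direction as "hotel", while "banana" does not.
print(f"hotel vs motel:  {sim_hotel_motel:.3f}")
print(f"hotel vs banana: {sim_hotel_banana:.3f}")
```

Unlike the one-hot case, the score now depends on how the vectors are oriented relative to each other, so related words can end up with a higher score than unrelated ones.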

Paper source: https://arxiv.org/pdf/1301.3781.pdf

Feedforward Neural Net Language Model (NNLM)

variables to optimize

denotes window range

P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
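
The conditional probabilities above pair a center word ("over") with the words around it. A rough sketch of how such context words can be collected with a sliding window follows. The example sentence and the window size of 3 are assumptions (the slide lists only the probabilities), so the output may not match the slide term for term; `context_words` is a hypothetical helper name.

```python
def context_words(tokens, center_index, window):
    """Return the words within `window` positions of the center word."""
    start = max(0, center_index - window)
    end = min(len(tokens), center_index + window + 1)
    return [tokens[i] for i in range(start, end) if i != center_index]


# Assumed example sentence; the slide shows only the resulting terms.
sentence = "the quick brown fox jumped over the lazy dog".split()
center = sentence.index("over")

contexts = context_words(sentence, center, window=3)  # window size assumed
for word in contexts:
    print(f"P({word}|over)")
```

Each printed term corresponds to one (center word, context word) training pair; a larger window produces more pairs per center word.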

P(V_OUT | V_IN)

How to define this probability distribution?

Determines similarity in [-1,1]

Get a probability in [0,1] out of a similarity in [-1,1]
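
One common way to turn similarity scores into probabilities is the softmax function. The slide does not say which function it uses, so the softmax here is an assumption, and the similarity values are made up for illustration.

```python
import numpy as np


def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical similarities in [-1, 1] between one input word and
# four candidate output words.
similarities = np.array([0.9, 0.1, -0.4, -0.8])
probs = softmax(similarities)

# Every entry lies in [0, 1], the entries sum to 1, and a higher
# similarity always maps to a higher probability.
print(probs)
print(probs.sum())
```

Because the exponential is monotonic, the ranking of candidate words is preserved: the most similar word also receives the largest probability.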

https://www.healthvault.com/en-us/health-bot/

● https://goo.gl/ppHX65

○ Gensim guide for word2vec: https://goo.gl/i2UrdH

● https://goo.gl/7b72S9

● https://goo.gl/uNJDrs

● https://goo.gl/KYacjz

● https://goo.gl/JezgYg

a. Build API: (Flask/Django recommended)
b. Pretrained models: (Guide: https://goo.gl/5qt2Ki)
c. Visualization: d3js / plotly / tensorboard

a. LSTM - (Guide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
b. CNN - (Guide: https://goo.gl/PgLUs7)
c. RNN - (Guide: https://goo.gl/5L9kci)

a. Starting point: https://rare-technologies.com/word2vec-tutorial#app
