web content mining - irityoann.pitarch/docs/m2stats/2019/m2... · 2019-01-11 · topic mining and...

51
Web Content Mining Topic Modeling and Text Classification — Session 2 — 1

Upload: others

Post on 21-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Web Content Mining Topic Modeling and Text Classification

— Session 2 —

1

Page 2: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Landscape of Text Mining and Analytics

Real World Text DataObserved World

Word association mining& analysis

Opinion mining & Sentiment analysis

Topic mining and analysisText based prediction

Perceive Express

(english)

NLP & text representation

2

Page 3: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Topic Mining and Analysis: Motivation‣ Topic~ main idea discussed in text data

‣ Theme/subject of a discussion or conversation‣ Different granularities (e.g., topic of a sentence, an article, etc.)

‣ Many application require discovery of topics in text ‣ What are Twitter users talking about today?‣ What are the current research topics in data mining? How are they

different from those 5 years ago?‣ What do people like about the iPhone X? What do they dislike?‣ What were the major topics debated in 2016 presidential election?

3

Page 4: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Task of Topic Mining and Analysis

Text Data

Topic 1

Topic 2

Topic k

...

Doc 1 Doc 2 Doc N...Task 1: Discovery k topics

Task 2: Figure out which documents cover which topics 4

Page 5: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Task of Topic Mining and Analysis

Text Data

w1,w3,w5

w8,w17

w2,w4,w9

...

Doc 1 Doc 2 Doc N...

Unsupervised problem (clustering) 5

Page 6: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Task of Topic Mining and Analysis

Text Data

Class 1

Class 2

Class k

...

Doc 1 Doc 2 Doc N...

Supervised problem (classification) 6

Page 7: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Topic mining and Analysis‣ Group of documents (unsupervised)

‣ Clustering‣ Group of documents but into classes (supervised)

‣ Classification

‣ Group of words - dimensionality reduction (We are going to see them in the unsupervised part)

‣ NNMF, LDA, PLSA, etc.

7

Page 8: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Clustering‣ Clustering of documents is a way to discovery related documents and

topics ‣ It is the process of grouping a set of objects into classes of similar objects

‣ Documents within a cluster should be similar.‣ Documents from different clusters should be dissimilar

‣ Many existing algorithms ‣ K-means, E.M., etc.

‣ A representation for each document is demanded (or document similarity) ‣ Term-document or document-term matrix‣ The frequency is frequently used, but many others weighted models

are available (tf-idf). Their use depends of the chosen algorithm

8

Page 9: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Issues for clustering‣ Representation for clustering

‣ Document representation‣ Vector space? Normalization?‣ Centroids aren’t length normalized‣ Need a notion of similarity/distance

‣ How many clusters? ‣ Fixed a priori?‣ Completely data driven?

• Avoid “trivial” clusters - too large or small

t 1

D2

D1D3

D4

t 3

t 2

xy

9

Page 10: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Document Representation TF-IDF

‣ Frequent terms are less informative than rare terms

‣ A document containing a frequent term is more likely to be relevant than a document that doesn’t

‣ But it’s not a sure indicator of relevance..

)df/(log)tf1log(w 10,, tdt Ndt

´+=

‣ Increases with the number of occurrences within a document

‣ Increases with the rarity of the term in the collection

TF-IDF

TF IDF

10

Page 11: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Document Representation Term-Document Matrix

Anthony and

CleopatraJulius Caesar

The Tempest Hamlet Othello Macbeth

anthony 5.25 3.18 0.0 0.0 0.0 0.35

brutus 1.21 6.10 0.0 1.0 0.0 0.0

caesar 8.59 2.54 0.0 1.51 0.25 0.0

calpurnia 0.0 1.54 0.0 0.0 0.0 0.0

cleopatra 2.85 0.0 0.0 0.0 0.0 0.0

mercy 1.51 0.0 1.90 0.12 5.25 0.88

worser 1.37 0.0 0.11 4.15 0.25 1.95

Values used in these matrices are known as the weighting schema. They alter the obtained results (clustering or classification). Often the frequency is used, but other options are also available:

Why frequency is not the best option? Document size strongly affects similarity values!!!

Anthony and

CleopatraJulius

CaesarThe

Tempest Hamlet Othello Macbeth

anthony 1 1 0 0 0 1

brutus 1 1 0 1 0 0

caesar 1 1 0 1 1 1

calpurnia 0 1 0 0 0 0

cleopatra 1 0 0 0 0 0

mercy 1 0 1 1 1 1

worser 1 0 1 1 1 0

Binary matrix TF-IDF

11

Page 12: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Documents as vectors

‣ So we have a |V|-dimensional vector space ‣ Terms are axes of the space ‣ Documents are points or vectors in this space ‣ Very high-dimensional: millions of dimensions when you apply this

to big collections ‣ These are very sparse vectors - most entries are zero

12

Page 13: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Notion of similarity/distance‣ Ideal: semantic similarity (in your dreams)

‣ Practical: term-statistical similarity

‣ Cosine similarity

‣ Docs as vectors

‣ For many algorithms, easier to think in terms of a distance (rather than similarity) between docs

‣ It is easier to talk about Euclidean distance, but real implementations use cosine similarity

13

Page 14: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Hard vs. soft clustering‣ Hard clustering: Each document belongs to exactly one cluster

‣ More common and easier to do

‣ Soft clustering: A document can belong to more than one cluster.

‣ Makes more sense for applications like creating browsable hierarchies

‣ You may want to put a pair of sneakers in two clusters: (i) sports apparel and (ii) shoes

‣ You can only do that with a soft clustering approach.

14

Page 15: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

What Is a Good Clustering?

‣ Internal criterion: A good clustering will produce high quality clusters in which: ‣ the intra-class (that is, intra-cluster) similarity is high‣ the inter-class similarity is low‣ The measured quality of a clustering depends on both the

document representation and the similarity measure used‣ External criterion: The quality of a clustering is also measured by its

ability to discover some or all of the hidden patterns or latent classes ‣ Assessable with gold standard data

15

Page 16: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Clustering algorithms‣ Hard clustering

‣ K-means

‣ Hard EM

‣ Hierarchical clustering

‣ Soft clustering

‣ Latent Semantic Analysis (LSA)

‣ Non-negative matrix factorization

‣ Latent Dirichlet allocation

‣ Many others but these are the main used in text clustering

Not seen in this course

16

Page 17: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

‣ Proposed by Deerwester et al. (JASIST, 1990) ‣ Perform a SVD to the term-document matrix ‣ Result: low-rank approximation

Problem: how can negative weights be interpreted ?

17

LSA Latent Semantic Analysis

Page 18: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

NMF and pLSA Two alternatives

18

‣ NMF (Paatero et Tapper, 1990). Matrix decomposition with constraint on the positivity of the resulting matrices

‣ pLSA (Hofmann, 1999). Probabilistic version of LSA. Weights are probabilities (positive and interpretable)

Problem: overfitting and model not really generative (impossible to apply it on new documents)

Page 19: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

LDA Latent Dirichlet Allocation

‣ Blei et al. (2001) ‣ Fully generative model ‣ Inspired by probabilistic graphic model

19

Page 20: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

LDA Graphical Model

20

‣ Nodes are random variables ‣ Gray nodes are observed variables or constant ‣ Edge = conditional dependency ‣ Plate notation : variables are repeated

Page 21: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

LDA generative story1. For each topic draw a sample of words

from a DirichletW ( ) 2. For each document

‣ Draw a topic distribution from DirichletT( ) ‣ For each word at position n in document d

• Draw a topic zd,n from MultinomialT( ) • Draw a word wd,n from MultinomialW( )

j 2 {1, .., T}�

�j

d 2 D

21

✓d<latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="R5wJmE9xE5SdTz0wZ8is0Q1VbN8=">AAACtHicjVLLSgMxFD0dX7VWrWs3g0VwVaZu1J2gC5cVHFuoRTJpWmMzDzIZoRR/wK0fJ/6B/oU3cQpqEc0wMyfn3nOSm5soUzI3QfBa8ZaWV1bXquu1jXptc2u7Ub/O00JzEfJUpboXsVwomYjQSKNEL9OCxZES3WhyZuPdB6FzmSZXZpqJQczGiRxJzgxRndtGM2gFbviLoF2CJsqRNl5wgyFScBSIIZDAEFZgyOnpo40AGXEDzIjThKSLCzyiRtqCsgRlMGIn9B3TrF+yCc2tZ+7UnFZR9GpS+tgnTUp5mrBdzXfxwjlb9jfvmfO0e5vSPyq9YmIN7oj9SzfP/K/O1mIwwrGrQVJNmWNsdbx0Kdyp2J37X6oy5JARZ/GQ4powd8r5OftOk7va7dkyF39zmZa1c17mFni3u6T+tn92cxGEh62TVnAZoIpd7OGAuniEU1ygg5Ach3jCs3fu3XvZ5zXwKuV92MG34ekPT4SMsA==</latexit><latexit sha1_base64="PDg1iVskRMlKb1vXnPvuo3yKWL8=">AAACv3icjVLLSgMxFD0dX7VWrW7dDBbBVZm6UXeCGzdCBccW2lIy07QNnRdJRijFn3Cr/yX+gf6FN3EKahHNMDMn595zkpubIIuE0p73WnJWVtfWN8qbla3q9s5uba96p9JchtwP0yiVnYApHomE+1roiHcyyVkcRLwdTC9NvH3PpRJpcqtnGe/HbJyIkQiZJqrT0xOu2WA4qNW9hmeHuwyaBaijGK209oIehkgRIkcMjgSacAQGRU8XTXjIiOtjTpwkJGyc4wEV0uaUxSmDETul75hm3YJNaG48lVWHtEpErySliyPSpJQnCZvVXBvPrbNhf/OeW0+ztxn9g8IrJlZjQuxfukXmf3WmFo0RzmwNgmrKLGOqCwuX3J6K2bn7pSpNDhlxBg8pLgmHVrk4Z9dqlK3dnC2z8TebaVgzD4vcHO9ml9Tg5s92LgP/pHHe8G48lHGAQxxTF09xgSu04NvuPeIJz861o5zZ501wSsWV2Me34cw/ABLSkR0=</latexit><latexit sha1_base64="PDg1iVskRMlKb1vXnPvuo3yKWL8=">AAACv3icjVLLSgMxFD0dX7VWrW7dDBbBVZm6UXeCGzdCBccW2lIy07QNnRdJRijFn3Cr/yX+gf6FN3EKahHNMDMn595zkpubIIuE0p73WnJWVtfWN8qbla3q9s5uba96p9JchtwP0yiVnYApHomE+1roiHcyyVkcRLwdTC9NvH3PpRJpcqtnGe/HbJyIkQiZJqrT0xOu2WA4qNW9hmeHuwyaBaijGK209oIehkgRIkcMjgSacAQGRU8XTXjIiOtjTpwkJGyc4wEV0uaUxSmDETul75hm3YJNaG48lVWHtEpErySliyPSpJQnCZvVXBvPrbNhf/OeW0+ztxn9g8IrJlZjQuxfukXmf3WmFo0RzmwNgmrKLGOqCwuX3J6K2bn7pSpNDhlxBg8pLgmHVrk4Z9dqlK3dnC2z8TebaVgzD4vcHO9ml9Tg5s92LgP/pHHe8G48lHGAQxxTF09xgSu04NvuPeIJz861o5zZ501wSsWV2Me34cw/ABLSkR0=</latexit><latexit sha1_base64="pmGTWxK+9yJL0G5/IG3QPG9ps5M=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVI36q7oxo1QwdiCLWWSTtuhaRIyE6EUd36BW/0v8Q/0L7wzTkEtohOSnDn3nDtz7w3SSEjlea8FZ2FxaXmluFpaW9/Y3Cpv79zIJM9C7odJlGStgEkeiZj7SqiIt9KMs3EQ8WYwOtfx5h3PpEjiazVJeWfMBrHoi5ApolptNeSKdXvdcsWrema586BmQQV2NZLyC9roIUGIHGNwxFCEIzBIem5Rg4eUuA6mxGWEhIlz3KNE3pxUnBSM2BF9B7S7tWxMe51TGndIp0T0ZuR0cUCehHQZYX2aa+K5yazZ33JPTU59twn9A5trTKzCkNi/fDPlf326FoU+TkwNgmpKDaOrC22W3HRF39z9UpWiDClxGvconhEOjXPWZ9d4pKld95aZ+JtRalbvQ6vN8a5vSQOu/RznPPCPqqdV78qr1M/spIvYwz4OaZzHqOMCDfhmjI94wrNz6Uhn4kw/pU7BenbxbTkPH2cEkk8=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit>

↵<latexit sha1_base64="6TBYvnDQThMN6duS14jYlr8FugI=">AAACyHicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjeCmgmkLbZFJOm2jaRInE6UUN36BW/0x8Q/0L7wzTkEtohOSnDn3njNz7/WSMEil47zmrJnZufmF/GJhaXllda24vlFP40z43PXjMBZNj6U8DCLuykCGvJkIzoZeyBve9YmKN265SIM4upCjhHeGrB8FvcBnkqh6m4XJgF0WS07Z0cueBhUDSjCrFhdf0EYXMXxkGIIjgiQcgiGlp4UKHCTEdTAmThAKdJzjHgXSZpTFKYMRe03fPu1aho1orzxTrfbplJBeQUobO6SJKU8QVqfZOp5pZ8X+5j3WnupuI/p7xmtIrMSA2L90k8z/6lQtEj0c6hoCqinRjKrONy6Z7oq6uf2lKkkOCXEKdykuCPtaOemzrTWprl31lun4m85UrNr7JjfDu7olDbjyc5zTwN0rH5Wd8/1S9dhMOo8tbGOXxnmAKk5Rg0vWV3jEE56tM+vGurNGn6lWzmg28W1ZDx8izJFs</latexit><latexit sha1_base64="6TBYvnDQThMN6duS14jYlr8FugI=">AAACyHicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjeCmgmkLbZFJOm2jaRInE6UUN36BW/0x8Q/0L7wzTkEtohOSnDn3njNz7/WSMEil47zmrJnZufmF/GJhaXllda24vlFP40z43PXjMBZNj6U8DCLuykCGvJkIzoZeyBve9YmKN265SIM4upCjhHeGrB8FvcBnkqh6m4XJgF0WS07Z0cueBhUDSjCrFhdf0EYXMXxkGIIjgiQcgiGlp4UKHCTEdTAmThAKdJzjHgXSZpTFKYMRe03fPu1aho1orzxTrfbplJBeQUobO6SJKU8QVqfZOp5pZ8X+5j3WnupuI/p7xmtIrMSA2L90k8z/6lQtEj0c6hoCqinRjKrONy6Z7oq6uf2lKkkOCXEKdykuCPtaOemzrTWprl31lun4m85UrNr7JjfDu7olDbjyc5zTwN0rH5Wd8/1S9dhMOo8tbGOXxnmAKk5Rg0vWV3jEE56tM+vGurNGn6lWzmg28W1ZDx8izJFs</latexit><latexit sha1_base64="6TBYvnDQThMN6duS14jYlr8FugI=">AAACyHicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjeCmgmkLbZFJOm2jaRInE6UUN36BW/0x8Q/0L7wzTkEtohOSnDn3njNz7/WSMEil47zmrJnZufmF/GJhaXllda24vlFP40z43PXjMBZNj6U8DCLuykCGvJkIzoZeyBve9YmKN265SIM4upCjhHeGrB8FvcBnkqh6m4XJgF0WS07Z0cueBhUDSjCrFhdf0EYXMXxkGIIjgiQcgiGlp4UKHCTEdTAmThAKdJzjHgXSZpTFKYMRe03fPu1aho1orzxTrfbplJBeQUobO6SJKU8QVqfZOp5pZ8X+5j3WnupuI/p7xmtIrMSA2L90k8z/6lQtEj0c6hoCqinRjKrONy6Z7oq6uf2lKkkOCXEKdykuCPtaOemzrTWprl31lun4m85UrNr7JjfDu7olDbjyc5zTwN0rH5Wd8/1S9dhMOo8tbGOXxnmAKk5Rg0vWV3jEE56tM+vGurNGn6lWzmg28W1ZDx8izJFs</latexit>

�zd,n<latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="R5wJmE9xE5SdTz0wZ8is0Q1VbN8=">AAACtHicjVLLSgMxFD0dX7VWrWs3g0VwVaZu1J2gC5cVHFuoRTJpWmMzDzIZoRR/wK0fJ/6B/oU3cQpqEc0wMyfn3nOSm5soUzI3QfBa8ZaWV1bXquu1jXptc2u7Ub/O00JzEfJUpboXsVwomYjQSKNEL9OCxZES3WhyZuPdB6FzmSZXZpqJQczGiRxJzgxRndtGM2gFbviLoF2CJsqRNl5wgyFScBSIIZDAEFZgyOnpo40AGXEDzIjThKSLCzyiRtqCsgRlMGIn9B3TrF+yCc2tZ+7UnFZR9GpS+tgnTUp5mrBdzXfxwjlb9jfvmfO0e5vSPyq9YmIN7oj9SzfP/K/O1mIwwrGrQVJNmWNsdbx0Kdyp2J37X6oy5JARZ/GQ4powd8r5OftOk7va7dkyF39zmZa1c17mFni3u6T+tn92cxGEh62TVnAZoIpd7OGAuniEU1ygg5Ach3jCs3fu3XvZ5zXwKuV92MG34ekPT4SMsA==</latexit><latexit sha1_base64="YSrsrosmhdlczImyVoNVtpuq4Qg=">AAACxXicjVLLSsNAFD2Nr1qrVrdugkVwISV1o+4ENy4rGltoa0nSaTs0LyYToYYi/oVb/SnxD/QvvDOmoBbRCUnOnHvPmblzx419nkjLei0YC4tLyyvF1dJaeX1js7JVvk6iVHjM9iI/Ei3XSZjPQ2ZLLn3WigVzAtdnTXd8puLNWyYSHoVXchKzbuAMQz7gniOJuunEI97L7npZ/yCcTnuVqlWz9DDnQT0HVeSjEVVe0EEfETykCMAQQhL24SChp406LMTEdZERJwhxHWeYokTalLIYZTjEjuk7pFk7Z0OaK89Eqz1axadXkNLEHmkiyhOE1WqmjqfaWbG/eWfaU+1tQn839wqIlRgR+5dulvlfnapFYoBjXQOnmmLNqOq83CXVp6J2bn6pSpJDTJzCfYoLwp5Wzs7Z1JpE167O1tHxN52pWDX38twU72qX1OD6z3bOA/uwdlKzLiwUsYNd7FMXj3CKczRgk6PAI57wbFwaE+P+8yYYhfxKbOPbMB4+AAD7k+E=</latexit><latexit sha1_base64="YSrsrosmhdlczImyVoNVtpuq4Qg=">AAACxXicjVLLSsNAFD2Nr1qrVrdugkVwISV1o+4ENy4rGltoa0nSaTs0LyYToYYi/oVb/SnxD/QvvDOmoBbRCUnOnHvPmblzx419nkjLei0YC4tLyyvF1dJaeX1js7JVvk6iVHjM9iI/Ei3XSZjPQ2ZLLn3WigVzAtdnTXd8puLNWyYSHoVXchKzbuAMQz7gniOJuunEI97L7npZ/yCcTnuVqlWz9DDnQT0HVeSjEVVe0EEfETykCMAQQhL24SChp406LMTEdZERJwhxHWeYokTalLIYZTjEjuk7pFk7Z0OaK89Eqz1axadXkNLEHmkiyhOE1WqmjqfaWbG/eWfaU+1tQn839wqIlRgR+5dulvlfnapFYoBjXQOnmmLNqOq83CXVp6J2bn6pSpJDTJzCfYoLwp5Wzs7Z1JpE167O1tHxN52pWDX38twU72qX1OD6z3bOA/uwdlKzLiwUsYNd7FMXj3CKczRgk6PAI57wbFwaE+P+8yYYhfxKbOPbMB4+AAD7k+E=</latexit><latexit sha1_base64="SrrMRHNIXhUQvhI21vskZjTpnfM=">AAAC0HicjVHLTsJAFD3UF+ILdemmkZi4MKS4UXdENy4xWjEBJG0ZYEJpm+nUBBti3PoFbvWnjH+gf+GdsSQqMTpN2zPn3nNm7r1u5PNYWtZrzpiZnZtfyC8WlpZXVteK6xuXcZgIj9le6IfiynVi5vOA2ZJLn11FgjlD12d1d3Ci4vUbJmIeBhdyFLHW0OkFvMs9RxJ13Yz6vJ3ettPOXjAet4slq2zpZU6DSgZKyFYtLL6giQ5CeEgwBEMASdiHg5ieBiqwEBHXQkqcIMR1nGGMAmkTymKU4RA7oG+Pdo2MDWivPGOt9ugUn15BShM7pAkpTxBWp5k6nmhnxf7mnWpPdbcR/d3Ma0isRJ/Yv3STzP/qVC0SXRzqGjjVFGlGVedlLonuirq5+aUqSQ4RcQp3KC4Ie1o56bOpNbGuXfXW0fE3nalYtfey3ATv6pY04MrPcU4De798VLbOrFL1OJt0HlvYxi6N8wBVnKIGm6wFHvGEZ+PcGBl3xv1nqpHLNJv4toyHD3LHlRk=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="R5wJmE9xE5SdTz0wZ8is0Q1VbN8=">AAACtHicjVLLSgMxFD0dX7VWrWs3g0VwVaZu1J2gC5cVHFuoRTJpWmMzDzIZoRR/wK0fJ/6B/oU3cQpqEc0wMyfn3nOSm5soUzI3QfBa8ZaWV1bXquu1jXptc2u7Ub/O00JzEfJUpboXsVwomYjQSKNEL9OCxZES3WhyZuPdB6FzmSZXZpqJQczGiRxJzgxRndtGM2gFbviLoF2CJsqRNl5wgyFScBSIIZDAEFZgyOnpo40AGXEDzIjThKSLCzyiRtqCsgRlMGIn9B3TrF+yCc2tZ+7UnFZR9GpS+tgnTUp5mrBdzXfxwjlb9jfvmfO0e5vSPyq9YmIN7oj9SzfP/K/O1mIwwrGrQVJNmWNsdbx0Kdyp2J37X6oy5JARZ/GQ4powd8r5OftOk7va7dkyF39zmZa1c17mFni3u6T+tn92cxGEh62TVnAZoIpd7OGAuniEU1ygg5Ach3jCs3fu3XvZ5zXwKuV92MG34ekPT4SMsA==</latexit><latexit sha1_base64="YSrsrosmhdlczImyVoNVtpuq4Qg=">AAACxXicjVLLSsNAFD2Nr1qrVrdugkVwISV1o+4ENy4rGltoa0nSaTs0LyYToYYi/oVb/SnxD/QvvDOmoBbRCUnOnHvPmblzx419nkjLei0YC4tLyyvF1dJaeX1js7JVvk6iVHjM9iI/Ei3XSZjPQ2ZLLn3WigVzAtdnTXd8puLNWyYSHoVXchKzbuAMQz7gniOJuunEI97L7npZ/yCcTnuVqlWz9DDnQT0HVeSjEVVe0EEfETykCMAQQhL24SChp406LMTEdZERJwhxHWeYokTalLIYZTjEjuk7pFk7Z0OaK89Eqz1axadXkNLEHmkiyhOE1WqmjqfaWbG/eWfaU+1tQn839wqIlRgR+5dulvlfnapFYoBjXQOnmmLNqOq83CXVp6J2bn6pSpJDTJzCfYoLwp5Wzs7Z1JpE167O1tHxN52pWDX38twU72qX1OD6z3bOA/uwdlKzLiwUsYNd7FMXj3CKczRgk6PAI57wbFwaE+P+8yYYhfxKbOPbMB4+AAD7k+E=</latexit><latexit sha1_base64="YSrsrosmhdlczImyVoNVtpuq4Qg=">AAACxXicjVLLSsNAFD2Nr1qrVrdugkVwISV1o+4ENy4rGltoa0nSaTs0LyYToYYi/oVb/SnxD/QvvDOmoBbRCUnOnHvPmblzx419nkjLei0YC4tLyyvF1dJaeX1js7JVvk6iVHjM9iI/Ei3XSZjPQ2ZLLn3WigVzAtdnTXd8puLNWyYSHoVXchKzbuAMQz7gniOJuunEI97L7npZ/yCcTnuVqlWz9DDnQT0HVeSjEVVe0EEfETykCMAQQhL24SChp406LMTEdZERJwhxHWeYokTalLIYZTjEjuk7pFk7Z0OaK89Eqz1axadXkNLEHmkiyhOE1WqmjqfaWbG/eWfaU+1tQn839wqIlRgR+5dulvlfnapFYoBjXQOnmmLNqOq83CXVp6J2bn6pSpJDTJzCfYoLwp5Wzs7Z1JpE167O1tHxN52pWDX38twU72qX1OD6z3bOA/uwdlKzLiwUsYNd7FMXj3CKczRgk6PAI57wbFwaE+P+8yYYhfxKbOPbMB4+AAD7k+E=</latexit><latexit sha1_base64="SrrMRHNIXhUQvhI21vskZjTpnfM=">AAAC0HicjVHLTsJAFD3UF+ILdemmkZi4MKS4UXdENy4xWjEBJG0ZYEJpm+nUBBti3PoFbvWnjH+gf+GdsSQqMTpN2zPn3nNm7r1u5PNYWtZrzpiZnZtfyC8WlpZXVteK6xuXcZgIj9le6IfiynVi5vOA2ZJLn11FgjlD12d1d3Ci4vUbJmIeBhdyFLHW0OkFvMs9RxJ13Yz6vJ3ettPOXjAet4slq2zpZU6DSgZKyFYtLL6giQ5CeEgwBEMASdiHg5ieBiqwEBHXQkqcIMR1nGGMAmkTymKU4RA7oG+Pdo2MDWivPGOt9ugUn15BShM7pAkpTxBWp5k6nmhnxf7mnWpPdbcR/d3Ma0isRJ/Yv3STzP/qVC0SXRzqGjjVFGlGVedlLonuirq5+aUqSQ4RcQp3KC4Ie1o56bOpNbGuXfXW0fE3nalYtfey3ATv6pY04MrPcU4De798VLbOrFL1OJt0HlvYxi6N8wBVnKIGm6wFHvGEZ+PcGBl3xv1nqpHLNJv4toyHD3LHlRk=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit><latexit sha1_base64="BB1mi7S9ERtKeYqiMSyNiijxaCc=">AAAC0HicjVHLSsNAFD2Nr1pfVZdugkVwISUVQd0V3bisaG2hrSVJp+3QvJhMhFqKuPUL3OpPiX+gf+GdMQW1iE5Icubce87MvdeJPB5Ly3rNGDOzc/ML2cXc0vLK6lp+feMqDhPhsqobeqGoO3bMPB6wquTSY/VIMNt3PFZzBqcqXrthIuZhcCmHEWv5di/gXe7akqjrZtTn7dFte9TZC8bjdr5gFS29zGlQSkEB6aqE+Rc00UEIFwl8MASQhD3YiOlpoAQLEXEtjIgThLiOM4yRI21CWYwybGIH9O3RrpGyAe2VZ6zVLp3i0StIaWKHNCHlCcLqNFPHE+2s2N+8R9pT3W1Ifyf18omV6BP7l26S+V+dqkWiiyNdA6eaIs2o6tzUJdFdUTc3v1QlySEiTuEOxQVhVysnfTa1Jta1q97aOv6mMxWr9m6am+Bd3ZIGXPo5zmlQ3S8eF63zg0L5JJ10FlvYxi6N8xBlnKGCKlkLPOIJz8aFMTTujPvPVCOTajbxbRkPH3QHlR0=</latexit>

✓d<latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="R5wJmE9xE5SdTz0wZ8is0Q1VbN8=">AAACtHicjVLLSgMxFD0dX7VWrWs3g0VwVaZu1J2gC5cVHFuoRTJpWmMzDzIZoRR/wK0fJ/6B/oU3cQpqEc0wMyfn3nOSm5soUzI3QfBa8ZaWV1bXquu1jXptc2u7Ub/O00JzEfJUpboXsVwomYjQSKNEL9OCxZES3WhyZuPdB6FzmSZXZpqJQczGiRxJzgxRndtGM2gFbviLoF2CJsqRNl5wgyFScBSIIZDAEFZgyOnpo40AGXEDzIjThKSLCzyiRtqCsgRlMGIn9B3TrF+yCc2tZ+7UnFZR9GpS+tgnTUp5mrBdzXfxwjlb9jfvmfO0e5vSPyq9YmIN7oj9SzfP/K/O1mIwwrGrQVJNmWNsdbx0Kdyp2J37X6oy5JARZ/GQ4powd8r5OftOk7va7dkyF39zmZa1c17mFni3u6T+tn92cxGEh62TVnAZoIpd7OGAuniEU1ygg5Ach3jCs3fu3XvZ5zXwKuV92MG34ekPT4SMsA==</latexit><latexit sha1_base64="PDg1iVskRMlKb1vXnPvuo3yKWL8=">AAACv3icjVLLSgMxFD0dX7VWrW7dDBbBVZm6UXeCGzdCBccW2lIy07QNnRdJRijFn3Cr/yX+gf6FN3EKahHNMDMn595zkpubIIuE0p73WnJWVtfWN8qbla3q9s5uba96p9JchtwP0yiVnYApHomE+1roiHcyyVkcRLwdTC9NvH3PpRJpcqtnGe/HbJyIkQiZJqrT0xOu2WA4qNW9hmeHuwyaBaijGK209oIehkgRIkcMjgSacAQGRU8XTXjIiOtjTpwkJGyc4wEV0uaUxSmDETul75hm3YJNaG48lVWHtEpErySliyPSpJQnCZvVXBvPrbNhf/OeW0+ztxn9g8IrJlZjQuxfukXmf3WmFo0RzmwNgmrKLGOqCwuX3J6K2bn7pSpNDhlxBg8pLgmHVrk4Z9dqlK3dnC2z8TebaVgzD4vcHO9ml9Tg5s92LgP/pHHe8G48lHGAQxxTF09xgSu04NvuPeIJz861o5zZ501wSsWV2Me34cw/ABLSkR0=</latexit><latexit sha1_base64="PDg1iVskRMlKb1vXnPvuo3yKWL8=">AAACv3icjVLLSgMxFD0dX7VWrW7dDBbBVZm6UXeCGzdCBccW2lIy07QNnRdJRijFn3Cr/yX+gf6FN3EKahHNMDMn595zkpubIIuE0p73WnJWVtfWN8qbla3q9s5uba96p9JchtwP0yiVnYApHomE+1roiHcyyVkcRLwdTC9NvH3PpRJpcqtnGe/HbJyIkQiZJqrT0xOu2WA4qNW9hmeHuwyaBaijGK209oIehkgRIkcMjgSacAQGRU8XTXjIiOtjTpwkJGyc4wEV0uaUxSmDETul75hm3YJNaG48lVWHtEpErySliyPSpJQnCZvVXBvPrbNhf/OeW0+ztxn9g8IrJlZjQuxfukXmf3WmFo0RzmwNgmrKLGOqCwuX3J6K2bn7pSpNDhlxBg8pLgmHVrk4Z9dqlK3dnC2z8TebaVgzD4vcHO9ml9Tg5s92LgP/pHHe8G48lHGAQxxTF09xgSu04NvuPeIJz861o5zZ501wSsWV2Me34cw/ABLSkR0=</latexit><latexit sha1_base64="pmGTWxK+9yJL0G5/IG3QPG9ps5M=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVI36q7oxo1QwdiCLWWSTtuhaRIyE6EUd36BW/0v8Q/0L7wzTkEtohOSnDn3nDtz7w3SSEjlea8FZ2FxaXmluFpaW9/Y3Cpv79zIJM9C7odJlGStgEkeiZj7SqiIt9KMs3EQ8WYwOtfx5h3PpEjiazVJeWfMBrHoi5ApolptNeSKdXvdcsWrema586BmQQV2NZLyC9roIUGIHGNwxFCEIzBIem5Rg4eUuA6mxGWEhIlz3KNE3pxUnBSM2BF9B7S7tWxMe51TGndIp0T0ZuR0cUCehHQZYX2aa+K5yazZ33JPTU59twn9A5trTKzCkNi/fDPlf326FoU+TkwNgmpKDaOrC22W3HRF39z9UpWiDClxGvconhEOjXPWZ9d4pKld95aZ+JtRalbvQ6vN8a5vSQOu/RznPPCPqqdV78qr1M/spIvYwz4OaZzHqOMCDfhmjI94wrNz6Uhn4kw/pU7BenbxbTkPH2cEkk8=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit><latexit sha1_base64="FJTY+DBhtoXUv+lkt03HovnHItc=">AAACynicjVHLSsNAFD2Nr1pfVZdugkVwVVIR1F3RjRuhgrGFtpRJOm1D82JmIpTizi9wq/8l/oH+hXfGFNQiOiHJmXPPuTP3Xi8NA6kc57VgLSwuLa8UV0tr6xubW+XtnVuZZMLnrp+EiWh5TPIwiLmrAhXyVio4i7yQN73xhY4377iQQRLfqEnKuxEbxsEg8JkiqtVRI65Yr98rV5yqY5Y9D2o5qCBfjaT8gg76SOAjQwSOGIpwCAZJTxs1OEiJ62JKnCAUmDjHPUrkzUjFScGIHdN3SLt2zsa01zmlcft0SkivIKeNA/IkpBOE9Wm2iWcms2Z/yz01OfXdJvT38lwRsQojYv/yzZT/9elaFAY4NTUEVFNqGF2dn2fJTFf0ze0vVSnKkBKncZ/igrBvnLM+28YjTe26t8zE34xSs3rv59oM7/qWNODaz3HOA/eoelZ1ro8r9fN80kXsYR+HNM4T1HGJBlwzxkc84dm6sqQ1saafUquQe3bxbVkPH2hEklM=</latexit>

Page 22: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Estimating the model

‣ Approximate posterior inference algorithms ‣ Mean field variational methods‣ Expectation propagation‣ Collapsed Gibbs sampling*‣ Collapsed variational inference‣ Online variational inference

22

Page 23: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Take away message

‣ The estimation of the parameters in the model is performed based

on the observed data and the parameters

‣ Clustering can be performed using LDA by choosing the most

representative topic and fixing the desired number of topics as the

number of clusters

23

Page 24: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

More about clustering‣ Many algorithms are available for clustering, however many of them are

not suitable for text

‣ Classical algorithms usually perform well, but state-of-the-art algorithms can perform much better if they are well parametrized (not free lunch)

‣ Clustering is a hard task and few algorithms scale well to huge text collections. If hierarchies are needed, bottom-up or top-down strategies can be applied (combined with the presented algorithms)

‣ There exist problems in which labels are needed!!! LDA is a good solution however other simpler algorithms can do the job (STC, Lingo, etc)

‣ As any other data mining task, the best clustering algorithm will be drawn by the problem and not by choosing the most popular one

24

Page 25: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

How to evaluate clustering?‣ Evaluation is a hard task.

‣ Best scenario: an annotated collection is available

‣ Pairs of documents are used to evaluated the partitions by asking if they belongs to the same partition in the annotated data and in the obtained partition (for any clustering algorithm)

‣ Extreme cases are hard to evaluate

‣ All documents belong to just one cluster

‣ All documents belong to individual clusters

25

Page 26: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Comparison of clustering evaluation metrics

26

Page 27: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Take away message‣ Evaluate if you have annotated data

‣ Metric selection is an important issue

‣ Try to understand the addressed problem to select a metric adapted to it

‣ Tools also implement evaluation metrics

‣ Clustering performance evaluation module in scikit-learn

‣ Clusteval in R

27

Page 28: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Landscape of Text Mining and Analytics

Real World Text DataObserved World

Word association mining& analysis

Opinion mining & Sentiment analysis

Topic mining and analysisText based prediction

Perceive Express

(english)

NLP & text representation

28

Page 29: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Task of Topic Mining and Analysis

Text Data

Class 1

Class 2

Class k

...

Doc 1 Doc 2 Doc N...

Supervised problem (classification) 29

Page 30: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Topic mining and Analysis‣ Group of documents (unsupervised)

‣ Clustering‣ Group of documents but into classes (supervised)

‣ Classification‣ Group of words - dimensionality reduction (We are going to see

them in the unsupervised part)

‣ NMF, LDA, PLSA, etc.

30

Page 31: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Text classification or categorization‣ Given the following

‣ A set of predefined categories, possibly forming a hierarchy and often

‣ A training set of labeled text objects

‣ Task: Classify a text object into one or more of the categories

Page 32: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Examples of text categorization‣ Text objects can vary (e.g., documents, passages, or collections of text) ‣ Categories can also vary

‣ Internal categories that characterize a text object (e.g., topical categories, sentiment categories)

‣ External categories that characterize an entity associated with the object (e.g., author attribution or any other meaningful categories associated with text data)

‣ Some examples of applications ‣ News categorization, literature article categorization (e.g., MeSH

annotations) ‣ Spam email detection/filtering ‣ Sentiment categorization of product reviews or tweets ‣ Automatic email sorting/routing ‣ Author attribution

Page 33: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Variants of problem formulation‣ Binary categorization: only two categories

‣ Retrieval (relevant or not)

‣ Span filtering (span or not)

‣ Opinion (negative or positive)

‣ K-category categorization: more than two categories

‣ Topic categorization (sports, science, travel, business, etc.)

‣ Email routing (folder 1, folder 2, etc.)

‣ Hierarchical categorization: categories for a hierarchy

‣ Joint categorization: multiple related categorization tasks done in a joint manner

Page 34: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Why text categorization?‣ To enrich text representation (more understanding of text)

‣ Te x t c a n n o w b e r e p r e s e n t e d i n m u l t i p l e l e v e l s (keywords+categories)

‣ Semantic categories assigned ca be directly or indirectly useful for an application

‣ Semantic categories facilitate aggregation of text content (e.g., aggregating all positive/negative opinions about a product)

‣ To infer properties of entities associated with text data (discovery of knowledge about the world) ‣ As long as an entity can be associated with text data, we can always

use the text data to help to categorize the associated entities ‣ E.g., discovery of non-native speakers of a language; prediction of

party affiliation based on a political speech

Page 35: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Categorization methods: manual‣ Determine the category based on rules that are carefully designed to

reflect the domain knowledge about the categorization problem ‣ Works well when

‣ The categories are very well defined ‣ Categories are easily distinguished based on surface features in text

(e.g., special vocabulary is known to only occur in a particular category)

‣ Sufficient domain knowledge is available to suggest many effective rules

‣ Problems ‣ Labor intensive -> doesn't scale up well ‣ Can't handle uncertainty in rules; rules may be inconsistent -> not

robust ‣ Both problems can be solved/alleviated by using machine learning

Page 36: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Categorization methods: automatic‣ Use human experts to

‣ Annotate data sets with category labels -> training data ‣ Provide a set of features to represent each text object that can

potentially provide a "clue" about the category ‣ Use machine learning to learn "soft rules" for separating different

categories ‣ Figure out which features are most useful for separating different

categories ‣ Optimally combine the features to minimize the errors of

categorization on the training data ‣ The trained classifier can then be applied to a new text object to

predict the most likely category (that a human expert would assign to it)

Page 37: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Machine learning for text categorization

‣ General setup: learn a classifier f: X->Y ‣ Input: X= all objects; Output: Y all categories ‣ Learn a classifier function, f: X->Y, such that f(x)=y gives correct

category for x (correct is based on the training data) ‣ All methods

‣ Rely on discriminative features of text objects to distinguish categories

‣ Combine multiple features in a weighted manner ‣ Adjust weights on features to minimize errors on the training data

‣ Different methods tend to vary in ‣ Their way of measuring the errors on the training data (may optimize

a different objective/loss/cost function) ‣ Their way to combining features (e.g. linear vs non-linear)

Page 38: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Generative vs Discriminative Classifiers

‣ Generative classifiers (learn what the data "looks" like in each category) ‣ Attempt to model p(X,Y)=p(Y)P(X|Y) and compute p(Y|X) based on

p(X|Y) and p(Y) by using the Bayes rule ‣ Objective function is likelihood, thus indirectly measuring training

error ‣ E.g., Naive Bayes

‣ Discriminative classifiers (learn what features separate categories) ‣ Attempt to model p(Y|X) directly ‣ Objective function directly measures errors of categorization on

training data ‣ E.g., logistic regression, Support vector machines (SVM), k-Nearest

Neighbors (kNN)

Page 39: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

39

Text Categorization with Naïve Bayes

‣ Consider each category independently as a class c (for the multiple class setting) ‣ Example d – document ‣ Feature w – word or term

‣ Classify c if score(c)>θ • Typically a specifically tuned threshold for each class, due to

inaccuracy of the probabilistic estimate of P(d|c) with given training statistics and independence assumption,

• .. but a biased probability estimate for c may still correlate well with the classification decision

Page 40: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

40

Nearest-Neighbor Learning Algorithm‣ Learning is just storing the representations of the training examples in

data set D ‣ Testing instance x:

‣ Compute similarity between x and all examples in D ‣ Assign x the category of the most similar examples in D

‣ Does not explicitly compute a generalization or category prototypes (i.e., no “modeling”)

‣ Also called: ‣ Case-based ‣ Memory-based ‣ Lazy learning

Page 41: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

41

K Nearest Neighbor for Text‣ Training:

‣ For each each training example <x, c(x)> Є D • Compute the corresponding TF-IDF vector, dx, for document x

‣ Test instance y: ‣ Compute TF-IDF vector d for document y ‣ For each <x, c(x)> Є D

• Let sx = cos(d, dx)

‣ Sort examples, x, in D by decreasing value of sx ‣ Let N be the first k examples in D. (get most similar neighbors) ‣ Return the majority class of examples in N

Simple but powerful in very large collections!

Page 42: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

42

KNN discussion‣ No feature selection necessary

‣ No training necessary

‣ Scales well with large number of classes/documents

‣ Don’t need to train n classifiers for n classes

‣ Classes can influence each other

‣ Small changes to one class can have ripple effect

‣ Done naively, very expensive at test time

Page 43: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

43

Evaluation‣ Recall: Fraction of docs in class i classified correctly ‣ Precision: Fraction of docs assigned class i that are actually about

class i ‣ Accuracy: (1 - error rate) Fraction of docs classified correctly ‣ Other metrics, but again the problem define the appropriate

metrics

Page 44: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

44

Cross validation example

‣ Split the data into 5 samples ‣ Fit a model to the training

samples and use the test sample to calculate a CV metric.

‣ Repeat the process for the next sample, until all samples have been used to either train or test the model

Page 45: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

45

http://scikit-learn.org/stable/auto_examples/text/document_clustering.html

http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html

Examples within the landscape of Text Mining and Analytics

Page 46: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Landscape of Text Mining and Analytics

Real World Text DataObserved World

Word association mining& analysis

Opinion mining & Sentiment analysis

Topic mining and analysisText based prediction

Perceive Express

(english)

NLP & text representation

46

Page 47: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

47

import nltk.collocations import collections

f = open("full.txt") documents = " ".join([line for line in f.readlines()]) f.close()

bgm = nltk.collocations.BigramAssocMeasures() finder = nltk.collocations.BigramCollocationFinder.from_words(documents.split()) ignored_words = nltk.corpus.stopwords.words('english') finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in ignored_words)

finder.nbest(bgm.raw_freq, 10) finder.nbest(bgm.pmi, 10) finder.nbest(bgm.likelihood_ratio, 10) finder.nbest(bgm.chi_sq, 10) finder.nbest(bgm.dice, 10) finder.nbest(bgm.fisher, 10) finder.nbest(bgm.jaccard, 10) finder.nbest(bgm.mi_like, 10) finder.nbest(bgm.poisson_stirling, 10) finder.nbest(bgm.student_t, 10)

scored = finder.score_ngrams(bgm.pmi)

Page 48: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Landscape of Text Mining and Analytics

Real World Text DataObserved World

Word association mining& analysis

Opinion mining & Sentiment analysis

Topic mining and analysisText based prediction

Perceive Express

(english)

NLP & text representation

48

Page 49: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

49

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer from sklearn.decomposition import NMF, LatentDirichletAllocation from sklearn.cluster import KMeans

f = open("full.txt.head") documents = [line for line in f.readlines()] f.close()

tf_v = CountVectorizer(max_df=0.95, min_df=2, max_features=100000, stop_words='english') tfidf_v = TfidfVectorizer(max_df=0.95, min_df=2, max_features=100000, stop_words='english')

tf = tf_v.fit_transform(documents) tfidf = tfidf_v.fit_transform(documents)

nmf = NMF(n_components=10, random_state=1,alpha=.1, l1_ratio=.5).fit(tfidf) km = KMeans(n_clusters=10, init='k-means++', max_iter=100, n_init=1).fit(tfidf) lda = LatentDirichletAllocation(n_topics=10, max_iter=5, learning_method='online', learning_offset=50., random_state=0).fit(tf)

Page 50: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

Landscape of Text Mining and Analytics

Real World Text DataObserved World

Word association mining& analysis

Opinion mining & Sentiment analysis

Topic mining and analysisText based prediction

Perceive Express

(english)

NLP & text representation

50

Page 51: Web Content Mining - IRITYoann.Pitarch/Docs/M2Stats/2019/m2... · 2019-01-11 · Topic Mining and Analysis: Motivation ‣ Topic~ main idea discussed in text data ‣ Theme/subject

51

from sklearn.svm import SVC from sklearn.feature_extraction.text import TfidfVectorizer import random

from nltk.corpus import movie_reviews docs = [(list(movie_reviews.words(fileid)), category) for category in movie_reviews.categories() for fileid in movie_reviews.fileids(category)] random.shuffle(docs) X,y= [" ".join(w[0]) for w in docs],[1 if w[1] =='pos' else 0 for w in docs]

tfidf_v = TfidfVectorizer(max_df=0.95, min_df=2, max_features=100000, stop_words='english')

tfidf = tfidf_v.fit_transform(X)

clf = SVC() clf.fit(tfidf, y)

tfidf_test = tfidf_v.transform("One of my all time favorites. Shawshank Redemption is a very moving story about hope and the power of friendship. The cast is first rate with everyone giving a great performance. Tim Robbins and Morgan Freeman carry the movie, but Bob Gunton and Clancy Brown are perfect as the Warden Norton and prison guard captain Hadley respectively. And James Whitmore's portrail of an elderly inmate Brooks is moving. The screenplay gives almost every actor at least one or more memorable lines through out the film. As well as a very surprising twist near the end that almost knocked me out of my chair. If you have not seen this movie rent it or better yet buy it. As I bet you'll want to see this one more than once.".split())

print(clf.predict(tfidf_test))