
Page 1: BDACA1516s2 - Lecture7

Co-occurrences Networks Other co-occurrence based methods Next meetings

Big Data and Automated Content Analysis
Week 7 – Monday

»Co-occurring words«

Damian Trilling

[email protected] | @damian0604

www.damiantrilling.net

Afdeling Communicatiewetenschap
Universiteit van Amsterdam

9 May 2016

Big Data and Automated Content Analysis Damian Trilling

Page 2: BDACA1516s2 - Lecture7


Today

1 Integrating word counts and network analysis: word co-occurrences
  The idea
  A real-life example

2 Other co-occurrence based methods
  PCA
  LDA

3 Next meetings & final project

Page 3: BDACA1516s2 - Lecture7

Integrating word counts and network analysis:Word co-occurrences

Page 4: BDACA1516s2 - Lecture7

The idea

Simple word count

We already know this.

from collections import Counter
tekst = "this is a test where many test words occur several times this is because it is a test yes indeed it is"
c = Counter(tekst.split())
print("The top 5 are: ")
for woord, aantal in c.most_common(5):
    print(aantal, woord)

Page 5: BDACA1516s2 - Lecture7

Simple word count

The output:

The top 5 are: 
4 is
3 test
2 a
2 this
2 it

Page 6: BDACA1516s2 - Lecture7

What if we could. . .

. . . count the frequency of combinations of words?

As in: Which words typically occur together in the same tweet (or paragraph, or sentence, . . . )?

Page 8: BDACA1516s2 - Lecture7

We can — with the combinations() function

>>> from itertools import combinations
>>> words = "Hoi this is a test test test a test it is".split()
>>> print([e for e in combinations(words, 2)])
[('Hoi', 'this'), ('Hoi', 'is'), ('Hoi', 'a'), ('Hoi', 'test'), ('Hoi', 'test'), ('Hoi', 'test'), ('Hoi', 'a'), ('Hoi', 'test'), ('Hoi', 'it'), ('Hoi', 'is'), ('this', 'is'), ('this', 'a'), ('this', 'test'), ('this', 'test'), ('this', 'test'), ('this', 'a'), ('this', 'test'), ('this', 'it'), ('this', 'is'), ('is', 'a'), ('is', 'test'), ('is', 'test'), ('is', 'test'), ('is', 'a'), ('is', 'test'), ('is', 'it'), ('is', 'is'), ('a', 'test'), ('a', 'test'), ('a', 'test'), ('a', 'a'), ('a', 'test'), ('a', 'it'), ('a', 'is'), ('test', 'test'), ('test', 'test'), ('test', 'a'), ('test', 'test'), ('test', 'it'), ('test', 'is'), ('test', 'test'), ('test', 'a'), ('test', 'test'), ('test', 'it'), ('test', 'is'), ('test', 'a'), ('test', 'test'), ('test', 'it'), ('test', 'is'), ('a', 'test'), ('a', 'it'), ('a', 'is'), ('test', 'it'), ('test', 'is'), ('it', 'is')]

Page 9: BDACA1516s2 - Lecture7

Count co-occurrences

from collections import defaultdict
from itertools import combinations

tweets = ["i am having coffee with my friend", "i like coffee",
          "i like coffee and beer", "beer i like"]
cooc = defaultdict(int)

for tweet in tweets:
    words = tweet.split()
    for a, b in set(combinations(words, 2)):
        if (b, a) in cooc:
            a, b = b, a
        if a != b:
            cooc[(a, b)] += 1

for combi in sorted(cooc, key=cooc.get, reverse=True):
    print(cooc[combi], "\t", combi)

Page 10: BDACA1516s2 - Lecture7

Count co-occurrences

The output:

3    ('i', 'coffee')
3    ('i', 'like')
2    ('i', 'beer')
2    ('like', 'beer')
2    ('like', 'coffee')
1    ('coffee', 'beer')
1    ('and', 'beer')
...
...
...
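The same counting logic can also be written more compactly with collections.Counter. This is a sketch, not the slides' own code: instead of checking whether the reversed pair is already in the dict, each pair is normalized by sorting it, so ('like', 'i') and ('i', 'like') count as one.

```python
from collections import Counter
from itertools import combinations

tweets = ["i am having coffee with my friend", "i like coffee",
          "i like coffee and beer", "beer i like"]

cooc = Counter()
for tweet in tweets:
    words = tweet.split()
    # sort each pair so word order does not matter;
    # the set() deduplicates pairs within a single tweet
    pairs = {tuple(sorted(p)) for p in combinations(words, 2) if p[0] != p[1]}
    cooc.update(pairs)

for pair, n in cooc.most_common(5):
    print(n, "\t", pair)
```

The counts are the same as on the previous slide; only the order inside each tuple may differ, because pairs are stored alphabetically here.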

Page 11: BDACA1516s2 - Lecture7

From a list of co-occurrences to a network

Let's conceptualize each word as a node and each co-occurrence as an edge:

• node weight = word frequency
• edge weight = number of co-occurrences

A GDF file offers all of this and looks like this:

Page 12: BDACA1516s2 - Lecture7

nodedef>name VARCHAR, width DOUBLE
coffee,3
beer,2
i,4
and,1
with,1
friend,1
having,1
like,3
am,1
my,1
edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE
coffee,beer,1
i,beer,2
and,beer,1
with,friend,1
coffee,with,1
i,and,1
having,friend,1
like,beer,2
am,friend,1
i,am,1
i,coffee,3
i,with,1
am,having,1
i,having,1
coffee,and,1
like,coffee,2
am,coffee,1
with,my,1
i,friend,1
like,and,1
am,with,1
having,with,1
i,my,1
having,coffee,1
i,like,3
coffee,friend,1
having,my,1
am,my,1
coffee,my,1
my,friend,1

Page 13: BDACA1516s2 - Lecture7


How to represent the co-occurrences graphically?

A two-step approach

1 Save as a GDF file (the format seems easy to understand, so we could write a function for this in Python)

2 Open the GDF file in Gephi for visualization and/or network analysis
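Step 1 could be implemented with a small helper function. The sketch below (a hypothetical helper, not the course's own code; the function name `write_gdf` is made up) takes a dict of word frequencies and a dict of co-occurrence counts, as built on the earlier slides, and writes them in the GDF layout shown above:

```python
def write_gdf(filename, wordcounts, cooc):
    """Write word frequencies as nodes and co-occurrence counts
    as weighted edges in GDF format."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write("nodedef>name VARCHAR, width DOUBLE\n")
        for word, n in wordcounts.items():
            f.write("{},{}\n".format(word, n))
        f.write("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE\n")
        for (a, b), n in cooc.items():
            f.write("{},{},{}\n".format(a, b, n))

# tiny made-up example
write_gdf("mygraph.gdf", {"coffee": 3, "beer": 2}, {("coffee", "beer"): 1})
```

The resulting file can then be opened in Gephi (step 2).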

Page 14: BDACA1516s2 - Lecture7

Gephi

• Install (NOT in the VM) from https://gephi.org
• If you run into problems on MacOS, see what I wrote about Gephi here: http://www.damiantrilling.net/setting-up-my-new-macbook/
• I made a screencast on how to visualize the GDF file in Gephi: https://streamingmedia.uva.nl/asset/detail/t2KWKVZtQWZIe2Cj8qXcW5KF

• Further: see the materials I mailed to you

Page 15: BDACA1516s2 - Lecture7

A real-life example

Trilling, D. (2015). Two different debates? Investigating the relationship between a political debate on TV and simultaneous comments on Twitter. Social Science Computer Review, 33, 259–276. doi: 10.1177/0894439314537886

Page 16: BDACA1516s2 - Lecture7

Commenting on the TV debate on Twitter

The viewers

• Commenting on television programs on social networks has become a regular pattern of behavior (Courtois & d'Heer, 2012)
• User comments have been shown to reflect the structure of the debate (Shamma, Churchill, & Kennedy, 2010; Shamma, Kennedy, & Churchill, 2009)
• Topic and speaker effects are more influential than, e.g., rhetorical skills (Nagel, Maurer, & Reinemann, 2012; De Mooy & Maier, 2014)

Page 17: BDACA1516s2 - Lecture7

Research Questions

To what extent are the statements politicians make during a TV debate reflected in online live discussions of the debate?

RQ1 Which topics are emphasized by the candidates?
RQ2 Which topics are emphasized by the Twitter users?
RQ3 With which topics are the two candidates associated on Twitter?

Page 18: BDACA1516s2 - Lecture7

Method

The data

• debate transcript
• tweets containing #tvduell
• N = 120,557 tweets by N = 24,796 users
• 22-9-2013, 20.30–22.00

The analysis

• Series of self-written Python scripts:
  1 preprocessing (stemming, stopword removal)
  2 word counts
  3 word log likelihood (corpus comparison)
• Stata: regression analysis


Page 21: BDACA1516s2 - Lecture7

[Figure: number of tweets over time, from 60 minutes before the debate to 150 minutes after, with the debate's start and end marked]

Page 22: BDACA1516s2 - Lecture7


Relationship between words on TV and on Twitter

[Scatterplot: ln(word frequency on Twitter + 1) plotted against ln(word frequency on TV + 1)]

Page 23: BDACA1516s2 - Lecture7

Word frequency TV ⇒ word frequency Twitter

                   Model 1              Model 2               Model 3
                   ln(Twitter +1)       ln(Twitter +1),       ln(Twitter +1),
                                        together w/ M.        together w/ S.
                   b (SE)       beta    b (SE)       beta     b (SE)       beta
ln (TV M. +1)      1.59 (.052)*** .21   1.54 (.041)*** .26    .77 (.037)***  .14
ln (TV S. +1)      1.29 (.051)*** .17   .88 (.041)***  .15    1.25 (.037)*** .24
intercept          1.64 (.008)***       .87 (.007)***         .60 (.006)***
R2                 .100                 .115                  .100
b M. & S. differ?  F(1, 21408) = 12.29  F(1, 21408) = 96.69   F(1, 21408) = 63.38
                   p < .001             p < .001              p < .001

M = Merkel; S = Steinbrück

Page 24: BDACA1516s2 - Lecture7

Most distinctive words on TV

LL     word                    Frequency Merkel  Frequency Steinbrück
27,73  merkel                  0                 20
19,41  arbeitsplatz [job]      14                0
15,25  steinbruck              11                0
9,70   koalition [coalition]   7                 0
9,70   international           7                 0
9,70   gemeinsam [together]    7                 0
8,55   griechenland [Greece]   10                1
8,32   investi [investment]    6                 0
6,93   uberzeug [belief]       5                 0
6,93   okonom [economic]       0                 5

Page 25: BDACA1516s2 - Lecture7

Most distinctive words on Twitter

LL        word                       Frequency Merkel  Frequency Steinbrück
32443,39  merkel                     29672             0
30751,65  steinbrueck                0                 17780
1507,08   kett [necklace]            1628              34
1241,14   vertrau [trust]            1240              12
863,84    fdp [a coalition partner]  985               29
775,93    nsa                        1809              298
626,49    wikipedia                  40                502
574,65    twittert [tweets]          40                469
544,87    koalition [coalition]      864               77
517,99    gold                       669               34

Page 26: BDACA1516s2 - Lecture7

Putting the pieces together

Merkel

• necklace
• trust (sarcastic)
• NSA affair
• coalition partners

Steinbrück

• suggestion to look sth. up on Wikipedia
• tweets from his account during the debate


Page 27: BDACA1516s2 - Lecture7
Page 28: BDACA1516s2 - Lecture7

Other (non-network-based, statistical) co-occurrence-based methods

Page 29: BDACA1516s2 - Lecture7

Enter unsupervised machine learning

(something you already did in your Bachelor – no kidding.)


Page 31: BDACA1516s2 - Lecture7


Some terminology

Supervised machine learning
You have a dataset with both predictor and outcome (independent and dependent variables): a labeled dataset.

Think of regression: You measured x1, x2, x3 and you want to predict y, which you also measured.

Unsupervised machine learning
You have no labels. (You did not measure y.)

Again, you already know some techniques to find out how x1, x2, . . . x_i co-occur from other courses:

• Principal Component Analysis (PCA)
• Cluster analysis
• . . .


Page 36: BDACA1516s2 - Lecture7


PCA

Principal Component Analysis? How does that fit in here?

In fact, PCA is used everywhere, even in image compression.

PCA in ACA

• Find out which words co-occur (inductive frame analysis)
• Basically, transform each document into a vector of word frequencies and do a PCA
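The second bullet can be sketched in a few lines. This is a toy illustration, not the analysis from the slides: it assumes numpy is available, uses a small made-up term-document matrix, and computes the PCA via a singular value decomposition of the column-centered matrix.

```python
import numpy as np

# Toy term-document matrix: one row per document, one column per word
# (made-up counts, not data from the lecture)
tdm = np.array([[2., 0., 0., 1., 2., 3.],
                [0., 0., 1., 2., 3., 4.],
                [9., 0., 1., 1., 0., 0.],
                [1., 2., 0., 0., 1., 1.]])

# PCA = SVD of the column-centered matrix
centered = tdm - tdm.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

scores = centered @ Vt.T           # documents in component space
explained = s**2 / (s**2).sum()    # share of variance per component
print("explained variance per component:", explained.round(3))
```

The rows of Vt are the component loadings per word; inspecting the highest-loading words per component is what makes this usable as an inductive frame analysis.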


Page 39: BDACA1516s2 - Lecture7


PCA

A so-called term-document matrix

       w1, w2, w3, w4, w5, w6 ...
text1,  2,  0,  0,  1,  2,  3 ...
text2,  0,  0,  1,  2,  3,  4 ...
text3,  9,  0,  1,  1,  0,  0 ...
...

These can be simple counts, but also more advanced metrics, like tf-idf scores (where you weigh the frequency by the number of documents in which it occurs), cosine distances, etc.
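Such a matrix of simple counts can be built from raw texts with standard-library tools only. A minimal sketch (toy data; a real analysis would also stem and remove stopwords first):

```python
from collections import Counter

texts = ["i like coffee", "i like coffee and beer", "beer i like"]

# the vocabulary becomes the columns of the matrix
vocab = sorted({word for t in texts for word in t.split()})

# one row of raw counts per document
tdm = [[Counter(t.split())[w] for w in vocab] for t in texts]

print(vocab)   # ['and', 'beer', 'coffee', 'i', 'like']
for row in tdm:
    print(row)
```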


Page 41: BDACA1516s2 - Lecture7


PCA

PCA: implications and problems

• given a term-document matrix, easy to do with any tool
• probably extremely skewed distributions
• some problematic assumptions: does the goal of PCA, to find a solution in which each word loads on one component, match real life, where a word can belong to several topics or frames?

Page 42: BDACA1516s2 - Lecture7

LDA

Enter topic modeling with Latent Dirichlet Allocation (LDA)

Page 43: BDACA1516s2 - Lecture7

LDA, what's that?

No mathematical details here, but the general idea:

• There are k topics, T1 . . . Tk
• Each document Di consists of a mixture of these topics, e.g. 80% T1, 15% T2, 0% T3, . . . 5% Tk
• On the next level, each topic consists of a specific probability distribution of words
• Thus, based on the frequencies of words in Di, one can infer its distribution of topics
• Note that LDA (like PCA) is a Bag-of-Words (BOW) approach
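The generative story behind these bullets can be sketched in plain Python. This is a toy illustration of the model LDA inverts, with made-up topics and probabilities (none of this comes from the lecture data): for each word position, draw a topic from the document's topic mixture, then draw a word from that topic's word distribution.

```python
import random

random.seed(42)

# Toy topic-word distributions (made-up numbers for illustration)
topics = {
    "economy": (["tax", "deficit", "budget", "euro"], [0.4, 0.3, 0.2, 0.1]),
    "sports":  (["cup", "goal", "match", "team"],     [0.4, 0.3, 0.2, 0.1]),
}

def generate_document(mixture, length=10):
    """Draw a topic per word position from the document's topic mixture,
    then draw a word from that topic's word distribution."""
    words = []
    for _ in range(length):
        topic = random.choices(list(mixture), weights=list(mixture.values()))[0]
        vocab, probs = topics[topic]
        words.append(random.choices(vocab, weights=probs)[0])
    return words

doc = generate_document({"economy": 0.8, "sports": 0.2})
print(doc)
```

Estimating an LDA model means going the other way: given only the generated documents, recover the topic-word distributions and each document's mixture.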

Page 44: BDACA1516s2 - Lecture7

Doing an LDA in Python

You can use gensim (Řehůřek & Sojka, 2010) for this.

sudo pip3 install gensim

Furthermore, let us assume you have a list of lists of words (!) called texts:

articles = ['The tax deficit is higher than expected. This said xxx ...', 'Germany won the World Cup. After a']
texts = [art.split() for art in articles]

which looks like this:

[['The', 'tax', 'deficit', 'is', 'higher', 'than', 'expected.', 'This', 'said', 'xxx', '...'], ['Germany', 'won', 'the', 'World', 'Cup.', 'After', 'a']]

Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. Valletta, Malta: ELRA.

Page 45: BDACA1516s2 - Lecture7

from gensim import corpora, models

NTOPICS = 100
LDAOUTPUTFILE = "topicscores.tsv"

# Create a BOW representation of the texts
id2word = corpora.Dictionary(texts)
mm = [id2word.doc2bow(text) for text in texts]

# Train the LDA model
lda = models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=NTOPICS, alpha="auto")

# Print the topics
for top in lda.print_topics(num_topics=NTOPICS, num_words=5):
    print("\n", top)

print("\nFor further analysis, a dataset with the topic score for each document is saved to", LDAOUTPUTFILE)

scoresperdoc = lda.inference(mm)

with open(LDAOUTPUTFILE, "w", encoding="utf-8") as fo:
    for row in scoresperdoc[0]:
        fo.write("\t".join(["{:0.3f}".format(score) for score in row]))
        fo.write("\n")

Page 46: BDACA1516s2 - Lecture7


Output: Topics (below) & topic scores (next slide)

1  0.069*fusie + 0.058*brussel + 0.045*europesecommissie + 0.036*europese + 0.023*overname
2  0.109*bank + 0.066*britse + 0.041*regering + 0.035*financien + 0.033*minister
3  0.114*nederlandse + 0.106*nederland + 0.070*bedrijven + 0.042*rusland + 0.038*russische
4  0.093*nederlandsespoorwegen + 0.074*den + 0.036*jaar + 0.029*onderzoek + 0.027*raad
5  0.099*banen + 0.045*jaar + 0.045*productie + 0.036*ton + 0.029*aantal
6  0.041*grote + 0.038*bedrijven + 0.027*ondernemers + 0.023*goed + 0.015*jaar
7  0.108*werknemers + 0.037*jongeren + 0.035*werkgevers + 0.029*jaar + 0.025*werk
8  0.171*bank + 0.122* + 0.041*klanten + 0.035*verzekeraar + 0.028*euro
9  0.162*banken + 0.055*bank + 0.039*centrale + 0.027*leningen + 0.024*financiele
10 0.052*post + 0.042*media + 0.038*nieuwe + 0.034*netwerk + 0.025*personeel
11 ...

Page 47: BDACA1516s2 - Lecture7
Page 48: BDACA1516s2 - Lecture7

Next meetings

Page 49: BDACA1516s2 - Lecture7

Wednesday, 11–5: Lab session
Conduct an analysis based on word co-occurrences (Chapter 8 and/or 9.2). Install Gephi in advance!

No meeting on Monday (Pentecost)

Wednesday, 18–5: Supervised machine learning
