Big data and human behavior

Pantelis P. Analytis

Manifestos

Click prediction

Information latent in the environment

New techniques

Data from the field

Personalization and targeting

The new laboratory

Big data and forecasting

Big data and human behavior

Pantelis P. Analytis

April 19, 2018

1 / 46

1 Manifestos

2 Click prediction

3 Information latent in the environment

4 New techniques

5 Data from the field

6 Personalization and targeting

7 The new laboratory

8 Big data and forecasting

2 / 46

Potential uses of big data and naturally occurring datasets

External validation of laboratory experiments.

Demonstrate phenomena that motivate follow-on laboratory research.

Discover patterns of information latent in environments.

Create stimuli for experiments.

Construct and test computational models of cognition.

3 / 46

The four V’s of big data

Volume: How much data are included?

Variety: How many different kinds or sources of data are included?

Velocity: How quickly are the data able to be gathered, processed, and analyzed?

Veracity: How faithfully do the data capture what they are believed to capture?

4 / 46

The first click model

D. H. KRAFT and T. LEE

Table 2. Expected search length formulas

Rule (Condition)                        E(X)        E(Y)        E(Z)
Satiation Rule (r > R)                  R           I           N
Disgust Rule (i > I)                    R           I           N
Combination Rule (r > R and i > I)      R           I           N
Satiation Rule (r <= R)                 r           rI/(R+1)    r(N+1)/(R+1)
Combination Rule (r <= R and i > I)     r           rI/(R+1)    r(N+1)/(R+1)
Disgust Rule (i <= I)                   iR/(I+1)    i           i(N+1)/(I+1)
Combination Rule (r > R and i <= I)     iR/(I+1)    i           i(N+1)/(I+1)
Combination Rule (r <= R and i <= I)    [partial sums over k; not recoverable from the extracted text]

model of the satiation stopping rule. This rule allows us to expand our horizons by avoiding the assumption that the user does not consider the number of irrelevant documents encountered, or the disgust level, when determining when to terminate the scan. It also results in a new version of expected scan length, based on an alternative stopping rule. The disgust rule makes sense for user queries for exhaustive searches. An example would be a search for a list of all documents about Life in the antebellum South. However, the disgust rule has its drawbacks in that it ignores any effects on search length of the satiation level.

The probability distributions for X, Y and Z and the expected values of X, Y and Z are modeled for the disgust rule and presented above in Tables 1 and 2, respectively. One can readily see that the disgust rule is a dual to the satiation rule.

THE COMBINATION RULE

Still another alternative is needed. We suggest a combination rule, which allows the user to be seen as stopping the scan if he/she is satiated by finding the desired number of relevant documents or disgusted by having to examine too many irrelevant documents, whichever comes first. This rule incorporates aspects of the previous two rules.

If either R or I is zero, the combination rule degenerates into either the disgust rule or the satiation rule, respectively. We shall assume that both R and I are positive throughout the paper.

The formulas for the probability distribution, and the expected values, are presented above in Tables 1 and 2 respectively. One can readily see that the combination rule is the most complex stopping rule, the expected search length formulas being the most sophisticated models. Unfortunately, the expectation formulas cannot be appreciably simplified, due in large part to the explicit evaluation of the partial sums.

AN APPROXIMATION

The negative binomial distribution can be employed as an approximation to the negative hypergeometric distribution formulas of Table 1. This assumes that N, the number of documents retrieved, is large, and that R and I are both positive. The negative binomial distribution applies exactly if each document is replaced after it is examined, and the search is random rather than linear [5]. This coincides with our intuition that for large N, drawing documents with or without replacement yields approximately the same results.

The approximations for the probability distributions of X, Y and Z and for the expected

Kraft, Donald H., and T. Lee. "Stopping rules and their effect on expected search length." Information Processing & Management 15.1 (1979): 47-58.
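The three stopping rules in the Kraft and Lee excerpt can be explored with a small simulation. This is only a sketch under stated assumptions, not the authors' derivation: R and I are taken to be the numbers of relevant and irrelevant documents in the retrieved set (N = R + I), r and i are the satiation and disgust thresholds, the ordering of documents is random, and the function names are made up for illustration.

```python
import random


def scan(docs, r=None, i=None):
    """Scan a list of booleans (True = relevant) and stop early.

    r: satiation threshold (stop after r relevant documents), or None.
    i: disgust threshold (stop after i irrelevant documents), or None.
    Returns (X, Y, Z): relevant seen, irrelevant seen, total examined.
    """
    x = y = 0
    for relevant in docs:
        if relevant:
            x += 1
        else:
            y += 1
        # Combination rule: stop on whichever threshold is reached first.
        if (r is not None and x >= r) or (i is not None and y >= i):
            break
    return x, y, x + y


def expected_scan(R, I, r=None, i=None, trials=20000, seed=0):
    """Monte Carlo estimate of (E[X], E[Y], E[Z]) over random orderings."""
    rng = random.Random(seed)
    docs = [True] * R + [False] * I
    totals = [0, 0, 0]
    for _ in range(trials):
        rng.shuffle(docs)
        for k, value in enumerate(scan(docs, r, i)):
            totals[k] += value
    return tuple(t / trials for t in totals)


satiation = expected_scan(R=5, I=15, r=2)          # satiation rule only
disgust = expected_scan(R=5, I=15, i=6)            # disgust rule only
combination = expected_scan(R=5, I=15, r=2, i=6)   # whichever comes first
print(satiation, disgust, combination)
```

With R = 5, I = 15 and r = 2, the satiation estimate of E(Z) should land near the negative hypergeometric mean r(N+1)/(R+1) = 7, and the combination rule can never scan further than either single rule.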

5 / 46

Information foraging theory

Pirolli, Peter, and Stuart Card. "Information foraging." Psychological Review 106.4 (1999): 643.

6 / 46

The position based model
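The slide itself shows only a figure, so the following is a minimal sketch of the position-based model as it is usually stated in the click-model literature: the probability of a click at a rank is the probability that the rank is examined times the probability that the document shown there is attractive. The function name and the example numbers are illustrative assumptions.

```python
def pbm_click_probs(attractiveness, examination):
    """Click probability per rank under the position-based model.

    attractiveness[k]: probability that the document at rank k is
                       clicked once it has been examined.
    examination[k]:    probability that rank k is examined at all;
                       it depends only on the position.
    """
    return [a * g for a, g in zip(attractiveness, examination)]


# Two equally attractive documents: the lower rank is examined less
# often, so it collects fewer clicks.
clicks = pbm_click_probs([0.5, 0.5], [1.0, 0.5])
print(clicks)
```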

7 / 46

The cascade model
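As a hedged illustration (the slide shows only a figure), the cascade model is commonly described as follows: the user scans results from the top, clicks the first document found attractive, and then stops. A click at rank k therefore requires that every document above it was skipped. The function name and numbers are assumptions made for this sketch.

```python
def cascade_click_probs(attractiveness):
    """Click probability per rank under the cascade model.

    attractiveness[k] is the probability that the document at rank k
    is clicked given that the user reaches it.
    """
    probs = []
    reach = 1.0  # probability that the user gets as far as this rank
    for a in attractiveness:
        probs.append(reach * a)
        reach *= 1.0 - a  # the user continues only after skipping
    return probs


clicks = cascade_click_probs([0.5, 0.5, 0.5])
print(clicks)
```

Unlike the position-based model, the drop in clicks at lower ranks here comes from the documents above, not from the position itself.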

8 / 46

Measures of retrieval effectiveness

Cooper, William S. "On selecting a measure of retrieval effectiveness." Journal of the American Society for Information Science 24.2 (1973): 87-100.

Carbonell, J., Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR

Järvelin, Kalervo, and Jaana Kekäläinen. (2002) Cumulated gain-based evaluation of IR techniques. TOIS

Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P. (2009). Expected reciprocal rank for graded relevance. CIKM
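Two of the measures cited above can be sketched in a few lines. The discount log2(rank + 1) is the commonly used DCG variant and may differ in detail from the formulation in the cited paper; the ERR code assumes graded relevance mapped to a stopping probability (2^g - 1) / 2^g_max. Function names and example grades are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulated gain with the common log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))


def err(grades, max_grade):
    """Expected reciprocal rank for graded relevance (cascade-style user)."""
    p_continue = 1.0  # probability the user is still scanning
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade  # user satisfied at this rank
        total += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return total


print(dcg([3, 2, 0, 1]), err([3, 2, 0, 1], max_grade=3))
```

Both measures reward placing highly relevant documents early; ERR additionally discounts a document when the ones above it are already likely to have satisfied the user.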

9 / 46

Dynamic Bayesian model

10 / 46

Prediction rates of different models

11 / 46

Reflections on memory

Ebbinghaus was a pioneer of the experimental study of memory.

He studied the effect of repetition on learning using nonsense syllables, doing experiments on himself.

He spent 4 hours a day on his experiments and controlled his life for a period longer than 2 years.

12 / 46

Laws of retention

13 / 46

Environment reflecting memory (Anderson and Schooler, 1991)

14 / 46

Spacing effect

15 / 46

Spacing effect (Anderson and Schooler, 1991)

16 / 46

The induction problem: how do people know so much with so little experience?

Plato's Phaedon dialogue.

Chomsky's linguistics and nativism.

Shepard's law of generalization: placing the problem at the center of psychology.

All of machine learning is about induction.

17 / 46

Latent semantic analysis (Landauer and Dumais, 1997)

Landauer and Dumais assume that conceptual similarity is based on co-occurrence in natural language.

They trained the model using 4.6 million words from Grolier's Academic American Encyclopedia.

They used either the first 2000 words or the entire article to construct co-occurrence matrices.
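The idea behind latent semantic analysis can be sketched on a toy corpus: build a word-by-document count matrix, truncate its singular value decomposition, and compare words in the reduced space. This is an illustration only; the four "documents", the choice of two dimensions, and the omission of any cell weighting are assumptions of this sketch, not details taken from the study.

```python
import numpy as np

# Toy corpus: "doctor" and "physician" never co-occur,
# but they appear in the same contexts.
docs = [
    "doctor nurse hospital",
    "physician nurse hospital",
    "car engine road",
    "automobile engine road",
]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: k for k, w in enumerate(vocab)}

# Word-by-document count matrix.
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# Truncated SVD: keep the k largest singular values.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]


def similarity(a, b):
    """Cosine similarity of two words in the reduced space."""
    u, v = word_vectors[index[a]], word_vectors[index[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


print(similarity("doctor", "physician"), similarity("doctor", "car"))
```

In the raw count matrix "doctor" and "physician" are orthogonal; after the reduction they become near-identical, which is the kind of indirect similarity the model is meant to capture.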

18 / 46

Latent semantic analysis (Landauer and Dumais, 1997)

They tested the performance of the model on the TOEFL test (80 synonym items).

The model achieved performance comparable to that of the average foreign student taking the test.

19 / 46

Biases in the field (Lacetera et al., 2012)

26,000,000 cars from 2002 to 2008.

The average car is 4 years old with 57,000 miles on the odometer.

82% of the cars were sold.

20 / 46

Games and behavior (Stafford and Dewar, 2014)

Spacing practice leads to better results than cramming.

Learning curves are characterized by the power law of practice.

It takes 10,000 hours of practice to completely master a skill (Ericsson, 2006).
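The power law of practice mentioned above is usually written T(n) = a * n^(-b), where T is the time needed on the n-th trial. A minimal sketch of fitting it by least squares on log-log axes follows; the data are synthetic and the function name is an assumption, so this says nothing about the values found in the cited study.

```python
import math


def fit_power_law(trials, times):
    """Fit T(n) = a * n**(-b) by linear regression of log T on log n."""
    xs = [math.log(n) for n in trials]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Synthetic, noise-free learning curve with a = 10 and b = 0.5.
trials = list(range(1, 21))
times = [10 * n ** -0.5 for n in trials]
a, b = fit_power_law(trials, times)
print(a, b)
```

On a log-log plot a power law is a straight line, which is why an ordinary linear fit recovers the exponent.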

21 / 46

Games and behavior (Stafford and Dewar, 2014)

22 / 46

Games and behavior (Stafford and Dewar, 2014)

23 / 46

Games and behavior (Stafford and Dewar, 2014)

24 / 46

Assessing personalities

25 / 46

Using likes to predict personality (Youyou et al., 2015)

26 / 46

Using likes to predict personality (Youyou et al., 2015)

27 / 46

Psychological targeting (Matz et al., 2017)

28 / 46

Psychological targeting (Matz et al., 2017)

29 / 46

Predicting sexual orientation (Kosinski and Wang, 2017)

30 / 46

Predicting sexual orientation (Kosinski and Wang, 2017)

31 / 46

To vote or not to vote?

Two world-class economists run into each other at the voting booth. "What are you doing here?" one asks. "My wife made me come," the other says. The first economist gives a confirming nod. "The same." After a mutually sheepish moment, one of them hatches a plan: "If you promise never to tell anyone you saw me here, I'll never tell anyone I saw you." They shake hands, finish their polling business and scurry off.

32 / 46

Political mobilization (Bond et al., 2012)

33 / 46

Political mobilization (Bond et al., 2012)

The introduction of the button might have caused a 0.60% increase in turnout in the 2010 congressional election.

Most of this effect can be attributed to the impact of strong ties.

34 / 46

Social influence (Muchnik et al., 2013)

101,281 experimental comments: 4,049 were up-treated and 1,942 were down-treated.

Viewed 10^7 times and rated 308,515 times.

35 / 46

Emotion contagion

the tendency to automatically mimic and synchronize expressions, vocalizations, postures, and movements with those of another person's and, consequently, to converge emotionally.

36 / 46

Emotion contagion over the internet (Kramer et al., 2014)

Scientists at Facebook manipulated the news feed of 600,000+ users.

Some of them were exposed to linguistically more positive content, while others were exposed to more negative content.

37 / 46

Pushback to the emotion contagion study

The data did not violate Facebook’s consent policy.

Debate: was it ethical for Facebook to conduct such an experiment?

38 / 46

Using search terms for forecasting (Choi and Varian, 2012)

39 / 46

Using search terms for forecasting (Choi and Varian, 2012)

40 / 46

Predicting movies, video games and music (Goel et al., 2010)

41 / 46

Predicting movies, video games and music (Goel et al., 2010)

42 / 46

Predicting movies, video games and music (Goel et al., 2010)

43 / 46

Beating the market (Preis et al., 2013)

44 / 46

Beating the market (Preis et al., 2013)

45 / 46

The parable of the flu (Lazer et al., 2014)

46 / 46