Big data and human behavior
Pantelis P. Analytis
Manifestos
Click prediction
Information latent in the environment
New techniques
Data from the field
Personalization and targeting
The new laboratory
Big data and forecasting
Big data and human behavior
Pantelis P. Analytis
April 19, 2018
1 Manifestos
2 Click prediction
3 Information latent in the environment
4 New techniques
5 Data from the field
6 Personalization and targeting
7 The new laboratory
8 Big data and forecasting
Potential uses of big data and naturally occurring datasets
External validation of laboratory experiments.
Demonstrate phenomena that motivate follow-on laboratory research.
Discover patterns of information latent in environments.
Create stimuli for experiments.
Construct and test computational models of cognition.
The four V’s of big data
Volume: How much data are included?
Variety: How many different kinds or sources of data are included?
Velocity: How quickly are the data able to be gathered, processed, and analyzed?
Veracity: How faithfully do the data capture what they are believed to capture?
The first click model
D. H. KRAFT and T. LEE
Table 2. Expected search length formulas
Satiation rule (r ≤ R): E(X) = r, E(Y) = rI/(R+1), E(Z) = r(N+1)/(R+1)
Disgust rule (i ≤ I): E(X) = iR/(I+1), E(Y) = i, E(Z) = i(N+1)/(I+1)
Combination rule (r ≤ R and i ≤ I): [partial sums over k; the terms are not legible in this extraction]
model of the satiation stopping rule. This rule allows us to expand our horizons by avoiding the assumption that the user does not consider the number of irrelevant documents encountered, or the disgust level, when determining when to terminate the scan. It also results in a new version of expected scan length, based on an alternative stopping rule. The disgust rule makes sense for user queries for exhaustive searches. An example would be a search for a list of all documents about Life in the antebellum South. However, the disgust rule has its drawbacks in that it ignores any effects on search length of the satiation level.
The probability distributions for X, Y and Z and the expected values of X, Y and Z are modeled for the disgust rule and presented above in Tables 1 and 2, respectively. One can readily see that the disgust rule is a dual to the satiation rule.
THE COMBINATION RULE
Still another alternative is needed. We suggest a combination rule, which allows the user to be seen as stopping the scan if he/she is satiated by finding the desired number of relevant documents or disgusted by having to examine too many irrelevant documents, whichever comes first. This rule incorporates aspects of the previous two rules.
If either R or I is zero, the combination rule degenerates into either the disgust rule or the satiation rule, respectively. We shall assume that both R and I are positive throughout the paper.
The formulas for the probability distribution, and the expected values, are presented above in Tables 1 and 2 respectively. One can readily see that the combination rule is the most complex stopping rule, the expected search length formulas being the most sophisticated models. Unfortunately, the expectation formulas cannot be appreciably simplified, due in large part to the explicit evaluation of the partial sums.
AN APPROXIMATION
The negative binomial distribution can be employed as an approximation to the negative hypergeometric distribution formulas of Table 1. This assumes that N, the number of documents retrieved, is large, and that R and I are both positive. The negative binomial distribution applies exactly if each document is replaced after it is examined, and the search is random rather than linear [5]. This coincides with our intuition that for large N, drawing documents with or without replacement yields approximately the same results.
The approximations for the probability distributions of X, Y and Z and for the expected
Kraft, Donald H., and T. Lee. "Stopping rules and their effect on expected search length." Information Processing & Management 15.1 (1979): 47-58.
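The three stopping rules in the excerpt can be checked with a small simulation. This is a sketch under stated assumptions, not Kraft and Lee's own procedure: it assumes documents are examined in a uniformly random order, that X, Y and Z count relevant, irrelevant and total documents examined, and that r and i are the satiation and disgust thresholds. The function names and the numeric values are illustrative only.

```python
import random

def scan(relevance, r=None, i=None):
    """Scan a list of booleans (True = relevant) from the top.

    Stop once r relevant documents (satiation) or i irrelevant documents
    (disgust) have been seen, whichever comes first. Passing only r gives
    the satiation rule, only i the disgust rule, both the combination rule.
    Returns (X, Y, Z) = (relevant seen, irrelevant seen, total seen).
    """
    x = y = 0
    for is_relevant in relevance:
        if is_relevant:
            x += 1
        else:
            y += 1
        if (r is not None and x >= r) or (i is not None and y >= i):
            break
    return x, y, x + y

def expected_search_length(R, I, r=None, i=None, trials=20000, seed=0):
    """Monte Carlo estimate of E(Z) for R relevant and I irrelevant documents."""
    rng = random.Random(seed)
    docs = [True] * R + [False] * I
    total = 0
    for _ in range(trials):
        rng.shuffle(docs)  # assumed: random ordering of the retrieved set
        total += scan(docs, r, i)[2]
    return total / trials

# Hypothetical example: 10 relevant and 30 irrelevant documents.
R, I = 10, 30
N = R + I
sat = expected_search_length(R, I, r=3)        # satiation rule
dis = expected_search_length(R, I, i=5)        # disgust rule
comb = expected_search_length(R, I, r=3, i=5)  # combination rule

print("satiation  ", sat, "closed form", 3 * (N + 1) / (R + 1))
print("disgust    ", dis, "closed form", 5 * (N + 1) / (I + 1))
print("combination", comb)
```

The simulated satiation and disgust values should land close to r(N+1)/(R+1) and i(N+1)/(I+1), and the combination rule can never scan further than either single rule on the same ordering.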
Information foraging theory
Pirolli, Peter, and Stuart Card. "Information foraging." Psychological Review 106.4 (1999): 643.
The position based model
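The slide carries only the model's name, so the following is a minimal sketch of the position-based model as it is usually stated in the click-modeling literature, not a reconstruction of the slide: a result is clicked if it is both examined (which depends only on its rank) and found attractive (which depends only on the result). The function name and all probabilities are hypothetical.

```python
def pbm_click_probs(examination, attractiveness):
    """Position-based model: P(click at rank k) = P(examined at k) * P(attractive at k)."""
    return [gamma * alpha for gamma, alpha in zip(examination, attractiveness)]

# Hypothetical parameters for a four-result page.
gamma = [1.0, 0.6, 0.4, 0.25]  # examination probability by rank
alpha = [0.3, 0.5, 0.5, 0.8]   # attractiveness of the result shown at each rank

probs = pbm_click_probs(gamma, alpha)
for rank, p in enumerate(probs, start=1):
    print(rank, round(p, 3))
```

Under this model a highly attractive result at a low rank can still receive few clicks, because it is rarely examined.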
The cascade model
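As with the previous slide, only the title survives, so this is a sketch of the cascade model in its common textbook form: the user scans results from the top, clicks the first attractive one, and stops. The function name and attractiveness values are hypothetical.

```python
def cascade_click_probs(attractiveness):
    """Cascade model: the click lands on the first attractive result.

    P(click at rank k) = alpha_k * product over j < k of (1 - alpha_j).
    """
    probs = []
    reach = 1.0  # probability that the user gets as far as this rank
    for alpha in attractiveness:
        probs.append(reach * alpha)
        reach *= 1.0 - alpha
    return probs

# Hypothetical attractiveness values for a four-result page.
alpha = [0.3, 0.5, 0.5, 0.8]
probs = cascade_click_probs(alpha)
no_click = 1.0 - sum(probs)  # probability that nothing on the page is clicked

for rank, p in enumerate(probs, start=1):
    print(rank, round(p, 4))
print("no click", round(no_click, 4))
```

Unlike the position-based model, the chance of examining a result here depends on the results above it, so at most one click per page is possible.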
Measures of retrieval effectiveness
Cooper, William S. "On selecting a measure of retrieval effectiveness." Journal of the American Society for Information Science 24.2 (1973): 87-100.
Carbonell, J., Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR.
Järvelin, Kalervo, and Jaana Kekäläinen. (2002). Cumulated gain-based evaluation of IR techniques. TOIS.
Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P. (2009). Expected reciprocal rank for graded relevance. CIKM.
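Two of the measures cited above can be sketched in a few lines. The code assumes the widely used log2(rank + 1) discount for discounted cumulative gain, which differs slightly from the formulation in the original cumulated-gain paper, and the usual mapping from a grade g to a satisfaction probability (2^g - 1) / 2^g_max for expected reciprocal rank. Function names and the example grades are illustrative.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed variant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def err(grades, max_grade):
    """Expected reciprocal rank for graded relevance.

    Each result satisfies the user with probability (2**g - 1) / 2**max_grade;
    the user walks down the list and stops at the first satisfying result.
    """
    score = 0.0
    not_satisfied = 1.0  # probability the user is still looking at this rank
    for rank, g in enumerate(grades, start=1):
        p = (2 ** g - 1) / 2 ** max_grade
        score += not_satisfied * p / rank
        not_satisfied *= 1.0 - p
    return score

# Hypothetical graded judgments (0 = irrelevant, 3 = perfect) for one ranking.
ranking = [3, 2, 0, 1]
print("DCG ", dcg(ranking))
print("nDCG", ndcg(ranking))
print("ERR ", err(ranking, max_grade=3))
```

The loop in `err` has the same structure as a cascade-style click model: the contribution of each rank is weighted by the probability that no earlier result satisfied the user.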
Dynamic Bayesian model
Prediction rates of different models
Reflections on memory
Ebbinghaus was a pioneer of the experimental study of memory.
He studied the effect of repetition on learning using nonsense syllables, running the experiments on himself.
He spent 4 hours a day on his experiments and controlled his life for a period longer than 2 years.
Laws of retention
Environment reflecting memory (Anderson and Schooler, 1991)
Spacing effect
Spacing effect (Anderson and Schooler, 1991)
The induction problem: how do people know so much with so little experience?
Plato’s Phaedon dialogue.
Chomsky’s linguistics and nativism.
Shepard’s law of generalization: placing the problem at the center of psychology.
All of machine learning is about induction.
Latent semantic analysis (Landauer and Dumais, 1997)
Landauer and Dumais assume that conceptual similarity is based on co-occurrence in natural language.
They trained the model using 4.6m words from Grolier’s Academic American Encyclopedia.
They used either the first 2,000 characters or the entire article to construct co-occurrence matrices.
Latent semantic analysis (Landauer and Dumais, 1997)
They tested the performance of the model on the TOEFL test (80 synonym items).
The model achieved performance comparable to that of the average foreign student taking the test.
Biases in the field (Lacetera et al., 2012)
26,000,000 cars from 2002 to 2008.
The average car is 4 years old with 57,000 miles on the odometer.
82% of the cars were sold.
Games and behavior (Stafford and Dewar, 2014)
Spacing practice leads to better results than cramming.
Learning curves are characterized by the power law of practice.
It takes 10,000 hours of practice to completely master a skill (Ericsson, 2006).
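The power law of practice mentioned above is commonly written as T(n) = a * n^(-b), where T is the time (or error) on the n-th practice trial. The sketch below fits a and b by ordinary least squares on the log-log scale. It is an illustration on synthetic data, not the analysis used by Stafford and Dewar; the function name and the values a = 10 and b = 0.4 are made up.

```python
import math

def fit_power_law(trials, times):
    """Fit T(n) = a * n**(-b) by least squares on log T = log a - b * log n."""
    xs = [math.log(n) for n in trials]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope  # b is the negated slope of the log-log line

# Synthetic, noise-free learning curve (hypothetical values).
trials = [1, 2, 4, 8, 16, 32]
times = [10.0 * n ** -0.4 for n in trials]

a, b = fit_power_law(trials, times)
print("a =", round(a, 3), "b =", round(b, 3))
```

On a log-log plot a power law appears as a straight line, which is why a linear fit on the logged values recovers the exponent.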
Games and behavior (Stafford and Dewar, 2014)
Games and behavior (Stafford and Dewar, 2014)
Games and behavior (Stafford and Dewar, 2014)
Assessing personalities
Using likes to predict personality (Youyou et al., 2015)
Using likes to predict personality (Youyou et al., 2015)
Psychological targeting (Matz et al., 2017)
Psychological targeting (Matz et al., 2017)
Predicting sexual orientation (Kosinski and Wang, 2017)
Predicting sexual orientation (Kosinski and Wang, 2017)
To vote or not to vote?
Two world-class economists run into each other at the voting booth.
"What are you doing here?" one asks.
"My wife made me come," the other says.
The first economist gives a confirming nod. "The same."
After a mutually sheepish moment, one of them hatches a plan: "If you promise never to tell anyone you saw me here, I’ll never tell anyone I saw you." They shake hands, finish their polling business and scurry off.
Political mobilization (Bond et al., 2012)
Political mobilization (Bond et al., 2012)
The introduction of the button might have caused a 0.60% increase in turnout in the 2010 congressional election.
Most of this effect can be attributed to the impact of strong ties.
Social influence (Muchnik et al., 2013)
Of 101,281 experimental comments, 4,049 were up-treated and 1,942 were down-treated.
The comments were viewed more than 10^7 times and rated 308,515 times.
Emotion contagion
the tendency to automatically mimic and synchronize expressions, vocalizations, postures, and movements with those of another person’s and, consequently, to converge emotionally.
Emotion contagion over the internet (Kramer et al., 2014)
Scientists at Facebook manipulated the News Feed of 600,000+ users.
Some of them were exposed to linguistically more positive content,
while others were exposed to more negative content.
Pushback to the emotion contagion study
The data did not violate Facebook’s consent policy.
Debate: was it ethical for Facebook to conduct such an experiment?
Using searched terms for forecasting (Choi and Varian, 2012)
Using searched terms for forecasting (Choi and Varian, 2012)
Predicting movies, video games and music (Goel et al., 2010)
Predicting movies, video games and music (Goel et al., 2010)
Predicting movies, video games and music (Goel et al., 2010)
Beating the market (Preis et al., 2013)
Beating the market (Preis et al., 2013)