Predicting Sexual Orientation Via Facebook Status … (cs229.stanford.edu/proj2016/poster/LohSooXing-PredictingSexual...)

Kenneth Soo, Michael Xing, Aaron Loh

Goal and Motivation

Goal: Predict sexual orientation from Facebook status updates.

Motivation: We want to examine the hypothesis that people with different sexual orientations express themselves differently on social media. Combining our results with our CS221 project, which extracted gender features from status updates, we seek to test the stereotype that male homosexuals tend to use more feminine language.

Stanford, CA

Results

Analysis
▪ The best ROC AUC scores for both females (0.63) and males (0.61) were above 0.60. This suggests that homosexual and heterosexual users express themselves somewhat differently on social media, even though the difference is not large enough to predict a person's sexual orientation consistently.

▪ Mentions of a partner of the opposite gender (e.g. a male mentioning 'wife') are strong indicators that a person is heterosexual.

▪ Our model showed that male homosexuals were 4 times more likely to use the word "gay". In our data, a male who mentions "gay" in a status update has a 1 in 4 chance of being homosexual.
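Going from "4 times more likely to use the word" to "a 1 in 4 chance of being homosexual" is an application of Bayes' rule, and the result depends heavily on the base rate. The sketch below is minimal and written in odds form: posterior odds = prior odds × likelihood ratio. The function name is hypothetical. The inputs are illustrative only: a prior of about 0.1 (matching the roughly 9:1 skew described under Data) and a few likelihood ratios. The poster does not report the exact frequencies behind its figure.

```python
def posterior(prior, likelihood_ratio):
    """P(class | word) from the prior P(class) and the likelihood ratio
    P(word | class) / P(word | other class), using the odds form of Bayes' rule."""
    if not 0.0 < prior < 1.0 or likelihood_ratio <= 0.0:
        raise ValueError("prior must be in (0, 1) and likelihood_ratio positive")
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Illustrative inputs: a base rate near 0.1 and several hypothetical ratios.
for ratio in (2.0, 3.0, 4.0):
    print(ratio, round(posterior(0.1, ratio), 3))
```

Small changes in either input shift the posterior noticeably. It therefore matters exactly how the word-frequency ratio was measured.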

Limitations
▪ Age could be a confounding factor driving the differences between homosexuals and heterosexuals. For example, young girls may commonly declare on Facebook that they are in a "relationship" with a good friend even if they are heterosexual, which our partner-based labels would count as homosexual. Also, the top word features for female homosexuals are more associated with young people (e.g. omg, sex, b*tch), whereas those for heterosexuals are more associated with older people (e.g. husband, church, work).

Applying the gender model to our data
• We had earlier built and trained models to predict gender for our CS221 project. We wanted to use these models to test the stereotype that male homosexuals express themselves in a more effeminate manner.

• Our gender model predicted that 45% of all male homosexuals were female, compared with 40% of all male heterosexuals. In other words, a male homosexual was 5 percentage points more likely to be predicted female.

• Our results provide slight evidence that male homosexuals express themselves more like females than male heterosexuals do. However, the evidence is not strong enough to support the social stereotype.
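Whether a 5-percentage-point gap (45% vs 40%) can be distinguished from noise depends on how many users are in each group, and the poster does not report those sizes. The sketch below is a minimal version of a standard two-proportion z-test. The poster does not use this technique. The function name and the group sizes are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for p1 = x1/n1 versus p2 = x2/n2.

    Here x1 of n1 male homosexuals and x2 of n2 male heterosexuals were
    predicted female. Uses the pooled-variance normal approximation and
    returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical group sizes with the same 45% vs 40% split.
for n_homo, n_hetero in [(100, 900), (1000, 9000)]:
    x_homo, x_hetero = round(0.45 * n_homo), round(0.40 * n_hetero)
    z, p = two_proportion_z_test(x_homo, n_homo, x_hetero, n_hetero)
    print(n_homo, n_hetero, round(z, 2), round(p, 4))
```

With the smaller hypothetical groups, the same gap is well within noise. With the larger ones it is statistically significant, although a significant gap can still be a small one.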

Future Work
• Exploration of other methods of feature extraction (e.g. Word2Vec) and more nuanced feature engineering. We could also use neural networks to learn features from the data automatically.


Predicting Sexual Orientation Via Facebook Status Updates

▪ We used data from myPersonality.org, with kind permission from Dr. Michal Kosinski (Stanford GSB). The dataset contains 22M Facebook status updates along with demographic details (e.g. gender) for every user.

▪ We derived the sexual orientation labels by looking at the gender of a user's partner and comparing it to the user's gender.

▪ Word stemming was applied to the status updates.

▪ Our dataset is skewed roughly 9:1 (heterosexual to homosexual). As a result, test error did not give a meaningful sense of how our models performed, so we used alternative measures such as the F1 score and ROC curves instead.
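The labeling rule can be written as a small function. This is a hypothetical reconstruction, and the function name is made up. The poster does not say how users without a listed partner were handled, so in this sketch they simply receive no label. A rule based on a single partner's gender also places bisexual users into one of the two classes.

```python
from typing import Optional


def orientation_label(user_gender: str, partner_gender: Optional[str]) -> Optional[str]:
    """Hypothetical rule: a same-gender partner gives 'homosexual', otherwise 'heterosexual'.

    Users with no recorded partner gender get None (assumed to be excluded).
    """
    if partner_gender is None:
        return None
    return "homosexual" if partner_gender == user_gender else "heterosexual"


print(orientation_label("male", "female"))
print(orientation_label("female", "female"))
```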
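With a roughly 9:1 split, a baseline that always predicts "heterosexual" is about 90% accurate yet never identifies a homosexual user. This is why accuracy (one minus test error) says little here. The sketch below is minimal and uses hypothetical labels and helper names. It shows how the per-class F1 score exposes the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def per_class_f1(y_true, y_pred, positive):
    """F1 = 2PR / (P + R) for one class, treating `positive` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 9:1 data set and a majority-class baseline.
y_true = ["het"] * 900 + ["hom"] * 100
y_pred = ["het"] * 1000
print("accuracy:", accuracy(y_true, y_pred))
print("F1 (het):", round(per_class_f1(y_true, y_pred, "het"), 3))
print("F1 (hom):", per_class_f1(y_true, y_pred, "hom"))
```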

Data


Features:
▪ N-grams (tuned across a range of hyper-parameters)
▪ Counts of periods, exclamation marks, smileys and capital letters
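The sketch below computes these features for a single status update, assuming simple lowercase word tokens. The smiley pattern, the feature names, and the function names are hypothetical, and the stemming step from the Data section is omitted. The n_min/n_max arguments play the role of the n-gram range tuned in the SVM table.

```python
import re
from collections import Counter

# Hypothetical emoticon pattern; the poster does not say which smileys were counted.
SMILEY = re.compile(r"[:;=]-?[)(DPp]")


def ngrams(tokens, n_min=1, n_max=2):
    """All contiguous n-grams with n_min <= n <= n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def status_features(text, n_min=1, n_max=2):
    """Bag of n-grams plus counts of periods, exclamation marks, smileys and capitals."""
    tokens = re.findall(r"[a-z']+", text.lower())
    features = Counter(ngrams(tokens, n_min, n_max))
    features["__periods__"] = text.count(".")
    features["__exclamations__"] = text.count("!")
    features["__smileys__"] = len(SMILEY.findall(text))
    features["__capitals__"] = sum(1 for c in text if c.isupper())
    return features


print(status_features("Had a GREAT day with my wife! :)"))
```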

Learning Algorithms:
▪ Support Vector Machine
▪ Multinomial Naïve Bayes
▪ Logistic Regression
▪ Random Forest
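Of the four learners, multinomial naive Bayes is short enough to write out in full. The sketch below is a minimal from-scratch version over token lists. It assumes add-one (Laplace) smoothing, since the poster does not state its smoothing setting. The class name and the toy training data are hypothetical. In practice a library implementation such as scikit-learn's MultinomialNB would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over token counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class labels.
        self.classes = sorted(set(labels))
        self.counts = defaultdict(Counter)  # per-class token counts
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
            self.vocab.update(doc)
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of P(token | class c).
        v = len(self.vocab)
        return math.log((self.counts[c][token] + self.alpha) / (self.totals[c] + self.alpha * v))

    def predict(self, doc):
        # Unseen tokens are skipped; pick the class with the highest log posterior.
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Made-up toy data, for illustration only.
docs = [["wife", "church"], ["husband", "work"], ["boyfriend", "gay"], ["gay", "pride"]]
labels = ["heterosexual", "heterosexual", "homosexual", "homosexual"]
model = MultinomialNB(alpha=1.0).fit(docs, labels)
print(model.predict(["gay"]))
```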

Features and Models


Analysis and Future Work

Model                 Males ROC AUC    Males F1 Score*     Females ROC AUC    Females F1 Score*
Logistic Regression   0.57             0.92 (0.97, 0.20)   0.62               0.84 (0.94, 0.24)
Naïve Bayes           0.52             0.91 (0.96, 0.21)   0.58               0.84 (0.93, 0.30)
SVM                   0.61             0.94 (0.98, 0.17)   0.62               0.85 (0.93, 0.36)
Random Forest         0.55             -                   0.63               -

*For F1 score, the figures in parentheses are the F1 scores for heterosexuals and homosexuals, respectively.
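The ROC AUC figures in the table have a simple rank interpretation: the probability that a randomly chosen homosexual user receives a higher classifier score than a randomly chosen heterosexual user. A value of 0.5 means chance. The sketch below is a minimal pure-Python version of that computation, with ties counted as half. The function name and the scores are hypothetical. scikit-learn's roc_auc_score computes the same quantity.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for the positive class (e.g. homosexual), 0 otherwise.
    scores: classifier scores, where higher means more likely positive.
    Ties count as half a win, which matches the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from a weak classifier that only partly separates the classes.
labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.3, 0.8]
print(roc_auc(labels, scores))
```

The double loop is quadratic in the number of users. Ranking the scores once gives the same value in O(n log n) time.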

Confusion matrix, without normalization (rows: true label; columns: predicted label)

                      Predicted Heterosexual    Predicted Homosexual
True Heterosexual     51337                     1457
True Homosexual       5693                      240

[Figure: Top Word Features, by gender (Female, Male) and orientation (Heterosexual, Homosexual)]

SVM Model Parameters Tuning

N-gram range              (1,2)     (1,4)    (1,5)
Min document frequency    1         0.95     0.9
Max document frequency    1         0.95     0.9
Kernel                    Linear    Poly     RBF
