Kenneth Soo, Michael Xing, Aaron Loh
Stanford, CA
Goal and Motivation
Goal: Predict sexual orientation from Facebook status updates.
Motivation: We want to examine the hypothesis that people with different sexual orientations express themselves differently on social media. Combining our results with our CS221 project, which extracted gender features from status updates, we seek to test the stereotype that male homosexuals tend to use more feminine language.
Analysis and Future Work
Analysis
▪ The best ROC AUC scores for both females and males were above 0.60, which told us that there were distinctions in how homosexuals and heterosexuals expressed themselves on social media, even if the distinctions were not great enough to consistently predict one's sexual orientation.
▪ Mentions of a partner of the opposite gender (e.g. when males mention 'wife') are strong indicators that a person is heterosexual.
▪ Our model showed that a male homosexual was 4 times more likely than a male heterosexual to use the word "gay". In fact, a male who mentions "gay" in a status update has a 1 in 4 chance of being homosexual.
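The relationship between a "4 times more likely" figure and a "1 in 4 chance" figure can be illustrated with Bayes' rule in odds form. The sketch below assumes that "4 times more likely" is the likelihood ratio P(word | homosexual) / P(word | heterosexual); the function name and both base rates are hypothetical and are not taken from the poster.

```python
from fractions import Fraction


def posterior_from_likelihood_ratio(likelihood_ratio, prior):
    """Return P(class | word) given a likelihood ratio and a prior P(class)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical base rates (assumptions for illustration only).
for prior in (Fraction(1, 10), Fraction(1, 13)):
    post = posterior_from_likelihood_ratio(Fraction(4), prior)
    print(f"prior={prior}  posterior={post}  (~{float(post):.2f})")
```

Under these assumptions, a likelihood ratio of 4 yields a posterior of exactly 1 in 4 when the base rate is 1 in 13, and a somewhat higher posterior when the base rate is 1 in 10.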
Limitations
▪ Age could be a confounding factor that drives the differences between homosexuals and heterosexuals. For example, it may be popular for young heterosexual girls to declare on Facebook that they are in a "relationship" with a good friend. Also, the top word features for female homosexuals are more associated with young people (e.g. omg, sex, b*tch), whereas those for heterosexuals are more associated with older people (e.g. husband, church, work).
Applying the gender model to our data
• We had earlier built and trained models to predict gender for our CS221 project. We wanted to use these models to test the stereotype that male homosexuals express themselves in a more effeminate manner.
• Our gender model predicted that 45% of all male homosexuals were female, compared with 40% of all male heterosexuals. A male homosexual was therefore 5 percentage points more likely to be predicted female.
• Our results suggest that there is slight evidence that male homosexuals express themselves more like females than male heterosexuals do. However, the evidence is not strong enough to support the social stereotype.
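The comparison above reduces to computing, for each group of males, the share of users that a gender classifier labels as female. A minimal sketch follows, using toy prediction lists built to reproduce the reported 45% and 40% shares; the helper name and the toy data are hypothetical, not the original model output.

```python
def share_predicted_female(predictions):
    """Fraction of predictions equal to 'female'."""
    return sum(1 for p in predictions if p == "female") / len(predictions)


# Toy predictions chosen only to match the reported shares (assumption).
male_homosexual_preds = ["female"] * 45 + ["male"] * 55
male_heterosexual_preds = ["female"] * 40 + ["male"] * 60

homo_share = share_predicted_female(male_homosexual_preds)
hetero_share = share_predicted_female(male_heterosexual_preds)

gap_pp = round((homo_share - hetero_share) * 100, 6)  # percentage points
relative = homo_share / hetero_share                  # relative ratio

print(f"gap: {gap_pp} percentage points, ratio: {relative:.3f}")
```

The gap in percentage points is an absolute difference; the same numbers correspond to a relative increase of about 12.5%.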
Future Work
• Exploration of other methods of feature extraction (e.g. Word2Vec), and more nuanced feature engineering. We can also use neural networks to automatically learn features in the data.
Predicting Sexual Orientation Via Facebook Status Updates
Data
▪ We used data from myPersonality.org, with kind permission from Dr. Michal Kosinski (Stanford GSB). The dataset contains 22M Facebook status updates and includes demographic details (e.g. gender) of every user.
▪ We derived the sexual orientation labels by comparing the gender of a user's partner with the user's own gender.
▪ Word stemming was applied to the status updates.
▪ Our dataset is skewed in a 9:1 ratio. As such, our test error did not provide a meaningful sense of how our model performed, and we used alternative measures like F1 score and ROC curves instead.
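The label derivation and the problem with plain test error on skewed data can be sketched as follows. This is an illustration only: the function names, the string gender values, and the assumption that heterosexual is the majority class in the 9:1 split are ours, not confirmed details of the original pipeline.

```python
def orientation_label(user_gender, partner_gender):
    """Label a user by comparing their gender with their partner's gender."""
    if not user_gender or not partner_gender:
        return None  # missing information: no label can be derived
    return "homosexual" if user_gender == partner_gender else "heterosexual"


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 9:1 split (assumed majority: heterosexual) and a majority-class baseline.
y_true = ["heterosexual"] * 9 + ["homosexual"] * 1
y_pred = ["heterosexual"] * 10

print(accuracy(y_true, y_pred))                    # high despite learning nothing
print(f1_for_class(y_true, y_pred, "homosexual"))  # exposes the failure
```

A classifier that always predicts the majority class reaches 90% accuracy on this toy split while scoring an F1 of 0 on the minority class, which is why per-class F1 and ROC curves are more informative here.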
Features and Models
Features:
▪ N-grams (tuned across a range of hyper-parameters)
▪ Counts of periods, exclamation marks, smileys and capital letters
Learning Algorithms:
▪ Support Vector Machine
▪ Multinomial Naïve Bayes
▪ Logistic Regression
▪ Random Forest
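The two feature families listed above can be sketched with the standard library alone. The tokenizer, the smiley pattern, and the function names below are assumptions made for illustration; the original feature extraction (including its stemming step) may differ.

```python
import re
from collections import Counter

# Hypothetical smiley pattern: eyes, optional nose, mouth (e.g. :) ;-) =D :P).
SMILEY_RE = re.compile(r"[:;=][-']?[)(DPp]")


def word_ngrams(text, n_min=1, n_max=2):
    """Count word n-grams for every n in [n_min, n_max]."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def style_counts(text):
    """Count periods, exclamation marks, smileys and capital letters."""
    return {
        "periods": text.count("."),
        "exclamations": text.count("!"),
        "smileys": len(SMILEY_RE.findall(text)),
        "capitals": sum(ch.isupper() for ch in text),
    }


status = "So happy today!!! Best. Day. Ever :)"
print(word_ngrams(status, 1, 2).most_common(3))
print(style_counts(status))
```

In practice the n-gram counts and the style counts would be concatenated into one feature vector per status update before being passed to any of the listed learning algorithms.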
Results
Model                 Males                         Females
                      ROC AUC   F1 Score*           ROC AUC   F1 Score*
Logistic Regression   0.57      0.92 (0.97, 0.20)   0.62      0.84 (0.94, 0.24)
Naïve Bayes           0.52      0.91 (0.96, 0.21)   0.58      0.84 (0.93, 0.30)
SVM                   0.61      0.94 (0.98, 0.17)   0.62      0.85 (0.93, 0.36)
Random Forest         0.55      -                   0.63      -
* For F1 Score, the figures in parentheses indicate F1 scores for heterosexuals and homosexuals respectively.
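ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are illustrative and are not the evaluation code or data behind the table.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 marks the positive (minority) class.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

Because the measure depends only on how examples are ranked, it is not inflated by class imbalance the way accuracy is; a value of 0.5 corresponds to random ranking.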
Figure: Confusion matrix, without normalization (true label vs. predicted label; classes: heterosexual, homosexual). Cell counts as extracted: 51337, 1457, 5693, 240.
Figure: Top Word Features, by gender (female, male) and sexual orientation (heterosexual, homosexual).
SVM Model Parameters Tuning
N-gram range             (1,2)    (1,4)    (1,5)
Min document frequency   1        0.95     0.9
Max document frequency   1        0.95     0.9
Kernel                   Linear   Poly     RBF
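One way to read the tuning table is as a grid in which every value of each parameter is combined with every value of the others. Whether the full grid was actually searched is not stated, so the sketch below is an assumption; the dictionary keys are hypothetical shorthand for the table rows.

```python
from itertools import product

# Values copied from the tuning table; key names are illustrative.
param_grid = {
    "ngram_range": [(1, 2), (1, 4), (1, 5)],
    "min_df": [1, 0.95, 0.9],
    "max_df": [1, 0.95, 0.9],
    "kernel": ["linear", "poly", "rbf"],
}


def iter_configs(grid):
    """Yield one dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(param_grid))
print(len(configs))
print(configs[0])
```

Each yielded configuration would be used to train and validate one SVM, keeping the configuration with the best validation ROC AUC or F1 score.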