CS6604 Digital Libraries
Social Communities Knowledge Management: Social Interactome
Final Term Project Presentation
Presenter: Prashant Chandrasekar, {peecee}@vt.edu
Instructor: Dr. Edward A. Fox
Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061
May 2, 2017
Final Presentation
Acknowledgements
• Dr. Edward A. Fox
• Global events team
• Social Interactome team
• The Social Interactome of Recovery: Social Media as Therapy Development (NIH Grant 1R01DA039456-01)
• Xuan Zhang and Yufeng Ma
• Mostafa Mohammed
2
Outline
• Background
  • Social network community; Social Interactome
  • Data
• Challenges
• Goal
• Approaches
  • Network Classification
  • Learning via Markov Logic Networks
• Future Work
3
Background: Social Interactome (SI)
• Social Interactome
  • NIH-funded project conducted by a team of researchers
  • Studies the community of people who are recovering from addiction
  • Studies their interactions in an online social network built to provide support and management of their recovery
• The project is broken down into a set of "test vs. control" experiments with the following variables defined:
  • Duration of study
  • Number of participants required
  • Avenue of recruitment
  • Null and alternative hypotheses
4
Background: SI Setup
• The project is broken down into a set of clinical trials. For each clinical trial:
  • The team decides on a set of null and alternative hypotheses and the duration of the trial
  • The team recruits participants for the trial
  • The team organizes the participants into one of two (or more) 128-node social networks
  • Participants interact with the website and their assigned friends
• Two 16-week clinical trials have been completed, along with a set of smaller-scale trials executed via Amazon MTurk
5
Background: SI Participant Info
Participants:
• Demographic
• Family's history with addiction
• Past social network experience
• Primary Addiction
• Secondary Addiction
• Addiction Severity Index
• Social Connected Scale
• Adult Social Network Index
• DSM-V
• Big 5 Personalities
• Recovery Capital Scale
• Relapse
• Assessment Recovery Capital
• Minute Discounting
• Religious Commitment Inventory
• Recovery Participation Scale
• Family Info
• ...
- Collected from 19,070 questions
- ~10 psychology-based measures
- 16 surveys
6
Background: SI Website Use Data
Participants:
• TES Modules / TES scores
• News Stories
• Success Stories
• Unpaid Assessments
• Video Meetings
• Private Messages
• Response From Admin
• Posts
• Posts / Likes / Shares / Comments
• Pictures / Links Uploads
7
Overall Challenges
• How do you organize the data?
• How do you validate/clean the data?
• What do you analyze first? And in what order do you go about it?
• How do you make sense of the data?
• How do you interpret psychology-related measures?
• Big goal: How to streamline the entire process, from data collection to analyses to presentation, such that it is reproducible and extensible?
8
Goal
• Goal: Investigate/explore ways to model the data and recommend an approach.
• Approaches to understand the data
  • Frequency distributions / histograms
  • Time series
  • Checking for correlations
  • Comparing means and standard deviations
    • t-tests
  • Statistical modeling
9
Approaches
• Statistical Modeling
  • What do we model?
    • Substance relapse
    • Engagement / change in engagement
    • Change in psychology-related measures
    • Change in behavior
    • Homophily
    • Friendship or trust
  • Factors
    • Classification: What would be the predictor variables? Response variables?
    • PGMs: Directed or undirected? What would be the factors?
10
Approaches
• Classification
  • Network classification using NetKit-SRL (Statistical Relational Learning) [1] [focus of the presentation]
  • Learning using Markov Logic Networks [2]
[1] Macskassy, S. A., & Provost, F. (2007). Classification in Networked Data: A Toolkit and a Univariate Case Study. Journal of Machine Learning Research, 8(May), 935-983.
[2] Domingos, P., & Richardson, M. (2007). Markov Logic: A Unifying Framework for Statistical Relational Learning. In L. Getoor and B. Taskar (eds.), Introduction to Statistical Relational Learning (pp. 339-371). Cambridge, MA: MIT Press.
11
Network Classification
• Idea: Take advantage of relational information, in addition to attribute information, for entity classification. Example: networked data.
• Focuses on within-network classification
  • Networks of web pages, research papers, social networks, etc.
• NetKit-SRL: Toolkit developed to employ statistical relational learning and inference
12
Network Classification
• NetKit-SRL
  • Network learning toolkit for classification and inference
  • Developed by Dr. Macskassy & Dr. Provost
  • Has 3 components
    • Non-relational model
    • Relational model
    • Collective inference
  • Specific outcomes:
    • Maximize P(x | G^K), where x are the labels to be estimated and G^K is everything known in the network
    • Estimating the joint distribution over the labels
  • Input:
    • Graph with edges describing relationships and attributes of nodes
13
Network Classification
• NetKit-SRL components
• Local (non-relational) classifier
  • Purpose: Returns a model which uses only attributes of a node to estimate its class label.
  • Approaches: 1) Uniform prior; 2) Class prior
• Relational classifier
  • Purpose: Returns a model which uses not only the local attributes of a node but also attributes of related nodes, including their (estimated) class membership.
  • Approaches: 1) Weighted-vote relational neighbor; 2) Class-distributional relational neighbor; 3) Network-only multinomial Bayes classifier with Markov random field estimation
• Collective inference
  • Purpose: This module applies collective inference in order to (approximately) maximize the joint probability of the labels of all nodes in the graph whose labels were initially unknown.
  • Approaches: 1) Relaxation labeling; 2) Iterative classification; 3) Gibbs sampling
14
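The two local (non-relational) approaches listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not NetKit-SRL's implementation; the function names and the example labels are made up for the sketch.

```python
from collections import Counter


def class_prior(known_labels):
    """Class prior: relative frequency of each class among the known labels."""
    counts = Counter(known_labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def uniform_prior(classes):
    """Uniform prior: every class gets the same probability."""
    classes = list(classes)
    return {cls: 1.0 / len(classes) for cls in classes}


# Hypothetical labels, used only to exercise the functions.
known = ["alcohol", "opioids", "alcohol", "stimulants"]
prior = class_prior(known)
flat = uniform_prior(set(known))
```

Either distribution can serve as the starting estimate for nodes whose labels are unknown, before the relational classifier and collective inference refine it.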
Network Classification
• Possible instantiations
Author | Non-relational classifier | Relational classifier | Collective inference
Chakrabarti et al. (1998) [1] | Naïve Bayes classifier | Naïve Bayes Markov random field | Relaxation labeling
Lu & Getoor (2003) [2] | Logistic regression | Logistic regression | Iterative classification
Macskassy & Provost (2003) [3] | Class priors | Majority vote of neighboring classes | Relaxation labeling
[1] Chakrabarti, S., Dom, B., & Indyk, P. (1998). Enhanced Hypertext Categorization Using Hyperlinks. Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 307–318).
[2] Lu, Q., & Getoor, L. (2003). Link-Based Classification. International Conference on Machine Learning, ICML-2003 (pp. 496–503).
[3] Macskassy, S. A., & Provost, F. (2003). A Simple Relational Classifier. Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003 (pp. 64–76).
15
Network Classification
• Weighted-vote relational neighbor classifier (wv-RN)
  • Authors: Macskassy, S. A., & Provost, F. (2003)
  • Estimates class membership by assuming the existence of homophily
  • Weighted mean of class-membership probabilities of entities in D_e (where D_e is the set of neighbors of entity/node e)
  • $P(c \mid e) = \frac{1}{Z} \sum_{e_j \in D_e} w(e, e_j) \, P(c \mid e_j)$, where Z is the normalizing constant
16
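The weighted-vote estimate on this slide can be sketched as follows. This is a minimal illustration, assuming the graph is stored as a dictionary of weighted neighbor lists and that Z is the sum of the edge weights to neighbors with an available estimate; the function name, data layout, and example values are hypothetical and are not taken from NetKit-SRL.

```python
def wvrn_estimate(node, neighbors, probs, classes):
    """Weighted mean of the neighbors' class-membership probabilities.

    neighbors: {node: {neighbor: edge_weight}}
    probs:     {node: {class: probability}} for nodes with a current estimate
    """
    totals = {c: 0.0 for c in classes}
    z = 0.0  # normalizer: total weight of neighbors that have an estimate
    for nb, weight in neighbors.get(node, {}).items():
        if nb not in probs:
            continue
        z += weight
        for c in classes:
            totals[c] += weight * probs[nb].get(c, 0.0)
    if z == 0.0:
        # No informative neighbors: fall back to a uniform distribution.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: totals[c] / z for c in classes}


# Toy example: node "e" has two labeled neighbors with edge weights 2 and 1.
neighbors = {"e": {"a": 2, "b": 1}}
probs = {"a": {"x": 1.0, "y": 0.0}, "b": {"x": 0.0, "y": 1.0}}
estimate = wvrn_estimate("e", neighbors, probs, ["x", "y"])
```

In the toy example the heavier edge pulls the estimate toward class "x" (2/3 versus 1/3), which is the homophily assumption at work.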
Network Classification
• Collective inference using relaxation labeling
  • Definition of collective inference:
  • Similar to, but different from, Gibbs sampling in that it:
    • Keeps track of class probability estimates for X^U
    • Instead of updating the graph one node at a time, updates the class probabilities of all vertices at iteration t+1 based on the estimates from iteration t.
17
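The synchronous update described above can be sketched as follows. This is a simplified illustration that uses a weighted-neighbor average as the relational classifier and omits any decay or annealing schedule a full implementation may apply; the function name, graph layout, and toy graph are assumptions made for the example.

```python
def relaxation_labeling(neighbors, known, classes, prior, iterations=50):
    """Synchronous relaxation labeling with a weighted-neighbor average.

    neighbors: {node: {neighbor: weight}}; every node must appear as a key
    known:     {node: class} for nodes whose labels are given
    prior:     {class: probability} used to initialize unknown nodes
    """
    est = {}
    for node in neighbors:
        if node in known:
            est[node] = {c: 1.0 if c == known[node] else 0.0 for c in classes}
        else:
            est[node] = dict(prior)

    for _ in range(iterations):
        new_est = {}
        for node, nbrs in neighbors.items():
            z = sum(nbrs.values())
            if node in known or z == 0:
                new_est[node] = est[node]  # known or isolated: unchanged
                continue
            # Every update reads only the estimates from the previous step.
            new_est[node] = {
                c: sum(w * est[nb][c] for nb, w in nbrs.items()) / z
                for c in classes
            }
        est = new_est  # all nodes switch to the new estimates at once
    return est


# Toy chain a - u1 - u2 - b, with a labeled "x" and b labeled "y".
graph = {
    "a": {"u1": 1},
    "u1": {"a": 1, "u2": 1},
    "u2": {"u1": 1, "b": 1},
    "b": {"u2": 1},
}
result = relaxation_labeling(
    graph, {"a": "x", "b": "y"}, ["x", "y"], {"x": 0.5, "y": 0.5}
)
```

On the toy chain the estimates settle near P(x) = 2/3 for u1 and 1/3 for u2, since each unknown node leans toward its nearer labeled neighbor.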
Network Classification: Experiment
• Experiment
  • Rationale: Participants who are "homophilous" (who share a common background) have common interests.
  • Hypothesis: Given a set of common interests between pairs of participants, one can predict the homophily measures with good accuracy.
  • Input graph
    • Nodes: Participants
    • Attributes: Addiction, Education, Income
    • Edges: Edge weight is the number of news stories + success stories + educational modules that both nodes (connected via the edge) have viewed in common.
  • Predicted attribute: Addiction
18
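The edge-weight definition above can be sketched as a set intersection over the items each participant has viewed. The participant ids, item ids, and function name below are hypothetical, and the sketch assumes that a pair with no shared items gets no edge, which the slides do not state.

```python
from itertools import combinations


def build_edges(views):
    """Return {(a, b): weight}, where weight counts items both have viewed.

    views: {participant: set of viewed item ids}, covering news stories,
    success stories, and educational modules.
    """
    edges = {}
    for a, b in combinations(sorted(views), 2):
        shared = len(views[a] & views[b])
        if shared > 0:  # assumption: no shared items means no edge
            edges[(a, b)] = shared
    return edges


# Made-up viewing data for three participants.
views = {
    "p1": {"news:1", "success:4", "module:2"},
    "p2": {"news:1", "module:2"},
    "p3": {"success:9"},
}
edges = build_edges(views)
```

Here p1 and p2 share two items, so they are joined by an edge of weight 2, while p3 stays unconnected.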
Network Classification: Experiment Config
• Possible experiment configurations
  • Non-relational classifier: None
  • Relational classifier: (options)
    • Weighted-vote relational neighbor
    • Class-distributional relational neighbor
  • Collective inference: (options)
    • Relaxation labeling
    • Gibbs sampling
    • Iterative classification
• Data: Nodes and edges extracted from experiment 1 replicate 2 (E1R2) participant interactions.
• Goal: Predict 1) Primary Addiction (given graph); 2) Education (given graph); 3) Income bracket (given graph)
19
Network Classification: Experiment
• E1R2 data statistics
  • # of nodes: 256; # of edges: 436
[Figure: bar chart "Primary substance breakdown among 256 participants"; x-axis: primary substance; y-axis: frequency. Bar values (category names not recoverable): 30, 139, 41, 118 17, 1 7 1 10]
20
Network Classification: Experiment
• Experiment E1R2 data statistics
  • Edge weight breakdown
[Figure: bar chart "Edge weight breakdown"; x-axis: edge strength; y-axis: frequency.]
Edge strength | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 11 | 12
Frequency | 317 | 66 | 24 | 11 | 7 | 5 | 1 | 3 | 2 | 1
21
Network Classification: Experiment Results
• Network classification framework results (experiment configurations given as row/column names; metric: accuracy)
  • Goal: Predict "Primary Addiction" of participants
Relational classifier \ Collective inference | Relaxation labeling | Gibbs sampling | Iterative classification
Weighted-vote relational neighbor (wvRN) | 0.36601 | 0.37908 | 0.39216
Class-distributional relational neighbor | 0.15686 | 0.22222 | 0.18954
22
Network Classification: Experiment Results
• Predicted response/class = Primary Addiction
• Configuration: wvRN with relaxation labeling
• Confusion matrix
23
Network Classification: Experiment Results
• Predicted response/class = Education
• Configuration: wvRN with relaxation labeling
• Confusion matrix
24
Network Classification: Experiment Results
• Predicted response/class = Income
• Configuration: wvRN with relaxation labeling
• Confusion matrix
25
Network Classification: Experiment Conclusion
• Conclusion
  • The highest accuracy across all experiment configurations for predicting primary addiction, as shown on slide 22, is 0.392.
  • The confusion matrices for predicting primary addiction, education, and income show more detail on the accuracy of predicting each class.
  • The accuracy is low.
    • This is probably due to the fact that our experiment configuration does NOT include a non-relational component.
    • Furthermore, our graph edges and attributes have only 1-3 fields. The graph needs to be denser, with a lot more information, to be used for network-based inference.
26
Network Classification: Next Steps
• Possible extensions of the work:
  • Build the graph with a different representation of edges
  • Construct more attributes of the node for the non-relational (local) classifier step
  • Try experiments with priors learnt from various traditional classification models.
• Problem/Challenge
  • Extension or further work is open-ended.
  • Part of doctoral work: Build a logical flowchart of inquiries/hypotheses.
  • The logical flowchart of inquiries can be used and called upon based on the user's line of inquiry.
27
Learning via Markov Logic Networks
• A Markov Logic Network (MLN) is a set of pairs (F, w) where
  • F is a formula in first-order logic
  • w is a real number
• Together with a set of constants, it defines a Markov network with
  • One node for each grounding of each predicate in the MLN
  • One feature for each grounding of each formula F in the MLN, with the corresponding weight w
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
28
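The "one node for each grounding of each predicate" rule can be sketched by enumerating every way to fill a predicate's arguments with constants. The predicates and constants below are those of the deck's running example (Smokes, Cancer, Friends over A and B); the function name and the arity dictionary are illustrative choices, not part of any MLN library.

```python
from itertools import product


def ground_atoms(predicates, constants):
    """Enumerate every grounding of every predicate.

    predicates: {name: arity}
    constants:  list of constant symbols
    """
    atoms = []
    for name, arity in predicates.items():
        # Each argument position can take any constant.
        for args in product(constants, repeat=arity):
            atoms.append(f"{name}({','.join(args)})")
    return atoms


atoms = ground_atoms({"Smokes": 1, "Cancer": 1, "Friends": 2}, ["A", "B"])
```

With two constants, the two unary predicates contribute 2 atoms each and the binary predicate contributes 4, giving the 8 nodes of the ground network.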
Learning via Markov Logic Networks
Two constants: Anna (A) and Bob (B)
Ground atoms (nodes): Smokes(A), Smokes(B), Cancer(A), Cancer(B)
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
29
Learning via Markov Logic Networks
Ground atoms (nodes): Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
30
Learning via Markov Logic Networks
Ground atoms (nodes): Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)
Probability of a world x:
$$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)$$
where w_i is the weight of formula i, n_i(x) is the number of true groundings of formula i in x, and Z is the normalizing constant.
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
32
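The probability of a world can be computed by brute force for the small Anna/Bob network, since 8 ground atoms give only 256 worlds. The two formulas and their weights below are assumptions chosen for illustration (they do not appear in this transcript): Smokes(x) ⇒ Cancer(x) with weight 1.5, and Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y)) with weight 1.1.

```python
import math
from itertools import product

CONSTANTS = ["A", "B"]
ATOMS = (
    [f"Smokes({c})" for c in CONSTANTS]
    + [f"Cancer({c})" for c in CONSTANTS]
    + [f"Friends({a},{b})" for a, b in product(CONSTANTS, repeat=2)]
)

# Assumed formulas and weights, for illustration only.
W_SMOKES_CANCER = 1.5  # Smokes(x) => Cancer(x)
W_FRIENDS = 1.1        # Friends(x,y) => (Smokes(x) <=> Smokes(y))


def n_true(world):
    """Count the true groundings of each formula in a world."""
    n1 = sum(
        (not world[f"Smokes({x})"]) or world[f"Cancer({x})"]
        for x in CONSTANTS
    )
    n2 = sum(
        (not world[f"Friends({x},{y})"])
        or (world[f"Smokes({x})"] == world[f"Smokes({y})"])
        for x, y in product(CONSTANTS, repeat=2)
    )
    return n1, n2


def unnormalized(world):
    """exp(sum_i w_i * n_i(x)) for one world."""
    n1, n2 = n_true(world)
    return math.exp(W_SMOKES_CANCER * n1 + W_FRIENDS * n2)


def all_worlds():
    """Yield every truth assignment to the ground atoms."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


# Partition function: sum of unnormalized scores over all 256 worlds.
Z = sum(unnormalized(w) for w in all_worlds())


def probability(world):
    return unnormalized(world) / Z


all_false = {atom: False for atom in ATOMS}
smoker_without_cancer = dict(all_false, **{"Smokes(A)": True})
```

A world that violates one grounding of a formula is less probable by a factor of exp(-w) than an otherwise identical world that satisfies it, rather than being impossible as it would be in pure first-order logic.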
Learning via Markov Logic Networks
Tasks/Applications
• Basics
• Logistic regression
• Hypertext classification
• Information retrieval
• Entity resolution
• Hidden Markov models
• Information extraction
• Statistical parsing
• Semantic processing
• Bayesian networks
• Relational models
• Robot mapping
• Planning and MDPs
• Practical tips
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
33
Future Work
• Next steps
  • Extract more attributes for each participant
  • Compile different ways to represent edge weight
  • Build a local classifier and test results for NetKit-SRL
  • Use Alchemy to represent the data using Markov Logic Networks.
34
Questions?
Network Classification
• Other works
  • Inductive logic programming
  • Markov random fields
  • Conditional random fields
  • Probabilistic relational models
  • Relational Bayesian networks
  • Relational dependency networks
  • Relational Markov networks
36