Bill Rand, Assistant Professor of Business Management
Poole College of Management, North Carolina State University
An Introduction to Agent-Based Modeling
Unit 7: Verification, Validation, and Replication
Verification vs. Validation
Verification is the process of making sure your conceptual model matches your implemented model. Validation is the process of making sure your implemented model corresponds to the real world.
Verification
• Communication is essential to verification
• Often the model developer and the model author are not the same person
• Reducing/eliminating the gap between developer and author improves verification
• We need a common language to describe conceptual models that is recognized by both authors and developers
Rigor in Verification (Rand & Rust, 2011, IJRM)
Documentation – The conceptual design and the implemented model should be documented.
Programmatic Testing – Testing of the code of the model.
  Unit Testing – Each unit of functional code is separately tested.
  Code Walkthroughs – The code is examined in a group setting.
  Debugging Walkthroughs – Execution of the code is stepped through.
  Formal Testing – Proof of verification using formal logic.
Test Cases and Scenarios – Without using data, model functions are examined to see if they operate according to the conceptual model.
  Corner Cases – Extreme values are examined to make sure the model operates as expected.
  Sampled Cases – A subset of parameter inputs is examined to discover any aberrant behavior.
  Specific Scenarios – Specific inputs for which the outputs are already known.
  Relative Value Testing – Examining the relationship between inputs and outputs.
A Voting Model
Imagine that you are approached by a group of political scientists who know nothing about writing agent-based models, but want to develop a model about how people choose which candidate they will vote for. They assume people have some initial candidate in mind, but as they talk to their neighbors they may change their mind.
Flowcharts
Pseudocode
Voters have color = {red, green}
For each voter
  set vote = random(1, 0)
Loop until election
  For each voter
    If vote = 1 then color = red
    Else color = green
    If count of neighbors' votes >= 4 and vote = 0 then vote = 1
    Else if count of neighbors' votes <= 4 and vote = 1 then vote = 0
  Display count of voters with vote = 1
  Display count of voters with vote = 0
End loop
Verification Testing
• If we have all agreed on the conceptual model, we can now start implementing the model
• However, we want to make sure that the model is doing what we think it should be doing
• Need to implement tests
• Often good practice to write the test before you write the code
Component Testing
• Test individual aspects of the model that should have particular behavior
• Allows you to add new mechanisms, and then rerun the tests to see if the code has broken
• Component tests can be intended to be run on- or off-line
Component Testing
• We can start by writing code to make sure that our setup routine is distributing votes evenly

to check-setup
  let diff abs (count patches with [vote = 0] - count patches with [vote = 1])
  if diff > .1 * count patches
    [ print "Difference in initial voters is greater than 10%." ]
end
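A Python analogue of this component test might look like the following; the `check_setup` helper is hypothetical, mirroring the NetLogo procedure above.

```python
def check_setup(votes, tolerance=0.1):
    """Component test: verify that setup distributed votes roughly evenly.

    `votes` is a flat list of 0/1 votes; the test passes when the counts
    of 0-voters and 1-voters differ by no more than `tolerance` (here 10%)
    of the total number of voters.
    """
    zeros = votes.count(0)
    ones = votes.count(1)
    if abs(zeros - ones) > tolerance * len(votes):
        print("Difference in initial voters is greater than 10%.")
        return False
    return True
```

Returning a boolean (rather than only printing) makes the check easy to rerun automatically after each new mechanism is added.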
Sensitivity Analysis
• Sensitivity analysis – examining how sensitive a result is to a particular parameter of the model
• Related to verification because we often have expertise about how much a parameter should affect an outcome
• For example, examine the voting model to explore how the percentage of initial voters affects the outcome
[Figure: Initial Green vs. Final Green — final percentage of green patches (y-axis, -20 to 100) plotted against the initial percentage of green patches (x-axis, 25 to 75).]
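A sensitivity-analysis sweep like the one plotted above can be sketched in Python. The `run_model` function here is a toy stand-in (a one-dimensional ring of voters adopting the local majority), not the course's NetLogo model; the sweep-and-replicate pattern is the point.

```python
import random

def run_model(initial_green_pct, n=100, steps=50, seed=None):
    """Toy stand-in for one model run: on a ring of n voters, each step
    every voter adopts the majority vote of itself and its two neighbors.
    Returns the final percentage of green (vote = 1) voters."""
    rng = random.Random(seed)
    votes = [1 if rng.random() * 100 < initial_green_pct else 0
             for _ in range(n)]
    for _ in range(steps):
        votes = [1 if votes[i - 1] + votes[i] + votes[(i + 1) % n] >= 2 else 0
                 for i in range(n)]
    return 100 * sum(votes) / n

def sweep(percentages, replicates=10):
    """Sensitivity analysis: mean final outcome for each initial setting,
    averaged over several replicate runs to smooth out stochasticity."""
    return {p: sum(run_model(p, seed=r) for r in range(replicates)) / replicates
            for p in percentages}
```

Plotting the result of `sweep(range(25, 80, 5))` would produce a curve analogous to the Initial Green vs. Final Green figure.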
Lack of Verification
• Just because a model does not "verify" does not mean it is wrong
• There are several causes for model problems:
1. Bugs
2. Miscommunication
3. The result might also not correspond to your conceptions, but still be a correct result -- emergence
Verification Benefits
• Aids in model understanding
• Even though we have a model, where micro-rules lead to macro-outcomes, we need to understand how and why that happens, i.e., we need a generative explanation
• As models become increasingly complicated, verification is difficult, so verify early and often!
• Important to realize that verification is not a dichotomy but a continuum; you can always verify more
Validation
• Ensuring that the implemented model corresponds to reality
• All models are simplifications of reality
• "All models are wrong but some are useful." - George E. P. Box
Macro- vs. Micro-Results
• Macro-Validation
  • Comparing aggregate results at many different levels
  • Are the patterns of adoption similar?
• Micro-Validation
  • Comparing individual rules
  • Do consumers use the same information when deciding whether or not to adopt?
  • Comparing individual properties
  • Are the properties similar?
Face Validation vs. Empirical Validation
• Face Validation
  • Do the general ideas about behavior and properties compare to real-world phenomena?
• Empirical Validation
  • Does data from the model correspond to real-world data?
Rigor in Validation (Rand & Rust, 2011, IJRM)
Micro-Face Validation – Showing that the elements of the implemented model correspond to real-world elements.
Macro-Face Validation – Showing that the processes and patterns of the implemented model correspond "on face" to real-world processes and patterns.
Empirical Input Validation – Showing that the data used as inputs to the model correspond to real-world data and facts.
Empirical Output Validation – Showing that the output of the model corresponds to real-world data and facts.
  Stylized Facts / Subject Matter Experts – Generally known patterns of behavior that are important to reproduce in the model.
  Real World Data – Recreating real-world results using the model.
  Cross-Validation – Comparing the new model to a previous model that has already been shown to be valid.
Bass Model Comparison
Validation and Calibration (Stonedahl and Rand, 2012, WCSS)
[Figure: calibration fits the parameters P of model M against training data Rtrain from the real world R in environment Etrain; validation then compares the calibrated model against testing data Rtest in environment Etest.]
Calibration as Search
Calibration is finding some set of parameters P* such that it minimizes an error function ε(Rtrain, M(P*, Etrain))
Validation can then be carried out by assessing the error function ε(Rtest, M(P*, Etest))
Machine learning can be used for the search process, but what error function to use?
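Calibration as search can be illustrated with a simple grid search in Python. The one-parameter linear `model`, the squared-error `error` function, and the data values below are all hypothetical; the structure — minimize ε on training data, then assess ε on held-out testing data — follows the slide.

```python
def model(p, env):
    """Hypothetical model M: a one-parameter curve evaluated at the
    points in the environment `env`."""
    return [p * t for t in env]

def error(real, simulated):
    """Squared-error function epsilon comparing real and model output."""
    return sum((r - s) ** 2 for r, s in zip(real, simulated))

def calibrate(candidates, r_train, e_train):
    """Calibration as search: return P* minimizing epsilon on the
    training data (here by exhaustive grid search over candidates)."""
    return min(candidates, key=lambda p: error(r_train, model(p, e_train)))

# Calibrate on training data, then validate on held-out testing data.
e_train, r_train = [1, 2, 3], [2.1, 3.9, 6.2]
e_test, r_test = [4, 5], [8.3, 9.8]
p_star = calibrate([1.0, 1.5, 2.0, 2.5], r_train, e_train)
validation_error = error(r_test, model(p_star, e_test))
```

In practice the grid search would be replaced by a machine-learning search method (e.g., a genetic algorithm, as in tools like BehaviorSearch), but the objective being minimized is the same.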
Error Measures
• Corr – Pearson Correlation Coefficient
• L0 – Simply count the number of times the two results differ
• L1 – Manhattan Distance
• L2 – Euclidean Distance
• Linf – Chebyshev / Maximum Distance
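The error measures listed above might be implemented as follows, as a straightforward sketch over paired numeric series:

```python
import math

def l0(a, b):
    """L0: count of positions where the two series differ."""
    return sum(x != y for x, y in zip(a, b))

def l1(a, b):
    """L1: Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """L2: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linf(a, b):
    """Linf: Chebyshev / maximum distance."""
    return max(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Corr: Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

Note the measures behave differently: L0 ignores the size of each discrepancy, Linf ignores everything but the worst one, and correlation ignores scale entirely — which is why the choice of error function matters for calibration.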
BehaviorSearch
Available at: www.behaviorsearch.org
Stochasticity and Validation
• Stochasticity means that you cannot compare one set of results from the real world and one set of model results
• Identifying the appropriate comparison measure can be difficult at times and often is determined in part by the field and the type of data
• Common measures include any of the error measures discussed above and more
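One common response to stochasticity is to run many replicates and compare a distributional summary of the model's output, rather than any single run, against the real-world data. A sketch, where `model_run` is a hypothetical stochastic model:

```python
import random
import statistics

def model_run(seed):
    """Hypothetical stochastic model run: returns one summary outcome
    (here a noisy value around 50, standing in for any model output)."""
    rng = random.Random(seed)
    return 50 + rng.gauss(0, 5)

def replicate_summary(n_runs=100):
    """Run many replicates and summarize the outcome distribution;
    the model is compared to reality through this distribution,
    not through any single stochastic run."""
    outcomes = [model_run(seed) for seed in range(n_runs)]
    return statistics.mean(outcomes), statistics.stdev(outcomes)
```

The summary statistics (or the full distribution of outcomes) can then be fed into whichever error measure the field and the data call for.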
Variant and Invariant Results
• Sometimes it is useful to break a model's results up into variant vs. invariant results
• Examining model outcomes that occur all the time vs. model outcomes that only occur some of the time
• For example, diffusion will always peak before time t, but 50% of the time it peaks before time t-n
• This can help to understand the stochasticity of your model
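Variant vs. invariant analysis can be sketched as a check over many replicate outcomes: an outcome predicate that holds on every run is invariant, while one that holds on only some runs is variant, with an observed frequency. The `classify_results` helper below is illustrative:

```python
def classify_results(runs, predicates):
    """Split named outcome predicates into invariant results (true on
    every run) and variant results (true on only some runs, reported
    with the fraction of runs on which they held)."""
    invariant, variant = [], []
    for name, pred in predicates.items():
        hits = sum(1 for r in runs if pred(r))
        if hits == len(runs):
            invariant.append(name)
        elif hits > 0:
            variant.append((name, hits / len(runs)))
    return invariant, variant
```

For example, over replicate peak times one could test "peaks before t" (expected invariant) against "peaks before t - n" (expected variant, holding roughly half the time).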
Path Dependency
• Variant vs. invariant analysis can help to understand path dependency
• Path dependency is when events that occur early in a model run greatly affect the outcome
• If a model exhibits a large number of variant outcomes, that may be due to path dependency or traditional stochasticity
Brown et al., 2005, IJGIS
Benefits and Issues of Validation
• Valid models help us to understand the world around us
• Help us to understand future possibilities about how things might unfold (i.e., flight simulator)
• Not a dichotomy but a continuum
• Validation can become a deeply philosophical issue, but the important thing to remember is that a model is valid if it meets the validity standards set by the expected audience
Replication (see Wilensky and Rand, 2007, "Making Models Match", JASSS)
Replication is the implementation (replicated model) by one scientist or group of scientists (model replicators) of a conceptual model described and already implemented (original model) by a scientist or group of scientists at a previous time (model builders).
Replication is a foundational concept within the scientific process: results must be replicated to be considered part of the scientific knowledge.
Dimensions of Replication
Replication occurs across many different dimensions:
1. Time
2. Hardware
3. Languages
4. Toolkits
5. Algorithms
6. Authors
Replication Standard
Before undertaking a replication, it is important to establish a replication standard (RS).
An RS is a criterion that specifies when a successful replication has occurred.
Three general categories of RS (Axtell et al., 1996):
Numerical Identity
Distributional Equivalence
Relational Alignment
Issues for Model Replicators
Categories of Replication Standards: Numerical Identity, Distributional Equivalence, Relational Alignment
Focal Measures: Identify particular measures used to meet the goal
Level of Communication: None; Brief email contact; Rich discussion and personal meetings
Familiarity with Language / Toolkit of Original Model: None; Surface understanding; Have built other models in this language / toolkit
Examination of Source Code: None; Referred to for particular questions; Studied in-depth
Exposure to Original Implemented Model: None; Run; Re-ran original experiments; Ran experiments other than original ones
Exploration of Parameter Space: Only examined results from original paper; Examined other areas of the parameter space
Issues for Model Authors
Level of Detail of Conceptual Model: Textual description; Pseudo-code
Specification of Details of the Model: Order of events; Random vs. non-random activation
Model Authorship / Implementation: Who designed the model, who implemented the model, and how to contact them
Availability of Model: Results beyond those in the paper available; Binary available; Source code available
Sensitivity Analysis: None; Few key parameters varied; All parameters varied; Design-of-experiment analysis
Benefits of Replication
The replicator gains knowledge
Increases shared understanding
Improved verification: may point out problems in the original model
Improved validation: replicators are forced to consider the differences between the model and the real world
Unit 7 Overview
• Verification
  • Model Descriptions
  • Component Testing
  • Sensitivity Analysis
• Validation
  • Micro- vs. Macro-
  • Face vs. Empirical
  • Calibration
  • Stochasticity
• Replication
• Unit 7 Slides
• Course Feedback
• Unit 7 Test