Assessing the Frequency of Empirical Evaluation in Software Modeling Research
Workshop on Experiences and Empirical Studies in Software Modelling (EESSMod)
October 17, 2011
Jeffrey C. Carver, Eugene Syriani and Jeff Gray (presenter)
University of Alabama
Department of Computer Science
{carver, esyriani, gray}@cs.ua.edu
Carver, Syriani, Gray Empirical Evaluation at MoDELS
Background
• Many creative modeling ideas
• Impression that the field has not followed the traditional scientific method
• Most new techniques are not (thoroughly) evaluated
• Investigate the prevalence of this phenomenon
  – Considered MODELS papers from 2006-2010
  – Also considered papers from an empirical software engineering conference (ESEM)
Background: Empirical Studies
• The understanding of a discipline evolves over time
• We get more sophisticated in our methods
• We are able to test hypotheses and prove or disprove them
• The empirical paradigm has been used in many other fields, e.g., physics, medicine, manufacturing
Understanding a Discipline
• Building Models: application domain, workflows, problem-solving processes
• Checking Understanding: testing models, experimenting in the real world
• Analyzing Results: learn, encapsulate knowledge, and refine models
• Evolving Models
(“Models” is used more generally on this slide.)
Empirical Studies: Misconceptions
• Empirical studies are not “one-shot deals.” Studies on live development projects are not the only ones that matter.
• Software engineering is a laboratory science
  – Understanding our discipline involves
    • Observation, reflection, model building, experimentation
    • Followed by iteration
  – Symbiotic nature of research and development
    • Research needs laboratories to observe & manipulate variables
    • Development needs to understand how to build systems better
Empirical Studies: Misconceptions
• Overall purpose: not a yes/no certification of a technology, but to assist in its evolution, find the appropriate environment for it, and yield insights and answers
• “We ran a study of technology X and now we know…”
  – Technology X doesn’t work (NO)
  – Technology X performed worse than technology Y in our environment (YES)
• “Environment” includes people & their expertise, project goals, etc.
• Measuring performance implies we decided on some metric that we felt was an important indicator
  – No solution is really expected to be better for all users under all conditions
Empirical Studies: Outputs
• An empirical study can help provide information of interest to teams that might eventually adopt a technology:
  – Does it work better for certain types of people?
    • Novices: it’s a good solution for training
    • Experts: users need certain background knowledge…
  – Does it work better for certain types of systems?
    • Static/dynamic aspects, complexity
    • Familiar/unfamiliar domains
  – Does it work better in certain development environments?
    • Users [did/didn’t] have the right documentation, knowledge, amount of time, etc. to use it
(Shull, 2004)
Our Objective and Methodology
• Goal: Determine how many recent modeling papers included some type of empirical evaluation of their claims
• Three-step methodology
  – Develop an initial characterization scheme
  – Identify candidate papers
  – Review candidate papers and finalize the characterization
Characterization Scheme
Type                                  Empirical Evaluation  Involved Human Participants  Comparison Against Other Methods
1. No evaluation                      No                    No                           No
2. Non-human, proposed tool only      Yes                   No                           No
3. Non-human, comparison              Yes                   No                           Yes
4. Human observation                  Yes                   Yes                          No
5. Human-based controlled experiment  Yes                   Yes                          Yes
Formative Case Studies: papers that gather information about the use of a technique in practice
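The scheme amounts to a small decision procedure over three yes/no attributes. The Python sketch below encodes one reading of it, taking types 1-5 as the attribute combinations listed in the scheme; `PaperAssessment`, `EvaluationType`, and `classify` are hypothetical names introduced for illustration, not taken from the paper. Formative case studies form a separate category and are not modeled here.

```python
from dataclasses import dataclass
from enum import Enum


class EvaluationType(Enum):
    """The five numbered types in the characterization scheme."""
    NO_EVALUATION = 1
    NON_HUMAN_TOOL_ONLY = 2
    NON_HUMAN_COMPARISON = 3
    HUMAN_OBSERVATION = 4
    HUMAN_CONTROLLED_EXPERIMENT = 5


@dataclass(frozen=True)
class PaperAssessment:
    """Hypothetical reviewer answers for one paper (names are illustrative)."""
    empirical_evaluation: bool
    involved_humans: bool
    compared_to_other_methods: bool


def classify(p: PaperAssessment) -> EvaluationType:
    """Map the three yes/no attributes to a numbered type in the scheme."""
    if not p.empirical_evaluation:
        # Human participants or a comparison only make sense inside an evaluation.
        if p.involved_humans or p.compared_to_other_methods:
            raise ValueError("humans or a comparison imply an empirical evaluation")
        return EvaluationType.NO_EVALUATION
    if not p.involved_humans:
        if p.compared_to_other_methods:
            return EvaluationType.NON_HUMAN_COMPARISON
        return EvaluationType.NON_HUMAN_TOOL_ONLY
    if p.compared_to_other_methods:
        return EvaluationType.HUMAN_CONTROLLED_EXPERIMENT
    return EvaluationType.HUMAN_OBSERVATION
```

For example, `classify(PaperAssessment(True, False, True))` returns `EvaluationType.NON_HUMAN_COMPARISON`: the paper evaluated its tool against other methods without human participants. Rejecting evaluation-free papers that claim humans or a comparison is a design choice of this sketch, not a rule stated in the scheme.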
Results
Year   Total  No Eval    Non-Human       Non-Human    Human-Based   Human-Based             Formative
                         No Comparison   Comparison   Observation   Controlled Experiment   Case Study
2006   51     42 (82%)   6 (12%)         0 (0%)       1 (2%)        1 (2%)                  1 (2%)
2007   45     36 (80%)   2 (4%)          5 (11%)      0 (0%)        2 (4%)                  0 (0%)
2008   58     39 (67%)   8 (14%)         2 (3%)       2 (3%)        4 (7%)                  3 (5%)
2009   58     45 (78%)   5 (9%)          2 (3%)       2 (3%)        1 (2%)                  3 (5%)
2010   54     33 (61%)   8 (15%)         4 (7%)       2 (4%)        4 (7%)                  3 (6%)
Total  266    195 (73%)  29 (11%)        13 (5%)      7 (3%)        12 (5%)                 10 (4%)
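As a consistency check, the counts in the results table can be re-summed and re-rounded. The Python sketch below transcribes those counts (the `COUNTS` layout and the function names are our own) and assumes each paper was assigned exactly one category, which the per-year totals are consistent with.

```python
# Counts transcribed from the MODELS results table, 2006-2010.
# Column order: No Eval, Non-Human/No Comparison, Non-Human/Comparison,
# Human-Based/Observation, Human-Based/Controlled Experiment, Formative Case Study.
COUNTS = {
    2006: (51, [42, 6, 0, 1, 1, 1]),
    2007: (45, [36, 2, 5, 0, 2, 0]),
    2008: (58, [39, 8, 2, 2, 4, 3]),
    2009: (58, [45, 5, 2, 2, 1, 3]),
    2010: (54, [33, 8, 4, 2, 4, 3]),
}


def percentages(counts, total):
    """Round each category's share of the total to a whole percent.

    Python's round() sends exact halves to the even neighbor; none of the
    shares in this table fall exactly on .5, so that never matters here.
    """
    return [round(100 * c / total) for c in counts]


def column_totals(table):
    """Sum each category across years, checking that every row adds up."""
    grand_total = 0
    sums = [0] * 6
    for year, (total, counts) in table.items():
        # Assumption: one category per paper, so a row must sum to its total.
        if sum(counts) != total:
            raise ValueError(f"{year}: categories sum to {sum(counts)}, not {total}")
        grand_total += total
        sums = [s + c for s, c in zip(sums, counts)]
    return grand_total, sums


if __name__ == "__main__":
    grand_total, sums = column_totals(COUNTS)
    print(grand_total, sums, percentages(sums, grand_total))
```

Run as a script, it prints `266 [195, 29, 13, 7, 12, 10] [73, 11, 5, 3, 5, 4]`. For instance, the 10 formative case studies are about 3.8% of 266 papers, which rounds to 4%.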
Results – Summary from 2006-2010
[Chart: No Evaluation 73%, No Comparison 11%, Comparison 5%, Observation 3%, Controlled Experiment 5%, Formative Case Study 4%]
17%
Results - Trends
[Chart: percentage of papers in each category (No Evaluation, No Comparison, Comparison, Observation, Controlled Experiment, Formative Case Study) per year, 2006-2010; y-axis 0%-90%]
Results: Human-Based Controlled Experiments
• Total of 12 in 5 years! There should be more
• Observations
  – Generally, a low level of detail was reported
  – Most had fewer than 25 participants
    • 2 had over 50; 1 did not even report the number
  – Most participants were undergraduate students
  – General misunderstanding in many papers, equating “discussion” with “evaluation”
Results: Formative Case Studies
• Total of 10; we need to see more
• 4 did not involve humans
  – Analyzed existing source code to understand how various modeling tools would or would not work
• 6 involved humans
  – Surveys to understand how existing tools were not meeting developer needs
• Generally, a study of output requirements for needed tools
Results: ESEM Focus
• The ESEM conference has three types of papers: Regular Papers, Short Papers, and Posters
• Across the same 5-year period, we found only 17 modeling papers
  – Of those 17 papers, only 4 were Regular Papers (10 pages, IEEE or ACM format) out of 178 Regular candidates
  – 10 were Short Papers (4 pages) out of a total of 118 Short Papers
  – 3 of the papers were Poster summaries
• Even within the empirical area, modeling papers are not well represented (typically, just short papers)
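To put the ESEM counts in proportion, the short Python sketch below computes the modeling share of each track. It assumes that the "178 Regular candidates" are the total number of Regular Papers in the period; no total was given for Posters, so that track is only counted, not divided. The dictionary and function names are illustrative.

```python
# Counts reported for ESEM, 2006-2010. Assumption: 178 is the total number
# of Regular Papers in the period; no Poster total was reported.
MODELING_PAPERS = {"regular": 4, "short": 10, "poster": 3}
TOTAL_PAPERS = {"regular": 178, "short": 118}


def modeling_share(track: str) -> float:
    """Percentage of a track's papers that were about modeling."""
    return 100 * MODELING_PAPERS[track] / TOTAL_PAPERS[track]


if __name__ == "__main__":
    print(sum(MODELING_PAPERS.values()))  # all modeling papers across tracks
    for track in TOTAL_PAPERS:
        print(track, f"{modeling_share(track):.1f}%")
```

Under that assumption, the 4 + 10 + 3 = 17 modeling papers amount to roughly 2.2% of Regular Papers and 8.5% of Short Papers.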
Conclusions
• Summary:
  – Rigor of empirically validated research in software modeling is weak
  – Very large percentage of papers with no evaluation
  – Did not include technical reports or extended journal versions of papers
  – Plan to repeat the analysis with SoSym
  – Would like to push the community to conduct more empirical evaluations
  – Paper has URLs pointing to the data from our observations
• Recommendations:
  – Team up with empirical researchers
  – Venues need to provide additional space for reporting empirical results (e.g., 2 extra pages for papers that have a clear evaluation)
Questions or comments?