Assessing the Frequency of Empirical Evaluation in Software Modeling Research

Assessing the Frequency of Empirical Evaluation in Software Modeling Research. Workshop on Experiences and Empirical Studies in Software Modelling (EESSMod), October 17, 2011. Jeffrey C. Carver, Eugene Syriani and Jeff Gray (presenter), University of Alabama, Department of Computer Science, {carver, esyriani, gray}@cs.ua.edu

Page 1: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research

Assessing the Frequency of Empirical Evaluation in Software Modeling Research

Workshop on Experiences and Empirical Studies in Software Modelling (EESSMod)

October 17, 2011

Jeffrey C. Carver, Eugene Syriani and Jeff Gray (presenter)

University of Alabama
Department of Computer Science
{carver, esyriani, gray}@cs.ua.edu

Page 2: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Carver, Syriani, Gray Empirical Evaluation at MoDELS

Background

• Many creative modeling ideas
• Impression that the field has not followed the traditional Scientific Method
• Most new techniques are not (thoroughly) evaluated
• Investigate the prevalence of this phenomenon
  – Considered MODELS papers from 2006-2010
  – Also considered papers from empirical conference (ESEM)

Page 3: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Background: Empirical Studies

• The understanding of a discipline evolves over time
• We get more sophisticated in our methods
• We are able to test and prove or disprove hypotheses

The empirical paradigm has been used in many other fields, e.g., physics, medicine, manufacturing

[Diagram: Understanding a Discipline]
• Building Models: application domain, workflows, problem solving processes
• Checking Understanding: testing models, experimenting in the real world
• Analyzing Results: learn, encapsulate knowledge and refine models
• Evolving Models

Note: “models” is used more generally on this slide.

Page 4: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Empirical Studies: Misconceptions

• Empirical studies are not “one-shot deals.” Studies on live development projects are not the only ones that matter.

• Software engineering is a laboratory science
  – Understanding our discipline involves
    • Observation, reflection, model building, experimentation
    • Followed by iteration
  – Symbiotic nature of research and development
    • Research needs laboratories to observe & manipulate variables
    • Development needs to understand how to build systems better

Page 5: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Empirical Studies: Misconceptions

• Overall purpose

[Diagram labels: Yes/No certification of a technology; Assist in evolution; Find appropriate environment; Yield insights and answers]

• “We ran a study of technology X and now we know…”
  – Technology X doesn’t work (NO)
  – Technology X performed worse than technology Y in our environment (YES)
    • “Environment” includes people & their expertise, project goals, etc.
    • Measuring performance implies we decided on some metric that we felt was an important indicator
  – No solution is really expected to be better for all users under all conditions

Page 6: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Empirical Studies: Outputs

• Empirical study can help to provide information of interest to teams that might eventually adopt a technology:
  – Does it work better for certain types of people?
    • Novices: It’s a good solution for training
    • Experts: Users need certain background knowledge…
  – Does it work better for certain types of systems?
    • Static/dynamic aspects, complexity
    • Familiar/unfamiliar domains
  – Does it work better in certain development environments?
    • Users [did/didn’t] have the right documentation, knowledge, amount of time, etc… to use it

(Shull, 2004)

Page 7: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Our Objective and Methodology

• Goal: Determine how many recent modeling papers had some type of empirical evaluation of their claims

• Three-step methodology
  – Develop initial characterization scheme
  – Identify candidate papers
  – Review candidate papers and finalize characterization

Page 8: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Characterization Scheme

Type | Empirical Evaluation | Involved Human Participants | Comparison against other Methods
1. No evaluation | X | X | X
2. Non-human, proposed tool only | | X | X
3. Non-human, comparison | | X |
4. Human observation | | | X
5. Human-based Controlled Experiment | | |

(X = criterion not met)

Formative Case Studies: Papers gather information about use of technique in practice
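The characterization scheme can be read as a small decision procedure over three yes/no attributes of a paper. The sketch below is only an illustration under that reading: the names `EvalType` and `classify` are hypothetical, the mapping from attribute combinations to the five types is inferred from the type names rather than stated by the authors, and formative case studies are left out because the slide treats them as a separate category.

```python
from enum import Enum


class EvalType(Enum):
    """The five evaluation types named in the characterization scheme."""
    NO_EVALUATION = "1. No evaluation"
    NON_HUMAN_TOOL_ONLY = "2. Non-human, proposed tool only"
    NON_HUMAN_COMPARISON = "3. Non-human, comparison"
    HUMAN_OBSERVATION = "4. Human observation"
    HUMAN_CONTROLLED_EXPERIMENT = "5. Human-based controlled experiment"


def classify(evaluated: bool, humans: bool, comparison: bool) -> EvalType:
    """Map the three attributes to a type (assumed mapping, see lead-in).

    A paper without any empirical evaluation is type 1 regardless of the
    other two attributes.
    """
    if not evaluated:
        return EvalType.NO_EVALUATION
    if humans:
        # With human participants: comparison separates types 5 and 4.
        if comparison:
            return EvalType.HUMAN_CONTROLLED_EXPERIMENT
        return EvalType.HUMAN_OBSERVATION
    # Without human participants: comparison separates types 3 and 2.
    if comparison:
        return EvalType.NON_HUMAN_COMPARISON
    return EvalType.NON_HUMAN_TOOL_ONLY


if __name__ == "__main__":
    # A tool demonstrated on its own, with no users and no baseline.
    print(classify(evaluated=True, humans=False, comparison=False).value)
```

In practice a reviewer would still have to judge each attribute by reading the paper; the function only makes the bookkeeping explicit.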

Page 9: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Results

Year | Total | No Eval | Non-Human: No Comparison | Non-Human: Comparison | Human-Based: Observation | Human-Based: Controlled Experiment | Formative Case Study
2006 | 51 | 42 (82%) | 6 (12%) | 0 (0%) | 1 (2%) | 1 (2%) | 1 (2%)
2007 | 45 | 36 (80%) | 2 (4%) | 5 (11%) | 0 (0%) | 2 (4%) | 0 (0%)
2008 | 58 | 39 (67%) | 8 (14%) | 2 (3%) | 2 (3%) | 4 (7%) | 3 (5%)
2009 | 58 | 45 (78%) | 5 (9%) | 2 (3%) | 2 (3%) | 1 (2%) | 3 (5%)
2010 | 54 | 33 (61%) | 8 (15%) | 4 (7%) | 2 (4%) | 4 (7%) | 3 (6%)
Total | 266 | 195 (73%) | 29 (11%) | 13 (5%) | 7 (3%) | 12 (5%) | 10 (4%)
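The percentages in the results table can be recomputed from the raw counts. The following is a minimal sketch of that arithmetic; the counts are copied from the table, while the names `COUNTS`, `CATEGORIES`, `percentages` and `column_totals` are illustrative and not taken from the authors' data files.

```python
# Category order follows the columns of the results table.
CATEGORIES = [
    "No Eval",
    "Non-Human: No Comparison",
    "Non-Human: Comparison",
    "Human-Based: Observation",
    "Human-Based: Controlled Experiment",
    "Formative Case Study",
]

# Paper counts per year, copied from the table.
COUNTS = {
    2006: [42, 6, 0, 1, 1, 1],
    2007: [36, 2, 5, 0, 2, 0],
    2008: [39, 8, 2, 2, 4, 3],
    2009: [45, 5, 2, 2, 1, 3],
    2010: [33, 8, 4, 2, 4, 3],
}


def percentages(counts):
    """Share of each category as a whole-number percentage of the row total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


def column_totals(counts_by_year):
    """Sum each category over all years."""
    return [sum(column) for column in zip(*counts_by_year.values())]


if __name__ == "__main__":
    for year, row in COUNTS.items():
        print(year, sum(row), percentages(row))
    totals = column_totals(COUNTS)
    print("Total", sum(totals), dict(zip(CATEGORIES, percentages(totals))))
```

Because each percentage is rounded independently, a row's percentages need not sum to exactly 100.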

Page 10: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Results – Summary from 2006-2010

[Pie chart: No Evaluation 73%, No Comparison 11%, Comparison 5%, Observation 3%, Controlled Experiment 5%, Formative Case Study 4%. A separate callout on the chart reads 17%.]

Page 11: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Results - Trends

[Chart: percentage of papers per year, 2006 to 2010 (y-axis 0% to 90%), with one series per category: No Evaluation, No Comparison, Comparison, Observation, Controlled Experiment, Formative Case Study.]

Page 12: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Results: Human-Based Controlled Experiments

• Total of 12 in 5 years! Should be more
• Observations
  – Generally, low level of detail reported
  – Most had fewer than 25 participants
    • 2 had over 50, 1 did not even report the number
  – Most participants were undergraduate students
  – General misunderstanding in many papers by equating “discussion” to “evaluation”

Page 13: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Results: Formative Case Studies

• Total of 10, need to see more
• 4 did not involve humans
  – Analyze existing source code to understand how various modeling tools would/would not work
• 6 involved humans
  – Surveys to understand how existing tools were not meeting developer needs
• Generally, a study of output requirements for needed tools

Page 14: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Results: ESEM Focus

• The ESEM conference has three types of papers: Regular Papers, Short Papers, and Posters
• Across the same 5-year period, we found only 17 modeling papers
  – Of those 17 papers, only 4 were Regular Papers (10 pages, IEEE or ACM format) out of 178 Regular candidates
  – 10 were Short Papers (4 pages) out of a total of 118 Short Papers
  – 3 of the papers were Poster summaries
• Even within the empirical area, modeling papers are not very well represented (typically, just short papers)
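To put the ESEM counts in proportion, the share of modeling papers within each paper type can be worked out directly from the numbers on this slide. This is a small sketch of that arithmetic; the helper name `share` and the dictionary layout are assumptions made for illustration, and no share is computed for posters because the slide does not give the total number of posters.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


# (modeling papers, all papers of that type) as reported on the slide.
ESEM_COUNTS = {
    "Regular Papers": (4, 178),
    "Short Papers": (10, 118),
}

# The slide reports 3 modeling posters but no poster total.
MODELING_POSTERS = 3

if __name__ == "__main__":
    for kind, (modeling, total) in ESEM_COUNTS.items():
        print(f"{kind}: {modeling}/{total} = {share(modeling, total):.1f}%")
    found = sum(m for m, _ in ESEM_COUNTS.values()) + MODELING_POSTERS
    print("Modeling papers found:", found)
```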

Page 15: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Conclusions

• Summary:
  – Rigor of empirically validated research in software modeling is weak
  – Very large percentage of papers with no evaluation
  – Did not include technical reports or extended publication in a journal
  – Plan to repeat analysis with SoSyM
  – Would like to push the community to conduct more empirical evaluations
  – Paper has URLs pointing to the data from our observations
• Recommendations:
  – Team up with empirical researchers
  – Venues need to provide additional space for reporting empirical results (e.g., 2 extra pages in paper length for those papers that have a clear evaluation)

Page 16: Assessing the Frequency  of Empirical  Evaluation in  Software  Modeling Research


Questions or comments?