

Page 1: Research in Empirical Software Eng. Reduced-Parameter Modeling (RPM) for Cost Estimation Models Zhihao Chen zhihaoch@cse.usc.edu


Reduced-Parameter Modeling (RPM) for Cost Estimation Models

Zhihao Chen

zhihaoch@cse.usc.edu


Reduced-Parameter Modeling (RPM)

What Is RPM?

How Does It Work?

Why Is It Useful?

When Should You Not Use It?


What is RPM?

• A machine learning technique for determining a minimum-essential set of cost model parameters

• Uses an organization’s own project data points

• Assumes that the organization’s past project data points are representative of its future projects


Why Is It Useful?

• Simplifies cost model usage and data collection

• Often improves estimation accuracy
  – Eliminates highly correlated, weak-dispersion, or noisy-data parameters

• Identifies the organization’s most important cost drivers for productivity improvement
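The first screen listed above (dropping highly correlated parameters) can be illustrated with a minimal pairwise correlation check. This is a sketch under stated assumptions, not the RPM tool: it assumes each cost driver is coded as one numeric rating per project, uses Pearson correlation (a rank correlation would be a reasonable alternative for ordinal ratings), and flags pairs whose |r| reaches a hypothetical cutoff of 0.8. The function names and the ratings are illustrative only.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """List (a, b, r) for each parameter pair whose |r| meets the (hypothetical) threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical ordinal ratings (1 = Very Low ... 5 = Very High) for five projects.
ratings = {
    "ACAP": [1, 2, 3, 4, 5],
    "PCAP": [1, 2, 3, 5, 5],
    "TOOL": [3, 1, 4, 1, 5],
}
print(correlated_pairs(ratings))
```

Here ACAP and PCAP move together (r ≈ 0.97), so one of them adds little beyond the other and becomes a candidate for removal, while TOOL is kept.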


Organizations Have Different Data Distributions

Figure: Correlation analysis of COCOMO81 (63 projects)

Figure: Correlation analysis of NASA Project02 (22 projects)


Under-sampling: A Case Study for CPLX in NASA 60

If projects of even higher complexity are the ones that matter most to NASA, redefine the complexity ratings so that they distinguish among NASA’s highly complex systems.

Is software complexity a useful cost driver in this domain?

• In the NASA60 data set, CPLX is usually rated High

• So the parameter carries little information

• Consider dropping the parameter

Figure: CPLX in 60 NASA COCOMO I projects (number of projects per rating: Low 2, Nominal 5, High 50, Very_High 2, Extra_High 1)
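The weak-dispersion argument for CPLX can be checked numerically. The sketch below uses the project counts read from the CPLX histogram on this slide (Low 2, Nominal 5, High 50, Very_High 2, Extra_High 1, totaling 60) and computes two simple dispersion measures: the share of projects in the most common rating and the normalized Shannon entropy. The 0.8 cutoff in `weak_dispersion` and all function names are hypothetical illustrations, not part of RPM.

```python
import math

# Project counts per CPLX rating, as read from the NASA 60 histogram on this slide.
cplx_counts = {"Low": 2, "Nominal": 5, "High": 50, "Very_High": 2, "Extra_High": 1}


def mode_share(counts):
    """Return the most common rating and the fraction of projects that have it."""
    total = sum(counts.values())
    level = max(counts, key=counts.get)
    return level, counts[level] / total


def normalized_entropy(counts):
    """Shannon entropy of the rating distribution scaled to [0, 1]; 0 means a single level."""
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    probs = [n / total for n in counts.values() if n > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(counts))


def weak_dispersion(counts, max_share=0.8):
    """Hypothetical screening rule: flag a parameter when one rating dominates."""
    return mode_share(counts)[1] > max_share


level, share = mode_share(cplx_counts)
print(level, round(share, 3), round(normalized_entropy(cplx_counts), 3), weak_dispersion(cplx_counts))
```

For CPLX this reports High with a share of about 0.83 and a normalized entropy of about 0.41, so the parameter is flagged as carrying little information.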


How Does It Work – Technically?

• The organization collects a critical mass of data from similar projects

• The RPM tool starts with Size and tests which additional parameter produces the most accurate estimates
  – By calibrating many times on random data subsets and testing on the held-out data points

• The RPM tool keeps adding the next-best parameter until accuracy starts to decrease
  – This produces the best RPM for the data set
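The procedure above can be sketched as greedy forward selection scored by repeated random holdout. This is a minimal sketch under stated assumptions, not the RPM tool itself: it assumes a COCOMO-style log-linear model, ln(effort) = ln(a) + b·ln(size) + Σ c_i·ln(EM_i), calibrated by least squares; it scores each candidate subset by PRED(30), the share of holdout estimates within 30% of actual effort, which is an assumed accuracy measure; and all function names and the synthetic projects are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design(row, features):
    # Assumed log-linear form: ln(effort) = ln(a) + b*ln(size) + sum_i c_i*ln(EM_i)
    return [1.0, math.log(row["size"])] + [math.log(row[f]) for f in features]


def fit(rows, features, ridge=1e-6):
    """Calibrate coefficients by least squares on the log scale (tiny ridge for stability)."""
    xs = [design(r, features) for r in rows]
    ys = [math.log(r["effort"]) for r in rows]
    k = len(xs[0])
    ata = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, row, features):
    return math.exp(sum(c * v for c, v in zip(coef, design(row, features))))


def pred30(rows, coef, features):
    """Assumed accuracy measure: share of estimates within 30% of actual effort."""
    hits = sum(abs(predict(coef, r, features) - r["effort"]) <= 0.30 * r["effort"] for r in rows)
    return hits / len(rows)


def make_splits(rows, trials, holdout, seed):
    """Draw random (train, holdout) splits once so every candidate is scored on the same ones."""
    rng = random.Random(seed)
    splits = []
    for _ in range(trials):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        splits.append((shuffled[holdout:], shuffled[:holdout]))
    return splits


def score(splits, features):
    """Mean holdout PRED(30) over all splits, recalibrating on each training subset."""
    return sum(pred30(test, fit(train, features), features) for train, test in splits) / len(splits)


def forward_select(rows, candidates, trials=20, holdout=8, seed=0):
    """Start from size only; greedily add the parameter that most improves holdout accuracy."""
    splits = make_splits(rows, trials, holdout, seed)
    chosen, remaining = [], list(candidates)
    best = score(splits, chosen)
    while remaining:
        top_score, top = max((score(splits, chosen + [c]), c) for c in remaining)
        if top_score <= best:  # stop once the best addition no longer improves accuracy
            break
        chosen.append(top)
        remaining.remove(top)
        best = top_score
    return chosen, best


# Hypothetical synthetic projects: RELY drives effort, NOISE does not.
gen = random.Random(1)
levels = [0.6, 1.0, 1.6]
projects = []
for _ in range(40):
    size = gen.uniform(10, 100)  # KLOC
    rely, noise = gen.choice(levels), gen.choice(levels)
    effort = 3.0 * size ** 1.1 * rely * math.exp(gen.gauss(0, 0.05))
    projects.append({"size": size, "effort": effort, "RELY": rely, "NOISE": noise})

chosen, best = forward_select(projects, ["RELY", "NOISE"])
print(chosen, round(best, 3))
```

Drawing the splits once and reusing them for every candidate keeps the comparison between parameters fair, since each candidate is calibrated and tested on the same subsets. On this synthetic data the selection picks RELY first, because it was used to generate effort. Stopping on a tie as well as on a drop is a design choice that keeps the subset small.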


Real and Large Industry Data

• Research is supported by CSE and NASA/JPL

• Two datasets are public, available from the PROMISE Software Engineering Repository (http://promise.site.uottawa.ca/)
  – 63 projects in Cocomo81/Software cost estimation
  – 60 projects in NASA/Software cost estimation

• Two datasets from the COCOMO II database
  – 161 projects in COCOMO II 2000
  – 119 projects in COCOMO II 2004

• More data are coming
  – 30 more projects from JPL

• The techniques can be applied and basic results generalized to any model


Example Result from NASA Project02 Dataset

Figure: Example result, mean and standard deviation (percentage) for LOC alone, feature subsets FS01 to FS04, and all parameters. Parameter groups listed on the slide: TURN, LEXP; TIME, MODP; DATA, TOOL, SCED; RELY, VEXP, CPLX; AEXP, PCAP, VIRT, ACAP, STOR.

            LOC     FS01    FS02    FS03    FS04    All
Mean        85.24   92.86   97.14   94.76   84.76   15.71
SD          10.93   11.10    6.92    8.78   12.40   12.07


When Should You Not Use It?

• Do not subtract parameters that are important
  – In many domains, expert business users hold more knowledge in their heads than is available in historical databases

• Do not subtract parameters you might still need
  – Users may need some of the subtracted parameters to make business decisions


Published Results

• Chen, Menzies, Port, and Boehm. "Finding the Right Data for Software Cost Modeling", IEEE Software, 11/2005.

• Menzies, Port, Chen, and Hihn. "Specialization and Extrapolation of Software Cost Models", ASE 2005, Long Beach, California, 11/2005.

• Menzies, Port, Chen, Hihn, and Stukes. "Validation Methods for Calibrating Software Effort Models", ICSE 2005, St. Louis, Missouri, 05/2005.

• Yang, Chen, Valerdi, and Boehm. "Effect of Schedule Compression on Project Effort", ISPA 2005, Denver, Colorado, 06/2005.

• Chen, Menzies, Port, and Boehm. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy", PROMISE 2005, St. Louis, Missouri, 05/2005.

• Menzies, Chen, Port, and Hihn. "Simple Software Cost Analysis: Safe or Unsafe?", PROMISE 2005, St. Louis, Missouri, 05/2005.

Some results on using data mining and machine learning techniques to analyze cost estimation models and data have recently been published.

All papers are available from http://www.ssei.org/chen/papers/papers.html


Question and Answer