research in empirical software eng. reduced-parameter modeling (rpm) for cost estimation models...

Post on 22-Dec-2015


Research in Empirical Software Eng.

Reduced-Parameter Modeling (RPM) for Cost Estimation Models

Zhihao Chen

zhihaoch@cse.usc.edu


Reduced-Parameter Modeling (RPM)

What Is RPM?

How Does It Work?

Why Is It Useful?

When Should You Not Use It?


What is RPM?

• A machine learning technique for determining a minimum-essential set of cost model parameters

• Using an organization’s particular project data points

• Assuming that the organization’s project data points will be representative of its future projects


Why Is It Useful?

• Simplifies cost model usage and data collection

• Often improves estimation accuracy
  – Eliminates highly correlated, weak-dispersion, or noisy-data parameters

• Identifies organization’s most important cost drivers for productivity improvement


Organizations Have Different Data Distributions

Correlation Analysis of COCOMO81 63 Projects

Correlation Analysis of NASA Project02 22 Projects
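
A correlation analysis of this kind can be run on an organization's own project data. The sketch below is only an illustration, not the analysis behind the two charts: the ratings are made-up toy values, and the helper names `pearson` and `correlated_pairs` and the 0.8 threshold are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 2)))
    return pairs


# Hypothetical ratings for six projects (1 = Very Low ... 5 = Very High).
toy_projects = {
    "ACAP": [2, 3, 4, 4, 5, 3],
    "PCAP": [2, 3, 4, 5, 5, 3],  # tracks ACAP closely in this toy data
    "TIME": [3, 3, 2, 4, 3, 5],
}

for a, b, r in correlated_pairs(toy_projects):
    print(f"{a} and {b} are highly correlated (r = {r})")
```

Pairs flagged this way are candidates for keeping only one of the two parameters; which pairs show up will differ from one organization's data to another.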


Under-sampling: A Case Study for CPLX in NASA 60

If projects of even higher complexity are the most important ones to NASA, redefine the complexity ratings for highly complex NASA systems.

Is software complexity a useful cost driver in this domain?

• In the NASA60 data set, CPLX is usually High

• This parameter carries little information

• Consider dropping the parameter
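
The "little information" point can be made concrete by measuring how concentrated the ratings are. The sketch below uses the counts from the data labels of the CPLX chart on this slide (2, 5, 50, 2, 1), assuming they are listed in rating order from Low to Extra_High. The entropy measure and the function names are illustrative choices, not something the slides prescribe.

```python
from math import log2

# Counts read from the chart labels on this slide, assumed to follow the
# rating order Low .. Extra_High; they sum to the 60 NASA projects.
cplx_counts = {
    "Low": 2, "Nominal": 5, "High": 50, "Very_High": 2, "Extra_High": 1,
}


def dominant_share(counts):
    """Fraction of projects that fall in the most common rating."""
    return max(counts.values()) / sum(counts.values())


def normalized_entropy(counts):
    """Shannon entropy of the ratings, scaled to [0, 1].

    0 means every project has the same rating (no information);
    1 means the ratings are spread evenly across all levels.
    """
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * log2(p) for p in probs) / log2(len(counts))


print(f"dominant rating share: {dominant_share(cplx_counts):.2f}")
print(f"normalized entropy:    {normalized_entropy(cplx_counts):.2f}")
```

A dominant share close to 1 (or a normalized entropy close to 0) suggests the parameter barely varies in this data set, which is the situation the bullets describe for CPLX.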

[Chart: CPLX in 60 NASA COCOMO I projects. Number of projects per rating: Low 2, Nominal 5, High 50, Very_High 2, Extra_High 1]


How Does It Work – Technically?

• Organization collects critical mass of similar project data

• RPM tool starts with Size, tests which additional parameter produces most accurate estimates
  – By calibrating many times to random data subsets, testing on holdout data points

• RPM tool continues to add next best parameters until accuracy starts to decrease
  – This produces best RPM for the data set
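
The steps above amount to a forward selection loop. The following is a minimal sketch under stated assumptions, not the RPM tool itself: the slides do not name the calibration method or the accuracy measure, so the sketch assumes a log-linear effort model fitted by least squares and PRED(30) (the share of estimates within 30% of actual effort). All function names, the small ridge term, and the synthetic projects are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def features(project, params):
    """Intercept, log(size), then the chosen cost-driver ratings."""
    return [1.0, math.log(project["size"])] + [project[p] for p in params]


def calibrate(train, params, ridge=1e-6):
    """Least-squares fit of log(effort); the tiny ridge avoids singular systems."""
    xs = [features(p, params) for p in train]
    ys = [math.log(p["effort"]) for p in train]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def pred30(coeffs, test, params):
    """Share of holdout projects estimated within 30% of actual effort."""
    hits = 0
    for p in test:
        est = math.exp(sum(c * f for c, f in zip(coeffs, features(p, params))))
        if abs(est - p["effort"]) / p["effort"] <= 0.30:
            hits += 1
    return hits / len(test)


def mean_accuracy(projects, params, rng, trials=30, holdout=5):
    """Average PRED(30) over repeated random calibration/holdout splits."""
    scores = []
    for _ in range(trials):
        shuffled = rng.sample(projects, len(projects))
        test, train = shuffled[:holdout], shuffled[holdout:]
        scores.append(pred30(calibrate(train, params), test, params))
    return sum(scores) / len(scores)


def forward_select(projects, candidates, seed=0):
    """Start with Size only; add the best parameter until accuracy stops improving."""
    chosen = []
    best = mean_accuracy(projects, chosen, random.Random(seed))
    remaining = list(candidates)
    while remaining:
        # Same seed for every candidate, so all are scored on the same splits.
        scored = [(mean_accuracy(projects, chosen + [p], random.Random(seed)), p)
                  for p in remaining]
        score, param = max(scored)
        if score <= best:
            break
        chosen.append(param)
        remaining.remove(param)
        best = score
    return chosen, best


# Synthetic demo data: effort depends on size and "cplx"; "noise" is irrelevant.
rng = random.Random(42)
projects = []
for _ in range(40):
    size = rng.uniform(5, 200)
    cplx, noise = rng.randint(1, 5), rng.randint(1, 5)
    effort = 3.0 * size ** 1.1 * math.exp(0.3 * cplx) * rng.uniform(0.9, 1.1)
    projects.append({"size": size, "cplx": cplx, "noise": noise, "effort": effort})

selected, accuracy = forward_select(projects, ["cplx", "noise"])
print("selected parameters:", selected, "mean PRED(30): %.2f" % accuracy)
```

On real data the candidates would be the organization's cost drivers, and the number of trials and the holdout size would be tuned to the number of available projects.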


Real and Large Industry Data

• Research is supported by CSE and NASA/JPL

• Two data sets are public and available from the PROMISE Software Engineering Repository - http://promise.site.uottawa.ca/
  – 63 projects in Cocomo81/Software cost estimation
  – 60 projects in NASA/Software cost estimation

• Two data sets from the COCOMO II database
  – 161 projects in COCOMO II 2000
  – 119 projects in COCOMO II 2004

• More data are coming
  – 30 more projects from JPL

• The techniques can be applied and basic results generalized to any model


Example Result from NASA Project02 Data Set

[Chart: Mean and SD (percentage) by parameter subset: LOC, FS01, FS02, FS03, FS04, All]

Parameter groups labeled on the chart: TURN, LEXP; TIME, MODP; DATA, TOOL, SCED; RELY, VEXP, CPLX; AEXP, PCAP, VIRT, ACAP, STOR

Subset   LOC     FS01    FS02    FS03    FS04    All
Mean     85.24   92.86   97.14   94.76   84.76   15.71
SD       10.93   11.10    6.92    8.78   12.40   12.07


When Should You Not Use It?

• Do not subtract parameters that are important
  – In many domains, expert business users hold more knowledge in their heads than might be available in historical databases

• Do not subtract parameters that you might still need
  – Users may need some of the subtracted parameters to make a business decision


Published Results

• Chen, Menzies, Port, and Boehm. "Finding the Right Data for Software Cost Modeling", IEEE Software 11/2005.

• Menzies, Port, Chen, and Hihn. "Specialization and Extrapolation of Software Cost Models", ASE 2005, Long Beach, California, 11/2005.

• Menzies, Port, Chen, Hihn, and Stukes. "Validation Methods for Calibrating Software Effort Models", ICSE 2005, 05/2005, St. Louis, Missouri

• Yang, Chen, Valerdi, and Boehm. "Effect of Schedule Compression on Project Effort", ISPA 2005, 06/2005, Denver, Colorado

• Chen, Menzies, Port, and Boehm. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy", PROMISE 2005, 05/2005, St. Louis, Missouri

• Menzies, Chen, Port, and Hihn. "Simple Software Cost Analysis: Safe or Unsafe?", PROMISE 2005, 05/2005, St. Louis, Missouri 

Some results have been recently published on the use of data mining and machine learning techniques to analyze cost estimation models and data

All papers are available from http://www.ssei.org/chen/papers/papers.html


Question and Answer
