CS 568, Spring 2010 — Lecture 5: Estimation
DESCRIPTION
Estimation insights
TRANSCRIPT
Lecture 5: Estimation
Estimate size, then estimate effort, schedule, and cost from size & complexity
CS 568
Project Metrics
• Cost and schedule estimation
• Measure progress
• Calibrate models for future estimating
Metric scope: manager / product
Number of projects × number of metrics = 15-20
Approaches to Cost Estimation
• By experts
• By analogies
• Decomposition
• Parkinson’s Law: work expands to fill the time available
• Pricing to win: customer willingness to pay
• Lines of Code
• Function Points
• Mathematical Models: Function Points & COCOMO
[Chart: staff-months vs. time, showing the theoretical schedule T, the 75% × T floor, the impossible-design region, and the linear cost increase beyond T]
Boehm: “A project cannot be done in less than 75% of theoretical time”
T(theoretical) = 2.5 × (staff-months)^(1/3)
But how can I estimate staff-months?
PERT estimation
Mean schedule date = (earliest date + 4 × likely date + latest date) / 6
Standard deviation = (latest date − earliest date) / 6
This is a β distribution.
Example
If min = 10 months
mode = 13.5 months
max = 20 months, then
Mean = 14 months
Std. Dev = 1.67 months
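The PERT arithmetic above is easy to script. A minimal sketch (the function names are mine, not from the slides):

```python
def pert_mean(earliest, likely, latest):
    """PERT (beta-distribution) approximation of the mean schedule date."""
    return (earliest + 4 * likely + latest) / 6

def pert_std_dev(earliest, latest):
    """PERT approximation of the standard deviation."""
    return (latest - earliest) / 6

# The slide's example: min = 10, mode = 13.5, max = 20 months
mean = pert_mean(10, 13.5, 20)   # 14.0 months
sd = pert_std_dev(10, 20)        # about 1.67 months
print(f"mean = {mean:.2f} months, std dev = {sd:.2f} months")
```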
Probability Distributions
See www.brighton-webs.co.uk/distributions/beta.asp

                     Beta    Triangular
Mean                 14.00   14.50
Mode                 13.65   13.5
Standard Deviation    1.67    2.07
Q1 (25%)             12.75   12.96
Q2 (50% - Median)    13.91   14.30
Q3 (75%)             15.17   15.97

The mean, mode and standard deviation in the above table are derived from the minimum, maximum and shape factors which resulted from the use of the PERT approximations.
Sizing Software Projects
Effort = (1/productivity) × (size)^c
productivity ≡ KSLOC/staff-month
size ≡ NCKSLOC
c is a function of staff skills
[Chart: staff-months vs. size in Lines of Code or Function Points]
Understanding the equations
Consider a transaction project of 38,000 lines of code; what is the shortest time it will take to develop? Module development runs about 400 SLOC/staff-month.
Effort = (1/productivity) × (size)^c
       = (1 / 0.400 KSLOC/SM) × (38 KSLOC)^1.02
       = 2.5 × (38)^1.02 ≈ 100 SM
Min time = 0.75 × T = 0.75 × 2.5 × (SM)^(1/3)
         ≈ 1.875 × (100)^(1/3)
         ≈ 1.875 × 4.63 ≈ 9 months
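The worked example can be sketched as code; the exponent c = 1.02 and the names are taken or adapted from the slide:

```python
def effort_staff_months(ksloc, productivity_ksloc_per_sm, c=1.02):
    """Effort = (1/productivity) * size^c."""
    return (1.0 / productivity_ksloc_per_sm) * ksloc ** c

def min_schedule_months(staff_months):
    """Boehm floor: 75% of T(theoretical) = 2.5 * SM^(1/3)."""
    return 0.75 * 2.5 * staff_months ** (1.0 / 3.0)

effort = effort_staff_months(38, 0.400)   # ~100 staff-months
months = min_schedule_months(effort)      # ~9 months
print(round(effort), round(months))
```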
How many software engineers?
1 full-time staff week = 60 hours, half spent on the project (30 hours).
1 student week = 20 hours. Therefore, an estimate of 100 staff-months is
actually 150 student-months. 150 student-months / 5 months per semester = 30 student
software engineers; therefore simplification is mandatory.
Lines of Code
LOC ≡ Line of Code
KLOC ≡ Thousands of LOC
KSLOC ≡ Thousands of Source LOC
NCSLOC ≡ New or Changed Source LOC
Productivity per staff-month:
» 50 NCSLOC for OS code (or real-time system)
» 250-500 NCSLOC for intermediary applications (high risk, on-line)
» 500-1000 NCSLOC for normal applications (low risk, on-line)
» 10,000 – 20,000 NCSLOC for reused code
Reuse note: Sometimes, reusing code that does not provide the exact functionality needed can be achieved by reformatting input/output. This decreases performance but dramatically shortens development time.
Bernstein’s rule of thumb for small components
“Productivity” as measured in 2000:

Classical rates                                      130 – 195 NCSLOC/sm
Evolutionary or incremental approaches (customized)  244 – 325 NCSLOC/sm
New embedded flight software (customized)             17 – 105 NCSLOC/sm
Reused code                                         1000 – 2000 NCSLOC/sm

Code for reuse costs 3 × code for customized use.
QSE Lambda Protocol
• Prospectus
• Measurable Operational Value
• Prototyping or Modeling
• sQFD
• Schedule, Staffing, Quality Estimates
• ICED-T
• Trade-off Analysis
Universal Software Engineering Equation
Reliability(t) = e^(−λt)
when the error rate is constant, where λ is a normalizing constant for a software shop and
λ = complexity / (effectiveness × staffing)
Post-Release Reliability Growth in Software Products
Authors: Pankaj Jalote, Brendan Murphy, Vibhu Saujanya Sharma
Guided by: Prof. Lawrence Bernstein. Prepared by: Mautik Shah
Introduction
The failure rate of software products decreases with time, even when no software changes are being made.
This violates our intuition: reliability grows without any fault removal.
Modeling this reliability growth in the initial stages after product release is the focus of this paper.
Three possible reasons:
1. Users learn to avoid the faults that cause failures, so a failure is never truly random.
2. After initially exploring many different features and options, users settle on a small set of product features, thereby reducing the number of fault-carrying paths that are actually exercised.
3. Installing new software onto existing systems often results in versioning and configuration issues which cause failures.
Failure rate model
Using product support data
Using data from Automated Reporting
Product stabilization time
Stabilization time indicates the product’s transient defects as well as the user experience.
A smaller value of stabilization time means that the end users will have fewer troubles.
If the steady-state failure rate of a product is acceptable, then instead of investing in system testing the vendor may need to focus on improving issues related to installation, configuration, usage, etc. to reduce stabilization time.
A high stabilization time will require a different strategy for improving the user experience than is needed for dealing with a high steady-state failure rate of a product.
Conclusion
Traditional software reliability models generally assume that software reliability is primarily a function of the fault content and remains unchanged if the software is unchanged. But the failure rate often gets smaller with time, even without any changes being made to the product.
This may be due to users learning to avoid the situations that cause failures, using a limited subset of the features and functionality, or resolving configuration issues.
Stabilization time is the time it takes after installation for the failure rate to reach its steady state value.
For an organization which plans to have its employees use a software product, the stabilization time could indicate the period after which the organization could expect the production usage of the product.
Derivation of the reliability equation, valid after the stabilization interval
Let T be the stabilization time; then g(T) is some constant failure rate, F.
To convert from a rate to a time function we integrate the transform:
R(t−T) = ∫₀^∞ g(ω) exp(−λ(t−T)) dω
With g(ω) a constant F, and τ = t − T:
R(τ) = F exp(−λτ), where
λ = complexity / (effective staffing)
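The post-stabilization curve R(τ) = F·exp(−λτ) can be evaluated numerically. A small sketch; the function names and the sample complexity/staffing values are illustrative, not from the paper:

```python
import math

def failure_intensity(complexity, effective_staffing):
    """lambda = complexity / (effective staffing), per the slide."""
    return complexity / effective_staffing

def reliability(tau, F, lam):
    """R(tau) = F * exp(-lambda * tau), for tau = t - T >= 0."""
    return F * math.exp(-lam * tau)

lam = failure_intensity(complexity=2.0, effective_staffing=10.0)
print(reliability(0, 1.0, lam))   # F at tau = 0
print(reliability(5, 1.0, lam))   # decays toward 0 as tau grows
```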
Function Point (FP) Analysis
• Useful during the requirements phase
• Substantial data supports the methodology
• Software skills and project characteristics are accounted for in the Adjusted Function Points
• FP is technology- and project-process-dependent, so technology changes require recalibration of project models
• Convert Unadjusted FPs (UFP) to LOC for a specific language (technology) and then use a model such as COCOMO
[Chart: productivity (function points / staff month), 0 to 12, vs. size (function points, 20 to 40,960); Bell Laboratories data and Capers Jones data]
Productivity = f(size)
Adjusted Function Points
Accounting for physical system characteristics, each rated by the system user:
• 0-5 based on “degree of influence”
• 3 is average
Unadjusted Function Points (UFP) × General System Characteristics (GSC) = Adjusted Function Points (AFP)
AFP = UFP × (0.65 + 0.01 × GSC), note GSC = VAF = TDI
1. Data Communications
2. Distributed Data/Processing
3. Performance Objectives
4. Heavily Used Configuration
5. Transaction Rate
6. On-Line Data Entry
7. End-User Efficiency
8. On-Line Update
9. Complex Processing
10. Reusability
11. Conversion/Installation Ease
12. Operational Ease
13. Multiple Site Use
14. Facilitate Change
Function Point Calculations
Unadjusted Function Points:
UFP = 4I + 5O + 4E + 10L + 7F, where
I ≡ count of input types that are user inputs and change data structures
O ≡ count of output types
E ≡ count of inquiry types, i.e. inputs controlling execution [think menu selections]
L ≡ count of logical internal files, internal data used by the system [think index files; they are groups of logically related data entirely within the application’s boundary and maintained by external inputs]
F ≡ count of interfaces, data output to or shared with another application
Note that the constants in the nominal equation can be calibrated to a specific software product line.
Complexity Table

TYPE            SIMPLE  AVERAGE  COMPLEX
INPUT (I)          3       4        6
OUTPUT (O)         4       5        7
INQUIRY (E)        3       4        6
LOG INT (L)        7      10       15
INTERFACES (F)     5       7       10
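The nominal equation UFP = 4I + 5O + 4E + 10L + 7F is just the “average” column of the complexity table. A small calculator, sketched with my own names (the weights are from the table above):

```python
# Weights from the complexity table: (simple, average, complex)
WEIGHTS = {
    "I": (3, 4, 6),    # inputs
    "O": (4, 5, 7),    # outputs
    "E": (3, 4, 6),    # inquiries
    "L": (7, 10, 15),  # logical internal files
    "F": (5, 7, 10),   # interfaces
}
LEVELS = {"simple": 0, "average": 1, "complex": 2}

def ufp(counts, level="average"):
    """Unadjusted Function Points for {type: count} at one complexity level."""
    col = LEVELS[level]
    return sum(WEIGHTS[t][col] * n for t, n in counts.items())

# Nominal (all-average) example: 4*5 + 5*3 + 4*2 + 10*1 + 7*1 = 60
print(ufp({"I": 5, "O": 3, "E": 2, "L": 1, "F": 1}))
```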
Complexity Factors
1. Problem Domain           ___
2. Architecture Complexity  ___
3. Logic Design - Data      ___
4. Logic Design - Code      ___
Total                       ___
Complexity = Total / 4 = _________
Problem Domain — Measure of Complexity (1 is simple and 5 is complex)
1. All algorithms and calculations are simple.
2. Most algorithms and calculations are simple.
3. Most algorithms and calculations are moderately complex.
4. Some algorithms and calculations are difficult.
5. Many algorithms and calculations are difficult.
Score ____
Architecture Complexity — Measure of Complexity (1 is simple and 5 is complex)
1. Code ported from one known environment to another. Application does not change more than 5%.
2. Architecture follows an existing pattern. Process design is straightforward. No complex hardware/software interfaces.
3. Architecture created from scratch. Process design is straightforward. No complex hardware/software interfaces.
4. Architecture created from scratch. Process design is complex. Complex hardware/software interfaces exist but they are well defined and unchanging.
5. Architecture created from scratch. Process design is complex. Complex hardware/software interfaces are ill defined and changing.
Score ____
Logic Design - Data
1. Simple, well defined and unchanging data structures. Shallow inheritance in class structures. No object classes have inheritance greater than 3.
2. Several data element types with straightforward relationships. No object classes have inheritance greater than 3.
3. Multiple data files, complex data relationships, many libraries, large object library. No more than ten percent of the object classes have inheritance greater than three. The number of object classes is less than 1% of the function points.
4. Complex data elements, parameter passing module-to-module, complex data relationships, and many object classes with inheritance greater than three. A large but stable number of object classes.
5. Complex data elements, parameter passing module-to-module, complex data relationships, and many object classes with inheritance greater than three. A large and growing number of object classes. No attempt to normalize data between modules.
Score ____
Logic Design - Code
1. Nonprocedural code (4GL, generated code, screen skeletons). High cohesion. Programs inspected. Module size constrained between 50 and 500 Source Lines of Code (SLOCs).
2. Program skeletons or patterns used. High cohesion. Programs inspected. Module size constrained between 50 and 500 SLOCs. Reused modules. Commercial object libraries relied on.
3. Well-structured, small modules with low coupling. Object class methods well focused and generalized. Modules with single entry and exit points. Programs reviewed.
4. Complex but known structure. Randomly sized modules. Some complex object classes. Error paths unknown. High coupling.
5. Code structure unknown, randomly sized modules, complex object classes, error paths unknown. High coupling.
Score __
Computing Function Points
See http://www.engin.umd.umich.edu/CIS/course.des/cis525/js/f00/artan/functionpoints.htm
Adjusted Function Points - review
• Account for 14 characteristics on a 6-point scale (0-5)
• Total Degree of Influence (DI) is the sum of the scores
• DI is converted to a technical complexity factor (TCF): TCF = 0.65 + 0.01 × DI
• The Adjusted Function Point count is FP = UFP × TCF
• For any language there is a direct mapping from Unadjusted Function Points to LOC
Beware: function point counting is hard and needs special skills.
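The TCF adjustment is mechanical once the 14 ratings are in hand. A sketch (function name and validation are my own):

```python
def adjusted_fp(ufp, gsc_scores):
    """AFP = UFP * (0.65 + 0.01 * DI), DI = sum of the 14 GSC ratings (0-5)."""
    if len(gsc_scores) != 14:
        raise ValueError("expected ratings for all 14 general system characteristics")
    if not all(0 <= s <= 5 for s in gsc_scores):
        raise ValueError("each rating must be between 0 and 5")
    di = sum(gsc_scores)
    tcf = 0.65 + 0.01 * di
    return ufp * tcf

# All-average ratings (3) give DI = 42, so TCF = 1.07
print(adjusted_fp(100, [3] * 14))  # ≈ 107
```

Note that the TCF can only move the count between 0.65× (all ratings 0) and 1.35× (all ratings 5) of the unadjusted total.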
Function Points Qualifiers
• Based on counting data structures
• Focus is on-line database systems
• Less accurate for Web applications
• Even less accurate for games, finite state machines, and algorithm software
• Not useful for extended-machine software and compilers
• An alternative to NCKSLOC because estimates can be based on requirements and design data
Function Point pros and cons
Pros:
• Language independent
• Understandable by client
• Simple modeling
• Hard to fudge
• Visible feature creep
Cons:
• Labor intensive
• Extensive training
• Inexperience results in inconsistent results
• Weighted to file manipulation and transactions
• Systematic error introduced by a single person; multiple raters advised
Initial Conversion

Language       Median SLOC/UFP
C                   104
C++                  53
HTML                 42
JAVA                 59
Perl                 60
J2EE                 50
Visual Basic         42

http://www.qsm.com/FPGearing.html

SLOC example:
78 UFP × 53 (C++) SLOC/UFP = 4,134 SLOC ≈ 4.1 KSLOC
(Reference for SLOC per function point: http://www.qsm.com/FPGearing.html)
[Chart: expansion factor (log scale, 1 to 1000) vs. year, 1955-2005. Technology changes marked: machine instructions, macro assemblers, high-level languages, database managers, on-line development, regression testing, prototyping, subsecond time sharing, 4GL, small-scale reuse, object-oriented programming, large-scale reuse]
Order of magnitude every twenty years.
Each date is an estimate of widespread use of a software technology.
The expansion factor is the ratio of a source line of code to a machine-level line of code.
Expansion Trends
Heuristics to do Better Estimates
• Decompose the Work Breakdown Structure to the lowest possible level and type of software
• Review assumptions with all stakeholders
• Do your homework - use past organizational experience
• Retain contact with developers
• Update estimates and track new projections (and warn)
• Use multiple methods
• Reuse makes it easier (and more difficult)
• Use a ‘current estimate’ scheme
Heuristics to meet aggressive schedules
• Eliminate features
• Simplify features & relax specific feature specifications
• Reduce gold plating
• Delay some desired functionality to version 2
• Deliver functions to the integration team incrementally
• Deliver the product in periodic releases
Specification for Development Plan
• Project Feature List
• Development Process
• Size Estimates
• Staff Estimates
• Schedule Estimates
• Organization
• Gantt Chart
COCOMO
COnstructive COst MOdel. Based on Boehm’s analysis of a database of 63 projects; the models are based on regression analysis of these systems.
Linked to the classic waterfall model. Effort is driven by the number of Source Lines of Code (SLOC), expressed in thousands of delivered source instructions - excludes comments and unmodified utility software.
COCOMO Formula
Effort in staff-months = a × KDLOC^b

Mode            a     b
organic        2.4   1.05
semi-detached  3.0   1.12
embedded       3.6   1.20
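The basic COCOMO formula with the slide’s coefficients is a one-liner per mode. A sketch, with my own names:

```python
# Basic COCOMO coefficients from the slide: mode -> (a, b)
COCOMO = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def cocomo_effort(kdloc, mode="organic"):
    """Effort in staff-months = a * KDLOC^b."""
    a, b = COCOMO[mode]
    return a * kdloc ** b

# The same 32 KDLOC system costs more in each stricter mode
for mode in COCOMO:
    print(f"{mode}: {cocomo_effort(32, mode):.0f} staff-months")
```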
A Retrospective on the Regression Models
They came to similar conclusions:
• Time:
» Watson-Felix: T = 2.5 E^0.35
» COCOMO (organic): T = 2.5 E^0.38
» Putnam: T = 2.4 E^0.33
• Effort:
» Halstead: E = 0.7 KLOC^1.50
» Boehm: E = 2.4 KLOC^1.05
» Watson-Felix: E = 5.2 KLOC^0.91
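The “similar conclusions” claim for the schedule models is easy to check numerically; for the same effort, the three T = c·E^k fits land within a couple of months of each other. A sketch (names are mine):

```python
def schedule_months(effort_sm, model="cocomo"):
    """T = c * E^k for the three regression models on the slide."""
    params = {
        "watson-felix": (2.5, 0.35),
        "cocomo":       (2.5, 0.38),  # organic mode
        "putnam":       (2.4, 0.33),
    }
    c, k = params[model]
    return c * effort_sm ** k

# For a 100 staff-month project all three give roughly a year
for m in ("watson-felix", "cocomo", "putnam"):
    print(m, round(schedule_months(100, m), 1))
```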
Delphi Method
A group of experts can give a better estimate. The Delphi Method:
• Coordinator provides each expert with spec
• Experts discuss estimates in initial group meeting
• Each expert gives estimate in interval format: most likely value and an upper and lower bound
• Coordinator prepares summary report indicating group and individual estimates
• Group iterates until consensus
Function Point Method
Five key components are identified based on the logical user view:
• External Inputs
• External Outputs
• External Inquiries
• Internal Logical Files
• External Interface Files
[Diagram: external inputs, outputs, and inquiries cross the application boundary; internal logical files sit inside it; an external interface file is shared with another application]
Downside
• Function Point terms are confusing
• Too long to learn; need an expert
• Need too much detailed data
• Does not reflect the complexity of the application
• Does not fit with new technologies
• Takes too much time
• “We tried it once”
Complexity
[Matrix: each component is rated Low, Average, or High based on its Record Element Types or File Types Referenced vs. its Data Elements (# of unique data fields) and data relationships]

Components:                    Low     Avg.     High     Total
Internal Logical File (ILF)    __ x 7  __ x 10  __ x 15  ___
External Interface File (EIF)  __ x 5  __ x 7   __ x 10  ___
External Input (EI)            __ x 3  __ x 4   __ x 6   ___
External Output (EO)           __ x 4  __ x 5   __ x 7   ___
External Inquiry (EQ)          __ x 3  __ x 4   __ x 6   ___
                               Total Unadjusted FPs      ___

For each component, compute a Function Point value based on its make-up and the complexity of its data.
When to Count
[Diagram: sizing occurs repeatedly across the life cycle - prospectus, requirements, architecture, implementation, testing, delivery, and corrective maintenance - and with each change request]
Estimates vary with risk factors:
• Technology (tools, languages, reuse, platforms)
• Processes, including tasks performed, reviews, testing, object oriented
• Customer/user and developer skills
• Environment, including locations & office space
• System type, such as information systems, control systems, telecom, real-time, client server, scientific, knowledge-based, web
• Industry, such as automotive, banking, financial, insurance, retail, telecommunications, DoD
Using the equations
For a 59 function point project to be written in C++, we need to write 59 × 53 = 3,127 SLOC.
Effort = (1/productivity) × (size)^c
       = [1 / (0.9 × 0.53 KSLOC/SM)] × (3.127 KSLOC)^1.02
       = 2.1 × (3.127)^1.02 = 2.1 × (3.127)^1 × (3.127)^0.02
       ≈ 7 SM
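The whole pipeline - function points, through the gearing table, to effort - can be sketched end to end. Names are mine; I read the slide’s productivity as 0.9 × 0.53 KSLOC/SM (about 477 SLOC per staff-month):

```python
# Median SLOC per UFP gearing factors (qsm.com table on the earlier slide)
SLOC_PER_UFP = {"C": 104, "C++": 53, "HTML": 42, "JAVA": 59,
                "Perl": 60, "J2EE": 50, "Visual Basic": 42}

def fp_to_ksloc(fp, language):
    """Convert a function point count to KSLOC via the gearing table."""
    return fp * SLOC_PER_UFP[language] / 1000.0

def effort_sm(ksloc, productivity_ksloc_per_sm, c=1.02):
    """Effort = (1/productivity) * size^c, as in the slide's example."""
    return (1.0 / productivity_ksloc_per_sm) * ksloc ** c

size = fp_to_ksloc(59, "C++")              # 3.127 KSLOC
print(round(effort_sm(size, 0.9 * 0.53)))  # ~7 staff-months
```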
Baseline current performance levels
[Diagram: a measured baseline of performance and productivity (time to market, effort, defects) feeds a capability assessment; software process improvement initiatives / best practices address management, skill levels, process, technology, and risks]
[Chart: rate of delivery vs. software size, showing bands for sub-performance, industry averages, and best practices against the organization baseline]
Modeling Estimation
[Diagram: a counter sizes the requirement; the analyst establishes a profile, selects a matching profile from the metrics database, generates an estimate, and runs what-if analysis; the project manager tracks actuals against the plan with a plan vs. actual report]
The estimate is based on the best available information. A poor requirements document will result in a poor estimate.
Accurate estimating is a function of using historical data with an effective estimating process.
Establish a baseline
[Chart: rate of delivery (function points per staff month) vs. software size for the measured projects - the organizational baseline]
• A representative selection of projects is measured
• Size is expressed in terms of functionality delivered to the user
• Rate of delivery is a measure of productivity
Monitoring improvements
Track progress
[Chart: rate of delivery (function points per person month) vs. software size in the second year, compared against the baseline]
Brooks: Calling the Shot
• Do not estimate the whole task by estimating coding and multiplying by 6 or 9!
• Effort increases as a power of size
• Avoid unrealistic assumptions about a developer’s time - studies show at most 50% of the time is allotted to development
• Productivity is also related to the complexity of the task: the more complex, the fewer lines per year - high-level languages & reuse are critical