6.source code length

Upload: hibaltazar

Post on 06-Apr-2018

  • 8/3/2019 6.Source Code Length

    LENGTH OF SOURCE CODE

    BY PANKAJ KAMTHAN

    1. INTRODUCTION

    Not everything that counts can be counted, and not everything that can be counted counts.

    Albert Einstein

    This document discusses the issue of the length of source code from the perspective of

    theory and practice of software measurement and, in doing so, highlights some of the

    challenges.

    There are a number of products created during development. There are internal product

    attributes for each of these products, one of which is size.

    The knowledge of the size of the following products can be useful:

    - Early Conceptual Models.
    - Specification.
    - Design.
    - Source Code.

    There are a number of aspects of size, one of which is its physical size, namely length. In
    this document, the focus is on the size, specifically the length, of the source code.

    2. CONVENTION

    In the rest of the document, [SLOC] represents an arbitrary metric for the source lines of

    code (SLOC). In this sense, [SLOC] is an abstraction.

    The terms KLOC and KSLOC are also commonly used in relation to SLOC.

    The [SLOC] is arguably the oldest, most commonly used, and one of the most widely cited
    size metrics in the industry. For example, it has been suggested [NASA, 1995, Page 45]
    that [...] use lines of code to represent size.

    However, with the passage of time, [SLOC] has also become one of the most

    controversial metrics [Galorath, Evans, 2006, Chapter 5], and there are a number of

    variations of it.

    3. UNDERSTANDING SLOC

    It is important to develop an understanding of [SLOC] prior to any action, such as

    measurement, on it. A conceptual model contributes towards creating such an

    understanding.

    Definition [Model]. A model is a simplification, with respect to some goal, of a thing.

    To be able to devise a metric for length, there needs to be (1) a conceptual model of
    a computer program (and therefore of its source code), and (2) a conceptual model of
    length.

    Figure 1 presents a conceptual model for the source code of a program. It illustrates a

    number of elements and their interrelationships.

    The presence of a loop on an element means that the element is related to itself. It is
    possible to consider n-ary relationships, multiple relationships, and inverse relationships.

    However, for the sake of simplicity, the relationships are limited to binary, single,

    unidirectional relationships.

    Figure 1. A conceptual model for the source code of a program.

    There are a number of ways to represent source code. For example, the "L" in [SLOC] is
    based on the presumption that the source code is being represented in text.

    [Diagram: Program, Source Code, Text, Line.]

    Remark. It is important to make a distinction between a program and its source code,

    and the source code and its representation type.

    [Diagram: Source Code, Size, Length, [SLOC].]

    4. HISTORY OF SLOC

    The term SLOC has its origins in line-oriented programming languages including,

    but not limited to, PL/I, FORTRAN, and assembly languages. Figure 2 illustrates a punch

    card with a single physical line of source code.

    Figure 2. A punch card (based on an emulator; see footnote 1) with a PL/I statement.

    5. EVOLUTION OF PROGRAMMING AND THE LENGTH OF SOURCE CODE (see footnote 2)

    The model of source code length can be influenced by the approach to programming and

    the nature of the programming language deployed.

    There have been changes over the years due to the evolution in programming
    approaches and programming language paradigms. For example, initially and up until
    the early 1990s, the source code produced during programming was textual only.

    The advent of third and higher generation programming languages, especially visual

    programming languages, has changed this.

    For example, there are languages that enable a programmer to produce the structure and
    behavior of user interfaces with little or no source text. In such cases, the concept of
    a "line" may not even apply, and "length" may have a meaning different from its
    conventional sense.

    The advent of object-oriented programming languages, especially those that are
    class-based, has also provided an alternative to the notion of a line, namely that of a class.

    Footnote 1. URL: http://www.kloth.net/services/cardpunch.php .

    Footnote 2. URL: http://oreilly.com/news/languageposter_0504.html .

    6. VIEWPOINTS

    There can be a number of different (but not necessarily orthogonal) viewpoints of

    source code length.

    [Diagram: Source Code Length, View and Approach, and the Development, Delivery, and
    Usage Viewpoints, connected by Corresponds-To relationships.]

    Indeed, each viewpoint is intended for a specific purpose, and thus has (1) its own view

    of the source code length and (2) its own approach for counting the lines of source

    code.

    6.1. DEVELOPMENT VIEWPOINT

    The definition of source code length adopted can be influenced by the ways in which its

    corresponding program is developed and run.

    It can be acknowledged that source code includes relatively more header statements and

    data declarations, and relatively less code that actually executes.

    For example, for certain purposes, such as testing, it is important to know the number of

    Executable Statements (ES).

    This metric considers separate statements on the same physical line as distinct, each

    of which contributes towards the count. It ignores comment lines, header statements, and

    data declarations.

    6.2. DELIVERY VIEWPOINT

    The definition of source code length adopted can be influenced by what really matters to

    the recipients at the end of the software development process, that is, what is eventually

    delivered to the client.

    The amount of source code that is delivered can be significantly different from the

    amount of source code that is actually developed.

    For example, for development and testing, drivers, stubs, prototypes, and scaffolding

    code may be written or generated by the programming team. However, these may be

    discarded or ignored at the time the final version is tested and subsequently turned over

    to the client.

    Therefore, there is a need to distinguish the amount of delivered source code from the
    amount of developed source code.

    The number of Delivered Source Instructions (DSI) encapsulates this aspect of length.

    This metric considers separate statements on the same physical line as distinct, each

    of which contributes towards the count. It ignores comment lines.

    DSI is different from ES in the sense that it includes header statements and data

    declarations.
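
    The distinction between ES and DSI can be illustrated with a minimal sketch in Python.
    It assumes that each statement has already been classified by hand and that every
    statement shown is delivered. The sample program, the kind labels, and the function
    names are hypothetical and are not taken from any counting standard.

```python
# Hypothetical, hand-classified statements of a small C program.
# The classification itself is an assumption made for illustration.
STATEMENTS = [
    ("#include <stdio.h>", "header"),
    ("/* Add two numbers. */", "comment"),
    ("int a, b, sum;", "declaration"),
    ("a = 1;", "executable"),
    ("b = 2;", "executable"),
    ("sum = a + b;", "executable"),
    ('printf("%d\\n", sum);', "executable"),
]


def executable_statements(statements):
    """ES: count only statements that execute; ignore comments,
    header statements, and data declarations."""
    return sum(1 for _, kind in statements if kind == "executable")


def delivered_source_instructions(statements):
    """DSI: count every statement except comments, so header statements
    and data declarations are included."""
    return sum(1 for _, kind in statements if kind != "comment")


if __name__ == "__main__":
    print("ES  =", executable_statements(STATEMENTS))
    print("DSI =", delivered_source_instructions(STATEMENTS))
```

    For this sample, ES is 4 and DSI is 6; the difference of 2 is the header statement and
    the data declaration.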

    6.3. USAGE VIEWPOINT

    The definition of source code length adopted can be influenced by the ways in which the

    measure of length is expected to be used.

    For example, an organization may use length for inter-project conclusions, such as (1)
    comparing projects to determine average project size, or (2) observing the trends in
    project size over time.

    Similarly, an organization may use length for intra-project conclusions, such as (1)
    comparing units to determine average unit size, or (2) exploring the relationship between
    unit length and the number of faults (for example, whether the length influences the
    number of faults).

    7. MODELS OF SLOC

    There are two common models of SLOC, namely the physical SLOC and the logical

    SLOC.

    7.1. PHYSICAL SLOC

    The most brilliant decision in all of Unix was the choice of a single character for the newline

    sequence.

    Mike O'Dell

    The model of source code length can be influenced by what physically (spatially) exists

    upon development.

    DEFINITION OF PHYSICAL SLOC

    The following definition is one of the earliest definitions of the physical SLOC. It

    specifically includes all lines containing program headers, declarations, and executable

    and non-executable statements.

    Definition 1 [Source Line of Code (SLOC)] [Conte, Dunsmore, Shen, 1986]. A source

    line of code is any line of program text that is not a comment or blank line, regardless of

    the number of statements or fragments of statements on the line.

    The following definition resulted from putting measurement programs into practice at

    Hewlett-Packard.

    Definition 2 [Source Line of Code (SLOC)] [Grady, Caswell, 1987]. A line of code is
    a non-commented source statement: any statement in the source code except for blank
    lines or comment lines.

    The following definition resulted from a standardization effort.

    Definition 3 [Source Line of Code (SLOC)] [Park, 1992]. [A single] physical SLOC

    [corresponds] to one line starting with the first character and ending by a carriage return

    or an end-of-file marker of the same line, and which excludes the blank and comment

    line.

    EXAMPLES

    Example 1.

    sum = a + b + c +
          d + e + f +
          g + h + i;

    Example 2.

    /* The following has a semantic error. */
    if (x < 0) {
        printf("x is a positive number");
    }

    In each of these cases, the physical SLOC = 3.
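
    The counts above can be reproduced with a minimal sketch in Python that follows the
    spirit of Definition 1 for C-like source text: a line counts if it contains anything
    other than white space and comments. The function name is hypothetical, and the sketch
    assumes that comment delimiters do not occur inside string literals.

```python
def physical_sloc(source: str) -> int:
    """Count lines that are neither blank nor comment-only.

    Handles C-style /* ... */ block comments (possibly spanning lines)
    and // line comments. Assumption: no comment delimiters appear
    inside string literals.
    """
    count = 0
    in_block_comment = False
    for line in source.splitlines():
        code_seen = False
        i = 0
        while i < len(line):
            if in_block_comment:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)      # comment continues on the next line
                else:
                    in_block_comment = False
                    i = end + 2
            elif line.startswith("/*", i):
                in_block_comment = True
                i += 2
            elif line.startswith("//", i):
                break                  # the rest of the line is a comment
            else:
                if not line[i].isspace():
                    code_seen = True
                i += 1
        if code_seen:
            count += 1
    return count


example_1 = "sum = a + b + c +\n      d + e + f +\n      g + h + i;\n"
example_2 = (
    "/* The following has a semantic error. */\n"
    "if (x < 0) {\n"
    '    printf("x is a positive number");\n'
    "}\n"
)

if __name__ == "__main__":
    print(physical_sloc(example_1))  # 3
    print(physical_sloc(example_2))  # 3
```

    If the statement of Example 1 were written on a single line, the same function would
    return 1, which shows how the physical count depends on layout.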

    ADVANTAGES AND LIMITATIONS OF PHYSICAL SLOC

    A physical SLOC count based on these definitions does not take into account syntactic and

    other variations across different programming languages. Therefore, it can be viewed as

    language-independent.

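    However, a physical SLOC count is dependent on the style conventions of the statements
    that are being counted. For example, the statement in Example 1 counts as three physical
    SLOC as written, but as one if it is placed on a single line.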

    TOOLS FOR CALCULATING PHYSICAL SLOC

    There are a number of tools for calculating the physical SLOC, including SLOCCount
    (see footnote 3), LocMetrics (see footnote 4), and CLOC (see footnote 5). These tools
    vary in a number of ways (commercial or non-commercial, textual or graphical interface,
    and so on).

    There are SLOCCount implementations for different operating systems and for different

    programming languages. For example, there are SLOCCount implementations on Linux

    (Debian and Ubuntu). SLOCCount has also inspired the development of LocMetrics and

    CLOC.

    7.2. LOGICAL SLOC

    The definition of source code length can be influenced by what in the source code

    actually does something when its corresponding program is executed.

    The logical SLOC is given by the number of statements in a program.

    ADVANTAGES AND LIMITATIONS OF LOGICAL SLOC

    The purpose of counting SLOC for a program is usually related to its logic and, in this

    sense, logical SLOC has an advantage over physical SLOC.

    A logical SLOC count is independent of the style conventions of the statements that are
    being counted. Therefore, logical SLOC can provide an accurate count in cases such as
    multiple logical statements residing on a single line, or a single logical statement
    spanning multiple lines.

    Footnote 3. URL: http://www.dwheeler.com/sloccount/ .

    Footnote 4. URL: http://www.locmetrics.com/ .

    Footnote 5. URL: http://cloc.sourceforge.net/ .

    The notion of a statement varies across programming languages. This variation makes

    logical SLOC language-dependent.

    For example, a logical SLOC measure for C (and other programming languages that have

    been inspired by it) is the number of statement-terminating semicolons (;).

    EXAMPLES

    Example 1.

    sum = a + b + c +
          d + e + f +
          g + h + i;

    In this case, the logical SLOC = 1.

    Example 2.

    /* The following has a semantic error. */
    if (x < 0) {
        printf("x is a positive number");
    }

    In this case, depending on the interpretation, the logical SLOC = 1 (based on the
    number of occurrences of ;) or the logical SLOC = 2 (based on the presence of the if
    statement and the printf statement).

    Remark. The counting of the logical SLOC is ambiguous.
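
    The semicolon-based interpretation can be sketched in Python as follows. This is only
    an illustration of the counting rule stated above for C, not the method of any
    particular tool. The function name is hypothetical, and the sketch assumes that
    semicolons in character literals (such as ';') do not occur.

```python
import re

# Matches C block comments, line comments, and double-quoted string literals,
# so that semicolons inside them are not counted.
_NON_CODE = re.compile(r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"', re.DOTALL)


def semicolon_sloc(source: str) -> int:
    """Count the semicolons that lie outside comments and string literals."""
    return _NON_CODE.sub("", source).count(";")


example_1 = "sum = a + b + c +\n      d + e + f +\n      g + h + i;\n"
example_2 = (
    "/* The following has a semantic error. */\n"
    "if (x < 0) {\n"
    '    printf("x is a positive number");\n'
    "}\n"
)
for_loop = "for (i = 0; i < n; i++) total += i;\n"

if __name__ == "__main__":
    print(semicolon_sloc(example_1))  # 1
    print(semicolon_sloc(example_2))  # 1
    print(semicolon_sloc(for_loop))   # 3
```

    The last case shows a weakness of the rule: the two semicolons in the header of a for
    statement are counted as if they terminated statements.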

    7.3. LOGICAL SLOC: REPRISE

    The counting of the logical SLOC should not depend on a specific syntactic construct.
    Indeed, determining the beginning and the end of each statement leads to a number of
    issues in counting logical SLOC.

    For example, a semicolon may not be used by a programming language; its use may be

    optional; or it may not play the role of a statement terminator.

    This realization has led to improvements in the understanding of logical SLOC.

    Definition [Source Statement] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A source

    statement is considered as a block of code that performs some action at runtime or directs

    compilers at compile time.

    The source statements can be classified into three types: executable, declaration, and

    compiler directive.

    Definition [Executable Line of Code] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A

    line that contains software instruction executed during runtime and on which a breakpoint

    can be set in a debugging tool. An instruction can be stated in a simple or compound

    form.

    Definition [Data Declaration Line] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A

    line that contains declaration of data and used by an assembler or compiler to interpret

    other elements of the program.

    Definition [Compiler Directive] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A

    statement that tells the compiler how to compile a program, but not what to compile.

    A source statement is an atomic and relatively independent unit at the source code
    level. In other words, the statement is considered as the smallest increment of work
    that a programmer performs in a given unit of time.

    Thus, simple and compound statements yield the same number of logical SLOC.

    For example, the for statement, which consists of initialization, condition, and

    increment statements, is counted as one logical SLOC rather than three (one for each

    enclosed statement).

    EXAMPLES

    Example 1. Java.

    if (i > 10) break;

    Example 2. Perl.

    if ($x != 0) {
        print "non-zero";
    }

    Example 3. XML.

    In each of these cases, the logical SLOC = 2.

    TOOLS FOR CALCULATING LOGICAL SLOC

    There are a number of tools for calculating logical SLOC, including USC CodeCount™
    (see footnote 6). There are USC CodeCount™ implementations for different programming
    languages and markup languages. In USC CodeCount™, logical SLOC is the total number of
    source statements in the source code.

    8. TOWARDS A STANDARDIZATION OF SLOC

    The pursuit of overcoming the difficulties in counting both logical and physical SLOC
    has led to standardization efforts.

    IEEE

    The IEEE Standard 1045-1992 [IEEE, 1993] provides definitions and attributes of

    SLOC-related metrics.

    SEI/CMU

    The initiative [Park, 1992] of the Software Engineering Institute (SEI) at Carnegie
    Mellon University (CMU) provides counting methods that could be used to define a
    consistent and repeatable SLOC measurement. It includes counting definitions and
    checklists to be used as guidelines.

    It is referred to and used by SLOCCount.

    Footnote 6. URL: http://csse.usc.edu/research/CODECOUNT/ .

    However, the focus of the framework is on what to count, not on how many to count. It

    includes both logical and physical SLOC, and allows for variations in each of these

    counts.

    This poses difficulties in the development of counting tools [Nguyen, Deeds-Rubin,
    Tan, Boehm, 2007]. For example, USC CodeCount™ counts each compiler directive as
    a logical SLOC, while LocMetrics does not.

    9. [SLOC] AND METRICS TAXONOMY

    From the perspective of one metrics taxonomy, the following is a faceted
    classification of [SLOC]:

    Metric    Coordinates of Classification
    [SLOC]    Internal, Product, Implementation, Direct (Atomic), Static, Objective

    From the perspective of another metrics taxonomy (see footnote 7), [SLOC] is a kind of
    linguistic metric.

    10. [SLOC] AND THEORY OF SOFTWARE MEASUREMENT

    The length of a program's source code is measured on the ratio scale. The zero-length

    element is an empty piece of code. It is possible to measure length in a variety of ways,

    including lines of code, the number of executable statements, the number of characters,

    and so on.

    Let M be the measure of the length of a program's source code in [SLOC] and M' be the

    length in the number of characters. Then, it is possible to convert from one length

    measure to another using a transformation of the form

    M' = aM,

    where a is a constant, namely the average number of characters per line of code.
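
    The transformation can be sketched in Python as follows. The function names and the
    sample lines are hypothetical; the sketch assumes that a is estimated as the mean line
    length of the code being measured, so it is constant only in that averaged sense.

```python
def average_chars_per_line(lines):
    """Estimate the constant a as the mean number of characters per line."""
    return sum(len(line) for line in lines) / len(lines)


def to_characters(sloc_count, a):
    """Apply the ratio-scale transformation M' = a * M."""
    return a * sloc_count


# The three physical lines of Example 1, without leading white space.
sample = ["sum = a + b + c +", "d + e + f +", "g + h + i;"]
a = average_chars_per_line(sample)

if __name__ == "__main__":
    m = len(sample)                      # M, the length in lines
    print("a  =", a)
    print("M' =", to_characters(m, a))   # the length in characters
```

    Because the transformation is a multiplication by a positive constant, doubling M
    doubles M', and a length of zero lines maps to zero characters, as expected of a
    ratio scale.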

    Footnote 7. URL: http://www.cs.technion.ac.il/Courses/OOP/slides/export/236804-Fall-1997/metrics/part1.html .

    11. THE ADVANTAGES OF [SLOC]

    A number of advantages and disadvantages, including risks, of using [SLOC] for
    estimation have been pointed out elsewhere [Pfleeger, Wu, Lewis, 2005].

    11.1. VISIBILITY

    The [SLOC] has been considered attractive, especially by project managers, for a number

    of reasons:

    - The SLOC are proof of actual work.
    - The SLOC are visible.
    - The counting and understanding of the SLOC does not require any special skill.

    11.2. AUTOMATION

    A [SLOC] calculation can be readily automated, and a counting utility can be developed
    relatively easily without sophisticated tooling. This reduces the time and effort
    required to produce an estimate.

    However, a counting utility may not be (easily) transferable across programming

    languages.

    11.3. REUSE

    The [SLOC] serves as a basis for a number of other metrics that are derived throughout

    the software development life cycle.

    12. A PERSPECTIVE ON THE ENDORSED APPLICATIONS OF THE [SLOC]

    There have been a number of claims of the uses of [SLOC], although not all can be

    substantiated.

    12.1. COMPARISON OF PROGRAMS

    The [SLOC] could be used to compare source code based on the same programming

    language.

    However, in the absence of necessary adjustments, physical SLOC is not useful for

    comparing programs across different programming languages.

    For example, a physical SLOC comparison between source code in Perl, Java, and

    COBOL may even seem ridiculous.

    12.2. PROGRAMMER PRODUCTIVITY

    The programmer productivity can be defined in a number of ways [Fenton, Pfleeger,

    1997, Page 408], including

    Size / Effort = ( [SLOC] Measurement ) / ( Person-Months ).

    However, using [SLOC] leads to a simplistic measure of productivity as it does not take
    into account the effective use of resources and creativity.

    In other words, if the [SLOC] calculation is used as the only measure of programmer

    productivity, then it encourages quantity over quality.

    In [Jones, Bonsignour, 2012, Chapter 2], a case study of a telecommunications company
    in Europe is presented. In it, the use of [SLOC] is termed professional malpractice
    because it violates the basic principles of manufacturing economics and show[s] the
    highest productivity rates for the lowest-level languages (Figure 3).

    Figure 3. A ranking of productivity using [SLOC]. (Source: [Jones, Bonsignour, 2012,

    Table 2.24].)
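
    The effect can be illustrated with the productivity formula above and a minimal sketch
    in Python. The figures are hypothetical and chosen only for illustration; they are not
    taken from [Jones, Bonsignour, 2012]. The sketch assumes that the same functionality is
    implemented once in a low-level language and once in a high-level language.

```python
def sloc_productivity(sloc: int, person_months: float) -> float:
    """Productivity as Size / Effort, in SLOC per person-month."""
    return sloc / person_months


# Hypothetical figures for two implementations of the same functionality.
projects = {
    "low-level language": {"sloc": 10_000, "person_months": 20.0},
    "high-level language": {"sloc": 2_000, "person_months": 5.0},
}

if __name__ == "__main__":
    for name, p in projects.items():
        rate = sloc_productivity(p["sloc"], p["person_months"])
        print(f"{name}: {rate:.0f} SLOC per person-month")
```

    With these figures, the low-level implementation scores 500 SLOC per person-month and
    the high-level implementation 400, even though the latter delivers the same
    functionality with a quarter of the effort.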

    12.3. COST ESTIMATION

    In the 1970s and part of the 1980s, the attention in software development was largely on

    programming, and the SLOC was the most perceivable indicator of software cost.

    The use of [SLOC] is prevalent in a number of cost estimation approaches. It is an input
    parameter for a number of cost estimation models such as COCOMO, SLIM, and SEER-SEM.

    For example, KDSI (thousands of DSI) is used as a size input for the COCOMO 81 Cost Estimation

    Model [Boehm, 1981], and the logical SLOC is recommended as a size input for the

    COCOMO II Cost Estimation Model [Boehm, Abts, Brown, Chulani, Clark, Horowitz,

    Madachy, Reifer, Steece, 2000].

    However, there are limitations of using the [SLOC] for estimating effort. There is no
    direct relationship between SLOC and effort, and therefore the correlation between
    SLOC and effort is weak. This is further elaborated in the following:

    Automation and Succinctness. There are cases where a large amount of source code
    could be automatically generated with little effort (as in the case of user interface
    development using programming languages like Visual Basic) and, conversely, a lot
    of effort may have gone into making source code succinct.

    Comment Lines. There is effort involved in producing comment lines, although they
    may not require the same effort as the rest of the source code, especially for a
    non-trivial algorithm. However, the physical SLOC and the logical SLOC calculations are
    expected to exclude any comment lines. This can discourage programmers from
    including comments.

    13. A PERSPECTIVE ON THE EXCLUSION OF BLANK LINES AND

    COMMENT LINES FROM [SLOC]

    The following two definitions are necessary for stating a definition of the physical SLOC.

    Definition [Blank Line] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A physical

    source line of code that contains any number of white space characters such as space, tab,

    form feed, carriage return, line feed, or their derivatives.

    Definition [Comment Line] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A comment

    is a string of zero or more characters that follow language-specific comment delimiter.

    13.1. SOURCE CODE QUALITY

    The [SLOC] calculation does not take into account the quality of source code.

    BLANK LINES AND READABILITY

    The use of styling conventions (such as formatting using blank lines) contributes to the

    readability of the source code of a program.

    However, the [SLOC] calculations are expected to exclude any blank lines.

    COMMENT LINES AND UNDERSTANDABILITY

    It is known that, in general and if done appropriately, internal documentation
    contributes to the understandability of a software artifact. In particular, comments are a
    kind of annotation (meta-information), and thereby contribute to the understandability of
    source code.

    However, the [SLOC] calculations are expected to exclude any comment lines.

    13.2. SOURCE CODE ON PHYSICAL MEDIA

    There is loss of important information in using [SLOC].

    For example, in certain situations, the source code length is used for deciding the amount
    of computer storage required for the source code, or the number of pages required for
    a printout.

    In these cases, the source code length must reflect blank lines and comment lines.

    13.3. A PARTIAL SOLUTION

    The following is an adaptation of [Fenton, Pfleeger, 1997].

    Let PSLOC be a metric that counts the number of physical SLOC according to any one of
    Definitions 1, 2, or 3, and let CSLOC be a metric that counts the comment lines of
    source code.

    Then,

    TSLOC = PSLOC + CSLOC,

    where TSLOC is preferable over PSLOC as a single metric for source code length.

    Let x and y be two pieces of source code, and let the empirical relation Is-Longer-Than
    (see footnote 8) be represented by the numerical relation > between the TSLOC values.

    Then,

    x Is-Longer-Than y ⇔ TSLOC(x) > TSLOC(y).

    Thus, the TSLOC satisfies the representation condition. Therefore, from the perspective
    of the representational theory of measurement [Fenton, 1994], the TSLOC is a valid
    measure for the length attribute of a source code entity.

    The expression

    CSLOC / TSLOC

    measures the density of comments in source code, which can be an indicator of the

    extent of self-documentation.

    There are tools such as LocMetrics that have support for CSLOC.
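
    The three metrics and the comment density can be sketched in Python as follows. The
    function names are hypothetical. The sketch assumes a language whose comments start
    with a single prefix and run to the end of the line (such as # in Perl), and it treats
    only whole-line comments as comment lines.

```python
def line_counts(source: str, comment_prefix: str = "#"):
    """Return (PSLOC, CSLOC): non-blank, non-comment lines and comment lines."""
    psloc = 0
    csloc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                      # blank lines are excluded from both
        if stripped.startswith(comment_prefix):
            csloc += 1
        else:
            psloc += 1
    return psloc, csloc


def comment_density(psloc: int, csloc: int) -> float:
    """Return CSLOC / TSLOC, where TSLOC = PSLOC + CSLOC."""
    tsloc = psloc + csloc
    return csloc / tsloc if tsloc else 0.0


perl_sample = (
    "# Report whether x is non-zero.\n"
    "\n"
    "if ($x != 0) {\n"
    '    print "non-zero";\n'
    "}\n"
)

if __name__ == "__main__":
    psloc, csloc = line_counts(perl_sample)
    print("PSLOC =", psloc)                             # 3
    print("CSLOC =", csloc)                             # 1
    print("TSLOC =", psloc + csloc)                     # 4
    print("density =", comment_density(psloc, csloc))   # 0.25
```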

    14. CONCLUSION

    To be specific, the text of the program's source code is the entity, the length is the

    attribute, and [SLOC] is one of the metrics for the attribute.

    In any reference to [SLOC], the details of viewpoint, view, and the counting approach
    must be made explicit. In particular, there needs to be a clarification of what is being
    counted and how it is being counted.

    If not, then the counts obtained under different viewpoints, views, or counting
    approaches can vary significantly from each other. For example, a comparison of source
    code from software projects spanning multiple organizations that use different
    definitions of SLOC becomes prohibitive.

    Footnote 8. It can be noted that this can indeed be checked by mere observation, for
    example, by visual examination of two printouts.

    ACKNOWLEDGEMENT

    The inclusion in this document of an image from an external source is only for
    non-commercial educational purposes, and its use is hereby acknowledged.

    REFERENCES

    [Boehm, 1981] Software Engineering Economics. By B. W. Boehm. Prentice-Hall. 1981.

    [Boehm, Abts, Brown, Chulani, Clark, Horowitz, Madachy, Reifer, Steece, 2000]

    Software Cost Estimation with COCOMO II. By B. W. Boehm, C. Abts, A. W. Brown,

    S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. Reifer, B. Steece. Prentice-Hall.

    2000.

    [Conte, Dunsmore, Shen, 1986] Software Engineering Metrics and Models. By S. Conte,

    H. Dunsmore, V. Shen. Benjamin-Cummings. 1986.

    [Fenton, 1994] Software Measurement: A Necessary Scientific Basis. By N. Fenton.

    IEEE Transactions on Software Engineering. Volume 20. Issue 3. 1994. Pages 199-206.

    [Fenton, Pfleeger, 1997] Software Metrics: A Rigorous and Practical Approach. By N. E.

    Fenton, S. L. Pfleeger. International Thomson Computer Press. 1997.

    [Galorath, Evans, 2006] Software Sizing, Estimation, and Risk Management. By D. D.

    Galorath, M. W. Evans. Auerbach Publications. 2006.

    [Grady, Caswell, 1987] Software Metrics: Establishing a Company-Wide Program. By R.

    B. Grady, D. L. Caswell. Prentice-Hall. 1987.

    [IEEE, 1993] IEEE Standard 1045-1992. Standard for Software Productivity Metrics.

    IEEE Computer Society. 1993.

    [Jones, Bonsignour, 2012] The Economics of Software Quality. By C. Jones, O.

    Bonsignour. Addison-Wesley. 2012.

    [NASA, 1995] Software Measurement Guidebook. By NASA Software Engineering

    Program. Technical Report NASA-GB-001-94. National Aeronautics and Space

    Administration. 1995.

    [Nguyen, Deeds-Rubin, Tan, Boehm, 2007] A SLOC Counting Standard. By V. Nguyen,

    S. Deeds-Rubin, T. Tan, B. Boehm. Technical Report. COCOMO Forum. 2007.

    [Park, 1992] Software Size Measurement: A Framework for Counting Source Statements.

    By R. E. Park. Technical Report CMU/SEI-92-TR-020 ESC-TR-92-020. Software

    Engineering Institute. Carnegie Mellon University. Pittsburgh, U.S.A. 1992.

    [Pfleeger, Wu, Lewis, 2005] Software Cost Estimation and Sizing Methods: Issues and

    Guidelines. By S. L. Pfleeger, F. Wu, R. Lewis. RAND Corporation. 2005.

    This resource is under a Creative Commons Attribution-Noncommercial-No

    Derivative Works 3.0 Unported license.