Post on 23-Jan-2016

Lecture 10: Implementation

CSCI – 3350 Software Engineering II

Fall 2014

Bill Pine

CSCI 3350 Lecture 10 - 2

Overview

• The Implementation Workflow
• Choosing a Programming Language
• Good Programming Practices
• Coding Standards
• Metrics for the Implementation Workflow

Overview (cont)

• Secure Coding Background
• Buffer overflow attack
• Strategies to reduce vulnerability
• Guiding Principles for Software Security

Implementation Workflow

• Goal: Clearly and accurately represent the detailed application design in the chosen implementation language
  – Define the unit tests
  – Write the code
  – Execute the unit test suite
    • Resolve any discrepancies
  – Submit to the QA group for further evaluation

Choosing a Programming Language

• Specified directly as a requirement
• Specified indirectly as a requirement
  – Platform specified
• If there is an opportunity for choice
  – A "most appropriate language" requirement
  – Or you may be driven by an object-oriented design and implementation requirement

Taking a Decision

• Base the decision upon
  – Cost-benefit analysis
  – Risk analysis
• Use the language strength of the organization
  – Procedural vs. object-oriented
    • Pure object-oriented
    • Hybrid
• The issue of acquiring needed skills
  – Hire new talent
  – Retrain existing employees
  – A mixture?

X-Generation Language

• First-generation languages
  – Machine code
• Second-generation languages
  – Assembly
• Third-generation languages
  – High-order languages
    • Multiple (5-10) machine instructions per source line
    • Examples: FORTRAN, C, COBOL …

Fourth-Generation Language

• Application-specific languages
• Original goal was 25-50 machine instructions per source line
  – Database based
    • PowerBuilder
    • Oracle
    • DB2
    • Report generators
  – Mathematics based
    • Mathematica
    • SPSS

Good Programming Practices

• Many best practices tend to be language specific
  – Some authors have made a career of adapting them for each new language
  – Example: Henry Ledgard, author of more than 25 works
    • Programming Proverbs
    • Programming Proverbs and Principles
    • Programming Proverbs for FORTRAN Programmers
    • FORTRAN with Style: Programming Proverbs
    • Pascal with Style: Programming Proverbs
    • Pascal with Excellence: Programming Proverbs
    • Programming Language Landscape
• Some general practices cut across specific languages

Best Coding Practices

• The following slides on best practices draw heavily upon
  – Clean Code (full citation in the references)
• The Agile development community is the origin of these best coding practices
• You must devote effort to writing and maintaining quality code
  – As the code deteriorates, team productivity decreases
  – Decreasing productivity causes less effort to be expended on maintaining code quality, leading to lower productivity …
  – A positive feedback loop that is inherently unstable

Best Coding Practices (cont)

"Writing clean code is what you must do in order to call yourself a professional. There is no reasonable excuse for doing anything less than your best."

- Robert Martin

Best Programming Practices

• We will examine guidelines relating to the following areas
  – Identifier names
  – Functions
  – Comments
  – Formatting

Guidelines for Identifier Names

• Meaning must be obvious to the maintenance programmer
  – Maximize communication
• Use intention-revealing names
  – Much harder than it seems
  – Accept that the name will probably change as you are developing

General Practices

• General guidelines
  – Variables should be nouns (noun phrases)
  – Class and object names should be nouns (noun phrases)
  – Function and method names should be verbs (verb phrases)
  – Kernighan and Pike assert: "Long names for global identifiers; short names for local identifiers"

Identifier Names (cont)

• An identifier name should answer the questions
  – Why does the entity exist?
  – What does the entity do?
  – How is the entity used?
• If the identifier name requires a comment to answer these questions
  – The name does not reveal the identifier's intent and needs to be changed
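
As a small illustration in C, the two functions below do the same work; only the names differ. The cell-status convention (4 means "flagged") and every identifier are hypothetical, loosely in the spirit of the naming examples in Clean Code.

```c
#include <stddef.h>

/* Hypothetical convention for this sketch: a cell status of 4 means "flagged". */
enum { FLAGGED = 4 };

/* Hard to read: what do the_list, c, and the literal 4 mean? */
size_t get_them(const int the_list[], size_t n)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (the_list[i] == 4)
            c++;
    return c;
}

/* Same logic; the names now answer why, what, and how. */
size_t count_flagged_cells(const int cell_statuses[], size_t cell_count)
{
    size_t flagged_cells = 0;
    for (size_t i = 0; i < cell_count; i++)
        if (cell_statuses[i] == FLAGGED)
            flagged_cells++;
    return flagged_cells;
}
```

Neither function needs a comment to explain what it counts once the names carry the intent.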

Identifier Names (cont)

• Avoid disinformation
  – Use a difference in identifiers only when you are making a meaningful distinction
    • Example: don't mix fetch, get, and retrieve (or controller, manager, and driver) for the same concept
  – Don't use lowercase L or uppercase O as a variable name
  – Don't use noise words
    • a, an, the as prefixes without a convention
    • info, data as suffixes
    • nameString instead of name?

Contrived (?) Example

int a = 1;
if( 1 == O1 )
    a = Ol;
else
    a = 01;

Identifier Names (cont)

• Use pronounceable names
• Use searchable names
  – One- or two-letter variable names and literal constants yield too many matches
• Avoid encoding the identifier type in the name
  – In particular, eschew Hungarian notation
    • No help with strongly typed languages
    • Allows misleading information if the type changes but the encoding doesn't

Identifier Names (cont)

• Avoid mental mappings
  – Short / heavily abbreviated names require the reader to translate
• Avoid cute names
• Prefer solution domain names over problem domain names
  – Who is the reading audience of your code?
• Prefer problem domain names over informal names

Identifier Names (cont)

• Don't add gratuitous context to identifier names
  – Add context only as necessary
    • accountAddress and customerAddress may be appropriate for instances of a class
    • But they are not appropriate for a class name; Address would be a better choice

Identifier Names (cont)

• Final comments
  – Poor names
    • Impede communication between the code author and the code reader
    • Have been shown to be an indicator of overall poor code quality
  – Indicate a less than complete understanding by the author
  – Point to likely areas for code faults

Guidelines for Functions

• First rule of functions
  – A function should be small
• Second rule of functions
  – A function should be smaller than would be produced by rule 1
  – Try for an average of 20 lines per function
  – Indent depth should be 1 or 2 levels

Functions (cont)

• Functions should do one thing
  – They should do it well
  – They should do that one thing only
  – All steps in the function should be at the same level of abstraction
• Principle of Least Surprise
  – Based upon the function name, the code in the function is what you would expect

Functions (cont)

• The ideal number of arguments is zero
  – Niladic
• Followed by
  – 1 argument: monadic
  – 2 arguments: dyadic
  – 3 arguments: triadic
• Any more than 3 requires compelling justification

Functions (cont)

• Why restrict the number of arguments?
  – An increasing number of arguments requires increasing conceptual power
  – Harder to test, and requires an increasing number of tests
• Eschew flag arguments
  – They indicate that a function is doing more than one thing
• Functions should have no side effects
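
A minimal C sketch of replacing a flag argument with two single-purpose functions. The temperature-conversion example and all names are assumptions chosen for illustration; every function here is also free of side effects.

```c
#include <stdbool.h>

/* Flag argument: the bool selects between two different conversions,
 * so the function does two things. */
double convert_temperature(double degrees, bool to_celsius)
{
    if (to_celsius)
        return (degrees - 32.0) * 5.0 / 9.0;
    return degrees * 9.0 / 5.0 + 32.0;
}

/* One function per behaviour; each does exactly one thing. */
double fahrenheit_to_celsius(double fahrenheit)
{
    return (fahrenheit - 32.0) * 5.0 / 9.0;
}

double celsius_to_fahrenheit(double celsius)
{
    return celsius * 9.0 / 5.0 + 32.0;
}
```

A call such as convert_temperature(t, true) forces the reader to look up what true means; fahrenheit_to_celsius(t) does not.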

Functions (cont)

• Avoid output arguments
  – The reader's expectation is that an argument is an input
  – Prior to object-oriented programming, one could justify the use of output arguments
    • With OO, instead of having a function return a value through an argument, the function should change the state of the appropriate object

Functions (cont)

• Functions should either change the state of an object or return the state of an object, never both
• Prefer exceptions over returning error codes
• Never duplicate code (i.e., copy & paste)
  – Code bloat
  – Multiple places to change the code => multiple places for faults to be injected
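
One way to apply the change-or-return rule in C, where a struct stands in for the object, is sketched below. The counter type and its functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical counter "object" for this sketch. */
struct counter {
    int value;
    int limit;
};

/* Mixed: changes state AND reports on it, so a test such as
 * if (counter_increment_mixed(&c)) hides an update inside a condition. */
bool counter_increment_mixed(struct counter *c)
{
    if (c->value >= c->limit)
        return false;
    c->value++;
    return true;
}

/* Query: only reads the state. */
bool counter_is_full(const struct counter *c)
{
    return c->value >= c->limit;
}

/* Command: only changes the state (and stops at the limit). */
void counter_increment(struct counter *c)
{
    if (!counter_is_full(c))
        c->value++;
}
```

Note that counter_increment reuses counter_is_full instead of repeating the comparison, so the limit test lives in exactly one place.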

Guidelines for Comments

• Myth of "self-documenting" code
  – Goal: the code should not need comments to make clear the "how" of the code
  – Internal documentation is always needed
    • To make clear the "why"
    • Block comments at the beginning of each unit
    • Comments interspersed (as needed) within the unit

Comments (cont)

• The previous slide notwithstanding, which code would you rather read?
• Version 1
    //
    // *** Check if employee is eligible for benefits
    //
    if( (employee.flags & HOURLY_FLAG) && (employee.age > 55) )
• Version 2
    if( employee.isEligibleForFullBenefits( ) )
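
In C, Version 2 might look like the sketch below. The struct layout, the value of HOURLY_FLAG, and the snake_case function name are assumptions; the age threshold is the one from Version 1.

```c
#include <stdbool.h>

/* Assumed flag value for this sketch. */
enum { HOURLY_FLAG = 0x1 };

/* Hypothetical minimal employee record. */
struct employee {
    unsigned flags;
    int age;
};

/* The condition from Version 1, moved behind a name that states its intent. */
bool employee_is_eligible_for_full_benefits(const struct employee *e)
{
    return (e->flags & HOURLY_FLAG) && (e->age > 55);
}
```

The call site then reads if (employee_is_eligible_for_full_benefits(&employee)), and the banner comment in Version 1 becomes unnecessary.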

Comments (cont)

• Additional thoughts on comments
  – Don't comment the obvious
  – Don't use end-of-line comments with high-order languages
  – The format of the comments should reflect and reinforce the structure of the code
  – Comments must be accurate
    • Agree with the code and support reading the code

Comments (cont)

  – Don't comment closing braces
  – Don't use comments as a substitute for source code versioning systems
    • Remove commented-out code from production code
    • Don't add bylines

Guidelines for Formatting

• Code formatting is important
  – Format as you write the code, not as a cleanup operation
• Remember the PARC Design Principle
• Indentation
  – Source code is a hierarchy
    • Use consistent indentation to reflect the hierarchy
    • Don't violate indentation, ever, not even for short functions / methods

Formatting (cont)

• Intra-line white space
  – You have some freedom to improve readability if your editor / IDE doesn't insist upon removing "extraneous" spaces
  – Consider the following
        root1 = (-b+sqrt(b*b-4*a*c))/(2*a)
  – Versus
        root2 = (-b - sqrt(b*b - 4*a*c))/(2*a)

Miscellaneous Practices

• Eschew literal constants in favor of symbolic constants
  – Higher informational content
  – Easier to read
  – Easier to maintain
  – More readily searchable
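
A minimal C sketch contrasting a literal constant with a symbolic one; the seconds-per-day example is an assumption chosen for illustration.

```c
/* Literal constant: what does 86400 mean, and how would you search for it? */
long whole_days_v1(long seconds)
{
    return seconds / 86400;
}

/* Symbolic constant: self-describing, defined once, easy to search for. */
static const long SECONDS_PER_DAY = 24L * 60 * 60;

long whole_days(long seconds)
{
    return seconds / SECONDS_PER_DAY;
}
```

If the unit ever changes, only the definition of SECONDS_PER_DAY has to be edited.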

Miscellaneous (cont)

• Layout
  – Use the block separators consistently
    • K & R
    • Allman
    • Whitesmiths
  – One statement per line
  – Use parentheses to eliminate misunderstanding
    • Order of precedence
  – Break complex expressions into simpler ones
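
The C sketch below shows why parentheses matter for precedence, and how a complex expression can be broken into named parts. READ_ONLY, its value, and the leap-year example are illustrative assumptions.

```c
#include <stdbool.h>

/* Assumed flag value for this sketch. */
enum { READ_ONLY = 0x4 };

/* Precedence trap (shown only as a comment): == binds more tightly than &,
 *     return flags & READ_ONLY == 0;
 * parses as flags & (READ_ONLY == 0), i.e. flags & 0, which is always 0. */

/* Parentheses state the intended grouping explicitly. */
bool is_writable(unsigned flags)
{
    return (flags & READ_ONLY) == 0;
}

/* A complex condition broken into simpler, named parts. */
bool is_leap_year(int year)
{
    bool divisible_by_4 = (year % 4) == 0;
    bool divisible_by_100 = (year % 100) == 0;
    bool divisible_by_400 = (year % 400) == 0;
    return divisible_by_4 && (!divisible_by_100 || divisible_by_400);
}
```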

Miscellaneous (cont)

• Strive for clarity, not cleverness
  – Be concise, but not at the expense of readability
• Be aware of side effects
  – Some languages have operators that
    • Return a value
    • Modify the internal state of an item
    • Do not specify the exact order of execution
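
In C, the order in which function arguments (and initializer-list entries) are evaluated is unspecified, so two calls with side effects in the same argument list can run in either order. The sketch below assumes a hypothetical ID generator and sequences the calls explicitly; make_pair in the comment is also hypothetical and is not defined here.

```c
/* Hypothetical ID generator: each call has a side effect on last_id. */
static int last_id = 0;

int next_id(void)
{
    return ++last_id;
}

struct id_pair {
    int first;
    int second;
};

/* Risky: make_pair(next_id(), next_id()) may receive the IDs in either
 * order, because C does not specify argument evaluation order.
 * Worse still, an expression such as a[i] = i++; is undefined behaviour. */

/* Safer: give each side effect its own statement, which fixes the order. */
struct id_pair next_two_ids(void)
{
    struct id_pair ids;
    ids.first = next_id();
    ids.second = next_id();
    return ids;
}
```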

Miscellaneous (cont)

• Idioms
  – Definition: an expression that has a meaning not readily understood from the meanings of the individual words
  – A central issue in learning any language is to absorb and use its idioms
  – Examples
    • "Burf is a student after my own heart"
    • Array idioms (code patterns)
    • List walking
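
Two common C idioms are sketched below: the counted for loop over an array and the pointer-following loop for walking a linked list. The node type is a hypothetical minimal definition.

```c
#include <stddef.h>

/* Hypothetical singly linked list node for this sketch. */
struct node {
    int value;
    struct node *next;
};

/* Array idiom: index from 0 while i < n. */
int sum_array(const int a[], size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* List-walking idiom: follow next pointers until NULL. */
size_t list_length(const struct node *head)
{
    size_t length = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        length++;
    return length;
}
```

Because experienced C programmers recognize these shapes at a glance, deviations (for example, i <= n) stand out as likely faults.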

Coding Standards

• The purpose is to define the practices that make the lives of the development and maintenance programmers easier
• Records, documents, and clarifies the set of best programming practices that will be used by the
  – Organization
  – Team
  – Project

Recall The Distinction

• Error: a discrepancy between an actual value and an expected value
• Failure: inability of the system to perform according to specifications
• Fault: a condition that causes the system to fail
  – If an error is observed, then a failure must have occurred
  – If a failure has occurred, then there must be a fault in the system

Implementation Metrics

• Code complexity metrics
  – Lines of code
    • Assumes a constant probability that a line of code contains a fault
    • More lines of code => more faults
    • A number of studies have shown a correlation between the number of faults and the size of the application

Implementation Metrics (cont)

  – McCabe's cyclomatic complexity M
    • M = number of binary decisions + 1
    • A measure of the number of branches in the code
    • Recall the white-box testing coverage criteria
  – M can be used as a measure of the number of test cases for branch coverage
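
As a worked example, the hypothetical grading function below has three binary decisions, so M = 3 + 1 = 4, and four test cases (one per path) achieve branch coverage. The grade thresholds are assumptions chosen only to have decisions to count; under the usual convention each && or || in a condition would count as an additional decision.

```c
/* Three binary decisions, so M = 3 + 1 = 4. */
char letter_grade(int score)
{
    if (score >= 90)          /* decision 1 */
        return 'A';
    else if (score >= 80)     /* decision 2 */
        return 'B';
    else if (score >= 70)     /* decision 3 */
        return 'C';
    return 'F';
}
```

For example, scores of 95, 85, 75, and 10 exercise each of the four paths.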

Implementation Metrics (cont)

• Advantages
  – Almost as easy to calculate as lines of code
  – Studies show a good correlation between M and the number of faults
• Disadvantages
  – M correlates strongly with lines of code
  – There may be little additional value over lines of code

Implementation Metrics (cont)

• Testing metrics
  – Number of tests
    • McCabe's M is a good measure of the number of tests for branch coverage
  – Total number of faults
    • Exceeding a threshold triggers a rewrite of a "chunk" of code
  – Number of faults by fault type
    • Use the types of faults to generate checklists for non-execution-based testing

Origins of Bad Software

• Graff and van Wyk cite three factors
  – Technical
  – Psychological
  – Real world
• Probably not due to
  – Ignorance
  – Stupidity
  – Laziness

Technical Factors

• Secure software is intrinsically difficult to write
  – Complexity
• Composition
  – System composed of multiple separate components
  – Each component standing alone is secure
  – The combination introduces a vulnerability

Psychological Factors

• Software professionals make mistakes
• Even when examining software for vulnerabilities, we
  – Tend to discover only faults
    • That we are looking for
    • That we understand
    • That we know how to fix
• Most people find it hard to
  – Assume that the "other guy" is a "bad guy"
    • We are too willing to trust others
  – Adopt a different view of the software

Different Views of Software

• Software developers frequently employ mental models
  – When viewed only from the mental model, potential vulnerabilities are not apparent
• The bad guy succeeds in his attack by adopting a different mental model

Mouse Attack

• An attacker was able to gain control of a Unix system by abusing a mouse driver
• The purpose of the driver was to position the cursor at a specified screen location
• Since the driver needed to interact with the display hardware, it was installed with high privileges
• The driver worked error-free for years
• Until …

Mouse Attack (cont)

• An attacker directly called the driver with very large values for the screen coordinates
  – Internal memory of the driver was overwritten
  – Allowing the attacker to gain control of the system
• The attacker was successful by
  – Ignoring the mental model for the driver
  – Concentrating on the code bytes
• The developer's mental model did not admit the possibility of the driver being called directly by an application program
• The ability to ignore the mental model is
  – A hard skill for developers to cultivate
  – An essential skill for locating vulnerabilities
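
A sketch of the missing defense, assuming a hypothetical driver entry point that takes screen coordinates: treat the caller as untrusted and reject out-of-range values before they are used as array indices. The screen size and the framebuffer array are illustrative assumptions, not details of the actual driver.

```c
#include <stdbool.h>

/* Hypothetical screen size and framebuffer for this sketch. */
enum { SCREEN_WIDTH = 640, SCREEN_HEIGHT = 480 };

static unsigned char framebuffer[SCREEN_HEIGHT][SCREEN_WIDTH];

/* Validate untrusted coordinates before using them as indices; the caller
 * may be an attacker rather than the mouse hardware. */
bool set_cursor(long x, long y)
{
    if (x < 0 || x >= SCREEN_WIDTH || y < 0 || y >= SCREEN_HEIGHT)
        return false;   /* reject instead of writing outside the buffer */
    framebuffer[y][x] = 1;  /* mark the (hypothetical) cursor position */
    return true;
}
```

The check costs one comparison per coordinate and removes the path the attacker used: huge coordinates no longer reach memory outside the buffer.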

Some Different Views of the System

• An ordered set of algorithms
• Lines of text on the screen
• An ordered set of instructions for a specific processor
• A series of bits ( 0 | 1 )
  – In memory
  – On magnetic disk
  – On optical media

Some Different Views (cont)

• An ordered set of linked libraries and other components
• A stream of bits along various pathways
• Executing on a host as part of a network
• A set of vertical layers (transport, protocol, presentation, …)
• An ordered set of events, with critical timing intervals

Real World Factors

• Source of essential source code
  – Much was written by "amateurs"
• The architecture and design decisions for the TCP/IP network subsystem
  – Developed by Berkeley undergraduates
• Much of the code for Internet applications was written by people without any software engineering training
  – Web pages, scripts, …
• A phenomenon known as "democratization of development"

Real World Factors (cont)

• But real software professionals are responsible for most of the problems
  – Even with
    • Extensive training
    • Awareness of the critical issues
    • The best of intentions

Developing secure software is one of the most challenging activities imaginable

Real World Factors (cont)

• Production pressures
• Just secure enough
  – As little as possible: just enough to prevent loss of sales and avoid bad publicity
  – Resources spent on security mean fewer features in the next release
  – By not acknowledging security problems, you don't have to deal with them
• Tragedy of the commons
  – Garrett Hardin, 1968
    • Pasture land commonly shared by many herdsmen
  – Likewise the commonly shared Internet

Focus for the Rest of the Lecture

• As mentioned earlier, security issues in the
  – Architecture
  – Design
  are of equal or perhaps greater importance
• These issues are the focus of software engineering
• For the remainder of the lecture, however, we will concentrate on coding
  – In particular, on buffer overflows

Buffer Overflow Background

• Buffer overflows are arguably the most common form of attack
• First well-known buffer overflow attack
  – The Internet Worm, written and released by Robert T. Morris in 1988
  – Infected thousands of systems on the Internet
  – Exploited a buffer overflow in the finger daemon
• In 1999, Brian Snow predicted that buffer overflow attacks would still be a problem 20 years hence

Process Memory Image

• Text = code
• Data = uninitialized and initialized data
• Heap = allocated by new
• Stack = local variables, frame
• Environment = PATH, HOME, …

[Figure: process memory layout, in order of increasing address: Text, Data, DLLs, Heap, Stack, Command Line Parms, Environment]

Structure of the Stack Frame

• For each function call, certain data is placed on the stack

[Figure: stack frame layout, from lower to higher addresses: Function Local Variables, Return Address, Function Parameters, Caller Stack Frame; the stack grows toward decreasing addresses]

Local Variable Overflow

• Normally, if a local variable overflows
  – The data on the stack is "clobbered"
  – When the function attempts to return
    • The process crashes
• If, however, a "bad guy" carefully crafts the data that overflows
  – The return address is replaced with a valid address that contains code the "bad guy" wants to execute
• For excruciating details see
  – Smashing the Stack for Fun and Profit

Preventing Overflow

• Many languages perform bounds checks on arrays and strings to prevent overflow
• Not so C and C++
• Main offenders
  – strcpy, strcat
  – sprintf
  – scanf
  – gets

And all their sisters, and their cousins, and their aunts
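
The sketch below contrasts an unbounded strcpy into a fixed-size stack buffer with a bounded snprintf copy. The function names and the 16-byte buffer are illustrative assumptions; greet_unsafe is deliberately vulnerable and should only ever be called with short, trusted input.

```c
#include <stdio.h>
#include <string.h>

enum { NAME_SIZE = 16 };

/* Vulnerable: strcpy copies until it finds '\0', however long the input is.
 * Input of 16 or more characters overruns `name` on the stack. */
void greet_unsafe(const char *input)
{
    char name[NAME_SIZE];
    strcpy(name, input);
    printf("Hello, %s\n", name);
}

/* Bounded: snprintf writes at most sizeof name bytes and always
 * null-terminates; its return value reveals whether truncation occurred. */
int greet_safe(const char *input)
{
    char name[NAME_SIZE];
    int needed = snprintf(name, sizeof name, "%s", input);
    printf("Hello, %s\n", name);
    return needed >= (int)sizeof name;   /* 1 if the input was truncated */
}
```

Truncation is not a full fix (the program still has to decide what to do with a too-long name), but it turns a potential code-execution bug into an ordinary, detectable condition.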

Preventing Overflow (cont)

• A minor improvement (for C programmers)
  – Use strncpy, strncat
  – But these are not without problems
    • strncpy( destination, source, len );
    • If source contains len or more characters, no terminating null is placed at the end of destination
  – Better choice
    • strlcpy, strlcat: available on Darwin, FreeBSD, OpenBSD
    • Freeware source code versions available for download
  – Heavy-duty libraries
    • SafeStr
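
A minimal sketch of both fixes in C: terminating explicitly after strncpy, and a strlcpy-style helper. The name bounded_copy is hypothetical; it imitates the documented strlcpy behaviour (copy at most size - 1 bytes, always terminate when size > 0, return the length of the source) because strlcpy is not part of ISO C and may be unavailable.

```c
#include <string.h>

/* strncpy pitfall: when strlen(src) >= n, no terminating null is written,
 * so add one explicitly. Requires dst_size >= 1. */
void copy_with_strncpy(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* strlcpy-style helper (hypothetical name): copies at most size - 1 bytes,
 * always terminates when size > 0, and returns strlen(src) so the caller
 * can detect truncation (return value >= size). */
size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Usage: if (bounded_copy(buf, input, sizeof buf) >= sizeof buf), the input did not fit and the caller can react.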

Preventing Overflow (cont)

• With C++
  – Whenever possible, use the string class
    • Overflows are still possible if you use [ ]
    • If you need a C-style string for a system call, recall that a member function exists for that purpose: c_str( )
  – Some better classes are available, e.g.
    • Boost library
    • rope class

Stack Protection by Compiler

• Some compilers use a "canary" to detect stack overflows
  – Place an unpredictable value on the stack, between the local variables and the return address
  – Before using the return address, check whether the canary has been overwritten
    • If so: abort, throw an exception, …
• Examples: StackGuard, ProPolice, Stack Shield, the MS /GS switch
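
The canary mechanism can be simulated in portable C, as sketched below. The layout is an assumption: a byte array stands in for the stack, with an 8-byte buffer followed by a 4-byte canary, so an oversized copy spills into the canary without writing outside the object. Real compilers place and check the canary themselves and use a random value, not the fixed one a caller passes in here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 8 };

/* Simulated stack region: BUF_SIZE bytes of "buffer" then a canary word. */
struct frame {
    unsigned char region[BUF_SIZE + sizeof(uint32_t)];
};

void frame_init(struct frame *f, uint32_t canary)
{
    memset(f->region, 0, BUF_SIZE);
    memcpy(f->region + BUF_SIZE, &canary, sizeof canary);
}

/* Copy into the buffer area. The length is bounded only by the whole
 * region, so an oversized copy overwrites the canary, modelling an overflow. */
void frame_copy(struct frame *f, const unsigned char *data, size_t len)
{
    if (len > sizeof f->region)
        len = sizeof f->region;
    memcpy(f->region, data, len);
}

/* The check a compiler would emit before using the return address. */
bool frame_canary_intact(const struct frame *f, uint32_t canary)
{
    uint32_t current;
    memcpy(&current, f->region + BUF_SIZE, sizeof current);
    return current == canary;
}
```

A failed check is the point at which real protection aborts the process instead of returning.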

Stack Protection by Compiler (cont)

• However, workarounds now exist
  – http://www.coresecurity.com/files/files/11/StackguardPaper.pdf
  – http://www.phrack.org/phrack/56/p56-0x05

Heap Smashing Attacks

• Possible in theory; difficult, but not impossible, in practice
  – The attacker has to identify security-critical variables (akin to the criticality of the return address on the stack)
    • Difficult without source code
  – The attacker has to find a buffer to overflow to rewrite the critical variable

Guiding Principles for Software Security

• From Viega and McGraw
  1. Secure the weakest link
  2. Practice defense in depth
  3. Fail securely
  4. Follow the principle of least privilege
  5. Compartmentalize
  6. Keep it simple
  7. Promote privacy
  8. Remember hiding secrets is hard
  9. Be reluctant to trust
  10. Use your community resources

Secure the Weakest Link

• Example: physical security
  – Attackers take the path of least resistance
• Approach
  – List vulnerabilities by process area
  – Assign a weakness ranking
  – Iteratively address the vulnerabilities, weakest first

Practice Defense in Depth

• Example: perimeter defense
  – Originated as a military concept
• Approach (NSA)
  – Identify potential adversaries, motivations, and classes of attack
  – List common classes of attack
  – Build the in-depth defense around the common classes

Fail Securely

• Example: a buffer overflow detected by a canary
• Approach
  – Identify key checkpoints
  – Explore what happens if a checkpoint fails
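
A minimal C sketch of failing securely, assuming a hypothetical access-control check that can also report an error: only an explicit ALLOW grants access, so an error or an unexpected value falls through to denial.

```c
#include <stdbool.h>

/* Hypothetical result of an access-control lookup. */
enum check_result { CHECK_ALLOW, CHECK_DENY, CHECK_ERROR };

/* Insecure alternative (shown only as a comment):
 *     return result != CHECK_DENY;
 * which would grant access whenever the check itself fails. */

/* Fail securely: anything other than an explicit ALLOW means "no access". */
bool access_granted(enum check_result result)
{
    switch (result) {
    case CHECK_ALLOW:
        return true;
    case CHECK_DENY:
    case CHECK_ERROR:
    default:
        return false;
    }
}
```

The default branch matters: if a new result code is added later and this function is not updated, the new code is treated as a denial.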

Follow Principle of Least Privilege

• Example: personnel security clearance
• Approach
  – Inventory privileges needed for operations
  – Review and restrict to the minimum privilege necessary to carry out the assignment

Compartmentalize

• Example: submarines are built with sealable compartments
• Approach
  – List security components
  – Determine coupling between components
  – Reduce couplings to the minimum needed to carry out the assignment

Keep It Simple

• Example: you only need to dial 3 digits for emergency help
• Approach
  – Reuse code
  – Introduce common chokepoints

Promote Privacy

• Example: cookies used only with user permission
• Approach
  – Compile a list of basic system components
  – Identify information revealed
    • User
    • System / server identification withheld

Hiding Secrets is Hard

• Example: how quickly various "protections" have been broken, e.g., CSS by DeCSS
• Approach
  – Identify "secrets" present in the system
  – Identify adversaries
  – Assess risk
  – Address risk

Be Reluctant to Trust

• Example: social engineering à la Kevin Mitnick
• Approach
  – Identify trust relations in the system
    • Individuals
    • Other systems
  – Log interactions with trustees
  – Review the log

Use Your Community Resources

• Example: use encryption techniques that are peer reviewed and widely used
• Approach
  – Become aware of resources: NIST, SANS, US-CERT, CERIAS, Schneier on Security, …
  – Regularly monitor your resources
  – Consult your resources when your situation changes

References

• Any of Henry Ledgard's "Proverbs" series.
• Robert Martin, Clean Code, Prentice Hall, 2009, ISBN 0-13-235088-2.
• Brian W. Kernighan and Rob Pike, The Practice of Programming, Addison-Wesley, 1999, ISBN 0-201-61586-X.
• Brian Snow, Future of Security, Panel presentation at IEEE Security and Privacy, May 1999.

References (cont)

• John Viega and Gary McGraw, Building Secure Software, Addison-Wesley, 2003.

• John Viega and Matt Messier, Secure Programming Cookbook, O’Reilly, 2003.

• Mark Graff and Kenneth R. van Wyk, Secure Coding, O'Reilly, 2003.

• Aleph One, Smashing the Stack for Fun and Profit, Phrack 49, http://phrack.org/show.php?p=49&a=14.
