cs 5600 computer systems

Post on 28-Jan-2016

45 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

CS 5600 Computer Systems. Lecture 2 : Crash Course in C “CS 5600? That’s the class where you write 1 million lines of C code.”. Overview. C: A language written by Brian Kernighan and Dennis Ritchie UNIX was written in C Became the first "portable " language - PowerPoint PPT Presentation

TRANSCRIPT

CS 5600Computer Systems

Lecture 2: Crash Course in C“CS 5600? That’s the class where you

write 1 million lines of C code.”

2

Overview

• C: A language written by Brian Kernighan and Dennis Ritchie– UNIX was written in C– Became the first "portable" language

• C is a typed, imperative, call-by-value language• “The C programming language -- a language

which combines the flexibility of assembly language with the power of assembly language.”

3

Programming in C

• Four stages– Editing: Writing the source code by using some IDE or

editor– Preprocessing: Already available routines – Compiling: Translates source to object code for a specific

platform – Linking: Resolves external references and produces the

executable module • For the purposes of this class:– Editing (emacs or vim)– Compiling (gcc and make)

4

Example: Hello World in C

1: #include <stdio.h>2:3: int main(int argc, char *argv[]) {4: printf("Hello, world!\n");5: return 0;6: }

5

Line 1

1: #include <stdio.h>

• During compilation, the C compiler runs a program called the preprocessor– The preprocessor is able to add and remove code

from your source file• In this case, the directive #include tells the

preprocessor to include code from the file stdio.h• Equivalent to import in Python or Java

6

Line 3

3: int main(int argc, char *argv[]) {

• This statement declares the main function– A C program can contain many functions but must

have one main.• A function is a self-contained module of code that

can accomplish some task• The int specifies the return type of main– By convention, a status code is returned– 0 = success, any other value indicates an error

7

Line 4

4: printf("Hello, world!\n");

• printf is a function from a standard C library– Prints strings to standard output (usually the

terminal)– Included by #include <stdio.h>

• The compiler links code from libraries to the code you have written to produce the final executable

8

Line 5

5: return 0;

• Returns 0 from the current function– For main(), this is the return status of the program– Return value type must match the function

definition• No statements after the return will execute

9

Data Types, Constants, Variables

10

Data Types

• A variables data type determines– How it is represented internally by the hardware– How it may be legally manipulated (operations allowed on that data

type)– What values it may take on

• Constants and variables are classified into 4 basic types– Character: char (1 byte)– Short: short (2 bytes)– Integer: int (usually 4 bytes), long int (usually 8 bytes)– Floating Point: float (usually 4 bytes), double (8 bytes)

• All other data objects are built up from these fundamental types

11

char

• What is a character?– A member of the character set– ASCII: American Standard Code for Information

Interchange (128 characters)– Each character is internally represented using a

number (e.g: A is 65)• Recognized types, sizes and ranges:– char: 1 byte (-128 ≤ x ≤ 127)– unsigned char: 1 byte (0 ≤ x ≤ 255)

12

More about char

• ASCII character ‘2’ and the number 2 are not the same (2 != '2')– '2' is the character 2 (50 in ASCII)– 2 is the number 2– As a result, '2' = 50

• Escape Sequence– Special characters denoted by ‘\’ followed by characters or

hexadecimal code– \n new line– \t tab– \r carriage return

– \a alert– \\ backslash– \” double quote

13

short, int, long

• A whole or integral number, not containing a fractional part• Recognized types, size and ranges:

– short int: 2 bytes (-215 ≤ x ≤ 215 –1)– int: 4 bytes (-231 ≤ x ≤ 231 – 1, signed by default)– unsigned int: 4 bytes (0 ≤ x ≤ 232-1)– long long int: 8 bytes (-263 ≤ x ≤ 263 – 1, signed by default)

• Expressible as octal, decimal or hexadecimal– Integer starting with 0 will be interpreted in octal (e.g: 010 is 8)– Integer starting with 0x will be interpreted in hexadecimal (0x10 is 16)

• Negative numbers typically represented using two’s complement– What does short a = -7 look like in machine representation?

14

float, double

• A number which may include a fractional part– Representation is an estimate (although at times, it may be exact)– 2.13 can be represented exactly – 1.23456789 x 1028 does not fit into a float

• Hence number is truncated

• Recognized types, sizes and ranges– float: 4 bytes (1 sign bit, 8 exponent bits, 24 fraction bits)

• Range is -1e37 ≤ x ≤ 1e38

– double: 8 bytes (1 sign bit, 11 exponent bits, 53 fraction bits)• Range is -1e307 ≤ x ≤ 1e308

– long double: usually the same as double, sometimes 80 bits• You won’t be using floats or doubles in this class

15

Arithmetic Operators

• Order of precedence (highest to lowest)– Parenthesis– Multiplication, division, modulus– Addition, subtraction– Comparisons– Assignment

Assignment (=) a = bAddition (+) a + bSubtraction (-) a – bMultiplication (*) a * bDivision (/) a / bModulus (%) a % b

Equal to a == bNot equal to a != bLess than a < b< or = to a <= bIncrement a++Decrement a--

16

Bitwise Operators• ints can be viewed as collections of bits; C provides bitwise

operations– Bitwise AND (&) a & b – Bitwise OR (|) a | b – Bitwise NOT (~) ~a – Bitwise XOR (^) a ^ b – Bitwise left shift (<<) a << b – Bitwise right shift (>>) a >> b

• Examples:unsigned int a = 9; // 00001001unsigned int b = 3; // 00000011printf("9 & 3 is: %d\n", (a & b)); // 00000001printf("9 << 3 is: %d\n", (a << b)); // 01001000

17

Constants

• Constants are values written directly into program statements– Not stored in variables– Can assign constant values to variables

unsigned int x = 122;char x = 'c';

• Variables can also be set as constantsconst int x = 1234;x = 5; // illegal, won’t compile

18

Numerical Constants

• Integer constants– Default data type is int– If value exceeds int range, it then becomes long– Can force compiler to store value as long/unsigned using l/u

suffixes

long c = 0x0304lu;• Floating point constants

– Default type of floating point is double– Can force type float during compilation by using decimal (e.g., 1.0)– Can also use scientific notation

float x = 1.3e-20;

19

Variables in C• Variable declaration (e.g., int x = 7;)

– Associates data type with the variable name– Allocates memory of proper size– Cannot be re-declared within the same scope– Variable memory space initially contains junk, unless initialized

int x;printf(“%i\n”, x); // who knows what will print

• Variable names– Set of characters starting with [A-Z] or [a-z] or underscore – Period (.) is not allowed; no leading numbers; no reserved

keywords – Case sensitive

• Convention is lowercase used for variables and UPPERCASE for constants

20

Variable Types

• Character variables:char first = 'c';char second = 100;

• Integer variablesint a = 8;short unsigned int c = 2;

• Floating point variablesfloat g = 3.4;double e = -2.773;

21

Arrays• A data structure storing a collection of identical objects

– Allocated memory space is contiguous– Identified using a single variable– The size is equal to the number of elements and is constant

• Create an array called test_array which has 10 integer elementsint test_array[10];int my_array[2] = { 30, 10 };

• Array Elements– Referenced by name[index] (0-indexed; 2nd element is test_array[1] )– Array bounds not checked

int my_array[10];printf(“%i\n”, my_array[37]); // will compile and run, but may crash

my_array[123] = 7; // will compile and run, but is wrong

22

Type Casting

• What happens if you need to assign a variable of one type to a variable of a different type?

• By default, operands converted to the longest type– char converts to int, int converts to float, float converts

to double• Manual casting

int y = (int) 3.14 * x; // legaldouble z = (double) y; // legalchar * a = (char *) (3.14 * x); // won’t compile

23

Division and Modulus

• Dividing ints always results in a new intprintf(“%i\n”, 30 / 2); // 15printf(“%i\n”, 31 / 2); // 15printf(“%i\n”, 30 / 29); // 0

• Can force float division by castingprintf(“%f\n”, 31 / (float) 2); // 15.5

• Modulus operator yields remainder of divisionprintf(“%i\n”, 30 % 2); // 0printf(“%i\n”, 31 % 2); // 1

24

sizeof()

• Returns number of bytes required to store a specific data type.– Usage: int x = sizeof(<argument>);

• Works with types, variables, arrays, structures:– Size of an int sizeof(int); // 4– Size of a structure sizeof(struct foo);– Size of an array element: sizeof(test_array[0]);– Size of the entire array: sizeof(test_array);– Size of variable x: sizeof(x);

25

Pointers

26

Pointers

• Pointers are an address in memory– Includes variable addresses, constant addresses,

function addresses...• Pointer is a data type just like any other (int, float,

double, char)– On 32-bit machines, pointers are 4 bytes– On 64-bit machines, pointers are 8 bytes

• Pointers point to a particular data type– The compiler checks pointers for correct use just as it

checks int, float, etc.

27

Declaring Pointers

• Use * to denote a pointerint *ptrx; // pointer to data type intfloat *ft; // pointer to data type floatshort *st; // pointer to data type short

• Compiler associates pointers with corresponding data types– int * must point to an int, etc

28

Referencing/Dereferencing Pointers• How can you create a pointer to a variable?

– Use &, which returns the address of the argument

int y = 7;int *x = &y; // assigns the address of y to x

• How can you get the value pointed to?– Dereference the pointer using *– Go to the address which is stored in x, and return the value at that address

int y = 7; // y holds the value 7int *x = &y; // x is the memory address of yint z = *x; // z now holds the value 7(*a)++; // increments the value pointed to a *(a + 1); // accesses the value pointed to by the address (a + 1)

29

Pointer Quiz

int y = 10;int x = y;y++;x++;

• What is the value of y? 11int y = 10;int *x = &y;y++;(*x)++;

• What is the value of y? 12

30

Arrays and Pointers• Compiler associates the address of the array to/with the

nameint temp[34];

– Array name (temp) is the pointer to the first element of array• To access the nth element of the array:

– Address = starting address + n * size of element– Starting address = name of the array – Size of element = size of data type of array– array[n] de-references the value at the nth location in the array

int temp[10]; // Assume temp = 100 (memory address)temp[5] = *(100 + sizeof(int) x 5) = *(120)

31

Passing Arrays• Passing an array to a function passes a pointer

– Hence arrays are always passed by reference

int general (int size, int name []);//Expects a pointer to an int array

int general (int size, int *name);//Expects a pointer to an int

void foo(int a[]) {a[0] = 17;

}

int b[1] = { 5 };foo(b);

• What is the value of b[0]? 17

32

Functions and Pointers

• Functions must return a value of the declared type• Just like variables, functions can return a pointer

float *calc_area (float radius);• Function formal arguments may be of type

pointer:double calc_size (int *stars);

• For example, scanf takes in parameters as pointers:int scanf(const char *format, ...); // int*, int*scanf("%d%f", &x, &f);

33

Passing in Pointers

• Why pass a variable address at all and complicate functions?– By design we can return only one value– Sometimes we need to return back more than one

value• For example, consider scanf("%d %f", &x, &f); – Three values are returned (in x, f, and the return

value)• Pointers allows us to return more than one value

34

Pointer Arithmetic

• Pointers can be added to and also subtracted from– Pointers contain addresses

• Adding to a pointer goes to next specified location (dep. on data type)– <data type> *ptr;– ptr + d means ptr + d * sizeof (<data type>);

• For exampleint *ptr;

– ptr + 2 means ptr + 2*sizeof(int) which is ptr + 8char *ptr;

– ptr + d means ptr + 2*sizeof(char) which is ptr + 2

35

Example#include <stdio.h>void main () {

int *i;int j = 10;i = &j;printf ("address of j is : %p\n", i);printf ("address of j + 1 is : %p\n", i + 1);

}

• What is the output?$ ./a.outaddress of j is : 0xbffffa60address of j + 1 is : 0xbffffa64

• Note that j + 1 is actually 4 more than j

36

Strings

37

Character Strings

• A sequence of character constants such as “This is a string”– Each character is a character constant in a consecutive

memory block

• Each character is stored in ASCII, in turn is stored in binary – Character strings are actually character arrays

• A string constant is a character array whose elements cannot change

char *msg = "This is a string";

T H I S I S A S T R I N G \0

38

Strings as Arrays

char *msg = "This is a string !";• The variable msg is associated with a pointer to the first element

– msg is an array of 19 characters– \0 is also considered a character

• Appended to each string by the compiler• Used to distinguish strings in memory, acts as the end of the string• Also called the NULL character

• Character pointerschar *ptr;ptr = "This is a string";

– ptr is a char pointer containing the address of the first character (T)

39

String Functions

#include <string.h>int strcmp(char * a, char * b);

// return 0 if strings are identical// otherwise, return is non-zero

int strlen(char * a);// returns count of characters, not including null

char * strcpy(char * dst, char * src);// copies characters from b into a// returns the address of a// WARNING: a must have enough space to hold b!

40

Getting Numbers from Strings

#include <stdlib.h>int atoi(const char *a);

// converts an alphanumeric string to an // integer if possible (returns 0 on failure)

double atof(const char * a);// converts string to double if possible

int a = atoi("17"); //a is now 17double b = atof("89.29393"); //b is now 89.29393

41

Structures

42

Structures• An aggregate data type which contains a fixed number of components

– Declaration:

struct name {// a field// another field

};• For example

struct dob {int month;int day;int year;

};• Each dob has a month, day, and year (ints) inside it

43

Using Structures• Declare variables using struct keyword

– All internal variables are allocated space

struct dob d1, d2;• Access member values using ‘.’ notation

d1.day = 10; d2.year = 1976;printf("%d\n", d2.year);

• A structure can be assigned directly to another structurestruct dob d1, d2;d1.day = 10; d1.month = 12; d1.year = 1976;d2 = d1; // now d2 has the same values as d1 for its fields.

44

Operations on Structures

• Cannot check to see if two structures are alike directlystruct dob d1, d2;if (d1 == d2) // WRONG !!!

– To compare, we need to check every internal value• Cannot print structures directly

– Must print one field at a time• Pointers to structures use the ‘->’ notation

struct dob *d1 = (struct dob *) malloc(sizeof(struct dob));d1->year = 1976; d1->day = 26; (*d1).month = 6;

45

A Little More on Structures

• Can be initialized field by field or during declaration

struct dob d1 = {26, 6, 1976};• Can create arrays of structures

struct dob d1[10]; // array of 10 structs dob• … and access them in the usual manner

d1[1].day = 26; d1[1].month = 6;d1[1].year = 1976;

46

Making a struct Into a Type

• Type definition allows an alias for an existing type identifiertypedef <type> <name>;

• For examplestruct dob_s {

int day;int month;int year;

};typedef struct dob_s dob;

• Now you can simplydob my_dob;my_dob.year = 1984;

Alternate form:typedef struct dob_s {

int day;int month;int year;

} dob;

47

Memory Management

48

Memory Management

• Two different ways to look at memory allocation– Transparent to user

• Done by the compiler, OS

– User-defined• Memory allocated by the user

• Why allocate memory?– Required by program for permanent variables– Reserve working space for program

• Normally maintained using a stack and a heap

49

Stack

• Programs are typically given an area of memory called the stack– Stores local variables– Stores function parameters– Stores return addresses

• On x86, the stack grows downwards– Functions are responsible for allocating and

deallocating storage on the stack

50

Stack Frame Example

void foo(char * packet) { int offset = 0; char username[64]; int name_len;

name_len = packet[offset]; memcpy(&username, packet + offset + 1,name_len); …}

Callers return address

char username[]

int offset

int name_len

Stack

X

X-4

X-8

X-72

X-76

name_len name0 43

Packet

Chris

to

Wils

on

15

Bytes n

51

Local Variable Scoping

• Variables on the stack only exist when they are in scope– i.e. the function that created the variables is executing

struct mystruct *foo(int i) {struct mystruct;mystruct.i = 7;return &mystruct;

}• What happens if you try the program above?

– Can the caller access mystruct?

52

Heap

• A heap is a collection of memory manipulated but the user/system– No order specified for addition/removal

• Typically, the OS manages a pool of free heap memory– Applications may request allocations of memory

from this pool at runtime

53

Dynamic Memory Allocation• There are times when we don’t know how much memory is

required– Employee data base doesn’t know how many employees will exist– Program accepting command line input of different sizes– User requests more records to be created

printf("Add a new record y/n ?");scanf ("%c", &response);if (response == 'y') {

// we now have to allocate space}

• We use the heap to allocate memory dynamically

54

malloc()

#include <stdlib.h>void *malloc (size_t size); // size_t = unsigned int

• Returns a pointer to a contiguous block in memory of size bytes– For example

int *x = malloc(sizeof(int));• Returns a valid address, or NULL if an error occurs

– ALWAYS check the return value of malloc• Memory is allocated (automatically) on the heap

– Allocated space is not initialized; contains garbage

55

A void *?

• Why is void * returned?– malloc just allocates memory which consists of some number of

bytes– Memory as such has no data type– Programmer must associate a data type with the block of memory

• How to assign a data type to the memory just created?– Use type casting

double *x; // x is a pointer to doublex = (double *) malloc(100);

• How many doubles can we store? 100/8 = 12– We can clearly see a pit falls of malloc here

x = (double *) malloc(100 * sizeof(double)); // better

56

Freeing Memory

• You, the programmer, are in charge of freeing memory– Only dynamically allocated memory can be freed

• When finished using dynamically allocated memory, free it

#include <stdlib.h>void free (void *blk);

• Good practice to avoid memory leaks– Growing loss of memory due to unreleased locations– This will be crucial when developing your OS

57

Using free()char *x;x = (char *) malloc (10 * sizeof (char));free (x); // frees the memory just assigned

• Never free a memory block that is not dynamically allocated.int *x = 0;free (x); // error!!!

• Never double-freechar *x = (char *) malloc (10 * sizeof (char));free (x); free (x); // error!!!

• Never access freed memorychar *x = (char *) malloc (10 * sizeof (char));free (x); strcpy(x, "welcome"); // program may or may not crash

58

Keep Track of Your Pointers

• Dangling pointerschar *x, *y;x = (char *) malloc (10 * sizeof (int));y = (char *) malloc (10 * sizeof (int));x = y; // cannot access what was in x anymore

• Can easily lead to memory leaks

59

Writing Safe Code in an Unsafe Language

60

A Few Tips

• C is powerful because it gives you raw access to the system and to memory

• But, C provides much less help to the programmer than others– You can get yourself into trouble easily– C is very liberal with types (i.e. because of casting)

• Also, much harder to debug– Memory is just an array of bytes

• Programmer has to assign meaning

– You can overwrite memory, making the source of problems• A few tips will make your experience in this class easier

61

Buffer Overflows

char b[10]; b[10] = x;

• Array starts at index zero– So [10] is 11th element– One byte outside buffer was referenced– Program may or may not crash

• Off-by-one errors are common and can be exploitable!

62

Buffer Overflow Example

char username[]

int offset

int name_len

Stack

X

X-4

X-8

X-72

X-76

Chris

to

Wils

on

1572 (64 + 8)

void foo(char * packet) { int offset = 0; char username[64]; int name_len;

name_len = packet[offset]; memcpy(&username, packet + offset + 1,name_len); …}

Callers return address

name_len name0 43

PacketBytes n

Name data…

63

What Happens With an Overflow?

• If memory doesn't exist:– Bus error, crash

• If memory protection denies access:– Segmentation fault, crash– General protection fault, crash

• If access is allowed, memory next to the buffer can be accessed– Can overwrite the heap, stack– Worst of all options; won’t detect immediately

• Leads to weird, non-deterministic outcomes

– Can compromise your program

64

Preventing Buffer Overflows

void foo(int *array, int len);• Always pass array/buffer length with array– Don’t assume you know the length – Many library functions require you to do this already

int foo[3];for (int i=0; i<=3; i++) foo[i] = 0;

• Remember that last element of n-length array is n-1• Can happen to the best programmers

65

Problems With String Functions

char * strncpy(char * dst, const char * src, size_t len);– len is the maximum number of characters to copy – dst is NULL-terminated only if < len characters were copied!– All calls to strncpy must be followed by a NULL-termination

operation

int strlen(const char * str);– What happens when you call strlen on an improperly terminated

string?• strlen scans through memory until a null character is found• Can scan outside buffer if string is not null-terminated

– strlen is not safe to call unless you know string is NULL-terminated

66

The Worst String Operation

char * strcpy(char * dst, const char * src);• How can you use strcpy safely?– Set the last character of src to NULL• Even if string is shorter than the entire buffer

– Do not check according to strlen(src)!• Check that the src is smaller than or equal to dst– Or allocate dst to be at least equal to the size of src

67

Debugging

68

Debugging With gdb

• GDB is a debugger that helps you debug your program– Time you spend learning gdb will save you days of

debugging time• You need to compile with the -g option to use gdb– The -g option adds debugging information to your

program

$ gcc -g -o hello hello.c• Should be done automatically in all Makefiles we

give you

69

Running gdb• To run a program with gdb type

$ gdb name_of_program....(gdb)

• Then set a breakpoint in the main function– Marker in your program that will make the program stop – Return control back to gdb

(gdb) break main

• Now run your program– If your program has arguments, you can pass them after run

(gdb) run arg1 arg2 ... argN

70

Stepping Through Your Program

• Your program will start and then pause at maingdb>

• You have the following commands to run your program step by step

(gdb) step– It will run the next line of code and stop– If it is a function call, it will enter into it

(gdb) next– It will run the next line of code and stop– If it is a function call, it will go through it

• If the program is running without stopping, regain control CTRL-C

71

Setting Breakpoints

• You can set breakpoints in a program in multiple ways:

(gdb) break function_name– Set a breakpoint in a function

(gdb) break line_# – Set a break point at a line number in the current

file

(gdb) break file_name:line_#– Set a break point at a line number in a specific file

72

Inspecting the Stack

• The command(gdb) where

– Prints the current function being executed – And the chain of functions that are calling that function– This is also called the backtrace

• Example:(gdb) where#0 main () at test_mystring.c:22#1 test () at test_mystring.c:38(gdb)

73

Inspecting Variables

• To print the value of a variable(gdb) print variable_name

– Will automatically print char*s and arrays

(gdb) print i$1 = 5(gdb) print s1$1 = 0x10740 "Hello"(gdb) print stack[2]$1 = 56(gdb) print stack$2 = {0, 0, 56, 0, 0, 0, 0, 0, 0, 0}(gdb)

74

Catching Seg Faults• If your program seg faults, gdb will catch it

(gdb) runStarting program: /home/cbw/a.out test string

Program received signal SIGSEGV, Segmentation fault.0x4007fc13 in _IO_getline_info () from /lib/libc.so.6

(gdb) backtrace#0 0x4007fc13 in _IO_getline_info () from /lib/libc.so.6#1 0x4007fb6c in _IO_getline () from /lib/libc.so.6#2 0x4007ef51 in fgets () from /lib/libc.so.6#3 0x80484b2 in main (argc=1, argv=0xbffffaf4) at segfault.c:10#4 0x40037f5c in __libc_start_main () from /lib/libc.so.6

75

Valgrind

• Valgrind is a memory mismanagement detector– Use of uninitialized memory– Reading/writing memory after free()– Reading/writing outside bounds of malloc()ed

memory– Reading/writing inappropriate areas of the stack– Memory leaks

76

Finding Memory Leaks

#include <stdlib.h>int main() {

char *x = malloc(100);return 0;

}

$ valgrind --tool=memcheck --leak-check=yes example1==2330== 100 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==2330== at 0x1B900DD0: malloc (vg_replace_malloc.c:131)==2330== by 0x804840F: main (example1.c:5)

77

Finding Invalid Pointer Use

#include <stdlib.h>int main() {

char *x = malloc(10);x[10] = 'a';return 0;

}

$ valgrind --tool=memcheck --leak-check=yes example2==9814== Invalid write of size 1 ==9814== at 0x804841E: main (example2.c:6)==9814== Address 0x1BA3607A is 0 bytes after a block of size 10 alloc'd==9814== at 0x1B900DD0: malloc (vg_replace_malloc.c:131) ==9814== by 0x804840F: main (example2.c:5)

78

Detecting Use of Uninitialized Variables

#include <stdio.h>int foo(int x) {

if(x < 10) printf("x is less than 10\n");}int main() {

int y;foo(y);return 0;

}==4827== Conditional jump or move depends on uninitialized value(s)==4827== at 0x8048366: foo (example4.c:5)==4827== by 0x8048394: main (example4.c:14)

top related