chapter 6 - data t

Chapter 6

Data Types

ISBN 0-321-33025-0

Chapter 6 Topics

• Introduction• Primitive Data Types• Character String Types• User-Defined Ordinal Types• Array TypesArray Types• Associative Arrays

R d T• Record Types• Union Types

Copyright © 2006 Addison-Wesley. All rights reserved. 1-2

• Pointer and Reference Types

Introduction

• A data type defines a collection of data values and a set of predefined operations on those objects

• Some data types are specified by type operatorswhich are used to form type expressions (‘[]’, ‘*’, ‘()’ in C for array, pointer, function, respectively)

• A descriptor is the collection of the attributes of a variable. It is used for type checking and by allocation and deallocation operations

• Design issue: What operations are provided for


es g ssue at ope at o s a e p o ded ovariables of the type and how are they specified?

Primitive Data Types

• Almost all programming languages provide a set of primitive data types– Primitive data types: Those not defined in terms yp

of other data types

• Some primitive data types are merely reflections• Some primitive data types are merely reflections of the hardware

• Others require little software support


Primitive Data Types: Integer

• Almost always an exact reflection of the hardware h i i i i lso the mapping is trivial

– Java: byte, short, int, long– Ada: SHORT INTEGER, INTEGER and LONG INTEGER

• The leftmost bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number

• Most computers now use a notation called two’s complement to store negative integers, which is


complement to store negative integers, which is convenient for addition and subtraction

Primitive Data Types: Floating Point

• Model real numbers but only as approximations• Model real numbers, but only as approximations for most real values

f f l• Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)

• Usually exactly like the hardware, but not alwaysy y , y


IEEE Floating Point Formats


Primitive Data Types: Decimal• For business applications

S fi d b f d i l di i• Store a fixed number of decimal digits • The representations of data types are called

binar coded decimal (BCD)binary coded decimal (BCD)– Two digits per byte

Ad t• Advantage– Accuracy

Di d• Disadvantages– The range of values is restricted because no

exponents are allowed


exponents are allowed – Wastes memory

Primitive Data Types: Boolean

• Simplest of allp• Range of values: two elements, one for “true”

and one for “false”and one for false• Could be implemented as bits, but often as

bytesbytes– Advantage: readability


Primitive Data Types: Character

• Stored as numeric codings• Most commonly used coding: ASCII

An alternative 16 bit coding: Unicode• An alternative, 16-bit coding: Unicode– Includes characters from most natural languages– Originally used in Java– C# and JavaScript also support Unicode


Character String Types

• Values are sequences of characters• Design issues:

– Is it a primitive type or just a special kind ofIs it a primitive type or just a special kind of array?

– Should the length of strings be static orShould the length of strings be static or dynamic?


Character String Types Operations

• Typical operations:– Assignment and copying– Comparison– Comparison– Catenation– Substring reference– Pattern matching


Character String Type in Certain Languages

• AdaN t i iti– Not primitive

– Assignment, comparison, catenation, substring referencereference

NAME1 : STRING(1..30);

NAME1 := NAME1 & NAME2; (catenation)NAME1 := NAME1 & NAME2; (catenation)NAME1(2:7) (substring reference)• CC

– Not primitive– Use char arrays and a library of functions that


– Use char arrays and a library of functions that provide operations

Character String Type in Certain Languages

• Java– Primitive via the String class (not arrays of char).

Objects cannot be changedl f h bl b– StringBuffer is a class for changeable string objects

• SNOBOL4 (a string manipulation language)– Primitive– Many operations, including elaborate pattern

himatchingLETTER = ‘abcdefghijklmnopqrstuvwxyz’;

WORDPAT BREAK(LETTER) SPAN(LETTER) WORD


WORDPAT = BREAK(LETTER) SPAN(LETTER) . WORD

TEXT WORDPAT

Character String Length Options• Static length string: FORTRAN 90, COBOL, Pascal,

J ’ i lJava’s String class, …– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2

Limited dynamic length string: C/C++• Limited dynamic length string: C/C++– In C-based language, ‘\0’ is used to indicate the

end of a string’s charactersend of a string s characters• Dynamic length string: SNOBOL4, Perl, JavaScript

It requires the overhead of dynamic storage– It requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility


flexibility • Ada supports all three string length options

Character String Implementation

• Static length: compile-time descriptor• Limited dynamic length: may need a run-time

descriptor for length (but not in C/C++)descriptor for length (but not in C/C++)• Dynamic length: need run-time descriptor

b d l k d l– Strings can be stored in a linked list, or – Strings can be stored completely in adjacent

storage cells


Character String Implementation

Static string Limited dyn. string Dyn. string

Length Max length Current lengthLength Max. length Current length

Address Current length Address

Address

Compile-time Run-time descriptor Run-timeCompile timedescriptor for static strings

Run time descriptor for limited dynamic strings

Run time descriptor for dynamic strings


User-Defined Ordinal Types

• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers

• Examples of primitive ordinal types in Java– integerinteger

– char

b l– boolean

• In many languages, users can define two kinds


of ordinal types: enumeration and subrange

Enumeration Types

• All possible values, which are named constants, are provided in the definition

• Ada:type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

Design issues• Design issues– Is an enumeration constant allowed to appear in more

th t d fi iti d if h i th t fthan one type definition, and if so, how is the type of an occurrence of that constant checked?Are en meration al es coerced to integer?


– Are enumeration values coerced to integer?

Example

• Pascal - cannot reuse constants; they can be used for array subscripts for loops caseused for array subscripts, for loops, case selectors; no input or output; can be compared

program test;p g ;type DAYS = (Mon, Tue, Wed, Thu, Fri, Sat, Sun);var

day: DAYS;day: DAYS;begin

for day := Mon to Sun dowriteln('Hello, World');

day := Wed;if day <= Sat then …


ay aend.

Example

• C/C++, like Pascal, cannot reuse constants in a given referencing environment. Enumeration values implicitly converted to integer

void main() {

enum months {Jan = 1, Feb, Mar, Apr, May, Jun,

Jul, Aug, Sep, Oct, Nov, Dec};

enum months month;

printf("%d\n", month = Sep);


}

Example

• Ada - constants can be reused; distinguish with context or <type_name>’; can be used as in Pascal; can be input and output

type LETTERS is ( ‘A’ ‘B’ ‘C’ ‘D’ ‘E’ ‘F’ ‘G’ ‘H’( A , B , C , D , E , F , G , H ,

‘I’, ‘J’, ‘K’, ‘L’, ‘M’, ‘N’, ‘O’, ‘P’,‘Q’, ‘R’, ‘S’, ‘T’, ‘U’, ‘V’, ‘W’, ‘X’,‘Y’ ‘Z’ )‘Y’, ‘Z’ );

type VOWELS is (‘A’, ‘E’, ‘I’, ‘O’, ‘U’);…


for letter in VOWELS’(‘A’) .. VOWELS’(‘U’) loop

Evaluation of Enumerated Type

• Aid to readability, e.g., no need to code a color as a number

• Aid to reliability, e.g., compiler can check: y g p– Operations (don’t allow colors to be added)

No enumeration variable can be assigned a value– No enumeration variable can be assigned a value outside its defined rangeAd C# d J 5 0 ti t i bl– Ada, C#, and Java 5.0: enumeration type variables are not coerced into integer types (better than C )


C++)

Subrange Types• An ordered contiguous subsequence of an

ordinal typeordinal type– Example: 12 .. 18 is a subrange of integer

Ada’s design• Ada s designtype Days is (mon, tue, wed, thu, fri, sat, sun);

subtype Weekdays is Days range mon fri;subtype Weekdays is Days range mon..fri;

subtype Index is Integer range 1..100;

…

Day1: Days;

Day2: Weekday;


Day2 := Day1;

Subrange Evaluation

• Aid to readability– Make it clear to the readers that variables of

subrange can store only certain range of valuesg g

• ReliabilityAssigning a value to a subrange variable that is– Assigning a value to a subrange variable that is outside the specified range is detected as an errorerror


Array Types

• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate,identified by its position in the aggregate, relative to the first element


Array Design Issues

• What types are legal for subscripts?• Are subscripting expressions range checked?• How many subscripts are allowed?• How many subscripts are allowed?• When does allocation take place?• Can array objects be initialized?• Are any kind of slices allowed?y


Array Indexing

• Indexing (or subscripting) is a mapping from indices to elementsarray name(index value)y_ _

• Index syntaxFORTRAN PL/I Ada use parentheses– FORTRAN, PL/I, Ada use parentheses

– Most other languages use brackets


Arrays Index (Subscript) Types

• FORTRAN, C: integer only• Pascal: any ordinal type (integer, Boolean, char,

enumeration)• Ada: integer or enumeration (includes Boolean

and char)and char)• Java: integer types only• C/C++, Perl, and Fortran do not specify range

checking


• Java, ML, C# specify range checking

Subscript Binding and Array Categories

• Static array: subscript ranges are statically bound and storage allocation is static (before run-time)– Advantage: execution efficiency (no dynamic

allocation)

• Fixed stack-dynamic array: subscript ranges are statically bound but the allocation is done atstatically bound, but the allocation is done at declaration elaboration time during execution


– Advantage: space efficiency

Subscript Binding and Array Categories (cont )(cont.)

• Stack-dynamic array: subscript ranges are dynamically bound and the storage allocation isdynamically bound and the storage allocation is dynamic (at run-time)

Advantage: flexibility (the size of an array need not– Advantage: flexibility (the size of an array need not be known until the array is to be used)Ada:– Ada:

procedure foo (size: integer) is

M: array (1.. size, 1 .. size) of realy ( , )

…

begin


…

end foo;

The compiler arranges for a pointer to M to reside at a


• The compiler arranges for a pointer to M to reside at a static offset from the frame pointer


• Fixed heap-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated q gfrom heap, not stack)– Example: In FORTRAN 90Example: In FORTRAN 90

INTEGER, ALLOCATABLE, ARRAY (:, :) :: MAT

ALLOCATE (MAT(10 NUMBER OF COLS))ALLOCATE (MAT(10, NUMBER_OF_COLS))

DEALLOCATE (MAT)


– Example: malloc/free (C), new/delete (C++)


• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can change any number of timesg y– Advantage: Flexibility (arrays can grow or shrink

during program execution)during program execution)– Example: In Perl

@ l h (" " " ")@alpha = ("a" .. "z");

push(@array, <element>);


pop(@array);


• C/C++ arrays that include static modifier are static

• C/C++ arrays without static modifier are fixed stack-dynamic

• Ada arrays can be stack-dynamic• Ada arrays can be stack dynamic• C/C++ provide fixed heap-dynamic arrays• C# includes a second array class ArrayList that

provides fixed heap-dynamic


• Perl and JavaScript support heap-dynamic arrays

Number of subscripts

• FORTRAN I allowed up to three• FORTRAN 77 allows up to seven

Others no limit• Others - no limit


Array Initialization• Some language allow initialization at the time of

t ll tistorage allocation– C, C++, Java, C#:

int list[] = {4, 5, 7, 83}

– FORTRAN 77INTEGER LIST(3)

DATA LIST /0, 5, 9/

– AdaSCORE : array (1..5) of Integer :=


(1 => 17, 3 => 10, others => 0);

Jagged Arrays

• A jagged matrix has rows with varying number of elements– They are made possible when multidimensioned y p

arrays actually appear as arrays of arrays

• Advantages• Advantages– It allows the rows to have different lengths,

ith t d ti t h l t th d f thwithout devoting space to holes at the ends of the rows


An example in C/C++

char days[][10] = char *days[] =

{"Sunday", "Monday",

"Tuesday", "Wednesday",

"Thursday", "Friday",

{"Sunday", "Monday",

"Tuesday", "Wednesday",

"Thursday", "Friday",Thursday , Friday ,

"Saturday"};

...

Thursday , Friday ,

"Saturday"};

...


days[2][3] == ’s’; days[2][3] == ’s’;

Slices

• A slice is some substructure of an array – It is a mechanism for referencing part of an array

as a unitas a unit

• Slices are only useful in languages that have array operationsarray operations


Slice Examples

• Fortran 90INTEGER MAT(1:3, 1:3), CUBE(1:3, 1:4, 1:4)

MAT(1:3, 2) MAT(2:3, 1:3)

CUBE(2, 1:3, 1:4) CUBE(1:3, 1:3, 2:3)


Arrays Operations

• APL provides the most powerful array processing operations for vectors (V) and matrixes (M) as well as unary operators

V reverses the elements of V M reverses the columns of M

M reverses the rows of M M transposes MM transposes MA + B whether A and B are scalar variables, vectors,


or matrixes

Arrays Operations (cont.)

• Ada allows – The assignment, arithmetic, relational operators

type ARY INT is array(1..6) of INTEGER;yp _ y( ) ;

flag : BOOLEAN;

crowd Group1 Group2 : ARY INT;crowd, Group1, Group2 : ARY_INT;

Group1 := (12, 17, -1, 3, -100, 5);

Group2 := (13 2 22 1 1242 12);Group2 := (13, -2, 22, 1, 1242, -12);

flag := Group1 <= Group2;

d G 1 G 2


crowd := Group1 + Group2;


– Logical operatorstype ARY_BOOL is array(1..4) of BOOLEAN;

Result, Answer1, Answer2 : ARY BOOL;, , _ ;

Answer1 := (TRUE, FALSE, TRUE, FALSE);

Answer2 := (TRUE, FALSE, FALSE, TRUE);Answer2 : (TRUE, FALSE, FALSE, TRUE);

Result := Answer1 and Answer2;



– Catenationdeclare

type vector is array(positive range <>) of

integer;

a : vector (1..10);

b : vector (1..5) := (1, 2, 3, 4, 5);

c : vector (1..5) := (6, 7, 8, 9, 10);

begin

a := b & c;


end;


• Fortran 90 has a very rich set of array operationsA f th b ilt i ith ti t ill t k– Any of the built-in arithmetic operators will take arrays as operands; the result is an array, of the same shape as the operands, whose elements aresame shape as the operands, whose elements are the result of applying the operator to correspon-ding elements

– Slices of the same shape can be intermixed in array operations, even if the arrays from which they were sliced have very different shapesthey were sliced have very different shapes

• FORTRAN 90 also includes library functions for matrix multiplication matrix transpose and


matrix multiplication, matrix transpose, and vector dot product

Implementation of Arrays

• Access function maps subscript expressions to an address in the arrayan address in the array

• A single-dimensioned array is a list of adjacent memor cellsmemory cells

• Hardware memory is linear – it is usually a simple f b t S l t f ltisequence of bytes. So elements of multi-

dimensional arrays must be mapped onto the single-dimensioned memorysingle-dimensioned memory

• Row-major or column-major orderFortran uses column major order; most other


– Fortran uses column-major order; most other languages use row-major order

Implementation of Arrays (cont.)

• The difference between row- and column-major l t b i t t f th tlayout can be important for programs that use nested loops to access all the elements of a large multi dimensional arraylarge, multi-dimensional array– If a small array is accessed frequently, all or most

of its elements are likely to remain in the cacheof its elements are likely to remain in the cache– For a large array, each miss will bring into the

cache not only the desired element but the nextcache not only the desired element, but the next several elements as well If elements are accessed across cache lines then If elements are accessed across cache lines, then

almost every access will result in a cache missCopyright © 2006 Addison-Wesley. All rights reserved. 1-48

Compile-Time Descriptors


Single-dimensioned array Multi-dimensional array




• Single-dimensioned array:address(a[k]) =

address(a[lower_bound]) + ((k - lower_bound) * _ _

element_size)

• Double-dimensioned array (row-major order):Double dimensioned array (row major order):address(a[i, j]) =

address(a[ro lb col lb]) + ((i ro lb) * N +address(a[row_lb, col_lb]) + ((i - row_lb) * N +

(j - col_lb)) * element_size

h N i b f lCopyright © 2006 Addison-Wesley. All rights reserved. 1-51

where N is a number of column

Associative Arrays

• An associative array is an unordered collection f d t l t th t i d d b lof data elements that are indexed by an equal

number of values called keys h l f f• Each element of an associative array is a pair of

entities: <key, value> • The design issues that are specific for

associative arrays are: – What is the form of references to elements? – Is the size of an associative array static or


dynamic?

Associative Arrays - Perl

• In Perl, associative arrays are often called hashes, because in the implementation their elements are stored and retrieved with hash functions

• The size of a Perl hash is dynamic: It grows• The size of a Perl hash is dynamic: It grows when a new element is added and shrinks when

l t i d l t d d l it i ti d ban element is deleted, and also it is emptied by assignment of the empty literal


Associative Arrays – Perl (cont.)

• Structure and Operations Hash variables begin with %– Hash variables begin with %

– Scalar variable names begin with $%fruit = (“apples” => 3, “oranges” => 6);

– Subscripting is done using braces and keys$fruit{“apples”}

– Elements can be added with the assignment statementElements can be added with the assignment statement$fruit{“peach”} = 9;

– Elements can be removed with deletedelete $fruit{“peach”};

– The entire hash can he emptied by assigning the empty literal to it


te a to t@fruit = ();

Record Types

• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by namesy

• Design issues:What is the syntactic form of references to the– What is the syntactic form of references to the field?

ll l f ll d– Are elliptical references allowed?


Definition of Records in COBOL

• COBOL uses level numbers to show nested records

01 EMP-REC.

02 EMP-NAME.02 EMP NAME.

05 FIRST PIC X(20).

05 MID PIC X(10).( )

05 LAST PIC X(20).

02 HOURLY-RATE PIC 99V99.


Definition of Records in Ada• In Ada:type DATE istype DATE is

recordMonth : INTEGER range 1..12;Day : INTEGER range 1..31;Year : INTEGER range 1776..2100;

end record;;type PERSON is

recordName : STRING(1 15);Name : STRING(1..15);Birthday : DATE;Age : INTEGER;


Sex : CHARACTER;end record;

References to Records• Most language use dot notation• Fully qualified references must include all recordFully qualified references must include all record

names– COBOL:

MID OF EMP-NAME OF EMP-REC– Ada

Student: PERSON; … Student.Birthday.Year …• Elliptical references allow leaving out record

names as long as the reference is unambiguous– COBOLCOBOL

FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-RECare elliptical references to the employee’s first name– Ada


Adawith Student do Name := ‘Michael’;

Operations on Records

• Assignment is very common if the types are identical

• Ada allows record comparisonAda allows record comparison• COBOL provides MOVE CORRESPONDING

f ld f h d h– Copies a field of the source record to the corresponding field in the target record


Example01INPUT-RECORD.

02 NAME.05 FIRST PICTURE IS X(20).05 MIDDLE PICTURE IS X(20).05 LAST PICTURE IS X(20).

02 EMPLOYEE NUMBER PICTURE IS 9(10)02 EMPLOYEE-NUMBER PICTURE IS 9(10).02 HOURLY-RATE PICTURE IS 99.

01OUTPUT-RECORD.02 NAME02 NAME.

05 FIRST PICTURE IS X(20).05 MIDDLE PICTURE IS X(20).05 LAST PICTURE IS X(20).( )

02 EMPLOYEE-NUMBER PICTURE IS 9(10).02 GROSS-PAY PICTURE IS 999V99.02 NET-PAY PICTURE IS 999V99.


…MOVE CORRESPONDING INPUT-RECORD TO OUTPUT-RECORD.

Implementation of record types

A compile-time descriptor for a record


Memory layout – an example in Pascal

type element = record type element = packed record

name: two_chars;

atomic_number: integer;

atomic weight: real;

name: two_chars;

atomic_number: integer;

atomic weight: real;atomic_weight: real;

metallic: Boolean

end;

atomic_weight: real;

metallic: Boolean

end;


Evaluation and Comparison to Arrays

• Arrays are used when all the data values have the same type and are processed in the same way

• Records are used when the collection of dataRecords are used when the collection of data values is heterogeneous and the different fields are not processed in the same wayare not processed in the same way

• Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static)


Unions Types

• A union is a type whose variables are allowed to store different type values at different times during executiong

• Design issues Should type checking be required?– Should type checking be required?

– Should unions be embedded in records?


Discriminated vs. Free Unions

• Fortran, C/C++ provide union constructs in hi h h i l fwhich there is no language support for type

checking; the union in these languages is called ffree union

• Type checking of unions require that each union include a type indicator called a tag (or discriminant– Supported by ALGOL 68, Ada

• Pascal provides both discriminated and non-


pdiscriminated unions

Pascal Union Types

• In Pascal (and Ada), the discriminated union is called a record variant, or variant part of a record

type intreal =

record tagg : Boolean of

true : (blint : integer);

false : (blreal : real);

end;

• Problem with Pascal’s design: type checking is ineffective


ineffective

Pascal Union Types (con.)

– User can create inconsistent unions (because the tag can be individually assigned)

var x : intreal; y : real;

(1) x.tagg := true; { it is an integer }

(2) x.blint := 47; { ok }

(3) x.tagg := false; { it is a real }

(4) y := x.blreal; { assigns an integer to ( ) y ; { g g

real}

– The tagg is optional. Now, the declaration and


gg p ,the second and last assignment cause trouble

Ada Union Types

type Shape is (Circle, Triangle, Rectangle);type Colors is (Red, Green, Blue);type Co o s s ( ed, G ee , ue);type Figure (Form: Shape := Rectangle) is record

Filled: Boolean;Color ColorsColor: Colors;case Form : Shape is

when Circle => Diameter: Float;when Triangle =>

Leftside, Rightside: Integer;Angle: Float;Angle: Float;

when Rectangle => Side1, Side2: Integer;end case;d d


end record;

Ada Union Type Illustrated

color

form: tag (discriminant)rectangle: side1, side2

filledcolor

circle: diameter

triangle: leftside, rightside, angle

A discriminated union of three shape variables


Ada Union Types (cont.)

• Reasons Ada union types are safer than Pascal– Variant records must always have a discriminant– The discriminant must also appear in the header

of the record’s declaration and have default value– It is impossible for the user to create an incon-

cistent union. The assignment can occur either• Via whole-record assignment (e.g., A := B, where A

d B i t d )and B are variant records), or• Via assignment of an aggregate (e.g., A := {Filled => True Color => Red Form => Circle


=> True, Color => Red, Form => Circle,

Diameter => 5.5 }; )

Implementation of Union Types

• Discriminated unions are implemented by simply i h dd f ibl iusing the same address for every possible variant

– Sufficient storage for the largest variant is ll dallocated

• At compile time, the complete description of each variant must be stored– Using a case table with the tag entry in the

descriptor – The case table has an entry for each variant, which


points to a descriptor for that particular variant

Example - Ada

type NODE(TAG: BOOLEAN) isrecordrecord

case TAG iswhen true COUNT: INTEGER;when false SUM: FLOAT;

end case;

end record;end record;

Discriminated union

COUNTINTEGER

NameType

BOOLEAN

Offset

TAG

Case tableSUM

FLOATNameType


Address true

false

FLOAT Type

Evaluation of Unions

• Potentially unsafe construct in most languages (not Ada)– Do not allow type checkingDo not allow type checking

• Oberon, Modula-3, Java, and C# do not support iunions

– Reflective of growing concerns for safety in programming language


Set Types

• A set is a type whose variables can store unordered collections of distinct values fromunordered collections of distinct values from some ordinal typeDesign Iss e• Design Issue:– What is the maximum number of elements in any

set type?set type?• Among the common imperative languages, only

Pascal includes sets as a data typePascal includes sets as a data type– The maximum size of Pascal base sets is

compilation dependent (< 100)


compilation dependent (< 100) – Operations: set union, set intersection, set equality

Sets - Pascal

type

colors = (red, blue, green, yellow, orange, white, black);

colorset = set of colors;colorset = set of colors;

var

set1, set2: colorset;

…

set1 := [red, blue, green];

t2 [ hit bl k]set2 := [white, black];


Implementation of Set Types

• Sets are usually stored as bit strings in memory• Example: If a set has the ordinal base type

[‘a’ .. ‘p’] then variables of this set type can use the first 16 bits of a machine word, with each set bit: 1: representing a present element 0: representing an absent element0: representing an absent element

• The set value [‘a’, ‘c’, ‘h’, ‘o’] would be t d 1010000100000010


represented as 1010000100000010

Pointer and Reference Types

• A pointer type variable has a range of values that consists of memory addresses and a special value, nil

• Provide the power of indirect addressingP id t d i• Provide a way to manage dynamic memory– A pointer can be used to access a location in the

area where storage is dynamically created (usually called a heap)


Design Issues of Pointers

• What are the scope of and lifetime of a pointer variable?

• What is the lifetime of a heap-dynamic variable?• Are pointers restricted as to the type of value to

which they can point?c t ey ca po t• Are pointers used for dynamic storage

management indirect addressing or both?management, indirect addressing, or both?• Should the language support pointer types,

f t b th?Copyright © 2006 Addison-Wesley. All rights reserved. 1-78

reference types, or both?

Pointer Operations• Two fundamental operations: assignment and

d f idereferencing• Assignment is used to set a pointer variable’s

l f l ddvalue to some useful address• Dereferencing yields the value stored at the

location represented by the pointer’s value– Dereferencing can be explicit or implicit– C/C++ uses an explicit operation via * operatorj = *ptr;


sets j to the value located at ptr

Pointer Assignment Illustrated

The assignment operation j = *ptr


g p j p

Problems with Pointers

• Dangling pointers: A pointer points to a heap-dynamic variable that has been deallocated

1 Allocate a heap-dynamic variable and set a1. Allocate a heap dynamic variable and set a pointer to point at it

2 Set a second pointer to the value of the first2. Set a second pointer to the value of the first pointer

3 D ll h h d i i bl i3. Deallocate the heap-dynamic variable, using the first pointer


Problems with Pointers (cont.)

• Lost heap-dynamic variable: An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)p g g g– Pointer p1 is set to point to a newly created heap-

dynamic variabledynamic variable– Pointer p1 is later set to point to another newly

created heap dynamic variablecreated heap-dynamic variable


Pointers - Pascal

• Used for dynamic storage management and indirect addressing

• Explicit dereferencing (postfix ^)Explicit dereferencing (postfix )• Dangling pointers are possible (dispose)• Dangling objects are also possible


Example

var vari: integer;p: ^integer;

begin

p, q: ^integer;

begin

( )begini := 1;p := @i;^ 2

new(p);

p^ := 3;

q = p;p^ := 2;writeln(i); 2

end.

q p;

writeln(p^);

dispose(p);

{q: dangling pointer}

end.


Pointers in Ada

• Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of pointer's scopedeallocated at the end of pointer s scope

• All pointers are initialized to null• The lost heap-dynamic variable problem is not

eliminated by Ada


Pointers in C and C++

• Extremely flexible but must be used with care• Used for dynamic storage management and

indirect addressing• Pointer arithmetic is possible• Explicit dereferencing and address-of operators• void * can point to any type and can be type

checked (cannot be explicitly dereferenced)• Used for parameters

– Advantages of both pass-by-reference and pass-


g p y pby-value

Pointers in Fortran 95

• Pointers point to heap and non-heap variables• Implicit dereferencing• Dangling pointers are possible (DEALLOCATE)• Dangling pointers are possible (DEALLOCATE)• Pointers can only point to variables that have the TARGET attribute– The TARGET attribute is assigned in the

declaration:INTEGER, TARGET :: <variable>


,

Example

real, target :: a, b INTEGER, DIMENSION(:),

real, pointer :: pa

pa => a

ALLOCATABLE :: a !rank 1

…

ALLOCATE(a(100))…

b = pa * c

…

ALLOCATE(a(100))

…

DEALLOCATE(a)

pa = 1.2345

print *, 'a = ', a


Reference Types

• C++ includes a special kind of pointer type ll d f t th t i d i il fcalled a reference type that is used primarily for

formal parametersA C f i i h i l– A C++ reference type is a pointer that is always implicitly dereferenced It t b i iti li d ith th dd f It must be initialized with the address of some

variables in its definition After initialization can never be set to reference After initialization can never be set to reference

any other variable The implicit dereference prevents assignment to


gthe address of a reference variable

Reference Types (cont.)

• Java extends C++’s reference variables and allows them to replace pointers entirely– No pointer arithmetic– Can only point at objects (which are all on the heap)– No explicit deallocator (garbage collection is used)No explicit deallocator (garbage collection is used),

that means there can be no dangling referencesDereferencing is always implicit– Dereferencing is always implicit

• C# includes both the references of Java and the


pointers of C++

Implementation of Pointer and Reference TypesReference Types

• Representation of pointers and references– Large computers use single values– Intel microprocessors use segment and offset

• Dangling pointer problem1 Tombstone: extra heap cell that is a pointer to1. Tombstone: extra heap cell that is a pointer to

the heap-dynamic variable The actual pointer variable points only at The actual pointer variable points only at

tombstones When heap-dynamic variable deallocated,


When heap dynamic variable deallocated, tombstone remains but set to nil

Implementation of Pointer and Reference Types (cont )Reference Types (cont.)


Implementation of Pointer and Reference Types (cont )Reference Types (cont.)2. Locks and keys: Pointer values are represented as

k dd i<key, address> pairs Heap-dynamic variables are represented as variable

l ll f i t l k lplus cell for integer lock value When heap-dynamic variable allocated, lock value is

created and placed in lock cell and key cell of pointercreated and placed in lock cell and key cell of pointer When a heap-dynamic variable is deallocated

explicitly its lock value is cleared Then although aexplicitly, its lock value is cleared. Then, although a pointer has the same address value but its key value will no longer match the lock, so the access will not be


g ,allowed

Evaluation of Pointers and References

• Dangling pointers and dangling objects are problems

• Pointers are like goto's - they widen the rangePointers are like goto s they widen the range of cells that can be accessed by a variableP i t f f• Pointers or references are necessary for dynamic data structures - so we can't design a language without them


Garbage Collection

• Problems with garbage collection– Many languages leave it up to the programmer to

design without garbage creation - this is very hard– Others arrange for automatic garbage collection

• Garbage collection with reference countsGarbage collection with reference counts– Does not work for circular structures

W k f i– Works great for strings– Should also work to collect unneeded tombstones



Garbage Collection (cont.)

• Mark-and-sweep– Achieved successfully in Ada, Java, ML, …– It processes in three main steps

1. The collector walks through the heap, tentatively marking every block as “useless”

2. Beginning with all pointers outside the heap, the collector recursively explores all linked data structures in the program marking each newly discovered blockin the program, marking each newly discovered block as “useful”

3 The collector again walks through the heap moving


3. The collector again walks through the heap, moving every block that is still marked “useless” to the free list


– Several potential problems: The collector be able to tell where every “in-use”

block begins and ends In a language with variable-size heap blocks, every

block must begin with an indication of its size The collector must be able in Step 2 to find the

pointers contained within each block


Garbage Collection (cont.)• Garbage collection with pointer reversal

Th l ti t (St 2) f k d– The exploration step (Step 2) of mark-and-sweep collection is naturally recursive The implementation needs a stack whose maximum The implementation needs a stack whose maximum

depth is proportional to the longest chain through the heap

In practice, the space for this stack may not be available: after all, we run garbage collection when we’re about to run out of space!

– As the collector explores the path to a given block, it th i t it f ll th t h


it reverses the pointers it follows, so that each points back to the previous block



• Stop-and-Copy– System divide the heap into two regions of equal size.

All allocation happens in the first half. When this half is ( l ) f ll th ll t b i it l ti(nearly) full, the collector begins its exploration

– Each reachable block is copied into the second half of the heap Any other pointer that refers to the samethe heap. Any other pointer that refers to the same block is set to point to the new locationWhen the collector finishes its exploration all useful– When the collector finishes its exploration, all useful objects have been moved (and compacted) into the second half of the heap, and nothing in the first half is


p, gneeded anymore

Summary• The data types of a language are a large part of what

determines that language’s style and usefulnessdetermines that language s style and usefulness• The primitive data types of most imperative

languages include numeric character and Booleanlanguages include numeric, character, and Boolean types

• The user-defined enumeration and subrange types• The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programsy p g

• Arrays and records are included in most languages• Pointers are used for addressing flexibility and to


Pointers are used for addressing flexibility and to control dynamic storage management

chapter 6 - data t

Documents