chapter 6 - data t

51
Chapter 6 Data Types ISBN 0-321-33025-0 Chapter 6 Topics • Introduction • Primitive Data Types • Character String Types • User-Defined Ordinal Types Array Types Array Types • Associative Arrays R dT Record Types • Union Types Copyright © 2006 Addison-Wesley. All rights reserved. 1-2 • Pointer and Reference Types

Upload: dieu-anh

Post on 26-May-2017

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 6 - Data T

Chapter 6

Data Types

ISBN 0-321-33025-0

Chapter 6 Topics

• Introduction• Primitive Data Types• Character String Types• User-Defined Ordinal Types• Array TypesArray Types• Associative Arrays

R d T• Record Types• Union Types

Copyright © 2006 Addison-Wesley. All rights reserved. 1-2

• Pointer and Reference Types

Page 2: Chapter 6 - Data T

Introduction

• A data type defines a collection of data values and a set of predefined operations on those objects

• Some data types are specified by type operatorswhich are used to form type expressions (‘[]’, ‘*’, ‘()’ in C for array, pointer, function, respectively)

• A descriptor is the collection of the attributes of a variable. It is used for type checking and by allocation and deallocation operations

• Design issue: What operations are provided for

Copyright © 2006 Addison-Wesley. All rights reserved. 1-3

es g ssue at ope at o s a e p o ded ovariables of the type and how are they specified?

Primitive Data Types

• Almost all programming languages provide a set of primitive data types– Primitive data types: Those not defined in terms yp

of other data types

• Some primitive data types are merely reflections• Some primitive data types are merely reflections of the hardware

• Others require little software support

Copyright © 2006 Addison-Wesley. All rights reserved. 1-4

Page 3: Chapter 6 - Data T

Primitive Data Types: Integer

• Almost always an exact reflection of the hardware h i i i i lso the mapping is trivial

– Java: byte, short, int, long– Ada: SHORT INTEGER, INTEGER and LONG INTEGER

• The leftmost bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number

• Most computers now use a notation called two’s complement to store negative integers, which is

Copyright © 2006 Addison-Wesley. All rights reserved. 1-5

complement to store negative integers, which is convenient for addition and subtraction

Primitive Data Types: Floating Point

• Model real numbers but only as approximations• Model real numbers, but only as approximations for most real values

f f l• Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)

• Usually exactly like the hardware, but not alwaysy y , y

Copyright © 2006 Addison-Wesley. All rights reserved. 1-6

Page 4: Chapter 6 - Data T

IEEE Floating Point Formats

Copyright © 2006 Addison-Wesley. All rights reserved. 1-7

Primitive Data Types: Decimal• For business applications

S fi d b f d i l di i• Store a fixed number of decimal digits • The representations of data types are called

binar coded decimal (BCD)binary coded decimal (BCD)– Two digits per byte

Ad t• Advantage– Accuracy

Di d• Disadvantages– The range of values is restricted because no

exponents are allowed

Copyright © 2006 Addison-Wesley. All rights reserved. 1-8

exponents are allowed – Wastes memory

Page 5: Chapter 6 - Data T

Primitive Data Types: Boolean

• Simplest of allp• Range of values: two elements, one for “true”

and one for “false”and one for false• Could be implemented as bits, but often as

bytesbytes– Advantage: readability

Copyright © 2006 Addison-Wesley. All rights reserved. 1-9

Primitive Data Types: Character

• Stored as numeric codings• Most commonly used coding: ASCII

An alternative 16 bit coding: Unicode• An alternative, 16-bit coding: Unicode– Includes characters from most natural languages– Originally used in Java– C# and JavaScript also support Unicode

Copyright © 2006 Addison-Wesley. All rights reserved. 1-10

Page 6: Chapter 6 - Data T

Character String Types

• Values are sequences of characters• Design issues:

– Is it a primitive type or just a special kind ofIs it a primitive type or just a special kind of array?

– Should the length of strings be static orShould the length of strings be static or dynamic?

Copyright © 2006 Addison-Wesley. All rights reserved. 1-11

Character String Types Operations

• Typical operations:– Assignment and copying– Comparison– Comparison– Catenation– Substring reference– Pattern matching

Copyright © 2006 Addison-Wesley. All rights reserved. 1-12

Page 7: Chapter 6 - Data T

Character String Type in Certain Languages

• AdaN t i iti– Not primitive

– Assignment, comparison, catenation, substring referencereference

NAME1 : STRING(1..30);

NAME1 := NAME1 & NAME2; (catenation)NAME1 := NAME1 & NAME2; (catenation)NAME1(2:7) (substring reference)• CC

– Not primitive– Use char arrays and a library of functions that

Copyright © 2006 Addison-Wesley. All rights reserved. 1-13

– Use char arrays and a library of functions that provide operations

Character String Type in Certain Languages

• Java– Primitive via the String class (not arrays of char).

Objects cannot be changedl f h bl b– StringBuffer is a class for changeable string objects

• SNOBOL4 (a string manipulation language)– Primitive– Many operations, including elaborate pattern

himatchingLETTER = ‘abcdefghijklmnopqrstuvwxyz’;

WORDPAT BREAK(LETTER) SPAN(LETTER) WORD

Copyright © 2006 Addison-Wesley. All rights reserved. 1-14

WORDPAT = BREAK(LETTER) SPAN(LETTER) . WORD

TEXT WORDPAT

Page 8: Chapter 6 - Data T

Character String Length Options• Static length string: FORTRAN 90, COBOL, Pascal,

J ’ i lJava’s String class, …– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2

Limited dynamic length string: C/C++• Limited dynamic length string: C/C++– In C-based language, ‘\0’ is used to indicate the

end of a string’s charactersend of a string s characters• Dynamic length string: SNOBOL4, Perl, JavaScript

It requires the overhead of dynamic storage– It requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility

Copyright © 2006 Addison-Wesley. All rights reserved. 1-15

flexibility • Ada supports all three string length options

Character String Implementation

• Static length: compile-time descriptor• Limited dynamic length: may need a run-time

descriptor for length (but not in C/C++)descriptor for length (but not in C/C++)• Dynamic length: need run-time descriptor

b d l k d l– Strings can be stored in a linked list, or – Strings can be stored completely in adjacent

storage cells

Copyright © 2006 Addison-Wesley. All rights reserved. 1-16

Page 9: Chapter 6 - Data T

Character String Implementation

Static string Limited dyn. string Dyn. string

Length Max length Current lengthLength Max. length Current length

Address Current length Address

Address

Compile-time Run-time descriptor Run-timeCompile timedescriptor for static strings

Run time descriptor for limited dynamic strings

Run time descriptor for dynamic strings

Copyright © 2006 Addison-Wesley. All rights reserved. 1-17

User-Defined Ordinal Types

• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers

• Examples of primitive ordinal types in Java– integerinteger

– char

b l– boolean

• In many languages, users can define two kinds

Copyright © 2006 Addison-Wesley. All rights reserved. 1-18

of ordinal types: enumeration and subrange

Page 10: Chapter 6 - Data T

Enumeration Types

• All possible values, which are named constants, are provided in the definition

• Ada:type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

Design issues• Design issues– Is an enumeration constant allowed to appear in more

th t d fi iti d if h i th t fthan one type definition, and if so, how is the type of an occurrence of that constant checked?Are en meration al es coerced to integer?

Copyright © 2006 Addison-Wesley. All rights reserved. 1-19

– Are enumeration values coerced to integer?

Example

• Pascal - cannot reuse constants; they can be used for array subscripts for loops caseused for array subscripts, for loops, case selectors; no input or output; can be compared

program test;p g ;type DAYS = (Mon, Tue, Wed, Thu, Fri, Sat, Sun);var

day: DAYS;day: DAYS;begin

for day := Mon to Sun dowriteln('Hello, World');

day := Wed;if day <= Sat then …

Copyright © 2006 Addison-Wesley. All rights reserved. 1-20

ay aend.

Page 11: Chapter 6 - Data T

Example

• C/C++, like Pascal, cannot reuse constants in a given referencing environment. Enumeration values implicitly converted to integer

void main() {

enum months {Jan = 1, Feb, Mar, Apr, May, Jun,

Jul, Aug, Sep, Oct, Nov, Dec};

enum months month;

printf("%d\n", month = Sep);

Copyright © 2006 Addison-Wesley. All rights reserved. 1-21

}

Example

• Ada - constants can be reused; distinguish with context or <type_name>’; can be used as in Pascal; can be input and output

type LETTERS is ( ‘A’ ‘B’ ‘C’ ‘D’ ‘E’ ‘F’ ‘G’ ‘H’( A , B , C , D , E , F , G , H ,

‘I’, ‘J’, ‘K’, ‘L’, ‘M’, ‘N’, ‘O’, ‘P’,‘Q’, ‘R’, ‘S’, ‘T’, ‘U’, ‘V’, ‘W’, ‘X’,‘Y’ ‘Z’ )‘Y’, ‘Z’ );

type VOWELS is (‘A’, ‘E’, ‘I’, ‘O’, ‘U’);…

Copyright © 2006 Addison-Wesley. All rights reserved. 1-22

for letter in VOWELS’(‘A’) .. VOWELS’(‘U’) loop

Page 12: Chapter 6 - Data T

Evaluation of Enumerated Type

• Aid to readability, e.g., no need to code a color as a number

• Aid to reliability, e.g., compiler can check: y g p– Operations (don’t allow colors to be added)

No enumeration variable can be assigned a value– No enumeration variable can be assigned a value outside its defined rangeAd C# d J 5 0 ti t i bl– Ada, C#, and Java 5.0: enumeration type variables are not coerced into integer types (better than C )

Copyright © 2006 Addison-Wesley. All rights reserved. 1-23

C++)

Subrange Types• An ordered contiguous subsequence of an

ordinal typeordinal type– Example: 12 .. 18 is a subrange of integer

Ada’s design• Ada s designtype Days is (mon, tue, wed, thu, fri, sat, sun);

subtype Weekdays is Days range mon fri;subtype Weekdays is Days range mon..fri;

subtype Index is Integer range 1..100;

Day1: Days;

Day2: Weekday;

Copyright © 2006 Addison-Wesley. All rights reserved. 1-24

Day2 := Day1;

Page 13: Chapter 6 - Data T

Subrange Evaluation

• Aid to readability– Make it clear to the readers that variables of

subrange can store only certain range of valuesg g

• ReliabilityAssigning a value to a subrange variable that is– Assigning a value to a subrange variable that is outside the specified range is detected as an errorerror

Copyright © 2006 Addison-Wesley. All rights reserved. 1-25

Array Types

• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate,identified by its position in the aggregate, relative to the first element

Copyright © 2006 Addison-Wesley. All rights reserved. 1-26

Page 14: Chapter 6 - Data T

Array Design Issues

• What types are legal for subscripts?• Are subscripting expressions range checked?• How many subscripts are allowed?• How many subscripts are allowed?• When does allocation take place?• Can array objects be initialized?• Are any kind of slices allowed?y

Copyright © 2006 Addison-Wesley. All rights reserved. 1-27

Array Indexing

• Indexing (or subscripting) is a mapping from indices to elementsarray name(index value)y_ _

• Index syntaxFORTRAN PL/I Ada use parentheses– FORTRAN, PL/I, Ada use parentheses

– Most other languages use brackets

Copyright © 2006 Addison-Wesley. All rights reserved. 1-28

Page 15: Chapter 6 - Data T

Arrays Index (Subscript) Types

• FORTRAN, C: integer only• Pascal: any ordinal type (integer, Boolean, char,

enumeration)• Ada: integer or enumeration (includes Boolean

and char)and char)• Java: integer types only• C/C++, Perl, and Fortran do not specify range

checking

Copyright © 2006 Addison-Wesley. All rights reserved. 1-29

• Java, ML, C# specify range checking

Subscript Binding and Array Categories

• Static array: subscript ranges are statically bound and storage allocation is static (before run-time)– Advantage: execution efficiency (no dynamic

allocation)

• Fixed stack-dynamic array: subscript ranges are statically bound but the allocation is done atstatically bound, but the allocation is done at declaration elaboration time during execution

Copyright © 2006 Addison-Wesley. All rights reserved. 1-30

– Advantage: space efficiency

Page 16: Chapter 6 - Data T

Subscript Binding and Array Categories (cont )(cont.)

• Stack-dynamic array: subscript ranges are dynamically bound and the storage allocation isdynamically bound and the storage allocation is dynamic (at run-time)

Advantage: flexibility (the size of an array need not– Advantage: flexibility (the size of an array need not be known until the array is to be used)Ada:– Ada:

procedure foo (size: integer) is

M: array (1.. size, 1 .. size) of realy ( , )

begin

Copyright © 2006 Addison-Wesley. All rights reserved. 1-31

end foo;

The compiler arranges for a pointer to M to reside at a

Copyright © 2006 Addison-Wesley. All rights reserved. 1-32

• The compiler arranges for a pointer to M to reside at a static offset from the frame pointer

Page 17: Chapter 6 - Data T

Subscript Binding and Array Categories (cont )(cont.)

• Fixed heap-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated q gfrom heap, not stack)– Example: In FORTRAN 90Example: In FORTRAN 90

INTEGER, ALLOCATABLE, ARRAY (:, :) :: MAT

ALLOCATE (MAT(10 NUMBER OF COLS))ALLOCATE (MAT(10, NUMBER_OF_COLS))

DEALLOCATE (MAT)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-33

– Example: malloc/free (C), new/delete (C++)

Subscript Binding and Array Categories (cont )(cont.)

• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can change any number of timesg y– Advantage: Flexibility (arrays can grow or shrink

during program execution)during program execution)– Example: In Perl

@ l h (" " " ")@alpha = ("a" .. "z");

push(@array, <element>);

Copyright © 2006 Addison-Wesley. All rights reserved. 1-34

pop(@array);

Page 18: Chapter 6 - Data T

Subscript Binding and Array Categories (cont )(cont.)

• C/C++ arrays that include static modifier are static

• C/C++ arrays without static modifier are fixed stack-dynamic

• Ada arrays can be stack-dynamic• Ada arrays can be stack dynamic• C/C++ provide fixed heap-dynamic arrays• C# includes a second array class ArrayList that

provides fixed heap-dynamic

Copyright © 2006 Addison-Wesley. All rights reserved. 1-35

• Perl and JavaScript support heap-dynamic arrays

Number of subscripts

• FORTRAN I allowed up to three• FORTRAN 77 allows up to seven

Others no limit• Others - no limit

Copyright © 2006 Addison-Wesley. All rights reserved. 1-36

Page 19: Chapter 6 - Data T

Array Initialization• Some language allow initialization at the time of

t ll tistorage allocation– C, C++, Java, C#:

int list[] = {4, 5, 7, 83}

– FORTRAN 77INTEGER LIST(3)

DATA LIST /0, 5, 9/

– AdaSCORE : array (1..5) of Integer :=

Copyright © 2006 Addison-Wesley. All rights reserved. 1-37

(1 => 17, 3 => 10, others => 0);

Jagged Arrays

• A jagged matrix has rows with varying number of elements– They are made possible when multidimensioned y p

arrays actually appear as arrays of arrays

• Advantages• Advantages– It allows the rows to have different lengths,

ith t d ti t h l t th d f thwithout devoting space to holes at the ends of the rows

Copyright © 2006 Addison-Wesley. All rights reserved. 1-38

Page 20: Chapter 6 - Data T

An example in C/C++

char days[][10] = char *days[] =

{"Sunday", "Monday",

"Tuesday", "Wednesday",

"Thursday", "Friday",

{"Sunday", "Monday",

"Tuesday", "Wednesday",

"Thursday", "Friday",Thursday , Friday ,

"Saturday"};

...

Thursday , Friday ,

"Saturday"};

...

Copyright © 2006 Addison-Wesley. All rights reserved. 1-39

days[2][3] == ’s’; days[2][3] == ’s’;

Slices

• A slice is some substructure of an array – It is a mechanism for referencing part of an array

as a unitas a unit

• Slices are only useful in languages that have array operationsarray operations

Copyright © 2006 Addison-Wesley. All rights reserved. 1-40

Page 21: Chapter 6 - Data T

Slice Examples

• Fortran 90INTEGER MAT(1:3, 1:3), CUBE(1:3, 1:4, 1:4)

MAT(1:3, 2) MAT(2:3, 1:3)

CUBE(2, 1:3, 1:4) CUBE(1:3, 1:3, 2:3)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-41

Arrays Operations

• APL provides the most powerful array processing operations for vectors (V) and matrixes (M) as well as unary operators

V reverses the elements of V M reverses the columns of M

M reverses the rows of M M transposes MM transposes MA + B whether A and B are scalar variables, vectors,

Copyright © 2006 Addison-Wesley. All rights reserved. 1-42

or matrixes

Page 22: Chapter 6 - Data T

Arrays Operations (cont.)

• Ada allows – The assignment, arithmetic, relational operators

type ARY INT is array(1..6) of INTEGER;yp _ y( ) ;

flag : BOOLEAN;

crowd Group1 Group2 : ARY INT;crowd, Group1, Group2 : ARY_INT;

Group1 := (12, 17, -1, 3, -100, 5);

Group2 := (13 2 22 1 1242 12);Group2 := (13, -2, 22, 1, 1242, -12);

flag := Group1 <= Group2;

d G 1 G 2

Copyright © 2006 Addison-Wesley. All rights reserved. 1-43

crowd := Group1 + Group2;

Arrays Operations (cont.)

– Logical operatorstype ARY_BOOL is array(1..4) of BOOLEAN;

Result, Answer1, Answer2 : ARY BOOL;, , _ ;

Answer1 := (TRUE, FALSE, TRUE, FALSE);

Answer2 := (TRUE, FALSE, FALSE, TRUE);Answer2 : (TRUE, FALSE, FALSE, TRUE);

Result := Answer1 and Answer2;

Copyright © 2006 Addison-Wesley. All rights reserved. 1-44

Page 23: Chapter 6 - Data T

Arrays Operations (cont.)

– Catenationdeclare

type vector is array(positive range <>) of

integer;

a : vector (1..10);

b : vector (1..5) := (1, 2, 3, 4, 5);

c : vector (1..5) := (6, 7, 8, 9, 10);

begin

a := b & c;

Copyright © 2006 Addison-Wesley. All rights reserved. 1-45

end;

Arrays Operations (cont.)

• Fortran 90 has a very rich set of array operationsA f th b ilt i ith ti t ill t k– Any of the built-in arithmetic operators will take arrays as operands; the result is an array, of the same shape as the operands, whose elements aresame shape as the operands, whose elements are the result of applying the operator to correspon-ding elements

– Slices of the same shape can be intermixed in array operations, even if the arrays from which they were sliced have very different shapesthey were sliced have very different shapes

• FORTRAN 90 also includes library functions for matrix multiplication matrix transpose and

Copyright © 2006 Addison-Wesley. All rights reserved. 1-46

matrix multiplication, matrix transpose, and vector dot product

Page 24: Chapter 6 - Data T

Implementation of Arrays

• Access function maps subscript expressions to an address in the arrayan address in the array

• A single-dimensioned array is a list of adjacent memor cellsmemory cells

• Hardware memory is linear – it is usually a simple f b t S l t f ltisequence of bytes. So elements of multi-

dimensional arrays must be mapped onto the single-dimensioned memorysingle-dimensioned memory

• Row-major or column-major orderFortran uses column major order; most other

Copyright © 2006 Addison-Wesley. All rights reserved. 1-47

– Fortran uses column-major order; most other languages use row-major order

Implementation of Arrays (cont.)

• The difference between row- and column-major l t b i t t f th tlayout can be important for programs that use nested loops to access all the elements of a large multi dimensional arraylarge, multi-dimensional array– If a small array is accessed frequently, all or most

of its elements are likely to remain in the cacheof its elements are likely to remain in the cache– For a large array, each miss will bring into the

cache not only the desired element but the nextcache not only the desired element, but the next several elements as well If elements are accessed across cache lines then If elements are accessed across cache lines, then

almost every access will result in a cache missCopyright © 2006 Addison-Wesley. All rights reserved. 1-48

Page 25: Chapter 6 - Data T

Compile-Time Descriptors

Copyright © 2006 Addison-Wesley. All rights reserved. 1-49

Single-dimensioned array Multi-dimensional array

Implementation of Arrays (cont.)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-50

Page 26: Chapter 6 - Data T

Implementation of Arrays (cont.)

• Single-dimensioned array:address(a[k]) =

address(a[lower_bound]) + ((k - lower_bound) * _ _

element_size)

• Double-dimensioned array (row-major order):Double dimensioned array (row major order):address(a[i, j]) =

address(a[ro lb col lb]) + ((i ro lb) * N +address(a[row_lb, col_lb]) + ((i - row_lb) * N +

(j - col_lb)) * element_size

h N i b f lCopyright © 2006 Addison-Wesley. All rights reserved. 1-51

where N is a number of column

Associative Arrays

• An associative array is an unordered collection f d t l t th t i d d b lof data elements that are indexed by an equal

number of values called keys h l f f• Each element of an associative array is a pair of

entities: <key, value> • The design issues that are specific for

associative arrays are: – What is the form of references to elements? – Is the size of an associative array static or

Copyright © 2006 Addison-Wesley. All rights reserved. 1-52

dynamic?

Page 27: Chapter 6 - Data T

Associative Arrays - Perl

• In Perl, associative arrays are often called hashes, because in the implementation their elements are stored and retrieved with hash functions

• The size of a Perl hash is dynamic: It grows• The size of a Perl hash is dynamic: It grows when a new element is added and shrinks when

l t i d l t d d l it i ti d ban element is deleted, and also it is emptied by assignment of the empty literal

Copyright © 2006 Addison-Wesley. All rights reserved. 1-53

Associative Arrays – Perl (cont.)

• Structure and Operations Hash variables begin with %– Hash variables begin with %

– Scalar variable names begin with $%fruit = (“apples” => 3, “oranges” => 6);

– Subscripting is done using braces and keys$fruit{“apples”}

– Elements can be added with the assignment statementElements can be added with the assignment statement$fruit{“peach”} = 9;

– Elements can be removed with deletedelete $fruit{“peach”};

– The entire hash can he emptied by assigning the empty literal to it

Copyright © 2006 Addison-Wesley. All rights reserved. 1-54

te a to t@fruit = ();

Page 28: Chapter 6 - Data T

Record Types

• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by namesy

• Design issues:What is the syntactic form of references to the– What is the syntactic form of references to the field?

ll l f ll d– Are elliptical references allowed?

Copyright © 2006 Addison-Wesley. All rights reserved. 1-55

Definition of Records in COBOL

• COBOL uses level numbers to show nested records

01 EMP-REC.

02 EMP-NAME.02 EMP NAME.

05 FIRST PIC X(20).

05 MID PIC X(10).( )

05 LAST PIC X(20).

02 HOURLY-RATE PIC 99V99.

Copyright © 2006 Addison-Wesley. All rights reserved. 1-56

Page 29: Chapter 6 - Data T

Definition of Records in Ada• In Ada:type DATE istype DATE is

recordMonth : INTEGER range 1..12;Day : INTEGER range 1..31;Year : INTEGER range 1776..2100;

end record;;type PERSON is

recordName : STRING(1 15);Name : STRING(1..15);Birthday : DATE;Age : INTEGER;

Copyright © 2006 Addison-Wesley. All rights reserved. 1-57

Sex : CHARACTER;end record;

References to Records• Most language use dot notation• Fully qualified references must include all recordFully qualified references must include all record

names– COBOL:

MID OF EMP-NAME OF EMP-REC– Ada

Student: PERSON; … Student.Birthday.Year …• Elliptical references allow leaving out record

names as long as the reference is unambiguous– COBOLCOBOL

FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-RECare elliptical references to the employee’s first name– Ada

Copyright © 2006 Addison-Wesley. All rights reserved. 1-58

Adawith Student do Name := ‘Michael’;

Page 30: Chapter 6 - Data T

Operations on Records

• Assignment is very common if the types are identical

• Ada allows record comparisonAda allows record comparison• COBOL provides MOVE CORRESPONDING

f ld f h d h– Copies a field of the source record to the corresponding field in the target record

Copyright © 2006 Addison-Wesley. All rights reserved. 1-59

Example01INPUT-RECORD.

02 NAME.05 FIRST PICTURE IS X(20).05 MIDDLE PICTURE IS X(20).05 LAST PICTURE IS X(20).

02 EMPLOYEE NUMBER PICTURE IS 9(10)02 EMPLOYEE-NUMBER PICTURE IS 9(10).02 HOURLY-RATE PICTURE IS 99.

01OUTPUT-RECORD.02 NAME02 NAME.

05 FIRST PICTURE IS X(20).05 MIDDLE PICTURE IS X(20).05 LAST PICTURE IS X(20).( )

02 EMPLOYEE-NUMBER PICTURE IS 9(10).02 GROSS-PAY PICTURE IS 999V99.02 NET-PAY PICTURE IS 999V99.

Copyright © 2006 Addison-Wesley. All rights reserved. 1-60

…MOVE CORRESPONDING INPUT-RECORD TO OUTPUT-RECORD.

Page 31: Chapter 6 - Data T

Implementation of record types

A compile-time descriptor for a record

Copyright © 2006 Addison-Wesley. All rights reserved. 1-61

Memory layout – an example in Pascal

type element = record type element = packed record

name: two_chars;

atomic_number: integer;

atomic weight: real;

name: two_chars;

atomic_number: integer;

atomic weight: real;atomic_weight: real;

metallic: Boolean

end;

atomic_weight: real;

metallic: Boolean

end;

Copyright © 2006 Addison-Wesley. All rights reserved. 1-62

Page 32: Chapter 6 - Data T

Evaluation and Comparison to Arrays

• Arrays are used when all the data values have the same type and are processed in the same way

• Records are used when the collection of dataRecords are used when the collection of data values is heterogeneous and the different fields are not processed in the same wayare not processed in the same way

• Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-63

Unions Types

• A union is a type whose variables are allowed to store different type values at different times during executiong

• Design issues Should type checking be required?– Should type checking be required?

– Should unions be embedded in records?

Copyright © 2006 Addison-Wesley. All rights reserved. 1-64

Page 33: Chapter 6 - Data T

Discriminated vs. Free Unions

• Fortran, C/C++ provide union constructs in hi h h i l fwhich there is no language support for type

checking; the union in these languages is called ffree union

• Type checking of unions require that each union include a type indicator called a tag (or discriminant– Supported by ALGOL 68, Ada

• Pascal provides both discriminated and non-

Copyright © 2006 Addison-Wesley. All rights reserved. 1-65

pdiscriminated unions

Pascal Union Types

• In Pascal (and Ada), the discriminated union is called a record variant, or variant part of a record

type intreal =

record tagg : Boolean of

true : (blint : integer);

false : (blreal : real);

end;

• Problem with Pascal’s design: type checking is ineffective

Copyright © 2006 Addison-Wesley. All rights reserved. 1-66

ineffective

Page 34: Chapter 6 - Data T

Pascal Union Types (con.)

– User can create inconsistent unions (because the tag can be individually assigned)

var x : intreal; y : real;

(1) x.tagg := true; { it is an integer }

(2) x.blint := 47; { ok }

(3) x.tagg := false; { it is a real }

(4) y := x.blreal; { assigns an integer to ( ) y ; { g g

real}

– The tagg is optional. Now, the declaration and

Copyright © 2006 Addison-Wesley. All rights reserved. 1-67

gg p ,the second and last assignment cause trouble

Ada Union Types

type Shape is (Circle, Triangle, Rectangle);type Colors is (Red, Green, Blue);type Co o s s ( ed, G ee , ue);type Figure (Form: Shape := Rectangle) is record

Filled: Boolean;Color ColorsColor: Colors;case Form : Shape is

when Circle => Diameter: Float;when Triangle =>

Leftside, Rightside: Integer;Angle: Float;Angle: Float;

when Rectangle => Side1, Side2: Integer;end case;d d

Copyright © 2006 Addison-Wesley. All rights reserved. 1-68

end record;

Page 35: Chapter 6 - Data T

Ada Union Type Illustrated

color

form: tag (discriminant)rectangle: side1, side2

filledcolor

circle: diameter

triangle: leftside, rightside, angle

A discriminated union of three shape variables

Copyright © 2006 Addison-Wesley. All rights reserved. 1-69

Ada Union Types (cont.)

• Reasons Ada union types are safer than Pascal– Variant records must always have a discriminant– The discriminant must also appear in the header

of the record’s declaration and have default value– It is impossible for the user to create an incon-

cistent union. The assignment can occur either• Via whole-record assignment (e.g., A := B, where A

d B i t d )and B are variant records), or• Via assignment of an aggregate (e.g., A := {Filled => True Color => Red Form => Circle

Copyright © 2006 Addison-Wesley. All rights reserved. 1-70

=> True, Color => Red, Form => Circle,

Diameter => 5.5 }; )

Page 36: Chapter 6 - Data T

Implementation of Union Types

• Discriminated unions are implemented by simply i h dd f ibl iusing the same address for every possible variant

– Sufficient storage for the largest variant is ll dallocated

• At compile time, the complete description of each variant must be stored– Using a case table with the tag entry in the

descriptor – The case table has an entry for each variant, which

Copyright © 2006 Addison-Wesley. All rights reserved. 1-71

points to a descriptor for that particular variant

Example - Ada

type NODE(TAG: BOOLEAN) isrecordrecord

case TAG iswhen true COUNT: INTEGER;when false SUM: FLOAT;

end case;

end record;end record;

Discriminated union

COUNTINTEGER

NameType

BOOLEAN

Offset

TAG

Case tableSUM

FLOATNameType

Copyright © 2006 Addison-Wesley. All rights reserved. 1-72

Address true

false

FLOAT Type

Page 37: Chapter 6 - Data T

Evaluation of Unions

• Potentially unsafe construct in most languages (not Ada)– Do not allow type checkingDo not allow type checking

• Oberon, Modula-3, Java, and C# do not support iunions

– Reflective of growing concerns for safety in programming language

Copyright © 2006 Addison-Wesley. All rights reserved. 1-73

Set Types

• A set is a type whose variables can store unordered collections of distinct values fromunordered collections of distinct values from some ordinal typeDesign Iss e• Design Issue:– What is the maximum number of elements in any

set type?set type?• Among the common imperative languages, only

Pascal includes sets as a data typePascal includes sets as a data type– The maximum size of Pascal base sets is

compilation dependent (< 100)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-74

compilation dependent (< 100) – Operations: set union, set intersection, set equality

Page 38: Chapter 6 - Data T

Sets - Pascal

type

colors = (red, blue, green, yellow, orange, white, black);

colorset = set of colors;colorset = set of colors;

var

set1, set2: colorset;

set1 := [red, blue, green];

t2 [ hit bl k]set2 := [white, black];

Copyright © 2006 Addison-Wesley. All rights reserved. 1-75

Implementation of Set Types

• Sets are usually stored as bit strings in memory• Example: If a set has the ordinal base type

[‘a’ .. ‘p’] then variables of this set type can use the first 16 bits of a machine word, with each set bit: 1: representing a present element 0: representing an absent element0: representing an absent element

• The set value [‘a’, ‘c’, ‘h’, ‘o’] would be t d 1010000100000010

Copyright © 2006 Addison-Wesley. All rights reserved. 1-76

represented as 1010000100000010

Page 39: Chapter 6 - Data T

Pointer and Reference Types

• A pointer type variable has a range of values that consists of memory addresses and a special value, nil

• Provide the power of indirect addressingP id t d i• Provide a way to manage dynamic memory– A pointer can be used to access a location in the

area where storage is dynamically created (usually called a heap)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-77

Design Issues of Pointers

• What are the scope of and lifetime of a pointer variable?

• What is the lifetime of a heap-dynamic variable?• Are pointers restricted as to the type of value to

which they can point?c t ey ca po t• Are pointers used for dynamic storage

management indirect addressing or both?management, indirect addressing, or both?• Should the language support pointer types,

f t b th?Copyright © 2006 Addison-Wesley. All rights reserved. 1-78

reference types, or both?

Page 40: Chapter 6 - Data T

Pointer Operations• Two fundamental operations: assignment and

d f idereferencing• Assignment is used to set a pointer variable’s

l f l ddvalue to some useful address• Dereferencing yields the value stored at the

location represented by the pointer’s value– Dereferencing can be explicit or implicit– C/C++ uses an explicit operation via * operatorj = *ptr;

Copyright © 2006 Addison-Wesley. All rights reserved. 1-79

sets j to the value located at ptr

Pointer Assignment Illustrated

The assignment operation j = *ptr

Copyright © 2006 Addison-Wesley. All rights reserved. 1-80

g p j p

Page 41: Chapter 6 - Data T

Problems with Pointers

• Dangling pointers: A pointer points to a heap-dynamic variable that has been deallocated

1 Allocate a heap-dynamic variable and set a1. Allocate a heap dynamic variable and set a pointer to point at it

2 Set a second pointer to the value of the first2. Set a second pointer to the value of the first pointer

3 D ll h h d i i bl i3. Deallocate the heap-dynamic variable, using the first pointer

Copyright © 2006 Addison-Wesley. All rights reserved. 1-81

Problems with Pointers (cont.)

• Lost heap-dynamic variable: An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)p g g g– Pointer p1 is set to point to a newly created heap-

dynamic variabledynamic variable– Pointer p1 is later set to point to another newly

created heap dynamic variablecreated heap-dynamic variable

Copyright © 2006 Addison-Wesley. All rights reserved. 1-82

Page 42: Chapter 6 - Data T

Pointers - Pascal

• Used for dynamic storage management and indirect addressing

• Explicit dereferencing (postfix ^)Explicit dereferencing (postfix )• Dangling pointers are possible (dispose)• Dangling objects are also possible

Copyright © 2006 Addison-Wesley. All rights reserved. 1-83

Example

var vari: integer;p: ^integer;

begin

p, q: ^integer;

begin

( )begini := 1;p := @i;^ 2

new(p);

p^ := 3;

q = p;p^ := 2;writeln(i); 2

end.

q p;

writeln(p^);

dispose(p);

{q: dangling pointer}

end.

Copyright © 2006 Addison-Wesley. All rights reserved. 1-84

Page 43: Chapter 6 - Data T

Pointers in Ada

• Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of pointer's scopedeallocated at the end of pointer s scope

• All pointers are initialized to null• The lost heap-dynamic variable problem is not

eliminated by Ada

Copyright © 2006 Addison-Wesley. All rights reserved. 1-85

Pointers in C and C++

• Extremely flexible but must be used with care• Used for dynamic storage management and

indirect addressing• Pointer arithmetic is possible• Explicit dereferencing and address-of operators• void * can point to any type and can be type

checked (cannot be explicitly dereferenced)• Used for parameters

– Advantages of both pass-by-reference and pass-

Copyright © 2006 Addison-Wesley. All rights reserved. 1-86

g p y pby-value

Page 44: Chapter 6 - Data T

Pointers in Fortran 95

• Pointers point to heap and non-heap variables• Implicit dereferencing• Dangling pointers are possible (DEALLOCATE)• Dangling pointers are possible (DEALLOCATE)• Pointers can only point to variables that have the TARGET attribute– The TARGET attribute is assigned in the

declaration:INTEGER, TARGET :: <variable>

Copyright © 2006 Addison-Wesley. All rights reserved. 1-87

,

Example

real, target :: a, b INTEGER, DIMENSION(:),

real, pointer :: pa

pa => a

ALLOCATABLE :: a !rank 1

ALLOCATE(a(100))…

b = pa * c

ALLOCATE(a(100))

DEALLOCATE(a)

pa = 1.2345

print *, 'a = ', a

Copyright © 2006 Addison-Wesley. All rights reserved. 1-88

Page 45: Chapter 6 - Data T

Reference Types

• C++ includes a special kind of pointer type ll d f t th t i d i il fcalled a reference type that is used primarily for

formal parametersA C f i i h i l– A C++ reference type is a pointer that is always implicitly dereferenced It t b i iti li d ith th dd f It must be initialized with the address of some

variables in its definition After initialization can never be set to reference After initialization can never be set to reference

any other variable The implicit dereference prevents assignment to

Copyright © 2006 Addison-Wesley. All rights reserved. 1-89

gthe address of a reference variable

Reference Types (cont.)

• Java extends C++’s reference variables and allows them to replace pointers entirely– No pointer arithmetic– Can only point at objects (which are all on the heap)– No explicit deallocator (garbage collection is used)No explicit deallocator (garbage collection is used),

that means there can be no dangling referencesDereferencing is always implicit– Dereferencing is always implicit

• C# includes both the references of Java and the

Copyright © 2006 Addison-Wesley. All rights reserved. 1-90

pointers of C++

Page 46: Chapter 6 - Data T

Implementation of Pointer and Reference TypesReference Types

• Representation of pointers and references– Large computers use single values– Intel microprocessors use segment and offset

• Dangling pointer problem1 Tombstone: extra heap cell that is a pointer to1. Tombstone: extra heap cell that is a pointer to

the heap-dynamic variable The actual pointer variable points only at The actual pointer variable points only at

tombstones When heap-dynamic variable deallocated,

Copyright © 2006 Addison-Wesley. All rights reserved. 1-91

When heap dynamic variable deallocated, tombstone remains but set to nil

Implementation of Pointer and Reference Types (cont )Reference Types (cont.)

Copyright © 2006 Addison-Wesley. All rights reserved. 1-92

Page 47: Chapter 6 - Data T

Implementation of Pointer and Reference Types (cont )Reference Types (cont.)2. Locks and keys: Pointer values are represented as

k dd i<key, address> pairs Heap-dynamic variables are represented as variable

l ll f i t l k lplus cell for integer lock value When heap-dynamic variable allocated, lock value is

created and placed in lock cell and key cell of pointercreated and placed in lock cell and key cell of pointer When a heap-dynamic variable is deallocated

explicitly its lock value is cleared Then although aexplicitly, its lock value is cleared. Then, although a pointer has the same address value but its key value will no longer match the lock, so the access will not be

Copyright © 2006 Addison-Wesley. All rights reserved. 1-93

g ,allowed

Evaluation of Pointers and References

• Dangling pointers and dangling objects are problems

• Pointers are like goto's - they widen the rangePointers are like goto s they widen the range of cells that can be accessed by a variableP i t f f• Pointers or references are necessary for dynamic data structures - so we can't design a language without them

Copyright © 2006 Addison-Wesley. All rights reserved. 1-94

Page 48: Chapter 6 - Data T

Garbage Collection

• Problems with garbage collection– Many languages leave it up to the programmer to

design without garbage creation - this is very hard– Others arrange for automatic garbage collection

• Garbage collection with reference countsGarbage collection with reference counts– Does not work for circular structures

W k f i– Works great for strings– Should also work to collect unneeded tombstones

Copyright © 2006 Addison-Wesley. All rights reserved. 1-95

Copyright © 2006 Addison-Wesley. All rights reserved. 1-96

Page 49: Chapter 6 - Data T

Garbage Collection (cont.)

• Mark-and-sweep– Achieved successfully in Ada, Java, ML, …– It processes in three main steps

1. The collector walks through the heap, tentatively marking every block as “useless”

2. Beginning with all pointers outside the heap, the collector recursively explores all linked data structures in the program marking each newly discovered blockin the program, marking each newly discovered block as “useful”

3 The collector again walks through the heap moving

Copyright © 2006 Addison-Wesley. All rights reserved. 1-97

3. The collector again walks through the heap, moving every block that is still marked “useless” to the free list

Garbage Collection (cont.)

– Several potential problems: The collector be able to tell where every “in-use”

block begins and ends In a language with variable-size heap blocks, every

block must begin with an indication of its size The collector must be able in Step 2 to find the

pointers contained within each block

Copyright © 2006 Addison-Wesley. All rights reserved. 1-98

Page 50: Chapter 6 - Data T

Garbage Collection (cont.)• Garbage collection with pointer reversal

Th l ti t (St 2) f k d– The exploration step (Step 2) of mark-and-sweep collection is naturally recursive The implementation needs a stack whose maximum The implementation needs a stack whose maximum

depth is proportional to the longest chain through the heap

In practice, the space for this stack may not be available: after all, we run garbage collection when we’re about to run out of space!

– As the collector explores the path to a given block, it th i t it f ll th t h

Copyright © 2006 Addison-Wesley. All rights reserved. 1-99

it reverses the pointers it follows, so that each points back to the previous block

Copyright © 2006 Addison-Wesley. All rights reserved. 1-100

Page 51: Chapter 6 - Data T

Garbage Collection (cont.)

• Stop-and-Copy– System divide the heap into two regions of equal size.

All allocation happens in the first half. When this half is ( l ) f ll th ll t b i it l ti(nearly) full, the collector begins its exploration

– Each reachable block is copied into the second half of the heap Any other pointer that refers to the samethe heap. Any other pointer that refers to the same block is set to point to the new locationWhen the collector finishes its exploration all useful– When the collector finishes its exploration, all useful objects have been moved (and compacted) into the second half of the heap, and nothing in the first half is

Copyright © 2006 Addison-Wesley. All rights reserved. 1-101

p, gneeded anymore

Summary• The data types of a language are a large part of what

determines that language’s style and usefulnessdetermines that language s style and usefulness• The primitive data types of most imperative

languages include numeric character and Booleanlanguages include numeric, character, and Boolean types

• The user-defined enumeration and subrange types• The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programsy p g

• Arrays and records are included in most languages• Pointers are used for addressing flexibility and to

Copyright © 2006 Addison-Wesley. All rights reserved. 1-102

Pointers are used for addressing flexibility and to control dynamic storage management