chapter 6 - data t
TRANSCRIPT
Chapter 6
Data Types
ISBN 0-321-33025-0
Chapter 6 Topics
• Introduction• Primitive Data Types• Character String Types• User-Defined Ordinal Types• Array TypesArray Types• Associative Arrays
R d T• Record Types• Union Types
Copyright © 2006 Addison-Wesley. All rights reserved. 1-2
• Pointer and Reference Types
Introduction
• A data type defines a collection of data values and a set of predefined operations on those objects
• Some data types are specified by type operatorswhich are used to form type expressions (‘[]’, ‘*’, ‘()’ in C for array, pointer, function, respectively)
• A descriptor is the collection of the attributes of a variable. It is used for type checking and by allocation and deallocation operations
• Design issue: What operations are provided for
Copyright © 2006 Addison-Wesley. All rights reserved. 1-3
es g ssue at ope at o s a e p o ded ovariables of the type and how are they specified?
Primitive Data Types
• Almost all programming languages provide a set of primitive data types– Primitive data types: Those not defined in terms yp
of other data types
• Some primitive data types are merely reflections• Some primitive data types are merely reflections of the hardware
• Others require little software support
Copyright © 2006 Addison-Wesley. All rights reserved. 1-4
Primitive Data Types: Integer
• Almost always an exact reflection of the hardware h i i i i lso the mapping is trivial
– Java: byte, short, int, long– Ada: SHORT INTEGER, INTEGER and LONG INTEGER
• The leftmost bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number
• Most computers now use a notation called two’s complement to store negative integers, which is
Copyright © 2006 Addison-Wesley. All rights reserved. 1-5
complement to store negative integers, which is convenient for addition and subtraction
Primitive Data Types: Floating Point
• Model real numbers but only as approximations• Model real numbers, but only as approximations for most real values
f f l• Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)
• Usually exactly like the hardware, but not alwaysy y , y
Copyright © 2006 Addison-Wesley. All rights reserved. 1-6
IEEE Floating Point Formats
Copyright © 2006 Addison-Wesley. All rights reserved. 1-7
Primitive Data Types: Decimal• For business applications
S fi d b f d i l di i• Store a fixed number of decimal digits • The representations of data types are called
binar coded decimal (BCD)binary coded decimal (BCD)– Two digits per byte
Ad t• Advantage– Accuracy
Di d• Disadvantages– The range of values is restricted because no
exponents are allowed
Copyright © 2006 Addison-Wesley. All rights reserved. 1-8
exponents are allowed – Wastes memory
Primitive Data Types: Boolean
• Simplest of allp• Range of values: two elements, one for “true”
and one for “false”and one for false• Could be implemented as bits, but often as
bytesbytes– Advantage: readability
Copyright © 2006 Addison-Wesley. All rights reserved. 1-9
Primitive Data Types: Character
• Stored as numeric codings• Most commonly used coding: ASCII
An alternative 16 bit coding: Unicode• An alternative, 16-bit coding: Unicode– Includes characters from most natural languages– Originally used in Java– C# and JavaScript also support Unicode
Copyright © 2006 Addison-Wesley. All rights reserved. 1-10
Character String Types
• Values are sequences of characters• Design issues:
– Is it a primitive type or just a special kind ofIs it a primitive type or just a special kind of array?
– Should the length of strings be static orShould the length of strings be static or dynamic?
Copyright © 2006 Addison-Wesley. All rights reserved. 1-11
Character String Types Operations
• Typical operations:– Assignment and copying– Comparison– Comparison– Catenation– Substring reference– Pattern matching
Copyright © 2006 Addison-Wesley. All rights reserved. 1-12
Character String Type in Certain Languages
• AdaN t i iti– Not primitive
– Assignment, comparison, catenation, substring referencereference
NAME1 : STRING(1..30);
NAME1 := NAME1 & NAME2; (catenation)NAME1 := NAME1 & NAME2; (catenation)NAME1(2:7) (substring reference)• CC
– Not primitive– Use char arrays and a library of functions that
Copyright © 2006 Addison-Wesley. All rights reserved. 1-13
– Use char arrays and a library of functions that provide operations
Character String Type in Certain Languages
• Java– Primitive via the String class (not arrays of char).
Objects cannot be changedl f h bl b– StringBuffer is a class for changeable string objects
• SNOBOL4 (a string manipulation language)– Primitive– Many operations, including elaborate pattern
himatchingLETTER = ‘abcdefghijklmnopqrstuvwxyz’;
WORDPAT BREAK(LETTER) SPAN(LETTER) WORD
Copyright © 2006 Addison-Wesley. All rights reserved. 1-14
WORDPAT = BREAK(LETTER) SPAN(LETTER) . WORD
TEXT WORDPAT
Character String Length Options• Static length string: FORTRAN 90, COBOL, Pascal,
J ’ i lJava’s String class, …– FORTRAN 90: CHARACTER (LEN = 15) NAME1, NAME2
Limited dynamic length string: C/C++• Limited dynamic length string: C/C++– In C-based language, ‘\0’ is used to indicate the
end of a string’s charactersend of a string s characters• Dynamic length string: SNOBOL4, Perl, JavaScript
It requires the overhead of dynamic storage– It requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility
Copyright © 2006 Addison-Wesley. All rights reserved. 1-15
flexibility • Ada supports all three string length options
Character String Implementation
• Static length: compile-time descriptor• Limited dynamic length: may need a run-time
descriptor for length (but not in C/C++)descriptor for length (but not in C/C++)• Dynamic length: need run-time descriptor
b d l k d l– Strings can be stored in a linked list, or – Strings can be stored completely in adjacent
storage cells
Copyright © 2006 Addison-Wesley. All rights reserved. 1-16
Character String Implementation
Static string Limited dyn. string Dyn. string
Length Max length Current lengthLength Max. length Current length
Address Current length Address
Address
Compile-time Run-time descriptor Run-timeCompile timedescriptor for static strings
Run time descriptor for limited dynamic strings
Run time descriptor for dynamic strings
Copyright © 2006 Addison-Wesley. All rights reserved. 1-17
User-Defined Ordinal Types
• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
• Examples of primitive ordinal types in Java– integerinteger
– char
b l– boolean
• In many languages, users can define two kinds
Copyright © 2006 Addison-Wesley. All rights reserved. 1-18
of ordinal types: enumeration and subrange
Enumeration Types
• All possible values, which are named constants, are provided in the definition
• Ada:type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
Design issues• Design issues– Is an enumeration constant allowed to appear in more
th t d fi iti d if h i th t fthan one type definition, and if so, how is the type of an occurrence of that constant checked?Are en meration al es coerced to integer?
Copyright © 2006 Addison-Wesley. All rights reserved. 1-19
– Are enumeration values coerced to integer?
Example
• Pascal - cannot reuse constants; they can be used for array subscripts for loops caseused for array subscripts, for loops, case selectors; no input or output; can be compared
program test;p g ;type DAYS = (Mon, Tue, Wed, Thu, Fri, Sat, Sun);var
day: DAYS;day: DAYS;begin
for day := Mon to Sun dowriteln('Hello, World');
day := Wed;if day <= Sat then …
Copyright © 2006 Addison-Wesley. All rights reserved. 1-20
ay aend.
Example
• C/C++, like Pascal, cannot reuse constants in a given referencing environment. Enumeration values implicitly converted to integer
void main() {
enum months {Jan = 1, Feb, Mar, Apr, May, Jun,
Jul, Aug, Sep, Oct, Nov, Dec};
enum months month;
printf("%d\n", month = Sep);
Copyright © 2006 Addison-Wesley. All rights reserved. 1-21
}
Example
• Ada - constants can be reused; distinguish with context or <type_name>’; can be used as in Pascal; can be input and output
type LETTERS is ( ‘A’ ‘B’ ‘C’ ‘D’ ‘E’ ‘F’ ‘G’ ‘H’( A , B , C , D , E , F , G , H ,
‘I’, ‘J’, ‘K’, ‘L’, ‘M’, ‘N’, ‘O’, ‘P’,‘Q’, ‘R’, ‘S’, ‘T’, ‘U’, ‘V’, ‘W’, ‘X’,‘Y’ ‘Z’ )‘Y’, ‘Z’ );
type VOWELS is (‘A’, ‘E’, ‘I’, ‘O’, ‘U’);…
Copyright © 2006 Addison-Wesley. All rights reserved. 1-22
for letter in VOWELS’(‘A’) .. VOWELS’(‘U’) loop
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a color as a number
• Aid to reliability, e.g., compiler can check: y g p– Operations (don’t allow colors to be added)
No enumeration variable can be assigned a value– No enumeration variable can be assigned a value outside its defined rangeAd C# d J 5 0 ti t i bl– Ada, C#, and Java 5.0: enumeration type variables are not coerced into integer types (better than C )
Copyright © 2006 Addison-Wesley. All rights reserved. 1-23
C++)
Subrange Types• An ordered contiguous subsequence of an
ordinal typeordinal type– Example: 12 .. 18 is a subrange of integer
Ada’s design• Ada s designtype Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon fri;subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
…
Day1: Days;
Day2: Weekday;
Copyright © 2006 Addison-Wesley. All rights reserved. 1-24
Day2 := Day1;
Subrange Evaluation
• Aid to readability– Make it clear to the readers that variables of
subrange can store only certain range of valuesg g
• ReliabilityAssigning a value to a subrange variable that is– Assigning a value to a subrange variable that is outside the specified range is detected as an errorerror
Copyright © 2006 Addison-Wesley. All rights reserved. 1-25
Array Types
• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate,identified by its position in the aggregate, relative to the first element
Copyright © 2006 Addison-Wesley. All rights reserved. 1-26
Array Design Issues
• What types are legal for subscripts?• Are subscripting expressions range checked?• How many subscripts are allowed?• How many subscripts are allowed?• When does allocation take place?• Can array objects be initialized?• Are any kind of slices allowed?y
Copyright © 2006 Addison-Wesley. All rights reserved. 1-27
Array Indexing
• Indexing (or subscripting) is a mapping from indices to elementsarray name(index value)y_ _
• Index syntaxFORTRAN PL/I Ada use parentheses– FORTRAN, PL/I, Ada use parentheses
– Most other languages use brackets
Copyright © 2006 Addison-Wesley. All rights reserved. 1-28
Arrays Index (Subscript) Types
• FORTRAN, C: integer only• Pascal: any ordinal type (integer, Boolean, char,
enumeration)• Ada: integer or enumeration (includes Boolean
and char)and char)• Java: integer types only• C/C++, Perl, and Fortran do not specify range
checking
Copyright © 2006 Addison-Wesley. All rights reserved. 1-29
• Java, ML, C# specify range checking
Subscript Binding and Array Categories
• Static array: subscript ranges are statically bound and storage allocation is static (before run-time)– Advantage: execution efficiency (no dynamic
allocation)
• Fixed stack-dynamic array: subscript ranges are statically bound but the allocation is done atstatically bound, but the allocation is done at declaration elaboration time during execution
Copyright © 2006 Addison-Wesley. All rights reserved. 1-30
– Advantage: space efficiency
Subscript Binding and Array Categories (cont )(cont.)
• Stack-dynamic array: subscript ranges are dynamically bound and the storage allocation isdynamically bound and the storage allocation is dynamic (at run-time)
Advantage: flexibility (the size of an array need not– Advantage: flexibility (the size of an array need not be known until the array is to be used)Ada:– Ada:
procedure foo (size: integer) is
M: array (1.. size, 1 .. size) of realy ( , )
…
begin
Copyright © 2006 Addison-Wesley. All rights reserved. 1-31
…
end foo;
The compiler arranges for a pointer to M to reside at a
Copyright © 2006 Addison-Wesley. All rights reserved. 1-32
• The compiler arranges for a pointer to M to reside at a static offset from the frame pointer
Subscript Binding and Array Categories (cont )(cont.)
• Fixed heap-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested and storage is allocated q gfrom heap, not stack)– Example: In FORTRAN 90Example: In FORTRAN 90
INTEGER, ALLOCATABLE, ARRAY (:, :) :: MAT
ALLOCATE (MAT(10 NUMBER OF COLS))ALLOCATE (MAT(10, NUMBER_OF_COLS))
DEALLOCATE (MAT)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-33
– Example: malloc/free (C), new/delete (C++)
Subscript Binding and Array Categories (cont )(cont.)
• Heap-dynamic array: binding of subscript ranges and storage allocation is dynamic and can change any number of timesg y– Advantage: Flexibility (arrays can grow or shrink
during program execution)during program execution)– Example: In Perl
@ l h (" " " ")@alpha = ("a" .. "z");
push(@array, <element>);
Copyright © 2006 Addison-Wesley. All rights reserved. 1-34
pop(@array);
Subscript Binding and Array Categories (cont )(cont.)
• C/C++ arrays that include static modifier are static
• C/C++ arrays without static modifier are fixed stack-dynamic
• Ada arrays can be stack-dynamic• Ada arrays can be stack dynamic• C/C++ provide fixed heap-dynamic arrays• C# includes a second array class ArrayList that
provides fixed heap-dynamic
Copyright © 2006 Addison-Wesley. All rights reserved. 1-35
• Perl and JavaScript support heap-dynamic arrays
Number of subscripts
• FORTRAN I allowed up to three• FORTRAN 77 allows up to seven
Others no limit• Others - no limit
Copyright © 2006 Addison-Wesley. All rights reserved. 1-36
Array Initialization• Some language allow initialization at the time of
t ll tistorage allocation– C, C++, Java, C#:
int list[] = {4, 5, 7, 83}
– FORTRAN 77INTEGER LIST(3)
DATA LIST /0, 5, 9/
– AdaSCORE : array (1..5) of Integer :=
Copyright © 2006 Addison-Wesley. All rights reserved. 1-37
(1 => 17, 3 => 10, others => 0);
Jagged Arrays
• A jagged matrix has rows with varying number of elements– They are made possible when multidimensioned y p
arrays actually appear as arrays of arrays
• Advantages• Advantages– It allows the rows to have different lengths,
ith t d ti t h l t th d f thwithout devoting space to holes at the ends of the rows
Copyright © 2006 Addison-Wesley. All rights reserved. 1-38
An example in C/C++
char days[][10] = char *days[] =
{"Sunday", "Monday",
"Tuesday", "Wednesday",
"Thursday", "Friday",
{"Sunday", "Monday",
"Tuesday", "Wednesday",
"Thursday", "Friday",Thursday , Friday ,
"Saturday"};
...
Thursday , Friday ,
"Saturday"};
...
Copyright © 2006 Addison-Wesley. All rights reserved. 1-39
days[2][3] == ’s’; days[2][3] == ’s’;
Slices
• A slice is some substructure of an array – It is a mechanism for referencing part of an array
as a unitas a unit
• Slices are only useful in languages that have array operationsarray operations
Copyright © 2006 Addison-Wesley. All rights reserved. 1-40
Slice Examples
• Fortran 90INTEGER MAT(1:3, 1:3), CUBE(1:3, 1:4, 1:4)
MAT(1:3, 2) MAT(2:3, 1:3)
CUBE(2, 1:3, 1:4) CUBE(1:3, 1:3, 2:3)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-41
Arrays Operations
• APL provides the most powerful array processing operations for vectors (V) and matrixes (M) as well as unary operators
V reverses the elements of V M reverses the columns of M
M reverses the rows of M M transposes MM transposes MA + B whether A and B are scalar variables, vectors,
Copyright © 2006 Addison-Wesley. All rights reserved. 1-42
or matrixes
Arrays Operations (cont.)
• Ada allows – The assignment, arithmetic, relational operators
type ARY INT is array(1..6) of INTEGER;yp _ y( ) ;
flag : BOOLEAN;
crowd Group1 Group2 : ARY INT;crowd, Group1, Group2 : ARY_INT;
Group1 := (12, 17, -1, 3, -100, 5);
Group2 := (13 2 22 1 1242 12);Group2 := (13, -2, 22, 1, 1242, -12);
flag := Group1 <= Group2;
d G 1 G 2
Copyright © 2006 Addison-Wesley. All rights reserved. 1-43
crowd := Group1 + Group2;
Arrays Operations (cont.)
– Logical operatorstype ARY_BOOL is array(1..4) of BOOLEAN;
Result, Answer1, Answer2 : ARY BOOL;, , _ ;
Answer1 := (TRUE, FALSE, TRUE, FALSE);
Answer2 := (TRUE, FALSE, FALSE, TRUE);Answer2 : (TRUE, FALSE, FALSE, TRUE);
Result := Answer1 and Answer2;
Copyright © 2006 Addison-Wesley. All rights reserved. 1-44
Arrays Operations (cont.)
– Catenationdeclare
type vector is array(positive range <>) of
integer;
a : vector (1..10);
b : vector (1..5) := (1, 2, 3, 4, 5);
c : vector (1..5) := (6, 7, 8, 9, 10);
begin
a := b & c;
Copyright © 2006 Addison-Wesley. All rights reserved. 1-45
end;
Arrays Operations (cont.)
• Fortran 90 has a very rich set of array operationsA f th b ilt i ith ti t ill t k– Any of the built-in arithmetic operators will take arrays as operands; the result is an array, of the same shape as the operands, whose elements aresame shape as the operands, whose elements are the result of applying the operator to correspon-ding elements
– Slices of the same shape can be intermixed in array operations, even if the arrays from which they were sliced have very different shapesthey were sliced have very different shapes
• FORTRAN 90 also includes library functions for matrix multiplication matrix transpose and
Copyright © 2006 Addison-Wesley. All rights reserved. 1-46
matrix multiplication, matrix transpose, and vector dot product
Implementation of Arrays
• Access function maps subscript expressions to an address in the arrayan address in the array
• A single-dimensioned array is a list of adjacent memor cellsmemory cells
• Hardware memory is linear – it is usually a simple f b t S l t f ltisequence of bytes. So elements of multi-
dimensional arrays must be mapped onto the single-dimensioned memorysingle-dimensioned memory
• Row-major or column-major orderFortran uses column major order; most other
Copyright © 2006 Addison-Wesley. All rights reserved. 1-47
– Fortran uses column-major order; most other languages use row-major order
Implementation of Arrays (cont.)
• The difference between row- and column-major l t b i t t f th tlayout can be important for programs that use nested loops to access all the elements of a large multi dimensional arraylarge, multi-dimensional array– If a small array is accessed frequently, all or most
of its elements are likely to remain in the cacheof its elements are likely to remain in the cache– For a large array, each miss will bring into the
cache not only the desired element but the nextcache not only the desired element, but the next several elements as well If elements are accessed across cache lines then If elements are accessed across cache lines, then
almost every access will result in a cache missCopyright © 2006 Addison-Wesley. All rights reserved. 1-48
Compile-Time Descriptors
Copyright © 2006 Addison-Wesley. All rights reserved. 1-49
Single-dimensioned array Multi-dimensional array
Implementation of Arrays (cont.)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-50
Implementation of Arrays (cont.)
• Single-dimensioned array:address(a[k]) =
address(a[lower_bound]) + ((k - lower_bound) * _ _
element_size)
• Double-dimensioned array (row-major order):Double dimensioned array (row major order):address(a[i, j]) =
address(a[ro lb col lb]) + ((i ro lb) * N +address(a[row_lb, col_lb]) + ((i - row_lb) * N +
(j - col_lb)) * element_size
h N i b f lCopyright © 2006 Addison-Wesley. All rights reserved. 1-51
where N is a number of column
Associative Arrays
• An associative array is an unordered collection f d t l t th t i d d b lof data elements that are indexed by an equal
number of values called keys h l f f• Each element of an associative array is a pair of
entities: <key, value> • The design issues that are specific for
associative arrays are: – What is the form of references to elements? – Is the size of an associative array static or
Copyright © 2006 Addison-Wesley. All rights reserved. 1-52
dynamic?
Associative Arrays - Perl
• In Perl, associative arrays are often called hashes, because in the implementation their elements are stored and retrieved with hash functions
• The size of a Perl hash is dynamic: It grows• The size of a Perl hash is dynamic: It grows when a new element is added and shrinks when
l t i d l t d d l it i ti d ban element is deleted, and also it is emptied by assignment of the empty literal
Copyright © 2006 Addison-Wesley. All rights reserved. 1-53
Associative Arrays – Perl (cont.)
• Structure and Operations Hash variables begin with %– Hash variables begin with %
– Scalar variable names begin with $%fruit = (“apples” => 3, “oranges” => 6);
– Subscripting is done using braces and keys$fruit{“apples”}
– Elements can be added with the assignment statementElements can be added with the assignment statement$fruit{“peach”} = 9;
– Elements can be removed with deletedelete $fruit{“peach”};
– The entire hash can he emptied by assigning the empty literal to it
Copyright © 2006 Addison-Wesley. All rights reserved. 1-54
te a to t@fruit = ();
Record Types
• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by namesy
• Design issues:What is the syntactic form of references to the– What is the syntactic form of references to the field?
ll l f ll d– Are elliptical references allowed?
Copyright © 2006 Addison-Wesley. All rights reserved. 1-55
Definition of Records in COBOL
• COBOL uses level numbers to show nested records
01 EMP-REC.
02 EMP-NAME.02 EMP NAME.
05 FIRST PIC X(20).
05 MID PIC X(10).( )
05 LAST PIC X(20).
02 HOURLY-RATE PIC 99V99.
Copyright © 2006 Addison-Wesley. All rights reserved. 1-56
Definition of Records in Ada• In Ada:type DATE istype DATE is
recordMonth : INTEGER range 1..12;Day : INTEGER range 1..31;Year : INTEGER range 1776..2100;
end record;;type PERSON is
recordName : STRING(1 15);Name : STRING(1..15);Birthday : DATE;Age : INTEGER;
Copyright © 2006 Addison-Wesley. All rights reserved. 1-57
Sex : CHARACTER;end record;
References to Records• Most language use dot notation• Fully qualified references must include all recordFully qualified references must include all record
names– COBOL:
MID OF EMP-NAME OF EMP-REC– Ada
Student: PERSON; … Student.Birthday.Year …• Elliptical references allow leaving out record
names as long as the reference is unambiguous– COBOLCOBOL
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-RECare elliptical references to the employee’s first name– Ada
Copyright © 2006 Addison-Wesley. All rights reserved. 1-58
Adawith Student do Name := ‘Michael’;
Operations on Records
• Assignment is very common if the types are identical
• Ada allows record comparisonAda allows record comparison• COBOL provides MOVE CORRESPONDING
f ld f h d h– Copies a field of the source record to the corresponding field in the target record
Copyright © 2006 Addison-Wesley. All rights reserved. 1-59
Example01INPUT-RECORD.
02 NAME.05 FIRST PICTURE IS X(20).05 MIDDLE PICTURE IS X(20).05 LAST PICTURE IS X(20).
02 EMPLOYEE NUMBER PICTURE IS 9(10)02 EMPLOYEE-NUMBER PICTURE IS 9(10).02 HOURLY-RATE PICTURE IS 99.
01OUTPUT-RECORD.02 NAME02 NAME.
05 FIRST PICTURE IS X(20).05 MIDDLE PICTURE IS X(20).05 LAST PICTURE IS X(20).( )
02 EMPLOYEE-NUMBER PICTURE IS 9(10).02 GROSS-PAY PICTURE IS 999V99.02 NET-PAY PICTURE IS 999V99.
Copyright © 2006 Addison-Wesley. All rights reserved. 1-60
…MOVE CORRESPONDING INPUT-RECORD TO OUTPUT-RECORD.
Implementation of record types
A compile-time descriptor for a record
Copyright © 2006 Addison-Wesley. All rights reserved. 1-61
Memory layout – an example in Pascal
type element = record type element = packed record
name: two_chars;
atomic_number: integer;
atomic weight: real;
name: two_chars;
atomic_number: integer;
atomic weight: real;atomic_weight: real;
metallic: Boolean
end;
atomic_weight: real;
metallic: Boolean
end;
Copyright © 2006 Addison-Wesley. All rights reserved. 1-62
Evaluation and Comparison to Arrays
• Arrays are used when all the data values have the same type and are processed in the same way
• Records are used when the collection of dataRecords are used when the collection of data values is heterogeneous and the different fields are not processed in the same wayare not processed in the same way
• Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-63
Unions Types
• A union is a type whose variables are allowed to store different type values at different times during executiong
• Design issues Should type checking be required?– Should type checking be required?
– Should unions be embedded in records?
Copyright © 2006 Addison-Wesley. All rights reserved. 1-64
Discriminated vs. Free Unions
• Fortran, C/C++ provide union constructs in hi h h i l fwhich there is no language support for type
checking; the union in these languages is called ffree union
• Type checking of unions require that each union include a type indicator called a tag (or discriminant– Supported by ALGOL 68, Ada
• Pascal provides both discriminated and non-
Copyright © 2006 Addison-Wesley. All rights reserved. 1-65
pdiscriminated unions
Pascal Union Types
• In Pascal (and Ada), the discriminated union is called a record variant, or variant part of a record
type intreal =
record tagg : Boolean of
true : (blint : integer);
false : (blreal : real);
end;
• Problem with Pascal’s design: type checking is ineffective
Copyright © 2006 Addison-Wesley. All rights reserved. 1-66
ineffective
Pascal Union Types (con.)
– User can create inconsistent unions (because the tag can be individually assigned)
var x : intreal; y : real;
(1) x.tagg := true; { it is an integer }
(2) x.blint := 47; { ok }
(3) x.tagg := false; { it is a real }
(4) y := x.blreal; { assigns an integer to ( ) y ; { g g
real}
– The tagg is optional. Now, the declaration and
Copyright © 2006 Addison-Wesley. All rights reserved. 1-67
gg p ,the second and last assignment cause trouble
Ada Union Types
type Shape is (Circle, Triangle, Rectangle);type Colors is (Red, Green, Blue);type Co o s s ( ed, G ee , ue);type Figure (Form: Shape := Rectangle) is record
Filled: Boolean;Color ColorsColor: Colors;case Form : Shape is
when Circle => Diameter: Float;when Triangle =>
Leftside, Rightside: Integer;Angle: Float;Angle: Float;
when Rectangle => Side1, Side2: Integer;end case;d d
Copyright © 2006 Addison-Wesley. All rights reserved. 1-68
end record;
Ada Union Type Illustrated
color
form: tag (discriminant)rectangle: side1, side2
filledcolor
circle: diameter
triangle: leftside, rightside, angle
A discriminated union of three shape variables
Copyright © 2006 Addison-Wesley. All rights reserved. 1-69
Ada Union Types (cont.)
• Reasons Ada union types are safer than Pascal– Variant records must always have a discriminant– The discriminant must also appear in the header
of the record’s declaration and have default value– It is impossible for the user to create an incon-
cistent union. The assignment can occur either• Via whole-record assignment (e.g., A := B, where A
d B i t d )and B are variant records), or• Via assignment of an aggregate (e.g., A := {Filled => True Color => Red Form => Circle
Copyright © 2006 Addison-Wesley. All rights reserved. 1-70
=> True, Color => Red, Form => Circle,
Diameter => 5.5 }; )
Implementation of Union Types
• Discriminated unions are implemented by simply i h dd f ibl iusing the same address for every possible variant
– Sufficient storage for the largest variant is ll dallocated
• At compile time, the complete description of each variant must be stored– Using a case table with the tag entry in the
descriptor – The case table has an entry for each variant, which
Copyright © 2006 Addison-Wesley. All rights reserved. 1-71
points to a descriptor for that particular variant
Example - Ada
type NODE(TAG: BOOLEAN) isrecordrecord
case TAG iswhen true COUNT: INTEGER;when false SUM: FLOAT;
end case;
end record;end record;
Discriminated union
COUNTINTEGER
NameType
BOOLEAN
Offset
TAG
Case tableSUM
FLOATNameType
Copyright © 2006 Addison-Wesley. All rights reserved. 1-72
Address true
false
FLOAT Type
Evaluation of Unions
• Potentially unsafe construct in most languages (not Ada)– Do not allow type checkingDo not allow type checking
• Oberon, Modula-3, Java, and C# do not support iunions
– Reflective of growing concerns for safety in programming language
Copyright © 2006 Addison-Wesley. All rights reserved. 1-73
Set Types
• A set is a type whose variables can store unordered collections of distinct values fromunordered collections of distinct values from some ordinal typeDesign Iss e• Design Issue:– What is the maximum number of elements in any
set type?set type?• Among the common imperative languages, only
Pascal includes sets as a data typePascal includes sets as a data type– The maximum size of Pascal base sets is
compilation dependent (< 100)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-74
compilation dependent (< 100) – Operations: set union, set intersection, set equality
Sets - Pascal
type
colors = (red, blue, green, yellow, orange, white, black);
colorset = set of colors;colorset = set of colors;
var
set1, set2: colorset;
…
set1 := [red, blue, green];
t2 [ hit bl k]set2 := [white, black];
Copyright © 2006 Addison-Wesley. All rights reserved. 1-75
Implementation of Set Types
• Sets are usually stored as bit strings in memory• Example: If a set has the ordinal base type
[‘a’ .. ‘p’] then variables of this set type can use the first 16 bits of a machine word, with each set bit: 1: representing a present element 0: representing an absent element0: representing an absent element
• The set value [‘a’, ‘c’, ‘h’, ‘o’] would be t d 1010000100000010
Copyright © 2006 Addison-Wesley. All rights reserved. 1-76
represented as 1010000100000010
Pointer and Reference Types
• A pointer type variable has a range of values that consists of memory addresses and a special value, nil
• Provide the power of indirect addressingP id t d i• Provide a way to manage dynamic memory– A pointer can be used to access a location in the
area where storage is dynamically created (usually called a heap)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-77
Design Issues of Pointers
• What are the scope of and lifetime of a pointer variable?
• What is the lifetime of a heap-dynamic variable?• Are pointers restricted as to the type of value to
which they can point?c t ey ca po t• Are pointers used for dynamic storage
management indirect addressing or both?management, indirect addressing, or both?• Should the language support pointer types,
f t b th?Copyright © 2006 Addison-Wesley. All rights reserved. 1-78
reference types, or both?
Pointer Operations• Two fundamental operations: assignment and
d f idereferencing• Assignment is used to set a pointer variable’s
l f l ddvalue to some useful address• Dereferencing yields the value stored at the
location represented by the pointer’s value– Dereferencing can be explicit or implicit– C/C++ uses an explicit operation via * operatorj = *ptr;
Copyright © 2006 Addison-Wesley. All rights reserved. 1-79
sets j to the value located at ptr
Pointer Assignment Illustrated
The assignment operation j = *ptr
Copyright © 2006 Addison-Wesley. All rights reserved. 1-80
g p j p
Problems with Pointers
• Dangling pointers: A pointer points to a heap-dynamic variable that has been deallocated
1 Allocate a heap-dynamic variable and set a1. Allocate a heap dynamic variable and set a pointer to point at it
2 Set a second pointer to the value of the first2. Set a second pointer to the value of the first pointer
3 D ll h h d i i bl i3. Deallocate the heap-dynamic variable, using the first pointer
Copyright © 2006 Addison-Wesley. All rights reserved. 1-81
Problems with Pointers (cont.)
• Lost heap-dynamic variable: An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)p g g g– Pointer p1 is set to point to a newly created heap-
dynamic variabledynamic variable– Pointer p1 is later set to point to another newly
created heap dynamic variablecreated heap-dynamic variable
Copyright © 2006 Addison-Wesley. All rights reserved. 1-82
Pointers - Pascal
• Used for dynamic storage management and indirect addressing
• Explicit dereferencing (postfix ^)Explicit dereferencing (postfix )• Dangling pointers are possible (dispose)• Dangling objects are also possible
Copyright © 2006 Addison-Wesley. All rights reserved. 1-83
Example
var vari: integer;p: ^integer;
begin
p, q: ^integer;
begin
( )begini := 1;p := @i;^ 2
new(p);
p^ := 3;
q = p;p^ := 2;writeln(i); 2
end.
q p;
writeln(p^);
dispose(p);
{q: dangling pointer}
end.
Copyright © 2006 Addison-Wesley. All rights reserved. 1-84
Pointers in Ada
• Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of pointer's scopedeallocated at the end of pointer s scope
• All pointers are initialized to null• The lost heap-dynamic variable problem is not
eliminated by Ada
Copyright © 2006 Addison-Wesley. All rights reserved. 1-85
Pointers in C and C++
• Extremely flexible but must be used with care• Used for dynamic storage management and
indirect addressing• Pointer arithmetic is possible• Explicit dereferencing and address-of operators• void * can point to any type and can be type
checked (cannot be explicitly dereferenced)• Used for parameters
– Advantages of both pass-by-reference and pass-
Copyright © 2006 Addison-Wesley. All rights reserved. 1-86
g p y pby-value
Pointers in Fortran 95
• Pointers point to heap and non-heap variables• Implicit dereferencing• Dangling pointers are possible (DEALLOCATE)• Dangling pointers are possible (DEALLOCATE)• Pointers can only point to variables that have the TARGET attribute– The TARGET attribute is assigned in the
declaration:INTEGER, TARGET :: <variable>
Copyright © 2006 Addison-Wesley. All rights reserved. 1-87
,
Example
real, target :: a, b INTEGER, DIMENSION(:),
real, pointer :: pa
pa => a
ALLOCATABLE :: a !rank 1
…
ALLOCATE(a(100))…
b = pa * c
…
ALLOCATE(a(100))
…
DEALLOCATE(a)
pa = 1.2345
print *, 'a = ', a
Copyright © 2006 Addison-Wesley. All rights reserved. 1-88
Reference Types
• C++ includes a special kind of pointer type ll d f t th t i d i il fcalled a reference type that is used primarily for
formal parametersA C f i i h i l– A C++ reference type is a pointer that is always implicitly dereferenced It t b i iti li d ith th dd f It must be initialized with the address of some
variables in its definition After initialization can never be set to reference After initialization can never be set to reference
any other variable The implicit dereference prevents assignment to
Copyright © 2006 Addison-Wesley. All rights reserved. 1-89
gthe address of a reference variable
Reference Types (cont.)
• Java extends C++’s reference variables and allows them to replace pointers entirely– No pointer arithmetic– Can only point at objects (which are all on the heap)– No explicit deallocator (garbage collection is used)No explicit deallocator (garbage collection is used),
that means there can be no dangling referencesDereferencing is always implicit– Dereferencing is always implicit
• C# includes both the references of Java and the
Copyright © 2006 Addison-Wesley. All rights reserved. 1-90
pointers of C++
Implementation of Pointer and Reference TypesReference Types
• Representation of pointers and references– Large computers use single values– Intel microprocessors use segment and offset
• Dangling pointer problem1 Tombstone: extra heap cell that is a pointer to1. Tombstone: extra heap cell that is a pointer to
the heap-dynamic variable The actual pointer variable points only at The actual pointer variable points only at
tombstones When heap-dynamic variable deallocated,
Copyright © 2006 Addison-Wesley. All rights reserved. 1-91
When heap dynamic variable deallocated, tombstone remains but set to nil
Implementation of Pointer and Reference Types (cont )Reference Types (cont.)
Copyright © 2006 Addison-Wesley. All rights reserved. 1-92
Implementation of Pointer and Reference Types (cont )Reference Types (cont.)2. Locks and keys: Pointer values are represented as
k dd i<key, address> pairs Heap-dynamic variables are represented as variable
l ll f i t l k lplus cell for integer lock value When heap-dynamic variable allocated, lock value is
created and placed in lock cell and key cell of pointercreated and placed in lock cell and key cell of pointer When a heap-dynamic variable is deallocated
explicitly its lock value is cleared Then although aexplicitly, its lock value is cleared. Then, although a pointer has the same address value but its key value will no longer match the lock, so the access will not be
Copyright © 2006 Addison-Wesley. All rights reserved. 1-93
g ,allowed
Evaluation of Pointers and References
• Dangling pointers and dangling objects are problems
• Pointers are like goto's - they widen the rangePointers are like goto s they widen the range of cells that can be accessed by a variableP i t f f• Pointers or references are necessary for dynamic data structures - so we can't design a language without them
Copyright © 2006 Addison-Wesley. All rights reserved. 1-94
Garbage Collection
• Problems with garbage collection– Many languages leave it up to the programmer to
design without garbage creation - this is very hard– Others arrange for automatic garbage collection
• Garbage collection with reference countsGarbage collection with reference counts– Does not work for circular structures
W k f i– Works great for strings– Should also work to collect unneeded tombstones
Copyright © 2006 Addison-Wesley. All rights reserved. 1-95
Copyright © 2006 Addison-Wesley. All rights reserved. 1-96
Garbage Collection (cont.)
• Mark-and-sweep– Achieved successfully in Ada, Java, ML, …– It processes in three main steps
1. The collector walks through the heap, tentatively marking every block as “useless”
2. Beginning with all pointers outside the heap, the collector recursively explores all linked data structures in the program marking each newly discovered blockin the program, marking each newly discovered block as “useful”
3 The collector again walks through the heap moving
Copyright © 2006 Addison-Wesley. All rights reserved. 1-97
3. The collector again walks through the heap, moving every block that is still marked “useless” to the free list
Garbage Collection (cont.)
– Several potential problems: The collector be able to tell where every “in-use”
block begins and ends In a language with variable-size heap blocks, every
block must begin with an indication of its size The collector must be able in Step 2 to find the
pointers contained within each block
Copyright © 2006 Addison-Wesley. All rights reserved. 1-98
Garbage Collection (cont.)• Garbage collection with pointer reversal
Th l ti t (St 2) f k d– The exploration step (Step 2) of mark-and-sweep collection is naturally recursive The implementation needs a stack whose maximum The implementation needs a stack whose maximum
depth is proportional to the longest chain through the heap
In practice, the space for this stack may not be available: after all, we run garbage collection when we’re about to run out of space!
– As the collector explores the path to a given block, it th i t it f ll th t h
Copyright © 2006 Addison-Wesley. All rights reserved. 1-99
it reverses the pointers it follows, so that each points back to the previous block
Copyright © 2006 Addison-Wesley. All rights reserved. 1-100
Garbage Collection (cont.)
• Stop-and-Copy– System divide the heap into two regions of equal size.
All allocation happens in the first half. When this half is ( l ) f ll th ll t b i it l ti(nearly) full, the collector begins its exploration
– Each reachable block is copied into the second half of the heap Any other pointer that refers to the samethe heap. Any other pointer that refers to the same block is set to point to the new locationWhen the collector finishes its exploration all useful– When the collector finishes its exploration, all useful objects have been moved (and compacted) into the second half of the heap, and nothing in the first half is
Copyright © 2006 Addison-Wesley. All rights reserved. 1-101
p, gneeded anymore
Summary• The data types of a language are a large part of what
determines that language’s style and usefulnessdetermines that language s style and usefulness• The primitive data types of most imperative
languages include numeric character and Booleanlanguages include numeric, character, and Boolean types
• The user-defined enumeration and subrange types• The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programsy p g
• Arrays and records are included in most languages• Pointers are used for addressing flexibility and to
Copyright © 2006 Addison-Wesley. All rights reserved. 1-102
Pointers are used for addressing flexibility and to control dynamic storage management