Philadelphia University, Jordan

Chapter 6: Data Types

Introduction

• A data type defines a collection of data objects and a set of predefined operations on those objects

• A descriptor is the collection of the attributes of a variable. In an implementation, a descriptor is a collection of memory cells that store variable attributes. If the attributes are all static, descriptors are required only at compile time.

• These descriptors are built by the compiler, as a part of the symbol table, and are used during compilation.

• For dynamic attributes, part or all of the descriptor must be maintained during execution. In this case, the descriptor is used by the run-time system.

• In all cases, descriptors are used for type checking and to build the code for the allocation and deallocation operations.

• An object represents an instance of a user-defined (abstract data) type

• One design issue for all data types: What operations are defined and how are they specified?

Primitive Data Types

• Almost all programming languages provide a set of primitive data types.

• Primitive data types are those not defined in terms of other data types.

• Some primitive data types are merely reflections of the hardware (e.g., integer types). Others require only a little non-hardware support for their implementation.

• The primitive data types are used, together with one or more type constructors, to provide the structured types.

Primitive Data Types: Integer

• Almost always an exact reflection of the hardware so the mapping is trivial

• There may be as many as eight different integer types in a language

• Java's signed integer types: byte, short, int, long (8, 16, 32, and 64 bits)


• C, C++, and C# include unsigned integer types (types without a sign).

• A signed integer value is represented in a computer by a string of bits, with the leftmost bit representing the sign.

• A negative integer could be stored in sign-magnitude notation, in which the sign bit is set to indicate a negative value and the remainder of the bits represent the absolute value of the number.

• Most computers now use two's complement notation to store negative integers: take the logical complement of the positive version of the number and add one.

• Example (8 bits): +2 = 00000010; its complement is 11111101; adding 1 gives 11111110, the representation of -2. To read 11111110 back, the leftmost bit has weight -128 and the remaining set bits add up to 2 + 4 + 8 + 16 + 32 + 64 = 126, so the value is -128 + 126 = -2.
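The negate-and-add-one rule and the -128 weight of the leftmost bit can be checked with a small C sketch. It assumes 8-bit values held in uint8_t; the function names negate8 and decode8 are illustrative, not from any library.

```c
#include <stdint.h>

/* Two's complement negation of an 8-bit pattern:
   complement every bit, then add one (the result wraps to 8 bits). */
uint8_t negate8(uint8_t bits) {
    return (uint8_t)(~bits + 1u);
}

/* Decode an 8-bit two's complement pattern by hand: the leftmost bit
   has weight -128, and the remaining seven bits have their usual
   positive weights 64, 32, ..., 1. */
int decode8(uint8_t bits) {
    int value = (bits & 0x80) ? -128 : 0;
    value += bits & 0x7F;
    return value;
}
```

With these helpers, negate8(0x02) produces the pattern 11111110 (0xFE), and decode8 of that pattern gives -2, matching the worked example.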

Primitive Data Types: Floating Point

• Floating-point types model real numbers, but only as approximations. Floating-point values are represented as fractions and exponents.

• Languages for scientific use support at least two floating-point types (e.g., float (4 bytes) and double (8 bytes)).

• The collection of values that can be represented as a floating-point type is defined in terms of precision and range. Precision is the accuracy of the fractional part of a value, measured as the number of bits; range is a combination of the range of fractions and the range of exponents.

• Floating-point types are usually exactly like the hardware (i.e., language implementers use whatever representation the hardware supports), but not always.

• Most newer machines use the IEEE Floating-Point Standard 754 format.
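Because floating-point values are approximations, a value such as 0.1 cannot be stored exactly. The C sketch below makes both precision and rounding visible; it assumes an IEEE 754 platform (24-bit float and 53-bit double significands, counting the implicit leading bit), and the helper names are hypothetical.

```c
#include <float.h>

/* Significand precision, in bits, of the 4-byte and 8-byte types. */
int float_precision_bits(void)  { return FLT_MANT_DIG; }
int double_precision_bits(void) { return DBL_MANT_DIG; }

/* Returns 1 if x survives a round trip through the float type,
   i.e., if x is exactly representable with float's precision. */
int survives_float(double x) {
    return (double)(float)x == x;
}

/* Returns 1 if a + b compares exactly equal to c in double arithmetic. */
int sum_is_exact(double a, double b, double c) {
    return a + b == c;
}
```

On such a platform 0.5 and 0.75 are exact binary fractions, while 0.1 + 0.2 does not compare equal to 0.3, because none of the three decimal constants is stored exactly.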


Primitive Data Types: Decimal

• Most large computers designed for business (money) applications have hardware support for decimal data types.
– Essential to COBOL
– C# offers a decimal data type

• Store a fixed number of decimal digits with the decimal point at a fixed position in the value

• Decimal types are stored using binary codes for the decimal digits (BCD).

• In some cases, they are stored one digit per byte, but in others they are packed two digits per byte. Either way, they take more storage than binary representations. It takes at least 4 bits to code a decimal digit, e.g., 7 = 0111, 9 = 1001, 3 = 0011, so storing a 6-digit coded decimal number requires 24 bits of memory, whereas a binary representation of the same range (up to 999999) needs only 20 bits. Operations on decimal values are done in hardware on machines that have such capabilities; otherwise they are simulated in software.

• Advantage: accuracy (being able to precisely store decimal values)


• Disadvantages: limited range (exponents are not allowed), wastes memory
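To make the 4-bits-per-digit cost concrete, here is a C sketch of packed BCD (two digits per byte). It assumes at most 8 digits, so the result fits in a 32-bit word, and the function names are illustrative. A 6-digit number occupies the low 24 bits; printed in hexadecimal, each hex digit of the result is one decimal digit.

```c
#include <assert.h>
#include <stdint.h>

/* Encode the low `digits` decimal digits of value as packed BCD:
   4 bits per digit, least significant digit in the lowest 4 bits.
   Any higher digits of value are dropped. */
uint32_t to_packed_bcd(unsigned value, int digits) {
    assert(digits >= 1 && digits <= 8);   /* 8 digits fill 32 bits */
    uint32_t bcd = 0;
    for (int i = 0; i < digits; i++) {
        bcd |= (uint32_t)(value % 10) << (4 * i);
        value /= 10;
    }
    return bcd;
}

/* Decode packed BCD back into an ordinary binary unsigned integer. */
unsigned from_packed_bcd(uint32_t bcd, int digits) {
    assert(digits >= 1 && digits <= 8);
    unsigned value = 0;
    for (int i = digits - 1; i >= 0; i--) {
        unsigned d = (bcd >> (4 * i)) & 0xFu;
        assert(d <= 9);                   /* 1010..1111 are not decimal digits */
        value = value * 10 + d;
    }
    return value;
}
```

For example, to_packed_bcd(793, 3) yields 0x793, i.e., the bit patterns 0111 1001 0011.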

Primitive Data Types: Boolean

• Simplest of all types

• Range of values: two elements, one for “true” and one for “false”

• Most languages provide a Boolean type; one exception is C89, in which numeric expressions are used as conditionals: all operands with nonzero values are considered true, and zero is considered false.

• C99 and C++ have a Boolean type, but they also allow numeric expressions to be used as if they were Boolean.

• Java and C# do not allow this.

• Boolean types are used to represent switches or flags in programs.

• A Boolean value could be implemented as a single bit, but is often stored in a byte.
– Advantage: readability (more readable than using integers)

Primitive Data Types: Character

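• Stored as numeric codings

• Most commonly used coding: ASCII, a 7-bit code (usually stored in 8-bit bytes) with 128 characters; extended ASCII uses 8 bits for 256 characters. Ada uses an extended 8-bit coding (ISO 8859-1).

• An alternative, 16-bit coding: Unicode (originally a 16-bit code; it has since been extended beyond 16 bits)
– Includes characters from most natural languages
– Java was the first widely used language to use it
– C# and JavaScript also support Unicode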

Character String Types

• Values are sequences of characters.

• Design issues:
– Is it a primitive type or just a special kind of array?
– Should the length of strings be static or dynamic?

Character String Types Operations

• If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language (C, C++).

• C and C++ use char arrays to store character strings and provide a collection of string operations through a standard library whose header file is string.h.

• Character strings are terminated with a special character, null, which is represented by zero.

• char *str = "apples"; makes str a char pointer set to point at the string of characters apples\0, where \0 is the null character. This initialization of str is legal because a character string literal used this way is converted to a pointer to its first character, rather than being copied. (In C++11 and later, str must be declared const char *.)

• Typical operations:
– Assignment and copying (in C and C++, through library functions such as strcpy)
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
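The null-terminated representation and several of the operations listed above can be seen together in a short C sketch. build_apples is a hypothetical helper; strcpy, strcat, strlen, and strcmp are the standard string.h functions.

```c
#include <stddef.h>
#include <string.h>

/* Builds "apples" in buf with the string.h library functions and returns
   its length. Returns 0 if buf cannot hold the characters plus the
   terminating null character. */
size_t build_apples(char *buf, size_t cap) {
    if (cap < sizeof "apples")      /* sizeof counts the null: 7 bytes */
        return 0;
    strcpy(buf, "apple");           /* assignment/copying */
    strcat(buf, "s");               /* catenation */
    return strlen(buf);             /* length excludes the null character */
}
```

The string occupies 7 bytes but has length 6. Comparison is done with strcmp, which returns a negative, zero, or positive value; for instance, strcmp("apple", "apples") is negative.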

Character String Type in Certain Languages

• C and C++ (C++ also supports strings through its standard library class string, in addition to arrays of characters)
– Not primitive
– Use char arrays and a library of functions, declared in the header file string.h, that provide the operations
– The most commonly used library functions for character strings in C and C++ are strcpy, strcmp, strlen, and strcat

• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern matching

• Java
– Primitive via the String class

• Fortran 95: treats strings as a primitive type and provides assignment, relational operators, catenation, and substring reference operations (slices) for them.

Character String Length Options

• There are several design choices regarding the length of string values.

• Static length strings: the length is static and set when the string is created, e.g., COBOL and Java's String class. Java also has a class called StringBuffer for changeable string values.

• Limited dynamic length strings: strings can have varying length up to a declared and fixed maximum set by the variable's definition. Such string variables can store any number of characters between zero and the maximum, e.g., C and C++.
– In C-based languages, a special character is used to indicate the end of a string's characters, rather than maintaining the length.

• Dynamic length strings: allow strings to have varying length with no maximum. This option requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility. SNOBOL4, Perl, JavaScript

• Ada supports all three string length options.

Character String Type Evaluation

• Aid to writability: dealing with strings as arrays is more difficult than dealing with a primitive string type.

• As a primitive type with static length, strings are inexpensive to provide. Providing strings through a standard library is nearly as convenient as having them as a primitive type.

• Dynamic length is nice and flexible, but is it worth the expense? The overhead of its implementation must be weighed against that additional flexibility.

Character String Implementation

• Static length: a compile-time descriptor is needed only during compilation. It has 3 fields. See Figure-1.

• Limited dynamic length: may need a run-time descriptor to store both the fixed maximum length and the current length. See Figure-2. (This is not needed in C and C++: the end of a string is marked with the null character, and the maximum length is not needed because index values in array references are not range-checked in these languages.) Static length and limited dynamic length strings require no special dynamic storage allocation.

• Dynamic length: needs a run-time descriptor; allocation/deallocation is the biggest implementation problem and requires complex storage management. The length, and the storage to which it is bound, grow and shrink dynamically.

• Two approaches to support dynamic allocation/deallocation:

- Strings can be stored in a linked list: the drawbacks are the extra storage occupied by the links and the complexity of string operations.

- Or complete strings can be stored in adjacent storage cells. Problem: a string grows and the adjacent space is not available. The solution is to find another hole that fits the new string and deallocate the previous one.

- Although the linked-list method requires more storage, its dynamic allocation process is simple; however, some string operations are slow because of pointer chasing (sequential access).

- Using adjacent memory for complete strings results in faster string operations and requires significantly less storage, but the allocation/deallocation process is slower.
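A limited-dynamic-length string can be sketched in C with an explicit run-time descriptor instead of a null terminator. This illustrates the descriptor idea only; it is not how C itself stores strings, and the type name, function names, and the maximum of 16 characters are all assumptions.

```c
#include <stddef.h>
#include <string.h>

#define LS_MAX 16   /* hypothetical declared maximum length */

/* Run-time descriptor for a limited-dynamic-length string: fixed maximum
   length, current length, and the character storage (whose first element
   is the address of the first character). No null character is used;
   current_length marks the end of the value. */
struct limited_string {
    size_t max_length;
    size_t current_length;
    char chars[LS_MAX];
};

void ls_init(struct limited_string *s) {
    s->max_length = LS_MAX;
    s->current_length = 0;
}

/* Assignment: succeeds (returns 1) only if src fits within the maximum. */
int ls_assign(struct limited_string *s, const char *src) {
    size_t n = strlen(src);
    if (n > s->max_length)
        return 0;
    memcpy(s->chars, src, n);
    s->current_length = n;
    return 1;
}

/* Catenation: the value may grow, but never beyond max_length. */
int ls_append(struct limited_string *s, const char *src) {
    size_t n = strlen(src);
    if (s->current_length + n > s->max_length)
        return 0;
    memcpy(s->chars + s->current_length, src, n);
    s->current_length += n;
    return 1;
}
```

Rejecting an over-long value, as done here, is one possible policy; a language could instead truncate it.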

[Figure-1 and Figure-2: Compile- and Run-Time Descriptors. Figure-1: compile-time descriptor for static strings (type name, length, address of first character). Figure-2: run-time descriptor for limited dynamic strings (type name, maximum length, current length, address of first character).]

User-Defined Ordinal Types

• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers.

• Examples of primitive ordinal types in Java:
– integer, char, boolean

• In some languages, users can define two kinds of ordinal types: enumeration and subrange.

Enumeration Types

• All possible values, which are named constants, are provided in the definition.

• C# example:

enum days {mon, tue, wed, thu, fri, sat, sun};

The enumeration constants are typically implicitly assigned the integer values 0, 1, etc.

• Design issues:
– Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
– Are enumeration values coerced to integer?
– Is any other type coerced to an enumeration type?
– All these design issues are related to type checking.

• If an enumeration variable is coerced to a numeric type, there is little control over the range of legal operations or its range of values.

• If an int type value is coerced to an enumeration type, an enumeration type variable could be assigned any integer value, whether it represented an enumeration constant or not.

• Design: if a language does not have enumeration types, we could simulate them with integer variables, e.g., in Fortran 77:

INTEGER RED, BLUE
DATA RED, BLUE /0, 1/

The problem here: because we did not define a type for our colors, there is no type checking when they are used. For example, it would be legal to add the two together. They could also be combined with any other numeric operand using any arithmetic operator. And because they are just variables, they could be assigned any integer value, destroying the relationship with the colors; this latter problem could be solved by making them named constants.


o Pascal and C both include an enumeration data type, and C++ includes C's enumeration type. In C++ we could have:

enum colors {red, blue, green, yellow, black};
colors mycolor = blue, yourcolor = red;

Enumeration values are coerced to int when they are put in an integer context. E.g., if the current value of mycolor is blue, the statement mycolor++ would assign green to mycolor. (This compiles in C; standard C++ rejects ++ on an enumeration variable unless the operator is overloaded.)

o C++ allows enumeration constants to be assigned to variables of any numeric type, though that would most often be an error.
- No other type value is coerced to an enumeration type in C++: mycolor = 4; is illegal. It would be legal only if the right-hand side were cast to the colors type, as in mycolor = (colors) 4;
- C++ enumeration constants can appear in only one enumeration type in the same referencing environment.

o In Ada, enumeration literals are allowed to appear in more than one declaration in the same referencing environment. These are called overloaded literals.
- The rules for resolving the overloading must be determined from the context. E.g., if an overloaded literal and an enumeration variable are compared, the literal's type is resolved to be that of the variable.
- Because neither the enumeration literals nor the enumeration variables in Ada are coerced to integer, both the range of operations and the range of values of enumeration types are restricted, allowing many errors to be detected by the compiler.
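The weak typing of C enumerations can be observed directly. In this C version of the colors example (C requires the enum keyword in declarations), the three helper functions are hypothetical. Each body compiles as C, while a C++ compiler would reject next_color and color_from_int without a cast or an overloaded operator.

```c
/* C version of the colors example. */
enum colors { red, blue, green, yellow, black };

/* In an integer context the enumeration value is simply an int. */
int color_code(enum colors c) {
    return c;
}

/* Legal in C: ++ works on an enumeration variable (blue -> green). */
enum colors next_color(enum colors c) {
    c++;
    return c;
}

/* Legal in C: any int converts implicitly, even one that names no constant. */
enum colors color_from_int(int n) {
    return n;
}
```

Expressions such as red + green are also accepted, which is exactly the lack of control over operations described above.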

Evaluation of Enumerated Types

• Aid to readability, e.g., no need to code a color as a number; named values are easily recognized.

• Aid to reliability, e.g., the compiler can check the following in C#, Ada, and Java 5.0:

– Operations (don’t allow colors to be added). No arithmetic operations are legal on enumeration types.

– No enumeration variable can be assigned a value outside its defined range

– Ada, C#, and Java 5.0 provide better support for enumeration than C++ because enumeration type variables in these languages are not coerced into integer types


• C treats enumeration variables like integer variables.

• In C++, numeric values can be assigned to enumeration type variables only if they are cast to the type of the assigned variable.

Subrange Types

• An ordered, contiguous subsequence of an ordinal type
– Example: 12..18 is a subrange of the integer type. Introduced in Pascal (e.g., type strIndex = 0..maxStrLength; var i: strIndex;) and included in Ada.

• Ada's design

- In Ada, subranges are included in the category of types called subtypes, e.g.:

type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;

- All operations defined for the parent type are also defined for the subtype, except assignment of values outside the specified range. E.g., in the following:

Day1: Days;
Day2: Weekdays;
Day2 := Day1;

The assignment is legal unless the value of Day1 is sat or sun.

- The compiler must generate range-checking code for every assignment to a subrange variable; subranges therefore require run-time range checking.

Subrange Evaluation

• Aid to readability
– Makes it clear to readers that variables of a subrange type can store only a certain range of values

• Reliability
– Assigning a value to a subrange variable that is outside the specified range is detected as an error

Implementation of User-Defined Ordinal Types

• Enumeration types are implemented as integers.

• Subrange types are implemented like their parent types, with code inserted (by the compiler) to restrict assignments to subrange variables.
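C has no subrange types, but the range-checking code a compiler inserts for a subrange assignment can be written out by hand. This C sketch mirrors the Ada Days/Weekdays example; assign_weekday is a hypothetical name, and returning 0 stands in for the Constraint_Error exception Ada would raise.

```c
enum days { mon, tue, wed, thu, fri, sat, sun };

/* Stand-in for the check inserted before Day2 := Day1 when Day2 is of
   subtype Weekdays (mon..fri). If the value is in range, it is assigned
   and 1 is returned; otherwise the target is left unchanged and 0 is
   returned. */
int assign_weekday(enum days *day2, enum days day1) {
    if ((int)day1 < (int)mon || (int)day1 > (int)fri)
        return 0;
    *day2 = day1;
    return 1;
}
```

Every assignment to a subrange variable pays for this comparison at run time, which is the cost mentioned above.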

Array Types

• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.

• A reference to an array element in a program often includes one or more non-constant subscripts. Such references require a run-time calculation to determine the memory location being referenced.

Array Design Issues

• What types are legal for subscripts?

• Are subscripting expressions in element references range checked?

• When are subscript ranges bound?

• When does allocation take place?

• What is the maximum number of subscripts?

• Can array objects be initialized?

• Are any kinds of slices allowed?

Array Indexing

• A specific element of an array is referenced by the aggregate name and subscripts (indexes).

• Indexing (or subscripting) is a mapping from indices to elements:
array_name (index_value_list) → an element

• Index syntax
– FORTRAN, PL/I, and Ada use parentheses.
• Ada explicitly uses parentheses to show uniformity between array references and function calls, because both are mappings. E.g., sum := sum + B(I);
• Drawback: the reader needs other information to determine whether B(I) is a function call or an array reference, which reduces readability.
– Most other languages use brackets.

• Two distinct types are involved in an array type: the element type and the type of the subscripts.

Arrays Index (Subscript) Types

• FORTRAN, C: integer only

• Pascal: any ordinal type (integer, Boolean, char, enumeration)

• Ada: integer or enumeration (includes Boolean and char)

• Java: integer types only

• C, C++, Perl, and Fortran do not specify range checking of subscripts.

• Java, ML, and C# specify range checking.

• Ada checks the range of all subscripts, but this feature can be disabled by the programmer.

Subscript Binding and Array Categories

• The binding of the subscript type to an array is usually static, but the subscript value ranges are sometimes dynamically bound.

• In some languages the lower bound of the subscript range is implicit: in C-based languages it is fixed at zero, and in Fortran it defaults to 1. In Pascal, subscript ranges must be specified by the programmer.

• There are five categories of arrays, based on the binding to subscript value ranges and the binding to storage.

• Static: subscript ranges are statically bound and storage allocation is static (before run-time)

– Advantage: efficiency (no dynamic allocation)

• Fixed stack-dynamic: subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution
– Advantage: space efficiency. A large array in one subprogram can use the same space as a large array in a different subprogram, as long as both subprograms are not active at the same time.

• Stack-dynamic: subscript ranges are dynamically bound and the storage allocation is dynamic (done at run-time). Once the subscript range is bound and the storage is allocated, they remain fixed during the lifetime of the variable.


– Advantage: flexibility (the size of an array need not be known until the array is to be used)

• Fixed heap-dynamic: similar to fixed stack-dynamic in that the subscript ranges and the storage binding are dynamic but fixed after allocation. The differences are that the bindings are done when the user program requests them, rather than at elaboration time, and the storage is allocated from the heap, rather than the stack.

• Heap-dynamic: binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array’s lifetime.

– Advantage: flexibility (arrays can grow or shrink during program execution)

• Examples of the categories:

o C and C++ arrays that include the static modifier are static.

o C and C++ arrays declared in functions without the static modifier are fixed stack-dynamic.

o Ada arrays can be stack-dynamic. For example, the user can input the number of desired elements for an array list; the elements are then dynamically allocated when execution reaches the declare block that defines the array, and when execution reaches the end of the block, the array is deallocated.

o C and C++ provide fixed heap-dynamic arrays.
- malloc and free (the general heap allocation and deallocation operations) can be used for C arrays.
- C++ uses the operators new and delete to manage heap storage.
- Fortran 95 and C# also support fixed heap-dynamic arrays.
- In Java, all arrays are fixed heap-dynamic arrays: once created, they keep the same subscript ranges and storage.

o C# includes a second array class, ArrayList, that provides heap-dynamic arrays. Objects of this class are created without any elements:

ArrayList intList = new ArrayList();

Elements are added to this object with the Add method:

intList.Add(nextOne);

o Perl and JavaScript support heap-dynamic arrays. Arrays implicitly grow whenever assignments are made to elements beyond the last current element, and they can be emptied by assigning them an empty aggregate, ().

e.g., in Perl we could create an array of 5 numbers with @list = (1, 2, 4, 7, 19); The array could be lengthened with the push function, push(@list, 13, 17); so that it becomes (1, 2, 4, 7, 19, 13, 17). It can be emptied with @list = ();
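C itself only offers fixed heap-dynamic arrays through malloc and free, but a heap-dynamic array in the spirit of Perl's push or C#'s ArrayList.Add can be sketched on top of realloc. The struct and function names are hypothetical, and doubling the capacity is one common growth policy, not the one any particular language uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* A heap-dynamic int array: its length and storage change at run time. */
struct int_list {
    int *items;
    size_t length;
    size_t capacity;
};

void list_init(struct int_list *l) {
    l->items = NULL;
    l->length = 0;
    l->capacity = 0;
}

/* Append one element, growing the storage with realloc when it is full.
   Returns 1 on success, 0 if memory could not be allocated. */
int list_push(struct int_list *l, int value) {
    if (l->length == l->capacity) {
        size_t new_capacity = l->capacity ? 2 * l->capacity : 4;
        int *p = realloc(l->items, new_capacity * sizeof *p);
        if (p == NULL)
            return 0;                 /* the old storage is still valid */
        l->items = p;
        l->capacity = new_capacity;
    }
    l->items[l->length++] = value;
    return 1;
}

/* Empty the array and release its storage, like @list = (); in Perl. */
void list_clear(struct int_list *l) {
    free(l->items);
    list_init(l);
}
```

Pushing 1, 2, 4, 7, 19 and then 13, 17 reproduces the Perl example; realloc may move the elements to a new block, which is why callers should always go through l->items.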

Array Initialization

• Some languages allow initialization at the time of storage allocation.
– C, C++, Java, C# example:

int list [] = {4, 5, 7, 83};

The compiler sets the length of the array.
– Character strings in C and C++ are implemented as arrays of char:

char name [] = "freddie";

The array name will have 8 elements, because all strings are terminated with the null character (zero), which is implicitly supplied by the system.
– Arrays of strings in C and C++ can be initialized with string literals:

char *names [] = {"Bob", "Jake", "Joe"};

– Java initialization of String objects:

String[] names = {"Bob", "Jake", "Joe"};
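The lengths the compiler derives from initializers can be checked with sizeof. This C sketch reuses the declarations above; the three helper functions are hypothetical, and names is declared with const char * here because string literals must not be modified.

```c
#include <stddef.h>

/* The compiler sets each array's length from its initializer. */
int list[] = {4, 5, 7, 83};
char name[] = "freddie";                      /* 7 letters + null = 8 */
const char *names[] = {"Bob", "Jake", "Joe"}; /* array of 3 pointers */

size_t list_length(void) { return sizeof list / sizeof list[0]; }
size_t name_size(void)   { return sizeof name; }
size_t names_count(void) { return sizeof names / sizeof names[0]; }
```

The sizeof trick only works where the array itself is in scope; once an array is passed to a function it decays to a pointer and its length is lost.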

Arrays Operations

• APL provides the most powerful array processing operations for vectors and matrices, as well as unary operators (for example, to reverse column elements). E.g., A + B is a valid expression, where A and B are scalar variables, vectors, or matrices.

• Ada allows array assignment but also catenation (&). Catenation is defined between two single-dimensioned arrays and between a single-dimensioned array and a scalar.

• Fortran provides elemental operations, so called because they are operations between pairs of array elements.

– For example, + operator between two arrays results in an array of the sums of the element pairs of the two arrays

Rectangular and Jagged Arrays

• A rectangular array is a multi-dimensioned array in which all of the rows have the same number of elements and all columns have the same number of elements.

• A jagged matrix has rows with varying number of elements. E.g. a jagged matrix may consist of 3 rows, one with 5 elements, one with 7 elements, and one with 12 elements. This also applies to the columns or higher dimensions.

– Jagged arrays are made possible when multi-dimensioned arrays actually appear as arrays of arrays

• C, C++, and Java support jagged arrays but not rectangular arrays. A reference to an element of a multidimensional array uses a separate pair of brackets for each dimension, e.g.:

myArray[3][7]

• Fortran and Ada support rectangular arrays. All subscript expressions in references to elements are placed in a single pair of delimiters, e.g., myArray(3, 7) in Fortran and Ada, or myArray[3, 7] in C#.

• C# supports both.
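C's arrays of arrays make jagged matrices straightforward when each row is allocated separately. This sketch builds the 3-row example above (rows of 5, 7, and 12 elements); make_jagged and free_jagged are hypothetical names. Each element is still referenced with one pair of brackets per dimension, m[r][c].

```c
#include <stddef.h>
#include <stdlib.h>

/* Build a jagged matrix as an array of row pointers, where row r has
   row_lengths[r] elements, all set to zero. Returns NULL on failure
   (assumes every row length is nonzero, since calloc(0, ...) may
   return NULL). */
int **make_jagged(const size_t *row_lengths, size_t rows) {
    int **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t r = 0; r < rows; r++) {
        m[r] = calloc(row_lengths[r], sizeof **m);
        if (m[r] == NULL) {
            while (r-- > 0)           /* undo the rows already allocated */
                free(m[r]);
            free(m);
            return NULL;
        }
    }
    return m;
}

void free_jagged(int **m, size_t rows) {
    for (size_t r = 0; r < rows; r++)
        free(m[r]);
    free(m);
}
```

By contrast, a declaration such as int rect[3][12]; gives a single contiguous block in which every row has the same length.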

Slices

• A slice of an array is some substructure of that array; e.g., if A is a matrix, the first row of A is one possible slice, as are the last row and the first column.

• A slice is not a new data type; it is nothing more than a referencing mechanism.

• Slices are only useful in languages that have array operations (i.e., if arrays cannot be manipulated as units, the language has no use for slices).

Slice Examples

• Fortran 95

Integer, Dimension (10) :: Vector
Integer, Dimension (3, 3) :: Mat
Integer, Dimension (3, 3, 4) :: Cube

Remember that the default lower bound for Fortran arrays is 1.
– Vector (3:6) is a four-element array.
– Mat(:, 2) refers to the second column of Mat.
– Mat(3, :) refers to the third row of Mat.
– All of these references can be used as single-dimensioned arrays. References to array slices are treated as if they were arrays of the remaining dimensionality.

[Figure: Slices Examples in Fortran 95]

Implementation of Arrays

• Implementing arrays requires more compile-time effort than implementing simple types such as integer. The code to allow accessing of array elements must be generated at compile time; at run time, this code must be executed to produce element addresses.

• Access function maps subscript expressions to an address in the array

• A single-dimensioned array is a list of adjacent memory cells. Suppose the lower bound of array list is 1.

• Access function for single-dimensioned arrays:


address(list[k]) = address (list[lower_bound])+ ((k-lower_bound) * element_size)

• If the element type is statically bound and the array is statically bound to storage, then the value of address(list[lower_bound]) can be computed before run time.

• If the base (beginning) address of the array is not known until run time, the constant part of the access function, address(list[lower_bound]) - lower_bound * element_size, must be computed when the array is allocated.
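The access function can be evaluated directly on addresses. The C sketch below treats addresses as uintptr_t values (assuming a flat address space) and shows both the direct formula and the variant in which the constant part is computed once, at allocation time; all three function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct form of the access function:
   address(list[k]) = address(list[lower_bound]) + (k - lower_bound) * element_size */
uintptr_t element_address(uintptr_t base, long k, long lower_bound,
                          size_t element_size) {
    return base + (uintptr_t)(k - lower_bound) * element_size;
}

/* The constant part, base - lower_bound * element_size, computed once.
   Unsigned arithmetic wraps modulo 2^N, so the final address is still
   correct after k * element_size is added back. */
uintptr_t constant_part(uintptr_t base, long lower_bound, size_t element_size) {
    return base - (uintptr_t)lower_bound * element_size;
}

/* Per-reference work is then one multiplication and one addition. */
uintptr_t element_address_fast(uintptr_t cp, long k, size_t element_size) {
    return cp + (uintptr_t)k * element_size;
}
```

With a lower bound of 1, element 4 of the source-language array corresponds to index 3 of an ordinary zero-based C array, so both functions should return the address of that C element.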

Accessing Multi-dimensioned Arrays

• Values of data types that have two or more dimensions must be mapped onto the single-dimensional memory.

• Two common ways:
– Row major order (by rows): used in most languages
– Column major order (by columns): used in Fortran

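Locating an Element in a Multi-dimensioned Array

• General format, row major order (n is the number of elements per row):

Location (a[i,j]) = address of a [row_lb,col_lb] + (((i - row_lb) * n) + (j - col_lb)) * element_size

• Column major order (m is the number of elements per column), where the roles of the two subscripts are exchanged:

Location (a[i,j]) = address of a [row_lb,col_lb] + (((j - col_lb) * m) + (i - row_lb)) * element_size

• For a matrix in row major order, the number of elements that precede an element is the number of rows above the element times the size of a row, plus the number of elements to the left of the element.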
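Both orders reduce to counting the elements that precede a[i,j] and multiplying that count by element_size. The C sketch below computes the count for each order with general lower bounds; the function names are hypothetical.

```c
#include <stddef.h>

/* Number of elements that precede a[i,j] in row major order;
   n = number of elements per row. */
size_t row_major_index(long i, long j, long row_lb, long col_lb, size_t n) {
    return (size_t)(i - row_lb) * n + (size_t)(j - col_lb);
}

/* Number of elements that precede a[i,j] in column major order (Fortran);
   m = number of elements per column. */
size_t column_major_index(long i, long j, long row_lb, long col_lb, size_t m) {
    return (size_t)(j - col_lb) * m + (size_t)(i - row_lb);
}
```

Since C stores arrays in row major order, for int a[3][4] the element a[i][j] lies row_major_index(i, j, 0, 0, 4) * sizeof(int) bytes past the start of a. For a 3 x 4 Fortran-style matrix with lower bounds of 1, the last element (3, 4) has 11 predecessors in either order.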

[Figure: Compile-Time Descriptors for a single-dimensioned array and a multi-dimensional array]

Associative Arrays

• An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. In non-associative arrays, the indices never need to be stored, because of their regularity.
– In an associative array, the user-defined keys must be stored in the structure, so each element of an associative array is a pair of entities (key, value).

• Design issue: what is the form of references to elements?

Associative Arrays in Perl

• They are called hashes, because elements are stored and retrieved with hash functions.

• Names begin with %; literals are delimited by parentheses:

%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);

• Subscripting is done using braces and keys. The key value is placed in braces, and the hash name is replaced by a scalar variable name that is the same except for the first character ($ instead of %):

$hi_temps{"Wed"} = 83;

– Elements can be removed with delete:

delete $hi_temps{"Tue"};

– The entire hash can be emptied by assigning the empty literal to it: %hi_temps = ();
– The size of a Perl hash is dynamic (it grows and shrinks).
– The exists operator returns true or false, depending on whether its operand key is an element in the hash: if (exists $hi_temps{"Tue"}) …
– PHP's arrays are both normal arrays and associative arrays.
– A hash is much better than an array if searches of the elements are required, because the implicit hashing operation used to access hash elements is very efficient. On the other hand, if every element of a list must be processed, it would be more efficient to use an array.
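Perl hashes are built into the language, but the mechanism behind them, a hash function that maps each key to a storage location, can be sketched in C. This is a minimal separate-chaining table with a fixed number of buckets and the djb2 string hash; it illustrates the idea only, is not Perl's actual implementation, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 31   /* hypothetical fixed number of buckets */

/* Each element is a (key, value) pair; keys that hash to the same
   bucket are chained in a linked list. */
struct entry {
    char *key;
    int value;
    struct entry *next;
};

struct hash {
    struct entry *buckets[NBUCKETS];
};

/* djb2 string hash, reduced to a bucket index. */
static size_t bucket_of(const char *key) {
    unsigned long h = 5381;
    while (*key != '\0')
        h = h * 33 + (unsigned char)*key++;
    return (size_t)(h % NBUCKETS);
}

void hash_init(struct hash *h) {
    for (size_t b = 0; b < NBUCKETS; b++)
        h->buckets[b] = NULL;
}

static struct entry *find(const struct hash *h, const char *key) {
    for (struct entry *e = h->buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Like $hi_temps{"Wed"} = 83: insert, or overwrite an existing key.
   Returns 0 only if memory could not be allocated. */
int hash_put(struct hash *h, const char *key, int value) {
    struct entry *e = find(h, key);
    if (e != NULL) { e->value = value; return 1; }
    size_t len = strlen(key) + 1;
    e = malloc(sizeof *e);
    if (e == NULL) return 0;
    e->key = malloc(len);
    if (e->key == NULL) { free(e); return 0; }
    memcpy(e->key, key, len);            /* the key itself is stored */
    e->value = value;
    size_t b = bucket_of(key);
    e->next = h->buckets[b];
    h->buckets[b] = e;
    return 1;
}

/* Like exists plus a lookup: returns 1 and sets *value if key is present. */
int hash_get(const struct hash *h, const char *key, int *value) {
    const struct entry *e = find(h, key);
    if (e == NULL) return 0;
    *value = e->value;
    return 1;
}

/* Like delete $hi_temps{"Tue"}: returns 1 if an element was removed. */
int hash_delete(struct hash *h, const char *key) {
    for (struct entry **link = &h->buckets[bucket_of(key)]; *link != NULL;
         link = &(*link)->next) {
        if (strcmp((*link)->key, key) == 0) {
            struct entry *dead = *link;
            *link = dead->next;          /* unlink, then free */
            free(dead->key);
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Like %hi_temps = (): remove every element. */
void hash_clear(struct hash *h) {
    for (size_t b = 0; b < NBUCKETS; b++) {
        struct entry *e = h->buckets[b];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
        h->buckets[b] = NULL;
    }
}
```

A lookup only walks one short chain instead of scanning every element, which is why searches are fast; visiting every element, by contrast, means touching every bucket.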

Record Types

• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names.

• Design issues:
– What is the syntactic form of references to the fields?
– Are elliptical references allowed?

Definition of Records

• COBOL uses level numbers to show nested records; others use recursive definition:

01 EMP-REC.
   02 EMP-NAME.
      05 FIRST PIC X(20).
      05 MID PIC X(10).
      05 LAST PIC X(20).
   02 HOURLY-RATE PIC 99V99.

• Ada: record structures are indicated in an orthogonal way:

type Emp_Rec_Type is record
   First: String (1..20);
   Mid: String (1..10);
   Last: String (1..20);
   Hourly_Rate: Float;
end record;
Emp_Rec: Emp_Rec_Type;

• Record field references
1. COBOL: field_name OF record_name_1 OF ... OF record_name_n, where record_name_1 is the smallest or innermost record that contains the field. Ex: MID OF EMP-NAME OF EMP-REC
2. Others (dot notation): record_name_1.record_name_2. ... .record_name_n.field_name, listed from the outermost record inward. Ex: Employee_Record.Employee_Name.Mid

References to Records

• Most languages use dot notation. Ex: the reference to the field Mid in the Ada record example is Emp_Rec.Mid.

• Fully qualified references must include all record names.

• Elliptical references allow leaving out record names as long as the reference is unambiguous; for example, in COBOL, FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the employee's first name.

Operations on Records

• Assignment is very common if the types are identical.

• Ada allows record comparison for equality or inequality.

• Ada records can be initialized with aggregate literals.

• COBOL provides the MOVE CORRESPONDING statement:
– It copies each field of the source record to the field with the same name in the target record.

Evaluation and Comparison to Arrays

• The design of records is straightforward and safe.

• The only aspect of records that is not clearly readable is the elliptical references allowed by COBOL.

• Records are used when a collection of data values is heterogeneous and the different fields are not processed in the same way.

• Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static).

• Dynamic subscripts could be used with record field access, but it would disallow type checking and it would be much slower
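A C struct is a record type with dot-notation field references. The sketch below mirrors the employee record (field sizes taken from the COBOL and Ada versions, nested like the COBOL one; the C names are hypothetical) and uses offsetof to show that, because field names are static, each field's location is a fixed offset known at compile time.

```c
#include <stddef.h>

/* C counterpart of the employee record, with a nested name record. */
struct emp_name {
    char first[20];
    char mid[10];
    char last[20];
};

struct emp_rec {
    struct emp_name name;
    float hourly_rate;
};

/* Fully qualified dot notation: outer record, inner record, field. */
char middle_initial(struct emp_rec r) {
    return r.name.mid[0];
}

/* Offset of mid from the start of the whole record, fixed at compile time. */
size_t mid_offset(void) {
    return offsetof(struct emp_rec, name.mid);
}
```

On typical compilers mid starts 20 bytes into the record, right after first; the compiler may insert padding between fields of different types (for example before hourly_rate), which is one reason the array access method cannot be used for records.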

Implementation of Record Type

• The fields of records are stored in adjacent memory locations, but because the sizes of the fields are not necessarily the same, the access method used for arrays is not used for records. Instead, an offset address, relative to the beginning of the record, is associated with each field, and field accesses are accomplished by using these offsets.

[Figure: Compile-time descriptor for a record]

Union Types

• A union is a type whose variables are allowed to store different type values at different times during execution. Example: a table of constants for a compiler, in which the value field of an entry may hold constants of different types.

• Design issues
– Should type checking be required? (Any such type checking must be dynamic.)
– Should unions be embedded in records?

Discriminated vs. Free Unions

• Fortran, C, and C++ provide union constructs in which there is no language support for type checking; the union in these languages is called a free union, because programmers are allowed complete freedom from type checking in their use.

• Type checking of unions requires that each union include a type indicator called a discriminant or tag (a discriminated union). ALGOL 68 was the first language to provide discriminated unions.
– Supported by Ada

Ada Union Types

Ada's design for discriminated unions allows the user to specify variables of a variant record type that will store only one of the possible type values in the variant. In this way the user can tell the system when the type checking can be static. Such a restricted variable is called a constrained variant variable.

Unconstrained variant variables in Ada allow the values of their variants to change type during execution.

The type of the variant can be changed only by assigning the entire record, including the discriminant.

Ex: Consider an Ada variant record:

type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form : Shape := Circle) is record
   Filled : Boolean;
   Color : Colors;
   case Form is
      when Circle =>
         Diameter : Float;
      when Triangle =>
         Leftside, Rightside : Integer;
         Angle : Float;
      when Rectangle =>
         Side1, Side2 : Integer;
   end case;
end record;

(The default value for the discriminant Form is what allows unconstrained variables of type Figure to be declared.)

The following two statements declare variables of type Figure:

Figure_1 : Figure;
Figure_2 : Figure (Form => Triangle);

Figure_1 is an unconstrained variant record declared without an initial value (its discriminant starts at the default, Circle). Its variant can be changed by assigning a whole record, including the discriminant:

Figure_1 := (Filled => True, Color => Blue, Form => Rectangle, Side1 => 12, Side2 => 3);

Figure_2 is constrained to be a triangle and cannot be changed to another variant.

[Figure: Ada Union Type Illustrated]

Evaluation of Unions

• Potentially unsafe construct
– Free unions do not allow type checking; this is one reason Fortran, C, and C++ are not strongly typed.

• Java and C# do not support unions.
– Reflective of growing concerns for safety in programming languages

• Discriminated unions are implemented by simply using the same address for every possible variant. Sufficient storage for the largest variant is allocated.

[Figure: A discriminated union of three shape variables (assume all the variables are the same size)]
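C provides only free unions, but the discriminated-union idea can be imitated by pairing a union with an explicit tag, as in this sketch of the Ada Figure type. The C names are hypothetical, and unlike Ada, C performs no tag check on its own: the check in rectangle_area is written by hand.

```c
#include <stdbool.h>

enum shape { circle, triangle, rectangle };
enum colors { red, green, blue };

/* A hand-made discriminated union: the tag (form) records which variant
   is active, and all variants share the same storage in u. */
struct figure {
    bool filled;
    enum colors color;
    enum shape form;                              /* the discriminant */
    union {
        float diameter;                           /* circle */
        struct { int leftside, rightside; float angle; } tri;
        struct { int side1, side2; } rect;
    } u;
};

struct figure make_rectangle(bool filled, enum colors color, int s1, int s2) {
    struct figure f = {0};
    f.filled = filled;
    f.color = color;
    f.form = rectangle;
    f.u.rect.side1 = s1;
    f.u.rect.side2 = s2;
    return f;
}

/* Checked access: read the rectangle variant only if the tag says so.
   Nothing in C prevents reading f->u.rect directly, as a free union allows. */
int rectangle_area(const struct figure *f, int *area) {
    if (f->form != rectangle)
        return 0;
    *area = f->u.rect.side1 * f->u.rect.side2;
    return 1;
}
```

Because all variants share one address, sizeof f.u is at least the size of the largest variant, matching the implementation described above; also note that nothing stops a C programmer from changing f.form without changing the data, which Ada forbids.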

Pointer and Reference Types
• A pointer type variable has a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference any memory cell.
• Pointers provide the power of indirect addressing
• Pointers provide a way to manage dynamic memory. A pointer can be used to access a location in the area where storage is dynamically allocated (usually called a heap)
• Variables that are dynamically allocated from the heap are called heap-dynamic variables. They often do not have identifiers associated with them and can be referenced only by pointer or reference type variables.
• Variables without names are called anonymous variables.
• Pointers are not structured types, although they are defined using a type operator (* in C and C++, access in Ada).
• Pointers are different from scalar variables because they are most often used to reference some other variable, rather than being used to store data of some sort.
• Pointers add writability to a language (e.g., dynamic structures such as trees and linked lists).
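A minimal C++ sketch of an anonymous heap-dynamic variable, assuming nullptr as C++'s counterpart of nil (the function names are hypothetical): the int created by new has no name of its own and is reachable only through the returned pointer.

```cpp
#include <cassert>

// Allocates an anonymous heap-dynamic int; the only way to reach it is
// through the pointer that new returns.
int* make_anonymous(int value) {
    return new int(value);
}

// nullptr plays the role of nil: it is not a valid address, so a pointer
// holding it cannot currently be used to reference a memory cell.
bool can_dereference(const int* p) {
    return p != nullptr;
}
```

Typical use: int* p = make_anonymous(7); reads *p as 7, then delete p; p = nullptr; so that later code can test the pointer before using it.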

Design Issues of Pointers
• What are the scope and lifetime of a pointer variable?
• What is the lifetime of a heap-dynamic variable?
• Are pointers restricted as to the type of value to which they can point?
• Are pointers used for dynamic storage management, indirect addressing, or both?
• Should the language support pointer types, reference types, or both?

Pointer Operations

• Two fundamental operations: assignment and dereferencing
• Assignment is used to set a pointer variable’s value to some useful address
• Dereferencing yields the value stored at the location represented by the pointer’s value
– Dereferencing can be explicit or implicit
– C++ uses an explicit operation via *
j = *ptr
sets j to the value located at ptr
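Both operations fit in a short C++ sketch. Person and its age field are a hypothetical record type; the second function shows that field access through a pointer is the same explicit dereference, which C and C++ let you write either as (*p).age or as p->age.

```cpp
#include <cassert>

// Hypothetical record type with a field named age.
struct Person {
    int age;
};

// Assignment gives ptr a useful address; explicit dereferencing with *
// then yields the value stored at that address.
int read_through(int value) {
    int* ptr = &value;  // pointer assignment
    int j = *ptr;       // j = *ptr
    return j;
}

// Two equivalent ways to reach a field through a pointer to a record.
bool age_forms_agree(Person* p) {
    int a = (*p).age;   // dereference, then select the field
    int b = p->age;     // shorthand for the same reference
    return a == b;
}
```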

Figure: Pointer assignment illustrated (the assignment operation j = *ptr).

When pointers point to records, the syntax of references to the fields of these records varies among languages. In C and C++ there are two ways. If a pointer variable p points to a record with a field named age, (*p).age can be used to refer to that field; alternatively, p->age does the same.

Problems with Pointers

• Dangling pointers (dangerous)


– A pointer points to a heap-dynamic variable that has been de-allocated. Dangling pointers are dangerous for several reasons:

1. The location being pointed to may have been reallocated to some new heap-dynamic variable. If the new variable is not the same type as the old one, type checks of uses of the dangling pointers are invalid.

2. Even if the new one is the same type, its new value will bear no relationship to the old pointer’s dereferenced value.

3. If the dangling pointer is used to change the heap-dynamic variable, the value of the new heap-dynamic variable will be destroyed.

4. It is possible that the location is now being temporarily used by the storage management system, possibly as a pointer in a chain of available blocks of storage, thereby allowing a change to the location to cause the storage manager to fail.

Ex: In C++:
int *arrayptr1;
int *arrayptr2 = new int[100];  // create a heap-dynamic array
arrayptr1 = arrayptr2;
delete [] arrayptr2;
// Now arrayptr1 is dangling, because the heap storage to which it was pointing has been deallocated.
// arrayptr2 is dangling as well, since delete does not change the value of its operand pointer.

• Lost heap-dynamic variable

– An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage). Lost heap-dynamic variables are created by the following sequence of operations.

• Pointer p1 is set to point to a newly created heap-dynamic variable

• Pointer p1 is later set to point to another newly created heap-dynamic variable

• The first heap-dynamic variable is now inaccessible, or lost. This is sometimes called memory leakage.
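The three-step sequence can be reproduced in C++ with a counter of live objects, so the lost variable becomes observable. Node and leaked_after_reassignment are hypothetical names; the sketch leaks one Node on purpose.

```cpp
#include <cassert>

// Counts how many Node objects are currently alive on the heap.
struct Node {
    inline static int live = 0;
    Node() { ++live; }
    ~Node() { --live; }
};

// Follows the sequence above and returns how many Nodes remain allocated
// after the program frees everything it can still reach.
int leaked_after_reassignment() {
    int before = Node::live;
    Node* p1 = new Node;   // p1 points to a newly created heap-dynamic variable
    p1 = new Node;         // p1 is redirected; the first Node is unreachable
    delete p1;             // only the second Node can still be freed
    return Node::live - before;  // the first Node is lost
}
```

The function returns 1: the first Node is still allocated, but no pointer in the program refers to it.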

Pointers in Ada

• Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the scope of the pointer's type
• The lost heap-dynamic variable problem is not eliminated by Ada

Pointers in C and C++

• Extremely flexible but must be used with care
• Pointers can point at any variable regardless of when or where it was allocated
• Used for dynamic storage management and addressing
• Pointer arithmetic is possible
• Explicit dereferencing and address-of operators: the asterisk (*) denotes the dereferencing operation, and the ampersand (&) denotes the operator for producing the address of a variable.

Ex:
int *ptr;
int count, init;
…
ptr = &init;
count = *ptr;
The two assignment statements are equivalent to the single assignment count = init;
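The operators listed above can be exercised together in a small C++ sketch (all function names are hypothetical). It checks that count = *ptr after ptr = &init copies init, that pointer arithmetic on an array agrees with subscripting, and that a void * must be cast back to a concrete pointer type before it can be dereferenced.

```cpp
#include <cassert>

// ptr = &init; count = *ptr; has the same effect as count = init;
int copy_through_pointer(int init) {
    int* ptr = &init;   // & yields the address of init
    int count = *ptr;   // * yields the value stored at that address
    return count;
}

// p + i is scaled by sizeof(float), so *(p + i), p[i] and stuff[i]
// all name the same element.
bool arithmetic_matches_indexing() {
    float stuff[100];
    for (int i = 0; i < 100; ++i) stuff[i] = i * 0.5f;
    float* p = stuff;   // address of stuff[0]
    for (int i = 0; i < 100; ++i) {
        if (*(p + i) != stuff[i] || p[i] != stuff[i]) return false;
    }
    return true;
}

// A void * can hold the address of any object, but it must be cast to the
// right pointer type before dereferencing.
double read_via_void(void* generic) {
    return *static_cast<double*>(generic);
}
```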

• The domain type need not be fixed: void * declares a generic pointer
• void * can point to any type and can be type checked (it cannot be dereferenced)

Pointer Arithmetic in C and C++
float stuff[100];
float *p;
p = stuff;  // assign the address of stuff[0] to p

*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]

Pointers in Fortran 95

• Pointers point to heap and non-heap (static) variables
• Implicit dereferencing
• Pointers can only point to variables that have the TARGET attribute
• The TARGET attribute is assigned in the declaration:
INTEGER, TARGET :: NODE

Reference Types

• C++ includes a special kind of pointer type called a reference type that is used primarily for formal parameters. Reference type variables are specified with an ampersand (&).

Ex:
int result = 0;
int &ref_result = result;
…
ref_result = 100;
Here result and ref_result are aliases.
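Because reference types are used mainly for formal parameters, a short C++ sketch of that use may help (set_to_hundred and sum are hypothetical names). A reference parameter is an alias for the caller's variable, so assignments through it reach the caller, while both the call and the body use ordinary variable syntax. A const reference, shown second, is one common reading of "the advantages of both": the argument is not copied, as with pass-by-reference, and it cannot be modified, as with pass-by-value.

```cpp
#include <cassert>

// ref_result is an alias for the caller's variable: no & at the call site,
// no * in the body, yet the caller's variable changes.
void set_to_hundred(int& ref_result) {
    ref_result = 100;
}

// A const reference avoids copying the array while forbidding changes to it.
long sum(const int (&values)[4]) {
    long total = 0;
    for (int v : values) total += v;
    return total;
}
```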

– Reference parameters offer the advantages of both pass-by-reference and pass-by-value
• Java extends C++’s reference variables and allows them to replace pointers entirely
– References refer to class instances. Java reference variables can be assigned to refer to different class instances. In the following, String is a standard Java class:
String str1;
…
str1 = "This is a Java literal string";
str1 is defined to be a reference to a String class instance, or object.
– Because Java class instances are implicitly deallocated, there cannot be a dangling reference.

• C# includes both the references of Java and the pointers of C++

Evaluation of Pointers

• Dangling pointers and dangling objects are problems as is heap management

• Pointers are like goto's--they widen the range of cells that can be accessed by a variable

• Pointers or references are necessary for dynamic data structures--so we can't design a language without them

Representations of Pointers
• In most large computers, pointers and references are single values stored in memory cells
• Early Intel microprocessors use segment and offset addresses, so pointers and references are implemented as pairs of 16-bit cells, one for each of the two parts of an address.

Dangling Pointer Problem
• There have been several proposed solutions to the dangling pointer problem.
• Tombstone: an extra heap cell that is a pointer to the heap-dynamic variable
– The actual pointer variable points only at tombstones and never to heap-dynamic variables.
– When a heap-dynamic variable is deallocated, the tombstone remains but is set to nil, indicating that the heap-dynamic variable no longer exists.
– This approach prevents a pointer from ever pointing to a deallocated variable. Any reference through a pointer that points to a nil tombstone can be detected as an error.
– Tombstones are costly in time and space. Because tombstones are never deallocated, their storage is never reclaimed. Every access to a heap-dynamic variable through a tombstone requires one more level of indirection.
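A minimal C++ sketch of tombstones, assuming a heap-dynamic int as the target (Tombstone, TombPtr, tomb_new and tomb_delete are hypothetical names). User-level pointers hold only the tombstone's address; deallocation sets the tombstone to nil, so every copy of the pointer detects the error. As the notes say, tombstones are never reclaimed, so the sketch deliberately never frees them.

```cpp
#include <cassert>
#include <stdexcept>

// The tombstone: an extra heap cell holding the real address of the variable.
struct Tombstone {
    int* target;   // set to nullptr (nil) once the variable is deallocated
};

// A user-level pointer points only at a tombstone, never at the variable.
struct TombPtr {
    Tombstone* tomb;
    int& operator*() const {
        // The extra level of indirection paid on every access.
        if (tomb->target == nullptr)
            throw std::runtime_error("dangling reference: nil tombstone");
        return *tomb->target;
    }
};

// Allocate the heap-dynamic variable together with its tombstone.
TombPtr tomb_new(int value) {
    return TombPtr{new Tombstone{new int(value)}};
}

// Free the variable but keep the tombstone, set to nil.
void tomb_delete(TombPtr p) {
    delete p.tomb->target;
    p.tomb->target = nullptr;
}
```

Copying a TombPtr copies only the tombstone's address, so after tomb_delete every copy sees the nil tombstone instead of dangling into freed storage.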

• An alternative is the locks-and-keys approach: pointer values are represented as ordered (key, address) pairs
– Heap-dynamic variables are represented as the storage for the variable plus a header cell that stores an integer lock value
– When a heap-dynamic variable is allocated, a lock value is created and placed both in the lock cell of the heap-dynamic variable and in the key cell of the pointer that is specified in the call to new.
– Every access to the dereferenced pointer compares the key value of the pointer to the lock value in the heap-dynamic variable. If they match, the access is legal; otherwise, the access is treated as a run-time error.
– Any copies of the pointer value to other pointers must copy the key value. Therefore, any number of pointers can reference a given heap-dynamic variable. When a heap-dynamic variable is deallocated with dispose, its lock value is cleared to an illegal lock value. Then, if a pointer other than the one specified in the dispose is dereferenced, its address value will still be intact, but its key value will no longer match the lock, so the access will not be allowed.
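The locks-and-keys scheme can be sketched in C++ under a simplifying assumption: dispose only clears the lock and does not return the storage, so reading the lock afterwards stays well defined in C++. A real heap manager would recycle the cell, and a new occupant would receive a different lock value. LockedCell, KeyPtr, lk_new, lk_deref, lk_dispose and the counter next_lock are all hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>

// A heap-dynamic variable: a header cell holding the lock, plus the value.
struct LockedCell {
    int lock;
    int value;
};

// A pointer value is an ordered (key, address) pair.
struct KeyPtr {
    int key;
    LockedCell* address;
};

const int ILLEGAL_LOCK = 0;   // never handed out as a real lock
int next_lock = 1;            // simple source of fresh lock values

// new: the same fresh value goes into the lock cell and the pointer's key.
KeyPtr lk_new(int value) {
    int lock = next_lock++;
    return KeyPtr{lock, new LockedCell{lock, value}};
}

// Every dereference compares the key with the lock.
int lk_deref(const KeyPtr& p) {
    if (p.address == nullptr)
        throw std::runtime_error("nil pointer");
    if (p.key != p.address->lock)
        throw std::runtime_error("access denied: key does not match lock");
    return p.address->value;
}

// dispose: clear the lock so that every other copy's key stops matching.
// The storage is intentionally not freed in this sketch (see above).
void lk_dispose(KeyPtr& p) {
    p.address->lock = ILLEGAL_LOCK;
    p.address = nullptr;   // the pointer named in dispose becomes nil
}
```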


Heap Management
• A very complex run-time process
• Single-size cells vs. variable-size cells: either all heap storage is allocated and deallocated in units of a single size, or variable-size segments are allocated and deallocated.
• Single-size cells: every cell contains a pointer (as in Lisp)
• In a single-size allocation heap, all available cells are linked together using the pointers in the cells, forming a list of available space. Allocation is simple: the required number of cells is taken from this list.
• Deallocation is complex. A heap-dynamic variable can be pointed to by more than one pointer, making it difficult to determine when the variable is no longer useful to the program. Simply because one pointer is disconnected from a cell does not make the cell garbage.
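A small C++ sketch of a single-size allocation heap (Cell and SingleSizeHeap are hypothetical names, and eight cells is an arbitrary size). Each cell's pointer field doubles as the link in the list of available space, so allocation just unlinks the first cell. The sketch deliberately leaves out the hard part described above: deciding when a cell may be released.

```cpp
#include <cassert>
#include <cstddef>

// A single-size cell: an information part and one pointer field, which also
// serves as the link in the list of available space.
struct Cell {
    int info;
    Cell* next;
};

class SingleSizeHeap {
public:
    static constexpr std::size_t kCells = 8;

    SingleSizeHeap() {
        // Link every cell into the list of available space.
        for (std::size_t i = 0; i + 1 < kCells; ++i) cells_[i].next = &cells_[i + 1];
        cells_[kCells - 1].next = nullptr;
        free_ = &cells_[0];
    }

    // Allocation is simple: take the first cell off the list.
    Cell* allocate() {
        if (free_ == nullptr) return nullptr;   // heap exhausted
        Cell* c = free_;
        free_ = c->next;
        c->next = nullptr;
        return c;
    }

    // Returning a cell pushes it back onto the list of available space.
    void release(Cell* c) {
        c->next = free_;
        free_ = c;
    }

    std::size_t available() const {
        std::size_t n = 0;
        for (Cell* c = free_; c != nullptr; c = c->next) ++n;
        return n;
    }

private:
    Cell cells_[kCells];
    Cell* free_;
};
```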

• Two approaches to reclaim garbage
– Reference counters (eager approach): reclamation is gradual
– Garbage collection (lazy approach): reclamation occurs when the list of available space becomes empty

Reference Counters

• Reference counters: maintain a counter in every cell that stores the number of pointers currently pointing at the cell. If the reference counter reaches zero, the cell is considered garbage and returned to the free list.

– Disadvantages: space required for the counters; execution time required to maintain the counter values when pointers change value frequently (as in Lisp); and complications for cells connected circularly. The problem with circular structures is that each cell in a circular list has a reference counter value of at least 1, which prevents it from being returned to the list of available space.
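C++'s std::shared_ptr is a reference-counted pointer, so it is swapped in here to show both behaviours described above: eager reclamation when a count reaches zero, and cells that are never reclaimed when they form a cycle. RcNode and the two functions are hypothetical names; the live counter only makes reclamation observable.

```cpp
#include <cassert>
#include <memory>

// Counts live nodes so reclamation can be observed.
struct RcNode {
    inline static int live = 0;
    std::shared_ptr<RcNode> next;   // a counted pointer to another cell
    RcNode() { ++live; }
    ~RcNode() { --live; }
};

// A chain a -> b: when the outside pointer disappears, each count drops to
// zero in turn and both cells are reclaimed immediately.
int live_after_chain() {
    int before = RcNode::live;
    {
        auto a = std::make_shared<RcNode>();
        a->next = std::make_shared<RcNode>();
        assert(a->next.use_count() == 1);
    }
    return RcNode::live - before;
}

// A cycle a -> b -> a: after the outside pointers disappear, each cell is
// still referenced by the other, so both counts stay at 1.
int live_after_cycle() {
    int before = RcNode::live;
    {
        auto a = std::make_shared<RcNode>();
        auto b = std::make_shared<RcNode>();
        a->next = b;
        b->next = a;
        assert(a.use_count() == 2);
    }
    return RcNode::live - before;
}
```

The chain leaves 0 extra live nodes and the cycle leaves 2. In C++ the usual remedy is to make one link of the cycle a std::weak_ptr, which does not contribute to the count.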

Garbage Collection
• The run-time system allocates storage cells as requested and disconnects pointers from cells as necessary until it has allocated all available cells. Garbage collection then begins to gather all the garbage left floating around in the heap.

– To facilitate the process, every heap cell has an extra indicator bit used by the collection algorithm. The process consists of three phases:

– All cells in the heap are initially set to garbage
– All pointers in the program are traced into the heap, and reachable cells are marked as not garbage
– All garbage cells are returned to the list of available cells.

To see how the marking algorithm works, assume that all heap-dynamic variables (heap cells) consist of an information part, a mark indicator (tag), and two pointers (llink, rlink). These cells form directed graphs with at most two edges leading from any node. The marking algorithm traverses all spanning trees of the graphs, marking all cells that are found:

for every pointer r do
    mark(r);

void mark(struct cell *ptr) {
    if (ptr != NULL)
        if (!ptr->tag) {        /* not yet marked */
            ptr->tag = 1;       /* mark the cell as not garbage */
            mark(ptr->llink);
            mark(ptr->rlink);
        }
}

– Disadvantages: when you need it most, it works worst (it takes the most time when the program needs most of the cells in the heap), because it takes a good deal of time to trace and mark the useful cells.

– In this case, the process yields only a small number of cells that can be placed in the free list. The marking algorithm also requires a great deal of storage (for the stack) because of recursion.
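The three phases can be put together in a runnable C++ sketch. It assumes a heap stored as a vector of cells whose llink and rlink fields are indices, with -1 standing in for nil (GcCell, Heap and collect are hypothetical names). Marking is recursive, as in the mark function above, and the sweep collects every unmarked cell into the list of available cells.

```cpp
#include <cassert>
#include <vector>

// A heap cell: the mark indicator, an information part, and two pointer
// fields represented as indices into the heap (-1 means nil).
struct GcCell {
    bool marked = false;
    int info = 0;
    int llink = -1;
    int rlink = -1;
};

struct Heap {
    std::vector<GcCell> cells;
    std::vector<int> free_list;   // indices of available cells

    explicit Heap(int n) : cells(n) {}

    // Recursive marking, following llink and rlink.
    void mark(int ptr) {
        if (ptr != -1 && !cells[ptr].marked) {
            cells[ptr].marked = true;
            mark(cells[ptr].llink);
            mark(cells[ptr].rlink);
        }
    }

    // Phase 1: treat every cell as garbage. Phase 2: mark from every root.
    // Phase 3: return unmarked cells to the list of available cells.
    void collect(const std::vector<int>& roots) {
        for (GcCell& c : cells) c.marked = false;
        for (int r : roots) mark(r);
        free_list.clear();
        for (int i = 0; i < static_cast<int>(cells.size()); ++i)
            if (!cells[i].marked) free_list.push_back(i);
    }
};
```

Unlike reference counting, an unreachable cycle (for example cells 3 and 4 pointing at each other with no root reaching them) is reclaimed, because reachability rather than a count decides what is garbage.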

Figure: Marking algorithm

Variable-Size Cells
• All the difficulties of single-size cells plus more
• Variable-size cells are required by most programming languages
• If garbage collection is used, additional problems occur

– The initial setting of the indicators of all cells in the heap to indicate that they are garbage is difficult. Because the cells are different sizes, scanning them is a problem. One solution is to require each cell to have the cell size as its first field.

– The marking process is nontrivial. How can a chain be followed from a pointer if there is no predefined location for the pointer in the pointed-to cell?

– Cells that do not contain pointers at all are also a problem. Adding a system pointer to each cell will work, but it must be maintained in parallel with the user-defined pointers. This task adds space and execution-time overhead to the cost of running the program.

– Maintaining the list of available space is another source of overhead. The list can begin with a single cell consisting of all available space. Requests for segments simply reduce the size of this block. Reclaimed cells are added to the list. Over time the list becomes a long list of segments (blocks) of various sizes. This slows allocation, because requests cause the list to be searched for sufficiently large blocks. Fragmentation can also become high, so the heap may need to be compacted. Alternatively, a best-fit strategy can be used, which requires keeping the list ordered by block size, at an additional cost in time.
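The search cost and fragmentation described above can be seen in a C++ sketch of a free list of variable-size blocks. It assumes blocks are described by a start offset and a size in abstract units (Block, first_fit and best_fit are hypothetical names). First fit takes the first block that is large enough; best fit scans for the smallest block that still fits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A free block of `size` units starting at offset `start`.
struct Block {
    std::size_t start;
    std::size_t size;
};

// Carve `request` units from block i, dropping the block if it is used up.
static long carve(std::vector<Block>& free_list, std::size_t i, std::size_t request) {
    long addr = static_cast<long>(free_list[i].start);
    free_list[i].start += request;
    free_list[i].size -= request;
    if (free_list[i].size == 0)
        free_list.erase(free_list.begin() + static_cast<std::ptrdiff_t>(i));
    return addr;
}

// First fit: scan in list order; returns the start offset or -1.
long first_fit(std::vector<Block>& free_list, std::size_t request) {
    for (std::size_t i = 0; i < free_list.size(); ++i)
        if (free_list[i].size >= request) return carve(free_list, i, request);
    return -1;
}

// Best fit: choose the smallest block that is still large enough.
long best_fit(std::vector<Block>& free_list, std::size_t request) {
    std::size_t best = free_list.size();   // "none found yet"
    for (std::size_t i = 0; i < free_list.size(); ++i) {
        if (free_list[i].size >= request &&
            (best == free_list.size() || free_list[i].size < free_list[best].size))
            best = i;
    }
    if (best == free_list.size()) return -1;
    return carve(free_list, best, request);
}
```

For example, starting from free blocks of 50, 10 and 30 units, first fit serves a 10-unit request from the 50-unit block, while best fit uses the exact 10-unit block. Afterwards the first-fit list holds 80 free units in total, yet a 60-unit request still fails because no single block is large enough. That failure is the fragmentation the notes describe.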
