Data Types and Data Structures

Ranga Rodrigo

Does the computer know about data types?

No.

Data Types

Computer programs manipulate data of various types, such as:

numbers, both integral and floating point; characters, based on the ASCII code; boolean values; and compound structures such as arrays and records.

In memory, however, all data is held as bit patterns, which must be interpreted before the data can be processed.

There is clearly a threat of insecurity here: if a bit pattern is interpreted wrongly, the program may crash or produce erroneous results.
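To make the point about interpretation concrete, here is a minimal Java sketch that reads one 32-bit pattern first as an integer and then as a floating point number. The class and method names are illustrative assumptions; Float.intBitsToFloat and Float.floatToIntBits are standard library calls that reinterpret bits without converting the value.

```java
class BitPatterns {
    // Reinterpret the 32 bits of an int as an IEEE 754 float (no numeric conversion).
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Reinterpret the 32 bits of a float as an int.
    static int asInt(float value) {
        return Float.floatToIntBits(value);
    }

    public static void main(String[] args) {
        int bits = 0x40490FDB;                     // one fixed 32-bit pattern
        System.out.println(bits);                  // the pattern read as an integer
        System.out.println(asFloat(bits));         // the same pattern read as a float
        System.out.println((char) (bits & 0xFF));  // its low byte read as a character code
    }
}
```

The three printed values have nothing in common except the underlying bits, which is why a wrong interpretation produces erroneous results rather than an obvious failure.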

Type Errors

Type errors arise when an operation defined for one type of data is applied to another, e.g., if you try to add an array to a string. In general, these errors are detected at run-time, if and when the run-time system tries to execute the erroneous operation.
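As an illustration of a type error that is only caught when the offending operation executes, the following Java sketch stores an Integer into an array that was created to hold Strings. The class and method names are assumptions made for the example; the behaviour relies on Java's run-time check on array stores.

```java
class RunTimeTypeError {
    // Tries to store an Integer into an array whose run-time type is String[].
    // Returns true if the run-time system rejected the operation.
    static boolean storeIntegerIntoStringArray() {
        Object[] slots = new String[1];       // legal: String[] can be viewed as Object[]
        try {
            slots[0] = Integer.valueOf(42);   // compiles, but the array only accepts Strings
            return false;
        } catch (ArrayStoreException e) {
            return true;                      // detected only when the statement executes
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected at run time: " + storeIntegerIntoStringArray());
    }
}
```

If the store sat on a rarely executed path, the error could go unnoticed for a long time.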

Untyped Languages

Perl is an example. In an untyped language it is the programmer's responsibility to avoid run-time type errors. Any variable can store data of any type, and it is up to the programmer to make sure that operations are only applied to data of the correct type. Interpreted languages are often untyped.

Untyped Languages

Variables do not have a type. Programmers have to keep track of what is stored where. Errors may only be caught at run-time, when it may be too late. Worse, data corruption may take place unnoticed.

Typed Languages

Typed languages try to use the compiler to detect type errors. The aim is to ensure that programs will not crash at run-time because of a type error. This is widely seen as a crucial aspect of language security.

Typed Languages

Variables and similar entities have defined types. Each type has a range of permissible operations defined for it. The compiler can therefore ensure that operations are only applied to data of the correct sort.

What is a type-secure language?

A type-secure language is one which would in no circumstances give rise to a run-time error related to types.

It is hard to guarantee this property.

Languages

Untyped: Perl

Typed, strongly: Pascal, Eiffel

Typed, weakly: C

Weak and Strong Typing

The distinction relates to the extent to which the compiler will silently convert data of one type to another related type, e.g., converting between numeric types, or converting integers to addresses in C. Weak typing has the potential to let through more errors than strong typing.

Type Conversion and Casting

Sometimes it is convenient or necessary to convert data at run time from one type to another. A common example is given by calculations which need to mix integer and floating point data. Here, some languages, including C++, Java and Eiffel, will carry out some data conversion automatically, e.g., changing an integer data item into the corresponding floating point value.

Conversions in Eiffel

Some conversions might involve a loss of information: for example, converting a floating point value like 3.14159 into an integer. C++ allows such conversions, whereas Java and Eiffel compilers report an error in this case. If the programmer wants the conversion to go ahead, an explicit function call can be inserted in Eiffel to specify exactly what conversion is required.

In C++ and Java

Casting can be used:

float f = 3.14159f ;
int i ;
i = (int) f ;   // explicit cast: the fractional part is discarded
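The difference between an automatic conversion and one that needs a cast can be sketched in Java as follows. The class and method names are illustrative assumptions; the commented-out line is the kind of assignment a Java compiler rejects.

```java
class Conversions {
    // Widening: int to double happens automatically, and no information is lost.
    static double widen(int i) {
        double d = i;
        return d;
    }

    // Narrowing: double to int needs an explicit cast, and the fraction is discarded.
    static int narrow(double d) {
        // int n = d;   // rejected by the Java compiler: possible loss of information
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(widen(3) / 2);     // mixed arithmetic carried out in floating point
        System.out.println(narrow(3.14159));  // value after the explicit cast
    }
}
```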

Dangers of Casting

There are dangers associated with the unrestricted use of casts. A cast provides a means for a programmer to override the type checks implemented by the compiler, and casting is a common source of programming errors. For this reason, languages intended to be secure, like Eiffel, do not support casting.
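Two ways in which a cast can go wrong are sketched below in Java, under the assumption that the programmer has inserted the casts without checking the data first. The class and method names are hypothetical. The first cast silently corrupts a value; the second satisfies the compiler but fails when executed.

```java
class CastDangers {
    // Narrowing cast: the high-order bits of the long are silently discarded.
    static int truncate(long big) {
        return (int) big;
    }

    // Downcast: accepted by the compiler, but checked (and possibly rejected) at run time.
    static boolean castToIntegerFails(Object o) {
        try {
            Integer n = (Integer) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(truncate(3_000_000_000L));         // no longer three billion
        System.out.println(castToIntegerFails("forty-two"));  // a String is not an Integer
    }
}
```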

Types

Value types: the actual data of interest (an integer or a boolean value, say) will be stored in the allocated memory.

Reference types: a memory address will be stored. The actual data will be placed at the memory location pointed to by the reference.

Reference Types

Reference types allow the same data to be referred to at different points in a program, and complex data structures to be constructed. Reference types allow data structures to be passed as parameters efficiently: instead of copying the whole structure, a reference is passed.

Value Types

With value types, there is no danger of accidental corruption of data through sharing. Value types avoid the overhead of having to de-reference an address before getting hold of the data to be processed.
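The contrast between sharing and copying can be sketched in Java, where int is a value type and every class defines a reference type. Counter is a hypothetical class introduced only for this example.

```java
class ValueVsReference {
    // A small mutable class; variables of this type hold references.
    static class Counter {
        int count;
    }

    // Copying a value type copies the data: the original is unaffected.
    static int originalAfterChangingCopy() {
        int a = 1;
        int b = a;
        b = 99;
        return a;
    }

    // Copying a reference shares the data: a change through one name is seen through the other.
    static int originalAfterChangingAlias() {
        Counter a = new Counter();
        a.count = 1;
        Counter b = a;
        b.count = 99;
        return a.count;
    }

    public static void main(String[] args) {
        System.out.println(originalAfterChangingCopy());
        System.out.println(originalAfterChangingAlias());
    }
}
```

The second method shows the accidental corruption through sharing that value types rule out.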

Types in Java

All classes in Java define reference types: this means that if Person is a class, then following a variable declaration such as

Person p ;

the variable p will only be able to hold references. Java also defines a range of primitive value types (e.g., int, char, boolean) which allow simple data to be manipulated efficiently.

Primitive Types in Java

The values of the primitive types lie outside the Java class hierarchy. So in one sense Java is not a pure object-oriented language. To get round this, Java defines classes that correspond to the built-in types, such as Integer and Boolean.

Autoboxing and Unboxing

It can sometimes be rather clumsy and confusing to convert data between value types and the corresponding reference types. To deal with this problem, later versions of Java have introduced autoboxing and unboxing: in effect, built-in conversions between the primitive types and the corresponding reference types.
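A short Java sketch of the difference: the first two lines of main do the conversion by hand, and the rest rely on autoboxing and unboxing. The class name and the sum helper are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

class Boxing {
    // The list holds Integer references; each element is unboxed to an int in the loop.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Integer explicit = Integer.valueOf(7);  // explicit conversion to the reference type
        int back = explicit.intValue();         // explicit conversion back to the value type

        Integer boxed = 7;                      // autoboxing: int to Integer
        int unboxed = boxed;                    // unboxing: Integer to int

        List<Integer> list = new ArrayList<>();
        list.add(3);                            // autoboxing on insertion
        list.add(4);
        System.out.println(sum(list) + back + unboxed);
    }
}
```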

In C++, the distinction between value and reference types on the one hand, and simple and class data on the other is orthogonal. In other words, the two concepts are quite independent of each other. It is possible to have references to ints in C++, and equally for an instance of a class to be stored as a simple value.

Reference Types in C++

In C++, reference types are defined explicitly.

Person p;         // a value
Person& pr = p;   // a reference (must be bound to an object when declared)
Person* pp = &p;  // a pointer, also requiring dereferencing

This gives great flexibility in the way that memory is managed, but is also a common source of programming errors. By contrast, in Java it is impossible to take the address of or obtain a reference to an int and there is no equivalent of the pointer manipulations possible in C++.

Types in Eiffel

In Eiffel, every type is a class, including basic types like INTEGER. There are no special primitive types as in Java, so in this sense Eiffel is more object-oriented than Java. This makes the language very consistent and conceptually simple.

Types in Eiffel

However, dereferencing an address just to calculate something like 3 + 4 would be inefficient, so Eiffel provides a mechanism of expanded types that enables data to be stored by value rather than by reference. In a way this is the opposite of C++:

in C++, all data is stored by value by default, and operators are provided to define reference types; in Eiffel, data is stored by reference by default, and an operator is defined enabling some data to be stored by value.

Expanded Types

If a class is defined as expanded, variables of that class hold data values, not references. The classes defining the basic types are all defined to be expanded:

expanded class INTEGER...

Expanded classes cannot be unexpanded, so it is not possible to define a variable which holds a reference to an INTEGER. For each expanded class, however, there is a corresponding reference class defined in the Base Library, e.g., INTEGER_REF. So Eiffel can provide a consistent type system, without a performance handicap on basic types. For practical purposes, the language works much as expected; it is rarely necessary to deal explicitly with expanded types.

Specifying EXPANDED Variables

It is also possible to specify that individual variables are expanded, in which case they will hold data values instead of references:

x : expanded COUNTER

In a case like this, the class must provide a creation procedure with no arguments, so that the variable can be correctly initialized.

User-Defined Types

Classic languages, like Pascal, defined a range of basic types and a number of user-defined types which enabled programmers to define more complex data structures based on the basic types. User-defined types included sets, subtypes, enumerated types, records and arrays. In OO languages, the class is the main vehicle for the definition of user-defined types, essentially replacing and extending record types.

Enumerations

This is a user-defined type consisting of a fixed number of values, normally given names and thought of as uninterpreted symbols. They are commonly implemented by assigning a unique integer value to each symbol. E.g., in C++ and Java 5.0:

enum Colour {red, yellow, green};

In C++, this defines Colour to be a value type. In Java 5.0, this is a form of class definition: attributes and methods can be added, and enum types include the functionality inherited from Object.
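The remark that a Java enum is a form of class definition can be sketched as follows. The seconds attribute and the next method are hypothetical additions for illustration, and the constants are written in upper case to follow the usual Java naming convention.

```java
class EnumDemo {
    enum Colour {
        RED(30), YELLOW(5), GREEN(25);

        private final int seconds;   // an attribute attached to each constant

        Colour(int seconds) {
            this.seconds = seconds;
        }

        int seconds() {
            return seconds;
        }

        // A method defined on the enum type: the constant that follows this one, cyclically.
        Colour next() {
            return values()[(ordinal() + 1) % values().length];
        }
    }

    public static void main(String[] args) {
        for (Colour c : Colour.values()) {
            System.out.println(c + " lasts " + c.seconds() + "s, then " + c.next());
        }
    }
}
```

The ordinal() call exposes the unique integer that the implementation assigns to each symbol.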

Enumerations in Eiffel

Eiffel does not support the declaration of enumeration types. A set of constant attributes to act as an enumeration can be defined:

red : INTEGER is unique ;
yellow : INTEGER is unique ;
green : INTEGER is unique ;

Unique attributes are guaranteed to have values that are different from that of any other unique attribute defined in the same class. They are typically used in inspect statements to discriminate the various possible cases.

Arrays

Since the beginning of programming, languages have included arrays to facilitate the handling of repeated data. Arrays are characterized by two basic properties:

The type of data contained in the array. The size of the array, specified either as a number of elements, or by giving array bounds, i.e., the lowest and highest permissible indices.

Arrays in C++

In C/C++, arrays are treated in most contexts as pointers, i.e., the address of the area in memory holding the array. Once an array has been passed to a function there is no notion of an array type, though the component type is given. Arrays are created by specifying the required length, but this length is not checked at run-time: it is the programmer's responsibility to keep track of the end of an array, e.g., by null-terminating strings. This is very insecure.

Arrays in Pascal

In Pascal, an array type is defined by the component type and the bounds, from which the length can be deduced. This was found to be very strict: for example, a sort routine written for arrays of one length will not work for arrays of another, even though the algorithm would work unchanged. Pascal got round this problem by defining a looser array type for parameters that specified only the component type of the array, and allowing the size of these arrays to be obtained at run-time.

Arrays in Java

In Java, arrays are quasi-objects, though there is no array class defined. In particular, you can find out the length of an array at run-time by reading something that looks very like a public attribute of an object, e.g., x.length.

Arrays in General

Languages have converged on defining array types simply in terms of the component type, and letting the size of an array object be determined at run-time. In fact, no extra type security is obtained by including array size in the type, as the compiler cannot check that array bounds will not be exceeded at run-time, so run-time errors cannot be eliminated.
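A minimal Java sketch of an array type defined by component type alone: the parameter type int[] says nothing about size, so one routine serves arrays of every length, and the size is obtained at run-time. The class and method names are illustrative.

```java
class ArrayLength {
    // The parameter type names only the component type; any length is accepted.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {   // length obtained at run time
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));
        System.out.println(sum(new int[10]));        // ten elements, all initially zero
    }
}
```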

Arrays in Eiffel

In Eiffel, arrays are defined by a class, like all types. The syntax is the same as other classes, with the addition of special notation for array literals:

x : ARRAY[INTEGER]

create x.make(1, 10)

x := << 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 >> -- assign constant array

x.put(42, 1) -- put 42 at position 1 in x

io.put_integer(x.item(1)) -- prints 42

io.put_integer(x @ 1) -- synonym for "item"

C++ Notation in Eiffel Arrays

In later versions of Eiffel, the same notation as C++ or Java can be used to store data in an array and to retrieve it:

x[1] := 42

io.put_integer(x[1])

Run-Time Violations of Bounds

Languages respond in different ways to a run-time violation of an array's bounds. C and C++ simply ignore it; more secure languages such as Java and Eiffel will raise a run-time exception if an out-of-bounds array access is attempted.
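Java's response to a bounds violation can be observed with a sketch like the one below. The class and method names are assumptions; ArrayIndexOutOfBoundsException is the exception Java raises for an invalid index.

```java
class BoundsCheck {
    // Returns true if reading values[index] raised a run-time exception.
    static boolean accessFails(int[] values, int index) {
        try {
            int ignored = values[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] values = new int[3];                   // valid indices are 0, 1 and 2
        System.out.println(accessFails(values, 2));
        System.out.println(accessFails(values, 3));
    }
}
```

The equivalent access in C or C++ is not checked, so the program would carry on with whatever happens to be stored next to the array.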

Arrays and Design by Contract

Consider the following partial definition of a class intended to record details of a football team's performances during a season.

class RESULTS

create
    make

feature

    points : ARRAY[INTEGER]
    played : INTEGER
    total : INTEGER

    make( games : INTEGER ) is
        do
            create points.make(1, games)
        end

    add_result( pts : INTEGER ) is
        do
            played := played + 1
            points.put( pts, played )
            total := total + pts
        end

end

Invariants

The invariant for this class should state, among other things, that all the values in the array points should be equal to 0, 1 or 3, the possible points that a team can be awarded for each game. Using conventional boolean expressions, the only way of specifying this would be to write something like

(points[1] = 0 or points[1] = 1 or points[1] = 3) and (points[2] = 0 ...) and ...

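Since this is impractical for an array of any size, the check can instead be written as a function that loops over the array, and the invariant then refers to that function:

valid_results : BOOLEAN is
    local
        i : INTEGER
    do
        Result := true
        from
            i := 1
        until
            not Result or else i > points.upper
        loop
            Result := Result and (points[i] = 0 or points[i] = 1 or points[i] = 3)
            i := i + 1
        end
    end

invariant
    valid_results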
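For comparison, a rough Java counterpart of the RESULTS class is sketched below. Java has no built-in class invariants, so this sketch assumes the invariant is checked by hand at the end of addResult; that check, the exception type and the method names are choices made for the example, not part of the Eiffel original. Java arrays are also indexed from 0 rather than 1.

```java
class Results {
    private final int[] points;   // points awarded per game, initially all zero
    private int played;
    private int total;

    Results(int games) {
        points = new int[games];
    }

    void addResult(int pts) {
        points[played] = pts;
        played++;
        total += pts;
        // Hand-written stand-in for the class invariant.
        if (!validResults()) {
            throw new IllegalStateException("invariant violated: points must be 0, 1 or 3");
        }
    }

    // Loop-based check that every recorded value is 0, 1 or 3.
    boolean validResults() {
        for (int p : points) {
            if (p != 0 && p != 1 && p != 3) {
                return false;
            }
        }
        return true;
    }

    int total() {
        return total;
    }
}
```

A call such as addResult(2) is rejected, because 2 is not a possible score for a game.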

for_all and there_exists

A better approach would be to find a way to mimic the quantifiers of logic, or in other words to extend the boolean expressions used in assertions so that it is possible to say things like "every element of the array is ..." or "at least one element of the array is ...". Eiffel provides this kind of facility by defining the features for_all and there_exists in the ARRAY class.

for_all and there_exists

Both for_all and there_exists apply a given Boolean-valued function to every element of an array.

for_all returns true if every element of the array satisfies the given function, and there_exists returns true if at least one does. The way in which the function is supplied to for_all and there_exists varies between different versions of EiffelStudio.

With EiffelStudio 5.6 or Later

valid_result( i : INTEGER ) : BOOLEAN is
    do
        Result := i = 0 or else i = 1 or else i = 3
    end

invariant
    valid_results: points.for_all( agent valid_result(?) )

Here the helper function valid_result tests a single value, and the loop code is provided by the for_all feature. The keyword agent creates a 'function reference', and the '?' indicates which parameter should be replaced by each array element.

With EiffelStudio 5.7 or Later

With EiffelStudio 5.7 or later it is possible to use the agent keyword to define an anonymous function. This means that the invariant can be written without defining a separate feature whose job is simply to check the value of an array element.

invariant
    valid_results: points.for_all(
        agent (i : INTEGER) : BOOLEAN
            do
                Result := i = 0 or else i = 1 or else i = 3
            end
    )
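The same quantifier idea exists in other languages. As a loose Java analogue (not Eiffel's own mechanism), the standard IntStream methods allMatch and anyMatch play the roles of for_all and there_exists, taking either a reference to a named helper or an anonymous function. The class and method names below are assumptions for the sketch.

```java
import java.util.stream.IntStream;

class Quantifiers {
    // Named helper that tests a single value.
    static boolean validResult(int p) {
        return p == 0 || p == 1 || p == 3;
    }

    // "For all" with a reference to the named helper.
    static boolean allValid(int[] points) {
        return IntStream.of(points).allMatch(Quantifiers::validResult);
    }

    // "For all" with an anonymous function, so no separate helper is needed.
    static boolean allValidInline(int[] points) {
        return IntStream.of(points).allMatch(p -> p == 0 || p == 1 || p == 3);
    }

    // "There exists": at least one game earned the full 3 points.
    static boolean anyWin(int[] points) {
        return IntStream.of(points).anyMatch(p -> p == 3);
    }

    public static void main(String[] args) {
        int[] points = {3, 1, 0, 3};
        System.out.println(allValid(points) + " " + allValidInline(points) + " " + anyWin(points));
    }
}
```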
