
Paper 3. Sec 3.1) Data Representation Computer Science 9608

with Majid Tahir


Syllabus Content:

3.1 Data representation

3.1.1 User-defined data types

• why user-defined types are necessary
• define and use non-composite types: enumerated, pointer
• define and use composite data types: set, record and class/object
• choose and design an appropriate user-defined data type for a given problem

3.1.2 File organisation and access

• file organisation: serial, sequential (using a key field) and random (using record key)
• file access: sequential access (serial & sequential files) and direct access (sequential & random files)
• select an appropriate method of file organisation and file access for a given problem

3.1.3 Real numbers and normalised floating-point representation

• describe the format of binary floating-point real numbers
• convert binary floating-point real numbers into denary and vice versa
• normalise floating-point numbers & show understanding of the reasons for normalisation
• effects of changing the allocation of bits to mantissa and exponent in a floating-point representation
• how underflow and overflow can occur
• consequences of a binary representation only being an approximation to the real number it represents (in certain cases)
• show understanding that binary representations can give rise to rounding errors

(Sec 3.1.1) User defined Data Type

You have already met a variety of built-in data types such as integers, strings and chars. Often these built-in data types are not enough and a programmer wants to build their own data types. Just as an integer is restricted to "a whole number from -2,147,483,648 through 2,147,483,647", user-defined data types have limits placed on their use by the programmer.

A user-defined data type is a feature in most high-level programming languages which allows a user (programmer) to define a data type according to his/her own requirements.

There are two categories of user-defined data types:

1. Non-composite data type

a. Enumerated data type

b. Pointer data type

2. Composite

a. Record data type

b. Set data type

c. Objects and classes

Non-composite user-defined data type:


An enumerated data type defines a list of possible values. The following pseudocode shows two examples of type definitions:

Pseudocode

Variables can then be declared and assigned values, for example:

It is important to note that the values of the enumerated type look like string values but they are not. They must not be enclosed in quote marks. The values defined in an enumerated data type are ordinal. This means that they have an implied order of values. This makes the second example much more useful because the ordering can be put to many uses in a program. For example, a comparison statement can be used with the values and variables of the enumerated data type:
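As a rough illustration of ordinal values, the sketch below uses Python's `IntEnum`. The type name `Day`, its members and the helper `is_weekend` are assumptions chosen for illustration; they are not taken from the pseudocode in these notes.

```python
from enum import IntEnum


class Day(IntEnum):
    """Hypothetical enumerated type; members are listed in their implied order."""
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


def is_weekend(day: Day) -> bool:
    # Ordinal values can be compared, so "after Friday" means the weekend.
    return day > Day.FRIDAY


start_day = Day.WEDNESDAY  # a value of the type, not the string "WEDNESDAY"
print(start_day.name, is_weekend(start_day))
print(Day.SATURDAY.name, is_weekend(Day.SATURDAY))
```

The comparison works only because the members carry an order; a plain list of strings would not give a meaningful "greater than".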

Enumerated data type in vb.net

Enum Example

When a number of constants are logically related to each other, you can define them together in an enumerator list. In VB.NET an enumerated type is declared using the Enum keyword.

Syntax:


Enum declaration

Enum Temperature
    Low
    Medium
    High
End Enum

An enumeration has a name, an underlying data type, and a set of members. Each member represents a constant. It is useful when you have a set of values that are functionally significant and fixed.

Retrieve and check the Enum value

Dim value As Temperature = Temperature.Medium
If value = Temperature.Medium Then
    Console.WriteLine("Temperature is Medium.")
End If


Pointer Data Type:

The pointer data type is unique among the FreeBasic numeric data types. Instead of containing data, like the other numeric types, a pointer contains the memory address of data.

For many beginning programmers, pointers seem like a strange and mysterious beast. However, one simple rule removes most of the difficulty: a pointer contains an address, not data.

The main non-composite, derived type is the pointer, a data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. It is a primitive kind of reference. (In everyday terms, a page number in a book could be considered a piece of data that refers to another one). Pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address would cause a program to crash. To ameliorate this potential problem, pointers are considered a separate type to the type of data they point to, even if the underlying representation is the same.

There are three basic steps to using pointers:

1. Declare a pointer variable.
2. Initialize the pointer to a memory address.
3. Dereference the pointer to manipulate the data at the pointed-to memory location.

This isn't really any different than using a standard variable, and you use pointers in much the same way as standard variables. The only real difference between the two is that in a standard variable, you can access the data directly, and with a pointer you must dereference the pointer to interact with the data.

A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on the indexed page.
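The declare, initialise and dereference steps can be sketched in Python with the standard `ctypes` module, which exposes C-style pointers. This is only an illustration under that assumption; the variable names are hypothetical, and everyday Python code rarely uses pointers directly.

```python
import ctypes

# 1. Declare: an integer variable that the pointer will refer to.
number = ctypes.c_int(42)

# 2. Initialise: the pointer now holds the memory address of `number`.
ptr = ctypes.pointer(number)
address = ctypes.addressof(number)

# 3. Dereference: follow the address to read or change the data stored there.
value_read = ptr.contents.value   # reads the data through the pointer
ptr.contents.value = 99           # writes through the pointer
print(hex(address), value_read, number.value)
```

After the write, `number.value` is 99 even though `number` was never assigned directly, which is the sense in which a pointer holds an address rather than data.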

Pseudocode for the definition of a pointer is illustrated by:

Declaration of a variable of pointer type does not require the caret symbol ^ to be used:


A special use of a pointer variable is to access the value stored at the address pointed to. The pointer variable is then said to be 'dereferenced'.

Composite user-defined data types

A composite user-defined data type has a definition with reference to at least one other type. Three examples are considered here.

Record data type

A record data type is the most useful and therefore most widely used. It allows the programmer to collect together values with different data types when these form a coherent whole.

As an example, a record could be used for a program using employee data. Pseudocode for defining the type could be:

An individual data item can then be accessed using a dot notation:

A particular use of a record is for the implementation of a data structure where one or possibly two of the variables defined are pointer variables.
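A minimal sketch of a record, assuming Python's `dataclass` as a stand-in for the pseudocode TYPE definition. The type name `EmployeeRecord` and all field names are hypothetical; the `next_record` field plays the role of the pointer variable mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmployeeRecord:
    """Hypothetical record type: fields of different data types grouped together."""
    name: str
    employee_number: int
    salary: float
    next_record: Optional["EmployeeRecord"] = None  # pointer-like field


second = EmployeeRecord("Bilal", 102, 41000.0)
first = EmployeeRecord("Asma", 101, 45000.0, next_record=second)

# Dot notation accesses an individual field of the record.
first.salary = first.salary + 1500.0

# Following the pointer-like field walks the linked structure.
names = []
node = first
while node is not None:
    names.append(node.name)
    node = node.next_record
print(first.salary, names)
```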


Records in VB:

Set data type:

A set data type allows a program to create sets and to apply the mathematical operations defined in set theory. The following is a representative list of the operations to be expected:

• union
• difference
• intersection
• include an element in the set
• exclude an element from the set
• check whether an element is in a set
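The listed operations map directly onto Python's built-in `set` type. The sketch below is only an illustration; the example sets are made up.

```python
vowels = {"a", "e", "i", "o", "u"}
word_letters = set("pointer")           # the distinct letters of the word

union = vowels | word_letters           # elements in either set
intersection = vowels & word_letters    # elements in both sets
difference = vowels - word_letters      # vowels not used in the word

word_letters.add("s")                   # include an element in the set
word_letters.discard("p")               # exclude an element from the set
has_e = "e" in word_letters             # check whether an element is in the set
print(sorted(intersection), sorted(difference), has_e)
```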


Objects and classes

In object-oriented programming, a program defines the classes to be used - they are all user-defined data types. Then for each class the objects must be defined. Chapter 27 (Section 27.03) has a full discussion of this subject.

Why are user-defined data types necessary?

When object-oriented programming is not being used a programmer may choose not to use any user-defined data types. However, for any reasonably large program it is likely that their use will make a program more understandable and less error-prone. Once the programmer has decided because of this advantage to use a data type that is not one of the built-in types then user-definition is inevitable. The use of, for instance, an integer variable is the same for any program. However, there cannot be a built-in record type because each different problem will need an individual definition of a record.

Serial Files:

Serial file organisation is the simplest file organisation method. In serial files, records are entered in the order of their creation. As such, the file is unordered, and is at best in chronological order. Serial files are primarily used as transaction files in which the transactions are recorded in the order that they occur.

Sequential Files:

The key difference between a sequential file and a serial file is that a sequential file is ordered in a logical sequence based on a key field. This key is usually the primary key, though secondary keys may be used as well. Sequential files are therefore files that are sorted based on some key values.

Sequential files are primarily used in applications where there is a high file hit rate. Hit rate is a measure of the proportion of the records that are accessed in a single run of the application. Therefore, sequential files are ideal for master files and batch processing applications such as payroll systems in which almost all records are processed in a single run of the application.

Random Files:

In random file organisation, records are stored in random order within the file. Although there is no sequencing to the placement of the records, there is a pre-defined relationship between the key of the record and its location within the file. In other words, the value of the record key is mapped by an established function to the address within the file where it resides. Therefore, any record within the file can be directly accessed through the mapping function in roughly the same amount of time. The location of the record within the file is therefore not a factor in the access time of the record. As such, random files are also known in some literature as direct access files.

To create and maintain a random file, a mapping function must be established between the record key and the address where the record is held. If M is the mapping function, then

M(value of record key) => address of record

Hashing

There are various mapping techniques. Some involve using the key field to map directly to the location within the file, while others refer to a lookup table for the location. However, the more common method is to employ a hash function to derive the address.

Hashing is the process of transforming the key value of a record to yield an address location where the record is stored. In some literature, it is also known as Key-to-Address Transformation, Address-Calculation, Scatter Storage, or Randomization.

A hash function generates the record address by performing some simple operations on the key or parts of the key. A good hashing function should:

• be quick to calculate
• cover the complete range of the address space
• give an even distribution
• not generate addresses that tend to cluster within a few locations, thus resulting in frequent collisions.
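One way to picture the mapping function M is the division (modulo) method, sketched below. The file size of 11 slots, the function names and the "try the next slot" rule for collisions are all assumptions made for illustration; the notes do not prescribe a particular hash function or collision strategy.

```python
FILE_SIZE = 11  # assumed number of record slots in the file


def hash_address(record_key: int) -> int:
    """Map a record key to a slot address with the division (modulo) method."""
    return record_key % FILE_SIZE


def store(slots: list, record_key: int, data: str) -> int:
    """Place a record at its hashed address, probing forward on a collision."""
    address = hash_address(record_key)
    for _ in range(FILE_SIZE):
        if slots[address] is None or slots[address][0] == record_key:
            slots[address] = (record_key, data)
            return address
        address = (address + 1) % FILE_SIZE  # collision: try the next slot
    raise OverflowError("file is full")


def fetch(slots: list, record_key: int):
    """Go straight to the hashed address, then probe until the key is found."""
    address = hash_address(record_key)
    for _ in range(FILE_SIZE):
        if slots[address] is None:
            return None
        if slots[address][0] == record_key:
            return slots[address][1]
        address = (address + 1) % FILE_SIZE
    return None


slots = [None] * FILE_SIZE
first = store(slots, 1234, "record A")   # 1234 % 11 == 2
second = store(slots, 1245, "record B")  # 1245 % 11 == 2 as well: a collision
print(first, second, fetch(slots, 1245))
```

The two keys hash to the same address, so the second record lands in the neighbouring slot. A hash function that produced many such clashes would lose the near-constant access time that makes random files attractive.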

Real numbers:

A real number is one with a fractional part. When we write down a value for a real number in the denary system we have a choice. We can use a simple representation or we can use an exponential notation (sometimes referred to as scientific notation). In this latter case we have options. For example, the number 25.3 might alternatively be written as:

$0.253 \times 10^{2}$ or $2.53 \times 10^{1}$ or $25.3 \times 10^{0}$ or $253 \times 10^{-1}$

For this number, the simple expression is best, but if a number is very large or very small the exponential notation is the only sensible choice.

Fixed-point representations:

One possibility for handling numbers with fractional parts is to add bits after the binary point: the first bit after the binary point is the halves place, the next bit the quarters place, the next bit the eighths place, and so on.

Place Values

Suppose that we want to represent $1.625_{10}$. We would want 1 in the ones place, leaving us with 0.625. Then we want 1 in the halves place, leaving us with 0.625 − 0.5 = 0.125. No quarters will fit, so put a 0 there. We want a 1 in the eighths place, and we subtract 0.125 from 0.125 to get 0.

So the binary representation of 1.625 would be $1.101_{2}$.

So how does a fixed-point representation store a fractional number in binary format? See the explanation below.


A number 40.125 is to be converted into binary. First the whole-number part, 40, is converted as a normal denary to binary conversion, which gives:

40 = 101000
40 written as a complete byte = 00101000

Now 0.125 has to be converted to binary. Multiply the fraction by 2 repeatedly. Each time, write down 0 if the result is less than 1, otherwise write down 1 and carry only the fractional part forward:

0.125 x 2 = 0.25    0
0.25 x 2 = 0.5      0
0.5 x 2 = 1         1

So the fractional number 40.125 becomes 00101000.001

Sign bit 0 (+ve)

Place value   128  64  32  16   8   4   2   1  .  ½   ¼  1/8
Bit             0   0   1   0   1   0   0   0  .  0   0   1
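The repeated-doubling method for the fractional part can be sketched as follows. The function names and the default of 8 whole-number bits and 3 fraction bits are assumptions chosen to match the worked example; the sketch handles non-negative values only.

```python
def fraction_to_binary(fraction: float, places: int) -> str:
    """Convert 0 <= fraction < 1 to binary digits by repeated doubling."""
    digits = []
    for _ in range(places):
        fraction *= 2
        if fraction >= 1:
            digits.append("1")   # result reached 1: record a 1, keep the remainder
            fraction -= 1
        else:
            digits.append("0")   # result still below 1: record a 0
    return "".join(digits)


def to_fixed_point(value: float, whole_bits: int = 8, places: int = 3) -> str:
    """Non-negative values only: whole part and fraction are converted separately."""
    whole = int(value)
    whole_part = format(whole, f"0{whole_bits}b")
    return whole_part + "." + fraction_to_binary(value - whole, places)


print(to_fixed_point(40.125))
```

A fraction that is not a sum of halves, quarters, eighths and so on (0.1, for example) never reaches zero under doubling, so it is cut off after `places` digits and stored only approximately.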

Negative Fixed Point Representation

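Suppose -52.625 has to be converted into binary. Start from the positive value:

52 = 110100
52 written as a complete byte = 00110100

Now multiply 0.625 by 2, e.g. 0.625 x 2 = 1.25, so we put 1:

0.625 x 2 = 1.25    1
0.25 x 2 = 0.5      0
0.5 x 2 = 1         1

So +52.625 = 00110100.101

One's Complement = 11001011.010 (by inverting every bit, including the fraction bits)
Two's Complement = 11001011.011 (by adding 1 to the least significant bit of the One's Complement)

So the fractional number -52.625 becomes 11001011.011

Check: 11001011 is -53 and .011 is +0.375, giving -53 + 0.375 = -52.625.

Sign bit 1 (-ve)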
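A sketch of two's complement fixed-point encoding and decoding, assuming 8 whole-number bits and 3 fraction bits; the function names are hypothetical. Because the negative weight of the sign bit applies to the whole pattern, the fraction bits of a negative number change as well: the sketch gives 11001011.011 for -52.625, whereas the pattern 11001100.101 decodes to -51.375.

```python
def to_twos_complement_fixed(value: float, whole_bits: int = 8, places: int = 3) -> str:
    """Encode value as a two's complement fixed-point bit pattern."""
    total_bits = whole_bits + places
    scaled = round(value * 2 ** places)       # value counted in units of 1/2**places
    limit = 1 << (total_bits - 1)
    if not -limit <= scaled < limit:
        raise OverflowError("value does not fit in the chosen number of bits")
    if scaled < 0:
        scaled += 1 << total_bits             # two's complement wrap-around
    bits = format(scaled, f"0{total_bits}b")
    return bits[:whole_bits] + "." + bits[whole_bits:]


def from_twos_complement_fixed(pattern: str) -> float:
    """Decode the pattern; the leftmost bit carries a negative place value."""
    whole, fraction = pattern.split(".")
    bits = whole + fraction
    scaled = int(bits, 2)
    if bits[0] == "1":
        scaled -= 1 << len(bits)
    return scaled / 2 ** len(fraction)


print(to_twos_complement_fixed(-52.625), from_twos_complement_fixed("11001011.011"))
```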


In the fixed-point representation option, an overall number of bits is chosen, with a defined number of bits for the whole number part and the remainder for the fractional part. Some important non-zero values in this representation are shown in the table below. (The bits are shown with a gap to indicate the implied position of the binary point.)

Floating-Point Number Representation

The alternative is a floating-point representation. The format for a floating-point number can be generalised as:

$\pm M \times R^{E}$

In this option a defined number of bits are used for what is called the significand or mantissa, ±M. The remaining bits are used for the exponent or exrad, E. The radix, R, is not stored in the representation; it has an implied value of 2.

Conversion from +ve Real Number to Binary Number:

A number 40.125 is to be converted into binary. First the whole-number part, 40, is converted as a normal denary to binary conversion, which gives:

40 = 101000
40 written as a complete byte = 00101000

Now 0.125 has to be converted to binary. Multiply the fraction by 2 repeatedly, e.g. 0.125 x 2 = 0.25, which is less than 1, so we put 0:

0.125 x 2 = 0.25    0
0.25 x 2 = 0.5      0
0.5 x 2 = 1         1

So the fractional number 40.125 becomes 00101000.001

Sign bit 0 (+ve)

The binary point itself cannot be stored in a binary number, so its position has to be recorded as a binary number too: the exponent.

0 0 1 0 1 0 0 0 . 0 0 1

The binary point moves 6 places to the left, to sit just before the first 1, giving 0.101000001, so the exponent = 6. Six in binary is 110, written 0110 so that the two's complement exponent keeps a positive sign bit.

So the mantissa is 0 101000001 and the exponent is 0110, and the number is stored as 0101000001 0110.

Conversion from -ve Real Number to Binary Number:

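Suppose -52.625 has to be converted into binary. Start from the positive value:

52 = 110100
52 written as a complete byte = 00110100

When we multiply 0.625 by 2, e.g. 0.625 x 2 = 1.25, we put 1:

0.625 x 2 = 1.25    1
0.25 x 2 = 0.5      0
0.5 x 2 = 1         1

So +52.625 = 00110100.101

One's Complement = 11001011.010 (by inverting 0s to 1s and 1s to 0s, including the fraction bits)
Two's Complement = 11001011.011 (by adding 1 to the least significant bit of the One's Complement)

So the fractional number -52.625 becomes 11001011.011

Sign bit 1 (-ve)

The binary point itself cannot be stored in a binary number, so its position has to be recorded as a binary number too: the exponent.

1 1 0 0 1 0 1 1 . 0 1 1

The binary point moves 7 places to the left, to sit just after the sign bit, giving 1.1001011011, so the exponent = 7. Seven in binary is 111, written 0111 so that the two's complement exponent keeps a positive sign bit.

So the mantissa is 1 1001011011 and the exponent is 0111, and the number is stored as 11001011011 0111.

This mantissa begins 11, so it is not yet normalised; shifting the bits one place to the left gives 1.001011011 with exponent 6 (0110).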


Conversion from Binary Number to +ve Real Number:

Example 1:

Q#1) In a computer system, real numbers are stored using floating-point representation with:

• 8 bits for the mantissa, followed by
• 8 bits for the exponent

Two's complement form is used for both mantissa and exponent.

A real number is stored as the following two bytes:

Mantissa           Exponent
0 0 1 0 1 0 0 0    0 0 0 0 0 0 1 1

Calculate the denary value of this number. Show your working. (9608/31/O/N/15)

Solution:

Exponent = 0 0 0 0 0 0 1 1 = 3 denary
Mantissa = 0 0 1 0 1 0 0 0 (the leftmost bit is the sign bit)

The binary point is actually after the sign bit, so the mantissa is 0.0101000

As the exponent is 3, the binary point moves three places to the right:

0.0101000 becomes 0010.1000

Place value   8   4   2   1  .  ½   ¼  1/8  1/16
Bit           0   0   1   0  .  1   0   0    0

= 2.5 (calculated as per place values)

Example 2:

Q#2) A floating-point binary representation uses 4 bits for the mantissa and 4 bits for the exponent. Convert 0110 0010.

Solution:

Exponent = 0 0 1 0 = 2 denary
Mantissa = 0 1 1 0 (the leftmost bit is the sign bit)

The binary point is actually after the sign bit, so the mantissa is 0.110

As the exponent is 2, the binary point moves two places to the right:

0.110 becomes 011.0

= +3.0 (calculated as per place values)


Conversion from Binary Number to -ve Real Number:

Example 1:

A floating-point binary representation uses 4 bits for the mantissa and 4 bits for the exponent. Convert 1001 0001.

Solution:

Exponent = 0 0 0 1 = 1 denary
Mantissa = 1 0 0 1 (the leftmost bit is the sign bit)

The binary point is actually after the sign bit, so the mantissa is 1.001

As the exponent is 1, the binary point moves one place to the right:

1.001 becomes 10.01

Place value   -2   1  .  ½   ¼
Bit            1   0  .  0   1

= -2 + ¼ (calculated as per place values; the sign bit carries the negative place value)
= -2 + 0.25 (-2 is negative and 0.25 is positive, so adding 0.25 to -2 gives)
= -1.75 (Answer)
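The three worked conversions can be checked with a short decoder. This is a sketch under the convention used in the examples (two's complement mantissa with the binary point after the sign bit, two's complement exponent); the function names are hypothetical.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)   # the leftmost bit has a negative weight
    return value


def decode_float(mantissa: str, exponent: str) -> float:
    """Return mantissa x 2**exponent, with the binary point after the sign bit."""
    fraction = twos_complement(mantissa) / (1 << (len(mantissa) - 1))
    return fraction * 2.0 ** twos_complement(exponent)


print(decode_float("00101000", "00000011"))  # first positive example
print(decode_float("0110", "0010"))          # second positive example
print(decode_float("1001", "0001"))          # negative example
```

Dividing the mantissa integer by a power of two is the same as placing the binary point after the sign bit, and multiplying by 2 to the power of the exponent is the same as moving that point to the right.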

A floating-point number is typically expressed in scientific notation, with a fraction (F) or mantissa (M), and an exponent (E) of a certain radix (R), in the form of

$+M \times R^{E}$ or $-M \times R^{E}$

Denary numbers use radix 10 ($M \times 10^{E}$), while binary numbers use radix 2 ($M \times 2^{E}$). The representation of a floating-point number is not unique. For example, the number 55.66 can be represented as $5.566 \times 10^{1}$, $0.5566 \times 10^{2}$, $0.05566 \times 10^{3}$, and so on. The fractional part can be normalized. In the normalized form, there is only a single non-zero digit before the radix point. For example, decimal number 123.4567 can be normalized as $1.234567 \times 10^{2}$; binary number 1010.1011 can be normalized as $1.0101011 \times 2^{3}$.

The IEEE floating-point formats use essentially the same method for storing floating-point binary numbers, so we will use the Short Real as an example in this tutorial. The bits in an IEEE Short Real are arranged as follows, with the most significant bit (MSB) on the left:

Fig. 1: sign (1 bit), exponent (8 bits), mantissa (23 bits)

Decimal Fraction    Factored As...       Binary Real
1/2                 1/2                  .1
1/4                 1/4                  .01
3/4                 1/2 + 1/4            .11
1/8                 1/8                  .001
7/8                 1/2 + 1/4 + 1/8      .111
3/8                 1/4 + 1/8            .011
1/16                1/16                 .0001
3/16                1/8 + 1/16           .0011
5/16                1/4 + 1/16           .0101

The Sign

The sign of a binary floating-point number is represented by a single bit. A 1 bit indicates a negative number, and a 0 bit indicates a positive number.

The Mantissa

It is useful to consider the way decimal floating-point numbers represent their mantissa. Using $-3.154 \times 10^{5}$ as an example, the sign is negative, the mantissa is 3.154, and the exponent is 5. The fractional portion of the mantissa is the sum of each digit multiplied by a power of 10:

.154 = 1/10 + 5/100 + 4/1000

A binary floating-point number is similar. For example, in the number $+11.1011 \times 2^{3}$, the sign is positive, the mantissa is 11.1011, and the exponent is 3. The fractional portion of the mantissa is the sum of successive powers of 2. In our example, it is expressed as:

.1011 = 1/2 + 0/4 + 1/8 + 1/16

Or, you can calculate this value as 1011 divided by $2^{4}$. In decimal terms, this is eleven divided by sixteen, or 0.6875. Combined with the left-hand side of 11.1011, the decimal value of the mantissa is 3.6875. Here are additional examples:

Binary Floating-Point          Base 10 Fraction    Base 10 Decimal
11.11                          3 3/4               3.75
0.00000000000000000000001      1/8388608           0.00000011920928955078125

The last entry in this table shows the smallest fraction that can be stored in a 23-bit mantissa. The following table shows a few simple examples of binary floating-point numbers alongside their equivalent decimal fractions and decimal values:

Binary    Decimal Fraction    Decimal Value
.1        1/2                 .5
.01       1/4                 .25
.001      1/8                 .125
.0001     1/16                .0625
.00001    1/32                .03125

The Exponent


IEEE Short Real exponents are stored as 8-bit unsigned integers with a bias of 127. Let's use the number $1.101 \times 2^{5}$ as an example. The exponent (5) is added to 127 and the sum (132) is binary 10000100. Here are some examples of exponents, first shown in decimal, then adjusted, and finally in unsigned binary:

Exponent (E)    Adjusted (E + 127)    Binary
+5              132                   10000100
0               127                   01111111
-10             117                   01110101
+128            255                   11111111
-127            0                     00000000
-1              126                   01111110

The stored binary exponent is unsigned, and therefore cannot be negative. The largest possible exponent is 128: when added to 127, it produces 255, the largest unsigned value represented by 8 bits. The approximate range is from $1.0 \times 2^{-127}$ to $1.0 \times 2^{+128}$.
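The bias arithmetic can be sketched in a few lines, and Python's standard `struct` module can be used to look at the bits of a real 32-bit float for comparison. The function names are hypothetical; the example value 52.0 is chosen because it equals 1.101 (binary) times 2 to the power 5.

```python
import struct

BIAS = 127


def biased_exponent_bits(exponent: int) -> str:
    """Add the bias and show the result as an 8-bit unsigned binary string."""
    return format(exponent + BIAS, "08b")


def short_real_fields(value: float):
    """Split the 32-bit IEEE pattern of value into sign, exponent and mantissa bits."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


print(biased_exponent_bits(5))
print(short_real_fields(52.0))
```

The exponent field extracted from 52.0 matches the biased value worked out by hand for an exponent of 5.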

Normalizing the Mantissa

Before a floating-point binary number can be stored correctly, its mantissa must be normalized. The process is basically the same as when normalizing a floating-point decimal number. For example, decimal 1234.567 is normalized as $1.234567 \times 10^{3}$ by moving the decimal point so that only one digit appears before the decimal point. The exponent expresses the number of positions the decimal point was moved left (positive exponent) or moved right (negative exponent).

Similarly, the floating-point binary value 1101.101 is normalized as $1.101101 \times 2^{3}$ by moving the binary point 3 positions to the left, and multiplying by $2^{3}$. Here are some examples of normalizations:

Binary Value    Normalized As    Exponent
1101.101        1.101101         3
.00101          1.01             -3
1.0001          1.0001           0
10000011.0      1.0000011        7

You may have noticed that in a normalized mantissa, the digit 1 always appears to the left of the binary point. In fact, the leading 1 is omitted from the mantissa in the IEEE storage format because it is redundant.
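The normalization table can be reproduced with a small sketch built on `math.frexp` from the standard library. The function name `normalise` and the text-parsing loop are assumptions for illustration, and the sketch handles positive values only.

```python
import math


def normalise(value: float):
    """Return (mantissa, exponent) with 1 <= mantissa < 2 and value == mantissa * 2**exponent."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    fraction, exponent = math.frexp(value)  # value == fraction * 2**exponent, 0.5 <= fraction < 1
    return fraction * 2, exponent - 1       # shift so exactly one 1 sits before the point


for binary_text in ("1101.101", ".00101", "1.0001", "10000011.0"):
    whole, _, frac = binary_text.partition(".")
    value = int(whole or "0", 2) + sum(int(b) / 2 ** i for i, b in enumerate(frac, start=1))
    print(binary_text, normalise(value))
```

The mantissas are printed in denary: 1.703125 is 1.101101 in binary and 1.25 is 1.01 in binary, matching the first two rows of the table.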

A floating-point number (or real number) can represent a very large ($1.23 \times 10^{88}$) or a very small ($1.23 \times 10^{-88}$) value. It could also represent a very large negative number ($-1.23 \times 10^{88}$) and a very small negative number ($-1.23 \times 10^{-88}$), as well as zero.

References

Book: A-level Computing

https://www.tutorialspoint.com/vb.net/vb.net_constants.htm