
Data Structures and Algorithms

Course slides: Introduction, basic data types

www.mif.vu.lt/~algis

Motto

•  Data Structures and Algorithms

•  The key to your professional reputation

•  Changing to a better algorithm can improve the performance of a program far more dramatically than

   •  hacking

   •  converting to assembler

•  Buy a text for long-term reference!

•  Professional software engineers have algorithms text(s) on their shelves

•  Hackers have user manuals for the latest software package

Content

•  1. Introduction, computing model, von Neumann principles, data, abstract data types, data structures, basic data types

•  2. Sorting, internal sorting, quicksort

•  3. Merge sort, von Neumann sorting, external sorting

•  4. Abstract data types, stack, queue, programming of stack and queue

•  5. Heap, priority queue, priority queue by heap structure, lists, list programming, dynamic sets ADT

•  6. Hierarchical structures, binary search trees, tree allocation in memory

•  7. AVL trees, 2-3-4 trees, red-black trees

Content

•  8. B-trees and other similar trees, Huffman algorithm for data compression

•  9. Hashing idea, hashing functions and tables, hashing procedures and algorithms, extendable hashing

•  10. Radix search algorithms, radix trees, radix algorithms, radix search

•  11. Patricia trees, suffix tree

•  12. Text search

•  13. Analysis of algorithms

Introduction

•  Informally, an algorithm is a well-defined computational procedure that takes some value (or set of values) as input and produces some value (or set of values) as output.

•  An algorithm is thus a sequence of computational steps that transform the input into the output.

•  An algorithm is also viewed as a tool for solving a well-specified problem that involves computers.

•  There are many points of view on algorithms. A good example is the famous Euclid’s algorithm:

•  for two integers x, y, calculate the greatest common divisor gcd (x, y). A direct implementation of the algorithm looks like this:

Introduction

program euclid (input, output);
var x, y: integer;

function gcd (u, v: integer): integer;
var t: integer;
begin
  repeat
    if u < v then
      begin t := u; u := v; v := t end;
    u := u - v;
  until u = 0;
  gcd := v
end;

begin
  while not eof do
    begin
      readln (x, y);
      if (x > 0) and (y > 0) then writeln (x, y, gcd (x, y))
    end;
end.
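
For comparison, the same subtraction-based procedure can be sketched in C, the language used for the data types later in these slides. The function names `gcd` and `gcd_mod` are chosen here for illustration, and the remainder-based variant is an addition, not part of the original slide. Both assume strictly positive arguments, as the Pascal main program does.

```c
/* Greatest common divisor by repeated subtraction, mirroring the
   Pascal function above. Assumes u > 0 and v > 0. */
int gcd(int u, int v)
{
    do {
        if (u < v) {          /* keep the larger value in u */
            int t = u;
            u = v;
            v = t;
        }
        u = u - v;
    } while (u != 0);
    return v;
}

/* Variant using the remainder operator: it needs far fewer iterations
   when one argument is much larger than the other. */
int gcd_mod(int u, int v)
{
    while (v != 0) {
        int t = u % v;
        u = v;
        v = t;
    }
    return u;
}
```

Calling `gcd(12, 18)` or `gcd_mod(12, 18)` yields 6. Both versions still depend on `int`, so they share the limitation on number size discussed on the next slide.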

Introduction

•  This algorithm has some special features:

   •  it is applicable only to numbers;

   •  it has to be changed whenever something in the environment changes, for example when the numbers are very long and do not fit into a variable (numbers like 1000!).

•  Algorithms for the applications in the focus of this course, such as databases and information systems, are usually understood in a slightly different way. Such an algorithm:

   •  is repeated many times;

   •  in various circumstances;

   •  with different types of data.

Introduction

•  Written out in decimal, 1000! has 2568 digits and ends in 249 zeros, far more than fits into any built-in integer variable.
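
One possible way to handle a number of this size is to store its decimal digits in an array and multiply digit by digit, as in school arithmetic. The sketch below is only an illustration: the name `factorial_digits` and the bound `MAX_DIGITS` are assumptions made here, not part of the course material.

```c
#include <stddef.h>

#define MAX_DIGITS 3000   /* assumption: large enough for the 2568 digits of 1000! */

/* Computes n! as an array of decimal digits, least significant digit
   first. Returns the number of digits, or 0 if MAX_DIGITS is too small. */
size_t factorial_digits(int n, unsigned char digits[MAX_DIGITS])
{
    size_t len = 1;
    digits[0] = 1;                        /* 0! = 1! = 1 */
    for (int k = 2; k <= n; k++) {
        unsigned long carry = 0;
        for (size_t i = 0; i < len; i++) {
            unsigned long v = (unsigned long)digits[i] * (unsigned long)k + carry;
            digits[i] = (unsigned char)(v % 10);
            carry = v / 10;
        }
        while (carry != 0) {              /* append the remaining carry digits */
            if (len == MAX_DIGITS)
                return 0;
            digits[len++] = (unsigned char)(carry % 10);
            carry /= 10;
        }
    }
    return len;
}
```

Because the digits live in an array, the size of the result is limited by the array length rather than by the width of a machine word.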

Von Neumann computing model

1943: ENIAC

•  Presper Eckert and John Mauchly -- first general electronic computer. Hard-wired program -- settings of dials and switches.

•  or John V. Atanasoff in 1939

1944: Beginnings of EDVAC

•  among other improvements, includes program stored in memory

1945: John von Neumann wrote a report on the stored-program concept, known as the First Draft of a Report on the EDVAC, which describes the “von Neumann machine” (or model):

•  a memory, containing instructions and data

•  a processing unit, for performing arithmetic and logical operations

•  a control unit, for interpreting instructions


[Figure: von Neumann machine with control, computation, registers, RAM and PC; example values 3, 5, 8]

Memory Types: RAM

•  RAM is typically volatile memory (meaning it doesn’t retain its stored values once power is removed)

•  RAM is an array of cells, each with a unique address

•  A cell is the minimum unit of access. Originally, this was 8 bits taken together as a byte. Some architectures instead use word-sized cells (for example, 16-bit words).

•  RAM gets its name from its access performance: in theory, it takes the same amount of time to access any memory cell, regardless of its location within the memory bank (“random” access).

Memory

University of Texas at Austin CS310H - Computer Organization Spring 2010 Don Fussell

2^k x m array of stored bits

Address: unique (k-bit) identifier of location

Contents: m-bit value stored in location

Basic Operations:

LOAD: read a value from a memory location

STORE: write a value to a memory location

[Figure: memory cells at 4-bit addresses 0000 to 1111; two of the cells hold the 8-bit values 00101101 and 10100010]

Interface to memory



How does processing unit get data to/from memory?

MAR: Memory Address Register

MDR: Memory Data Register

To LOAD a location (A):

1. Write the address (A) into the MAR.

2. Send a “read” signal to the memory.

3. Read the data from MDR.

To STORE a value (X) to a location (A):

1. Write the data (X) to the MDR.

2. Write the address (A) into the MAR.

3. Send a “write” signal to the memory.
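
The LOAD and STORE steps above can be sketched as a small C model. Everything here is a hypothetical illustration: the struct layout, the function names, and the choice of k = 4 address bits and m = 8 data bits are assumptions, not a description of any real machine.

```c
#include <stdint.h>

#define ADDR_BITS 4                        /* k: assumed address width */
#define MEM_SIZE  (1u << ADDR_BITS)        /* 2^k locations */

/* Hypothetical model: 2^k cells of m = 8 bits each, reachable only
   through the MAR and MDR registers. */
struct memory {
    uint8_t cells[MEM_SIZE];
    uint8_t mar;                           /* Memory Address Register */
    uint8_t mdr;                           /* Memory Data Register */
};

/* "read" signal: memory copies the addressed cell into the MDR */
static void read_signal(struct memory *m)
{
    m->mdr = m->cells[m->mar % MEM_SIZE];
}

/* "write" signal: memory copies the MDR into the addressed cell */
static void write_signal(struct memory *m)
{
    m->cells[m->mar % MEM_SIZE] = m->mdr;
}

/* LOAD: address into MAR, read signal, data out of MDR */
uint8_t load(struct memory *m, uint8_t a)
{
    m->mar = a;
    read_signal(m);
    return m->mdr;
}

/* STORE: data into MDR, address into MAR, write signal */
void store(struct memory *m, uint8_t a, uint8_t x)
{
    m->mdr = x;
    m->mar = a;
    write_signal(m);
}
```

After `store(&m, 0x3, 0x2D)`, a later `load(&m, 0x3)` returns `0x2D`; the processing unit never touches `cells` directly.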

[Figure: MEMORY connected to the processing unit through the MAR and MDR]

Processing unit



Functional Units

•  ALU = Arithmetic and Logic Unit

•  could have many functional units, some of them special-purpose (multiply, square root, …)

•  LC-3 performs ADD, AND, NOT

Registers

•  Small, temporary storage

•  Operands and results of functional units

•  LC-3 has eight registers (R0, …, R7), each 16 bits wide

Word Size

•  number of bits normally processed by ALU in one instruction

•  also width of registers

•  LC-3 is 16 bits

[Figure: PROCESSING UNIT containing the ALU and TEMP]

The ALU

•  The third component in the von Neumann architecture is called the Arithmetic Logic Unit.

•  This is the subcomponent that performs the arithmetic and logic operations for which we have been building parts.

•  The ALU is the “brain” of the computer.

Instruction processing



1. Fetch instruction from memory

2. Decode instruction

3. Evaluate address

4. Fetch operands from memory

5. Execute operation

6. Store result
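
To make the cycle concrete, here is a minimal sketch of a toy machine in C with eight 16-bit registers and the three operations ADD, AND, NOT. The opcode numbers and the instruction layout are made up for this illustration and are not the real LC-3 encoding. Operands come from registers only, so the "evaluate address" step has nothing to do here.

```c
#include <stdint.h>
#include <stddef.h>

enum { OP_HALT = 0, OP_ADD = 1, OP_AND = 2, OP_NOT = 3 };  /* made-up opcodes */

struct cpu {
    uint16_t reg[8];   /* R0..R7, 16 bits each */
    uint16_t pc;       /* index of the next instruction */
};

/* Packs a made-up format: opcode(4) dst(3) src1(3) src2(3), low 3 bits unused. */
uint16_t encode(unsigned op, unsigned dst, unsigned s1, unsigned s2)
{
    return (uint16_t)((op << 12) | (dst << 9) | (s1 << 6) | (s2 << 3));
}

void run(struct cpu *c, const uint16_t *program, size_t n)
{
    c->pc = 0;
    while (c->pc < n) {
        uint16_t ir = program[c->pc++];          /* fetch instruction */
        unsigned op  = ir >> 12;                 /* decode */
        unsigned dst = (ir >> 9) & 7u;
        unsigned s1  = (ir >> 6) & 7u;
        unsigned s2  = (ir >> 3) & 7u;
        uint16_t a = c->reg[s1];                 /* fetch operands (registers only) */
        uint16_t b = c->reg[s2];
        uint16_t result;
        switch (op) {                            /* execute operation */
        case OP_ADD: result = (uint16_t)(a + b); break;
        case OP_AND: result = a & b;             break;
        case OP_NOT: result = (uint16_t)~a;      break;
        default:     return;                     /* OP_HALT or unknown: stop */
        }
        c->reg[dst] = result;                    /* store result */
    }
}
```

With R1 = 3 and R2 = 5, the one-instruction program `encode(OP_ADD, 0, 1, 2)` leaves 8 in R0.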

Boolean data

Data values: {false, true}

In C/C++: false = 0, true = 1 (or nonzero)

Could store 1 value per bit, but usually use a byte (or word)

Operations: and &&, or ||, not !

&&   0  1
0    0  0
1    0  1

||   0  1
0    0  1
1    1  1

x   !x
0    1
1    0

Character Data

Store numeric codes (ASCII, EBCDIC, Unicode): 1 byte for ASCII and EBCDIC, 2 bytes for Unicode (UTF-16 code units)

Basic operation: comparison of chars to determine if ==, <, >, etc. uses their numeric codes (i.e. uses their ordinal values)
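
A small C sketch of comparison by numeric code follows. The helper name `compare_chars` is chosen here for illustration; the result depends only on the character codes of the execution character set (ASCII on most current systems, which is an assumption about the platform).

```c
/* Returns a negative value, zero, or a positive value according to
   whether the code of a is less than, equal to, or greater than the
   code of b. Codes are compared as unsigned char values. */
int compare_chars(char a, char b)
{
    return (int)(unsigned char)a - (int)(unsigned char)b;
}
```

Since letters of the same case have increasing codes in both ASCII and EBCDIC, `compare_chars('a', 'b')` is negative and `compare_chars('x', 'x')` is zero.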

Integer Data

Nonnegative (unsigned) integer: type unsigned (and variations) in C++

Store its base-two representation in a fixed number w of bits (e.g., w = 16 or w = 32)

88 = 0000000001011000₂

Signed integer: type int (and variations) in C++

Store in a fixed number w of bits using one of the following representations:

Sign-magnitude representation

Save one bit (usually most significant) for sign (0 = +, 1 = –)

Use base-two representation in the other bits.

 88 → 0000000001011000 (sign bit 0)

–88 → 1000000001011000 (sign bit 1)

1. Cumbersome for arithmetic computations

2. Two 0’s in this scheme: both 0 and –0

3. For a negative number, incrementing the bit pattern by one subtracts one from the value instead of adding one!

Two's complement representation

For non-negative n: use ordinary base-two representation with leading (sign) bit 0 (same as sign-magnitude).

For negative n (–n): (1) Find w-bit base-2 representation of n (2) Complement each bit (3) Add 1

Same as subtracting the number from 0!

Example: –88

1. 88 as a 16-bit base-two number: 0000000001011000

2. Complement this bit string:     1111111110100111

3. Add 1:                          1111111110101000

(see p. 38)

Good for arithmetic computation. These work for both + and – integers:

5 + 7:
     0000000000000101
   + 0000000000000111
   = 0000000000001100   (carry bits: 111)

5 + –6:
     0000000000000101
   + 1111111111111010
   = 1111111111111111   (= –1)

Bit addition and multiplication tables:

+   0  1
0   0  1
1   1  10

x   0  1
0   0  0
1   0  1
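
The complement-and-add-one rule can be checked with a few lines of C using a fixed width of w = 16 bits. The function names `negate16` and `add16` are chosen here for illustration.

```c
#include <stdint.h>

/* Two's complement negation in w = 16 bits:
   complement each bit, then add 1. */
uint16_t negate16(uint16_t n)
{
    return (uint16_t)(~(unsigned)n + 1u);
}

/* 16-bit addition; a carry out of the top bit is discarded. */
uint16_t add16(uint16_t a, uint16_t b)
{
    return (uint16_t)((unsigned)a + (unsigned)b);
}
```

`negate16(88)` gives the bit pattern 1111111110101000 (0xFFA8), and `add16(5, negate16(6))` gives 1111111111111111 (0xFFFF), the pattern for –1. The same adder serves both positive and negative operands.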

Problems with Integer Representation

•  Limited capacity: a finite number of bits

•  Overflow and underflow:

   •  Overflow: addition or multiplication can exceed the largest value permitted by the storage scheme

   •  Underflow: subtraction or multiplication can fall below the smallest value permitted by the storage scheme

•  Not a perfect representation of (mathematical) integers

   •  can only store a finite (sub)range of them
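
In C, signed overflow is undefined behaviour, so a program has to test for it before adding rather than after. The following is a minimal sketch; the function name `add_overflows` is an assumption made for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* True if a + b would not fit in an int. The check is done before
   adding, because signed overflow is undefined behaviour in C. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* sum would exceed the largest value */
    if (b < 0)
        return a < INT_MIN - b;   /* sum would fall below the smallest value */
    return false;
}
```

`add_overflows(INT_MAX, 1)` and `add_overflows(INT_MIN, -1)` are both true, while `add_overflows(1, 2)` is false.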

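How is Real Data represented?

Types float and double (and variations) in C++

Single precision (IEEE Floating-Point Format)

1. Write binary representation in floating-point form: b1.b2b3 . . . × 2^k, with each bi a bit and b1 = 1 (unless the number is 0). The bits b1.b2b3 . . . are the mantissa (or fractional part), 2 is the base, and k is the exponent.

2. Store the sign in 1 bit, the exponent + 127 (the bias) in 8 bits, and the mantissa bits after the leading 1 in 23 bits.

Example: 22.625 = 10110.101₂ (see p. 41)

Floating point form: 1.0110101₂ × 2^4

Stored exponent: 4 + 127 = 131

double: Exp: 11 bits, bias 1023; Mant: 52 bits

p. 756

Round-off errors: many real values (for example 0.1) have no exact finite binary representation, so the stored value is only an approximation.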
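
The three fields of a single-precision value can be inspected with a short C sketch. It assumes that `float` is a 32-bit IEEE 754 single-precision type, which holds on most current systems but is not guaranteed by the C standard; the struct and function names are chosen for this example.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, stored with bias 127 */
    uint32_t mantissa;   /* 23 bits after the implicit leading 1 */
};

/* Assumes float is a 32-bit IEEE 754 single-precision value. */
struct float_fields split_float(float x)
{
    uint32_t bits;
    struct float_fields f;
    memcpy(&bits, &x, sizeof bits);      /* copy the raw bit pattern */
    f.sign     = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For 22.625 this gives sign 0, stored exponent 131 (that is, 4 + 127), and mantissa bits 0110101 followed by sixteen zeros (0x350000).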

Basic data types for C

C programming language:

char - smallest addressable unit of the machine that can contain the basic character set. It is an integer type; the actual type can be either signed or unsigned depending on the implementation.

signed char - same size as char, but guaranteed to be signed.

unsigned char - same size as char, but guaranteed to be unsigned.

short, short int, signed short, signed short int - short signed integer type, at least 16 bits in size.

int, signed, signed int - basic signed integer type, at least 16 bits in size.

unsigned, unsigned int - same as int, but unsigned.

long, long int, signed long, signed long int - long signed integer type, at least 32 bits in size.

unsigned long, unsigned long int - same as long, but unsigned.

long long, long long int, signed long long, signed long long int - long long signed integer type, at least 64 bits in size.

unsigned long long, unsigned long long int - same as long long, but unsigned.

float - single precision floating-point type.

double - double precision floating-point type; actual properties unspecified (except minimum limits), however on most systems this is the IEEE 754 double-precision binary floating-point format.

long double - extended precision floating-point type, actual properties unspecified.

Boolean type - _Bool (also available as bool through <stdbool.h>), since C99.
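
The actual sizes on a given implementation can be printed with `sizeof` and `CHAR_BIT`. This is a small sketch; the macro `BITS_OF` and the function `print_type_sizes` are names invented for this example, and the printed numbers differ between platforms.

```c
#include <limits.h>
#include <stdio.h>

/* Storage size of a type in bits on the current implementation. */
#define BITS_OF(type) ((unsigned)(sizeof(type) * CHAR_BIT))

void print_type_sizes(void)
{
    printf("char:        %u bits\n", BITS_OF(char));
    printf("short:       %u bits\n", BITS_OF(short));
    printf("int:         %u bits\n", BITS_OF(int));
    printf("long:        %u bits\n", BITS_OF(long));
    printf("long long:   %u bits\n", BITS_OF(long long));
    printf("float:       %u bits\n", BITS_OF(float));
    printf("double:      %u bits\n", BITS_OF(double));
    printf("long double: %u bits\n", BITS_OF(long double));
}
```

Only the minimum sizes listed above are guaranteed, so portable code should not assume, for example, that `long` is exactly 32 bits.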

Structures:

•  struct birthday { char name[20]; int day; int month; int year; };


Array - array of N elements of type T

•  int cat[10]; // array of 10 elements, each of type int

•  int bob[]; // array of an unspecified number of 'int' elements.

•  int a[10][8]; // array of 10 elements, each of type 'array of 8 int elements'

•  float f[][32]; // array of an unspecified number of 'array of 32 float elements'

Pointer - char *square; long *circle;

Unions - union types are special structures which allow access to the same memory using different type descriptions
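
A brief C sketch shows a structure filled through a pointer and a union giving two views of the same memory. The union `word_bytes` and the helper `set_birthday` are hypothetical names introduced for this illustration; only `struct birthday` comes from the slide.

```c
#include <stdint.h>
#include <string.h>

struct birthday {
    char name[20];
    int day;
    int month;
    int year;
};

/* The same four bytes seen either as one 32-bit word
   or as four individual bytes. */
union word_bytes {
    uint32_t word;
    unsigned char bytes[4];
};

/* Fills a birthday record through a pointer; the name is
   truncated if it does not fit into the 20-char array. */
void set_birthday(struct birthday *b, const char *name,
                  int day, int month, int year)
{
    strncpy(b->name, name, sizeof b->name - 1);
    b->name[sizeof b->name - 1] = '\0';
    b->day = day;
    b->month = month;
    b->year = year;
}
```

Setting all four `bytes` of a `union word_bytes` to 0xFF makes `word` read as 0xFFFFFFFF, because both members occupy the same storage. Which byte corresponds to which part of the word depends on the byte order of the machine.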
