cs 315 data structures fall 2011 instructor: b. (ravi) ravikumar office: 116 i darwin...

34
CS 315 Data Structures Fall 2011 Instructor: B. (Ravi) Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: [email protected] Course Web site: http://ravi.cs.sonoma.edu/cs315fa11

Upload: hollye

Post on 25-Feb-2016

37 views

Category:

Documents


1 download

DESCRIPTION

CS 315 Data Structures Fall 2011 Instructor: B. (Ravi) Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: [email protected] Course Web site: http://ravi.cs.sonoma.edu/cs315fa11. Textbook:. Data Structures and Algorithm Analysis in C++. 3 rd edition - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

CS 315 Data Structures Fall 2011

Instructor:

B. (Ravi) Ravikumar Office: 116 I Darwin Hall

Phone: 664 3335E-mail: [email protected]

Course Web site: http://ravi.cs.sonoma.edu/cs315fa11

Page 2: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Textbook:

Data Structures and Algorithm Analysis in C++. 3rd edition by Mark Allen Weiss

Course Schedule

Lecture: M W 10:45 – 12, Salazar 2024

Lab: F 9 to 11:50 AM, Darwin 28

Page 3: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Data Structures – what is the main focus?

• (more) programming• larger problems (than what was studied in CS 215)• new data types (images, audio, text)• (systematic) algorithm design

• performance issues• comparison of algorithms• time, storage requirements

• applications• image processing• compression• web search• board games

Page 4: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Data Structure – what is the main focus?

How to organize data in memory so that the we can solve the problem most efficiently?

Hard disk RAM

CPU

Page 5: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Data Structure – what is the main focus?

Topics of concern:

• software design, problem solving and applications

• online vs. offline phase of a solution

• preprocessing vs. updating

• trade-offs between operations

• efficiency

Page 6: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Course Goals • Learn to use fundamental data structures:

• arrays, linked lists, stacks and queues• hash table• priority queue• binary search tree etc.

• Improve programming skill• recursion, classes, algorithm design, implementation• build projects using different data structures

• Analytical and experimental analysis• quantitative reasoning about the performance of algorithms (time, storage, etc.)• comparing different data structures

Page 7: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Course Goals • Applications:

• image storage, manipulation• compression (text, image, video etc.)• audio processing • web search• algorithms for networking and communication

• routing protocols• computer interconnections

Page 8: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Data Structures – key to software design

• Data structures play a key role in every type of software.

•Data structure deals with how to store the data internally while solving a problem in order to

•Optimize• the overall running time of a program• the response time (for queries)• the memory requirements• other resources (e.g. band-width of a network)• Simplify software design• make solution extendible, more robust

Page 9: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Abstract vs. concrete data structures Abstract data structure (sometimes called ADT ->

Abstract Data Type) is a collection of data with a set of operations supported to manipulate the structure

Examples:• stack, queue insert, delete• priority queue insert, deleteMin• Dictionary insert, search, delete

Concrete data structures are the implementations of abstract data structures: • Arrays, linked lists, trees, heaps, hash table

A recurring theme: Find the best mapping between abstract and concrete data structures.

Page 10: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Abstract Data Structure (ADT)container supporting operations

•Dictionary•search •insert primary operations•Delete

•deleteMin•Range search•Successor secondary operations•Merge

•Priority queue•Insert•deleteMin •Merge, split etc. Secondary operations

primary operations

Page 11: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Linear data structures

• key properties of the (1-dim.) array:

• a sequence of items are stored in consecutive physical memory locations.

• main advantage: array provides a constant time access to k-th element for any k. (access the element by: Element[k].)

• other operations are expensive:• Search• Insert• delete

Page 12: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

2-dim. arrays Used to store images, tables etc. Given row number r, and column number s,

the element in A[r, s] can be accessed in one clock cycle.

(usually row major or column major order is used.)

Other operations are expensive. Sparse array representation

• Used to compress images• Trade-offs between storage and time

Page 13: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Linked lists Linked lists:

• Storing a sequence of items in non-consecutive locations of the memory.

• Not easy to search for a key (even if sorted).

• Inserting next to a given item is easy.• Array vs. linked list:

• Don’t need to know the number of items in advance. (dynamic memory allocation)

• disadvantages

order is important

Page 14: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

stacks and queues• stacks:

• insert and delete at the same end.• equivalently, last element inserted will be the first one to be deleted.• very useful to solve many problems

•Processing arithmetic expressions

• queues:• insert at one end, deletion at the other end.• equivalently, first element inserted is the first one to be deleted.

Page 15: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Non-linear data structures Various versions of trees

• Binary search trees• Height-balanced trees etc.

Lptr key Rptr15

Main purpose of a binary search tree supportdictionary operations efficiently

Page 16: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Priority queue Max priority key is the one that gets deleted

next.• Equivalently, support for the following

operations:• insert• deleteMin

Useful in solving many problems• fast sorting (heap-sorting)• shortest-path, minimum spanning tree,

scheduling etc.

Page 17: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Hashing • Supports dictionary operations very efficiently (most of the time).

• Main advantages:•Simple to design, implement• on average very fast• not good in the worst-case.

Page 18: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Applications

•arithmetic expression evaluation

• data compression (Huffman coding, LZW algorithm)

• image segmentation, image compression

• backtrack searching

• finding the best path to route in a network

• geometric problems (e.g. rectangle area)

Page 19: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

What data structure to use?

Example 1: There are more than 1 billion web pages. When you type on google search page something like:

You get instantaneous response. What kind of data structure is used here?

• The details are quite complicated, but the main data structure used is quite simple.

Page 20: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Data structure used - inverted indexArray of lists – each array entry contains a word and a pointer to all the web pages that contain that word:

Data structure 38 97 145 297

This list is kept sorted

876

Question: How do we access the array index from key word?Hashing is used.

Page 21: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Example 2: The entire landscape of the world is being digitized (there is a whole new branch that combines information technology and geography called GIS – Geographic Information System). What kind of data structure should be used to store all this information?

SnapshotfromGoogleearth

Page 22: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Some general issues related to GIS• How much memory do we need? Can this be stored in one computer? Building the database is done in the background (off-line processing)• How fast can the queries be answered? Response to query is called the on-line processing• Suppose each square mile is represented by a 1024 by 1024 pixel image, how much storage do we need to store the map of the United States?

Page 23: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Calculate the memory neededVery rough estimate of the memory needed:

•Area of USA is 4 x 106 sq miles (roughly)

•Each square mile needs 106 pixels (roughly)

•Each pixel requires 32 bits usually.

Thus the total memory needed = 4 x 106 x 32 x 106 = 168 x 1012 = 168000 Giga bits

(A standard desk top has ~ 200 Giga bits of memory.)

Need about 800 such computers to store the data

Page 24: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

What data structure to store the images?

• each 1024 x 1024 image can be stored in a two-dimensional array. (standard way to store all kinds of images – bmp, jpg, png etc.) The actual images are stored in a secondary memory (hard disks on several servers either in a central location or distributed).

• The number of images would be roughly 4 x 106. A set of pointers to these images can be stored in a 1 (or 2) dimensional array.

• When you click on a point on the map, its index in the array is calculated.

• From that index, the image is accessed and sent by a network to the requesting client.

Page 25: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Some projects from past semesters• Generate all the poker hands More generally, given a set of N items and a

number k<= N, generate all possible combinations or permutations of k items.

(concept: recursion, arrays, lists)

• Image manipulation: (concept: arrays, library, algorithm analysis)

1)

Page 26: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

image manipulation: (concept: arrays, library, analysis of algorithm)

Page 27: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

• Bounding box construction: OCR is one of the early success stories in software applications.

Scan a printed page and recognize the characters in it.

First step: bounding box construction.

Page 28: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Input:

Output: “In 1830 there were but twenty-three miles of railroad in operation in the United States, and in that year Kentucky took … “

Final step:

Page 29: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

• Spelling checker: Given a text file T, identify all the misspelled words in T.

Idea: build a hash table H of all the words in a dictionary, and search for each word of the text T in the table H. For each misspelled word, suggest the correct spelling.

(hashing, strings, vectors)

Page 30: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

• Peg solitaire (backtracking, recursion, hash table)

Find a sequence of moves that leaves exactly one peg on the board. (starting position can be specified. In some cases, there may be no solution.)

Page 31: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

• Geometric computation problem – given a set of rectangles, determine the total area covered by them. Trace the contour, report all intersections etc.

Data structure: binary search tree.

Page 32: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

•Given two photographs of the same scene taken from two different positions, combine them into a single image.

Page 33: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Image compression

original

(compressed x10)

(compressed x 50)

(Quadtree data structure)

Page 34: CS 315 Data Structures   Fall 2011 Instructor: B. (Ravi) Ravikumar          Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com

Index generation for a document

Index contains the list of all the words appearing in a document, with the line numbers in which they appear.

Typical index for a book looks:

Data structure binary search tree, hash table