SPL Data Structures and their Complexity

Jurriën Stutterheim

September 17, 2011


1. Introduction

This presentation §1

- Understand what data structures are
- How they are represented internally
- How “fast” each one is and why that is

Data structures §1

- Classes that offer the means to store and retrieve data, possibly in a particular order
- Implementation is (often) optimised for certain use cases
- array is PHP's oldest and most frequently used data structure
- PHP 5.3 adds support for several others

Current SPL data structures §1

- SplDoublyLinkedList
- SplStack
- SplQueue
- SplHeap
- SplMaxHeap
- SplMinHeap
- SplPriorityQueue
- SplFixedArray
- SplObjectStorage

Why care? §1

- Using the right data structure in the right place could improve performance
- Already implemented and tested: saves work
- Can add a type hint in a function definition
- Adds semantics to your code
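The type-hint point can be illustrated with a small sketch. The function name `processJobs` is hypothetical and not part of the talk; the point is only that the signature tells the reader (and PHP) that a FIFO queue is expected.

```php
<?php
// Hypothetical helper: the SplQueue type hint documents and enforces
// that callers pass a FIFO queue rather than an arbitrary array.
function processJobs(SplQueue $jobs)
{
    $done = array();
    while (!$jobs->isEmpty()) {
        $done[] = $jobs->dequeue(); // oldest job comes out first
    }
    return $done;
}
```

Passing anything other than an SplQueue (or a subclass) is rejected by PHP before the function body runs.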

Algorithmic complexity §1

- We want to be able to talk about the performance of the data structure implementation
  - Running speed (time complexity)
  - Space consumption (space complexity)
- We describe complexity in terms of input size, which is machine and programming language independent

Example §1

    for ($i = 0; $i < $n; $i++)
        for ($j = 0; $j < $n; $j++)
            echo 'tick';

For some n, how many times is “tick” printed? I.e. what is the time complexity of this algorithm?

n^2 times
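The count can be checked empirically. The sketch below replaces the echo with a counter; the function name `countTicks` is an assumption made for this illustration.

```php
<?php
// Same nested loops as on the slide, but counting instead of printing.
function countTicks($n)
{
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $ticks++; // stands in for echo 'tick'
        }
    }
    return $ticks;
}
```

For any non-negative $n the result equals $n * $n, which is why the loop is quadratic in n.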

Talking about complexity §1

- Pick a function to act as a boundary for the algorithm's complexity
  - Upper bound (worst case)
    - Denoted O (big-Oh)
    - “My algorithm will not be slower than this function”
  - Lower bound (best case)
    - Denoted Ω (big-Omega)
    - “My algorithm will be at least as slow as this function”
- If they are the same, we write Θ (big-Theta)
- In the example: both bounds are n^2, so the algorithm is in Θ(n^2)

Visualized §1

Example 2 §1

    for ($i = 0; $i < $n; $i++)
        if ($myBool)
            for ($j = 0; $j < $n; $j++)
                echo 'tick';

What is the time complexity of this algorithm?

- O(n^2)
- Ω(n) (if $myBool is false)
- No Θ!
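A rough way to see the two bounds is to count the work done in each case. In this sketch the name `countWork` and the split into 'checks' and 'ticks' are assumptions for illustration: 'checks' counts evaluations of the if, 'ticks' counts inner-loop iterations.

```php
<?php
// Counts the work of Example 2 for a given $n and flag.
function countWork($n, $myBool)
{
    $checks = 0;
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        $checks++; // the if is evaluated once per outer iteration
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                $ticks++; // stands in for echo 'tick'
            }
        }
    }
    return array('checks' => $checks, 'ticks' => $ticks);
}
```

With the flag false only the n checks happen (the Ω(n) case); with the flag true the n^2 ticks dominate (the O(n^2) case).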

We can be a bit sloppy §1

    for ($i = 0; $i < $n; $i++)
        if ($myBool)
            for ($j = 0; $j < $n; $j++)
                echo 'tick';

- We describe algorithmic behaviour as input size grows to infinity
- Constant factors and smaller terms don't matter too much
- E.g. 3n^2 + 4n + 1 is in O(n^2)
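One way to see why the smaller terms stop mattering is to divide the full expression by n^2 and watch the ratio settle near the constant factor 3. The function name `costRatio` is an assumption for this sketch.

```php
<?php
// Ratio of 3n^2 + 4n + 1 to n^2 for a given n > 0.
function costRatio($n)
{
    $cost = 3 * $n * $n + 4 * $n + 1;
    return $cost / ($n * $n);
}
```

For n = 10 the ratio is 3.41, for n = 1000 it is already about 3.004: the expression is bounded by a constant multiple of n^2, which is all that O(n^2) requires.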

Other functions §1

    for ($i = 0; $i < $n; $i++)
        for ($j = 0; $j < $n; $j++)
            echo 'tick';

    for ($i = 0; $i < $n; $i++)
        echo 'tock';

This algorithm is still in Θ(n^2).

Bounds §1

Figure: Order relations (taken from Cormen et al. 2009)

Complexity Comparison §1

Figure: plot comparing growth rates (curves: logarithmic, linear, quadratic, exponential, factorial, super-exponential)

Constant: 1, logarithmic: lg n, linear: n, quadratic: n^2, exponential: 2^n, factorial: n!, super-exponential: n^n

In numbers §1

Approximate growth for n = 50:

    1       1
    lg n    5.64
    n       50
    n^2     2500
    n^3     125000
    2^n     1125899906842624
    n!      3.04 * 10^64
    n^n     8.88 * 10^84
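These figures can be recomputed with a short script. The function name `growthTable` and the array keys are assumptions for this sketch; the factorial and n^n are computed as floats, so they are approximations, and the 2^n entry is only an exact integer on a 64-bit PHP build.

```php
<?php
// Evaluates each growth function from the table for a given n.
function growthTable($n)
{
    // n! as a float, since it overflows integers quickly.
    $factorial = 1.0;
    for ($i = 2; $i <= $n; $i++) {
        $factorial *= $i;
    }

    return array(
        'constant' => 1,
        'lg n'     => log($n, 2),
        'n'        => $n,
        'n^2'      => pow($n, 2),
        'n^3'      => pow($n, 3),
        '2^n'      => pow(2, $n),
        'n!'       => $factorial,
        'n^n'      => pow($n, $n),
    );
}
```

Calling growthTable(50) gives the values for the n used in the table.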

Some more notes on complexity §1

- Constant time is written 1, but goes for any constant c
- Polynomial time contains all functions in n^c for some constant c
- Everything in this presentation will be in polynomial time

2. SPL Data Structures

Credit where credit is due §2

The first three pictures in this section are from Wikipedia

SplDoublyLinkedList §2

Figure: doubly linked list with nodes 12, 99, 37

- Superclass of SplStack and SplQueue
- SplDoublyLinkedList is not truly a doubly linked list; it behaves like a hashtable
- Usual doubly linked list time complexity
  - Append/prepend to available node in Θ(1)
  - Lookup by scanning in O(n)
  - Access to beginning/end in Θ(1)
- SplDoublyLinkedList time complexity
  - Insert/delete by index in Θ(1)
  - Lookup by index in Θ(1)
  - Access to beginning/end in Θ(1)
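The operations named above map onto the class's methods roughly as follows. This sketch only demonstrates the API using the three values from the figure; it does not measure or confirm the complexities listed, and the function name `demoList` is an assumption.

```php
<?php
// Builds the list 12, 99, 37 and reads it back in several ways.
function demoList()
{
    $list = new SplDoublyLinkedList();
    $list->push(99);    // append at the end
    $list->push(37);
    $list->unshift(12); // prepend at the beginning

    return array(
        'first'  => $list->bottom(), // peek at the beginning
        'last'   => $list->top(),    // peek at the end
        'second' => $list[1],        // lookup by index
        'count'  => count($list),
    );
}
```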

SplStack §2

- Subclass of SplDoublyLinkedList; adds no new operations
- Last-in, first-out (LIFO)
- Pop/push value from/on the top of the stack in Θ(1)

Figure: stack with Push and Pop operations at the top
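A minimal sketch of the LIFO behaviour, using only the inherited push, pop and top methods; the function name `demoStack` is an assumption.

```php
<?php
// Pushes three values and pops them again to show LIFO order.
function demoStack()
{
    $stack = new SplStack();
    $stack->push(1);
    $stack->push(2);
    $stack->push(3);

    $top = $stack->top(); // peek without removing

    $popped = array();
    while (!$stack->isEmpty()) {
        $popped[] = $stack->pop(); // last pushed comes out first
    }

    return array('top' => $top, 'popped' => $popped);
}
```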

SplQueue §2

- Subclass of SplDoublyLinkedList; adds enqueue/dequeue operations
- First-in, first-out (FIFO)
- Read/dequeue element from front in Θ(1)
- Enqueue element to the end in Θ(1)

Figure: queue with Enqueue and Dequeue operations
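The FIFO behaviour can be sketched the same way with enqueue and dequeue; the function name `demoQueue` is an assumption.

```php
<?php
// Enqueues three values and dequeues them again to show FIFO order.
function demoQueue()
{
    $queue = new SplQueue();
    $queue->enqueue('a');
    $queue->enqueue('b');
    $queue->enqueue('c');

    $front = $queue->bottom(); // read the front without removing it

    $dequeued = array();
    while (!$queue->isEmpty()) {
        $dequeued[] = $queue->dequeue(); // first enqueued comes out first
    }

    return array('front' => $front, 'dequeued' => $dequeued);
}
```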

Short excursion: trees §2

Figure: example tree, by level: 100; 19, 36; 17, 3, 25, 1; 2, 7

- Consists of nodes (vertices) and directed edges
- Each node always has in-degree 1
  - Except the root: always in-degree 0
- Previous property implies there are no cycles
- Binary tree: each node has at most two child nodes

SplHeap, SplMaxHeap and SplMinHeap §2

Figure: example max-heap, by level: 100; 19, 36; 17, 3, 25, 1; 2, 7

- A heap is a tree with the heap property: for all A and B, if B is a child node of A, then
  - val(A) ≥ val(B) for a max-heap: SplMaxHeap
  - val(A) ≤ val(B) for a min-heap: SplMinHeap
- Where val(A) denotes the value of node A
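The heap property can be checked mechanically. The sketch below assumes the tree is stored level by level in a plain array, so that the children of index i sit at 2i + 1 and 2i + 2; that encoding and the function name `isMaxHeap` are assumptions for this illustration, not something the slides prescribe.

```php
<?php
// Returns true if every child is no larger than its parent.
function isMaxHeap(array $values)
{
    $count = count($values);
    for ($child = 1; $child < $count; $child++) {
        $parent = (int) (($child - 1) / 2); // index of the parent node
        if ($values[$parent] < $values[$child]) {
            return false; // heap property violated
        }
    }
    return true;
}
```

The level-order values 100, 19, 36, 17, 3, 25, 1, 2, 7 from the figure pass this check; swapping the comparison gives the min-heap version.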

Heaps contd. §2

- SplHeap is an abstract superclass
- Implemented as binary tree
- Access to root element in Θ(1)
- Insertion/deletion in O(lg n)
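A short sketch of the two concrete heap classes in use. The helper names `drain` and `demoHeaps` and the sample values are assumptions; the calls used are insert, top, extract and isEmpty.

```php
<?php
// Removes the root repeatedly until the heap is empty.
function drain(SplHeap $heap)
{
    $out = array();
    while (!$heap->isEmpty()) {
        $out[] = $heap->extract(); // removes and returns the root
    }
    return $out;
}

// Fills a min-heap and a max-heap with the same values.
function demoHeaps()
{
    $min = new SplMinHeap();
    $max = new SplMaxHeap();
    foreach (array(19, 100, 3, 36) as $value) {
        $min->insert($value);
        $max->insert($value);
    }

    return array(
        'minTop'     => $min->top(), // smallest value sits at the root
        'maxTop'     => $max->top(), // largest value sits at the root
        'ascending'  => drain($min),
        'descending' => drain($max),
    );
}
```

Because the root is always the extreme value, draining a heap yields its contents in sorted order.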

SplPriorityQueue §2

- Variant of SplMaxHeap: for all A and B, if B is a child node of A, then prio(A) ≥ prio(B)
- Where prio(A) denotes the priority of node A
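In practice this means elements come out in order of descending priority, regardless of insertion order. A minimal sketch, with the function name `demoPriorityQueue` and the sample data as assumptions:

```php
<?php
// Inserts three items with distinct priorities and extracts them all.
function demoPriorityQueue()
{
    $queue = new SplPriorityQueue();
    $queue->insert('low', 1);   // insert(value, priority)
    $queue->insert('high', 10);
    $queue->insert('mid', 5);

    $order = array();
    while (!$queue->isEmpty()) {
        $order[] = $queue->extract(); // highest priority first
    }
    return $order;
}
```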

SplFixedArray §2

- Fixed-size array with numerical indices only
- Efficient OO array implementation
  - No hashing required for keys
  - Can make assumptions about array size
- Lookup, insertion, deletion in Θ(1) time
- Resize in Θ(n)
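A brief sketch of the fixed-size behaviour, including an explicit resize; the function name `demoFixedArray` is an assumption.

```php
<?php
// Creates a 3-slot array, resizes it to 5 and probes its limits.
function demoFixedArray()
{
    $array = new SplFixedArray(3);
    $array[0] = 'a';
    $array[2] = 'c';

    $array->setSize(5); // explicit resize; existing values are kept

    $outOfRange = false;
    try {
        $array[5] = 'x'; // valid indices are now 0..4
    } catch (RuntimeException $e) {
        $outOfRange = true;
    }

    return array(
        'size'       => $array->getSize(),
        'kept'       => $array[2],
        'empty'      => $array[4], // slots never written hold null
        'outOfRange' => $outOfRange,
    );
}
```

Unlike a regular PHP array, writing past the end does not grow the structure; it raises an exception instead.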

SplObjectStorage §2

- Storage container for objects
- Insertion, deletion in Θ(1)
- Verification of presence in Θ(1)
- Missing: set operations
  - Union, intersection, difference, etc.
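Insertion, presence checks and deletion can be sketched as follows. The function name `demoStorage` is an assumption; the sketch uses the ArrayAccess-style methods offsetSet, offsetExists and offsetUnset, which is one of several ways to call the class.

```php
<?php
// Stores one object, checks for presence, then removes it again.
function demoStorage()
{
    $a = new stdClass();
    $b = new stdClass();

    $storage = new SplObjectStorage();
    $storage->offsetSet($a); // insert

    $hasA = $storage->offsetExists($a); // stored object
    $hasB = $storage->offsetExists($b); // never stored

    $storage->offsetUnset($a); // delete

    return array(
        'hasA'       => $hasA,
        'hasB'       => $hasB,
        'afterUnset' => $storage->offsetExists($a),
        'count'      => count($storage),
    );
}
```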

3. Concluding

Missing in PHP §3

- Set data structure
- Map/hashtable data structure
  - Does SplDoublyLinkedList satisfy this use case?
  - If yes: split it in two separate structures and make SplDoublyLinkedList a true doubly linked list
- Immutable data structures
  - Allows us to more easily emulate “pure” functions
  - Fewer bugs in your code due to lack of mutable state

Closing remarks §3

- Use the SPL data structures!
- Choose them with care
- Reason about your code's complexity

Questions §3

Questions?