SPL Data Structures and their Complexity
Jurrien Stutterheim
September 17, 2011
1. Introduction
This presentation §1
- Understand what data structures are
- How they are represented internally
- How “fast” each one is and why that is
Data structures §1
- Classes that offer the means to store and retrieve data, possibly in a particular order
- Implementation is (often) optimised for certain use cases
- array is PHP’s oldest and most frequently used data structure
- PHP 5.3 adds support for several others
Current SPL data structures §1
- SplDoublyLinkedList
  - SplStack
  - SplQueue
- SplHeap
  - SplMaxHeap
  - SplMinHeap
- SplPriorityQueue
- SplFixedArray
- SplObjectStorage
Why care? §1
- Using the right data structure in the right place could improve performance
- Already implemented and tested: saves work
- Can add a type hint in a function definition
- Adds semantics to your code
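The type-hint point can be illustrated with a small sketch. The function name processJobs and the job names are hypothetical, and the scalar and return type declarations assume PHP 7 or later; only the SplQueue hint itself is the point.

```php
<?php
// Hypothetical function: the SplQueue type hint documents and enforces
// that callers pass a FIFO queue rather than an arbitrary array.
function processJobs(SplQueue $jobs): array
{
    $done = [];
    while (!$jobs->isEmpty()) {
        // Jobs are handled in the order in which they were enqueued.
        $done[] = strtoupper($jobs->dequeue());
    }
    return $done;
}

$jobs = new SplQueue();
$jobs->enqueue('resize');
$jobs->enqueue('upload');
print_r(processJobs($jobs));
```

Passing a plain array to processJobs would raise a type error, so the intent (a queue, processed front to back) is visible in the signature.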
Algorithmic complexity §1
- We want to be able to talk about the performance of the data structure implementation
  - Running speed (time complexity)
  - Space consumption (space complexity)
- We describe complexity in terms of input size, which is machine and programming language independent
Example §1
for ($i = 0; $i < $n; $i++)
    for ($j = 0; $j < $n; $j++)
        echo 'tick';
For some n, how many times is “tick” printed? I.e. what is the time complexity of this algorithm?
n^2 times
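A minimal sketch to check the count empirically. The helper countTicks is hypothetical: it increments a counter instead of printing, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: counts the iterations instead of printing 'tick'.
function countTicks(int $n): int
{
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $ticks++;
        }
    }
    return $ticks;
}

// Each line shows n followed by the number of ticks, which equals n * n.
foreach ([1, 10, 100] as $n) {
    echo $n, ' -> ', countTicks($n), PHP_EOL;
}
```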
Talking about complexity §1
- Pick a function to act as a bound on the algorithm’s complexity
- Upper bound (worst case)
  - Denoted O (big-Oh)
  - “My algorithm will not be slower than this function”
- Lower bound (best case)
  - Denoted Ω (big-Omega)
  - “My algorithm will be at least as slow as this function”
- If they are the same, we write Θ (big-Theta)
- In the example: both bounds are n^2, so the algorithm is in Θ(n^2)
Visualized §1
Example 2 §1
for ($i = 0; $i < $n; $i++)
    if ($myBool)
        for ($j = 0; $j < $n; $j++)
            echo 'tick';
What is the time complexity of this algorithm?
- O(n^2)
- Ω(n) (if $myBool is false)
- No Θ!
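The two bounds can be made visible by counting the work for both values of $myBool. The helper countWork is hypothetical and assumes PHP 7 or later; it returns the number of condition checks and the number of ticks.

```php
<?php
// Hypothetical helper: returns [number of $myBool checks, number of ticks].
function countWork(int $n, bool $myBool): array
{
    $checks = 0;
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        $checks++; // the condition is evaluated n times in either case
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                $ticks++;
            }
        }
    }
    return [$checks, $ticks];
}

list($checks, $ticks) = countWork(10, false);
echo "false: $checks checks, $ticks ticks", PHP_EOL;
list($checks, $ticks) = countWork(10, true);
echo "true: $checks checks, $ticks ticks", PHP_EOL;
```

With $myBool false only the n checks remain (the Ω(n) case); with $myBool true the n^2 ticks dominate (the O(n^2) case).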
We can be a bit sloppy §1
for ($i = 0; $i < $n; $i++)
    if ($myBool)
        for ($j = 0; $j < $n; $j++)
            echo 'tick';
- We describe algorithmic behaviour as input size grows to infinity
- Constant factors and smaller terms don’t matter too much
- E.g. 3n^2 + 4n + 1 is in O(n^2)
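One way to see why the smaller terms can be dropped, using the usual definition of O (there exist constants c and n_0 such that f(n) ≤ c · g(n) for all n ≥ n_0). The constants c = 8 and n_0 = 1 below are just one convenient choice, not the only one.

```latex
\begin{align*}
3n^2 + 4n + 1 &\le 3n^2 + 4n^2 + n^2 && \text{for } n \ge 1,\ \text{since } n \le n^2 \text{ and } 1 \le n^2 \\
              &= 8n^2 \\
\Rightarrow\quad 3n^2 + 4n + 1 &\in O(n^2) && \text{with } c = 8,\ n_0 = 1
\end{align*}
```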
Other functions §1
for ($i = 0; $i < $n; $i++)
    for ($j = 0; $j < $n; $j++)
        echo 'tick';
for ($i = 0; $i < $n; $i++)
    echo 'tock';
This algorithm is still in Θ(n^2).
Bounds §1
Figure: Order relations (taken from Cormen et al. 2009)
Complexity Comparison §1
Figure: growth of each complexity class on logarithmic axes (curves: logarithmic, linear, quadratic, exponential, factorial, super-exponential)
Constant: 1, logarithmic: lg n, linear: n, quadratic: n^2, exponential: 2^n, factorial: n!, super-exponential: n^n
In numbers §1
Approximate growth for n = 50:
Function   Value
1          1
lg n       5.64
n          50
n^2        2500
n^3        125000
2^n        1125899906842624
n!         3.04 × 10^64
n^n        8.88 × 10^84
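The values for n = 50 can be reproduced with a short sketch. The helper growth is hypothetical; it assumes a 64-bit PHP 7 or later build, and it computes n! and n^n as floats, so those two entries are approximations.

```php
<?php
// Hypothetical helper: evaluates each growth function for a given n.
function growth(int $n): array
{
    $factorial = 1.0; // float, because n! overflows integers quickly
    for ($i = 2; $i <= $n; $i++) {
        $factorial *= $i;
    }
    return [
        'constant' => 1,
        'lg n'     => log($n, 2),
        'n'        => $n,
        'n^2'      => $n ** 2,
        'n^3'      => $n ** 3,
        '2^n'      => 2 ** $n,     // still fits a 64-bit integer for n = 50
        'n!'       => $factorial,
        'n^n'      => pow($n, $n), // becomes a float once it overflows
    ];
}

foreach (growth(50) as $name => $value) {
    echo str_pad($name, 10), $value, PHP_EOL;
}
```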
Some more notes on complexity §1
- Constant time is written as 1, but this covers any constant c
- Polynomial time contains all functions in n^c for some constant c
- Everything in this presentation will be in polynomial time
2. SPL Data Structures
Credit where credit is due §2
The first three pictures in this section are from Wikipedia
SplDoublyLinkedList §2
Figure: doubly linked list with the nodes 12, 99 and 37
- Superclass of SplStack and SplQueue
- SplDoublyLinkedList is not truly a doubly linked list; it behaves like a hashtable
- Usual doubly linked list time complexity
  - Append/prepend to available node in Θ(1)
  - Lookup by scanning in O(n)
  - Access to beginning/end in Θ(1)
- SplDoublyLinkedList time complexity
  - Insert/delete by index in Θ(1)
  - Lookup by index in Θ(1)
  - Access to beginning/end in Θ(1)
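A minimal usage sketch of the operations listed above, using the values from the figure. It only demonstrates the API (append, prepend, access to both ends, access by index); it does not measure the complexity of any of them.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push(99);    // append at the end
$list->push(37);
$list->unshift(12); // prepend at the beginning

echo $list->bottom(), PHP_EOL; // 12: element at the beginning
echo $list->top(), PHP_EOL;    // 37: element at the end
echo $list[1], PHP_EOL;        // 99: access by index

// Iteration runs from beginning to end by default.
foreach ($list as $value) {
    echo $value, ' ';
}
echo PHP_EOL;
```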
SplStack §2
- Subclass of SplDoublyLinkedList; adds no new operations
- Last-in, first-out (LIFO)
- Pop/push value from/on the top of the stack in Θ(1)
Figure: push and pop at the top of the stack
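A short sketch of the LIFO behaviour; the pushed values are arbitrary.

```php
<?php
$stack = new SplStack();
$stack->push('a');
$stack->push('b');
$stack->push('c');

echo $stack->top(), PHP_EOL; // c: peek at the most recently pushed value
echo $stack->pop(), PHP_EOL; // c: removes and returns that value
echo $stack->pop(), PHP_EOL; // b
echo count($stack), PHP_EOL; // 1: only 'a' is left
```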
SplQueue §2
- Subclass of SplDoublyLinkedList; adds enqueue/dequeue operations
- First-in, first-out (FIFO)
- Read/dequeue element from front in Θ(1)
- Enqueue element to the end in Θ(1)
Figure: enqueue at the end, dequeue from the front
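A short sketch of the FIFO behaviour; the enqueued values are arbitrary.

```php
<?php
$queue = new SplQueue();
$queue->enqueue('first');
$queue->enqueue('second');
$queue->enqueue('third');

echo $queue->bottom(), PHP_EOL;  // first: read the front without removing it
echo $queue->dequeue(), PHP_EOL; // first: removes and returns the front
echo count($queue), PHP_EOL;     // 2
```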
Short excursion: trees §2
Figure: binary tree, level by level: 100; 19, 36; 17, 3, 25, 1; 2, 7
- Consists of nodes (vertices) and directed edges
  - Each node always has in-degree 1
  - Except the root: always in-degree 0
- Previous property implies there are no cycles
- Binary tree: each node has at most two child nodes
SplHeap, SplMaxHeap and SplMinHeap §2
Figure: max-heap, level by level: 100; 19, 36; 17, 3, 25, 1; 2, 7
- A heap is a tree with the heap property: for all A and B, if B is a child node of A, then
  - val(A) ≥ val(B) for a max-heap: SplMaxHeap
  - val(A) ≤ val(B) for a min-heap: SplMinHeap
- Where val(A) denotes the value of node A
Heaps contd. §2
- SplHeap is an abstract superclass
- Implemented as binary tree
- Access to root element in Θ(1)
- Insertion/deletion in O(lg n)
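A minimal sketch of both concrete heaps, filled with a few of the values from the figure. Repeatedly extracting the root of the min-heap yields the values in ascending order.

```php
<?php
$min = new SplMinHeap();
$max = new SplMaxHeap();
foreach ([19, 100, 3, 36, 17] as $value) {
    $min->insert($value);
    $max->insert($value);
}

echo $min->top(), PHP_EOL; // 3: root of the min-heap
echo $max->top(), PHP_EOL; // 100: root of the max-heap

// extract() removes the root and restores the heap property.
$sorted = [];
while (!$min->isEmpty()) {
    $sorted[] = $min->extract();
}
echo implode(', ', $sorted), PHP_EOL; // 3, 17, 19, 36, 100
```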
SplPriorityQueue §2
- Variant of SplMaxHeap: for all A and B, if B is a child node of A, then prio(A) ≥ prio(B)
- Where prio(A) denotes the priority of node A
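A short sketch of the priority ordering; the task names and priorities are made up for illustration. Elements leave the queue in order of descending priority, regardless of insertion order.

```php
<?php
$pq = new SplPriorityQueue();
$pq->insert('write report', 1); // insert(value, priority)
$pq->insert('fix outage', 10);
$pq->insert('answer mail', 5);

$order = [];
while (!$pq->isEmpty()) {
    $order[] = $pq->extract(); // highest priority first
}
echo implode(', ', $order), PHP_EOL; // fix outage, answer mail, write report
```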
SplFixedArray §2
- Fixed-size array with numerical indices only
- Efficient OO array implementation
  - No hashing required for keys
  - Can make assumptions about array size
- Lookup, insertion, deletion in Θ(1) time
- Resize in Θ(n)
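A minimal sketch of the fixed-size behaviour. The size is set up front, indices outside the range are rejected with an exception, and growing the array requires an explicit setSize() call.

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 'a';
$array[1] = 'b';
$array[2] = 'c';

$outOfRange = false;
try {
    $array[3] = 'd'; // index 3 does not exist in an array of size 3
} catch (RuntimeException $e) {
    $outOfRange = true;
}

$array->setSize(5);             // explicit resize; new slots hold null
echo $array->getSize(), PHP_EOL; // 5
```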
SplObjectStorage §2
- Storage container for objects
- Insertion, deletion in Θ(1)
- Verification of presence in Θ(1)
- Missing: set operations
  - Union, intersection, difference, etc.
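A short sketch of insertion, presence check and deletion, using the attach/contains/detach method names that were available in PHP 5.3. The stdClass objects are placeholders for any objects.

```php
<?php
$a = new stdClass();
$b = new stdClass();

$storage = new SplObjectStorage();
$storage->attach($a);
$storage->attach($a); // attaching the same object again adds nothing
$storage->attach($b);
echo count($storage), PHP_EOL; // 2

var_dump($storage->contains($a)); // bool(true)
$storage->detach($a);
var_dump($storage->contains($a)); // bool(false)
```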
3. Concluding
Missing in PHP §3
- Set data structure
- Map/hashtable data structure
  - Does SplDoublyLinkedList satisfy this use case?
  - If yes: split it into two separate structures and make SplDoublyLinkedList a true doubly linked list
- Immutable data structures
  - Allows us to more easily emulate “pure” functions
  - Fewer bugs in your code due to lack of mutable state
Closing remarks §3
- Use the SPL data structures!
- Choose them with care
- Reason about your code’s complexity
Questions §3
Questions?