Masterizing PHP Data Structure 102
DESCRIPTION
We have all learned data structures at school: arrays, lists, sets, stacks, queues (LIFO/FIFO), heaps, associative arrays, trees, … and what do we mostly use in PHP? The “array”! In most cases, we do everything and anything with it, but we stumble upon it when profiling code. During this session, we'll learn again to use the structures appropriately, looking more closely at how to employ arrays, the SPL, and other structures from PHP extensions as well.
TRANSCRIPT
Masterizing PHP Data Structure 102
Patrick Allaert
PHPBenelux Conference Antwerp 2012
About me
● Patrick Allaert
● Founder of Libereco
● Playing with PHP/Linux for 10+ years
● eZ Publish core developer
● Author of the APM PHP extension
● @patrick_allaert
● [email protected]
● http://github.com/patrickallaert/
● http://patrickallaert.blogspot.com/
Masterizing = Mastering + Rising
PHP native datatypes
● NULL (IS_NULL)
● Booleans (IS_BOOL)
● Integers (IS_LONG)
● Floating point numbers (IS_DOUBLE)
● Strings (IS_STRING)
● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
● Objects (IS_OBJECT)
● Resources (IS_RESOURCE)
● Callable (IS_CALLABLE)
Wikipedia datatypes
●2-3-4 tree
●2-3 heap
●2-3 tree
●AA tree
●Abstract syntax tree
●(a,b)-tree
●Adaptive k-d tree
●Adjacency list
●Adjacency matrix
●AF-heap
●Alternating decision tree
●And-inverter graph
●And–or tree
●Array
●AVL tree
●Beap
●Bidirectional map
●Bin
●Binary decision diagram
●Binary heap
●Binary search tree
●Binary tree
●Binomial heap
●Bit array
●Bitboard
●Bit field
●Bitmap
●BK-tree
●Bloom filter
● Boolean
●Bounding interval hierarchy
●B sharp tree
●BSP tree
●B-tree
●B*-tree
●B+ tree
●B-trie
●Bx-tree
●Cartesian tree
●Char
●Circular buffer
●Compressed suffix array
●Container
●Control table
●Cover tree
●Ctrie
●Dancing tree
●D-ary heap
●Decision tree
●Deque
●Directed acyclic graph
●Directed graph
●Disjoint-set
●Distributed hash table
●Double
●Doubly connected edge list
●Doubly linked list
●Dynamic array
●Enfilade
●Enumerated type
●Expectiminimax tree
●Exponential tree
●Fenwick tree
●Fibonacci heap
●Finger tree
●Float
●FM-index
●Fusion tree
●Gap buffer
●Generalised suffix tree
●Graph
●Graph-structured stack
●Hash
●Hash array mapped trie
● Hashed array tree
● Hash list
● Hash table
● Hash tree
● Hash trie
● Heap
● Heightmap
● Hilbert R-tree
● Hypergraph
● Iliffe vector
● Image
● Implicit kd-tree
● Interval tree
● Int
● Judy array
● Kdb tree
● Kd-tree
● Koorde
● Leftist heap
● Lightmap
● Linear octree
● Link/cut tree
● Linked list
● Lookup table
●Map/Associative array/Dictionary
●Matrix
●Metric tree
●Minimax tree
●Min/max kd-tree
●M-tree
●Multigraph
●Multimap
●Multiset
●Octree
●Pagoda
●Pairing heap
●Parallel array
●Parse tree
●Plain old data structure
●Prefix hash tree
●Priority queue
●Propositional directed acyclic graph
●Quad-edge
●Quadtree
●Queap
●Queue
●Radix tree
●Randomized binary search tree
●Range tree
●Rapidly-exploring random tree
●Record (also called tuple or struct)
●Red-black tree
●Rope
●Routing table
●R-tree
●R* tree
●R+ tree
●Scapegoat tree
●Scene graph
●Segment tree
●Self-balancing binary search tree
●Self-organizing list
●Set
●Skew heap
●Skip list
●Soft heap
●Sorted array
●Spaghetti stack
●Sparse array
●Sparse matrix
●Splay tree
●SPQR-tree
●Stack
●String
●Suffix array
●Suffix tree
●Symbol table
●Syntax tree
●Tagged union (variant record, discriminated union, disjoint union)
●Tango tree
●Ternary heap
●Ternary search tree
●Threaded binary tree
●Top tree
●Treap
●Tree
●Trees
●Trie
●T-tree
●UB-tree
●Union
●Unrolled linked list
●Van Emde Boas tree
●Variable-length array
●VList
●VP-tree
●Weight-balanced tree
●Winged edge
●X-fast trie
●Xor linked list
●X-tree
●Y-fast trie
●Zero suppressed decision diagram
●Zipper
●Z-order
Game: Can you recognize some structures?
Array: PHP's untruthfulness
PHP “Arrays” are not true Arrays!
An array is typically implemented like this:
[Diagram: elements stored in contiguous cells: Data | Data | Data | Data | Data | Data]
Array: PHP's untruthfulness
PHP “Arrays” can be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
Implementation based on a Doubly Linked List (DLL):
[Diagram: Head ⇄ Data ⇄ Data ⇄ Data ⇄ Data ⇄ Data ⇄ Tail]
Enables List, Deque, Queue and Stack implementations
Array: PHP's untruthfulness
PHP “Array” elements are always accessible using a key (index).
Implementation based on a Hash Table:
[Diagram: a bucket pointers array with Bucket * slots numbered 0 to nTableSize - 1; each slot points to a Bucket, each Bucket holds one Data element, and the Data elements are chained from Head to Tail]
Array: PHP's untruthfulness
http://php.net/manual/en/language.types.array.php:
“This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
Optimized for anything ≈ Optimized for nothing!
Array: PHP's untruthfulness
● In C: 100 000 integers (using long on 64 bits => 8 bytes) can be stored in 0.76 MB.
● In PHP: it will take ≅ 13.97 MB!
● A PHP variable (containing an integer) takes 48 bytes.
● The overhead of buckets for every “array” entry is about 96 bytes.
● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
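The figures above can be checked on your own setup. The following is only a rough sketch: it measures the memory taken by an array of 100 000 integers with memory_get_usage(). The variable names are illustrative, and the numbers reported depend on the PHP version and platform, so they may differ from the ones quoted on the slide.

```php
<?php
// Rough measurement of the memory cost of a PHP array of integers.
// The exact figures depend on the PHP version and platform.
$count = 100000;

$before = memory_get_usage();
$numbers = range(1, $count);          // array holding 100 000 integers
$after = memory_get_usage();

$totalBytes = $after - $before;
$bytesPerEntry = $totalBytes / $count;

printf(
    "Total: %.2f MB, about %.1f bytes per entry\n",
    $totalBytes / 1048576,
    $bytesPerEntry
);
```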
Data Structure
Structs (or records, tuples,...)
● A struct is a value containing other values which are typically accessed using a name.
● Example:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart
Structs – Using array
$person = array(
    "firstName" => "Patrick",
    "lastName" => "Allaert"
);
Structs – Using a class
$person = new PersonStruct("Patrick", "Allaert");
Structs – Using a class (Implementation)
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    public function __set($key, $value)
    {
        // a. Do nothing
        // b. trigger_error()
        // c. Throw an exception
    }
}
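As a sketch of option (c), the class below throws when code assigns to a property that was never declared. The class name StrictPersonStruct and the choice of LogicException are assumptions made for this illustration, not part of the talk.

```php
<?php
// Hypothetical struct-like class: option (c), throw on unknown properties.
class StrictPersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() is only called for inaccessible or undeclared properties,
    // so assignments to $firstName and $lastName are unaffected.
    public function __set($key, $value)
    {
        throw new LogicException("Unknown property: $key");
    }
}

$person = new StrictPersonStruct("Patrick", "Allaert");
$person->firstName = "Pat";            // fine: declared property

$rejected = false;
try {
    $person->firstname = "oops";       // typo: lowercase "n"
} catch (LogicException $e) {
    $rejected = true;
}
var_dump($rejected);
```

With a plain array, the same typo would silently create a new "firstname" entry.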
Structs – Pros and Cons
Array
+ Uses less memory (PHP < 5.4)
- Uses more memory (PHP = 5.4)
- No type hinting
- Flexible structure
+|- Less OO
+ Slightly faster
Class
- Uses more memory (PHP < 5.4)
+ Uses less memory (PHP = 5.4)
+ Type hinting possible
+ Rigid structure
+|- More OO
- Slightly slower
“true” Arrays
● An array is a fixed-size collection where each element is identified by a numeric index.
[Diagram: six contiguous Data cells, indexed 0 to 5]
“true” Arrays – Using SplFixedArray
$array = new SplFixedArray(3);
$array[0] = 1; // or $array->offsetSet()
$array[1] = 2; // or $array->offsetSet()
$array[2] = 3; // or $array->offsetSet()
$array[0]; // gives 1
$array[1]; // gives 2
$array[2]; // gives 3
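The "fixed size" part is worth seeing in action. This short sketch (variable names are illustrative) shows that writing past the end raises an exception instead of silently growing the structure, and that resizing has to be requested explicitly with setSize().

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

// Writing past the end throws instead of silently growing the structure.
$outOfRange = false;
try {
    $array[3] = 4;
} catch (RuntimeException $e) {
    $outOfRange = true;
}

// Growing has to be requested explicitly.
$array->setSize(4);
$array[3] = 4;

$size = $array->getSize();
$plain = $array->toArray();   // back to a regular PHP array
```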
“true” Arrays – Pros and Cons
Array
- Uses more memory
+|- Less OO
+ Slightly faster
SplFixedArray
+ Uses less memory
+|- More OO
- Slightly slower
Queues
● A queue is an ordered collection respecting First In, First Out (FIFO) order.
● Elements are inserted at one end and removed at the other.
[Diagram: a row of Data elements; new elements are enqueued at one end and dequeued at the other]
Queues – Using array
$queue = array();
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3
Queues – Using SplQueue
$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3
Queues – Pros and Cons
Array
- Uses more memory (overhead / entry: 96 bytes)
- No type hinting
+|- Less OO
SplQueue
+ Uses less memory (overhead / entry: 48 bytes)
+ Type hinting possible
+|- More OO
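Beyond memory, there is a behavioural difference worth keeping in mind: array_shift() re-indexes the remaining numeric keys on every call, while SplQueue is built on a doubly linked list and only unlinks its head. The sketch below (variable names are illustrative) drains both and shows that the FIFO order is the same.

```php
<?php
// Drain the same three values from an array-based queue and from an SplQueue.
$arrayQueue = array();
$splQueue = new SplQueue();
foreach (array(1, 2, 3) as $value) {
    $arrayQueue[] = $value;
    $splQueue->enqueue($value);
}

$fromArray = array();
while ($arrayQueue) {
    // array_shift() re-indexes the remaining numeric keys on every call.
    $fromArray[] = array_shift($arrayQueue);
}

$fromSpl = array();
while (!$splQueue->isEmpty()) {
    // dequeue() removes the element at the head of the list.
    $fromSpl[] = $splQueue->dequeue();
}
```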
Stacks
● A stack is an ordered collection respecting Last In, First Out (LIFO) order.
● Elements are inserted and removed on the same end.
[Diagram: a row of Data elements; elements are pushed and popped at the same end]
Stacks – Using array
$stack = array();
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1
Stacks – Using SplStack
$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1
Stacks – Pros and Cons
Array
- Uses more memory (overhead / entry: 96 bytes)
- No type hinting
+|- Less OO
SplStack
+ Uses less memory (overhead / entry: 48 bytes)
+ Type hinting possible
+|- More OO
Sets
● A set is a collection with no particular ordering, especially suited for testing the membership of a value against a collection or for performing union/intersection/complement operations between collections.
[Diagram: five Data elements scattered in no particular order]
Sets – Using array
$set = array();
$set[] = 1;
$set[] = 2;
$set[] = 3;

in_array(2, $set); // true
in_array(5, $set); // false

array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement
True performance killers!
Sets – Using array (simple types)
● Remember that PHP Array keys can be integers or strings only!
$set = array();
$set[1] = true; // Any dummy value is fine, except NULL:
$set[2] = true; // isset() returns false for NULL values
$set[3] = true;

isset($set[2]); // true
isset($set[5]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
Sets – Using array (objects)
$set = array();
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;

isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
Sets – Using SplObjectStorage (objects)
$set = new SplObjectStorage();
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;

isset($set[$object2]); // true
isset($set[$object5]); // false

$set1->addAll($set2); // union
$set1->removeAllExcept($set2); // intersection
$set1->removeAll($set2); // complement
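Note that addAll(), removeAllExcept() and removeAll() modify the storage they are called on. A minimal sketch, assuming you want to keep the original sets untouched, is to work on clones (the variable names are illustrative):

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

// The set operations work in place, so apply them to clones
// in order to keep $set1 intact.
$union = clone $set1;
$union->addAll($set2);                  // {a, b, c}

$intersection = clone $set1;
$intersection->removeAllExcept($set2);  // {b}

$difference = clone $set1;
$difference->removeAll($set2);          // {a}
```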
Sets – Using QuickHash (int)
● No union/intersection/complement operations (yet?)
● Yummy features like (loadFrom|saveTo)(String|File)
$set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);
$set->add(1);
$set->add(2);
$set->add(3);

$set->exists(2); // true
$set->exists(5); // false
Sets – With finite possible values
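// Values of PHP's predefined error constants, shown here for illustration:
define("E_ERROR", 1); // or 1<<0
define("E_WARNING", 2); // or 1<<1
define("E_PARSE", 4); // or 1<<2
define("E_NOTICE", 8); // or 1<<3

$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;

$set & E_ERROR; // non-zero: true
$set & E_NOTICE; // zero: false

$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 & ~$set2; // complement (elements of $set1 that are not in $set2)
$set1 ^ $set2; // symmetric difference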
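Here is a small worked example of the bitwise set operations, using hypothetical permission flags (CAN_READ, CAN_WRITE and CAN_EXECUTE are made up for this illustration). It also shows the difference between the set difference (& ~) and the symmetric difference (^).

```php
<?php
// Hypothetical flags, one bit each.
const CAN_READ    = 1 << 0; // 1
const CAN_WRITE   = 1 << 1; // 2
const CAN_EXECUTE = 1 << 2; // 4

$set1 = CAN_READ | CAN_WRITE;      // {read, write}
$set2 = CAN_WRITE | CAN_EXECUTE;   // {write, execute}

$hasWrite   = ($set1 & CAN_WRITE) !== 0;    // membership test
$hasExecute = ($set1 & CAN_EXECUTE) !== 0;

$union        = $set1 | $set2;    // {read, write, execute}
$intersection = $set1 & $set2;    // {write}
$difference   = $set1 & ~$set2;   // {read}: in $set1 but not in $set2
$symmetric    = $set1 ^ $set2;    // {read, execute}: in exactly one set
```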
Sets – With finite possible values (function features)
Instead of:
function remove($path, $files = true, $directories = true, $links = true, $executable = true)
{
    if (!$files && is_file($path)) return false;
    if (!$directories && is_dir($path)) return false;
    if (!$links && is_link($path)) return false;
    if (!$executable && is_executable($path)) return false;
    // ...
}

remove("/tmp/removeMe", true, false, true, false); // WTF ?!
Sets – With finite possible values (function features)
Use:
define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits

function remove($path, $options = REMOVE_ALL)
{
    if (~$options & REMOVE_FILES && is_file($path)) return false;
    if (~$options & REMOVE_DIRS && is_dir($path)) return false;
    if (~$options & REMOVE_LINKS && is_link($path)) return false;
    if (~$options & REMOVE_EXEC && is_executable($path)) return false;
    // ...
}

remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
Sets: Conclusions
● Use the key and not the value when using PHP Arrays.
● Use QuickHash for sets of integers if possible.
● Use SplObjectStorage as soon as you are playing with objects.
● Don't use array_unique() when you need a set!
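To illustrate the first and last points: when the data arrives as a plain list of values, array_flip() is one way to turn it into a key-based set. This is only a sketch with made-up values; it relies on the values being integers or strings, since those are the only valid array keys.

```php
<?php
$values = array("apple", "banana", "cherry", "banana");

// array_flip() turns values into keys; duplicates collapse automatically.
$set = array_flip($values);

$hasBanana = isset($set["banana"]);   // hash lookup on the key
$hasKiwi   = isset($set["kiwi"]);

// Same answer with in_array(), but each call scans the values linearly.
$slowHasBanana = in_array("banana", $values, true);

$members = array_keys($set);          // back to a list of unique members
```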
Bloom filters
● A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
● False positives are possible, but false negatives are not!
Bloom filters – Using bloomy
// BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
$bloomFilter = new BloomFilter(10000, 0.001);

$bloomFilter->add("An element");

$bloomFilter->has("An element"); // true for sure
$bloomFilter->has("Foo"); // false, most probably
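To see why false negatives are impossible, here is a toy Bloom filter in plain PHP. It is not how the bloomy extension is implemented: the class name TinyBloomFilter, the default sizes and the md5()-based hashing are all assumptions made for this sketch, which assumes PHP 7.1 or later.

```php
<?php
// Toy Bloom filter (hypothetical class, unrelated to the bloomy extension).
final class TinyBloomFilter
{
    private $bits;    // bit array packed into a binary string
    private $size;    // number of bits
    private $hashes;  // number of hash functions

    public function __construct(int $size = 1024, int $hashes = 3)
    {
        $this->size = $size;
        $this->hashes = $hashes;
        $this->bits = str_repeat("\0", intdiv($size + 7, 8));
    }

    // Salting md5() with $i simulates several independent hash functions.
    private function positions(string $value): array
    {
        $positions = array();
        for ($i = 0; $i < $this->hashes; $i++) {
            $positions[] = hexdec(substr(md5($i . ':' . $value), 0, 7)) % $this->size;
        }
        return $positions;
    }

    public function add(string $value): void
    {
        foreach ($this->positions($value) as $p) {
            $byte = intdiv($p, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p % 8)));
        }
    }

    public function has(string $value): bool
    {
        foreach ($this->positions($value) as $p) {
            if ((ord($this->bits[intdiv($p, 8)]) & (1 << ($p % 8))) === 0) {
                return false; // at least one bit is unset: definitely absent
            }
        }
        return true; // all bits set: possibly present
    }
}

$filter = new TinyBloomFilter();
$filter->add("An element");
var_dump($filter->has("An element"));
```

Adding an element only ever sets bits, so an element that was added always finds all of its bits set. An element that was never added can still collide with bits set by others, hence the possible false positives.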
Maps
● A map is a collection of key/value pairs where all keys are unique.
Maps – Using array
● Don't use array_merge() on maps.
$map = array();
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;

// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
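Note the swapped operands: with the + operator the left-hand side wins on duplicate keys, whereas array_merge() lets later arrays win. The sketch below (with made-up data) shows that the two give the same key/value pairs for string keys, and that they are not interchangeable for integer keys.

```php
<?php
$map1 = array("ONE" => 1, "TWO" => 2);
$map2 = array("TWO" => "deux", "THREE" => 3);

$merged = array_merge($map1, $map2);  // later values win for string keys
$united = $map2 + $map1;              // left operand wins for duplicate keys

// Same key/value pairs; only the element order differs.
$samePairs = ($merged == $united);

// With integer keys the two are not interchangeable:
$byId1 = array(10 => "a");
$byId2 = array(10 => "b");
$renumbered = array_merge($byId1, $byId2); // keys are renumbered from 0
$preserved  = $byId2 + $byId1;             // key 10 is kept
```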
Multikey Maps – Using array
● Don't use array_merge() on maps.
$map = array();
$map["ONE"] = 1;
$map["UN"] =& $map["ONE"];
$map["UNO"] =& $map["ONE"];
$map["TWO"] = 2;
$map["DEUX"] =& $map["TWO"];
$map["DUE"] =& $map["TWO"];

$map["UNO"] = "once";
$map["DEUX"] = "twice";

var_dump($map);
/*
array(6) {
  ["ONE"] => &string(4) "once"
  ["UN"] => &string(4) "once"
  ["UNO"] => &string(4) "once"
  ["TWO"] => &string(5) "twice"
  ["DEUX"] => &string(5) "twice"
  ["DUE"] => &string(5) "twice"
}
*/
Heap
● A heap is a tree-based structure in which all elements are ordered, with the largest key at the top and the smallest ones as leaves.
Heap – Using array
$heap = array();
$heap[] = 3;
sort($heap);
$heap[] = 1;
sort($heap);
$heap[] = 2;
sort($heap);
Heap – Using Spl(Min|Max)Heap
$heap = new SplMinHeap;
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);
Heaps: Conclusions
● MUCH faster than having to re-sort() an array at every insertion.
● If you don't require the collection to be sorted at every single step and can insert all data at once and then sort(), an array is a much better/faster approach.
● SplPriorityQueue is very similar: consider it the same as SplHeap, but with the sorting done on the key (the priority) rather than the value.
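A short sketch of both structures in use (the values and priorities are made up): SplMinHeap always hands back the smallest remaining value, while SplPriorityQueue orders elements by the priority given as the second argument of insert().

```php
<?php
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}

$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract();   // always removes the current minimum
}

// SplPriorityQueue orders by the priority passed as the second argument.
$queue = new SplPriorityQueue();
$queue->insert("low", 1);
$queue->insert("high", 10);
$queue->insert("mid", 5);

$served = array();
while (!$queue->isEmpty()) {
    $served[] = $queue->extract();  // highest priority first
}
```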
Other related projects
● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types
● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy
● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
Conclusions
● Use the appropriate data structure. It will keep your code clean and fast.
● Think about the time and space complexity of your algorithms.
● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”, ... to describe them instead of using something like: $ordersArray.
Questions?
Thanks
● Don't forget to rate this talk on https://joind.in/4753
Photo Credits
● Northstar Ski Jump: http://www.flickr.com/photos/renotahoe/5593248965
● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
● Cigarette: http://www.flickr.com/photos/superfantastic/166215927
● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg