spl: the undiscovered library - datastructures

Post on 15-Jan-2015

DESCRIPTION

Slides from presentation given to the Brighton PHP group on 15th December 2014

TRANSCRIPT

SPL: The Undiscovered Library

Exploring DataStructures

Who am I?

Mark Baker
Design and Development Manager
InnovEd (Innovative Solutions for Education) Ltd

Coordinator and Developer of the Open Source PHPOffice library:
PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio

Minor contributor to PHP core

@Mark_Baker

https://github.com/MarkBaker

http://uk.linkedin.com/pub/mark-baker/b/572/171

SPL – Standard PHP Library

• SPL provides a standard set of interfaces for PHP5
• The aim of SPL is to implement some efficient data access interfaces and classes for PHP
• Introduced with PHP 5.0.0
• Included as standard with PHP since version 5.3.0
• SPL DataStructures were added for version 5.3.0

SPL DataStructures

Dictionary DataStructures (Maps)
• Fixed Arrays

Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues

Tree DataStructures
• Heaps

SPL DataStructures – Why use them?

• Can improve performance
  • When the right structures are used in the right place
• Can reduce memory usage
  • When the right structures are used in the right place
• Already implemented and tested in PHP core
  • Saves work!
• Can be type-hinted in function/method definitions
  • Adds semantics to your code
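As a minimal sketch of the type-hinting point, the function below only accepts an SplQueue, so its signature tells the reader that items are handled first-in, first-out. The function name `drainJobs()` and the job names are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: the SplQueue type hint documents (and enforces)
// that the jobs are processed in first-in, first-out order.
function drainJobs(SplQueue $jobs)
{
    $processed = [];
    while (!$jobs->isEmpty()) {
        $processed[] = strtoupper($jobs->dequeue());
    }
    return $processed;
}

$jobs = new SplQueue();
$jobs->enqueue('resize');
$jobs->enqueue('upload');

$result = drainJobs($jobs);   // jobs come back in the order they were added
```

Passing a plain array to `drainJobs()` would be rejected by PHP, which is the extra semantics a bare `array` hint cannot give.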

SPL DataStructures

Dictionary DataStructures (Maps)
• Fixed Arrays

Linear DataStructures

Tree DataStructures

Fixed Arrays

• Predefined Size
• Enumerated indexes only, not Associative
• Indexed from 0
• Is an object
• No hashing required for keys
• Implements
  • Iterator
  • ArrayAccess
  • Countable
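The properties above can be sketched in a few lines. This is an illustrative example rather than code from the slides; the "hours of the day" data is made up.

```php
<?php
// 24 slots, indexed 0..23; every slot starts out as NULL
$hours = new SplFixedArray(24);

// ArrayAccess: read and write with the usual square-bracket syntax
$hours[9] = 'stand-up';

// Countable: count() reports the predefined size, not the number of values set
$slots = count($hours);

// Indexes outside 0..size-1 are rejected instead of growing the array
$outOfRange = false;
try {
    $hours[24] = 'midnight';
} catch (RuntimeException $e) {
    $outOfRange = true;
}
```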

Fixed Arrays – Uses

• Returned Database resultsets, Record collections
• Hours of Day
• Days of Month/Year
• Hotel Rooms, Airline seats
  • As a 2-d fixed array

Fixed Arrays – Big-O Complexity

• Insert an element O(1)
• Delete an element O(1)
• Lookup an element O(1)
• Resize a Fixed Array O(n)
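Resizing is done with `setSize()`. The sketch below only shows what happens to the contents when a fixed array grows or shrinks; it does not measure the cost.

```php
<?php
$a = new SplFixedArray(3);
$a[0] = 'a';
$a[1] = 'b';
$a[2] = 'c';

// Grow: existing values are kept, the new slots are NULL
$a->setSize(5);
$sizeAfterGrow = $a->getSize();
$newSlot = $a[4];

// Shrink: values beyond the new size are discarded
$a->setSize(2);
$remaining = $a->toArray();
```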

Fixed Arrays

[Diagram: Standard Arrays vs SPLFixedArray. Standard array: keys 12345, 23456, 34567 and 45678 pass through a hash function to reach the buckets [0] … [n-1] that hold Data Records 1 to 4. SPLFixedArray: keys 0, 1, 2 and 3 map straight onto slots [0] … [n-1] holding Data Records 1 to 4.]

Fixed Arrays

$a = array();

for ($i = 0; $i < $size; ++$i) {
    $a[$i] = $i;
}

//  Random/Indexed access
for ($i = 0; $i < $size; ++$i) {
    $r = $a[$i];
}

//  Sequential access
foreach($a as $v) {
}

//  Sequential access with keys
foreach($a as $k => $v) {
}

Initialise: 0.0000 s
Set 1,000,000 Entries: 0.4671 s
Read 1,000,000 Entries: 0.3326 s
Iterate values for 1,000,000 Entries: 0.0436 s
Iterate keys and values for 1,000,000 Entries: 0.0839 s

Total Time: 0.9272 s
Memory: 82,352.55 k

Fixed Arrays

$a = new \SPLFixedArray($size);

for ($i = 0; $i < $size; ++$i) {
    $a[$i] = $i;
}

//  Random/Indexed access
for ($i = 0; $i < $size; ++$i) {
    $r = $a[$i];
}

//  Sequential access
foreach($a as $v) {
}

//  Sequential access with keys
foreach($a as $k => $v) {
}

Initialise: 0.0013 s
Set 1,000,000 Entries: 0.3919 s
Read 1,000,000 Entries: 0.3277 s
Iterate values for 1,000,000 Entries: 0.1129 s
Iterate keys and values for 1,000,000 Entries: 0.1531 s

Total Time: 0.9869 s
Memory: 35,288.41 k

[Chart: Speed. SPL Fixed Array vs Standard PHP Array, comparing Initialise (s), Set Values (s), Sequential Read (s), Random Read (s) and Pop (s)]

Fixed Arrays

[Chart: Memory Usage. SPL Fixed Array vs Standard PHP Array, comparing Current Memory (k) and Peak Memory (k)]

Fixed Arrays

• Faster direct access
• Lower memory usage
• Faster for random/indexed access than for sequential access

Fixed Arrays – Gotchas

• Can be extended, but at a cost in speed
• Standard array functions won’t work with SPLFixedArray
  • e.g. array_walk(), sort(), array_pop(), implode()
• Avoid unsetting elements if possible
  • Unlike standard PHP enumerated arrays, this leaves empty nodes that trigger an Exception if accessed
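One possible workaround when a standard array function is needed is to copy the values out with `toArray()`, apply the function, and copy back with `SplFixedArray::fromArray()`. This is a sketch of that approach, not something shown in the slides, and each conversion copies every element, so it gives up the memory saving while both copies exist.

```php
<?php
$fixed = new SplFixedArray(3);
$fixed[0] = 30;
$fixed[1] = 10;
$fixed[2] = 20;

// sort() expects a native array, so copy out, sort, and copy back
$plain = $fixed->toArray();
sort($plain);
$fixed = SplFixedArray::fromArray($plain);

// implode() likewise needs a native array
$joined = implode(',', $fixed->toArray());
```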

SPL DataStructures

Dictionary DataStructures (Maps)

Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues

Tree DataStructures

Doubly Linked Lists

Doubly Linked Lists

• Iterable Lists
  • Top to Bottom
  • Bottom to Top
• Unindexed
  • Good for sequential access
  • Not good for random/indexed access
• Implements
  • Iterator
  • ArrayAccess
  • Countable
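Iterating in either direction is controlled with `setIteratorMode()`. A small sketch, using made-up values:

```php
<?php
$list = new SplDoublyLinkedList();
foreach (['A', 'B', 'C'] as $value) {
    $list->push($value);
}

// Default mode: first pushed to last pushed
$forward = [];
foreach ($list as $value) {
    $forward[] = $value;
}

// Switch direction: last pushed to first pushed
$list->setIteratorMode(SplDoublyLinkedList::IT_MODE_LIFO);
$backward = [];
foreach ($list as $value) {
    $backward[] = $value;
}
```

Neither loop removes anything: the default iterator mode keeps the elements in the list.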

Doubly Linked Lists – Uses

• Stacks
• Queues
• Most-recently used lists
• Undo functionality
• Trees
• Memory Allocators
• Fast dynamic, iterable arrays (not PHP’s hashed arrays)
• iNode maps
• Video frame queues

Doubly Linked Lists – Big-O Complexity

• Insert an element by index O(1)
• Delete an element by index O(1)
• Lookup by index O(n)
  • I have seen people saying that SPLDoublyLinkedList behaves like a hash table for lookups, which would make it O(1); but timing tests prove otherwise
• Access a node at the beginning of the list O(1)
• Access a node at the end of the list O(1)

Doubly Linked Lists

[Diagram: a doubly linked list of five nodes, A B C D E, with the Head and Tail of the list marked]

Doubly Linked Lists

$a = array();

for ($i = 0; $i < $size; ++$i) {
    $a[$i] = $i;
}

//  Random/Indexed access
for ($i = 0; $i < $size; ++$i) {
    $r = $a[$i];
}

//  Sequential access
for ($i = 0; $i < $size; ++$i) {
    $r = array_pop($a);
}

Initialise: 0.0000 s
Set 100,000 Entries: 0.0585 s
Read 100,000 Entries: 0.0378 s
Pop 100,000 Entries: 0.1383 s
Total Time: 0.2346 s

Memory: 644.55 k
Peak Memory: 8457.91 k

Doubly Linked Lists

$a = new \SplDoublyLinkedList();

for ($i = 0; $i < $size; ++$i) {
    $a->push($i);
}

//  Random/Indexed access
for ($i = 0; $i < $size; ++$i) {
    $r = $a->offsetGet($i);
}

//  Sequential access
for ($i = $size-1; $i >= 0; --$i) {
    $a->pop();
}

Initialise: 0.0000 s
Set 100,000 Entries: 0.1514 s
Read 100,000 Entries: 22.7068 s
Pop 100,000 Entries: 0.1465 s
Total Time: 23.0047 s

Memory: 133.67 k
Peak Memory: 5603.09 k

Doubly Linked Lists

• Fast for sequential access
• Lower memory usage
• Traversable in both directions
• Size limited only by memory

• Slow for random/indexed access
• Insert into middle of list only available from PHP 5.5.0
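The mid-list insert mentioned above is the `add()` method. A short sketch (requires PHP 5.5.0 or later; the values are illustrative):

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('A');
$list->push('B');
$list->push('D');

// Insert 'C' at index 2; the value previously at index 2 ('D') moves up
$list->add(2, 'C');

$values = iterator_to_array($list);
```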

SPL DataStructures

Dictionary DataStructures (Maps)

Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues

Tree DataStructures

Stacks

Stacks

• Implemented as a Doubly-Linked List
• LIFO
  • Last-In
  • First-Out
• Essential Operations
  • push()
  • pop()
• Optional Operations
  • count()
  • isEmpty()
  • peek()

Stack – Uses

• Undo mechanism (e.g. In text editors)
• Backtracking (e.g. Finding a route through a maze)
• Call Handler (e.g. Defining return location for nested calls)
• Shunting Yard Algorithm (e.g. Converting Infix to Postfix notation)
• Evaluating a Postfix Expression
• Depth-First Search
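To make one of these uses concrete, here is a minimal sketch of evaluating a postfix expression with SplStack. The function name `evaluatePostfix()` is hypothetical, and the sketch assumes space-separated tokens and the four basic operators only.

```php
<?php
// Hypothetical helper: evaluates a space-separated postfix expression
function evaluatePostfix($expression)
{
    $stack = new SplStack();
    foreach (preg_split('/\s+/', trim($expression)) as $token) {
        if (is_numeric($token)) {
            $stack->push($token + 0);   // numeric string to int/float
            continue;
        }
        // An operator consumes the two most recent operands
        $right = $stack->pop();
        $left  = $stack->pop();
        switch ($token) {
            case '+': $stack->push($left + $right); break;
            case '-': $stack->push($left - $right); break;
            case '*': $stack->push($left * $right); break;
            case '/': $stack->push($left / $right); break;
            default:
                throw new InvalidArgumentException("Unknown operator: $token");
        }
    }
    return $stack->pop();
}

$result = evaluatePostfix('3 4 + 2 *');   // (3 + 4) * 2
```

A malformed expression with too few operands makes `pop()` throw a RuntimeException on the empty stack, which a fuller version would catch and report.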

Stacks – Big-O Complexity

• Push an element O(1)
• Pop an element O(1)

Stacks

class StandardArrayStack {

    private $_stack = array();

    public function count() {
        return count($this->_stack);
    }

    public function push($data) {
        $this->_stack[] = $data;
    }

    public function pop() {
        if (count($this->_stack) > 0) {
            return array_pop($this->_stack);
        }
        return NULL;
    }

    function isEmpty() {
        return count($this->_stack) == 0;
    }

}

Stacks

$a = new \StandardArrayStack();

for ($i = 1; $i <= $size; ++$i) {
    $a->push($i);
}

while (!$a->isEmpty()) {
    $i = $a->pop();
}

PUSH 100,000 ENTRIES
Push Time: 0.5818 s
Current Memory: 8.75

POP 100,000 ENTRIES
Pop Time: 1.6657 s
Current Memory: 2.25

Total Time: 2.2488 s
Current Memory: 2.25
Peak Memory: 8.75

Stacks

class StandardArrayStack {

    private $_stack = array();

    private $_count = 0;

    public function count() {
        return $this->_count;
    }

    public function push($data) {
        ++$this->_count;
        $this->_stack[] = $data;
    }

    public function pop() {
        if ($this->_count > 0) {
            --$this->_count;
            return array_pop($this->_stack);
        }
        return NULL;
    }

    function isEmpty() {
        return $this->_count == 0;
    }

}

Stacks

$a = new \StandardArrayStack();

for ($i = 1; $i <= $size; ++$i) {
    $a->push($i);
}

while (!$a->isEmpty()) {
    $i = $a->pop();
}

PUSH 100,000 ENTRIES
Push Time: 0.5699 s
Current Memory: 8.75

POP 100,000 ENTRIES
Pop Time: 1.1005 s
Current Memory: 1.75

Total Time: 1.6713 s
Current Memory: 1.75
Peak Memory: 8.75

Stacks

$a = new \SPLStack();

for ($i = 1; $i <= $size; ++$i) {
    $a->push($i);
}

while (!$a->isEmpty()) {
    $i = $a->pop();
}

PUSH 100,000 ENTRIES
Push Time: 0.4301 s
Current Memory: 5.50

POP 100,000 ENTRIES
Pop Time: 0.6413 s
Current Memory: 0.75

Total Time: 1.0723 s
Current Memory: 0.75
Peak Memory: 5.50

Stacks

Stack Timings (chart data)

                          StandardArrayStack   StandardArrayStack2   SPLStack
Push Time (s)             0.0796               0.0782                0.0644
Pop Time (s)              0.1244               0.0998                0.0693
Memory after Push (MB)    8.75                 8.75                  5.50

Stacks – Gotchas

• Peek (view an entry from the middle of the stack)

• StandardArrayStack

public function peek($n = 0) {
    // Return NULL when $n reaches past the bottom of the stack
    if ((count($this->_stack) - $n) <= 0) {
        return NULL;
    }
    return $this->_stack[count($this->_stack) - $n - 1];
}

• StandardArrayStack2

public function peek($n = 0) {
    // Return NULL when $n reaches past the bottom of the stack
    if (($this->_count - $n) <= 0) {
        return NULL;
    }
    return $this->_stack[$this->_count - $n - 1];
}

• SPLStack

$r = $a->offsetGet($n);

Stacks – Gotchas

Stack Timings (chart data)

                          StandardArrayStack   StandardArrayStack2   SPLStack
Push Time (s)             0.0075               0.0077                0.0064
Peek Time (s)             0.0111               0.0078                0.1627
Pop Time (s)              0.0124               0.0098                0.0066
Memory after Push (MB)    1.00                 1.00                  0.75

Stacks – Gotchas

• Peek
  • When looking through the stack, SPLStack has to follow each link in the “chain” until it finds the nth entry

SPL DataStructures

Dictionary DataStructures (Maps)

Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues

Tree DataStructures

Queues

Queues

• Implemented as a Doubly-Linked List
• FIFO
  • First-In
  • First-Out
• Essential Operations
  • enqueue()
  • dequeue()
• Optional Operations
  • count()
  • isEmpty()
  • peek()

Queues – Uses

• Job/print/message submissions
• Breadth-First Search
• Request handling (e.g. a Web server)
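As an illustration of the breadth-first search use, the sketch below walks a small graph with an SplQueue. The graph, its node names and the function name `breadthFirst()` are all made up for the example.

```php
<?php
// Hypothetical graph, stored as an adjacency list
$graph = [
    'A' => ['B', 'C'],
    'B' => ['D'],
    'C' => ['D'],
    'D' => [],
];

// Returns the nodes in the order a breadth-first search visits them
function breadthFirst(array $graph, $start)
{
    $visited = [$start => true];
    $order = [];
    $queue = new SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();          // oldest waiting node first
        $order[] = $node;
        foreach ($graph[$node] as $neighbour) {
            if (!isset($visited[$neighbour])) {
                $visited[$neighbour] = true;
                $queue->enqueue($neighbour);
            }
        }
    }
    return $order;
}

$order = breadthFirst($graph, 'A');
```

Swapping the SplQueue for an SplStack (and `enqueue()`/`dequeue()` for `push()`/`pop()`) would turn the same loop into a depth-first search.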

Queues – Big-O Complexity

• Enqueue an element O(1)
• Dequeue an element O(1)

Queues

class StandardArrayQueue {

    private $_queue = array();

    private $_count = 0;

    public function count() {
        return $this->_count;
    }

    public function enqueue($data) {
        ++$this->_count;
        $this->_queue[] = $data;
    }

    public function dequeue() {
        if ($this->_count > 0) {
            --$this->_count;
            return array_shift($this->_queue);
        }
        return NULL;
    }

    function isEmpty() {
        return $this->_count == 0;
    }

}

Queues

$a = new \StandardArrayQueue();

for ($i = 1; $i <= $size; ++$i) {
    $a->enqueue($i);
}

while (!$a->isEmpty()) {
    $i = $a->dequeue();
}

ENQUEUE 100,000 ENTRIES
Enqueue Time: 0.6884 s
Current Memory: 8.75

DEQUEUE 100,000 ENTRIES
Dequeue Time: 335.8434 s
Current Memory: 1.75

Total Time: 336.5330 s
Current Memory: 1.75
Peak Memory: 8.75

Queues

$a = new \SPLQueue();

for ($i = 1; $i <= $size; ++$i) {
    $a->enqueue($i);
}

while (!$a->isEmpty()) {
    $i = $a->dequeue();
}

ENQUEUE 100,000 ENTRIES
Enqueue Time: 0.4087 s
Current Memory: 5.50

DEQUEUE 100,000 ENTRIES
Dequeue Time: 0.6148 s
Current Memory: 0.75

Total Time: 1.0249 s
Current Memory: 0.75
Peak Memory: 5.50

Queues

Queue Timings (chart data)

                             StandardArrayQueue   StandardArrayQueue2   SPLQueue
Enqueue Time (s)             0.0075               0.0080                0.0064
Peek Time (s)                0.0087               0.0070                0.1582
Dequeue Time (s)             0.6284               0.6277                0.0066
Memory after Enqueue (MB)    1.00                 1.00                  0.75

Queues – Gotchas

• Dequeue
  • In standard PHP enumerated arrays, array_shift() and array_unshift() are expensive operations because they re-index the entire array
  • This problem does not apply to SPLQueue
• Peek
  • When looking through the queue, SPLQueue has to follow each link in the “chain” until it finds the nth entry

SPL DataStructures

Dictionary DataStructures (Maps)

Linear DataStructures

Tree DataStructures
• Heaps

Heaps

Heaps

• Ordered Lists
  • Random Input
  • Ordered Output
• Implemented as a binary tree structure
• Essential Operations
  • Insert
  • Extract
  • Ordering Rule
• Abstract that requires extending with the implementation of a compare() algorithm
  • compare() is reversed in comparison with usort compare callbacks
• Partially sorted on data entry
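The "reversed" compare() can be seen by feeding the same comparison logic to usort() and to an SplHeap subclass. This is an illustrative sketch: the class name `LongestFirstHeap` and the word list are made up, and the `: int` return type assumes PHP 7 or later.

```php
<?php
// The same ordering rule, written once as a usort() callback...
$byLength = function ($a, $b) {
    return strlen($a) - strlen($b);
};

// ...and once as an SplHeap compare() method (hypothetical class name)
class LongestFirstHeap extends SplHeap
{
    protected function compare($a, $b): int
    {
        return strlen($a) - strlen($b);
    }
}

$words = ['pear', 'fig', 'banana'];

// usort() puts the "smallest" element first
$sorted = $words;
usort($sorted, $byLength);

// SplHeap extracts the "largest" element first
$heap = new LongestFirstHeap();
foreach ($words as $word) {
    $heap->insert($word);
}
$extracted = [];
while (!$heap->isEmpty()) {
    $extracted[] = $heap->extract();
}
```

With identical comparison code, usort() yields shortest-to-longest while the heap yields longest-to-shortest.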

Heaps – Uses

• Heap sort
• Selection algorithms (e.g. Max, Min, Median)
• Graph algorithms
  • Prim’s Minimal Spanning Tree (connected weighted undirected graph)
  • Dijkstra’s Shortest Path (network or traffic routing)
• Priority Queues
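For the priority queue use, SPL also ships a ready-made heap-based class, SplPriorityQueue, which these slides do not cover in detail. A minimal sketch with made-up tasks and priorities:

```php
<?php
$queue = new SplPriorityQueue();

// insert($value, $priority): higher priorities are extracted first
$queue->insert('write report', 1);
$queue->insert('fix outage', 10);
$queue->insert('reply to email', 5);

$order = [];
while (!$queue->isEmpty()) {
    $order[] = $queue->extract();
}
```

The priorities here are all distinct; the order in which equal-priority items come out is best not relied upon.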

Heaps – Big-O Complexity

• Insert an element O(log n)
• Delete an element O(log n)
• Access root element O(1)

Heaps

Heaps

class ExtendedSPLHeap extends \SPLHeap {

    protected function compare($a, $b) {
        if ($a->latitude == $b->latitude) {
            return 0;
        }
        return ($a->latitude < $b->latitude) ? -1 : 1;
    }

}

$citiesHeap = new \ExtendedSPLHeap();

$file = new \SplFileObject("cities.csv");
$file->setFlags(
    \SplFileObject::DROP_NEW_LINE |
    \SplFileObject::SKIP_EMPTY
);

while (!$file->eof()) {
    $cityData = $file->fgetcsv();
    if ($cityData !== NULL) {
        $city = new \StdClass;
        $city->name = $cityData[0];
        $city->latitude = $cityData[1];
        $city->longitude = $cityData[2];

        $citiesHeap->insert($city);
    }
}

Heaps

echo 'There are ', $citiesHeap->count(),
     ' cities in the heap', PHP_EOL;

echo 'FROM NORTH TO SOUTH', PHP_EOL;
foreach($citiesHeap as $city) {
    echo sprintf(
        "%-20s %+3.4f  %+3.4f" . PHP_EOL,
        $city->name,
        $city->latitude,
        $city->longitude
    );
}

echo 'There are ', $citiesHeap->count(),
     ' cities in the heap', PHP_EOL;

There are 69 cities in the heap
FROM NORTH TO SOUTH
Inverness            +57.4717  -4.2254
Aberdeen             +57.1500  -2.1000
Dundee               +56.4500  -2.9833
Perth                +56.3954  -3.4353
Stirling             +56.1172  -3.9397
Edinburgh            +55.9500  -3.2200
Glasgow              +55.8700  -4.2700
Derry                +54.9966  -7.3086
Newcastle upon Tyne  +54.9833  -1.5833
Carlisle             +54.8962  -2.9316
Sunderland           +54.8717  -1.4581
Durham               +54.7771  -1.5607
Belfast              +54.6000  -5.9167
Lisburn              +54.5097  -6.0374
Armagh               +54.2940  -6.6659
Newry                +54.1781  -6.3357
Ripon                +54.1381  -1.5223

Heaps

class ExtendedSPLHeap extends \SPLHeap {

    const NORTH_TO_SOUTH = 'north_to_south';
    const SOUTH_TO_NORTH = 'south_to_north';
    const EAST_TO_WEST = 'east_to_west';
    const WEST_TO_EAST = 'west_to_east';

    protected $_sortSequence = self::NORTH_TO_SOUTH;

    protected function compare($a, $b) {
        switch($this->_sortSequence) {
            case self::NORTH_TO_SOUTH :
                if ($a->latitude == $b->latitude)
                    return 0;
                return ($a->latitude < $b->latitude) ? -1 : 1;
            case self::SOUTH_TO_NORTH :
                if ($a->latitude == $b->latitude)
                    return 0;
                return ($b->latitude < $a->latitude) ? -1 : 1;
            case self::EAST_TO_WEST :
                if ($a->longitude == $b->longitude)
                    return 0;
                return ($a->longitude < $b->longitude) ? -1 : 1;
            case self::WEST_TO_EAST :
                if ($a->longitude == $b->longitude)
                    return 0;
                return ($b->longitude < $a->longitude) ? -1 : 1;
        }
    }

    public function setSortSequence(
        $sequence = self::NORTH_TO_SOUTH
    ) {
        $this->_sortSequence = $sequence;
    }
}

$sortSequence = \ExtendedSPLHeap::WEST_TO_EAST;
$citiesHeap = new \ExtendedSPLHeap();
$citiesHeap->setSortSequence($sortSequence);

$file = new \SplFileObject("cities.csv");
$file->setFlags(
    \SplFileObject::DROP_NEW_LINE |
    \SplFileObject::SKIP_EMPTY
);

while (!$file->eof()) {
    $cityData = $file->fgetcsv();
    if ($cityData !== NULL) {
        $city = new \StdClass;
        $city->name = $cityData[0];
        $city->latitude = $cityData[1];
        $city->longitude = $cityData[2];

        $citiesHeap->insert($city);
    }
}

Heaps

class ExtendedSPLHeap extends \SPLHeap {

    protected $_longitude = 0;
    protected $_latitude = 0;

    protected function compare($a, $b) {
        if ($a->distance == $b->distance)
            return 0;
        return ($a->distance > $b->distance) ? -1 : 1;
    }

    public function setLongitude($longitude) {
        $this->_longitude = $longitude;
    }

    public function setLatitude($latitude) {
        $this->_latitude = $latitude;
    }

…..

    public function insert($value) {
        $value->distance =
            $this->_calculateDistance($value);
        parent::insert($value);
    }
}

$citiesHeap = new \ExtendedSPLHeap();
// Latitude and Longitude for Brighton
$citiesHeap->setLatitude(50.8300);
$citiesHeap->setLongitude(-0.1556);

$file = new \SplFileObject("cities.csv");
$file->setFlags(
    \SplFileObject::DROP_NEW_LINE |
    \SplFileObject::SKIP_EMPTY
);

while (!$file->eof()) {
    $cityData = $file->fgetcsv();
    if ($cityData !== NULL) {
        $city = new \StdClass;
        $city->name = $cityData[0];
        $city->latitude = $cityData[1];
        $city->longitude = $cityData[2];

        $citiesHeap->insert($city);
    }
}

Heaps – Gotchas

• Compare method is reversed logic from a usort() callback
• Traversing the heap removes elements from the heap
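The second gotcha is easy to demonstrate. In the sketch below (illustrative values, using the built-in SplMinHeap), iterating over a clone empties the clone but leaves the original heap intact, which is one possible way to look through a heap without losing its contents.

```php
<?php
$heap = new SplMinHeap();
foreach ([3, 1, 2] as $number) {
    $heap->insert($number);
}

// Traverse a copy so the original keeps its elements
$copy = clone $heap;
$seen = [];
foreach ($copy as $number) {
    $seen[] = $number;
}

$copyAfter = count($copy);       // the traversed heap is now empty
$originalAfter = count($heap);   // the original still holds every element
```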

SPL – Standard PHP Library

E-Book
Mastering the SPL Library
Joshua Thijssen
Available in PDF, ePub, Mobi

http://www.phparch.com/books/mastering-the-spl-library/

SPL DataStructures

Questions?
