february 12, 2007 walcom '2007 1/22 disktrie: an efficient data structure using flash memory...

22
February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa Akbar M. Kaykobad

Upload: angelica-trousdale

Post on 01-Apr-2015

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 1/22

DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices

N. M. Mosharaf Kabir Chowdhury

Md. Mostofa Akbar

M. Kaykobad

Page 2: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 2/22

Outline

Problem statement Current status and motivation for a new

solution Preliminaries DiskTrie Idea Results Limitations Future directions

Page 3: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 3/22

Problem Statement

Let S be a static set of n unique finite strings with the following operations:

Lookup (str) – check if the string str belong to the set S Prefix-Matching (P) – find all the elements in S that have

the same prefix P

The Problem: An efficient data structure that can operate in low-spec mobile devices and supports this definition

Page 4: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 4/22

Current Status

At present, use of mobile devices and different sensor networks is increasing rapidly

Mobile devices and embedded systems are characterized by – Low processing power Low memory (both internal and external) Low power consumption

Data structures and algorithms addressing these devices has huge application

Page 5: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 5/22

Motivation for a New Solution

Use of external memory is necessary

Popular external memory data structures for computer include String B-tree, Hierarchy of indexes etc.

The problem is still not very well discussed in case of flash memory (Gal and Toledo)

Looking for a more space-efficient (both internal and external) data structure that is still competitive in terms of time efficiency

Page 6: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 6/22

Flash Memory

Common memory that is extensively used in mobile/handheld devices

Unique read/write/erase behavior than other programmable memories

NOR flash memory supports random access and provides byte level addressing

NAND flash memory is faster and provides block level access

Page 7: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 7/22

Trie

A trivial trie is an m-ary tree

Keys are stored in the leaf level; each unique path from the root to a leaf corresponds to a unique key

Its search time can be considered as O(1)

Inner Nodes

Leaf Nodes

AG T

C

Page 8: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 8/22

Binary Trie and Path Compression

11011

100

10

Binary encoding ensures every node to have a maximum degree of two

Depth of the trie increases

Path-compression is used to reduce this

Binary Trie Path-compressed Binary Trie

Page 9: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 9/22

Patricia Trie & LPC Trie

12

2

1

Patricia trie is similar to path-compressed one but needs less memory

Finally, level and path-compressed trie reduces the depth but the trie itself does not remain binary anymore

Nilsson and Tikkanen has shown that an LPC trie has expected average depth of Θ(log*n)

Patricia Trie Level and Path-compressed Trie

2

1 12

Page 10: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 10/22

DiskTrie Idea

Static external memory implementation of the LPC-trie

Pre-build the trie in a computer and then transfer it to flash memory

Three distinct phases – Creation in computer Placement in flash memory Retrieval

Page 11: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 11/22

Creation and Placement

All the strings are lexicographically sorted and placed contiguously in flash memory

Nodes of the DiskTrie are placed separately from the strings and leaf nodes contain pointers to actual strings they represent

Page boundaries are always maintained in case of NAND memory

All the child nodes of a parent node are placed in sequence to reduce the number of pointers

Page 12: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 12/22

Retrieval

Deals with two types of operations: Lookup Prefix-Matching

Lookup starts from the root and proceeds until the search string is exhausted

Each time a single node is retrieved from the disk in case of NOR flash memory and a whole block for NAND type

Page 13: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 13/22

Lookup Algorithm

procedure Lookup (str){- currentNode ← root- while ( str is not exhausted & currentNode is NOT a

leaf node)- Select childNode using str- currentNode ← childNode

- end while

- if ( error )- return false

- end if

- return CompareStrings (str, currentNode→str)}

Page 14: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 14/22

Retrieval (Cont.)

For Prefix-Matching operation, the searching takes place in two phases: Identification of a prospective leaf node to find the

longest common prefix Identification of the sub-trie or tries that contain

the results

Page 15: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 15/22

Illustration of the Prefix-Matching Operation

X

root

P ends here X

root

X0 X7X6X5X4X3X2X1

P ends here

(a) ‘P’ ends in a node (b) ‘P’ ends in an arc

Page 16: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 16/22

Prefix-Matching Algorithm

procedure Prefix-Matching (P){- currentNode ← root- while ( P is not exhausted & currentNode is NOT a leaf

node)- Select childNode using str- currentNode ← childNode

- end while

- if ( error )- return NULL

- end if

- lNode ← left-most node in the probable region- rNode ← right-most node in the probable region

- return all strings in the range}

Page 17: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 17/22

Results- Storage Requirement DiskTrie needs two sets of components to be stored in the external

memory: Actual Strings, and The data structure itself

Linear storage space to store all the key strings

A Patricia trie holding n strings has (2n – 1) nodes

Hence, storage requirement for the total data structure is also linear

While storing the nodes, block boundaries must be maintained. It results in some wastage

Page 18: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 18/22

Results (Cont.)- Complexity of the Operations Lookup

Fetch only those nodes from the disk that are on the path to the goal node

The number of disk accesses is bounded by the depth of the trie, which is in turn Θ(log*n).

log*n is the iterative logarithm function and defined as, log*1 = 0 log*n = 1 + log*(ceil (log n)); for n > 1

Minimal internal memory required

Page 19: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 19/22

Results (Cont.)

Prefix-Matching Probable range of the strings starting with the same prefix

is identified using methods similar to Lookup. It takes Θ(log*n) disk accesses

In case of a successful search, it takes O(n/B) more disk accesses to retrieve the resultant strings if NAND memory is used (B is the block read size)

Sorted placement of the strings saves a lot of string comparisons

Internal memory requirement is minimal

Page 20: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 20/22

Limitations

Wastage of space in each disk block while storing the DiskTrie nodes

In some cases, same disk blocks are accessed more than once

Page 21: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 21/22

Future Directions

More efficient storage management, specially removing the inherent wastage to maintain boundary property

Take advantage of spatial locality

Page 22: February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa

February 12, 2007 WALCOM '2007 22/22