unit 06 : index and distributed caching comp 5323 web database technologies and applications 2014
TRANSCRIPT
![Page 1: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/1.jpg)
Unit 06 : Index and Distributed Caching
COMP 5323Web Database Technologies and
Applications 2014
![Page 2: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/2.jpg)
• This PowerPoint is prepared for educational purpose and is strictly used in the classroom lecturing.
• We have adopted the "Fair Use" doctrine in this PowerPoint which allows limited copying of copyrighted works for educational and research purposes.
Doctrine of Fair Use
![Page 3: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/3.jpg)
Learning Objectives
• Understand different path index techniques to improve the performance.
• Learn a distributed memory caching system which improves the performance of web database applications
![Page 4: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/4.jpg)
Outline
1.Index for Semi-structured data2.Distributed Caching
![Page 5: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/5.jpg)
1 Index for Semi-structured Data
![Page 6: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/6.jpg)
6
Why is Indexing Needed?
• Allows fast access to data by replicating portions of the data in special purpose structures.
• Despite the additional cost (storage, maintenance and complexity) they have shown to be useful in evaluating queries.
![Page 7: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/7.jpg)
Index Types
• Structural index– Accessing all elements of given name– Ancestor-descendant and parent-child
relationship between elements
• Content index– Accessing elements containing given
keywords– Supporting most text search functionalities
![Page 8: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/8.jpg)
Classical Content Index
• Classically based on inverted lists – For each term, gives the
doc.ID + localization• Several variations allows
different search types– Offset, Relative, Proximity
• Generally stored in a B+-Tree to optimize search for a given word
• Size is an important issue– Memory and Disk
• (word, localization)– Fixed entry (word
repeated)
• (word, Frequency, (localization)*)– Variable length entry
Words Localization
- t1 : doc1-100, doc1-300, doc3-200, …
- t2 : doc2-30, doc4-70, …
- t3 : doc4-87, doc5-754, …
Short Reference: http://www.igi-global.com/dictionary/inverted-index/15654
![Page 9: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/9.jpg)
9
Problem with XML
• Support of element addressing– Doc.ID should include
NodeId (Xpath) + Offset• Index size becomes very
large– XPath are long
• Support of typed data– Integer, float, simple types
of XML schema– Requires classical indexes
for certain elements
• Query processing– Structural joins– Text search– Exact search
• Support of updates– Incremental updates
would be a plus
![Page 10: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/10.jpg)
Path-based approach
• Represent XML document into tree or graph structure
• Index XML document directly– Without the support of DTD
• Mainly use the memory as the index storage• Properties– Keep the structural information to improve query
performance– Easy to support query with regular path expression
![Page 11: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/11.jpg)
Different Approaches• Patricia Trie
– Cooper et al. 2001 • DataGuides
– J. McHugh et al., 1997• T-Index
– Tova Milo and Dan Sucin• APEX (Adaptive Path Index for XML Data)
– C. W. Chung et al., 2002• Dewey Structure• K-ary Table• Path Table• OrdPath
![Page 12: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/12.jpg)
Partricia Tries
• A compact representation of a trie in which any node that is an only child is merged with its parent.
• Also known as radix tree
![Page 13: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/13.jpg)
Partricia Tries
• Cooper et al. 2001 • Idea:– Partitioned Partricia Tries to index strings– Encode XPath expressions as strings
(encode names, encode atomic values)
<book> <author>Whoever</author> <author>Not me</author> <title>No Kidding</title></book>
B A 1 WhoeverB A 2 Not meB T No Kidding
![Page 14: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/14.jpg)
DataGuides
• World-Wide Web demonstrates that much of the information available online is semistructured.
• Graph-based data model called OEM, for Object Exchange Model
![Page 15: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/15.jpg)
A sample OEM database
![Page 16: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/16.jpg)
Lore Language (Lorel Query)
• Select Restaurant.Entree – returns all entrees served by any restaurant,
the set of objects {6, 10, 11}
• Select Restaurant.Name• where Restaurant.Entree = “Burger”– The answer to the query is the single object 5.
![Page 17: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/17.jpg)
DataGuide
![Page 18: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/18.jpg)
T-index
![Page 19: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/19.jpg)
1-Index
1-Index DataGuides
![Page 20: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/20.jpg)
2-Index
![Page 21: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/21.jpg)
APEX
![Page 22: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/22.jpg)
Representation of XML Data Structure
![Page 23: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/23.jpg)
DataGuide
![Page 24: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/24.jpg)
APEX
HAPEX GAPEX
![Page 25: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/25.jpg)
Dewey - Structure• Each node is assigned a label that represents the path from
the document’s root to the node.• Each component of the label represents the local order of an
ancestor node.• Nodes with the same number of delimiters (“.”) in their label
are in the same level.Bib
book paper
paperauthor
Tim Sarah
author
(0)(0)
(0.0)(0.0)
(0.0.0)(0.0.0)
(0.0.0.0)(0.0.0.0)
(0.1)(0.1)
(0.2)(0.2)
(0.2.0)(0.2.0)
(0.2.0.0)(0.2.0.0)
![Page 26: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/26.jpg)
Dewey – Supported Queries (1/3)
• Ancestors / Descendants– Node “X” is an ancestor of node “Y” if the label of
node “X” is a substring of the label of node “Y”.
Bib
book paper
paperauthor
Tim Sarah
author
(0)(0)
(0.0)(0.0)
(0.0.0)(0.0.0)
(0.0.0.0)(0.0.0.0)
(0.1)(0.1)
(0.2)(0.2)
(0.2.0)(0.2.0)
(0.2.0.0)(0.2.0.0)
![Page 27: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/27.jpg)
Dewey – Supported Queries (2/3)• Parent / Child– Node “X” is parent of node “Y” if:
- The label of node “X” is a substring of the label of node “Y”
- And frags(X) = frags(Y) – 1, where frags(X) is the number of delimiters of the label of node X and frags(Y) is the number of delimiters of label of node Y.
Bib
book paper
paperauthor
Tim Sarah
author
(0)(0)
(0.0)(0.0)
(0.0.0)(0.0.0)
(0.0.0.0)(0.0.0.0)
(0.1)(0.1)
(0.2)(0.2)
(0.2.0)(0.2.0)
(0.2.0.0)(0.2.0.0)
![Page 28: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/28.jpg)
Dewey – Supported Queries (3/3)• Siblings– Nodes “X” and “Y” are siblings if:
- They have the same number of delimiters in their labels - And X.prefix = Y.prefix, where prefix is the label of the
node without its positional identifier
Bib
book paper
paperauthor
Tim Sarah
author
(0)(0)
(0.0)(0.0)
(0.0.0)(0.0.0)
(0.0.0.0)(0.0.0.0)
(0.1)(0.1)
(0.2)(0.2)
(0.2.0)(0.2.0)
(0.2.0.0)(0.2.0.0)
![Page 29: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/29.jpg)
Dewey – Updates• Insertion of new node– The label of the nodes in the subtree rooted at the
following sibling need to be updated– O(n) nodes need relabeling, where n is the number of
nodes of the XML fileBib
book paper
paperauthor
TimSarah
author
(0)(0)
(0.0)(0.0)
(0.0.0)(0.0.0)
(0.0.0.0)(0.0.0.0)
(0.1)(0.1)
(0.2)(0.2)
(0.2.0)(0.2.0)
(0.2.0.0)(0.2.0.0)
paper(0.2)(0.2)
(0.3)(0.3)
(0.3.0)(0.3.0)
(0.3.0.0)(0.3.0.0)
![Page 30: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/30.jpg)
Dewey• Not efficient for dynamic XML files with many updates– Need to re-label many nodes
• As the depth of the tree increases:– Label size of a node increases rapidly • Storage size increases rapidly
– It becomes more costly to infer the supported queries between any two nodes (the string prefix matching becomes longer)
• Overflow problem– The original fixed length of bits assigned to store the
size of the label is not enough.
![Page 31: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/31.jpg)
Document Tree
real node
virtual node
aa
3-ary tree
cc cc
dd ee ff gg
hh ii jj
ee ee ee ee ee
ee
ee
• Lee et al, ACM DL 1996.• Represent each document as a k-ary complete tree and assign a UID to each node
![Page 32: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/32.jpg)
K-ary table
• Each document is assigned k, which is the maximum number of siblings in the document tree.
• Each element has an entry (row) in the K-ary table
• When a query is issued, the result set has pointers to the K-ary table.
![Page 33: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/33.jpg)
Level and Element Type Number
• Level– Level means the level in the document tree– It gives a clue how many parent function is applied to
get to a target element• Element type number– A unique number is assigned to each element type in
DTD It enables to filter out unnecessary elements and accumulate the correct frequencies
• Element location– The unique position of an element instance in a
document tree
![Page 34: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/34.jpg)
Element Labeling
Document(1)
Para(5)
Abstract(3)
Chapter(4)
Section(6)
Para(7)
Title(2)
![Page 35: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/35.jpg)
UID
element UID element UID
D1 C1 C2 S1 S2
1 2 3 4 5
S3 S4 P1 P2 P3
8 9 14 15 16
Result of assigning UIDs
parent(i) = [(i-2)/k+1]
dd
3-ary tree
cc cc
ss ss ss ss
pp pp pp
ee ee ee ee ee
ee
• Unique element identifier
![Page 36: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/36.jpg)
36
XML Index Path Table (Oracle)
BaseRid Path OrderKey
Value Locator NumValue
Rid1 po
Rid1 po.data 1 11
Rid1 po.data.item
1.1 “foo” 17
Rid1 po.data.pkg
1.2 “123” 32 123
Rid1 po.data.item
1.3 “bar” 46
<po> <data> <item>foo</item> <pkg>123</pkg> <item>bar</item> </data></po>
Some Typos
![Page 37: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/37.jpg)
37
OrdPath
• ORDPATHs: Insert-Friendly XML Node Labels– Patrick O’Neil, Elizabeth O’Neil1, Shankar Pal,
Istvan Cseri, Gideon Schaller, Nigel Westbury– SIGMOD 2004– SQL Server 2005 implementation
![Page 38: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/38.jpg)
38
OrdPath
• Aims to provide efficient insertion at any position of an XML tree, and also supports extremely high performance query plans for native XML queries.
• Tree modifications– new may be inserted– sub-trees be deleted– sub-trees may be moved around within the
tree
![Page 39: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/39.jpg)
39
OrdPath• Encodes the parent-child relationship by extending the
parent’s ORDPATH label with a component for the child. – E.g.: 1.5.3.9 might be the parent ORDPATH, 1.5.3.9.1 the
child. • The various child components reflect the children’s relative
sibling order, so that byte-by-byte comparison of the ORDPATH labels of two nodes yields the proper document order.
• A new node (possibly a root node of a sub-tree) can be inserted under any designated parent node in an existing tree. – Its label is generated using an additional intermediate
“careting” component that falls between the components of its left and right siblings.
![Page 40: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/40.jpg)
40
OrdPath• At the beginning– Only positive, odd integers are assigned during an initial
load; even-numbered and negative integer component values are reserved for later insertions into an existing tree
• Inserting in the middle– Even numbers are used as carets only. Do not count as
components that increase the depth of the nodes.– E.g. new nodes in between 3.5.5 and 3.5.7
• New siblings: 3.5.6.1, 3.5.6.2, …• A subtree: 3.5.6.1, 3.5.6.1.1, 3.5.6.3, 3.5.6.3.1, 3.5.6.3.3,
3.5.6.3.3.1, 3.5.6.3.3.3, 3.5.6.3.5, 3.5.6.5, 3.5.6.5.1
![Page 41: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/41.jpg)
XML
![Page 42: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/42.jpg)
42
ORDPATH Label of Nodes
BOOK1
Section1.3
Figure1.3.5
Title1.3.1
Section1.5
Title1.5.1
Figure1.5.5
@ISBN1.1
CAPTION1.3.5.1
Nobody…1.3.3
tree frogs1.5.7
All right…1.5.3
![Page 43: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/43.jpg)
43
Infoset Table
![Page 44: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/44.jpg)
2 Memcached
![Page 45: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/45.jpg)
What is memcached briefly?
• memcached is a high-performance, distributed memory object caching system, generic in nature
• It is a key-based cache daemon that stores data and objects wherever dedicated or spare RAM is available for very quick access
• It is a dumb distributed hash table. It does not provide redundancy, failover or authentication. If needed the client has to handle that.
![Page 46: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/46.jpg)
Why was memcached made?
• It was originally developed by Danga Interactive to enhance the speed of LiveJournal.com
• It dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss
• http://www.danga.com/memcached/
![Page 47: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/47.jpg)
47
Memcached
![Page 48: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/48.jpg)
Where does memcached reside?
• Memcache is not part of the database but sits outside it on the server(s).
• Over a pool of servers
![Page 49: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/49.jpg)
Architecture
![Page 50: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/50.jpg)
When should I use memcached?
• When your database is optimized to the hilt and you still need more out of it.– Lots of SELECTs are using resources that could be better
used elsewhere in the DB.– Locking issues keep coming up
• When table listings in the query cache are torn down so often it becomes useless
• To get maximum “scale out” of minimum hardware
![Page 51: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/51.jpg)
Hit-rate Management
• anything what is more expensive to fetch from elsewhere, and has sufficient hitrate, can be placed in memcached– How often will object or data be used?– How expensive is it to generate the data?– What is the expected hitrate?– Will the application invalidate the data itself, or will TTL be
used? – How much development work has to be done to embed it?
![Page 52: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/52.jpg)
Why use memcached?
• To reduce the load on the database by caching data BEFORE it hits the database
• Can be used for more then just holding database results (objects) and improve the entire application response time
• Feel the need for speed– Memcache is in RAM - much faster then hitting
the disk or the database
![Page 53: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/53.jpg)
Why not use memcached?
• Memcache is held in RAM. This is a finite resource.
• Adding complexity to a system just for complexities sake is a waste. If the system can respond within the requirements without it - leave it alone
![Page 54: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/54.jpg)
What are the limits of memcached?
• Keys can be no more then 250 characters• Stored data can not exceed 1M (largest typical
slab size)• There are generally no limits to the number of
nodes running memcache• There are generally no limits the the amount of
RAM used by memcache over all nodes– 32 bit machines do have a limit of 4GB though
![Page 55: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/55.jpg)
Platform
• You can build and install memcached from the source code directly, or you can use an existing operating system package or installation.– on a RedHat, Fedora or CentOS host, use yum:
• root-shell> yum install memcached– on a Debian or Ubuntu host, use apt-get:
• root-shell> apt-get install memcached– on a Gentoo host, use emerge:
• root-shell> emerge install memcached– on OpenSolaris, use the pkg for SUNWmemcached:
• root-shell> pkg install SUNWmemcached
![Page 56: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/56.jpg)
Port 11211
• Get the source from the website: http://www.danga.com/memcached/download.bml– Memcache has a dependancy on libevent so make
sure you have that also.
• Decompress, cd into the dir• ./configure;make;make install;• Memcached listens on port 11211 by default,
this can be changed with –p option.
![Page 57: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/57.jpg)
How do I start memcached?
• Memcached can be run as a non-root user if it will not be on a restricted port (<1024) - though the user can not have a memory limit restriction
• shell> memcached • Default configuration - Memory: 64MB, all
network interfaces, port:11211, max simultaneous connections: 1024
![Page 58: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/58.jpg)
Memcached options
• You can change the default configuration with various options.– -u <user> : run as user if started as root– -m <num> : maximum <num> MB memory to use for items
• If more then available RAM - will use swap• Don’t forget 4G limit on 32 bit machines
– -d : Run as a daemon– -l <ip_addr> : Listen on <ip_addr>; default to INDRR_ANY– -p <num> : port
![Page 59: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/59.jpg)
How can I connect to memcached?
• Memcached uses a protocol that many languages implement with an API.
• Languages that implement it:– Perl, PHP, Python, Ruby, Java, C#, C, Lua, Postgres,
MySQL, Chicken Scheme
• And yes - because it is a protocol you can even use telnet– shell> telnet localhost 11211
![Page 60: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/60.jpg)
Memcached protocol
• Three types of commands– Storage - ask the server to store some data
identified by a key• set, add, replace, append, prepend and cas
– Retrieval - ask the server to retrieve data corresponding to a set of keys• get, gets
![Page 61: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/61.jpg)
Memcached protocol (con’t)– All others that don’t involve unstructured data • Deletion:delete • Statistics: stats, • flush_all: always succeeds, invalidate all existing items
immediately (by default) or after the expiration specified.• version, verbosity, quit
![Page 62: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/62.jpg)
PHP and Memcached
• Make sure you have a working Apache/PHP install
• PHP has a memcached extension available through pecl.
• Installation:– shell> pecl install memcache
• Make sure the pear is installed (debian: apt-get install php-pear)
• Make sure that you also have php5-dev installed for phpize.– shell> apt-get install php5-dev
![Page 63: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/63.jpg)
PHP Script Example• Information about the PHP API at
http://www.php.net/memcache
<?php // make a memcache object $memcache = new Memcache; // connect to memcache $memcache->connect('localhost', 11211) or die ("Could not connect"); //get the memcache version $version = $memcache->getVersion(); echo "Server's version: ".$version."<br/>\n";
![Page 64: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/64.jpg)
PHP Script (con’t)// test data$tmp_object = new stdClass;$tmp_object->str_attr = 'test';$tmp_object->int_attr = 123;// set the test data in memcache$memcache->set('key', $tmp_object, false, 10) or die ("Failed to
save data at the server");echo "Store data in the cache (data will expire in 10 seconds)<br/>\
n";// get the data$get_result = $memcache->get('key');echo "Data from the cache:<br/>\n";echo ‘<pre>’, var_dump($get_result), ‘</pre>’;
MEMCACHE_COMPRESSED
![Page 65: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/65.jpg)
PHP Script (con’t)// modify the data$tmp_object->str_attr = ‘boo’;$memcache->replace(‘key’, $tmp_object, false, 10) or die(“Failed to
save new data to the server<br/>\n”);Echo “Stored data in the cache changed<br/>\n”;// get the new data$get_result = $memcache->get(‘key’);Echo “New data from the cache:<br/>\n”;Echo ‘<pre>’, var_dump($get_result), “</pre>\n”;// delete the data$memcache->delete(‘key’) or die(“Data not deleted<br/>\n”);
![Page 66: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/66.jpg)
MySQL’s memcached
• The API is consistent with the other API’s– Connect: mysql> SELECT
memc_servers_set('192.168.0.1:11211, 192.168.0.2:11211');• The list of servers used by the memcached UDFs is not
persistent over restarts of the MySQL server.– Set: mysql> SELECT memc_set('myid', 'myvalue');– Retreive: mysql> SELECT memc_get('myid');
![Page 67: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/67.jpg)
Possible ways to secure memcached
• It has no authentication system - so protection is important
• Run as a non-priveledged user to minimize potential damage
• Specify the ip address to listen on using -l– 127.0.0.1, 192.168.0.1, specific ip address
• Use a non-standard port• Use a firewall
![Page 68: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/68.jpg)
Memcached
![Page 69: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/69.jpg)
69
• Slab AllocationSlab Allocation: : When you start to store data into the cache, memcached does not allocate the memory for the data on an item by item basis. Instead, a slab allocation is used to optimize memory usage and prevent memory fragmentation when information expires from the cache.
• Lazy Expiration + LRULazy Expiration + LRU• Lazy ExpirationLazy Expiration: : When an item is requested (a get request)
Memcached checks the expiration time to see if the item is still valid before returning it to the client.
• LRU LRU ((least recently used): Memcached is LRU per slab class, : Memcached is LRU per slab class, but not globally LRU. but not globally LRU.
Memcached Memory Management
![Page 70: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/70.jpg)
70
Slab Allocation
![Page 71: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/71.jpg)
71
Memcached Distributed Architecture
////Obtain a server ID based on Obtain a server ID based on Key Key valuevalueint getServerId(char *key, int serverTotal) {int getServerId(char *key, int serverTotal) { int c, hash = 0;int c, hash = 0; while (c = *key++) {while (c = *key++) { hash += c;hash += c; }} return hash % serverTotal;return hash % serverTotal;}}
////a list of serversa list of serversnode[0] => 192.168.0.1:11211node[0] => 192.168.0.1:11211node[1] => 192.168.0.2:11211node[1] => 192.168.0.2:11211node[2] => 192.168.0.3:11211node[2] => 192.168.0.3:11211
////get id get id int id = getServerId("test", 3);int id = getServerId("test", 3);
////get ip address and port numberget ip address and port numbernode[id] == node[1]node[id] == node[1]
![Page 72: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/72.jpg)
Memcached Distributed Architecture
![Page 73: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/73.jpg)
SQL Server Cache
• SQL Server cache mechanism :- Query plans- pages from the database files
• but it does NOT cache:- exact results from a query
REFERENCEhttp://searchsqlserver.techtarget.com/tip/SQL-Server-memory-configurations-for-procedure-cache-and-buffer-cache
![Page 74: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/74.jpg)
Reference
• DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
• Index Structure for Path Expressions• APEX: An Adaptive Path Index for XML Data
![Page 75: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/75.jpg)
Reference Only• https://blog.couchbase.com/memcached-144-
windows-32-bit-binary-now-available• Memcached should run on Linux. It may not
work on some windows
![Page 76: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/76.jpg)
Reference
![Page 77: Unit 06 : Index and Distributed Caching COMP 5323 Web Database Technologies and Applications 2014](https://reader033.vdocuments.site/reader033/viewer/2022051018/56649de55503460f94add77e/html5/thumbnails/77.jpg)
Reference
• 0: <flag> • 60: timeout• 6: length