TDX: a High-Performance Table-Driven XML Parser
Wei Zhang
Robert van Engelen
Department of Computer Science
Florida State University
Outline
• Motivation
• Introduction
• Recent Work
• Table-Driven XML Parsing – TDX
• TDX Construction Toolkit
• Results and Preliminary Conclusion
Motivation
• Enhance performance for XML-based Web Services
• Provide flexibility
• Offer high-level modularity
Introduction
• Validating XML parsing
• Three stages
  • Well-formedness
  • Validation
  • Data conversion
• Separating the stages introduces overhead and requires frequent access to the schema
[Figure: XML input passing through the well-formedness, validation, and data conversion stages to the application]
Introduction (cont'd)
• Schema-specific XML parsing (SSP)
  • Merges well-formedness checking and validation
  • Does not require frequent access to the schema
  • Data conversion remains a separate stage in implemented SSP parsers
[Figure: the well-formedness, validation, and data conversion stages under SSP]
Recent Work
• Chiu: “A compiler-based approach to schema-specific XML parsing”
  • Merges parsing and validation by constructing a PDA
  • No namespace support
  • Conversion from NFA to DFA may result in exponentially growing space requirements
Recent Work (cont'd)
• van Engelen: “Constructing finite automata for high-performance web services”
  • Integrates parsing and validation into one stage, with parsing actions encoded by a DFA
  • Cannot process cyclic XML schemas
Recent Work (cont'd)
• van Engelen: “The gSOAP toolkit for web services and peer-to-peer computing networks”
  • Namespace support
  • Merges parsing and validation
  • Implements recursive-descent parsing
  • Disadvantages of recursive descent
    • Code size and function calling overhead
Table-Driven XML Parsing (TDX)
• An LL(1) grammar can be derived from the schema
• XML documents can be parsed and validated using the LL(1) grammar
• Well-formedness (parsing) can be verified through grammar rules
• Validation can be accomplished using semantic actions
• Application-specific events can also be encoded as semantic actions
Illustrating Example
<schema>
  <element name="book" type="bookType"/>
  <complexType name="bookType">
    <sequence>
      <element name="title" type="string"/>
      <element name="author" type="string"/>
    </sequence>
  </complexType>
</schema>
LL(1) Grammar:
s  → ‘<book>’ t ‘</book>’
t  → t1 t2
t1 → ‘<title>’ DATA //imp_s(s.val) ‘</title>’
t2 → ‘<author>’ DATA //imp_s(s.val) ‘</author>’
Illustrating Example (cont'd)
(a) An XML Instance
<book>
  <title>
    XML Tech
  </title>
  <author>
    Bob
  </author>
</book>
(b) Predictive Parsing
s
  ‘<book>’
  t
    t1
      ‘<title>’
      DATA  imp_s(“XML Tech”)
      ‘</title>’
    t2
      ‘<author>’
      DATA  imp_s(“Bob”)
      ‘</author>’
  ‘</book>’
Roadmap
• Recent Work
• Table-Driven XML parsing – TDX
  • Illustrating example
  • Architecture
  • Token generation
  • Mapping schema to LL(1)
  • Parsing table
  • Parsing engine
  • Scanner/tokenizer
• TDX construction Tool Kit
• Experiment Results and Preliminary Conclusion
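TDX - Architecture
[Figure: architecture diagram. Components: <XML> input; Scanner/Tokenizer (DFA) producing tokens and CDATA; Parsing Engine (TDX) driven by the LL(1) parsing table and the LL(1) grammar productions and actions; events delivered to application modules; “Error: invalid” on failure]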
Token Generation
• Defined by
  • <namespace, tag>
    • Element name (opening and closing)
    • Attribute name
  • Some data types
    • Such as enumeration
• Namespace binding
  • Identical tag names under different namespaces are represented as different tokens
• Normalized tokens
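A minimal sketch of how <namespace, tag> pairs could be normalized into tokens, assuming each distinct pair simply receives its own integer id. The function name `build_token_table` and the id scheme are hypothetical and not taken from TDX.

```python
from typing import Dict, List, Tuple

TokenKey = Tuple[str, str]  # (namespace URI, tag name)


def build_token_table(names: List[TokenKey]) -> Dict[TokenKey, int]:
    """Assign one integer id per (namespace, tag) pair (hypothetical scheme)."""
    table: Dict[TokenKey, int] = {}
    for namespace, tag in names:
        # Each element name contributes an opening and a closing token.
        for form in (tag, "/" + tag):
            table.setdefault((namespace, form), len(table))
    return table


# The same tag name under two namespaces yields different tokens.
tokens = build_token_table([
    ("http://www.x.org", "title"),
    ("http://www.y.org", "title"),
])
```

Keying on the pair rather than on the tag alone is what keeps `title` in one namespace distinct from `title` in another.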
Mapping Schema to LL(1) Grammar
• Structural constraints are mapped to rules
• Validation constraints are mapped to semantic actions
  • Note that many types of validation constraints are mapped to rules
    • Such as occurrence, enumeration
Mapping Example (1)
<simpleType name="state">
  <restriction base="string">
    <enumeration value="OFF"/>
    <enumeration value="ON"/>
  </restriction>
</simpleType>
state → “OFF” | “ON”
<simpleType name="value">
  <restriction base="integer">
    <minInclusive value="10"/>
    <maxInclusive value="250"/>
  </restriction>
</simpleType>
value → DATA //imp_i(char *s)
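The two mappings can be sketched as follows, under the assumption that the range restriction is checked inside the semantic action while the enumeration is handled by matching terminals. The slide only gives the C-style signature `imp_i(char *s)`; the Python signature, the default bounds as parameters, and `match_state` are illustrative assumptions.

```python
def imp_i(text: str, lo: int = 10, hi: int = 250) -> int:
    """Hypothetical semantic action: convert DATA to an integer and
    enforce the minInclusive/maxInclusive facets of the 'value' type."""
    value = int(text.strip())  # raises ValueError for non-integer data
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    return value


# Enumeration values become terminals of the rule: state -> "OFF" | "ON"
STATE_TERMINALS = {"OFF", "ON"}


def match_state(text: str) -> str:
    """Accept only the terminals listed by the enumeration."""
    if text not in STATE_TERMINALS:
        raise ValueError(f"{text!r} is not a valid state")
    return text
```

For instance, `imp_i("42")` returns the converted integer, while `imp_i("251")` raises because the value violates `maxInclusive`.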
Mapping Example (2)
<complexType name="example">
  <choice>
    <element name="id" type="id_type" minOccurs="0"/>
    <element name="value" type="value_type" minOccurs="2" maxOccurs="unbounded"/>
  </choice>
</complexType>
example → c1 | c2
c1   → ‘<id>’ id_type ‘</id>’
c1   → ε
c2   → c’2 c’2 c’’2
c’2  → ‘<value>’ value_type ‘</value>’
c’’2 → ε
c’’2 → c’2 c’’2
With <sequence> in place of <choice>:
example → c1 c2
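Occurrence constraints such as minOccurs="2" maxOccurs="unbounded" can be turned into productions mechanically. The sketch below is one possible expansion, assuming an empty right-hand side stands for ε; `expand_occurs` and the `_tail` naming are hypothetical and not part of the TDX toolkit.

```python
from typing import Dict, List, Optional

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternative right-hand sides


def expand_occurs(name: str, item: str, min_occurs: int,
                  max_occurs: Optional[int]) -> Grammar:
    """Productions for `name` matching `item` between min_occurs and
    max_occurs times (None means unbounded). [] denotes epsilon."""
    tail = name + "_tail"
    # The mandatory repetitions come first, followed by the optional tail.
    grammar: Grammar = {name: [[item] * min_occurs + [tail]]}
    if max_occurs is None:
        grammar[tail] = [[], [item, tail]]  # tail -> epsilon | item tail
        return grammar
    current = tail
    for k in range(max_occurs - min_occurs):
        following = f"{tail}{k + 1}"
        grammar[current] = [[], [item, following]]  # one more optional item
        current = following
    grammar[current] = [[]]  # no further repetitions allowed
    return grammar


# minOccurs="2" maxOccurs="unbounded" for the <value> element
value_rules = expand_occurs("c2", "c'2", 2, None)
```

Here `c2_tail` plays the role of c’’2 on the slide: two mandatory occurrences followed by a right-recursive, possibly empty tail.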
LL(1) Parsing Table
• Constructed from the LL(1) grammar
• Indexed by nonterminals and terminals
• Contains either the index of a grammar production or an error entry
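The table construction can be illustrated with the textbook FIRST/FOLLOW method. This is a sketch of the standard algorithm, not the toolkit's own code. The grammar is a simplified form of Mapping Example (2), assuming DATA stands in for id_type and value_type and that the optional and repeated parts use ε-productions (empty right-hand sides); missing table entries act as error entries.

```python
from typing import Dict, List, Set, Tuple

EPS, END = "ε", "$"
Production = Tuple[str, List[str]]

# Hypothetical encoding; the list index is the production index.
PRODUCTIONS: List[Production] = [
    ("example", ["c1"]),
    ("example", ["c2"]),
    ("c1", ["<id>", "DATA", "</id>"]),
    ("c1", []),
    ("c2", ["v", "v", "rest"]),
    ("v", ["<value>", "DATA", "</value>"]),
    ("rest", []),
    ("rest", ["v", "rest"]),
]


def first_of(seq: List[str], first: Dict[str, Set[str]]) -> Set[str]:
    """FIRST set of a symbol sequence; symbols not in `first` are terminals."""
    out: Set[str] = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol was nullable (or the sequence is empty)
    return out


def build_table(productions: List[Production],
                start: str) -> Dict[Tuple[str, str], int]:
    first: Dict[str, Set[str]] = {lhs: set() for lhs, _ in productions}
    follow: Dict[str, Set[str]] = {lhs: set() for lhs, _ in productions}
    follow[start].add(END)
    changed = True
    while changed:  # iterate FIRST and FOLLOW to a fixed point
        changed = False
        for lhs, rhs in productions:
            size = len(first[lhs])
            first[lhs] |= first_of(rhs, first)
            changed |= len(first[lhs]) != size
            for i, sym in enumerate(rhs):
                if sym not in first:
                    continue  # terminals have no FOLLOW set
                tail = first_of(rhs[i + 1:], first)
                extra = (tail - {EPS}) | (follow[lhs] if EPS in tail else set())
                size = len(follow[sym])
                follow[sym] |= extra
                changed |= len(follow[sym]) != size
    table: Dict[Tuple[str, str], int] = {}
    for index, (lhs, rhs) in enumerate(productions):
        heads = first_of(rhs, first)
        lookaheads = (heads - {EPS}) | (follow[lhs] if EPS in heads else set())
        for terminal in lookaheads:
            if (lhs, terminal) in table:
                raise ValueError(f"not LL(1): conflict at ({lhs}, {terminal})")
            table[(lhs, terminal)] = index
    return table


table = build_table(PRODUCTIONS, "example")
```

A lookup such as `table.get(("c2", "<id>"))` returns `None`, which corresponds to an error entry for that nonterminal and terminal.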
Parsing Engine
• Schema independent
• Maintains
  • Parsing table
  • Production table
  • Action table
  • Stack
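A minimal sketch of such an engine, assuming it receives the three tables as data so that the loop itself contains nothing schema-specific. The tables below encode the book grammar of the illustrating example; the `#name` convention for action symbols, the `(kind, text)` token shape, and the function name `parse` are assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]  # (terminal, text); text is only meaningful for DATA

# Production table: index -> right-hand side ("#name" marks a semantic action).
PRODUCTIONS: Dict[int, List[str]] = {
    0: ["<book>", "t", "</book>"],
    1: ["t1", "t2"],
    2: ["<title>", "DATA", "#imp_s", "</title>"],
    3: ["<author>", "DATA", "#imp_s", "</author>"],
}
# Parsing table: (nonterminal, lookahead) -> production index.
PARSE_TABLE: Dict[Tuple[str, str], int] = {
    ("s", "<book>"): 0,
    ("t", "<title>"): 1,
    ("t1", "<title>"): 2,
    ("t2", "<author>"): 3,
}


def parse(tokens: List[Token],
          parse_table: Dict[Tuple[str, str], int],
          productions: Dict[int, List[str]],
          actions: Dict[str, Callable[[str], None]],
          start: str = "s") -> None:
    """Generic LL(1) driver: all schema knowledge lives in the tables."""
    nonterminals = {nt for nt, _ in parse_table}
    stack = [start]
    pos, last_text = 0, ""
    while stack:
        top = stack.pop()
        if top.startswith("#"):  # semantic action: fire with the last DATA text
            actions[top[1:]](last_text)
            continue
        kind, text = tokens[pos] if pos < len(tokens) else ("$", "")
        if top in nonterminals:
            index = parse_table.get((top, kind))
            if index is None:  # error entry in the parsing table
                raise ValueError(f"invalid: {kind} cannot start {top}")
            stack.extend(reversed(productions[index]))
        elif top == kind:  # terminal on the stack matches the lookahead
            last_text = text
            pos += 1
        else:
            raise ValueError(f"invalid: expected {top}, found {kind}")
    if pos != len(tokens):
        raise ValueError("invalid: trailing input")


seen: List[str] = []
book_tokens: List[Token] = [
    ("<book>", ""), ("<title>", ""), ("DATA", "XML Tech"), ("</title>", ""),
    ("<author>", ""), ("DATA", "Bob"), ("</author>", ""), ("</book>", ""),
]
parse(book_tokens, PARSE_TABLE, PRODUCTIONS, {"imp_s": seen.append})
```

Supporting another schema in this sketch means supplying different tables and actions; the `parse` function stays unchanged.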
Scanner/Tokenizer
• Constructed from schema
• Schema provides DFA state information
  • Element name
    • Has attribute?
  • Attribute name
• Root element needs special care
  • Schema information
Scanner/Tokenizer example
<book xmlns:x="http://www.x.org"
      xmlns:y="http://www.y.org"
      targetnamespace="http://www.x.org">
  <title>XML Bible</title>
  <author>
    <name> Bob </name>
    <y:title> professor</y:title>
  </author>
</book>
Tokens:
• <title> yields <"www.x.org", "title">
• XML Bible yields DATA
• </title> yields <"www.x.org", "/title">
• <y:title> yields <"www.y.org", "title">
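The prefix resolution in this example can be sketched with a small regular-expression scanner. It is only an illustration: TDX generates a flex-based DFA scanner, whereas this sketch uses Python's `re` module, keeps full namespace URIs in the tokens, treats the slide's `targetnamespace` attribute as the default namespace, ignores namespace scoping, and does not handle self-closing tags, comments, or entities. The function name `scan` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Either a tag (optional "/", qualified name, raw attribute text) or character data.
TAG_OR_TEXT = re.compile(r"<(/?)([\w.:-]+)([^>]*)>|([^<]+)")
ATTRIBUTE = re.compile(r'([\w.:-]+)\s*=\s*"([^"]*)"')


def scan(xml: str) -> List[Tuple[str, str]]:
    """Emit (namespace, tag) tokens for tags and ("DATA", text) for content."""
    bindings: Dict[str, str] = {}  # prefix -> namespace URI ("" is the default)
    tokens: List[Tuple[str, str]] = []
    for close, qname, attrs, text in TAG_OR_TEXT.findall(xml):
        if text:
            if text.strip():  # skip whitespace between elements
                tokens.append(("DATA", text.strip()))
            continue
        for name, value in ATTRIBUTE.findall(attrs):
            if name.startswith("xmlns:"):
                bindings[name[len("xmlns:"):]] = value
            elif name in ("xmlns", "targetnamespace"):  # assumption, see lead-in
                bindings[""] = value
        prefix, _, local = qname.rpartition(":")
        # An unbound prefix raises KeyError, i.e. the document is rejected.
        tokens.append((bindings[prefix], close + local))
    return tokens


document = (
    '<book xmlns:x ="http://www.x.org" xmlns:y ="http://www.y.org" '
    'targetnamespace ="http://www.x.org"> <title>XML Bible</title> '
    '<author> <name> Bob </name> <y:title> professor</y:title> </author></book>'
)
scanned = scan(document)
```

In the resulting list, the unprefixed `title` and the prefixed `y:title` carry different namespace URIs, so a parser keyed on the pair treats them as different terminals.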
TDX Construction Toolkit
[Figure: toolkit workflow. Labels: Service.wsdl, wsdl2TDX, Service_flex.l, Service_TDX.h, Service_TDX.c, flex, tab.yy.c]
Experiment Setup
• Compared with
  • DFA-based parser
  • gSOAP 2.7
  • eXpat 1.2
  • Xerces 2.7.0
• Memory-resident XML message
• Elapsed real time measured using gettimeofday()
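The measurement approach (parse a memory-resident message repeatedly and report elapsed real time) can be sketched as below. This is not the authors' harness: `time.perf_counter` stands in for gettimeofday(), Python's `xml.etree.ElementTree` stands in for the parsers under test, and the message layout and repeat count are made up for illustration.

```python
import time
import xml.etree.ElementTree as ET


def mean_parse_time_us(message: bytes, repeats: int = 1000) -> float:
    """Mean elapsed wall-clock time, in microseconds, per parse of an
    in-memory message (no file or network I/O inside the timed loop)."""
    start = time.perf_counter()
    for _ in range(repeats):
        ET.fromstring(message)  # stand-in parser, not TDX
    elapsed = time.perf_counter() - start
    return elapsed / repeats * 1e6


# Hypothetical echo-style message held entirely in memory.
message = b"<echoString>" + b"<item>x</item>" * 64 + b"</echoString>"
mean_us = mean_parse_time_us(message)
```

Keeping the message in memory isolates parsing cost from I/O, and averaging over many repetitions smooths out timer resolution and scheduling noise.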
Parsing Performance (1)
[Figure: bar chart, EchoString with array size = 1024B. Y-axis: Time (us), 0 to 350. Parsers: TDX, TDX -Cfa, DFA, DFA -Cfa, eXpat, gSOAP, Xerces. Legend: validation, decoding+validation, parsing, parsing+validation]
Parsing Performance (2)
[Figure: log-log plot. X-axis: EchoString array size, 1 to 10000. Y-axis: Time (us), 1 to 100000. Series: Xerces, gSOAP, eXpat, TDX, DFA]
Conclusion
• Enhances parsing speed
• Flexible framework
  • Encodes value-based validation and application-specific events as semantic rules
  • Combines structural, syntactic and semantic constraints in one pass
• High level of modularity