xml and databases - computer science and engineeringcs4317/10s1/lectures/01_handout.pdf · xml and...
TRANSCRIPT
XM
L a
nd D
atabase
s
Sebast
ian M
aneth
NIC
TA
and U
NS
W
Le
ctu
re 1
Intr
od
uctio
n to
XM
L
CS
E@
UN
SW
-
Se
me
ste
r 1
, 2
01
0
XM
L a
nd D
atabase
s
Sebast
ian M
aneth
NIC
TA
and U
NS
W
(today:
Kim
Nguye
n)
Le
ctu
re 1
Intr
od
uctio
n to
XM
L
CS
E@
UN
SW
-
Se
me
ste
r 1
, 2
01
0
3
àS
imila
r to
HT
ML
(B
ern
ers
-Le
e, C
ER
N à
W3
C)
use
yo
ur
ow
n ta
gs.
àA
mo
un
t/p
op
ula
rity
of
XM
Ld
ata
is g
row
ing
ste
ad
ily(f
ast
er
tha
n c
om
pu
ting
po
we
r)
XM
L
4
àS
imila
r to
HT
ML
(B
ern
ers
-Le
e, C
ER
N à
W3
C)
use
yo
ur
ow
n ta
gs.
àA
mo
un
t/p
op
ula
rity
of
XM
Ld
ata
is g
row
ing
ste
ad
ily(f
ast
er
tha
n c
om
pu
ting
po
we
r)
HT
ML
pa
ge
s a
re
*tin
y* (
cou
ple
of K
byt
es)
XM
Ld
ocu
me
nts
ca
n b
e
hu
ge
(GB
yte
s)
XM
L
Data
base
s
5
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
6
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
can
ha
nd
le h
ug
e a
mo
un
ts o
f d
ata
sto
red
in r
ela
tio
ns
àst
ora
ge
ma
na
ge
me
nt
àin
de
x st
ruct
ure
sà
join
/so
rt a
lgo
rith
ms
à…
7
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
can
ha
nd
le h
ug
e a
mo
un
ts o
f d
ata
sto
red
in r
ela
tio
ns
àst
ora
ge
ma
na
ge
me
nt
àin
de
x st
ruct
ure
sà
join
/so
rt a
lgo
rith
ms
à…
XM
L d
ata
.
tre
est
ruct
ure
d
8
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
can
ha
nd
le h
ug
e a
mo
un
ts o
f d
ata
sto
red
in r
ela
tio
ns
àst
ora
ge
ma
na
ge
me
nt
àin
de
x st
ruct
ure
sà
join
/so
rt a
lgo
rith
ms
à…
tre
est
ruct
ure
d
Ho
w to
sto
rea
tr
ee
, in
a D
B t
ab
le/r
ela
tion
??
1
23
4
56
no
de
fch
ild n
sib
1
2
-
2
-
33
-4
4
5
-
5
-
66
--
XM
L d
ata
.
1
23
4
56
no
de
fch
ild n
sib
1
2
-
2
-
33
-4
4
5
-
5
-
66
--
tre
eta
ble
“sh
red
din
g”
9
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
àh
ow
do
we
sto
re t
ree
s?
àca
n w
e b
en
efit
fro
m i
nd
ex
str
uc
ture
s?
àh
ow
to
imp
lem
en
t tre
e n
av
iga
tio
n?
Ad
ditio
na
l ch
alle
ng
es p
ose
d b
y W
3C
’s X
Qu
ery
pro
po
sa
l.
àa
no
tion
of
ord
er
àa
co
mp
lex
ty
pe
/ s
ch
em
a s
ys
tem
àp
oss
ibili
ty to
co
nst
ruct
ne
w tr
ee
no
de
s o
n t
he
fly
.
tre
est
ruct
ure
d
XM
L d
ata
.
10
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
àh
ow
do
we
sto
re t
ree
s?
àca
n w
e b
en
efit
fro
m i
nd
ex
str
uc
ture
s?
àh
ow
to
imp
lem
en
t tre
e n
av
iga
tio
n?
Ad
ditio
na
l ch
alle
ng
es p
ose
d b
y W
3C
’s X
Qu
ery
pro
po
sa
l.
àa
no
tion
of
ord
er
àa
co
mp
lex
ty
pe
/ s
ch
em
a s
ys
tem
àp
oss
ibili
ty to
co
nst
ruct
ne
w tr
ee
no
de
s o
n t
he
fly
.
tre
est
ruct
ure
d
XM
L d
ata
.
XM
L =
T
hre
at
to D
ata
ba
se
s…
?!
11
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
So
me
of
the
se c
ha
llen
ge
s a
re
Exi
stin
g (
DB
) te
chn
olo
gy
can
no
tb
e a
pp
lied
to X
ML
da
ta.
àh
ow
do
we
sto
re t
ree
s?
àca
n w
e b
en
efit
fro
m i
nd
ex
str
uc
ture
s?
àh
ow
to
imp
lem
en
t tre
e n
av
iga
tio
n?
Ad
ditio
na
l ch
alle
ng
es p
ose
d b
y W
3C
’s X
Qu
ery
pro
po
sa
l.
àa
no
tion
of
ord
er
àa
co
mp
lex
ty
pe
/ s
ch
em
a s
ys
tem
àp
oss
ibili
ty to
co
nst
ruct
ne
w tr
ee
no
de
s o
n t
he
fly
.
tre
est
ruct
ure
d
XM
L d
ata
.
XM
L =
T
hre
at
to D
ata
ba
se
s!!
12
XM
L a
nd
Da
tab
ase
s
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
Yo
u w
ill le
arn
ab
ou
t
àT
ree
str
uct
ure
d d
ata
(XM
L)
àX
ML
pa
rse
rs &
effi
cie
nt m
em
ory
re
pre
sen
tatio
n
àQ
ue
ry la
ng
ua
ge
sfo
r X
ML (
XP
ath
, X
Qu
ery
, X
SLT
…)
àE
ffici
en
t e
valu
atio
n u
sin
g fi
nite
-sta
te a
uto
ma
ta
àM
ap
pin
g X
ML
to d
ata
ba
ses
àA
dva
nce
d to
pic
s (q
ue
ry o
ptim
iza
tion
s,
acc
ess
co
ntr
ol,
up
da
te la
ng
ua
ge
s…
)
13
Th
is c
ou
rse
will
àin
tro
du
ce y
ou
to
the
w
orl
d o
f X
ML
an
d to
th
e c
ha
llen
ge
so
f d
ea
ling
with
XM
L in
a R
DM
S.
Yo
u w
ill le
arn
ab
ou
t
àT
ree
str
uct
ure
d d
ata
(XM
L)
àX
ML
pa
rse
rs &
effi
cie
nt m
em
ory
re
pre
sen
tatio
n
àQ
ue
ry L
an
gu
ag
es
for
XM
L (
XP
ath
, X
Qu
ery
, X
SLT
…)
àE
ffici
en
t e
valu
atio
n u
sin
g fi
nite
-sta
te a
uto
ma
ta
àM
ap
pin
g X
ML
to d
ata
ba
ses
àA
dva
nce
d T
op
ics
(qu
ery
op
timiz
atio
ns,
a
cce
ss c
on
tro
l,u
pd
ate
la
ng
ua
ge
s…
)
Yo
u w
ill N
OT
lea
rn a
bo
ut
àH
ack
ing
CG
I scr
ipts
àH
TM
Là
Java
Scr
ipt
… Yo
u N
EE
D t
o p
rog
ram
in
àJa
va o
rà
C+
+
XM
L a
nd
Da
tab
ase
s
14
Ab
ou
t X
ML
àX
ML is t
he
Wo
rld
Wid
e W
eb
Co
nso
rtiu
m’s
(W
3C
, http://www.w3.org/
) E
xte
nsi
ble
Ma
rku
p L
an
gu
ag
e
àW
e h
op
e to
co
nvi
nce
yo
u th
at
XM
L is
no
t ye
t a
no
the
r h
ype
d T
LA
,b
ut
is u
sefu
l te
chn
olo
gy.
àY
ou
will
be
com
e b
est
frie
nd
s w
ith o
ne
of t
he
mo
st
imp
ort
an
t d
ata
str
uctu
res in
Co
mp
utin
g S
cie
nce
, th
e
tre
e.
X
ML
is a
ll a
bo
ut
tre
e-s
ha
pe
d d
ata
.
àY
ou
will
lea
rn to
ap
ply
a n
um
be
r o
f clo
sely
re
late
d
XM
L s
tan
da
rds
:
> R
ep
rese
ntin
g d
ata
:
XM
Lits
elf,
DT
D, X
ML
Sc
he
ma
, X
ML
dia
lect
s>
In
terf
ace
s to
co
nn
ect
PL
s to
XM
L:
D
OM
, S
AX
> L
an
gu
ag
es
to q
ue
ry/tra
nsf
orm
XM
L:
X
Pa
th,
XQ
ue
ry, X
SLT
.
15
Ab
ou
t X
ML
We
will
talk
ab
ou
t a
lgo
rith
ms a
nd
pro
gra
mm
ing
te
ch
niq
ue
sto
effi
cie
ntly
ma
nip
ula
te X
ML
da
ta:
àR
eg
ula
r e
xp
res
sio
ns
can
be
use
d to
va
lid
ate
XM
L d
ata
àF
init
e s
tate
au
tom
ata
lie a
t th
e h
ea
rt o
f hig
hly
effi
cie
nt
XP
ath
im
ple
me
nta
tio
ns
àTre
e t
rav
ers
als
ma
y b
e u
sed
to p
rep
roce
ss X
ML
tre
es
in o
rde
r to
sup
po
rt X
Pa
th e
va
lua
tio
n,
to s
tore
XM
L t
ree
sin
da
tab
ase
s, e
tc.
In th
e e
nd
, yo
u s
ho
uld
be
ab
le to
dig
est
th
e th
ick
pile
of
rela
ted
W3
CX
__
_ s
tan
da
rds.
(lik
e, X
Qu
ery
, X
Po
inte
r, X
Lin
k, X
HT
ML
, X
Incl
ud
e, X
ML B
ase
, X
ML S
ch
em
a,
…
16
Co
urs
e O
rga
niz
atio
n
Le
ctu
re
Tu
esd
ay,
15
:00
–1
8:0
0C
he
mic
alS
c M
17
(e
x A
pp
lied
Sc)
(K
-F1
0-M
17
)
Le
ctu
rer
Se
ba
stia
n M
an
eth
Co
nsu
lt
F
rid
ay,
11
:00
-12
:00
(E
50
8,
L5
) A
ll e
ma
il to
cs
43
17
@cs
e.u
nsw
.ed
u.a
u
Tu
tori
al
Tu
esd
ay,
12
:00
-14
:00
@ Q
ua
dra
ng
le G
04
0 (
K-E
15
-G0
40
) -
-b
efo
rele
ctu
reW
ed
ne
sda
y, 1
2:0
0-1
4:0
0 @
Qu
ad
ran
gle
G0
22
(K
-E1
5-G
02
2)
We
dn
esd
ay,
14
:00
-16
:00
@ Q
ua
dra
ng
le G
02
2 (
K-E
15
-G0
22
) T
hu
rsd
ay,
14
:00
-16
:00
@ Q
ua
dra
ng
le G
04
0 (
K-E
15
-G0
40
) T
hu
rsd
ay,
16
:00
-18
:00
@ Q
ua
dra
ng
le G
02
2 (
K-E
15
-G0
22
)
Tu
tors
?,
Kim
Ng
uye
n
All
em
ail
to
cs4
31
7@
cse
.un
sw.e
du
.au
17
Co
urs
e O
rga
niz
atio
n
Le
ctu
re
Tu
esd
ay,
15
:00
–1
8:0
0C
he
mic
alS
c M
17
(e
x A
pp
lied
Sc)
(K
-F1
0-M
17
)
Le
ctu
rer
Se
ba
stia
n M
an
eth
Co
nsu
lt
F
rid
ay,
11
:00
-12
:00
(E
50
8,
L5
) A
ll e
ma
il to
cs
43
17
@cs
e.u
nsw
.ed
u.a
u
Bo
ok
No
ne
!
Co
urs
e s
lide
s o
f M
arc
Sch
oll,
Un
i Ko
nst
an
zh
ttp
://w
ww
.inf.
un
i-ko
nst
an
z.d
e/d
bis
/te
ach
ing
/ws0
50
6/d
ata
ba
se-x
ml/X
ML
DB
.pd
f
Th
eo
ry /
PL
orie
nte
d, b
oo
k d
raft
:h
ttp
://a
rbre
.is.s
.u-t
oky
o.a
c.jp
/~h
ah
oso
ya/x
mlb
oo
k/
Su
gg
este
d r
ea
din
g m
ate
ria
l:
18
Co
urs
e O
rga
niz
atio
n
Le
ctu
re
Tu
esd
ay,
15
:00
–1
8:0
0C
he
mic
alS
c M
17
(e
x A
pp
lied
Sc)
(K
-F1
0-M
17
)
Le
ctu
rer
Se
ba
stia
n M
an
eth
Co
nsu
lt
F
rid
ay,
11
:00
-12
:00
(E
50
8,
L5
) A
ll e
ma
il to
cs
43
17
@cs
e.u
nsw
.ed
u.a
u
Pro
gra
mm
ing
As
sig
nm
en
ts
5 a
ssig
nm
en
ts, d
ue
eve
ry o
the
r M
on
da
y. (
1st
is d
ue
1
5th
M
arc
h2
nd
is d
ue
29
thM
arc
h…
)
Pe
r a
ssig
nm
en
t:
10
po
ints
(to
tal:
60
po
ints
) (+
2 b
on
us
po
ints
)
Fin
al
Ex
am
:
Fin
al e
xam
:
40
po
ints
(m
ust
ge
t 1
6/4
0to
pa
ss,
tha
t is
40
%)
19
Ou
tlin
e -
Ass
ign
me
nts
Yo
u c
an
fre
ely
ch
oo
se to
pro
gra
m y
ou
r a
ssig
nm
en
ts in
àC
/ C
++
, o
rà
Java
Ho
we
ver,
yo
ur
cod
e
mu
st
co
mp
ile
wit
h
gc
c/
g+
+, ja
va
c,
as
inst
alle
d o
n C
SE
lin
ux
syst
em
s!
Su
bm
it co
de
(u
sin
g give
) b
y M
on
da
y 2
3:5
9 (
eve
ry o
the
r w
ee
k)
Ass
ign
me
nt 4
(ha
rde
r) g
ets
fou
r w
ee
ks /
cou
nts
do
ub
le (
20
Po
ints
)d
ue
da
te 1
7th
Ma
y
20
Ou
tlin
e -
Ass
ign
me
nts
1.
Re
ad
XM
L,
usi
ng
DO
M p
ars
er.
Cre
ate
do
cum
en
t sta
tistic
s
2.
SA
X P
ars
e in
to m
em
ory
str
uct
ure
: Tre
e v
s D
AG
3.
Ma
p X
ML
into
RD
BM
S
4.
XP
ath
eva
lua
tion
ove
r m
ain
me
mo
ry s
tru
ctu
res
(+
str
ea
min
g su
pp
ort
)
5.
XP
ath
into
SQ
L T
ran
sla
tion
13
da
ys
2 w
ee
ks
2 w
ee
ks(+
1 w
ee
k b
rea
k)4
we
eks
2 w
ee
ks
21
Ou
tlin
e -
Le
ctu
res
1.
Intr
od
uct
ion
to X
ML
, E
nco
din
gs,
Pa
rse
rs2
.M
em
ory
Re
pre
sen
tatio
ns
for
XM
L: S
pa
ce v
sA
cce
ss S
pe
ed
3.
RD
BM
S R
ep
rese
nta
tion
of
XM
L4
.D
TD
s, S
che
ma
s, R
eg
ula
r E
xpre
ssio
ns,
Am
big
uity
5.
No
de
Se
lect
ing
Qu
erie
s: X
Pa
th6
.E
ffici
en
t X
Pa
th E
valu
atio
n7
.X
Pa
th P
rop
ert
ies:
ba
ckw
ard
axe
s, c
on
tain
me
nt t
est
8.
Str
ea
min
g E
valu
atio
n: h
ow
mu
ch m
em
ory
do
yo
u n
ee
d?
9.
XP
ath
Eva
lua
tion
usi
ng
RD
BM
S1
0.
Pro
pe
rtie
s o
f XP
ath
11
.X
SLT
12
.X
Qu
ery
13
.W
rap
up
, Exa
m P
rep
ara
tion
, Op
en
qu
est
ion
s, e
tc
22
Le
ctu
re 1
XM
L I
ntr
od
uct
ion
Le
ctu
re 1
XM
L I
ntr
od
uct
ion
23
Ou
tlin
e
1.
Th
ree
m
oti
va
tio
ns
fo
r X
ML
(1)
re
ligio
us
(2)
pra
ctic
al
(3)
th
eo
retic
al /
ma
the
ma
tica
l
2.
We
ll-f
orm
ed
XM
L
3.
Ch
ara
cte
r E
nc
od
ing
s
4.
Pa
rse
rs f
or
XM
L
àp
ars
ing
into
DO
M
(Do
cum
en
t Ob
ject
Mo
de
l)
24
XM
L I
ntr
od
uct
ion
Relig
iou
sm
otiv
atio
n f
or
XM
L:
to h
ave
one
langua
ge
to s
peak
about
data
.
25
26
àX
ML
is a
Data
Exch
an
ge F
orm
at
1974
SG
ML
(C
harles
Gold
farb
at I
BM
Rese
arc
h)
1989
HT
ML
(T
im B
ern
ers
-Lee a
t CE
RN
/Geneva
)
1994
Bern
ers
-Lee fo
unds
Web C
onso
rtiu
m (
W3C
)
1996
XM
L(W
3C
dra
ft, v
1.0
in 1
998)
XM
L M
otiv
atio
n
(h
isto
rica
l)
27
àX
ML
is a
Data
Exch
an
ge F
orm
at
1974
SG
ML
(C
harles
Gold
farb
at I
BM
Rese
arc
h)
1989
HT
ML
(T
im B
ern
ers
-Lee a
t CE
RN
/Geneva
)
1994
Bern
ers
-Lee fo
unds
Web C
onso
rtiu
m (
W3C
)
1996
XM
L(W
3C
dra
ft, v
1.0
in 1
998)
XM
L M
otiv
atio
n
(h
isto
rica
l)
http://www.w3.org/TR/REC-xml/
28
(2)
Pra
ctic
al
XM
L=
da
ta
Text
file
Tom Henzinger
EPFL
… Helmut Seidl
TU Munich
29
Tom Henzinger
EPFL
… Helmut Seidl
TU Munich
Text
file
<Related>
<colleague>
<name>Tom Henzinger</name>
<affil>EPFL</affil>
<email>
</email>
</colleague>
… <friend>
<name>Helmut Seidl</name>
<affil>TU Munich</affil>
<email>[email protected]
</email>
</friend>
</Related>
XM
L d
ocu
men
t
“mark
it
up!”
(2)
Pra
ctic
al
XM
L=
da
ta
+
str
uc
ture
30
Tom Henzinger
EPFL
… Helmut Seidl
TU Munich
Text
file
<Related>
<colleague>
<name>Tom Henzinger</name>
<affil>EPFL</affil>
<email>
</email></colleague>
</colleague>
… <friend>
<name>Helmut Seidl</name>
<affil>TU Munich</affil>
<email>[email protected]
</email>
</friend>
</Related>
XM
L d
ocu
men
t
“mark
it
up!”
(2)
Pra
ctic
al
XM
L=
da
ta
+
str
uc
ture
Is th
is a
go
od
“te
mp
late
”??
Wh
at
ab
ou
t la
st/
firs
t n
am
e?
Se
ve
ral a
ffil’
s / e
ma
il’s…
?
31
XM
L D
ocu
me
nts
àO
rdin
ary
text
file
s (
UT
F-8
, U
TF
-16
, U
CS
-4 …
)
àO
rig
ina
tes
fro
m t
ype
sett
ing
/Do
cPro
cess
ing
co
mm
un
ity
àId
ea
of
lab
ele
d b
racke
ts (
“ma
rk u
p”)
fo
r str
uctu
re is n
ot
ne
w!
(alre
ad
y u
se
d b
y C
ho
msky in
th
e 1
96
0’s
)
àB
rack
ets
de
scrib
e a
tre
e s
tru
ctu
re
àA
llow
s a
pp
lica
tion
s fr
om
diff
ere
nt v
en
do
rs to
exc
ha
ng
e d
ata
!
ès
tan
da
rdiz
ed
, e
xtr
em
ely
wid
ely
ac
ce
pte
d!
32
XM
L D
ocu
me
nts
àO
rdin
ary
text
file
s (
UT
F-8
, U
TF
-16
, U
CS
-4 …
)
àO
rig
ina
tes
fro
m t
ype
sett
ing
/Do
cPro
cess
ing
co
mm
un
ity
àId
ea
of
lab
ele
d b
racke
ts (
“ma
rk u
p”)
fo
r str
uctu
re is n
ot
ne
w!
(alre
ad
y u
se
d b
y C
ho
msky in
th
e 1
96
0’s
)
àB
rack
ets
de
scrib
e a
tre
e s
tru
ctu
re
àA
llow
s a
pp
lica
tion
s fr
om
diff
ere
nt v
en
do
rs to
exc
ha
ng
e d
ata
!
ès
tan
da
rdiz
ed
, e
xtr
em
ely
wid
ely
ac
ce
pte
d!
So
cia
l Im
plic
atio
ns!
All
scie
nce
s (b
iolo
gy,
ge
og
rap
hy,
me
teo
rolo
gy,
astr
olo
gy…
)
ha
ve
ow
n X
ML “
dia
lects
” to
sto
re t
he
ird
ata
op
tima
lly
33
XM
L D
ocu
me
nts
àO
rdin
ary
text
file
s (
UT
F-8
, U
TF
-16
, U
CS
-4 …
)
àO
rig
ina
tes
fro
m t
ype
sett
ing
/Do
cPro
cess
ing
co
mm
un
ity
àId
ea
of
lab
ele
d b
racke
ts (
“ma
rk u
p”)
fo
r str
uctu
re is n
ot
ne
w!
(alre
ad
y u
se
d b
y C
ho
msky in
th
e 1
96
0’s
)
àB
rack
ets
de
scrib
e a
tre
e s
tru
ctu
re
àA
llow
s a
pp
lica
tion
s fr
om
diff
ere
nt v
en
do
rs to
exc
ha
ng
e d
ata
!
ès
tan
da
rdiz
ed
, e
xtr
em
ely
wid
ely
ac
ce
pte
d!
Pro
ble
mh
igh
ly v
erb
ose
, lo
ts o
f re
pe
titiv
e m
ark
up
, la
rge
file
s
34
XM
L D
ocu
me
nts
àO
rdin
ary
text
file
s (
UT
F-8
, U
TF
-16
, U
CS
-4 …
)
àO
rig
ina
tes
fro
m t
ype
sett
ing
/Do
cPro
cess
ing
co
mm
un
ity
àId
ea
of
lab
ele
d b
racke
ts (
“ma
rk u
p”)
fo
r str
uctu
re is n
ot
ne
w!
(alre
ad
y u
se
d b
y C
ho
msky in
th
e 1
96
0’s
)
àB
rack
ets
de
scrib
e a
tre
e s
tru
ctu
re
àA
llow
s a
pp
lica
tion
s fr
om
diff
ere
nt v
en
do
rs to
exc
ha
ng
e d
ata
!
ès
tan
da
rdiz
ed
, e
xtr
em
ely
wid
ely
ac
ce
pte
d!
Co
ntr
a..
hig
hly
ve
rbo
se, l
ots
of
rep
etit
ive
ma
rku
p,
larg
e fi
les
Pro
..w
e h
ave
a s
tan
da
rd!
A S
tan
da
rd!
A S
TAN
DA
RD
! à
JY
ou
ne
ve
r n
ee
d to
write
a p
ars
er
ag
ain
! U
se
XM
L!
J
35
XM
L D
ocu
me
nts
… in
ste
ad
of
writin
g a
pa
rse
r, y
ou
sim
ply
fix
yo
ur
ow
n “
XM
L d
iale
ct”
,
by d
escri
bin
g a
ll “a
dm
issib
le te
mp
late
s”
(+
ma
yb
e e
ve
n th
e s
pe
cific
da
ta ty
pe
s th
at m
ay
ap
pe
ar
insi
de
).
Yo
u d
o t
his
, u
sin
g a
n X
ML T
yp
e d
efin
itio
n la
ng
ua
ge
such
a
s D
TD
o
r R
ela
x N
G (
Oa
sis)
.
Of c
ou
rse
, su
ch ty
pe
de
finiti
on
lan
gu
ag
es
are
SIM
PL
E,
be
cau
se y
ou
wa
nt t
he
pa
rse
rs to
be
effi
cie
nt!
Th
ey
are
sim
ilar
to E
BN
F. à
con
text
-fre
e g
ram
ma
r w
ith
re
g.
exp
r’s in
the
rig
ht-
ha
nd
sid
es.
J
36
XM
L D
ocu
me
nts
Ele
me
nt n
am
es
an
d th
eir c
on
ten
t
Exa
mp
le D
TD
(Do
cum
en
t Typ
e D
esc
rip
tion
)
Related à
(colleague | friend | family)*
colleague à
(name,affil*,email*)
friend à
(name,affil*,email*)
family à
(name,affil*,email*)
name à
(#PCDATA)
…
37
XM
L D
ocu
me
nts
Re
late
d
frie
nd
…
c
olle
ag
ue
fa
mily
…
na
me
affi
l e
ma
iln
am
e e
ma
il e
ma
il
…
Vic
tor
..
Ele
me
nt n
am
es
an
d th
eir c
on
ten
t
Exa
mp
le D
TD
(D
ocu
me
nt T
ype
De
scri
ptio
n)
Related à
(colleague | friend | family)*
colleague à
(name,affil*,email*)
friend à
(name,affil*,email*)
family à
(name,affil*,email*)
name à
(#PCDATA)
…
38
XM
L D
ocu
me
nts
Exa
mp
le D
TD
Related à
(colleague | friend | family)*
colleague à
(name,affil*,email*)
friend à
(name,affil*,email*)
family à
(name,affil*,email*)
name à
(#PCDATA)
…
Re
late
d
frie
nd
…
c
olle
ag
ue
fa
mily
…
na
me
affi
l e
ma
iln
am
e e
ma
il e
ma
il
…
Vic
tor
..
Ele
me
nt n
am
es
an
d th
eir c
on
ten
t
frie
nd
…
c
olle
ag
ue
fa
mily
“Ele
me
nt n
od
e”
39
XM
L D
ocu
me
nts
Re
late
d
frie
nd
…
c
olle
ag
ue
fa
mily
…
na
me
affi
l e
ma
il
…
Vic
tor
..
Ele
me
nt n
am
es
an
d th
eir c
on
ten
t
frie
nd
…
c
olle
ag
ue
fa
mily
“Ele
me
nt n
od
e”
Vic
tor
..
“Te
xt n
od
e”
Exa
mp
le D
TD
Related à
(colleague | friend | family)*
colleague à
(name,affil*,email*)
friend à
(name,affil*,email*)
family à
(name,affil*,email*)
name à
(#PCDATA)
…
na
me
em
ail
em
ail
40
XM
L D
ocu
me
nts
Re
late
d
frie
nd
…
c
olle
ag
ue
fa
mily
…
na
me
affi
l e
ma
il
…
Vic
tor
..
Ele
me
nt n
am
es
an
d th
eir c
on
ten
t
frie
nd
…
c
olle
ag
ue
fa
mily
“Ele
me
nt n
od
e”
Vic
tor
..
“Te
xt n
od
e”
Te
rmin
olo
gy
do
cum
en
t is
va
lid
wrt
th
e D
TD
“It v
ali
da
tes”
Te
rmin
olo
gy
do
cum
en
t is
va
lid
wrt
th
e D
TD
“It v
ali
da
tes”
Te
rmin
olo
gy
do
cum
en
t is
Exa
mp
le D
TD
Related à
(colleague | friend | family)*
colleague à
(name,affil*,email*)
friend à
(name,affil*,email*)
family à
(name,affil*,email*)
name à
(#PCDATA)
…
na
me
em
ail
em
ail
41
XM
L D
ocu
me
nts
Wh
at e
lse
: (
be
sid
es
ele
me
nt
an
d t
ext
no
de
s)
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
sà
com
me
nts
àn
am
esp
ace
sà
en
tity
refe
ren
ces
(tw
o k
ind
s)
42
XM
L D
ocu
me
nts
Wh
at e
lse
: (
be
sid
es
ele
me
nt
an
d t
ext
no
de
s)
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
tsà
na
me
spa
ces
àe
ntit
y re
fere
nce
s (t
wo
kin
ds)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
43
XM
L D
ocu
me
nts
Wh
at e
lse
:
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
tsà
na
me
spa
ces
àe
ntit
y re
fere
nce
s (t
wo
kin
ds)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
<?
ph
p s
ql (“
SE
LE
CT
* F
RO
M …
”) …
?>
Se
e2
.6 P
roce
ssin
g In
stru
ctio
ns
44
XM
L D
ocu
me
nts
Wh
at e
lse
:
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
ts
<!--
some comment -->
àn
am
esp
ace
sà
en
tity
refe
ren
ces
(tw
o k
ind
s)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
<?
ph
p s
ql (“
SE
LE
CT
* F
RO
M …
”) …
?>
Se
e2
.6 P
roce
ssin
g In
stru
ctio
ns
45
XM
L D
ocu
me
nts
Wh
at e
lse
:
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
ts
<!--
some comment -->
àn
am
esp
ace
sà
en
tity
refe
ren
ces
(tw
o k
ind
s)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
<?
ph
p s
ql (“
SE
LE
CT
* F
RO
M …
”) …
?>
Se
e2
.6 P
roce
ssin
g In
stru
ctio
ns
<!-
-th
e 'p
rice
' ele
me
nt's
na
me
spa
ce is
htt
p://
eco
mm
erc
e.o
rg/s
che
ma
-->
<
ed
i:price
xm
lns:
ed
i='h
ttp:/
/eco
mm
erc
e.o
rg/s
che
ma
' un
its=
'Eu
ro'>
32
.18
</e
di:p
rice
> 46
XM
L D
ocu
me
nts
Wh
at e
lse
:
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
ts
<!--
some comment -->
àn
am
esp
ace
sà
en
tity
refe
ren
ces
(tw
o k
ind
s)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
<?
ph
p s
ql (“
SE
LE
CT
* F
RO
M …
”) …
?>
Se
e2
.6 P
roce
ssin
g In
stru
ctio
ns
<!-
-th
e 'p
rice
' ele
me
nt's
na
me
spa
ce is
htt
p://
eco
mm
erc
e.o
rg/s
che
ma
-->
<
ed
i:price
xm
lns:
ed
i='h
ttp:/
/eco
mm
erc
e.o
rg/s
che
ma
' un
its=
'Eu
ro'>
32
.18
</e
di:p
rice
>
47
XM
L D
ocu
me
nts
Wh
at e
lse
:
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
ts
<!--
some comment -->
àn
am
esp
ace
sà
en
tity
refe
ren
ce
s(t
wo
kin
ds)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
<?
ph
p s
ql (“
SE
LE
CT
* F
RO
M …
”) …
?>
Se
e2
.6 P
roce
ssin
g In
stru
ctio
ns
<!-
-th
e 'p
rice
' ele
me
nt's
na
me
spa
ce is
htt
p://
eco
mm
erc
e.o
rg/s
che
ma
-->
<
ed
i:price
xm
lns:
ed
i='h
ttp:/
/eco
mm
erc
e.o
rg/s
che
ma
' un
its=
'Eu
ro'>
32
.18
</e
di:p
rice
>
ch
ara
cte
r re
fere
nce
Typ
e <
key>
less
-th
an
</k
ey>
(&
#x3
C;)
to
sa
ve o
ptio
ns.
48
XM
L D
ocu
me
nts
Wh
at e
lse
:
àa
ttrib
ute
sà
pro
cess
ing
inst
ruct
ion
s
àco
mm
en
ts
<!--
some comment -->
àn
am
esp
ace
sà
en
tity
refe
ren
ce
s(t
wo
kin
ds)
<fa
mily
re
l=“b
roth
er”
,ag
e=
“25
”><
na
me
>… <
/fam
ily>
<?
ph
p s
ql (“
SE
LE
CT
* F
RO
M …
”) …
?>
Se
e2
.6 P
roce
ssin
g In
stru
ctio
ns
<!-
-th
e 'p
rice
' ele
me
nt's
na
me
spa
ce is
htt
p://
eco
mm
erc
e.o
rg/s
che
ma
-->
<
ed
i:price
xm
lns:
ed
i='h
ttp:/
/eco
mm
erc
e.o
rg/s
che
ma
' un
its=
'Eu
ro'>
32
.18
</e
di:p
rice
>
ch
ara
cte
r re
fere
nce
Typ
e <
key>
less
-th
an
</k
ey>
(&
#x3
C;)
to
sa
ve o
ptio
ns.
Th
is d
ocu
me
nt w
as
pre
pa
red
on
&d
ocd
ate
;an
d
49
Ea
rly
Ma
rku
p
Th
e te
rm m
ark
up
ha
s b
ee
n c
oin
ed
by
the
ty
pe
se
ttin
gco
mm
un
ity,
no
tb
y co
mp
ute
r sc
ien
tist.
With
the
ad
ven
t o
f prin
ting
pre
ss,
write
rs a
nd
ed
itors
use
d
(oft
en
ma
rgin
al)
no
tes
to in
stru
ct p
rin
ters
toà
Se
lect
ce
rta
in fo
nts
àL
et
pa
ssa
ge
s o
f te
xt s
tan
d o
ut
àIn
de
nt
a li
ne
of t
ext
, e
tc
Pro
ofr
ea
de
rs u
se a
sp
eci
al s
et
of
sym
bo
ls, t
he
ir s
pe
cia
l ma
rku
p
lan
gu
ag
e,
to id
en
tify
typ
os,
form
att
ing
glit
che
s, a
nd
sim
ilar
err
on
eo
us
fra
gm
en
ts o
f te
xt.
Th
e m
ark
up
lan
gu
ag
e is
de
sig
ne
d to
be
ea
sily
re
cog
niz
ab
le in
the
act
ua
l flo
w o
f te
xt.
50
Ea
rly
Ma
rku
p
Co
mp
ute
r sc
ien
tists
ad
op
ted
the
ma
rku
p id
ea
–o
rig
ina
lly to
a
nn
ota
te p
rog
ram
so
urc
e c
od
e:
àD
esi
gn
the
ma
rku
p la
ng
ua
ge
su
ch t
ha
t its
co
nst
ruct
s a
re
ea
sil
y r
ec
og
niz
ab
leb
y a
ma
chin
e.
àA
pp
roa
ch
es
(1)
Ma
rku
p is
wri
tte
n u
sin
g a
sp
ec
ial
se
t o
f c
ha
rac
ters
, dis
join
t fro
mth
e s
et o
f ch
ara
cte
rs t
ha
t fo
rm t
he
toke
ns
of
the
pro
gra
m(2
) M
ark
up
occ
urs
in p
lace
s in
th
e s
ou
rce
file
wh
ere
pro
gra
m c
od
e
ma
y n
ot
ap
pe
ar
(pro
gra
m l
ay
ou
t).
Exa
mp
le:
Fo
rtra
n 7
7 fi
xed
fo
rm s
ou
rce
:
àF
ort
ran
sta
tem
en
ts s
tart
in c
olu
mn
7 a
nd
do
no
t e
xce
ed
co
lum
n 7
2,
àa
Fo
rtra
n s
tate
me
nt
lon
ge
r th
an
66
ch
ars
ma
y b
e c
on
tinu
ed
on
th
e n
ext
lin
eIf
a c
ha
ract
er
no
t in
{ 0
,!,_
} is
pla
ce in
co
lum
n 6
of
the
co
ntin
uin
g li
ne
àco
mm
en
t lin
es s
tart
with
a “
C”
or
“*”
in c
olu
mn
1,
àN
um
eric
lab
els
(D
O,
FO
RM
AT
sta
tem
en
ts)
ha
ve to
be
pla
ced
in c
olu
mn
s 1
-5.
51
52
Sa
mp
le M
ark
up
Ap
plic
atio
n
A C
om
ic S
trip
Fin
de
r
àN
ext
8 s
lide
s fr
om
Ma
rc S
ch
oll’
s 2
00
5 le
ctu
re.
53
54
55
56
57
58
59
60
61
To
da
y,
XM
L h
as
ma
ny
frie
nd
s:
Qu
ery
Lan
gu
ag
es
XP
ath
, X
SLT,
XQ
uery
, …
(most
ly b
y W
3C
)
Imp
lem
en
tati
on
s(P
ars
ers
, Valid
ato
rs, T
ransl
ato
rs)
SA
X,
Xala
n, G
ala
x, X
erx
es, …
(by IB
M/A
pache, M
icro
soft
, O
racle
, S
un…
)
Cu
rren
t Is
su
es
-D
B/P
L s
upport
(
“data
bin
din
g”,
JB
ind,
Casto
r, Z
eus…
)
-st
ora
ge s
upport
(c
om
pre
ssio
n,
data
optim
izatio
n)
62
XM
L: t
ypic
al u
sag
e s
cen
ario
<Pro
duct>
<pro
duct_
id>
m1
01
</pro
duct_
id>
<nam
e>
Sony w
alk
man <
/nam
e>
<curr
ency>
AU
D <
/curr
ency>
<pri
ce>
20
0.0
0 <
/pri
ce>
<gst>
10
% <
/gst>
</Pro
duct>
…
X M L
XM
L
Sty
les
he
et
XM
L
Sty
les
he
et
Pre
sen
tatio
nF
orm
at i
nfo
XM
L S
tyle
sh
ee
t
Pre
sen
tatio
nF
orm
at i
nfo
XM
L S
tyle
sh
ee
t
On
e d
ata
so
urc
eà
seve
ral
(dyn
am
ic g
en
.) v
iew
s
Do
cum
en
t str
uct
ure
De
f. o
f p
rice
, g
st,
…
DT
D,
XM
L S
che
ma
63
XM
L: w
he
re is
it u
sed
?
-D
ocu
me
nt
form
ats
:S
VG
(ve
cto
r im
ag
es)
, Op
en
Do
cum
en
tFo
rma
t (O
pe
nO
ffice
, Go
og
leD
ocs
),D
ocB
oo
k(T
ext
fo
rma
ttin
g),
EP
UB
(e
lect
ron
ic b
oo
ks),
XH
TM
L (
we
b),
…
-P
roto
col f
orm
ats
:S
OA
P (
We
bS
erv
ice
s), X
MP
P (
Jab
be
r, G
oo
gle
Ta
lk),
AJA
X (
cust
om
XM
L
me
ssa
ge
s u
sed
fo
r in
tera
ctiv
e w
eb
site
s: F
ace
bo
ok,
Gm
ail,
Tw
ee
ter,
…),
RS
S f
ee
ds (
use
d fo
r b
log
s/n
ew
s s
ite
s),
…
-C
ust
om
da
ta f
orm
at:
Bio
-in
form
atic
s, L
ing
uis
tics,
Co
nfig
ura
tion
file
s, G
eo
gra
ph
ic d
ata
(O
pe
nS
tre
etM
ap
, Go
og
le m
ap
s),
Bib
liog
rap
hic
Re
sou
rce
s (M
ed
Lin
e, A
DC
)
64
Reg
ula
r T
ree l
an
gu
ag
es
(R
EG
T).
Many
chara
cteriza
tions:
R
eg. Tre
e G
ram
mars
Tre
e A
uto
mata
MS
O L
ogic
Nic
e p
ropert
ies:
Clo
sed u
nder
inte
rsection (u
nio
n, com
ple
ment)
Decid
able
equiv
ale
nce
CF
RE
GT
(as
bra
cke
ted
str
ing
s)
anb
nc* a
* bncn
(3)
Th
eo
retic
al /
Ma
the
ma
tica
l Co
nte
xt-F
ree
:
E ::=E ‘+’ E
E ::= E ‘x’ E
E ::= [1-9][0-9]*
65
2.
We
ll-F
orm
ed
XM
L
http://www.w3.org/TR/REC-xml/
Fro
m th
e W
3C
XM
L re
com
me
nd
atio
n
“A t
extu
al o
bje
ct
is w
ell-f
orm
ed
XM
Lif,
(1)
take
n a
s a
wh
ole
, it m
atc
he
s t
he
pro
du
cti
on
la
be
led
do
cu
me
nt
(2)
it m
ee
ts a
ll th
e w
ell-f
orm
ed
ne
ss
co
ns
tra
ints
giv
en
in
th
is s
pe
cific
atio
n ..”
http://www.w3.org/TR/REC-xml/
do
cu
me
nt=
sta
rt s
ymb
ol o
f a
co
nte
xt-f
ree
gra
mm
ar
(“
XM
L g
ram
ma
r”)
à(1
) co
nta
ins
the
co
nte
x-f
ree
pro
pe
rtie
so
f w
ell-
form
ed
XM
Là
(2)
con
tain
s th
e
co
nte
xt-
de
pe
nd
en
t p
rop
ert
ies
of
we
ll-fo
rme
d X
ML
Th
ere
are
10
WF
Cs
(we
ll-fo
rme
dn
ess
co
nst
rain
ts).
E.g
.: E
lem
en
t Ty
pe
Ma
tch
“Th
e N
am
e in
an
ele
me
nt’s e
nd
ta
g m
ust
ma
tch
the
ele
me
nt n
am
e in
th
e s
tart
ta
g.”
66
2.
We
ll-F
orm
ed
XM
L
http://www.w3.org/TR/REC-xml/
Fro
m th
e W
3C
XM
L re
com
me
nd
atio
n
“A t
extu
al o
bje
ct
is w
ell-f
orm
ed
XM
Lif,
(1)
take
n a
s a
wh
ole
, it m
atc
he
s t
he
pro
du
cti
on
la
be
led
do
cu
me
nt
(2)
it m
ee
ts a
ll th
e w
ell-f
orm
ed
ne
ss
co
ns
tra
ints
giv
en
in
th
is s
pe
cific
atio
n ..”
http://www.w3.org/TR/REC-xml/
do
cu
me
nt=
sta
rt s
ymb
ol o
f a
co
nte
xt-f
ree
gra
mm
ar
(“
XM
L g
ram
ma
r”)
à(1
) co
nta
ins
the
co
nte
x-f
ree
pro
pe
rtie
so
f w
ell-
form
ed
XM
Là
(2)
con
tain
s th
e
co
nte
xt-
de
pe
nd
en
t p
rop
ert
ies
of
we
ll-fo
rme
d X
ML
Th
ere
are
10
WF
Cs
(we
ll-fo
rme
dn
ess
co
nst
rain
ts).
E.g
.: E
lem
en
t Ty
pe
Ma
tch
“Th
e N
am
e in
an
ele
me
nt’s e
nd
ta
g m
ust
ma
tch
the
ele
me
nt n
am
e in
th
e s
tart
ta
g.”
àW
hy
is t
his
no
tco
nte
xt-f
ree
?
67
2.
We
ll-F
orm
ed
XM
LC
on
text
-fre
e g
ram
ma
r in
EB
NF
= S
yste
m o
f p
rod
uct
ion
ru
les
o
f th
e fo
rm
lhs ::= rhs
lhs
a n
on
term
ina
l sym
bo
l (e
.g,
do
cu
me
nt)
rhs
a s
trin
g o
ver
no
nte
rmin
al a
nd
te
rmin
al s
ymb
ols
.
Ad
diti
on
ally
(E
BN
F),
we
ma
y u
se r
eg
ula
r e
xpre
ssio
ns
in rhs
. S
uch
as:
r* denoting , r, rr, rrr, …
zero
or
mo
re r
ep
ititio
ns
r+ denoting rr*
on
e o
r m
ore
re
piti
tion
sr? denoting r |
op
tion
al r
[abc] denoting a | b | c
cha
ract
er
cla
ss[a-z] denoting a | b | … | z
cha
ract
er
cla
ss
lhs ::= rhs
68
XM
L G
ram
ma
r -
EB
NF
-sty
le[1]
document
::=
prologelementMisc*
[2]
Char
::=
a Unicode character
[3]
S::=
(‘ ’ | ‘\t’ | ‘\n’ | ‘\r’)+
[4] NameChar ::= (Letter | Digit | ‘.’ | ‘-’ | ‘:’
[5]
Name
::=
(Letter| '_' | ':') (NameChar)*
[22]
prolog
::=
XMLDecl? Misc* (doctypedeclMisc*)?
[23]
XMLDecl
::=
'<?xml' VersionInfoEncodingDecl? SDDecl? S? '?>‘
[24]VersionInfo
::=
S'version'Eq("'"VersionNum"'"|'"'VersionNum'"')
[25]
Eq
::=
S? '=' S?
[26]VersionNum
::=
'1.0‘
[39]
element
::=
EmptyElemTag
| STagcontentEtag
[40]
STag
::=
'<' Name(SAttribute)* S? '>'
[41]Attribute
::=
NameEqAttValue
[42]
ETag
::=
'</' NameS? '>‘
[43]
content
::=
(element| Reference| CharData?)*
[44]EmptyElemTag::=
'<' Name(SAttribute)* S? '/>‘
[67]Reference
::=
EntityRef| CharRef
[68]EntityRef
::=
'&' Name';‘
[84] Letter ::= [a-zA-Z]
[88] Digit ::= [0-9]
69
XM
L G
ram
ma
r -
EB
NF
-sty
le
As
usu
al,
the
XM
L g
ram
ma
r ca
n b
e s
yste
ma
tica
lly tr
an
sfo
rme
d in
to
a p
rog
ram
, an
XM
L p
ars
er,
to b
e u
sed
to c
he
ck th
e s
ynta
x o
f XM
L in
pu
t
Pa
rsin
g X
ML
1.
Sta
rtin
g w
ith t
he
sym
bo
l document
, th
e p
ars
er
use
s th
e lhs::=rhs
rule
s to
exp
an
d s
ymb
ols
, co
nst
ruct
ing
a p
ars
e t
ree
.
2.
Le
ave
s o
f th
e p
ars
e tr
ee
are
ch
ara
cte
rs w
hic
h h
ave
no
fu
rth
er
exp
an
sio
n
3.
Th
e X
ML
inp
ut i
s p
ars
ed
su
cce
ssfu
lly if
it p
erf
ect
ly m
atc
he
s th
e
pa
rse
tre
e’s
fro
nt
(co
nca
ten
ate
th
e p
ars
e tre
e’s
le
ave
s fro
m le
ft-t
o-r
igh
t,w
hile
re
mo
vin
g
sym
bo
ls).
70
Ex
am
ple
1
Pa
rse
tre
e f
or
XM
L in
pu
t
<bubble speaker=“phb”>Um... No.</bubble>
71
Ex
am
ple
2
Pa
rse
tre
e f
or
XM
L in
pu
t
<?xml version=“1.0”?><foo/>
72
2.
We
ll-F
orm
ed
XM
L
Te
rmin
olo
gy
•ta
g n
am
es
name,email,author
, …
•s
tart
ta
g<name>
,
en
d t
ag</name>
•e
lem
en
ts<name> … </name>
, <author age=“99”> … </author>
•e
lem
en
ts m
ay
be
ne
ste
d
•e
mp
ty e
lem
en
t<red></red>
is a
bb
revi
ate
d a
s <red/>
•a
n X
ML
do
cum
en
t: s
ing
le r
oo
t e
lem
en
t
<someTag> …. </someTag>
•w
ell-f
orm
ed
co
ns
tra
ints
àb
eg
in/e
nd
tag
s m
atc
hà
no
att
rib
ute
na
me
ma
y a
pp
ea
r m
ore
th
an
on
cein
a s
tart
ta
g o
r e
mp
ty e
lem
en
t ta
gà
a p
ars
ed
en
tity
mu
st n
ot
con
tain
a r
ecu
rsiv
ere
fere
nce
to it
self,
eith
er
dire
ctly
or
ind
ire
ctly
co
nte
xt-
de
pe
nd
en
t
pro
pe
rtie
s
73
We
ll-fo
rme
dX
ML
(fra
gm
en
ts)
<Staff>
<Name>
<FirstName> Sebastian </FirstName>
<LastName> Maneth </LastName>
</Name>
<Login> smaneth </Login>
<Ext> 2481 </Ext>
</Staff>
a Staff
ele
me
nt
<Name>
<FirstName> Sebastian </FirstName>
<LastName> Maneth </LastName>
</Name>
a Name
ele
me
nt
No
n-w
ell-
form
ed
XM
L
<foo> oops </bar>
<foo> oops </Foo>
<foo> oops .. <EOT>
<a><b><c></a></b></c>
74
We
ll-fo
rme
dX
ML
(fra
gm
en
ts)
<Staff>
<Name>
<FirstName> Sebastian </FirstName>
<LastName> Maneth </LastName>
</Name>
<Login> smaneth </Login>
<Ext> 2481 </Ext>
</Staff>
a Staff
ele
me
nt
<Name>
<FirstName> Sebastian </FirstName>
<LastName> Maneth </LastName>
</Name>
a Name
ele
me
nt
No
n-w
ell-
form
ed
XM
L
<foo> oops </bar>
<foo> oops </Foo>
<foo> oops .. <EOT>
<a><b><c></a></b></c>
Qu
es
tio
ns
Ho
wca
n y
ou
imp
lem
en
tth
e th
ree
we
ll-fo
rme
d c
on
stra
ints
?
Wh
en
, d
urin
g p
ars
ing
, d
o y
ou
ap
ply
the
ch
eck
s?
75
Ch
ara
cte
r E
nco
din
g
•F
or
a c
om
pu
ter, a
ch
ara
cte
r lik
e X
is n
oth
ing
bu
t a
n 8
(1
6/3
2)
bit
nu
mb
er
wh
ose
va
lue
is
inte
rpre
ted
as
the
ch
ara
cte
r X
, w
he
n n
ee
de
d.
•P
rob
lem
: m
an
y su
ch
nu
mb
er à
cha
ract
er
ma
pp
ing
s, th
e s
o c
alle
de
nc
od
ing
sa
re in
use
tod
ay.
•D
ue
to
the
hu
ge
am
ou
nt o
f ch
ara
cte
rs n
ee
de
d b
y th
e g
lob
al c
om
pu
ting
co
mm
un
ity to
da
y (
La
tin
, H
eb
rew
, A
rab
ic, G
ree
k,
Ja
pa
ne
se
, C
hin
ese
…),
co
nfl
icti
ng
in
ters
ec
tio
ns
be
twe
en
en
cod
ing
s a
re c
om
mo
n.
Ex
am
ple
0xcb 0xe4 0xd3
iso
-88
59
-7Æ
0xcb 0xe4 0xd3
iso
-88
59
-15
Ë
ä Ó
76
Un
ico
de
•T
he
Un
ico
de
http://www.unicode.org
Initi
ativ
e a
ims
to d
efin
e a
ne
w
en
cod
ing
tha
t tr
ies
to e
mb
race
all
cha
ract
er
ne
ed
s.
•T
he
Un
ico
de
en
co
din
g c
on
tain
s c
ha
racte
rs o
f “a
ll” la
ng
ua
ge
s o
f th
e w
orl
d
plu
s s
cie
ntific,
ma
the
ma
tica
l, te
ch
nic
al, b
ox d
raw
ing
, …
sym
bo
ls
•R
an
ge
of
the
Un
ico
de
en
cod
ing
: 0x0000-0x10FFFF (=16*65536)
àC
od
es
tha
t fit
into
th
e fi
rst
16
bits
(d
en
ote
d U+0000-U+FFFF
)e
nco
de
the
mo
st w
ide
ly u
sed
lan
gu
ag
es
an
d t
he
ir c
ha
ract
ers
(Ba
sic
Mu
ltilin
gu
al
Pla
ne
, B
MP
)
àC
od
es
U+
00
00
-U+
00
7F
ha
ve b
ee
n a
ssig
ne
d to
ma
tch
the
7-b
it A
SC
IIe
nco
din
g w
hic
h is
pe
rva
sive
tod
ay.
77
Un
ico
de
Tra
nsf
orm
atio
n F
orm
ats
Cu
rre
nt C
PU
s o
pe
rate
mo
st e
ffici
en
tly o
n 3
2-b
it w
ord
s(1
6-b
it w
ord
s, b
yte
s)
Un
ico
de
thu
s d
eve
lop
ed
Un
ico
de
Tra
nsf
orm
atio
n F
orm
ats
(U
TF
s) w
hic
h d
efin
eh
ow
a U
nic
od
e c
ha
ract
er
cod
e b
etw
ee
n U+0000
an
d U+10FFFF
is t
o b
em
ap
pe
d in
to a
32
-bit
wo
rd (
16-b
it w
ord
, b
yte
).
UT
F-3
2
àS
imp
ly m
ap
exa
ctly
to
the
co
rre
spo
nd
ing
32
-bit
valu
eà
Fo
r e
ach
Un
ico
de
ch
ara
cte
r in
UT
F-3
2:
w
ast
e o
f a
t le
ast
11
bits
!
78
Un
ico
de
Tra
nsf
orm
atio
n F
orm
ats
Cu
rre
nt C
PU
s o
pe
rate
mo
st e
ffici
en
tly o
n 3
2-b
it w
ord
s(1
6-b
it w
ord
s, b
yte
s)
Un
ico
de
thu
s d
eve
lop
ed
Un
ico
de
Tra
nsf
orm
atio
n F
orm
ats
(U
TF
s) w
hic
h d
efin
eh
ow
a U
nic
od
e c
ha
ract
er
cod
e b
etw
ee
n U+0000
an
d U+10FFFF
is t
o b
em
ap
pe
d in
to a
32
-bit
wo
rd (
16-b
it w
ord
, b
yte
).
UT
F-3
2
àS
imp
ly m
ap
exa
ctly
to
the
co
rre
spo
nd
ing
32
-bit
valu
eà
Fo
r e
ach
Un
ico
de
ch
ara
cte
r in
UT
F-3
2:
w
ast
e o
f a
t le
ast
11
bits
!
UT
F-1
6
Ma
p a
Un
ico
de
ch
ara
cte
r in
to o
ne
or
two
16
-bit
wo
rds
àU
+0
00
0 t
o
U+
FF
FF
m
ap
exa
ctly
to t
he
co
rre
spo
nd
ing
16
-bit
valu
eà
ab
ove
U+
FF
FF
: su
btr
act
0x0
10
00
0 a
nd
the
n f
ill th
e
□‘s
in
1101 10□□
□□□□□□□□
1101 11□□
□□□□□□□□
E.g
. U
nic
od
e c
ha
ract
er U+012345
(0x012345 -
0x010000 = 0x02345
)
UT
F-1
6: 1101 1000 0000 1000
1101 1111 0100 0101
79
Un
ico
de
Tra
nsf
orm
atio
n F
orm
ats
•UT
F-1
6 w
ork
s co
rre
ctly
, b
eca
use
the
ch
ara
cte
r co
de
s b
etw
ee
n
1101 10□□□□□□□□□□and
1101 11□□□□□□□□□□
(with
ea
ch □
rep
lace
d b
y a
0)
are
left
un
ass
ign
ed
in U
nic
od
e!!!
(ra
ng
e 0
xD8
00
–0
xDF
FF
is r
ese
rve
d)
80
Un
ico
de
Tra
nsf
orm
atio
n F
orm
ats
•UT
F-1
6 w
ork
s co
rre
ctly
, b
eca
use
the
ch
ara
cte
r co
de
s b
etw
ee
n
1101 10□□□□□□□□□□and
1101 11□□□□□□□□□□
(with
ea
ch □
rep
lace
d b
y a
0)
are
left
un
ass
ign
ed
in U
nic
od
e!!!
(ra
ng
e 0
xD8
00
–0
xDF
FF
is r
ese
rve
d)
•UT
F-1
6is
th
e o
nly
en
cod
ing
in J
ava
: a
char
is 1
6 b
its
larg
e, n
ot
8 a
s in
C/C
++
WA
RN
ING
: a
“sin
gle
” ch
ara
cte
r w
hic
h u
se
s tw
o 1
6 b
its f
ield
s is m
ad
e u
p o
f 2
ch
ars
!
Exa
mp
le, th
e s
trin
g “
”
ta
ke
s t
wo
16
bit f
ield
s:
String s = “\uD834\uDD1E”;
s.length(); // returns 2
s.charAt(1); //returns \uDD1E, it’s an INVALID character.
s.codePointAt(1); //returns the integer 0xD834DD1E
g “
81
UT
F-8
Ma
ps
a u
nic
od
e c
ha
ract
er
into
1
, 2
, 3
, o
r 4
b
yte
s.
Un
ico
de
ra
ng
e
B
yte
se
qu
en
ce
U+000000 à
U+00007F 0□□□□□□□
U+000080 à
U+0007FF 110□□□□□10□□□□□□
U+000800 à
U+00FFFF 1110□□□□10□□□□□□10□□□□□□
U+010000 à
U+10FFFF 11110□□□10□□□□□□10□□□□□□10□□□□□□
Sp
are
bits
(□
) a
re f
ille
d fr
om
rig
ht
to le
ft. P
ad
to t
he
left
with
0-b
its.
E.g
. U+00A9
in U
TF
-8is
11000010 10101001
U+2260
in U
TF
-8is
11100010 10001001 10100000
82
UT
F-8
àF
or
a U
TF
-8 m
ulti
-byt
e s
eq
ue
nce
, th
e le
ng
th o
f th
e s
eq
ue
nce
is
eq
ua
l to
the
nu
mb
er
of
lea
din
g 1
-bits
(in
th
e f
irst
byt
e)
àC
ha
ract
er
bo
un
da
rie
s a
re s
imp
le to
de
tect
àU
TF
-8 e
nco
din
g d
oe
s n
ot a
ffect
(b
ina
ry)
sort
ord
er
àTe
xt p
roce
ssin
g s
oft
wa
re d
esi
gn
ed
to d
ea
l with
7-b
it A
SC
IIre
ma
ins
fun
ctio
na
l.
(esp
eci
ally
tru
e f
or
the
C p
rog
ram
min
g la
ng
ua
ge
an
d it
s st
rin
g (char[]
)re
pre
sen
tatio
n)
83
XM
L a
nd
Un
ico
de
àA
co
nfo
rmin
g X
ML
pa
rse
r is
re
qu
ire
dto
co
rre
ctly
pro
cess
UT
F-8
an
dU
TF
-16
en
cod
ed
do
cum
en
ts. (
Th
e W
3C
XM
L R
eco
mm
en
da
tion
p
red
ate
s th
e U
TF
-32
de
finiti
on
)
àD
ocu
me
nts
tha
t u
se a
diff
ere
nt e
nco
din
g m
ust
an
no
un
ce s
o u
sin
g th
eX
ML
text
de
cla
ratio
n, e
.g.,
<?xml encoding=“iso-8859-15”?>
or
<?xml encoding=“utf-32”?>
àO
the
rwis
e, a
n X
ML
pa
rse
r is
en
cou
rag
ed
to
gu
es
s th
e e
nco
din
g w
hile
rea
din
g th
e v
ery
firs
t b
yte
s o
f th
e in
pu
t X
ML
do
cum
en
t:
He
ad
of
do
c (
by
tes
)
E
nc
od
ing
gu
es
s
0x00 0x3C 0x00 0x3F
UT
F-1
6
(little
En
dia
n)
0x3C 0x00 0x3F 0x00
UT
F-1
6
(b
ig E
nd
ian
)0x3c 0x3F 0x78 0x6D
UT
F-8
(o
r A
SC
II)
No
tice
: <
= U
+0
03
C,
? =
U+
00
3F,
x =
U+
00
78
, m =
U+
00
6D
84
XM
L a
nd
Un
ico
de
àW
ha
t d
oe
s “
gu
ess t
he
en
co
din
g”
me
an
? U
nd
er
wh
ich
circu
msta
nce
s
do
es
the
pa
rse
r k
no
wit
ha
s d
ete
rmin
ed
the
co
rre
ct e
nco
din
g?
Are
the
re c
ase
s w
he
n it
ca
nN
OT
de
term
ine
the
co
rre
ct e
nco
din
g?
àW
ha
t a
bo
ut e
ffici
en
cy o
f th
e U
TF
s? F
or
diff
ere
nt
text
s, c
om
pa
re th
e
spa
ce r
eq
uir
em
en
t in
UT
F-8
/16
an
d U
TF
-32
ag
ain
st e
ach
oth
er.
Wh
ich
ch
ara
cte
rs d
o y
ou
fin
d a
bo
ve 0
xFF
FF
in U
nic
od
e?
Ca
n y
ou
ima
gin
e a
sce
na
rio
wh
ere
UT
F-3
2 is
fa
ste
rth
an
UT
F-8
/16
?
Qu
es
tio
ns
85
Th
e X
ML
Pro
cess
ing
Mo
de
l
àO
n th
e p
hy
sic
alsi
de
, X
ML
de
fine
s n
oth
ing
bu
t a
fla
t te
xt
form
at,
i.e.,
it
de
fine
s a
se
t o
f (e
.g. U
TF
-8/1
6)
cha
ract
er
seq
ue
nce
s b
ein
g
we
ll-fo
rme
d X
ML
.
àA
pp
lica
tion
s th
at
wa
nt
to a
na
lyze
an
d tr
an
sfo
rm X
ML
da
ta in
an
ym
ea
nin
gfu
l wa
y w
ill fi
nd
pro
cess
ing
fla
t ch
ara
cte
r se
qu
en
ces
ha
rda
nd
ine
ffici
en
t!
àT
he
ne
stin
g o
f XM
L e
lem
en
ts a
nd
att
rib
ute
s, h
ow
eve
r, d
efin
es
a lo
gic
al
tre
e-l
ike
str
uc
ture
.
Sta
ff
Nam
e
First
Nam
eLas
tNam
e
Log
inE
xtensi
on
86
Th
e X
ML
Pro
cess
ing
Mo
de
l
àV
irtu
ally
all
XM
L a
pp
lic
ati
on
s o
pe
rate
on
th
e lo
gic
al
tre
e v
iew
wh
ich
is p
rovi
de
d to
th
em
by
an
XM
L p
roce
sso
r (i.e
., “
pa
rse
& s
tore
”).
àX
ML p
roce
sso
rs a
re w
ide
ly a
va
ilab
le (
e.g
., A
pa
ch
e’s
Xe
rce
s).
Ho
w is
th
e X
ML
pro
cess
or
sup
po
sed
to co
mm
un
ica
te th
e
XM
L tr
ee
str
uct
ure
to
th
e a
pp
lica
tion
?
87
Th
e X
ML
Pro
cess
ing
Mo
de
l
àV
irtu
ally
all
XM
L a
pp
lic
ati
on
s o
pe
rate
on
th
e lo
gic
al
tre
e v
iew
wh
ich
is p
rovi
de
d to
th
em
by
an
XM
L p
roce
sso
r (i.e
., “
pa
rse
& s
tore
”).
àX
ML p
roce
sso
rs a
re w
ide
ly a
va
ilab
le (
e.g
., A
pa
ch
e’s
Xe
rce
s).
Ho
w is
th
e X
ML
pro
cess
or
sup
po
sed
to co
mm
un
ica
te th
e
XM
L tr
ee
str
uct
ure
to
th
e a
pp
lica
tion
?
àF
or
ma
ny P
L’s
th
ere
are
“d
ata
bin
din
g”
too
ls.
Giv
es
very
fle
xib
le w
ay
to g
et
PL
vie
w o
f th
e X
ML
tre
e s
tru
ctu
re.
88
Th
e X
ML
Pro
cess
ing
Mo
de
l
àV
irtu
ally
all
XM
L a
pp
lic
ati
on
s o
pe
rate
on
th
e lo
gic
al
tre
e v
iew
wh
ich
is p
rovi
de
d to
th
em
by
an
XM
L p
roce
sso
r (i.e
., “
pa
rse
& s
tore
”).
àX
ML p
roce
sso
rs a
re w
ide
ly a
va
ilab
le (
e.g
., A
pa
ch
e’s
Xe
rce
s).
Ho
w is
th
e X
ML
pro
cess
or
sup
po
sed
to co
mm
un
ica
te th
e
XM
L tr
ee
str
uct
ure
to
th
e a
pp
lica
tion
?
àF
or
ma
ny P
L’s
th
ere
are
“d
ata
bin
din
g”
too
ls.
Giv
es
very
fle
xib
le w
ay
to g
et
PL
vie
w o
f th
e X
ML
tre
e s
tru
ctu
re.
Bu
t firs
t, le
t’s s
ee
wh
at
the
sta
nd
ard
sa
ys…
89
Th
e X
ML
Pro
cess
ing
Mo
de
l
àV
irtu
ally
all
XM
L a
pp
lic
ati
on
s o
pe
rate
on
th
e lo
gic
al
tre
e v
iew
wh
ich
is p
rovi
de
d to
th
em
by
an
XM
L p
roce
sso
r(i.e
., “
pa
rse
& s
tore
”).
àX
ML p
roce
sso
rs a
re w
ide
ly a
va
ilab
le (
e.g
., A
pa
ch
e’s
Xe
rce
s).
Ho
w is
th
e X
ML
pro
cess
or
sup
po
sed
to co
mm
un
ica
te th
e
XM
L tr
ee
str
uct
ure
to
th
e a
pp
lica
tion
?
àth
rou
gh
a fi
xed
inte
rfa
ce o
f acc
ess
or
fun
ctio
ns:
Th
e X
ML
Info
rma
tion
Se
t
Th
e a
cce
sso
r fu
nct
ion
s o
pe
rate
on
diff
ere
nt
typ
es
of n
od
e o
bje
cts:
NO
DE
DO
CE
LE
MA
TT
RC
HA
R
90
Acce
sso
r F
un
ctio
ns (“
no
de
pro
pe
rtie
s”)
Node type Property Comment
DOC childern: DOCàELEM root element
base-uri: DOCàSTRING
version : DOCàSTRING
ELEM localname : ELEMàSTRING
children : ELEMà[NODE] [..]=sequence
attributes: ELEMà[ATTR] type
parent : ELEMàNODE
ATTR localname : ATTRàSTRING
value : ATTRàSTRING
owner : ATTRàELEM
CHAR code : CHARàUNICODE a single character
parent : CHARàELEM
XM
L In
form
atio
n S
et
-http://www.w3.org/TR/xml-infoset
91
Info
rma
tion
se
t o
f a
sa
mp
le d
ocu
me
nt
<?xml version=“1.0”?>
<forecast date=“Thu, May 16”>
<condition>sunny</condition>
<temperature unit=“Celsius”>23</temperature>
</forecast>
children(d) = e1 code(c1) = U+0073 (=‘s’)
base-uri(d) = “file:/…” parent(c1)= e2
version(d) = “1.0” . . .
code(c5) = U+0079 (=‘y’)
localname(e1) = “forecast” parent(c5)= e2
children(e1) = [e2,e3]
attributes(e1)= [a1] localname(e3) =“temperature”
parent(e1) = d children(e3) =[c6,c7]
attributes(e3)=[a2]
localname(a1) = “date” parent(e3) =a1
value(a1) = “Thu, May 16” . . .
owner(a1) = e1
localname(e2) = “condition”
children(e2) = [c1,c2,c3,c4,c5]
attributes(e2)= []
parent(e2) = e1
92
Qu
es
tio
ns
(1)
A N
OD
E ty
pe
ca
n b
e o
ne
of
DO
C,
EL
EM
, AT
TR
, o
r C
HA
R.
In th
e t
wo
pla
ces
of
the
pro
pe
rty
fun
ctio
ns
wh
ere
NO
DE
ap
pe
ars
, w
hic
h o
f th
e f
ou
r ty
pe
s m
ay
actu
ally
ap
pe
ar
the
re?
Fo
r in
sta
nce
, is
this
allo
we
d?
localname(e1) = “condition”
children(e1) = [c1,e2,c2]
(2)
Wh
at
ab
ou
t WH
ITE
SP
AC
E?
Wh
ere
in a
n X
ML
do
cum
en
t do
es
it m
att
er,
an
d w
he
re n
ot?
Wh
ere
in th
e I
nfo
set
Exa
mp
le (
pre
vio
us
slid
e)
are
the
re
turn
s a
nd
ind
en
tatio
ns
of
the
do
cum
en
t?(d
id w
e d
o a
mis
take
? If
so
, w
ha
t is
the
co
rre
ct In
fose
t?)
93
Qu
ery
ing
th
e In
fos
et
Usi
ng
the
In
fose
t, w
e c
an
an
aly
ze a
giv
en
XM
L d
ocu
me
nt i
n m
an
y w
ays
.F
or
inst
an
ce:
àF
ind
all
EL
EM
no
de
s w
ith lo
caln
am
e=bubble
, o
wn
ing
an
AT
TR
no
de
with
lo
caln
am
e=speaker
an
d v
alu
e=Dilbert
.
àL
ist
all panel
EL
EM
no
de
s co
nta
inin
g a
bu
bb
le s
po
ken
by “Dogbert”
àS
tart
ing
in p
an
el 2
(A
TT
R no
), f
ind
all
bu
bb
les
follo
win
g th
ose
sp
oke
n b
y “
Alic
e”
Su
ch q
ue
rie
s a
pp
ea
r ve
ry o
fte
n a
nd
ca
n c
on
ven
ien
tly b
e d
esc
rib
ed
usi
ng
XP
ath
qu
eri
es:
à//bubble/[@speaker=“Dilbert”]
à//panel[.//bubble[@speaker=“Dogbert”]]
à//panel[@no=“2”]//bubble[@speaker=“Alice”]/following::bubble
94
3.
Pa
rse
rs f
or
XM
L
Two
diff
ere
nt
ap
pro
ach
es:
(1)
Pa
rse
r st
ore
s d
ocu
me
nt i
nto
a f
ixe
d (
sta
nd
ard
) d
ata
str
uct
ure
(e
.g.,
an
In
fose
t co
mp
lian
t da
ta s
tru
ctu
re,
such
as
DO
M)
parser.parse(“foo.xml”);
doc = parser.getDocument ();
(2)
Pa
rse
r tr
igg
ers
“e
ve
nts
”. D
oe
s n
ot sto
re!
Use
r h
as
to w
rite
ow
n c
od
e o
n h
ow
to s
tore
/ p
roce
ss t
he
eve
nts
trig
ge
red
by
the
pa
rse
r.
DO
M =
Do
cum
en
t Ob
ject
Mo
de
l
àW
3C
sta
nd
ard
, se
e http://www.w3.org/TR/REC-DOM-Level-1/
parser.parse(“foo.xml”);
doc = parser.getDocument ();
95
DO
M –
Do
cum
en
t O
bje
ct M
od
el
àL
an
gu
ag
e a
nd
pla
tfo
rm-in
de
pe
nd
en
t vie
w o
f X
ML
àD
OM
AP
Is e
xist
fo
r m
an
y P
Ls
(Ja
va,
C+
+, C
, P
erl,
Pyth
on
, …
)
DO
M r
elie
s o
n t
wo
ma
in c
on
cep
ts
(1)
Th
e X
ML
pro
cess
or
con
stru
cts
the
c
om
ple
te X
ML
do
cu
me
nt
tre
e(in
-me
mo
ry)
(2)
Th
e X
ML
ap
plic
atio
n is
sue
s D
OM
lib
rary
ca
lls to
ex
plo
rea
nd
m
an
ipu
late
the
XM
L tr
ee
, or
to g
en
era
ten
ew
XM
L tr
ee
s.
Ad
van
tag
es
•
ea
sy t
o u
se•
on
ce in
me
mo
ry,
no
tric
ky is
sue
s w
ith X
ML
syn
tax
an
ymo
re•
all
DO
M tr
ee
s se
ria
lize
to w
ell-
form
ed
XM
L (e
ven
aft
er
arb
itra
ry u
pd
ate
s)!
96
DO
M –
Do
cum
en
t O
bje
ct M
od
el
àL
an
gu
ag
e a
nd
pla
tfo
rm-in
de
pe
nd
en
t vie
w o
f X
ML
àD
OM
AP
Is e
xist
fo
r m
an
y P
Ls
(Ja
va,
C+
+, C
, P
erl,
Pyth
on
, …
)
DO
M r
elie
s o
n t
wo
ma
in c
on
cep
ts
(1)
Th
e X
ML
pro
cess
or
con
stru
cts
the
c
om
ple
te X
ML
do
cu
me
nt
tre
e(in
-me
mo
ry)
(2)
Th
e X
ML
ap
plic
atio
n is
sue
s D
OM
lib
rary
ca
lls to
ex
plo
rea
nd
m
an
ipu
late
the
XM
L tr
ee
, or
to g
en
era
ten
ew
XM
L tr
ee
s.
Dis
ad
van
tag
e
Use
s L
OT
S o
f m
em
ory
!!
Ad
van
tag
es
•
ea
sy t
o u
se•
on
ce in
me
mo
ry,
no
tric
ky is
sue
s w
ith X
ML
syn
tax
arise
an
ymo
re•
all
DO
M tr
ee
s se
ria
lize
to w
ell-
fro
me
d X
ML
(eve
n a
fte
r a
rbitr
ary
up
da
tes)
!
97
DO
M L
eve
l 1 (
Co
re)
No
de
Pro
ce
ssin
gIn
str
uctio
nC
ha
racte
rDa
taA
ttr
Ele
me
nt
Do
cu
me
nt
Text
Co
mm
en
t
CD
ATA
sect
ion
Na
me
No
de
Ma
pN
od
eL
ist
Ch
ara
cte
r st
rin
gs
(DO
M t
ype
DO
MS
trin
g)
are
de
fine
d to
be
en
cod
ed
usi
ng
UT
F-1
6 (
e.g
., J
ava
DO
M r
ere
sen
ts ty
pe
DO
MS
trin
gu
sin
gits
String
typ
e).
98
DO
M L
eve
l 1 (
Co
re)
So
me
me
tho
ds
DOM type Method Comment
Node nodeName
nodeValue
parentNode : Node
firstChild : Node leftmost child
nextSibling : Node returns NULL for root elem
or last child or attributes
childNodes : NodeList
attributes : NamedNodeMap
ownerDocument: Document
replaceChild : Node
Document createElement : Element creates element with
given tag name
createComment : Comment
getElementsByTagName: NodeList list of all Elem nodes
in document order
: DOMString redefined in subclasses
99
DO
M L
eve
l 1 (
Co
re)
Na
me
,Va
lue
, an
d a
ttrib
ute
sd
ep
en
d o
n t
he
ty
pe
o
f th
e c
urr
en
t no
de
.
10
0
DO
M L
eve
l 1 (
Co
re)
So
me
de
tails
Cre
atin
g a
n element/attribute u
sin
g createElement/createAttribute
do
es
no
tw
ire
the
ne
w n
od
e w
ith th
e X
ML
tre
e s
tru
ctu
re y
et.
àC
all
insertBefore, replaceChild
, …
,
to w
ire
a n
od
e a
t a
n e
xp
licity p
ositio
n
DO
M ty
pe
NodeList
ma
kes
up
fo
the
lack
of
colle
ctio
n d
ata
typ
es
in m
ost
pro
gra
mm
ing
lan
gu
ag
es
DO
M ty
pe
NamedNodeMap
rep
rese
nts
an
association table
(no
de
s m
ay
be
a
cce
sse
d b
y n
am
e)
Exa
mp
le:
v0
a1
a2
bubble
speaker
togetAttributes
“speaker” à
a1
“to” à
a2
A N
am
ed
No
de
Ma
p
Me
tho
ds:
getNamedItem
, setNamedItem
,…
bubble.
getAttributes().
getNamedItem(“ … “)
10
1
E.g
. F
ind
all
occ
urr
en
ces
of Dogbert
spe
aki
ng
(a
ttrib
ute
speaker
of e
lem
en
t bubble
)
10
2
Qu
es
tio
ns
Giv
en
an
XM
L fil
e o
f, s
ay,
50
K, h
ow
larg
e is
its D
OM
re
pre
sen
tatio
n in
ma
in m
em
ory
?
Ho
w m
uch
larg
er, in
the
worst case
, is
a D
OM
re
pre
sen
tatio
n
with
re
spe
ct to
th
e s
ize
of t
he
XM
L d
ocu
me
nt?
(d
iffic
ult!
)
Ho
w c
ou
ld w
e d
ecr
ea
se th
e m
em
ory
ne
ed
of
DO
M,
wh
ilep
rese
rvin
g it
s fu
nct
ion
alit
y?
10
3
Ne
xt
“Eve
n trig
ge
r” P
ars
ers
fo
r X
ML
:
àB
uild
yo
ur
ow
n X
ML
da
ta s
tru
ctu
rea
nd
fill
it
up
as t
he
pa
rse
r tr
igg
ers
in
pu
t “e
ve
nts
”.
EN
DL
ect
ure
1