processing web data with xml and xslt - part 2 · xml vs html vs xhtml processing web data with xml...

21
Processing Web Data With XML And XSLT - Part 2 Open4Tech Summer School, 2020 © 2020 Syncro Soft SRL. All rights reserved. Bogdan Dumitru, Syncro Soft [email protected]

Upload: others

Post on 10-Aug-2020

29 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

Processing Web Data With XML And XSLT - Part 2

Open4Tech Summer School, 2020

© 2020 Syncro Soft SRL. All rights reserved.

Bogdan Dumitru, Syncro [email protected]

Page 2: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

Note:

• GitHub repository:https://github.com/dumitrubogdanmihai/processing-web-data-with-xml-and-xslt

• Questions

Processing Web Data With XML And XSLT

Page 3: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

Let's recap

Processing Web Data With XML And XSLT

In the first part we:

• saw how browsers render web pages• talked about Selenium• created a web crawler

Page 4: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

Today's Agenda

• XML• XPath• XSLT• Live Coding

Processing Web Data With XML And XSLT

Page 5: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

About XML

Processing Web Data With XML And XSLT

eXtensible Markup Language

• It is a markup language– text is surrounded by tags

(that provide semantics)• doesn't define a set of elements

Page 6: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

About XML - Syntax

Processing Web Data With XML And XSLT

XML Syntax Rules

• Prolog must be on the first line (if is present)• Must have only one root element• All start tags must have a closing tag

– or to be self-closing tags• Entities

– < > & ' "

• Comments– <!-- TODO: fix it! -->

Page 7: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

About XML – Verification

Processing Web Data With XML And XSLT

Well-formed XML vs Valid XML

• Well-formed = conform the syntax rules– e.g: no missing end tags, no overlapping tags

• Valid = conform the schema rules– e.g: no more elements that the schema declares

Page 8: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XML Strong Points

Processing Web Data With XML And XSLT

• Semantics– data is wrapped in semantics– XML vocabularies

• Validation– controlled structure

• Reuse– data isn't duplicated– XInclude

Page 9: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

Where it is used?

Processing Web Data With XML And XSLT

Wherever any of the following needs arise:

• semantic content• content reuse• well-structured content• content validation

Page 10: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XML vs HTML vs XHTML

Processing Web Data With XML And XSLT

XML– standard (specification) for describing structure and content– extensible– can explain what data means

• HTML (hypertext markup language)– non extensible (fixed tags set)– can't explain what data means

• XHTML– HTML that conform to XML standards (well formed HTML)

Page 11: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XML-related Technologies

Processing Web Data With XML And XSLT

The XML world is really big!

• XML• XPath• XSLT• XQuery• XSD, DTD• SVG• etc.

Page 12: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XML DOM

Processing Web Data With XML And XSLT

XML Document Object Model

Page 13: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XPath

Processing Web Data With XML And XSLT

XML Path Language

• Select a set of nodes within an XML document• Highly used in XSLT• Any CSS selector can be written in XPath

Page 14: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XPath – Syntax 1

Processing Web Data With XML And XSLT

• Basic syntax– / - root element– //div - all div elements within document– /html/body/div - div elements within body– /html/body/../ - body – /html/body/* - body children– */@class – the “class” attributes– . – the current element

Page 15: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XPath – Syntax 2

Processing Web Data With XML And XSLT

• Axes– //p/following-sibling:: - elements placed after p– //p/preceding-sibling:: - elements placed before p– //p/descendant:: - descendents of p

Page 16: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XPath – Syntax 3

Processing Web Data With XML And XSLT

• Operators– “|”

● //book | //magazine– “=”

● //book[@price=9.8]– “or”

● //book[@price>=9.8 or @price<=10]

Page 17: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XSLT

Processing Web Data With XML And XSLT

Extensible Stylesheet Language Transformations

• Transform/remodel XML documents

Page 18: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XSLT

Processing Web Data With XML And XSLT

Basic concepts

• <template>– <value-of>– context (.)– <copy>

• <apply-templates>

Page 19: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

XSLT

Processing Web Data With XML And XSLT

Page 20: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

Let's Code

Processing Web Data With XML And XSLT

• We'll extract extract useful data from the files generated from previous course.

Page 21: Processing Web Data With XML And XSLT - Part 2 · XML vs HTML vs XHTML Processing Web Data With XML And XSLT XML – standard (specification) for describing structure and content

THANK YOU!Any questions?

Bogdan [email protected]

© 2020 Syncro Soft SRL. All rights reserved.