02-xml basics (1) - compatibility...

Post on 27-Jun-2020

Web Technologies and Applications

XML Basics

Instructor: Prof. Young-guk Ha

Dept. of Computer Science & Engineering

2nd Semester, 2019

Web Technology Application Design

Contents

q Introduction to XML

q XML Document Structure and Basic Syntax

Introduction to XML

XML (eXtensible Markup Language) Overview (1)

q What “extensible” means in XML

Ø “Capable of being extended”

Ø Means that you can define your own markup

q Markups (Tags)

Ø Information added to the content of a text that enhances its meaning

o Demarcates or labels parts of a text

Ø Types of markups in HTML

o Semantic Markup: describes the meaning of content

• E.g.) <TITLE>, <BODY>

o Stylistic Markup: describes how to present the content

• E.g.) <FONT>, <B>

o Structural Markup: describes the structure of content

• E.g.) <P>

XML (eXtensible Markup Language) Overview (2)

q Markup language

Ø A set of markups that can be placed in a text for a specific purpose

Ø E.g., HTML, WML, VRML, SensorML, MathML, VoiceXML, …

q XML

Ø Extensible markup language = meta-markup language

Ø A set of rules to build a markup language and to handle the documents

o I.e., family of technologies to describe how to define tags, transform documents, retrieve data, present data, and so on

q XML document

Ø A document having its content demarcated by XML tags

Ø Set of new tag definitions with XML tags

History of XML

[Timeline: 1970 GML (IBM) → 1986 SGML → 1991 HTML (WWW) → 1998 XML]

q 1986: SGML (Standard Generalized Markup Language) → International Standard (ISO)

q 1998: XML 1.0 → De Facto Standard (W3C)

q 2004: XML 1.1

q 2006: XML 1.1 (2nd Edition)

q 2008: XML 1.0 (5th Edition)

Example of XML Document (1)

q All XML documents are made up of markup and content

Ø Semi-structured documents

Ø Markup and content complement each other

Ø Markup creates an information entity with partitions

Ø Markup creates labeled data in a handy package

<?xml version="1.0"?>
<letter priority="important">
  <to>John</to>
  <subject>CS760</subject>
  <message>
    Don't forget to attend the class <emphasis>on Friday</emphasis>
    Good luck to you.
  </message>
  <from>Tomas</from>
</letter>
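As a minimal sketch of how a program can use this labeling, the letter above can be parsed with Python's standard `xml.etree.ElementTree` module. The choice of Python and of this parser is an assumption made for illustration; the slides do not prescribe a tool.

```python
import xml.etree.ElementTree as ET

# The letter example from the slide, with straight quotes and version "1.0".
LETTER = """<?xml version="1.0"?>
<letter priority="important">
  <to>John</to>
  <subject>CS760</subject>
  <message>Don't forget to attend the class <emphasis>on Friday</emphasis>
  Good luck to you.</message>
  <from>Tomas</from>
</letter>"""

root = ET.fromstring(LETTER)

priority = root.get("priority")                # attribute of the root element
recipient = root.findtext("to")                # text content of <to>
emphasized = root.find("message/emphasis").text
child_tags = [child.tag for child in root]     # the labeled partitions

print(root.tag, priority, recipient, emphasized, child_tags)
```

Each piece of content is reached through its label rather than through its position in the text, which is the point of the "labeled data in a handy package" bullet.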

Example of XML Document (2)

[Figure: a real-world BMW car, an XML authoring tool used to write an XML document about the car, and the resulting XML document about the BMW car]

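XML vs. HTML (1)

q HTML uses only predefined tags; XML lets you extend the set of tags

q HTML tags mainly provide ways to display content on screen; XML tags provide ways to structure a document or to label its content

q XML tag names are case-sensitive

[Figure: HTML example containing “Hwayang-dong”; from the HTML markup alone it is hard to tell that a value is a postal code]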

XML vs. HTML (2)

q XML documents

Ø Labeling content with XML tags makes it possible to express what the content means

<zip>450-3490</zip>Hwayang-dong
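A small sketch of why the label matters: once the postal code is wrapped in a `zip` element, a program can ask for it by name. The `<address>` wrapper element is hypothetical (the slide shows only the fragment), and the district name is transliterated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical <address> wrapper; the slide shows only the <zip> fragment.
doc = ET.fromstring("<address><zip>450-3490</zip>Hwayang-dong</address>")

zip_code = doc.findtext("zip")      # content labeled as a postal code
district = doc.find("zip").tail     # unlabeled text that follows </zip>

print(zip_code, district)
```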

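XML vs. Other Electronic Documents

q HWP and MS Word documents

Ø Stored as non-standardized, proprietary binary files

Ø No document structure information; content and style are mixed together

Ø Hard for external programs to use the documents or to automate their processing

q XML documents

Ø Stored as plain text files, so they can be read on any computing platform

Ø Structure, content, and style are managed separately

o Structure: defined with a DTD or XML Schema (document model)

o Content: written to conform to the document model (valid XML document)

o Style: style definitions for presenting the content (XSL, CSS)

Ø Easy for external programs to use the documents and to process them automatically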

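Benefits of XML Documents

q Advantages of XML documents compared with other electronic documents

Ø Data independence

o Document structure (DTD, XML Schema) is separated from content (document)

Ø Multiple presentations

o The same content can be presented in different ways (CSS, XSL)

Ø Easy data exchange

o Based on text and open web standards

Ø Better data retrieval

o As semi-structured documents, they are easy to search (XPath, XQuery)

Ø Easy transformation of document structure

o E.g., XML document → HTML document (XSLT)

o E.g., XML document → binary documents such as MS Word, HWP, PDF (XSL-FO)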
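The retrieval benefit can be sketched with the limited XPath subset that Python's `xml.etree.ElementTree` supports. The `<catalog>` document and its element names are hypothetical, chosen only to illustrate searching by structure; a full XPath or XQuery engine offers far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, used only to illustrate path-based search.
CATALOG = """<catalog>
  <part no="bolt-100"><name>Bolt</name></part>
  <part no="nut-123"><name>Nut</name></part>
</catalog>"""

root = ET.fromstring(CATALOG)

# Select an element by attribute value, then read a labeled child.
nut = root.find(".//part[@no='nut-123']")
nut_name = nut.findtext("name")

# Collect the same field from every part.
all_names = [part.findtext("name") for part in root.findall("part")]

print(nut_name, all_names)
```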

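XML Technology Family

q Document structure: DTD, XML Schema, SOX, …

q Document linking: XPath, XPointer, XLink, …

q Derived languages: WML, XHTML, MathML, …

q Document style: XSLT, XSL-FO, XSL, CSS, …

q Document APIs: SAX, DOM, JDOM, …

q Services: SOAP, WSDL, UDDI, …

q Storage and retrieval: XML-DBMS, NXD, XQuery, …

q Security: Encryption, Signature, …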

XML Document Structure and Basic Syntax

Basic XML Terminology (1)

q Element

Ø Labeled container of content

Ø Basic building block of XML documents

<to type="name">      Start tag, carrying the attribute type="name"
  Hong Gildong        Content
</to>                 End tag

The start tag, content, and end tag together make up the element “to”.

Basic XML Terminology (2)

q Well-formed document

Ø A document that follows the basic XML syntax, the minimum set of rules that lets a browser or another program process it

1) It contains only properly-encoded legal Unicode characters

2) None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles

3) The begin, end, and empty-element tags which delimit the elements are correctly nested, with none missing and none overlapping

4) The element tags are case-sensitive; the start and end tags must match exactly

5) There is a single root element which contains all the other elements

q Valid document

Ø A document that conforms to its document model

o DTD (Document Type Definition)

o XML Schema

Examples of Well-Formed Documents

q A document must have exactly one top-level (root) element

Ø Well-formed: <jumin> … </jumin>

q Tags must be correctly nested

Ø Well-formed: <jumin><name>kim</name></jumin>

Ø Not well-formed: <jumin><name>kim</jumin></name>

q Every element must have both a start tag and an end tag

Ø Not well-formed: <name>kim (no end tag) or kim</name> (no start tag)

q Start-tag and end-tag names must match (including upper/lower case)

Ø Well-formed: <name>kim</name>

Ø Not well-formed: <name>kim</age>, <name>kim</Name>
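The rules above can be checked mechanically. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser: it is non-validating, so it tests well-formedness only, and it raises `ParseError` on any violation. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# The examples from the slide.
samples = [
    "<jumin><name>kim</name></jumin>",   # correctly nested
    "<jumin><name>kim</jumin></name>",   # overlapping tags
    "<name>kim",                         # missing end tag
    "kim</name>",                        # missing start tag
    "<name>kim</age>",                   # tag names differ
    "<name>kim</Name>",                  # tag names differ in case
]
for sample in samples:
    print(is_well_formed(sample), sample)
```

Passing this check says nothing about validity; checking a document against a DTD or XML Schema requires a validating parser.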

Checking Well-Formed and Valid Documents

XML Document Structure

Prolog (optional)

  XML Declaration:

    <?xml version="1.0" encoding="euc-kr"?>

  Document Type Declaration (optional):

    <!DOCTYPE memo [
      <!ELEMENT memo (to, …)>
      …
    ]>

Elements (Contents)

    <memo>
      <to what="name">Hong Gildong</to>
      <date>2002/04/05</date>
      <contents>Please call back</contents>
      <from>Heo Jun</from>
    </memo>

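Example of a Simple XML Document Structure

[Figure: a simple XML document with its XML declaration and its document content (elements) labeled]

Example of XML Document

[Figure: an XML document with its XML declaration and its document content (elements) labeled]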

Tree View of the Example Document Structure

[Figure: tree view of the example document, with the root element, elements, attributes, and content labeled]
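A tree view like this can be produced from any well-formed document. The sketch below walks the memo document from the XML Document Structure slide (with its text rendered in English, which is an assumption) and emits one indented line per element. The helper name `tree_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

MEMO = """<memo>
  <to what="name">Hong Gildong</to>
  <date>2002/04/05</date>
  <contents>Please call back</contents>
  <from>Heo Jun</from>
</memo>"""

def tree_lines(element, depth=0):
    """Return one indented line per element: tag, attributes, text content."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:                 # recurse into child elements
        lines.extend(tree_lines(child, depth + 1))
    return lines

lines = tree_lines(ET.fromstring(MEMO))
print("\n".join(lines))
```

The root element sits at depth 0 and every other element hangs below it, which mirrors the single-root rule for well-formed documents.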

Structure of XML Documents

q XML Document := Prolog? Element

q Prolog

Ø Tips off the world that the document is marked up in XML

q Element

Ø Root element (Document element)

Ø Other elements

Prolog

q Prolog := XMLDecl DocTypeDecl?

q Top of XML document is graced with special information

Ø XML Declaration

o The document is marked up in XML

o Example: <?xml version="1.0"?>

Ø Document Type Declaration

o Defines name of the root element

o Defines DTD (Document Type Definition) reference → document model

XML Declaration

q XMLDecl := "<?xml" versioninfo encodinginfo? standaloneinfo? "?>"

Ø version

o E.g., version="1.0"

Ø encoding

o "euc-kr": Korean encoding

o "UTF-8": 8-bit Unicode (default)

Ø standalone

o "yes": No external file to load

o "no": Some files to load (default)

• When there is an External Entity

• When DTD is in an external file

q Examples

<?xml version="1.0"?>
<?xml version="1.0" encoding="euc-kr"?>

* Note: the "<? … ?>" tag syntax comes from SGML
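To see how a parser reports these three fields, here is a minimal sketch using the `XmlDeclHandler` callback of Python's `xml.parsers.expat`. The helper name `read_xml_declaration` is hypothetical, and UTF-8 is used in the sample because support for other encodings such as "euc-kr" depends on the parser.

```python
import xml.parsers.expat

def read_xml_declaration(data: bytes):
    """Return (version, encoding, standalone) from the XML declaration, or None."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (clause omitted).
    parser.XmlDeclHandler = (
        lambda version, encoding, standalone:
            seen.append((version, encoding, standalone)))
    parser.Parse(data, True)
    return seen[0] if seen else None

print(read_xml_declaration(b'<?xml version="1.0"?><memo/>'))
print(read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><memo/>'))
print(read_xml_declaration(b"<memo/>"))
```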

Document Type Declaration

q DocTypeDecl := "<!DOCTYPE" root-element extID-of-dtd? ("[" internal-subset "]")? ">"

q Document Type Declaration

Ø Defines name of the root element

Ø Defines DTD (internal subset)

o For document validity checking

o Defines ELEMENT and ENTITY declarations

q External subset reference

Ø extID-of-dtd refers to an external subset for the document type declaration

* Note: the "<! … >" and "[ … ]" syntax comes from SGML

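Document Type Declaration Example (1)

[Figure: example document type declaration with the root element name and the DTD labeled]

Document Type Declaration Example (2)

[Figure: example document type declaration with the external ID of the DTD labeled]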
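Since the example figures are not reproduced here, the two forms of the declaration can be sketched with the `StartDoctypeDeclHandler` callback of Python's `xml.parsers.expat`, which reports the root element name, the external identifiers, and whether an internal subset is present. The file name `memo.dtd` and the helper `read_doctype` are hypothetical; with default settings expat does not fetch the external subset.

```python
import xml.parsers.expat

def read_doctype(data: bytes):
    """Return (root_name, system_id, public_id, has_internal_subset) or None."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset:
            seen.append((name, system_id, public_id, bool(has_internal_subset))))
    parser.Parse(data, True)
    return seen[0] if seen else None

# DTD given inline as an internal subset.
INTERNAL = b"<!DOCTYPE memo [<!ELEMENT memo (#PCDATA)>]><memo>hi</memo>"
# DTD referenced by an external ID (hypothetical file name).
EXTERNAL = b'<!DOCTYPE memo SYSTEM "memo.dtd"><memo>hi</memo>'

print(read_doctype(INTERNAL))
print(read_doctype(EXTERNAL))
```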

Element: Building Block of XML Documents

q Element :=

<name (att1="value1" att2="value2" …)?>
  content
</name>

q Empty Element :=

<name (att1="value1" att2="value2" …)? />

q Example

<Caution class="info">
Start, End tag should be pair!
Name is case-sensitive!
Whitespace in content is preserved!
Following element is empty element.
<EmptyElement/>
</Caution>

Element: Building Block of XML (cont’d)

q Naming rules

Ø Starts with a letter or underscore (_)

Ø Should not start with “xml”, “Xml”, “xMl”, “xmL”, …, or “XML”

Ø May contain letters, digits, hyphens (-), periods (.), and underscores (_)

q Positioning rules for well-formed documents

Ø End tag must come after the start tag

Ø Elements should be correctly nested

o There should be no overlapping elements

o An element’s start and end tags must both reside in the same parent
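The naming rules can be written down as a small check. This is an ASCII-only simplification of the rules as listed on the slide, not the full Name production of the XML specification (which also admits many non-ASCII letters); the helper name `is_allowed_name` is hypothetical.

```python
import re

# ASCII-only simplification: a letter or underscore first, then letters,
# digits, periods, underscores, or hyphens.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_allowed_name(name: str) -> bool:
    """Apply the slide's naming rules to a candidate element name."""
    if _NAME.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")

for candidate in ["first", "1st", "Xml_tag", "e rr", "_part-1.0"]:
    print(candidate, is_allowed_name(candidate))
```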

Element: Building Block of XML (cont’d)

q Element definition examples

Ø <Err>Case-sensitive</err> → <Err>just do it</Err>

Ø <1st>Don’t Start with Number</1st> → <first> … </first>

Ø <Xml_tag>Don’t Start with “xml”</Xml_tag>

Ø <err></err> → <err></err>

Ø <e rr></err> → <err></err>

Ø <emptyElement/>

o Is equal to <emptyElement></emptyElement>

o Is not equal to <emptyElement> </emptyElement> because whitespace is preserved in XML content
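The last point can be observed directly. A minimal sketch with Python's `xml.etree.ElementTree`: the two empty forms parse to the same thing, while the form containing a space carries one character of content.

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring("<emptyElement/>")
explicit_pair = ET.fromstring("<emptyElement></emptyElement>")
with_space = ET.fromstring("<emptyElement> </emptyElement>")

# ElementTree represents "no content" as text being None.
print(repr(self_closing.text), repr(explicit_pair.text), repr(with_space.text))

# The two empty forms also serialize identically.
same_serialization = ET.tostring(self_closing) == ET.tostring(explicit_pair)
print(same_serialization)
```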

Attribute: More Muscle for Elements

q Attribute := name = "value" | 'value'

Ø Gives elements unique properties

Ø There can be many attributes in an element (unordered)

Ø Attributes are separated by whitespace (not commas)

Ø Attribute names should be unique within an element

Ø If the attribute value itself contains double (or single) quotes, we can use single (or double) quotes around it

q Examples

Ø <letter priority="high" type="1"/> == <letter type="1" priority="high"/>

Ø <choice test='msg="hi"'> or <choice test="msg='hi'">

Ø <team person="sue" person="joe"> → <team person1="sue" person2="joe">
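The three examples can be tried out with Python's `xml.etree.ElementTree`, used here as an illustrative parser: attribute order does not matter, either quote style works, and a repeated attribute name is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# Attribute order is not significant.
a = ET.fromstring('<letter priority="high" type="1"/>')
b = ET.fromstring('<letter type="1" priority="high"/>')
same_attributes = a.attrib == b.attrib

# Single quotes around a value that itself contains double quotes.
quoted = ET.fromstring("<choice test='msg=\"hi\"'/>").get("test")

# A duplicated attribute name makes the document not well-formed.
try:
    ET.fromstring('<team person="sue" person="joe"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(same_attributes, quoted, duplicate_rejected)
```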

Attribute: More Muscle for Elements (cont’d)

q Attribute Value Types (in DTD)

Ø ID

o A validating XML parser warns you if the ID doesn’t have a unique value throughout the document (attribute “no” in the example below)

Ø IDREF(S)

o A validating XML parser warns you if the IDREF points to a nonexistent element (attribute “with” in the example below)

Ø Other types: ENUMERATED, CDATA, ENTITY(S), NMTOKEN(S)

q Example

<part no="bolt-100"/>
<part no="bolt-100"/>
<part no="bolt-123"/>
<part no="nut-123">
  <compatible with="bolt-123"/>
  <compatible with="bolt-456"/>
</part>
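Python's standard library has no validating parser, so the sketch below imitates by hand the two checks a validating parser would make for this example: duplicate ID values and IDREFs that point nowhere. The `<parts>` wrapper element and the helper `check_ids` are hypothetical, and the attribute names are passed in rather than read from a DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical <parts> wrapper: the slide's fragment has no single root.
PARTS = """<parts>
  <part no="bolt-100"/>
  <part no="bolt-100"/>
  <part no="bolt-123"/>
  <part no="nut-123">
    <compatible with="bolt-123"/>
    <compatible with="bolt-456"/>
  </part>
</parts>"""

def check_ids(root, id_attr="no", ref_attr="with"):
    """Return (duplicate ID values, IDREF values with no matching ID)."""
    ids = Counter(e.get(id_attr) for e in root.iter()
                  if e.get(id_attr) is not None)
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    refs = {e.get(ref_attr) for e in root.iter()
            if e.get(ref_attr) is not None}
    dangling = sorted(ref for ref in refs if ref not in ids)
    return duplicates, dangling

duplicates, dangling = check_ids(ET.fromstring(PARTS))
print(duplicates, dangling)
```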

Entity: Placeholder for Content

q Entity

Ø Contains a part of XML document

Ø Something like macro in C (#define): “Declare once, use many times”

Ø Doesn’t add anything semantically to the markup

Ø Always eliminates an inconvenience, ranging

o From standing in for impossible-to-type characters

o To marking the place where a file should be imported (external entity)

q Example in the internal-subset

<!DOCTYPE letter ...[
  <!ENTITY w3url "http://www.w3.org/">
]>
<letter>
  <message>Hi. John. W3 URL is &w3url;</message>
</letter>

Result after the entity reference is expanded:

<message>
Hi. John. W3 URL is
http://www.w3.org/
</message>
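The expansion can be reproduced with Python's `xml.etree.ElementTree`, whose underlying parser expands internal entities declared in the internal subset. In this sketch the DOCTYPE carries only the internal subset; the part elided as "..." on the slide is left out.

```python
import xml.etree.ElementTree as ET

LETTER = """<!DOCTYPE letter [
  <!ENTITY w3url "http://www.w3.org/">
]>
<letter>
  <message>Hi. John. W3 URL is &w3url;</message>
</letter>"""

# The reference &w3url; is replaced by the declared text during parsing.
message = ET.fromstring(LETTER).findtext("message")
print(message)
```

Entity expansion is also a known attack surface, so documents from untrusted sources are usually parsed with entity handling restricted.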

Entity: Placeholder for Content (cont’d)

[Figure: overview of entity types; one label reads “Used in DTD”]

Entity: Placeholder for Content (cont’d)

q Character Entity

Ø Predefined

o Ampersand (&): amp

o Apostrophe ('): apos

o Greater than (>): gt

o Less than (<): lt

o Quotation mark ("): quot

Ø Numbered (by Unicode code point)

o E.g., cedilla (ç): #231

o Alphabetic, syllabic, ideographic scripts

• Latin

• Greek

• 20,000 Han ideographs

• 11,000 Hangul syllables, ...

Ø Named (user defined)

o E.g., <!ENTITY cedilla "&#231;">

<!ENTITY name "Kim">
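A short sketch of the predefined and numbered forms in both directions, assuming Python's standard library: the parser decodes the references on input, and `xml.sax.saxutils.escape` produces them on output (by default it escapes `&`, `<`, and `>`).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Decoding: predefined entities and a numbered character reference.
decoded = ET.fromstring(
    "<m>a &lt; b &amp;&amp; &quot;c&quot; &#231;</m>").text

# Encoding: make text safe to place inside element content.
encoded = escape("a < b && c")

print(decoded)
print(encoded)
```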

Entity: Placeholder for Content (cont’d)

q Mixed-Content Entity

Ø Contains content of unlimited length

Ø Can include markup as well as text

o Internal entity

E.g., <!ENTITY phone "<number>042-999-9999</number>">

o External entity

E.g., <!ENTITY signature SYSTEM "./signature.xml">

Entity: Placeholder for Content (cont’d)

q Example

[Figure: example document that uses an external entity; the external entity is imported from "./signature.xml"]

Entity: Placeholder for Content (cont’d)

q External Entity Example

<!ENTITY part1 SYSTEM "./p1.xml">    → Local file

<!ENTITY part2 SYSTEM "http://www.bobsbolts.com/p2.xml">    → www.bobsbolts.com

<!ENTITY part3 SYSTEM "http://www.tomsnuts.com/p3.xml">    → www.tomsnuts.com

Entity: Placeholder for Content (cont’d)

q Unparsed Entity

Ø Should not be parsed by XML parser

o Tells parser not to load the entity’s content

o Normally intended for the application rather than the parser

Ø May contain something other than text

o E.g., binary image files

<!ENTITY mypic SYSTEM "./erik.gif" NDATA GIF>

→ “GIF” is the name of a notation (NDATA) declared as

<!NOTATION GIF SYSTEM "image/gif">
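How the declarations reach the application can be sketched with the `NotationDeclHandler` and `UnparsedEntityDeclHandler` callbacks of Python's `xml.parsers.expat`. The parser only reports the declarations; it never opens `./erik.gif`. The `<album>` root element is hypothetical.

```python
import xml.parsers.expat

# Hypothetical <album> document wrapping the two declarations from the slide.
DOC = b"""<!DOCTYPE album [
  <!NOTATION GIF SYSTEM "image/gif">
  <!ENTITY mypic SYSTEM "./erik.gif" NDATA GIF>
]>
<album/>"""

notations = {}   # notation name -> system identifier
unparsed = {}    # entity name -> (system identifier, notation name)

parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = (
    lambda name, base, system_id, public_id:
        notations.update({name: system_id}))
parser.UnparsedEntityDeclHandler = (
    lambda name, base, system_id, public_id, notation:
        unparsed.update({name: (system_id, notation)}))
parser.Parse(DOC, True)

print(notations, unparsed)
```

It is then up to the application to decide what to do with an entity whose notation is GIF, for example handing the file to an image viewer.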

Entity: Placeholder for Content (cont’d)

q Parameter Entity

Ø Occurs only in the document type declaration section

o Preceded by ‘%’ (not by ‘&’)

Ø Parameter entity references are immediately expanded in the document type declaration

o E.g., without parameter entity

<!ELEMENT burns (#PCDATA | quote)*>
<!ELEMENT allen (#PCDATA | quote)*>

o E.g., with parameter entity

<!ENTITY % pcont "#PCDATA | quote">
<!ELEMENT burns (%pcont;)*>
<!ELEMENT allen (%pcont;)*>

Entity: Placeholder for Content (cont’d)

q External Parameter Entity

Ø Loads DTD declarations from external files

<?xml version="1.0" standalone="no"?>
<!DOCTYPE class [
  <!ENTITY % professor SYSTEM "http://www.univ.com/professor.dtd">
  <!ENTITY % lec_room SYSTEM "http://www.univ.com/lec_room.dtd">
  <!ENTITY % student SYSTEM "http://www.univ.com/student.dtd">
  %professor;
  %lec_room;
  %student;
]>

Miscellaneous Markups

q Comment := "<!--" any_text_and_markup "-->"

Ø Tells parser to ignore those regions

Ø Within comments, "--" should not occur

Ø E.g., <!-- <address>59 Sunspot Avenue</address> -->

q Processing Instruction := "<?" keyword data? "?>"

Ø Container for data targeted toward specific applications or parsers

Ø E.g., <?linebreak?>, <?xml version="1.0"?>
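A minimal sketch of what "ignore" means in practice, assuming Python's `xml.etree.ElementTree` with its default settings: the commented-out markup and the processing instruction do not appear in the element tree, so only the real element remains.

```python
import xml.etree.ElementTree as ET

DOC = """<letter>
  <!-- <address>59 Sunspot Avenue</address> -->
  <?linebreak?>
  <to>John</to>
</letter>"""

root = ET.fromstring(DOC)

child_tags = [child.tag for child in root]   # comment and PI are not elements
address = root.find("address")               # markup inside a comment is inert

print(child_tags, address)
```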

Miscellaneous Markups (cont’d)

q CDATA Section := "<![CDATA[" any_text_and_markup "]]>"

Ø Tells parser the section contains no markup

o Should be treated as regular text

Ø Within a CDATA section, "]]>" should not occur

Ø E.g., using "<" and ">" in a CDATA section

[Figure: example content written with a CDATA section]
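As a sketch of the idea, the same text can be written once inside a CDATA section and once with entity references; a parser (here Python's `xml.etree.ElementTree`) hands the application identical content in both cases. The `<code>` element and its text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<", ">" and "&" are plain characters.
with_cdata = ET.fromstring(
    "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>").text

# Without CDATA, the same characters must be written as entity references.
with_entities = ET.fromstring(
    "<code>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</code>").text

print(with_cdata)
print(with_cdata == with_entities)
```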

References

q XML 1.0 (Fifth Edition)

Ø W3C Recommendation 26 Nov. 2008

Ø http://www.w3.org/TR/xml