XML & JSON: interchangeability and case studies
Part 1: from text to XML/JSON
Salvatore Cristofaro, Pietro Sichera and Daria Spampinato
Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie della Cognizione
Catania
Semantic web
Salvatore Cristofaro, Pietro Sichera and Daria Spampinato – 1st March 2021
XML and JSON
• Classic web enhancement
• Information encoding
• Information ambiguity
• Information transfer systems
• Searching, maintaining and preserving reliable data
• Methods for data use and exchange
XML and JSON
• Widely used for data exchange between client and server
• Human-readable
• Hierarchical
• Supported by many tools that read and use them
XML and JSON: differences
XML and JSON, or XML vs JSON?
XML
• Longer (more verbose)
• Needs an XML parser before the data can be used
• No native “array” data type
JSON
• Shorter
• Maps directly onto JavaScript objects (e.g. via JSON.parse)
• Native “array” data type
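The size and array differences listed above can be seen by serialising the same data both ways. This is a minimal sketch using Python's standard `json` and `xml.etree.ElementTree` modules; the record, the element names (`talk`, `authors`, `author`) and the choice of repeated child elements to stand in for an array are illustrative assumptions, not a fixed convention.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"title": "XML & JSON", "authors": ["Cristofaro", "Sichera", "Spampinato"]}

# JSON: the list is serialised natively as an array.
json_text = json.dumps(record, separators=(",", ":"))

# XML: there is no array type, so each list item becomes a repeated child element.
root = ET.Element("talk")
ET.SubElement(root, "title").text = record["title"]
authors = ET.SubElement(root, "authors")
for name in record["authors"]:
    ET.SubElement(authors, "author").text = name
xml_text = ET.tostring(root, encoding="unicode")  # "&" is escaped as "&amp;"

print(json_text)
print(xml_text)
print(len(json_text), "characters of JSON vs", len(xml_text), "characters of XML")
```

A reader of the XML has to know that repeated `<author>` elements mean "a list"; in JSON the brackets say so directly.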
Information encoding
• Communication
• Character encoding
• Text storage
• Text transmission
Definitions
• String: a sequence of characters
• Repertoire of characters: the set of characters a system can represent
• Charset: a mapping from the characters of a repertoire to numeric codes
Information encoding
• Morse
• Enigma
• ASCII
ASCII
Binary: 01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64
Text: Hello world
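The binary and hexadecimal rows above are simply the ASCII codes of each character. A minimal Python sketch that derives both rows from the text:

```python
# Encode the text as ASCII bytes, then format each byte in binary and in hex.
text = "Hello world"
data = text.encode("ascii")

binary = " ".join(f"{b:08b}" for b in data)      # 8 binary digits per byte
hexadecimal = " ".join(f"{b:02X}" for b in data)  # 2 hex digits per byte

print(binary)
print(hexadecimal)

# Decoding the bytes gives the original string back.
print(data.decode("ascii"))
```

Note that lowercase "w" is 0x77 (01110111), while uppercase "W" would be 0x57 (01010111).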
ASCII
• Extended from 128 to 256 characters (from 7 bits to 8 bits)
• Vendor-specific charsets from IBM, HP, Apple and Microsoft
• From code pages to ISO standards
• ISO vs ANSI
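The code-page problem can be made concrete with Python's built-in codecs: the same 8-bit value means different characters depending on which table is used to read it. This is a minimal sketch, assuming that "ANSI" refers to Windows code pages such as Windows-1252 (as Microsoft uses the term) and "ISO" to ISO 8859-1; the byte values are chosen only for illustration.

```python
# One byte, three 8-bit code pages, different characters.
raw = bytes([0xE9])
for codec in ("cp437", "latin-1", "cp1252"):
    print(codec, raw.decode(codec))  # IBM PC code page 437 gives a Greek letter

# ISO 8859-1 vs Windows-1252 ("ANSI"): they differ in the range 0x80-0x9F.
euro = bytes([0x80])
print(repr(euro.decode("cp1252")))   # the euro sign in Windows-1252
print(repr(euro.decode("latin-1")))  # an invisible C1 control character in ISO 8859-1
```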
Unicode
• 143,859 characters (Unicode 13.0)
• Covering 154 modern and historic scripts
• Character encodings:
  • UTF-32
  • UTF-16
  • UTF-8
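The three encoding forms store the same code points with different numbers of bytes. A minimal Python sketch comparing them for a few sample characters (the characters are arbitrary examples); the little-endian codecs are used only so that no byte order mark is added to the count.

```python
# Bytes needed by each Unicode encoding form for a few sample characters.
# The "-le" codecs are used so that no byte order mark (BOM) is prepended.
sizes = {}
for ch in ("A", "é", "€", "😀"):
    sizes[ch] = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(f"U+{ord(ch):04X}", sizes[ch])
```

UTF-32 is fixed-width (always 4 bytes), while UTF-16 and UTF-8 are variable-width.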
UTF-16
• 2 or 4 bytes per character
• 3 encoding schemes:
  • UTF-16 (byte order signalled by an optional byte order mark, BOM)
  • UTF-16LE (Little Endian)
  • UTF-16BE (Big Endian)
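The three UTF-16 schemes differ only in byte order and in whether a BOM is written. A minimal Python sketch, using "é" and an emoji as arbitrary examples; note that the BOM written by Python's plain `utf-16` codec follows the machine's native byte order.

```python
ch = "é"  # U+00E9
little = ch.encode("utf-16-le")
big = ch.encode("utf-16-be")
print(little.hex(" "))  # e9 00
print(big.hex(" "))     # 00 e9

# The plain "utf-16" codec prepends a byte order mark (BOM); its byte order
# follows the machine's native order, so it is either ff fe or fe ff.
with_bom = ch.encode("utf-16")
print(with_bom[:2].hex(" "))

# A character outside the BMP needs 4 bytes: a surrogate pair of 16-bit units.
emoji = "😀"  # U+1F600
pair = emoji.encode("utf-16-be")
print(pair.hex(" "))  # d8 3d de 00
```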
UTF-8
• 1 to 4 bytes per character
• Covers all 1,112,064 valid Unicode code points:
  • 1 byte: standard ASCII (U+0000 to U+007F)
  • 2 bytes: Arabic, Hebrew and most European scripts (up to U+07FF)
  • 3 bytes: the rest of the BMP (Basic Multilingual Plane, up to U+FFFF)
  • 4 bytes: characters outside the BMP (U+10000 to U+10FFFF), such as emoji
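The byte-length ranges above can be checked directly. A minimal Python sketch encoding one sample character from each range (the samples are arbitrary; the Hebrew letter alef is written as an escape to avoid right-to-left rendering in the source):

```python
# UTF-8 is variable-length: the number of bytes grows with the code point.
samples = {"A": "ASCII", "\u05d0": "Hebrew alef", "\u20ac": "euro sign, BMP", "\U0001F600": "emoji, outside the BMP"}
encoded = {}
for ch, label in samples.items():
    encoded[ch] = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} ({label}): {len(encoded[ch])} byte(s) -> {encoded[ch].hex(' ')}")

# Pure ASCII text is byte-for-byte identical in UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```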
Mojibake
The UTF-8-encoded Japanese Wikipedia article for "Mojibake", as displayed when interpreted as Windows-1252
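The same effect can be reproduced in a few lines. This minimal Python sketch swaps the Japanese text for the word "café", because some UTF-8 bytes of Japanese characters are undefined in Windows-1252 and would make Python's strict decoder raise an error instead of printing garbage.

```python
# Mojibake: UTF-8 bytes wrongly decoded with a legacy code page.
original = "café"
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # cafÃ©

# If no bytes were lost, reversing the wrong step recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # café
```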
UTF-8
• The most common encoding on the World Wide Web
• Used by about 97% of all web pages
• Up to 100% for some languages
Data exchange
FAIR principles
• Findable
• Accessible
• Interoperable
• Reusable
CSV
CSV advantages
• CSV is human-readable and easy to edit manually
• CSV is simple to implement and parse
• CSV is processed by almost all existing applications
• CSV provides a straightforward information schema
• CSV is faster to process than XML
• CSV is smaller than the equivalent XML
• CSV is considered a de facto standard format
• CSV is compact: XML needs a start tag and an end tag for each column in each row, while CSV writes the column headers only once
• CSV is easy to generate
CSV disadvantages
• CSV can move only the most basic data; complex configurations cannot be imported and exported this way
• There is no distinction between text and numeric values
• There is no standard way to represent binary data
• Importing CSV into SQL is problematic (no distinction between NULL and an empty quoted string)
• Poor support for special characters
• No standard way to represent control characters
• Lack of a universal standard
• Field data may also contain commas or even embedded line breaks
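Several of the disadvantages above show up in a short round trip through Python's standard `csv` module. This is a minimal sketch with hypothetical rows: a field containing a comma and a line break has to be quoted, and on reading back every value is a string, with `None` indistinguishable from an empty field.

```python
import csv
import io

# Hypothetical rows: the last one contains a comma, a line break and a missing value.
rows = [["id", "note", "score"], [1, "plain text", 3.5], [2, "comma, and\nnewline", None]]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)  # quotes the tricky field, writes None as ""
text = buffer.getvalue()
print(text)

# Reading back: numbers are now strings and None has become an empty string.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed)
```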
ISO/OSI
HTML - The Web 1.0
• WWW (World Wide Web)
• Tim Berners-Lee
• Based on SGML
• Netscape vs Microsoft (the browser wars)
HTML - The Web 1.0
• Not a programming language
• The standard markup language
• For documents displayed in a web browser
HTML - The Web 1.0
• Syntax
• Semantics
• Representation
• Behaviour
HTML - The Web 1.0
EUPORIA web page source
XML - The Web 1.1
• eXtensible Markup Language
• Specification for the definition of markup languages
• World Wide Web Consortium (W3C)
• HTML as an XML application -> XHTML
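Because XHTML is HTML rewritten as an XML application, a generic XML parser can read it. A minimal sketch with Python's `xml.etree.ElementTree`; the page content and the `h` namespace prefix are illustrative assumptions, while `http://www.w3.org/1999/xhtml` is the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

# A small XHTML page: HTML written as well-formed XML in the XHTML namespace.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>XML &amp; JSON</title></head>
  <body><p>Line one<br/>line two</p></body>
</html>"""

root = ET.fromstring(xhtml)
ns = {"h": "http://www.w3.org/1999/xhtml"}  # prefix chosen only for the queries

# Because XHTML is XML, a generic XML parser can navigate it.
title = root.find("h:head/h:title", ns).text
print(title)     # XML & JSON
print(root.tag)  # {http://www.w3.org/1999/xhtml}html
```

Note the self-closing `<br/>`: in XHTML every element must be closed, as in any XML document.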
XML - The Web 1.1
• Integrity of data in any XML document
• Technology to interoperate with any platform
The way to JSON: Java, .NET and AJAX
• Sun and Microsoft
• Java:
  • an object-oriented programming language
  • “write once, run anywhere”
• .NET, C#:
  • XML to solve the data interoperability puzzle
The way to JSON: Java, .NET and AJAX
• AJAX: “Asynchronous JavaScript and XML”
• Communication in the background
• Single-page application (SPA)
• JavaScript for everyone
• Web 2.0
JSON
• HTML document containing some JavaScript
• Interoperability across all browsers
• Interchange of data between arbitrary languages
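Language-neutral interchange works because JSON has a small set of types that most languages map onto their own structures. A minimal Python sketch of that mapping with the standard `json` module; the payload is hypothetical.

```python
import json

# Python values map onto JSON types: dict -> object, list -> array,
# str -> string, int/float -> number, True/False -> true/false, None -> null.
payload = {"name": "EUPORIA", "year": 2021, "tags": ["xml", "json"], "draft": False, "note": None}

text = json.dumps(payload)
print(text)

# Any language with a JSON parser can read this text back into native structures.
restored = json.loads(text)
print(restored == payload)  # True
```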
JSON
“XML is the most fully developed means of getting data in and out of an AJAX client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data.”
XML vs JSON
XML
• eXtensible Markup Language
• Designed to store and transport data
• Human- and machine-readable
XML vs HTML
• XML was designed to carry data
• HTML was designed to display data
• XML tags are not predefined
• HTML tags are predefined
XML syntax rules
• Documents must have a root element
• The prolog is optional
• All elements must have a closing tag
• Elements must be properly nested
• Attribute values must always be quoted
• A document that follows these rules is “well formed”
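Each rule above can be tested with any XML parser, since a document that breaks one of them is rejected. A minimal sketch using Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "ok": '<?xml version="1.0"?><note date="2021-03-01"><to>Tove</to></note>',
    "no root": "<to>Tove</to><from>Jani</from>",
    "missing close tag": "<note><to>Tove</note>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note date=2021-03-01></note>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```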
XML elements and attributes
• An element can contain:
  • text
  • attributes
  • other elements
  • or a mix of the above
• Attribute values must be quoted
• Avoid attributes when they are not needed:
  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are not easily expandable (for future changes)
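The trade-off between attributes and elements shows up as soon as the data is read back. A minimal sketch with Python's `xml.etree.ElementTree`; the `note` documents are hypothetical examples.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) data stored as an attribute and as child elements.
as_attribute = ET.fromstring('<note date="2021-03-01"><to>Tove</to></note>')
as_elements = ET.fromstring(
    "<note><date><year>2021</year><month>03</month><day>01</day></date>"
    "<to>Tove</to><to>Jani</to></note>"
)

# An attribute holds a single string value.
print(as_attribute.get("date"))  # 2021-03-01

# Elements can repeat (multiple values) and nest (tree structures).
print([to.text for to in as_elements.findall("to")])  # ['Tove', 'Jani']
print(as_elements.find("date/year").text)  # 2021
```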
XML and XSLT
• XSLT is a style sheet language for XML
• XSLT is far more sophisticated than CSS: it can transform an XML document into another document, not just style it
XML schema
• Describes the structure of an XML document
• “Well formed”: the document follows the XML syntax rules
• “Valid”: the document is well formed and also conforms to its schema
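The difference between "well formed" and "valid" can be sketched without a schema processor. Python's standard library does not validate against DTD or XSD, so this minimal sketch replaces a real schema with a hypothetical hand-written rule (a `note` must contain exactly a `to` and then a `body`); real validation would use an XSD processor, for example `lxml.etree.XMLSchema`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written structural rule standing in for a real schema:
# the root must be <note> with exactly the children <to> and <body>, in that order.
EXPECTED_CHILDREN = ["to", "body"]

def check(text: str) -> str:
    try:
        root = ET.fromstring(text)  # step 1: well-formedness (syntax)
    except ET.ParseError:
        return "not well formed"
    children = [child.tag for child in root]  # step 2: validity (structure)
    if root.tag == "note" and children == EXPECTED_CHILDREN:
        return "well formed and valid"
    return "well formed but not valid"

print(check("<note><to>Tove</to><body>Hi</body></note>"))
print(check("<note><body>Hi</body></note>"))
print(check("<note><to>Tove</note>"))
```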
XML example: TEI
JSON
• JSON: JavaScript Object Notation
• JSON is a syntax for storing and exchanging data
• JSON is text, written with JavaScript object notation
JSON syntax
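JSON syntax is stricter than the JavaScript object literals it comes from. A minimal sketch with Python's standard `json` module, showing one valid document and three hypothetical inputs that a JavaScript engine would accept as object literals but a JSON parser rejects.

```python
import json

# Valid JSON: keys and strings in double quotes, no trailing commas.
valid = '{"name": "Tove", "age": 30, "languages": ["it", "en"], "active": true}'
data = json.loads(valid)
print(data["languages"])  # ['it', 'en']

# Single quotes, a trailing comma and an unquoted key: fine in JavaScript, not in JSON.
rejected = []
for text in ("{'name': 'Tove'}", '{"name": "Tove",}', '{name: "Tove"}'):
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        rejected.append(text)
        print(f"invalid JSON: {text} ({err.msg})")
```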
JSON schema
JSON example
JSON vs XML
• JSON is like XML because:
  • both JSON and XML are “self-describing” (human-readable)
  • both JSON and XML are hierarchical (values within values)
  • both JSON and XML can be parsed and used by many programming languages
• JSON is unlike XML because:
  • JSON doesn't use end tags
  • JSON is shorter
  • JSON is quicker to read and write
  • JSON can use arrays
• XML is much more difficult to parse than JSON
• JSON is parsed into a ready-to-use JavaScript object
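The parsing difference is easiest to see by loading the same data from both formats. A minimal Python sketch with hypothetical employee data: the JSON call returns typed lists and numbers at once, while the XML tree must be walked and each text value converted by hand.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical employee list in both formats.
json_text = '{"employees": [{"firstName": "John", "age": 30}, {"firstName": "Anna", "age": 25}]}'
xml_text = ("<employees>"
            "<employee><firstName>John</firstName><age>30</age></employee>"
            "<employee><firstName>Anna</firstName><age>25</age></employee>"
            "</employees>")

# JSON: one call returns ready-to-use lists, dicts and numbers.
from_json = json.loads(json_text)["employees"]

# XML: walk the tree and convert each value by hand (everything is text).
from_xml = [
    {"firstName": emp.findtext("firstName"), "age": int(emp.findtext("age"))}
    for emp in ET.fromstring(xml_text).findall("employee")
]

print(from_json == from_xml)  # True
```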
JSON vs XML
• XML schemas are defined outside the document (e.g. DTD, XSD)
• XML has a more powerful schema language
JSON and XML
Thank you!