introduction to database systems cse 414 · 2018. 4. 24. · discussion: why semi-structured data?...

27
Introduction to Database Systems CSE 414 Lecture 13: Json and SQL++ 1 CSE 414 - Spring 2018

Upload: others

Post on 10-Oct-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Introduction to Database SystemsCSE 414

Lecture 13: Json and SQL++

1CSE 414 - Spring 2018

Page 2: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Announcements• HW5 + WQ5 will be out tomorrow

– Both due in 1 week

• Midterm in class on Friday, 5/4– Covers everything (HW, WQ, lectures, sections,

readings) up to and including next Monday’s lecture and HW5 + WQ5

– Review session: 5/2 in MUE 153, 5-7pm

• Make sure you are good for AWS– You will need it for HW6

CSE 414 - Spring 18 2

Page 3: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

3

JSon Syntax{ "book": [

{"id":"01","language": "Java","author": "H. Javeson","year": 2015

},{"id":"07","language": "C++","edition": "second""author": "E. Sepp","price": 22.25

}]

}

CSE 414 - Spring 18

Page 4: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

JSon Data Structures

• Objects, i.e., collections of name-value pairs:– {“name1”: value1, “name2”: value2, …}– “name” is also called a “key”

• Ordered lists of values:– [obj1, obj2, obj3, ...]

CSE 414 - Spring 18 4

Page 5: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

JSon Primitive Datatypes• Number

• String– Denoted by double quotes

• Boolean– Either true or false

• nullemptyCSE 414 - Spring 18 5

Page 6: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

6

JSon Semantics: a Tree !person

Mary

name address

name address

street no city

Maple 345 Seattle

JohnThai

phone

23456

{“person”:[ {“name”: “Mary”,

“address”: {“street”:“Maple”,“no”:345,“city”: “Seattle”}},

{“name”: “John”,“address”: “Thailand”,“phone”:2345678}}

]}

Page 7: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

7

JSon Semantics: a Tree !person

Mary

name address

name address

street no city

Maple 345 Seattle

JohnThai

phone

23456

{“person”:[ {“name”: “Mary”,

“address”: {“street”:“Maple”,“no”:345,“city”: “Seattle”}},

{“name”: “John”,“address”: “Thailand”,“phone”:2345678}}

]}

Object 0Object 1

Recall: arrays are ordered in Json!

Page 8: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

8

JSon Data

• JSon is self-describing• Schema elements become part of the data

– Relational schema: person(name,phone)– In Json “person”, “name”, “phone” are part of the

data, and are repeated many times• Consequence: JSon is much more flexible• JSon = semistructured data

CSE 414 - Spring 18

Page 9: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Mapping Relational Data to JSon

CSE 414 - Spring 18 9

name name namephone phone phone“John” 3634 “Sue” “Dirk”6343 6363

Person

person

name phoneJohn 3634Sue 6343Dirk 6363

{“person”: [{“name”: “John”, “phone”:3634},{“name”: “Sue”, “phone”:6343},{“name”: “Dirk”, “phone”:6383}]

}

Page 10: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Mapping Relational Data to JSon

10

Personname phoneJohn 3634Sue 6343

May inline multiple relations based on foreign keys

OrderspersonName date productJohn 2002 GizmoJohn 2004 GadgetSue 2002 Gadget

{“Person”:[{"name": "John","phone":3646,"Orders":[{"date":2002,"product":"Gizmo"},{“date”:2004,"product":"Gadget"}]},{"name": "Sue","phone":6343,"Orders":[{"date":2002,"product":"Gadget"}]}]}

Page 11: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Discussion: Why Semi-Structured Data?

• Semi-structured data model is good as data exchange formats– i.e., exchanging data between different apps– Examples: XML, JSon, Protobuf (protocol buffers)

• Increasingly, systems use them as a data model for databases:– SQL Server supports for XML-valued relations– CouchBase, MongoDB: JSon as data model– Dremel (BigQuery): Protobuf as data model

CSE 414 - Spring 18 11

Page 12: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Query Languages for Semi-Structured Data

• XML: XPath, XQuery (see textbook)– Supported inside many RDBMS (SQL Server, DB2, Oracle)– Several standalone XPath/XQuery engines

• Protobuf: SQL-ish language (Dremel) used internally by google, and externally in BigQuery

• JSon:– CouchBase: N1QL– Asterix: SQL++ (based on SQL)– MongoDB: has a pattern-based language– JSONiq http://www.jsoniq.org/

Page 13: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

• AsterixDB– No-SQL database system– Developed at UC Irvine– Now an Apache project, being incorporated into

CouchDB (another No-SQL DB)

• Uses Json as data model• Query language: SQL++

– SQL-like syntax for Json dataCSE 414 - Spring 18 13

They are hiring!

Page 14: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Asterix Data Model (ADM)• Based on the Json standard• Objects:

– {“Name”: “Alice”, “age”: 40}– Fields must be distinct:

{“Name”: “Alice”, “age”: 40, “age”:50}

• Ordered arrays:– [1, 3, “Fred”, 2, 9]– Can contain values of different types

• Multisets (aka bags):– {{1, 3, “Fred”, 2, 9}}– Mostly internal use only but can be used as inputs– All multisets are converted into ordered arrays (in arbitrary

order) when returned at the end

14

Can’t haverepeated fields

Page 15: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Examples

What do these queries return?

CSE 414 - Spring 18 15

SELECT x.phoneFROM [{"name": "Alice", "phone": [300, 150]}] AS x;

SELECT x.phoneFROM {{ {"name": "Alice", "phone": [300, 150]} }} AS x;

-- errorSELECT x.phoneFROM {"name": "Alice", "phone": [300, 150]} AS x;

Can only query frommulti-set or array (not object)

Page 16: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Datatypes

• Boolean, integer, float (various precisions), geometry (point, line, …), date, time, etc

• UUID = universally unique identifierUse it as a system-generated unique key

CSE 414 - Spring 18 16

Page 17: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

null v.s. missing• {"age": null} = the value NULL (like in SQL)• {"age": missing} = { } = really missing

SELECT x.b FROM [{"a":1, "b":2}, {"a":3}] AS x;

SELECT x.bFROM [{"a":1, "b":2}, {"a":3, "b":missing }] AS x;

SELECT x.bFROM [{"a":1, "b":2}, {"a":3, "b":null }] AS x;

{"b": 2}{ }

{"b": 2}{"b": null }

{"b": 2}{ }

Answer

Answer

Answer

Page 18: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Finally, a language that we can use!SELECT x.ageFROM Person AS xWHERE x.age > 21GROUP BY x.genderHAVING x.salary > 10000ORDER BY x.name;

FROM Person AS xWHERE x.age > 21GROUP BY x.genderHAVING x.salary > 10000SELECT x.ageORDER BY x.name;

is exactly the same as

FWGHOSlives!!

Page 19: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

SQL++ Overview

• Data Definition Language: create a– Type– Dataset (like a relation)– Dataverse (a collection of datasets)– Index

• For speeding up query execution

• Data Manipulation Language: SELECT-FROM-WHERE

CSE 414 - Spring 18 19

Page 20: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

DataverseA Dataverse is a Database (i.e., collection of tables)

CREATE DATAVERSE myDBCREATE DATAVERSE myDB IF NOT EXISTS

DROP DATAVERSE myDBDROP DATAVERSE myDB IF EXISTS

USE myDB20

Page 21: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Type

• Defines the schema of a collection• It lists all required fields• Fields followed by ? are optional

• CLOSED type = no other fields allowed• OPEN type = other fields allowed

CSE 414 - Spring 18 21

Page 22: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Closed Types

22

USE myDB;

DROP TYPE PersonType IF EXISTS;

CREATE TYPE PersonType AS CLOSED {

name: string,

age: int,

email: string?

}

{"name": "Alice", "age": 30, "email": "[email protected]"}

{"name": "Bob", "age": 40}

-- not OK:{"name": "Carol", "phone": "123456789"}

Page 23: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Open Types

23

{"name": "Alice", "age": 30, "email": "[email protected]"}

{"name": "Bob", "age": 40}

-- now it’s OK:{"name": "Carol", "age":20, "phone": "123456789"}

USE myDB;DROP TYPE PersonType IF EXISTS;CREATE TYPE PersonType AS OPEN {

name: string,age: int,email: string?

}

Page 24: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Types with Nested Collections

24

USE myDB;

DROP TYPE PersonType IF EXISTS;

CREATE TYPE PersonType AS CLOSED {

Name : string,

phone: [string]

}

{"Name": "Carol", "phone": ["1234”]}{"Name": "David", "phone": [“2345”, “6789”]}{"Name": "Evan", "phone": []}

Page 25: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Datasets

• Dataset = relation

• Must have a type– Can be a trivial OPEN type

• Must have a key– Can also be a trivial one

CSE 414 - Spring 18 25

Page 26: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Dataset with Existing Key

CSE 414 - Spring 18 26

USE myDB;DROP TYPE PersonType IF EXISTS;CREATE TYPE PersonType AS CLOSED {

name: string,email: string?

}

USE myDB;DROP DATASET Person IF EXISTS;CREATE DATASET Person(PersonType) PRIMARY KEY Name;

{“name”: “Alice”}{“name”: “Bob”}…

Page 27: Introduction to Database Systems CSE 414 · 2018. 4. 24. · Discussion: Why Semi-Structured Data? •Semi-structured data model is good as data exchange formats –i.e., exchanging

Dataset with Auto Generated Key

CSE 414 - Spring 18 27

USE myDB;DROP TYPE PersonType IF EXISTS;CREATE TYPE PersonType AS CLOSED {

myKey: uuid,Name : string,email: string?

}

USE myDB;DROP DATASET Person IF EXISTS;CREATE DATASET Person(PersonType)

PRIMARY KEY myKey AUTOGENERATED;

Note: no myKeyinserted as it isautogenerated

{“name”: “Alice”}{“name”: “Bob”}…