introduction to database systems cse 414 · 2018. 4. 24. · discussion: why semi-structured data?...
TRANSCRIPT
Introduction to Database SystemsCSE 414
Lecture 13: Json and SQL++
1CSE 414 - Spring 2018
Announcements• HW5 + WQ5 will be out tomorrow
– Both due in 1 week
• Midterm in class on Friday, 5/4– Covers everything (HW, WQ, lectures, sections,
readings) up to and including next Monday’s lecture and HW5 + WQ5
– Review session: 5/2 in MUE 153, 5-7pm
• Make sure you are good for AWS– You will need it for HW6
CSE 414 - Spring 18 2
3
JSon Syntax{ "book": [
{"id":"01","language": "Java","author": "H. Javeson","year": 2015
},{"id":"07","language": "C++","edition": "second""author": "E. Sepp","price": 22.25
}]
}
CSE 414 - Spring 18
JSon Data Structures
• Objects, i.e., collections of name-value pairs:– {“name1”: value1, “name2”: value2, …}– “name” is also called a “key”
• Ordered lists of values:– [obj1, obj2, obj3, ...]
CSE 414 - Spring 18 4
JSon Primitive Datatypes• Number
• String– Denoted by double quotes
• Boolean– Either true or false
• nullemptyCSE 414 - Spring 18 5
6
JSon Semantics: a Tree !person
Mary
name address
name address
street no city
Maple 345 Seattle
JohnThai
phone
23456
{“person”:[ {“name”: “Mary”,
“address”: {“street”:“Maple”,“no”:345,“city”: “Seattle”}},
{“name”: “John”,“address”: “Thailand”,“phone”:2345678}}
]}
7
JSon Semantics: a Tree !person
Mary
name address
name address
street no city
Maple 345 Seattle
JohnThai
phone
23456
{“person”:[ {“name”: “Mary”,
“address”: {“street”:“Maple”,“no”:345,“city”: “Seattle”}},
{“name”: “John”,“address”: “Thailand”,“phone”:2345678}}
]}
Object 0Object 1
Recall: arrays are ordered in Json!
8
JSon Data
• JSon is self-describing• Schema elements become part of the data
– Relational schema: person(name,phone)– In Json “person”, “name”, “phone” are part of the
data, and are repeated many times• Consequence: JSon is much more flexible• JSon = semistructured data
CSE 414 - Spring 18
Mapping Relational Data to JSon
CSE 414 - Spring 18 9
name name namephone phone phone“John” 3634 “Sue” “Dirk”6343 6363
Person
person
name phoneJohn 3634Sue 6343Dirk 6363
{“person”: [{“name”: “John”, “phone”:3634},{“name”: “Sue”, “phone”:6343},{“name”: “Dirk”, “phone”:6383}]
}
Mapping Relational Data to JSon
10
Personname phoneJohn 3634Sue 6343
May inline multiple relations based on foreign keys
OrderspersonName date productJohn 2002 GizmoJohn 2004 GadgetSue 2002 Gadget
{“Person”:[{"name": "John","phone":3646,"Orders":[{"date":2002,"product":"Gizmo"},{“date”:2004,"product":"Gadget"}]},{"name": "Sue","phone":6343,"Orders":[{"date":2002,"product":"Gadget"}]}]}
Discussion: Why Semi-Structured Data?
• Semi-structured data model is good as data exchange formats– i.e., exchanging data between different apps– Examples: XML, JSon, Protobuf (protocol buffers)
• Increasingly, systems use them as a data model for databases:– SQL Server supports for XML-valued relations– CouchBase, MongoDB: JSon as data model– Dremel (BigQuery): Protobuf as data model
CSE 414 - Spring 18 11
Query Languages for Semi-Structured Data
• XML: XPath, XQuery (see textbook)– Supported inside many RDBMS (SQL Server, DB2, Oracle)– Several standalone XPath/XQuery engines
• Protobuf: SQL-ish language (Dremel) used internally by google, and externally in BigQuery
• JSon:– CouchBase: N1QL– Asterix: SQL++ (based on SQL)– MongoDB: has a pattern-based language– JSONiq http://www.jsoniq.org/
• AsterixDB– No-SQL database system– Developed at UC Irvine– Now an Apache project, being incorporated into
CouchDB (another No-SQL DB)
• Uses Json as data model• Query language: SQL++
– SQL-like syntax for Json dataCSE 414 - Spring 18 13
They are hiring!
Asterix Data Model (ADM)• Based on the Json standard• Objects:
– {“Name”: “Alice”, “age”: 40}– Fields must be distinct:
{“Name”: “Alice”, “age”: 40, “age”:50}
• Ordered arrays:– [1, 3, “Fred”, 2, 9]– Can contain values of different types
• Multisets (aka bags):– {{1, 3, “Fred”, 2, 9}}– Mostly internal use only but can be used as inputs– All multisets are converted into ordered arrays (in arbitrary
order) when returned at the end
14
Can’t haverepeated fields
Examples
What do these queries return?
CSE 414 - Spring 18 15
SELECT x.phoneFROM [{"name": "Alice", "phone": [300, 150]}] AS x;
SELECT x.phoneFROM {{ {"name": "Alice", "phone": [300, 150]} }} AS x;
-- errorSELECT x.phoneFROM {"name": "Alice", "phone": [300, 150]} AS x;
Can only query frommulti-set or array (not object)
Datatypes
• Boolean, integer, float (various precisions), geometry (point, line, …), date, time, etc
• UUID = universally unique identifierUse it as a system-generated unique key
CSE 414 - Spring 18 16
null v.s. missing• {"age": null} = the value NULL (like in SQL)• {"age": missing} = { } = really missing
SELECT x.b FROM [{"a":1, "b":2}, {"a":3}] AS x;
SELECT x.bFROM [{"a":1, "b":2}, {"a":3, "b":missing }] AS x;
SELECT x.bFROM [{"a":1, "b":2}, {"a":3, "b":null }] AS x;
{"b": 2}{ }
{"b": 2}{"b": null }
{"b": 2}{ }
Answer
Answer
Answer
Finally, a language that we can use!SELECT x.ageFROM Person AS xWHERE x.age > 21GROUP BY x.genderHAVING x.salary > 10000ORDER BY x.name;
FROM Person AS xWHERE x.age > 21GROUP BY x.genderHAVING x.salary > 10000SELECT x.ageORDER BY x.name;
is exactly the same as
FWGHOSlives!!
SQL++ Overview
• Data Definition Language: create a– Type– Dataset (like a relation)– Dataverse (a collection of datasets)– Index
• For speeding up query execution
• Data Manipulation Language: SELECT-FROM-WHERE
CSE 414 - Spring 18 19
DataverseA Dataverse is a Database (i.e., collection of tables)
CREATE DATAVERSE myDBCREATE DATAVERSE myDB IF NOT EXISTS
DROP DATAVERSE myDBDROP DATAVERSE myDB IF EXISTS
USE myDB20
Type
• Defines the schema of a collection• It lists all required fields• Fields followed by ? are optional
• CLOSED type = no other fields allowed• OPEN type = other fields allowed
CSE 414 - Spring 18 21
Closed Types
22
USE myDB;
DROP TYPE PersonType IF EXISTS;
CREATE TYPE PersonType AS CLOSED {
name: string,
age: int,
email: string?
}
{"name": "Alice", "age": 30, "email": "[email protected]"}
{"name": "Bob", "age": 40}
-- not OK:{"name": "Carol", "phone": "123456789"}
Open Types
23
{"name": "Alice", "age": 30, "email": "[email protected]"}
{"name": "Bob", "age": 40}
-- now it’s OK:{"name": "Carol", "age":20, "phone": "123456789"}
USE myDB;DROP TYPE PersonType IF EXISTS;CREATE TYPE PersonType AS OPEN {
name: string,age: int,email: string?
}
Types with Nested Collections
24
USE myDB;
DROP TYPE PersonType IF EXISTS;
CREATE TYPE PersonType AS CLOSED {
Name : string,
phone: [string]
}
{"Name": "Carol", "phone": ["1234”]}{"Name": "David", "phone": [“2345”, “6789”]}{"Name": "Evan", "phone": []}
Datasets
• Dataset = relation
• Must have a type– Can be a trivial OPEN type
• Must have a key– Can also be a trivial one
CSE 414 - Spring 18 25
Dataset with Existing Key
CSE 414 - Spring 18 26
USE myDB;DROP TYPE PersonType IF EXISTS;CREATE TYPE PersonType AS CLOSED {
name: string,email: string?
}
USE myDB;DROP DATASET Person IF EXISTS;CREATE DATASET Person(PersonType) PRIMARY KEY Name;
{“name”: “Alice”}{“name”: “Bob”}…
Dataset with Auto Generated Key
CSE 414 - Spring 18 27
USE myDB;DROP TYPE PersonType IF EXISTS;CREATE TYPE PersonType AS CLOSED {
myKey: uuid,Name : string,email: string?
}
USE myDB;DROP DATASET Person IF EXISTS;CREATE DATASET Person(PersonType)
PRIMARY KEY myKey AUTOGENERATED;
Note: no myKeyinserted as it isautogenerated
{“name”: “Alice”}{“name”: “Bob”}…