Introduction to Data Management
Lecture #23: SQL NoSQL (J)
Instructor: Mike Carey, mjcarey@ics.uci.edu
3/13/17


Announcements

•  Homework info:
   – HW #7: Due today (at 5 PM).
   – HW #8 is the end (“NoSQL”)!
      •  Due a week from today.
      •  Discounted lateness: 5 pts/day for HW #8
•  The plan:
   – Today: NoSQL & Big Data (a la AsterixDB)
      •  Not in book: See paper linked to wiki syllabus!
      •  Also see docs on the Apache AsterixDB site.


Last time: SQL/NoSQL History Talk

•  The pre-relational era
•  The relational DB era
•  Beyond rows and columns?
   1.  The object-oriented DB era
   2.  The object-relational DB era
   3.  The XML DB era
   4.  The NoSQL DB era
•  Reflections (and then: AsterixDB)

Now the NoSQL DB Era?

•  Not from the DB world
   – Distributed systems folks
   – Also various startups
•  From caches → K/V use cases
   – Needed massive scale-out
   – OLTP (vs. parallel DB) apps
   – Simple, low-latency API
   – Need a key K, but want no schema for V
   – Record-level atomicity, replica consistency varies
•  In the context of this talk, NoSQL does not mean
   – Hadoop (or SQL on Hadoop)
   – Graph databases or graph analytics platforms


NoSQL Data (JSON-based)

Collection(Order)

  { "id": "123",
    "Customer": { "custName": "Fred", "custCity": "LA" },
    "total": 25.97,
    "Items": [
      { "product-sku": 401, "qty": 2, "price": 9.99 },
      { "product-sku": 544, "qty": 1, "price": 3.99 }
    ]
  }

Collection(Product)

  { "sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL" }
  { "sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V" }

Note that
- The world's not flat, but it's less <messy/>
- We're now in the 2010's, timing-wise
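To make the nesting concrete, here is a minimal Python sketch (plain Python, not AsterixDB code) that loads the Order and Product documents from this slide and looks up the product name for each nested item. The helper name `describe_order` is an assumption made for illustration.

```python
import json

# The Order and Product documents from the slide, as JSON text.
ORDER_JSON = """
{"id": "123",
 "Customer": {"custName": "Fred", "custCity": "LA"},
 "total": 25.97,
 "Items": [{"product-sku": 401, "qty": 2, "price": 9.99},
           {"product-sku": 544, "qty": 1, "price": 3.99}]}
"""
PRODUCTS_JSON = """
[{"sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL"},
 {"sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V"}]
"""


def describe_order(order, products):
    """Return (product name, qty) pairs for the nested Items of one order."""
    # Index products by sku; each product may carry different extra fields.
    by_sku = {p["sku"]: p for p in products}
    return [(by_sku[item["product-sku"]]["name"], item["qty"])
            for item in order["Items"]]


order = json.loads(ORDER_JSON)
products = json.loads(PRODUCTS_JSON)
lines = describe_order(order, products)
print(order["Customer"]["custName"], lines)
```

The customer is a nested record and the items are a nested list, so no separate Customer or LineItem table is needed; the two products also carry different fields ("size" vs. "power") without any schema change.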

Current NoSQL Trends

•  Popular examples: MongoDB, Couchbase
•  Coveting the benefits of many DB goodies
   – Secondary indexing and non-key access
   – Declarative queries
   – Aggregates and now (initially small) joins
•  Seem to be heading towards...
   – BDMS (think scalable, OLTP-aimed, parallel DBMS)
   – Declarative queries and query optimization, but applied to schema-less data
   – Return of (some, optional!) schema information


Our Example: Apache AsterixDB

http://asterixdb.apache.org/

Big Data / Web Warehousing

So what went on – and why?
What's going on right now?
What's going on…?


Also: Today's Big Data Tangle

[Figure; surviving labels: (Pig), SQL]

AsterixDB: “One Size Fits a Bunch”

[Figure; surviving labels: Semistructured Data Management, Parallel Database Systems, Data-Intensive Computing]

BDMS Desiderata:
•  Flexible data model
•  Efficient runtime
•  Full query capability
•  Cost proportional to task at hand (!)
•  Designed for continuous data ingestion
•  Support today's “Big Data data types”


Project Goals

•  Build a new Big Data Management System (BDMS)
   – Run on large commodity clusters
   – Handle mass quantities of semistructured data
   – Openly layered, for selective reuse by others
   – Share with the community via open source (ASF)
•  Conduct scalable information systems research, e.g.,
   – Large-scale query processing and workload management
   – Highly scalable storage and index management
   – Fuzzy matching, spatial data, date/time data (all in parallel)
   – Novel support for “fast data” (both in and out)
•  Train next generation of “Big Data” graduates

ASTERIX Data Model (ADM)

  create dataverse TinySocial;
  use dataverse TinySocial;

  create type EmploymentType as open {
    organizationName: string,
    startDate: date,
    endDate: date?
  };

  create type MugshotUserType as {
    id: int32,
    alias: string,
    name: string,
    userSince: datetime,
    address: {
      street: string,
      city: string,
      state: string,
      zip: string,
      country: string
    },
    friendIds: {{ int32 }},
    employment: [EmploymentType]
  };

  create dataset MugshotUsers(MugshotUserType) primary key id;

Highlights include:
•  JSON++ based data model
•  Rich type support (spatial, temporal, …)
•  Records, lists, bags
•  Open vs. closed types


ASTERIX Data Model (ADM) (cont.)

Types are open unless declared closed, so MugshotUserType could instead declare only its key and still accept the other fields:

  create dataverse TinySocial;
  use dataverse TinySocial;
  create type MugshotUserType as { id: int32 };

A closed type, by contrast, admits only the declared fields:

  create type MugshotMessageType as closed {
    messageId: int32,
    authorId: int32,
    timestamp: datetime,
    inResponseTo: int32?,
    senderLocation: point?,
    tags: {{ string }},
    message: string
  };

  create dataset MugshotUsers(MugshotUserType) primary key id;
  create dataset MugshotMessages(MugshotMessageType) primary key messageId;
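The open vs. closed distinction can be illustrated with a small Python sketch. This is a simplified model for intuition only, not how AsterixDB checks types: the function `conforms`, the `(type, optional)` field encoding, and the use of strings for dates are all assumptions.

```python
def conforms(record, fields, is_open):
    """Check a record (dict) against a simplified ADM-style record type.

    fields maps field name -> (python_type, optional_flag).
    """
    for name, (py_type, optional) in fields.items():
        if name not in record:
            if not optional:
                return False  # a required field is missing
            continue
        if not isinstance(record[name], py_type):
            return False  # a declared field has the wrong type
    extras = set(record) - set(fields)
    # Open types tolerate undeclared fields; closed types reject them.
    return is_open or not extras


# Simplified stand-in for EmploymentType (dates kept as strings here).
EMPLOYMENT_FIELDS = {
    "organizationName": (str, False),
    "startDate": (str, False),
    "endDate": (str, True),   # "date?" in the DDL: optional
}

job = {"organizationName": "Codetechno", "startDate": "2006-08-06"}
job_with_extra = dict(job, title="Engineer")  # "title" is not declared

print(conforms(job, EMPLOYMENT_FIELDS, is_open=True))
print(conforms(job_with_extra, EMPLOYMENT_FIELDS, is_open=True))
print(conforms(job_with_extra, EMPLOYMENT_FIELDS, is_open=False))
```

Under this model the undeclared "title" field is accepted when the type is treated as open and rejected when it is treated as closed, while a missing optional "endDate" is fine either way.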


Ex: MugshotUsers Data

  { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard",
    "address": { "street": "234 Thomas Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
    "userSince": datetime("2012-08-20T10:10:00"),
    "friendIds": {{ 2, 3, 6, 10 }},
    "employment": [ { "organizationName": "Codetechno", "startDate": date("2006-08-06") } ] },
  { "id": 2, "alias": "Isbel", "name": "IsbelDull",
    "address": { "street": "345 James Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
    "userSince": datetime("2011-01-22T10:10:00"),
    "friendIds": {{ 1, 4 }},
    "employment": [ { "organizationName": "Hexviafind", "startDate": date("2010-04-27") } ] },
  { "id": 3, "alias": "Emory", "name": "EmoryUnk",
    "address": { "street": "456 Jose Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
    "userSince": datetime("2012-07-10T10:10:00"),
    "friendIds": {{ 1, 5, 8, 9 }},
    "employment": [ { "organizationName": "geomedia", "startDate": date("2010-06-17"), "endDate": date("2010-01-26") } ] }
  ...

Other DDL Features

  create index msUserSinceIdx on MugshotUsers(userSince);
  create index msTimestampIdx on MugshotMessages(timestamp);
  create index msAuthorIdx on MugshotMessages(authorId) type btree;
  create index msSenderLocIndex on MugshotMessages(senderLocation) type rtree;
  create index msMessageIdx on MugshotMessages(message) type keyword;

  // --------------------- and also ---------------------

  create type AccessLogType as closed {
    ip: string, time: string, user: string, verb: string,
    `path`: string, stat: int32, size: int32
  };
  create external dataset AccessLog(AccessLogType) using localfs
    (("path"="{hostname}://{path}"),
     ("format"="delimited-text"),
     ("delimiter"="|"));

  create feed mySocketFeed using socket_adaptor
    (("sockets"="{address}:{port}"),
     ("addressType"="IP"),
     ("type-name"="MugshotMessageType"),
     ("format"="adm"));
  connect feed mySocketFeed to dataset MugshotMessages;

External data highlights:
•  Equal opportunity access
•  “Keep everything!”
•  Data ingestion, not streams


ASTERIX Queries (SQL++ or AQL)

•  Q1: List the user name and messages sent by those users who joined the Mugshot social network in a certain time window:

  select user.name as uname,
         (select value msg.message
          from MugshotMessages msg
          where msg.authorId = user.id) as messages
  from MugshotUsers user
  where user.userSince >= datetime('2010-07-22T00:00:00')
    and user.userSince <= datetime('2012-07-29T23:59:59');

  { "uname": "IsbelDull", "messages": [ "like samsung the plan is amazing", "like t-mobile its platform is mind-blowing" ] }
  { "uname": "EmoryUnk", "messages": [ "love sprint its shortcut-menu is awesome:)", ... ] }
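For readers who think more easily in a general-purpose language, the following Python sketch mirrors Q1's logic: filter users by join date, then nest each qualifying user's messages via a correlated lookup. It is an illustration, not how AsterixDB evaluates the query. The user rows come from the MugshotUsers sample; the message rows and the function name `q1` are made up for this example.

```python
from datetime import datetime

# Users taken from the MugshotUsers sample (only the fields Q1 touches).
users = [
    {"id": 1, "name": "MargaritaStoddard", "userSince": datetime(2012, 8, 20, 10, 10)},
    {"id": 2, "name": "IsbelDull", "userSince": datetime(2011, 1, 22, 10, 10)},
    {"id": 3, "name": "EmoryUnk", "userSince": datetime(2012, 7, 10, 10, 10)},
]
# Hypothetical message rows; the slide shows only the query output.
messages = [
    {"messageId": 1, "authorId": 2, "message": "like samsung the plan is amazing"},
    {"messageId": 2, "authorId": 1, "message": "hypothetical message from user 1"},
    {"messageId": 3, "authorId": 2, "message": "like t-mobile its platform is mind-blowing"},
]


def q1(users, messages, start, end):
    """For each user who joined in [start, end], nest that user's messages."""
    return [
        {
            "uname": user["name"],
            # Correlated subquery: messages whose authorId matches this user.
            "messages": [m["message"] for m in messages
                         if m["authorId"] == user["id"]],
        }
        for user in users
        if start <= user["userSince"] <= end
    ]


result = q1(users, messages,
            datetime(2010, 7, 22), datetime(2012, 7, 29, 23, 59, 59))
for row in result:
    print(row)
```

Note that the output is itself nested: each result record carries a list of messages, which a flat relational result could not express without repeating the user name per message.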

SQL++ (cont.)

•  Q2: Identify active users and group/count them by country:

  with endTime as current_datetime(),
       startTime as endTime - duration("P30D")
  select user.address.country as country, count(users) as activeUsers
  from MugshotUsers user
  where some logrec in AccessLog satisfies
          user.alias = logrec.user
          and datetime(logrec.time) >= startTime
          and datetime(logrec.time) <= endTime
  group by user.address.country;

SQL++ highlights:
•  Lots of other features (see website!)
•  Spatial predicates and aggregation
•  Set-similarity (“fuzzy”) matching
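Q2 combines an existential quantifier ("some ... satisfies") with grouping. A rough Python equivalent, offered only as a sketch of the logic, is shown here; the sample rows, the fixed `end_time`, and the function name `active_users_by_country` are assumptions, since the slides do not list AccessLog contents.

```python
from collections import Counter
from datetime import datetime, timedelta


def active_users_by_country(users, access_log, end_time,
                            window=timedelta(days=30)):
    """Count users per country having at least one log record in the window."""
    start_time = end_time - window
    counts = Counter()
    for user in users:
        # "some logrec in AccessLog satisfies ...": an existential test,
        # so a user with many matching log records is still counted once.
        is_active = any(
            rec["user"] == user["alias"]
            and start_time <= datetime.fromisoformat(rec["time"]) <= end_time
            for rec in access_log
        )
        if is_active:
            counts[user["address"]["country"]] += 1
    return dict(counts)


# Hypothetical rows for illustration.
users = [
    {"alias": "Margarita", "address": {"country": "USA"}},
    {"alias": "Isbel", "address": {"country": "USA"}},
    {"alias": "Emory", "address": {"country": "USA"}},
]
access_log = [
    {"user": "Margarita", "time": "2017-03-01T08:00:00"},
    {"user": "Margarita", "time": "2017-03-02T09:30:00"},
    {"user": "Isbel", "time": "2016-12-25T12:00:00"},  # outside the window
]
active = active_users_by_country(users, access_log, datetime(2017, 3, 13))
print(active)
```

The log times are stored as strings (as in AccessLogType), so they are converted before comparison, just as Q2 wraps `logrec.time` in `datetime(...)`.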


Updates and Transactions

•  Key-value store-like transactions (w/ record-level atomicity)
•  Insert, delete, and upsert ops; index-consistent
•  2PL concurrency
•  WAL no-steal, no-force with LSM shadowing

•  Q3: Add a new user to Mugshot.com:

  insert into MugshotUsers (
    { "id": 11, "alias": "John", "name": "JohnDoe",
      "userSince": datetime("2012-08-20T10:10:00.000Z"),
      "address": { "street": "789 Jane St", "city": "San Harry", "state": "CA", "zip": "98767", "country": "USA" },
      "friendIds": {{ 5, 9, 11 }},
      "employment": [ { "organizationName": "Kongreen", "startDate": date("2009-08-11") } ]
    }
  );
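What "index-consistent" record-level operations mean can be sketched in a few lines of Python. This toy class is an assumption-laden illustration: it uses a single in-memory lock in place of 2PL record locks and has no WAL or LSM storage, so it shows only the visible effect, namely that each insert, upsert, or delete updates the record and its secondary index entry as one atomic step. The class and method names are hypothetical.

```python
import threading
from collections import defaultdict


class TinyDataset:
    """Toy dataset: a primary-key map plus one secondary index."""

    def __init__(self, key_field, indexed_field):
        self.key_field = key_field
        self.indexed_field = indexed_field
        self.records = {}                # primary key -> record
        self.index = defaultdict(set)    # indexed value -> primary keys
        self._lock = threading.Lock()    # one lock stands in for record locks

    def _unindex(self, key):
        # Remove the stale index entry of an existing record, if any.
        old = self.records.get(key)
        if old is not None:
            self.index[old[self.indexed_field]].discard(key)

    def insert(self, record):
        key = record[self.key_field]
        with self._lock:
            if key in self.records:
                raise KeyError(f"duplicate key {key!r}")
            self.records[key] = record
            self.index[record[self.indexed_field]].add(key)

    def upsert(self, record):
        key = record[self.key_field]
        with self._lock:
            self._unindex(key)
            self.records[key] = record
            self.index[record[self.indexed_field]].add(key)

    def delete(self, key):
        with self._lock:
            self._unindex(key)
            self.records.pop(key, None)

    def lookup(self, value):
        """Return the primary keys whose indexed field equals value."""
        with self._lock:
            return sorted(self.index.get(value, ()))


users = TinyDataset("id", "alias")
users.insert({"id": 11, "alias": "John", "name": "JohnDoe"})
users.upsert({"id": 11, "alias": "Johnny", "name": "JohnDoe"})
print(users.lookup("John"), users.lookup("Johnny"))
```

After the upsert, a lookup by the old alias finds nothing and a lookup by the new alias finds key 11; no reader can observe the record and the index disagreeing.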

AsterixDB Cluster Overview

[Figure; surviving labels: Data Loads and Feeds, AQL queries and results, Data publishing, Cluster Controller, MD Node Controller, Node Controller, Node Controller, ..., AsterixDB]


ASTERIX Software Stack

[Figure; surviving labels: AQL, HiveQL, XQuery; AsterixDB, Hivesterix, Apache VXQuery; Algebricks Algebra Layer, M/R Layer, Pregelix; Hyracks Job, Hadoop M/R Job, Pregel Job; Hyracks Data-Parallel Platform]

A Peek at Performance


A Peek at Performance (cont.)

#AsterixDB

Example AsterixDB Use Cases

•  Potential use case areas include
   – Social data analytics
   – Cell phone event analytics
   – Behavioral science
   – Education
   – Public health
   – Power usage monitoring
   – Cluster management log analytics
   – ....


Current Status

•  4 year initial NSF project (250+ KLOC), started 2009
•  Now officially Apache AsterixDB!
   – Semistructured “NoSQL” style data model
   – Declarative parallel queries, inserts, deletes, …
   – Data storage/indexing (primary & secondary, LSM-based)
   – Internal and external datasets both supported
   – Rich set of data types (including text, time, location)
   – Fuzzy and spatial query processing
   – NoSQL-like transactions (for inserts/deletes)
   – Data feeds and indexes for external datasets
   – ....

For More Information

•  Asterix project UCI/UCR research home
   – http://asterix.ics.uci.edu/
•  Apache AsterixDB home
   – https://asterixdb.apache.org/docs/0.9.0/index.html
•  SQL++ Primer
   – https://asterixdb.apache.org/docs/0.9.0/sqlpp/primer-sqlpp.html
•  Navigate from CS122a wiki (HW) to get and install it!

QUESTIONS?