introduc.on to data management lecture #23 sql nosql j · 3/13/17 1 introduc.on to data management...
TRANSCRIPT
3/13/17
1
Introduc.ontoDataManagement
Lecture#23SQLNoSQL(J)
Instructor:[email protected]
Announcements
• Homeworkinfo:– HW#7:Duetoday(at5PM).– HW#8istheend(“NoSQL”)!• Dueaweekfromtoday.• Discountedlateness:5pts/dayforHW#8
• Theplan:– Today:NoSQL&BigData(alaAsterixDB)• Notinbook:Seepaperlinkedtowikisyllabus!• AlsoseedocsontheApacheAsterixDBsite.
3/13/17
2
Last.me:SQL/NoSQLHistoryTalk• Thepre-rela.onalera• Therela.onalDBera• Beyondrowsandcolumns?
1. Theobject-orientedDBera2. Theobject-rela.onalDBera3. TheXMLDBera4. TheNoSQLDBera
• Reflec.ons(andthen:AsterixDB)
NowtheNoSQLDBEra?• NotfromtheDBworld– Distributedsystemsfolks– Alsovariousstartups
• FromcachesàK/Vusecases– Neededmassivescale-out– OLTP(vs.parallelDB)apps– Simple,low-latencyAPI– NeedakeyK,butwantnoschemaforV– Record-levelatomicity,replicaconsistencyvaries
• Inthecontextofthistalk,NoSQLdoesnotmean– Hadoop(orSQLonHadoop)– Graphdatabasesorgraphanaly.csplagorms
3/13/17
3
NoSQLData(JSON-based)
{“id”:“123”,“Customer”:{“custName”:“Fred”,“custCity”:“LA”}“total”:25.97,“Items”:[{“product-sku”:401,“qty”:2,“price”:9.99},{“product-sku”:544,“qty”:1,“price”:3.99}]}
{“sku”:401,“name”:“GarfieldT-Shirt”,“listPrice”:9.99,“size”:“XL”}{“sku”:544,“name”:“USBCharger”,“listPrice”:5.99,“power”:“115V”}
Notethat-Theworld’snotflat,butit’sless<messy/>-We’renowinthe2010’s,.ming-wise
Collec.on(Order) Collec.on(Product)
• Popularexamples:MongoDB,Couchbase• Cove.ngthebenefitsofmanyDBgoodies– Secondaryindexingandnon-keyaccess– Declara.vequeries– Aggregatesandnow(ini.allysmall)joins
• Seemtobeheadingtowards...– BDMS(thinkscalable,OLTP-aimed,parallelDBMS)– Declara.vequeriesandqueryop.miza.on,butappliedtoschema-lessdata
– Returnof(some,op.onal!)schemainforma.on
CurrentNoSQLTrends
3/13/17
4
OurExample:ApacheAsterixDB
htp://asterixdb.apache.org/
BigData/WebWarehousing
8
Sowhatwenton–andwhy?
What’sgoingonrightnow?
What’sgoingon…?
3/13/17
5
9
Also:Today’sBigDataTangle
(Pig)
SQL
AsterixDB:“OneSizeFitsaBunch”
10
SemistructuredDataManagement
ParallelDatabaseSystems
Data-IntensiveComputing
BDMSDesiderata:• Flexibledatamodel• Efficientrun.me• Fullquerycapability• Costpropor.onalto
taskathand(!)• Designedfor
con.nuousdatainges.on
• Supporttoday’s“BigDatadatatypes”
• • •
3/13/17
6
• BuildanewBigDataManagementSystem(BDMS)– Runonlargecommodityclusters– Handlemassquan..esofsemistructureddata– Openlylayered,forselec.vereusebyothers– Sharewiththecommunityviaopensource(ASF)
• Conductscalableinforma.onsystemsresearch,e.g.,– Large-scalequeryprocessingandworkloadmanagement– Highlyscalablestorageandindexmanagement– Fuzzymatching,spa.aldata,date/.medata(allinparallel)– Novelsupportfor“fastdata”(bothinandout)
• Trainnextgenera.onof“BigData”graduates11
ProjectGoals
createdataverseTinySocial;usedataverseTinySocial;createtypeMugshotUserTypeas{id:int32,alias:string,name:string,userSince:date.me,address:{street:string,city:string,state:string,zip:string,country:string},friendIds:{{int32}},employment:[EmploymentType]};
ASTERIXDataModel(ADM)
12
createdatasetMugshotUsers(MugshotUserType)primarykeyid;
Highlightsinclude:• JSON++baseddatamodel• Richtypesupport(spa.al,temporal,…)• Records,lists,bags• Openvs.closedtypes
createtypeEmploymentTypeasopen{organiza.onName:string,startDate:date,endDate:date?};
3/13/17
7
createdataverseTinySocial;usedataverseTinySocial;createtypeMugshotUserTypeas{id:int32,alias:string,name:string,user-since:date.me,address:{street:string,city:string,state:string,zip:string,country:string},friend-ids:{{int32}},employment:[EmploymentType]}
createdataverseTinySocial;usedataverseTinySocial;createtypeMugshotUserTypeas{id:int32};
ASTERIXDataModel(ADM)
13
createdatasetMugshotUsers(MugshotUserType)primarykeyid;
Highlightsinclude:• JSON++baseddatamodel• Richtypesupport(spa.al,temporal,…)• Records,lists,bags• Openvs.closedtypes
createtypeEmploymentTypeasopen{organiza.onName:string,startDate:date,endDate:date?};
createdataverseTinySocial;usedataverseTinySocial;createtypeMugshotUserTypeas{id:int32,alias:string,name:string,user-since:date.me,address:{street:string,city:string,state:string,zip:string,country:string},friend-ids:{{int32}},employment:[EmploymentType]}
createdataverseTinySocial;usedataverseTinySocial;createtypeMugshotUserTypeas{id:int32};createtypeMugshotMessageTypeasclosed{messageId:int32,authorId:int32,.mestamp:date.me,inResponseTo:int32?,senderLoca.on:point?,tags:{{string}},message:string};
ASTERIXDataModel(ADM)
14
createdatasetMugshotUsers(MugshotUserType)primarykeyid;createdatasetMugshotMessages(MugshotMessageType)primarykeymessageId;
Highlightsinclude:• JSON++baseddatamodel• Richtypesupport(spa.al,temporal,…)• Records,lists,bags• Openvs.closedtypes
createtypeEmploymentTypeasopen{organiza.onName:string,startDate:date,endDate:date?};
3/13/17
8
15
{"id":1,"alias":"Margarita","name":"MargaritaStoddard","address":{"street":"234ThomasAve","city":"SanHugo","zip":"98765","state":"CA","country":"USA"},"userSince":date.me("2012-08-20T10:10:00"),"friendIds":{{2,3,6,10}},"employment":[{"organiza.onName":"Codetechno","startDate":date("2006-08-06")}]},{"id":2,"alias":"Isbel","name":"IsbelDull","address":{"street":"345JamesAve","city":"SanHugo","zip":"98765","state":"CA","country":"USA"},"userSince":date.me("2011-01-22T10:10:00"),"friendIds":{{1,4}},"employment":[{"organiza.onName":"Hexviafind","startDate":date("2010-04-27")}]},{"id":3,"alias":"Emory","name":"EmoryUnk","address":{"street":"456JoseAve","city":"SanHugo","zip":"98765","state":"CA","country":"USA"},"userSince":date.me("2012-07-10T10:10:00"),"friendIds":{{1,5,8,9}},"employment":[{"organiza.onName":"geomedia","startDate":date("2010-06-17"),"endDate":date("2010-01-26")}]}...
Ex:MugshotUsersData
createindexmsUserSinceIdxonMugshotUsers(userSince);createindexmsTimestampIdxonMugshotMessages(.mestamp);createindexmsAuthorIdxonMugshotMessages(authorId)typebtree;createindexmsSenderLocIndexonMugshotMessages(senderLoca.on)typertree;createindexmsMessageIdxonMugshotMessages(message)typekeyword;
//---------------------andalso------------------------------------------------------------------------------------
createtypeAccessLogTypeasclosed{ip:string,.me:string,user:string,verb:string,`path`:string,stat:int32,size:int32};createexternaldatasetAccessLog(AccessLogType)usinglocalfs(("path"="{hostname}://{path}"),("format"="delimited-text"),("delimiter"="|"));
createfeedmySocketFeedusingsocket_adaptor(("sockets"="{address}:{port}"),("addressType"="IP"),("type-name"="MugshotMessageType"),("format"="adm"));connectfeedmySocketFeedtodatasetMugshotMessages;
OtherDDLFeatures
16
Externaldatahighlights:• Equalopportunityaccess• “Keepeverything!”• Datainges.on,notstreams
3/13/17
9
ASTERIXQueries(SQL++orAQL)• Q1:Listtheusernameandmessagessentbythoseusers
whojoinedtheMugshotsocialnetworkinacertain.mewindow:
selectuser.nameasuname,(selectvaluemsg.messagefromMugshotMessagesmsgwheremsg.authorId=user.id)asmessagesfromMugshotUsersuserwhereuser.userSince>=date.me('2010-07-22T00:00:00')anduser.userSince<=date.me('2012-07-29T23:59:59');
17
{"uname":"IsbelDull","messages":["likesamsungtheplanisamazing”,"liket-mobileitsplagormismind-blowing"]}{"uname":"EmoryUnk","messages":["lovesprintitsshortcut-menuisawesome:)",...]}
SQL++(cont.)
18
• Q2:Iden.fyac.veusersandgroup/countthembycountry:
withendTimeascurrent_date.me(),startTimeasendTime-dura.on("P30D")selectuser.address.countryascountry,count(users)asac.veUsersfromMugshotUsersuserwheresomelogrecinAccessLogsaHsfiesuser.alias=logrec.useranddate.me(logrec..me)>=startTimeanddate.me(logrec..me)<=endTimegroupbyuser.address.country; SQL++highlights:
• Lotsofotherfeatures(seewebsite!)• Spa.alpredicatesandaggrega.on• Set-similarity(“fuzzy”)matching
3/13/17
10
UpdatesandTransac.ons
19
• Key-valuestore-liketransac.ons(w/recordlevelatomicity)
• Insert,delete,andupsertops;index-consistent
• 2PLconcurrency• WALno-steal,
no-forcewithLSMshadowing
• Q3:AddanewusertoMugshot.com:
insertintoMugshotUsers({"id":11,"alias":"John","name":"JohnDoe","userSince":date.me("2012-08-20T10:10:00.000Z"),"address":{"street":"789JaneSt","city":"SanHarry","state":"CA","zip":"98767","country":"USA"},"friendIds":{{5,9,11}},"employment":[{"organiza.onName":"Kongreen","startDate":date("2009-08-11")}]});
AsterixDBClusterOverview
2020
Data Loads and Feeds
AQL queries and results
Data publishing
Cluster Controller
MD Node Controller
Node Controller
Node Controller! ! !
Aste
rixD
B
3/13/17
11
ASTERIXSo�wareStack
21
Hivesterix Apache VXQuery
Algebricks Algebra Layer M/R LayerPregelix
Hyracks Data-Parallel Platform
Hyracks Job
HadoopM/R JobPregel Job
AQL HiveQL XQuery
AsterixDB
APeekatPerformance
22
3/13/17
12
APeekatPerformance(cont.)
#AsterixDB23
• Poten.alusecaseareasinclude– Socialdataanaly.cs– Cellphoneeventanaly.cs– Behavioralscience– Educa.on– Publichealth– Powerusagemonitoring– Clustermanagementloganaly.cs– ....
24
ExampleAsterixDBUseCases
3/13/17
13
CurrentStatus
• 4yearini.alNSFproject(250+KLOC),started2009• NowofficiallyApacheAsterixDB!– Semistructured“NoSQL”styledatamodel– Declara.veparallelqueries,inserts,deletes,…– Datastorage/indexing(primary&secondary,LSM-based)– Internalandexternaldatasetsbothsupported– Richsetofdatatypes(includingtext,.me,loca.on)– Fuzzyandspa.alqueryprocessing– NoSQL-liketransac.ons(forinserts/deletes)– Datafeedsandindexesforexternaldatasets– ....
25
ForMoreInforma.on
• AsterixprojectUCI/UCRresearchhome– htp://asterix.ics.uci.edu/
• ApacheAsterixDBhome– htps://asterixdb.apache.org/docs/0.9.0/index.html
• SQL++Primer– htps://asterixdb.apache.org/docs/0.9.0/sqlpp/primer-sqlpp.html
• NavigatefromCS122awiki(HW)togetandinstallit!
QUESTIONS?
26