Adaptive Schema Databases - cidrdb.org/cidr2017/slides/p84-spoth-cidr17-slides.pdf
TRANSCRIPT
Adaptive Schema Databases
William Spoth (b), Bahareh Sadat Arab (i), Eric S. Chan (o), Dieter Gawlick (o), Adel Ghoneimy (o), Boris Glavic (i), Beda Hammerschmidt (o), Oliver Kennedy (b), Seokki Lee (i), Zhen Hua Liu (o), Xing Niu (i), Ying Yang (b)
b: University at Buffalo, i: Illinois Inst. Tech., o: Oracle
Classic relational database
• The schema serves a navigational and organizational purpose: it supports discovery, gives good performance and space efficiency, and is reusable.
• But... high upfront cost and inflexible.
Big Data / NoSQL
• Data can be used immediately.
• But... sacrifices the navigational and performance benefits, and may end up duplicating work.
Adaptive Schema Databases
Queries and feedback ... eventually
• Bridge the gap between relational databases and NoSQL.
Input:
Queries:
SELECT name FROM Undergrad UNION SELECT name FROM Grad
SELECT deg FROM Grad
SELECT name FROM Student
…
Outline
[Architecture diagram labels: Unstructured Data; Semi-structured Data (e.g., JSON); Extraction workflow; Extraction Schema Candidates; Schema Matching; Schema Workspace; Queries + Feedback]
• Extraction and discovery
• Adaptive, personalized schemas from queries
• Explanations and feedback
• Adaptive organization
• Conclusions and future work
Extraction
• ASD extracts a schema candidate set
Given input:
Discovery
• ASD extracts a schema candidate set C_ext = {S_ext, P_ext}, where S_ext is a set of candidate schemas and P_ext is a probability distribution over these schemas.
• S_max: the best-guess schema
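The candidate set and best-guess schema described above can be sketched in Python. This is only an illustration, not the authors' implementation: the `CandidateSet` class, the modeling of a schema as a frozenset of table signatures, and the reading of S_max as the most probable candidate are all assumptions. The probabilities are borrowed from the W1 example that appears later in the slides.

```python
from dataclasses import dataclass

# Assumption: a schema is modeled as a frozenset of table signatures
# such as "Grad(name)".
Schema = frozenset


@dataclass
class CandidateSet:
    """Hypothetical model of C_ext = {S_ext, P_ext}."""

    # Maps each candidate schema in S_ext to its probability under P_ext.
    probabilities: dict

    def __post_init__(self):
        total = sum(self.probabilities.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"P_ext must sum to 1, got {total}")

    def best_guess(self):
        """Return S_max, assumed here to be the most probable candidate."""
        return max(self.probabilities, key=self.probabilities.get)


c_ext = CandidateSet({
    Schema({"Undergrad(name)"}): 0.27,
    Schema({"Grad(name)"}): 0.23,
    Schema({"Undergrad(name)", "Grad(name)"}): 0.5,
})
s_max = c_ext.best_guess()
```

Keeping the whole distribution, rather than only `s_max`, is what would let later feedback shift probability toward a different candidate.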
Adaptive, personalized schemas from queries
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Initially, W = {}.
Finding Schemas from Queries
Query 1: SELECT name FROM Undergrad UNION SELECT name FROM Grad
Query 2: SELECT deg FROM Grad
Synthesizing Tables
Query 3: SELECT name FROM Student
W_1 = (S_1 = {Undergrad(name)}, P_1 = 0.27), (S_1 = {Grad(name)}, P_1 = 0.23), (S_1 = {Undergrad(name), Grad(name)}, P_1 = 0.5)
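One way to picture table synthesis for Query 3: if the workspace settles on matching Student with both Undergrad and Grad, the query can be answered through a view that unions the two tables. The sketch below uses Python's built-in sqlite3 module. The sample rows and the choice of a UNION view are illustrative assumptions; the slides do not say how ASD implements synthesis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Undergrad (name TEXT);
    CREATE TABLE Grad (name TEXT, deg TEXT);
    -- Sample rows are invented for illustration.
    INSERT INTO Undergrad VALUES ('Alice'), ('Bob');
    INSERT INTO Grad VALUES ('Carol', 'PhD'), ('Dave', 'MS');
    -- Hypothetical synthesized table: Student matched with both tables.
    CREATE VIEW Student AS
        SELECT name FROM Undergrad
        UNION
        SELECT name FROM Grad;
""")

# Query 3 from the slides now runs against the synthesized view.
students = sorted(row[0] for row in conn.execute("SELECT name FROM Student"))
```

Because Student is only a view, choosing a different candidate schema later means redefining the view, not rewriting any stored data.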
Explanations and feedback
What might go wrong
Extraction errors appear in three forms:
(1) A query incompatible with S_max
(2) An update with data that violates S_max
(3) An extraction error presented to the user
We provide:
(1) Explanation of results
(2) Provenance
(3) Warn the analyst about ambiguity
(4) Explain the ambiguity
(5) Evaluate the magnitude of the ambiguity
(6) Assist the analyst in resolving the ambiguity
Types of errors
ASD interacts with the outside world through Schema, Data, and Update interactions.
Schema interactions: when a query is incompatible with S_max and the workspace.
Data interactions: provenance for attribute-level and row-level ambiguity.
Update interactions:
• Represent schema mismatches as missing values.
• Resolve data errors with a probabilistic repair.
• The analyst can upgrade her schema to match the changes.
• The analyst can checkpoint her workspace and ignore new updates.
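The first update option, representing schema mismatches as missing values, can be illustrated with a small Python sketch. The `conform` helper and the sample record are hypothetical; the slides do not describe ASD's actual update path.

```python
def conform(record, schema_attrs):
    """Project an incoming record onto the workspace schema.

    Attributes the schema expects but the record lacks become None
    (a missing value). Attributes the schema does not know are returned
    separately, so the analyst can decide whether to upgrade the schema.
    """
    row = {attr: record.get(attr) for attr in schema_attrs}
    unknown = {k: v for k, v in record.items() if k not in schema_attrs}
    return row, unknown


# Hypothetical update: it lacks "deg" and carries an unexpected "advisor".
row, unknown = conform({"name": "Eve", "advisor": "Dr. Kim"}, ["name", "deg"])
```

Returning the unknown attributes alongside the row keeps the mismatch visible, which fits the other options of upgrading the schema or checkpointing the workspace.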
Explanations and feedback
Condition 2: Query from unknown schema elements: SELECT name FROM Student
W_1 = (S_1 = {Undergrad(name)}, P_1 = 0.27), (S_1 = {Grad(name)}, P_1 = 0.23), (S_1 = {Undergrad(name), Grad(name)}, P_1 = 0.5)
Explanations: We match Student with both Grad and Undergrad
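The explanation above can be derived mechanically from the workspace distribution. The following Python sketch is an assumption about how that might look, not the system's actual logic: it picks the most probable candidate to phrase the explanation, and sums probabilities per table as one possible measure of how ambiguous the match is. The function names are hypothetical.

```python
# Workspace W_1 from the slides: (matched tables, probability).
workspace = [
    (frozenset({"Undergrad"}), 0.27),
    (frozenset({"Grad"}), 0.23),
    (frozenset({"Undergrad", "Grad"}), 0.5),
]


def match_probabilities(workspace):
    """Marginal probability that each table takes part in the match."""
    marginals = {}
    for tables, p in workspace:
        for table in tables:
            marginals[table] = marginals.get(table, 0.0) + p
    return marginals


def explain(unknown_table, workspace):
    """Phrase an explanation from the most probable candidate."""
    best_tables, best_p = max(workspace, key=lambda entry: entry[1])
    names = " and ".join(sorted(best_tables))
    return f"We match {unknown_table} with {names} (probability {best_p:.2f})"


marginals = match_probabilities(workspace)
message = explain("Student", workspace)
```

Under these numbers Undergrad takes part in the match with probability about 0.77 and Grad with about 0.73, so neither table is certain to belong to Student.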
Adaptive organization
Trade-off between storing data in its native format and storing it organized according to a specific schema.
What is the challenge? Many workspaces, adding tables to the schema, ….
Challenges and Possible Solutions:
• We want multiple personalized schemas
  1. A relational workspace schema is essentially a view over raw data. View materialization can be used.
  2. Use existing adaptive physical design and caching techniques.
• Shared materializations
  1. Incremental materialized view maintenance. Leverage techniques from revision control systems.
  2. View selection problem.
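The idea of treating a workspace schema as a view over raw data, materialized and then maintained incrementally, can be sketched as follows. This is a simplified illustration under stated assumptions: the raw data is an append-only list of JSON lines, and the `MaterializedWorkspaceView` class is hypothetical, not part of ASD.

```python
import json


class MaterializedWorkspaceView:
    """Hypothetical cache of one workspace table extracted from raw JSON lines."""

    def __init__(self, attrs):
        self.attrs = attrs   # attributes of the workspace table
        self.rows = []       # materialized tuples
        self.applied = 0     # number of raw records already folded in

    def refresh(self, raw_log):
        """Apply only the raw records appended since the last refresh."""
        for line in raw_log[self.applied:]:
            doc = json.loads(line)
            self.rows.append(tuple(doc.get(a) for a in self.attrs))
        self.applied = len(raw_log)
        return self.rows


# Invented sample data: the raw log stays in its native JSON format.
raw_log = ['{"name": "Alice", "deg": "PhD"}']
view = MaterializedWorkspaceView(["name", "deg"])
view.refresh(raw_log)

raw_log.append('{"name": "Bob"}')
rows = view.refresh(raw_log)  # only the new record is parsed
```

The raw log is never rewritten, so several workspaces could each keep their own view over it; deciding which of those views to materialize and share is the view selection problem the slide mentions.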
Conclusions and future work
ASD bridges the gap between relational databases and NoSQL.
• Discovery: Help users explore and understand new data by providing an outline of the available information. (Done)
• Materialization: Adopt work on adaptive data structures. (Partially done)
• Data Synthesis: Synthesize new tables and attributes from existing data. (Done)
• Conflict Response:
  – Versioning or branching the schema.
  – Log analysis to help users assess the impact of schema revisions.