Semex: a Platform for Personal Information Management and Integration
Alon Halevy
University of Washington (On Sabbatical @ Stanford & Transformic Inc.)
April 15, 2005, NY Area DB/IR Day
Joint work with: Luna Dong, Jayant Madhavan
Did You Mean: DB and IR, DB or IR?
Personal information management: Pushes the limits on DB&IR.
Demonstrate some DB&IR issues through the Semex Project.
NSF starting to get interested in PIM: First brainstorming workshop in January. People from DB, IR, HCI, Psychology.
What is PIM?
Questions I Can’t Answer
Find my VLDB04 paper, and the PowerPoint (maybe in an attachment?).
Find emails from my Californian friends.
Which paper by Ken Ross did I cite in my latest SIGMOD paper?
What quarter was Mary in my class and what grade did she get?
Which experiment did I run with NF1 and which emails discussed them?
Why?
[Figure: information grouped by application. Labels: HTML, Mail & calendar, Papers, Files, Presentations]
Information is organized by application, not by any semantically meaningful logical organization.
Vannevar Bush said this in 1945: Personal Memex.
[Figure: graph of associations between personal data objects. Association labels: OriginatedFrom, EarlyVersion, PublishedIn, ConfHomePage, ExperimentOf, PaperAbout, BudgetOf, Sender, Recipient, CourseGradeIn, AddressOf, Attached, Cites, PresentationFor, CoAuthor, FrequentEmailer, HomePage]
Association queries
[Screenshot labels: Miller, Barton Miller, R. Miller]
Association queries
[Screenshot labels: Articles, Contact info, R. Miller]
Association queries
Article: “Data driven understanding and refinement of schema mapping”
IsCitedBy
Article: “The Piazza Peer-data Management Project”
Cites
Association queries
IsCitedBy
Article: “The Piazza Peer-data Management Project”
Cites
PIM vs. Web Search
"But there's a fundamental difference between searching a universe of documents created by strangers and searching your own personal library.
When you're freewheeling through ideas that you yourself have collated -- particularly when you'd long ago forgotten about them -- there's something about the experience that seems uncannily like freewheeling through the corridors of your own memory. It feels like thinking."
Steven Johnson, New York Times, January 30, 2005
Semex Over-arching Goals
Create an ‘AHA!’ experience with a PIM system
  “How did I ever live without this?”
  Extensible to arbitrary associations.
Leverage the PIM environment and knowledge to increase productivity in other tasks
Build a platform for <your cool stuff here>
Leveraging Semex: On-the-Fly Data Integration
Who published at SIGMOD but was not recently on the PC?
Partial Success of DI
EII: Enterprise Information Integration
  Starting to catch on.
  See SIGMOD-05 industrial paper for good perspectives.
Mostly in applications such as:
  Customer Relationship Management
  Portal construction
  Frequently occurring queries.
Still quite an effort to set up an integration scenario.
On-The-Fly Integration
[Figure: schema graph. Labels: Conference, PC, Presentation, OrganizedBy, publishedIn, Person, Paper, servesOn, Author, presentedIn]
Who published at SIGMOD but was not recently on the PC?
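Once the external data has been mapped onto associations like those in the figure, the question on this slide reduces to a set difference. The sketch below is only an illustration: the people, papers, years, and the `since_year` reading of "recently" are hypothetical, and the relation names are loosely borrowed from the figure labels rather than taken from Semex itself.

```python
# Hypothetical toy instances of associations named after the figure labels.
author = {("alice", "paper1"), ("bob", "paper2"), ("carol", "paper3")}
published_in = {("paper1", "SIGMOD"), ("paper2", "SIGMOD"), ("paper3", "VLDB")}
serves_on = {("bob", "SIGMOD", 2004), ("alice", "SIGMOD", 1998)}


def published_but_not_on_pc(venue, since_year):
    """People with a paper at `venue` who have not served on its PC since `since_year`."""
    papers = {paper for paper, conference in published_in if conference == venue}
    authors = {person for person, paper in author if paper in papers}
    recent_pc = {
        person
        for person, conference, year in serves_on
        if conference == venue and year >= since_year
    }
    return authors - recent_pc


print(published_but_not_on_pc("SIGMOD", 2003))
```

The hard part the talk is concerned with is populating these relations from external sources on the fly, not evaluating the query once they exist.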
Outline
Semex (open) architecture
The glue: reference reconciliation
Current research “what if’s”, and challenges:
  Malleable schemas
  On-the-fly information integration
  Association queries and indexing
  Visualizations of personal information
  More challenges
System Architecture
[Figure: architecture diagram. Data sources: Word, Excel, PPT, PDF, Bibtex, Latex, Email, Contacts. Other labels: Semi-structured domain model repository; Association extractor (four instances); Reference Reconciliation; Simple, Extracted, External, Defined]
IR/DB Themes
Axiom: the desktop is the database.
Need to manage any kind of data: Once you touch it, it’s managed!
Schema? Sure! A bit here and there.
Association queries
Reference reconciliation
Halevy
Multi-class Reconciliation
Article:
  a1=(“Distributed Query Processing”, “169-180”, {p1,p2,p3}, c1)
  a2=(“Distributed query processing”, “169-180”, {p4,p5,p6}, c2)
Venue:
  c1=(“ACM Conference on Management of Data”, “1978”, “Austin, Texas”)
  c2=(“ACM SIGMOD”, “1978”, null)
Person:
  p1=(“Robert S. Epstein”, null)
  p2=(“Michael Stonebraker”, null)
  p3=(“Eugene Wong”, null)
  p4=(“Epstein, R.S.”, null)
  p5=(“Stonebraker, M.”, null)
  p6=(“Wong, E.”, null)
Reference Reconciliation
Input: A set of references R
Output: A partitioning over R, such that
  Each partition refers to a single real-world entity (high precision)
  Different partitions refer to different entities (high recall)
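One common way to make the precision and recall of a partitioning concrete is to compare the pairs of references that share a partition. The sketch below uses that pairwise measure as an assumption; the slides do not say which measure was used. The gold partitioning reuses the person references p1 to p6 from the running example, and the predicted partitioning is hypothetical.

```python
from itertools import combinations


def same_entity_pairs(partitioning):
    """All unordered pairs of references that share a partition."""
    pairs = set()
    for block in partitioning:
        pairs.update(combinations(sorted(block), 2))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Pairwise precision and recall of a predicted partitioning against a gold one."""
    predicted_pairs = same_entity_pairs(predicted)
    true_pairs = same_entity_pairs(truth)
    correct = len(predicted_pairs & true_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Gold: each full name and its abbreviated form denote one person.
truth = [{"p1", "p4"}, {"p2", "p5"}, {"p3", "p6"}]
# Hypothetical output that fails to merge p2 with p5.
predicted = [{"p1", "p4"}, {"p2"}, {"p5"}, {"p3", "p6"}]

precision, recall = pairwise_precision_recall(predicted, truth)
```

Here no wrong merge was made, so precision is perfect, while the missed merge of p2 and p5 lowers recall to two of three true pairs.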
Reference Reconciliation Results
Article:
  a1=(“Distributed Query Processing”, “169-180”, {p1,p2,p3}, c1)
  a2=(“Distributed query processing”, “169-180”, {p4,p5,p6}, c2)
Venue:
  c1=(“ACM Conference on Management of Data”, “1978”, “Austin, Texas”)
  c2=(“ACM SIGMOD”, “1978”, null)
Person:
  p1=(“Robert S. Epstein”, null)
  p2=(“Michael Stonebraker”, null)
  p3=(“Eugene Wong”, null)
  p4=(“Epstein, R.S.”, null)
  p5=(“Stonebraker, M.”, null)
  p6=(“Wong, E.”, null)
  p7=(“Eugene Wong”, “[email protected]”)
  p8=(null, “[email protected]”)
  p9=(“mike”, “[email protected]”)
Novel Challenges
Article:
  a1=(“Distributed Query Processing”, “169-180”, {p1,p2,p3}, c1)
  a2=(“Distributed query processing”, “169-180”, {p4,p5,p6}, c2)
Venue:
  c1=(“ACM Conference on Management of Data”, “1978”, “Austin, Texas”)
  c2=(“ACM SIGMOD”, “1978”, null)
Person:
  p1=(“Robert S. Epstein”, null)
  p2=(“Michael Stonebraker”, null)
  p3=(“Eugene Wong”, null)
  p4=(“Epstein, R.S.”, null)
  p5=(“Stonebraker, M.”, null)
  p6=(“Wong, E.”, null)
  p7=(“Eugene Wong”, “[email protected]”)
  p8=(null, “[email protected]”)
  p9=(“mike”, “[email protected]”)
1. Multiple Classes
2. Limited Information
3. Multi-value Attributes
4. Lack of training data
Applying Traditional Record Linkage Algorithm
[Figure: bar chart of #(Person Partitions), y-axis 1750 to 3350, against Evidence (1 to 4). Data labels: 3159, 1409]
Person references: 24076. Real-world persons: 1750
Main Ideas [Dong et al., SIGMOD 05]
Leverage the context (network) of the references.
Propagate reconciliation decisions between different classes.
Enrich references as we go along.
Enforce some integrity constraints.
I. Exploiting Context Information
Associated Reference I – Contact list
  p5=(“Stonebraker, M.”, null, {p4, p6})
  p8=(null, “[email protected]”, {p7})
  p6=p7
Associated Reference II – Authored articles
  p2=(“Michael Stonebraker”, null)
  p5=(“Stonebraker, M.”, null)
  p2 and p5 authored the same article
Cross-attribute similarity – Name&email
  p5=(“Stonebraker, M.”, null)
  p8=(null, “[email protected]”)
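As a rough illustration of the cross-attribute case, a name can be compared against the local part of an email address. This is a deliberately crude sketch and not the similarity function used in Semex; the function names are made up, and the address is a hypothetical stand-in because the addresses in the transcript are redacted.

```python
import re


def name_tokens(name):
    """Lowercased alphabetic tokens of a name, ignoring single-letter initials."""
    return {token for token in re.findall(r"[a-z]+", name.lower()) if len(token) > 1}


def name_matches_email(name, email):
    """True if some name token occurs inside the local part of the email address."""
    local_part = email.split("@", 1)[0].lower()
    return any(token in local_part for token in name_tokens(name))


# Hypothetical address standing in for the redacted one on the slide.
print(name_matches_email("Stonebraker, M.", "stonebraker@example.org"))
print(name_matches_email("Wong, E.", "stonebraker@example.org"))
```

A substring test like this over-matches on short names, so a real system would combine it with the other evidence listed on the slide.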
I. Exploiting Context Information
[Figure: bar chart of #(Person Partitions), y-axis 1750 to 3350, against Evidence (Attr-wise, Name&Email, Article, Contact). Bar labels: 3159, 2169, 2169, 2096. Other labels: 1409, 346]
Person references: 24076. Real-world persons: 1750
Merging Propagation
Article:
  a1=(“Distributed Query Processing”, “169-180”, {p1,p2,p3}, c1)
  a2=(“Distributed query processing”, “169-180”, {p4,p5,p6}, c2)
Venue:
  c1=(“ACM Conference on Management of Data”, “1978”, “Austin, Texas”)
  c2=(“ACM SIGMOD”, “1978”, null)
Person:
  p1=(“Robert S. Epstein”, null)
  p2=(“Michael Stonebraker”, null)
  p3=(“Eugene Wong”, null)
  p4=(“Epstein, R.S.”, null)
  p5=(“Stonebraker, M.”, null)
  p6=(“Wong, E.”, null)
II. Merging Propagation
[Figure: bar chart of #(Person Partitions), y-axis 1750 to 3350, against Evidence (Attr-wise, Name&Email, Article, Contact). Legend: Traditional, Propagation. Bar labels: 3159, 2169, 2169, 2096 and 3159, 2146, 2135, 2022]
Person references: 24076. Real-world persons: 1750
III. Reference Enrichment
p2=(“Michael Stonebraker”, null, {p1,p3})
p8=(null, “[email protected]”, {p7})
p9=(“mike”, “[email protected]”, null)
p8-9=(“mike”, “[email protected]”, {p7})
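The enrichment step on this slide, where merging p8 and p9 yields a reference that carries both the name and the contact, can be sketched as a union of attribute values. The `PersonRef` class and `enrich` function are hypothetical names, and the email address is a stand-in for the redacted one; the sketch assumes p8 and p9 share the same address.

```python
from dataclasses import dataclass, field


@dataclass
class PersonRef:
    """A person reference with multi-valued attributes (hypothetical structure)."""
    names: set = field(default_factory=set)
    emails: set = field(default_factory=set)
    contacts: set = field(default_factory=set)


def enrich(a, b):
    """Merge two reconciled references by taking the union of each attribute."""
    return PersonRef(a.names | b.names, a.emails | b.emails, a.contacts | b.contacts)


# Stand-in address; the real ones are redacted in the transcript.
p8 = PersonRef(emails={"mike@example.org"}, contacts={"p7"})
p9 = PersonRef(names={"mike"}, emails={"mike@example.org"})
p8_9 = enrich(p8, p9)
print(p8_9)
```

The enriched reference now has a name to compare against p2 and a contact list to compare against p2's co-authors, which neither p8 nor p9 offered alone.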
III. Reference Enrichment
[Figure: bar chart of #(Person Partitions), y-axis 1750 to 3350, against Evidence (Attr-wise, Name&Email, Article, Contact). Legend: Traditional, Merge, Propagation. Bar labels: 3159, 2169, 2169, 2096 and 3169, 2036, 2036, 1910]
Person references: 24076. Real-world persons: 1750
Overall Results
[Figure: bar chart of #(Person Partitions), y-axis 1750 to 3350, against Evidence (Attr-wise, Name&Email, Article, Contact). Legend: Traditional, Merge, Propagation, Full. Bar labels: 3159, 2169, 2169, 2096 and 3169, 2002, 1990, 1873. Other labels: 1409, 125, 346]
Person references: 24076. Real-world persons: 1750
The Dependency Graph
[Figure: dependency graph. Legend: Reference similarity, Attribute similarity]
Reference pairs:
  (a1, a2)
  (p1, p4)
  (p2, p5)
  (p3, p6)
  (c1, c2)
Attribute value pairs:
  (“Distributed…”, “Distributed …”)
  (“169-180”, “169-180”)
  (“Robert S. Epstein”, “Epstein, R.S.”)
  (“Michael Stonebraker”, “Stonebraker, M.”)
  (“Eugene Wong”, “Wong, E.”)
  (“ACM …”, “ACM SIGMOD”)
  (“1978”, “1978”)
Propagation I to V
[Figure sequence: five slides (Propagation I, II, III, IV, V) redraw the dependency graph with the same nodes as on the previous slide. Legend: Reconciled, Similar]
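The propagation walked through on these slides can be approximated by a worklist over the dependency graph: when a pair of references is reconciled, the pairs that depend on it gain similarity and may cross the threshold themselves. The sketch below is a simplification under stated assumptions, not the algorithm from the SIGMOD paper: it keeps only the reference-pair nodes, and the scores, threshold, boost, and edge list are all hypothetical.

```python
from collections import deque


def propagate(scores, supports, threshold=0.8, boost=0.3):
    """Return the set of reference pairs reconciled after propagation.

    scores:   initial similarity per pair (hypothetical values in [0, 1])
    supports: pair -> pairs whose similarity rises once this pair is reconciled
    """
    scores = dict(scores)
    reconciled = {pair for pair, score in scores.items() if score >= threshold}
    queue = deque(sorted(reconciled))
    while queue:
        pair = queue.popleft()
        for neighbor in supports.get(pair, ()):
            scores[neighbor] = min(1.0, scores[neighbor] + boost)
            if neighbor not in reconciled and scores[neighbor] >= threshold:
                reconciled.add(neighbor)
                queue.append(neighbor)  # each pair is enqueued at most once
    return reconciled


# Hypothetical initial scores for the reference pairs of the running example.
scores = {
    ("p1", "p4"): 0.9, ("p2", "p5"): 0.9, ("p3", "p6"): 0.7,
    ("a1", "a2"): 0.6, ("c1", "c2"): 0.6,
}
# Authors support their article; the article supports its venue and its authors.
supports = {
    ("p1", "p4"): [("a1", "a2")],
    ("p2", "p5"): [("a1", "a2")],
    ("p3", "p6"): [("a1", "a2")],
    ("a1", "a2"): [("c1", "c2"), ("p1", "p4"), ("p2", "p5"), ("p3", "p6")],
    ("c1", "c2"): [("a1", "a2")],
}
result = propagate(scores, supports)
```

With these numbers the two confident author pairs reconcile the articles, which in turn reconcile the venues and the remaining author pair; with `boost=0.0` only the two initially confident pairs are reconciled.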
Comparison with UAI Methods
Propagating similarities between classes was investigated with UAI methods (e.g., Russell and Pasula):
  Fit everything into a probabilistic model of the domain.
Our approach exploits the dependencies, but does not enforce a model.
Outline
Semex (open) architecture
The glue: reference reconciliation
Current research “what if’s”, and challenges:
  Malleable schemas
  On-the-fly information integration
  Association queries and indexing
  Visualizations of personal information
  More challenges
DB/IR Themes
How do we model an application domain that involves both structured and unstructured data?
Malleable Schemas
Most DB/IR work is on seamless querying:
  Integration after the fact.
But what about the modeling phase?
  How can we design applications that manipulate both kinds of data?
Domains where:
  Border between two types is not clear or evolving,
  There is no obvious structure,
  Structure is not known at modeling time,
  Complete structure would be too complicated for users.
A Different Example: Web Data Integration
Building a meta-search engine for classifieds sites on the web.
Modeling the class RealEstate.
  Realize: subclasses are a messy proposition.
  Instead: describe subclasses by keywords.
Malleable Schemas: Keywords as Schema Constructs
Web: modeling the class RealEstate.
  Realize: subclasses are a messy proposition.
  Instead: describe subclasses by keywords.
PIM: modeling property Participant.
  Realize: there are many shades of participation.
  Instead: describe variants with keywords.
Key point: keywords are seen as replacements for some schema constructs.
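To make the key point concrete, the sketch below keeps the class RealEstate as a precise schema element and describes what would otherwise be subclasses with a set of keywords. It is only one possible reading of the idea: the listings, the `cls`, `keywords`, and `city` fields, and the `find` function are hypothetical.

```python
# Hypothetical listings; the keyword set stands in for a subclass hierarchy.
listings = [
    {"cls": "RealEstate", "keywords": {"condo", "rental"}, "city": "Seattle"},
    {"cls": "RealEstate", "keywords": {"house", "sale"}, "city": "Seattle"},
    {"cls": "RealEstate", "keywords": {"condo", "sale"}, "city": "Palo Alto"},
]


def find(cls, keywords=(), **attrs):
    """Match class and structured attributes exactly, and keywords by containment."""
    wanted = set(keywords)
    return [
        item
        for item in listings
        if item["cls"] == cls
        and wanted <= item["keywords"]
        and all(item.get(name) == value for name, value in attrs.items())
    ]


condos = find("RealEstate", ["condo"])
seattle_condos = find("RealEstate", ["condo"], city="Seattle")
```

The structured part of the query stays exact while the keyword part stays loose, so no Condo or Rental subclass has to be agreed on at modeling time.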
Importing External Data
[Figure: schema graph. Labels: Conference, PC, Presentation, OrganizedBy, publishedIn, Person, Paper, servesOn, Author, presentedIn]
Who published at SIGMOD but was not recently on the PC?
Importing Data w/Background Knowledge
We know a lot about the domain model and its possible instances:
  Schema matching is easier: [A la Doan et al.]
  Reference reconciliation is easier: [Etzioni and Perkowitz, 95]
  Wrapper construction: [Kushmerick et al, 97]
Leverage:
  The user’s past actions
  Colleagues with the same data needs
Challenge: matching relationships (in addition to attributes).
Help When Looking at Data
View external sources from my perspective:
  Highlight people I know (and why) on a web page
Fill in blanks:
  Tell me which people may be missing in a spreadsheet I’ve received
  Suggest other names (papers, etc.) when I’m creating a list.
Association Querying
Find objects not referenced in the query:
  Ask for Semex: Get Luna Dong, Jayant Madhavan
Learn interesting/useful association paths:
  Co-author, collaborator, relatedProject
Rank lists intelligently (lineage)
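Finding objects that the query never mentions can be pictured as a bounded traversal of the association graph that also records the path of association labels leading to each result. The sketch below is a minimal illustration, not Semex's query engine: the graph contents, the object names, the `authoredBy` label, and the `max_hops` limit are hypothetical.

```python
from collections import deque

# Hypothetical association graph: object -> list of (association label, object).
associations = {
    "Semex": [("relatedProject", "paper:semex-overview")],
    "paper:semex-overview": [
        ("authoredBy", "Luna Dong"),
        ("authoredBy", "Jayant Madhavan"),
    ],
}


def associated(start, max_hops=2):
    """Breadth-first search returning each reachable object with its label path."""
    seen = {start}
    results = {}
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # do not expand beyond the hop limit
        for label, target in associations.get(node, ()):
            if target not in seen:
                seen.add(target)
                results[target] = path + [label]
                queue.append((target, path + [label]))
    return results


hits = associated("Semex")
```

The recorded label paths are the kind of lineage that ranking, and learning which association paths are useful, could draw on.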
Views on Personal Information
Semex enables multiple views on personal information:
  See everything about a project
  View the progression of a project or paper
  Activity clustering [Mitchell does for email]
  Pointers to external resources
From PIM to G(roup)IM
We would like to share:
  Subsets of our data
  Fragments of the domain model
Create personal profiles for:
  Better web search, online shopping, ad placement.
Manage information along a social network.
To share or not to share?
Summary
The goal of Semex is to bring the benefits of data management to the desktop
  Needs to be invisible!
  Automatically create associations between data items.
Fundamental challenges to DB/IR:
  Manage everything
  Exploit schema(s) when you have them
  Model data flexibly
  Support new types of queries.
“The most profound technologies are those that disappear.” Mark Weiser
Some References
Overview: CIDR 2005
Reference Reconciliation: SIGMOD 2005
A cool demo: SIGMOD 2005
The website: http://data.cs.washington.edu/semex/
NSF PIM Workshop: http://pim.ischool.washington.edu/