
Interactively Building Geospatial Mashups

Craig A. Knoblock
University of Southern California

Work in collaboration with Shubham Gupta, Pedro Szekely, and Rattapoom Tuchinda


MASHUPS 

•  A website or application that combines content from more than one source into an integrated experience [wikipedia]

Combined data gives new insight and provides new services:

a) LA crime map: crime reports from different counties + map
b) zillow.com: real estate listings + property tax
c) Ski bonk: weather + snow reports + snow resorts

Introduction  Approach  Evaluation  Related Work  Conclusion


PROBLEM 

•  Most mashups require significant expertise to create
•  Demand for creating integrated applications is huge
•  Every user has unique requirements for an integrated application
•  The number of available sources, and the need to integrate their data, continues to grow


MASHUP BUILDING ISSUES 

[Figure: mashup-building pipeline]
-  Data Retrieval: one wrapper per source
-  Calibration (source modeling, cleaning): Clean, Attribute
-  Integration: Combine
-  Display: Customize Display


EXISTING APPROACHES 

Goal: create mashups without programming
•  Not having to write code does not mean not having to understand programming

Widget paradigm (e.g., Yahoo's Pipes)
-  Each widget (43 for Pipes, 300+ for MS) represents an operation on the data
-  Locating widgets and learning to customize them can be time consuming
-  Most tools focus on particular issues and ignore others

Can we come up with a framework that addresses all of the issues while still making the mashup-building process easy?


KEY CONTRIBUTIONS 

•  A programming-by-demonstration approach that uses a single table for building a mashup
•  An integrated approach that links data extraction, source modeling, data cleaning, and data integration together
•  A query formulation technique that allows users to specify examples to build complicated queries


KEY IDEAS 

•  Focus on data, not operations
   –  Users are more familiar with data
•  Leverage existing data
   –  Helps source modeling, cleaning, and data integration
•  Consolidate as opposed to divide-and-conquer
   –  Solving a problem in one issue can help solve another issue
   –  Interacting within a single spreadsheet platform


KARMA USER INTERFACE 

Data Source Types Currently Supported by Karma 

Various Information Integration Operations

Data Table – Spreadsheet Type Interface 


INTEGRATION SCENARIO

[Figure: data flow of the example mashup]
-  Evacuation Centers CSV → Extract → {EvacCenter_ID, Address, City}
-  Emergency Coordinator MySQL Database → Extract → {Name, City, Phone No.} → Clean
-  Combined into {EvacCenter_ID, Address, City, Name, Phone No.} → Map
-  Injury statistics in Excel Spreadsheet → Extract → {Date, Injuries, Fatalities} → Visualize as chart
-  Google News Website → Extract → {Headlines, Summary, Date, Link} → Visualize as bulleted list


RETRIEVING DATA  FROM DIVERSE SOURCES 

•  Karma facilitates retrieval of data from structured data sources, such as Excel spreadsheets, MySQL databases, and CSV files
•  Karma also facilitates the extraction of data from semi-structured data sources such as web pages

CSV Text File  MySQL Database 

Excel Spreadsheet  HTML Web Page 


EXTRACTION BY EXAMPLE 

•  Data is retrieved from structured data sources, such as Excel sheets and CSV files, through a drag-and-drop mechanism
•  The user only needs to select a sample data element and drop it into Karma's data table


EXTRACTION FROM THE WEB 

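[Figure: DOM tree of a restaurant listing page. TBODY contains two tr rows; each row holds td cells (numbered "1." and "2."), with an a element for the restaurant name followed by br-separated text]
-  Row 1: Japon Bistro, "970 E Colora..", "Upscale yet affordabl.."
-  Row 2: Hokusai, "8400 Wilshir.", "Chic elegance….."

Example path: Tbody/tr[1]/td[2]/a
Generalized path: Tbody/tr*/td*/a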
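The step from the example path to the generalized path can be sketched in a few lines of Python. This is only an illustration: it assumes the generalization simply drops positional indices, which the slides do not confirm as Karma's actual algorithm. The `generalize` helper and the sample markup are hypothetical, with text truncated as in the source.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the DOM tree on the slide.
PAGE = """
<tbody>
  <tr><td>1.</td><td><a>Japon Bistro</a><br/>970 E Colora..<br/>Upscale yet affordabl..</td></tr>
  <tr><td>2.</td><td><a>Hokusai</a><br/>8400 Wilshir.<br/>Chic elegance..</td></tr>
</tbody>
"""


def generalize(path: str) -> str:
    """Drop positional predicates such as [1] so the path matches every sibling."""
    return re.sub(r"\[\d+\]", "", path)


root = ET.fromstring(PAGE)        # root is the <tbody> element
example_path = "tr[1]/td[2]/a"    # path of the single example the user picked
example = root.find(example_path).text

general_path = generalize(example_path)
names = [a.text for a in root.findall(general_path)]
print(example, names)
```

One labeled example is enough here because every row shares the same structure; on less regular pages a real system would need additional examples or a more careful generalization.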


EXPLOITING WRAPPER LIBRARIES 

Wrapper Library: Karma lists all the available wrappers on the local machine.


SOURCE MODELING 

•  Karma automatically generates the semantic type of each attribute to learn the underlying model of the data source
•  Supervised machine learning techniques are used to generate a set of patterns for each semantic type from training data

Initial type: manually label the data with the correct semantic type to train Karma

When new data of the same type is imported, Karma automatically labels it correctly


LEARNING SEMANTIC TYPES 

Background knowledge (patterns per semantic type):

:StreetAddress:   4DIG CAPS Rd    |  3DIG N CAPS Ave     |  …
:Email:           [email protected] |  [email protected]      |  …
:State:           CA              |  2UPPER              |  …
:Telephone:       (3DIG) 3DIG-4DIG |  +1 3DIG 2DIG 4DIG  |  …

Flow: background knowledge → learn → patterns → label

 Idea: Learn a model of the content of data and use it to recognize new examples
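A minimal sketch of this idea in Python follows. It is not Karma's learner: the tokenization rules in `to_pattern` (digit runs become `kDIG`, all-uppercase words become `kUPPER`, longer capitalized words become `CAPS`) are a simplification assumed for illustration, and the training values are made up.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def to_pattern(value: str) -> str:
    """Abstract a value into a token pattern such as '4DIG CAPS Rd'."""
    tokens = []
    for tok in re.findall(r"\d+|[A-Za-z]+|[^\sA-Za-z\d]", value):
        if tok.isdigit():
            tokens.append(f"{len(tok)}DIG")       # run of digits
        elif tok.isupper() and len(tok) >= 2:
            tokens.append(f"{len(tok)}UPPER")     # e.g. a state code
        elif tok[0].isupper() and len(tok) > 3:
            tokens.append("CAPS")                 # capitalized word
        else:
            tokens.append(tok)                    # keep short words and punctuation
    return " ".join(tokens)


def learn(examples: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count, for every pattern, which semantic types it was labeled with."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for value, semantic_type in examples:
        model[to_pattern(value)][semantic_type] += 1
    return model


def label(model: Dict[str, Counter], value: str) -> Optional[str]:
    """Return the most frequent type for the value's pattern, or None if unseen."""
    counts = model.get(to_pattern(value))
    return counts.most_common(1)[0][0] if counts else None


# Made-up training data, labeled by hand.
model = learn([
    ("1234 Main Rd", "StreetAddress"),
    ("970 N Maple Ave", "StreetAddress"),
    ("CA", "State"),
    ("(213) 555-0100", "Telephone"),
])
print(label(model, "5678 Sunset Rd"), label(model, "NY"))
```

Exact pattern matching is brittle; it is used here only to show how labeled examples turn into reusable patterns.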


DATA CLEANING

•  Karma performs data cleaning by learning transformation rules from examples and applying them

[Figure: initial data source → user provides an example → Karma learns a transformation rule and applies it to the remaining data → data source after cleaning]


DATA CLEANING: PREDEFINED TRANSFORMATIONS 

Predefined rules, e.g. "31 Reviews" → "31"

Subset rule: (s1 s2 … sk) → (d1 d2 … dt) ∧ (k ≤ t) ∧ si ∈ {d1, d2, …, dt} ∧ di ≠ dj
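The "31 Reviews" → "31" example can be read as keeping a subset of the original tokens. The Python sketch below learns which token positions to keep from one user-provided example and applies them to other values. It assumes whitespace tokenization and position-based reuse; both are illustrative choices, not details taken from the slides.

```python
from typing import List


def learn_subset_rule(original: str, cleaned: str) -> List[int]:
    """Return the positions of the original tokens that survive in the cleaned value."""
    tokens = original.split()
    positions = []
    for tok in cleaned.split():
        if tok not in tokens:
            raise ValueError(f"{tok!r} is not a token of {original!r}: not a subset rule")
        positions.append(tokens.index(tok))
    return positions


def apply_subset_rule(positions: List[int], value: str) -> str:
    """Keep the same token positions in another value of the column."""
    tokens = value.split()
    return " ".join(tokens[i] for i in positions if i < len(tokens))


rule = learn_subset_rule("31 Reviews", "31")   # the example from the slide
cleaned = [apply_subset_rule(rule, v) for v in ["127 Reviews", "8 Reviews"]]
print(rule, cleaned)
```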


DATA INTEGRATION 

•  Karma discovers related sources by detecting and ranking associations based on common attribute names and matching semantic types
•  Karma suggests potential joins between the current data sources in the form of column completions


USER SELECTS FROM  COLUMN COMPLETIONS 

MySQL database loaded as another source in Karma

Karma suggests the possible column completions in a drop-down list

Karma executes the join query once the user selects an option
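The suggest-then-join interaction can be sketched in Python as follows. The scoring in `candidate_keys` (one point for a shared attribute name, one for a shared semantic type) is an assumed ranking, not Karma's. The rows, and the type names other than PR-Address, are invented for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def candidate_keys(cur: Dict[str, str], other: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Rank column pairs by matching attribute name and matching semantic type."""
    scored = []
    for a, type_a in cur.items():
        for b, type_b in other.items():
            score = int(a == b) + int(type_a == type_b)
            if score:
                scored.append((score, a, b))
    return sorted(scored, reverse=True)


def complete_column(cur_rows: List[Row], other_rows: List[Row],
                    key_a: str, key_b: str, new_col: str) -> List[Row]:
    """Add new_col to the current table by joining on key_a = key_b (left join)."""
    lookup = {r[key_b]: r[new_col] for r in other_rows}
    return [{**r, new_col: lookup.get(r[key_a], "")} for r in cur_rows]


# Column name -> semantic type; types other than PR-Address are hypothetical.
center_types = {"EvacCenter_ID": "ID", "Address": "PR-Address", "City": "PR-City"}
coord_types = {"Name": "PR-Name", "City": "PR-City", "Phone No.": "PR-Phone"}

centers = [
    {"EvacCenter_ID": "E1", "Address": "100 Example St", "City": "Pasadena"},
    {"EvacCenter_ID": "E2", "Address": "200 Sample Ave", "City": "Burbank"},
]
coordinators = [{"Name": "A. Coordinator", "City": "Pasadena", "Phone No.": "555-0100"}]

keys = candidate_keys(center_types, coord_types)
_, key_a, key_b = keys[0]                      # best-ranked join key
joined = complete_column(centers, coordinators, key_a, key_b, "Name")
print(keys, joined)
```

Rows without a match keep an empty cell, so the user's table never loses rows when a completion is chosen.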


DATA VISUALIZATION

•  Visualization-by-demonstration approach
   –  The user shows Karma the kind of visualization desired by dragging and dropping example data


DATA VISUALIZATION 

Karma currently supports four visualization formats:

1.  Chart Format: useful for visualizing numerical statistics, time-based events, etc.

2.  Paragraph Format: useful for visualizing descriptive text data such as Wikipedia definitions


3.  List Format: useful for visualizing information in a bulleted list, such as a list of summarized news articles

4.  Table Format: useful for visualizing information that is best presented in a row-and-column format, such as numerical values


RESULTS CAN BE PUBLISHED  IN MULTIPLE FORMATS 

•  Karma lets you export your final mashup in a variety of formats:
   -  HTML page
   -  Database table
   -  KML layer
   -  XML file
   -  CSV text file

[Figure: the different mashup publishing options]


AUTOMATICALLY FINDS  GEOSPATIAL REFERENCES 

•  Final mashup output in HTML web page format:
   -  Karma identifies geospatial information in the current data with the help of geographic semantic types such as PR-Address, PR-Latitude, etc.
   -  The Google geocoding service is used to find the coordinates for a given address
   -  Karma uses the coordinate information to place the markers in the final mashup

[Figure callouts: potential geographic information; options to publish the mashup as an HTML web page]


CONSTRUCTS A MAP WITH  USER‐DEFINED LAYOUT 

•  Final mashup as an HTML web page:


RESULTS CAN BE EXPORTED AS KML 

•  Final mashup output as a KML layer 

Options to publish the mashup as a KML layer
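For readers unfamiliar with KML, the sketch below writes one Placemark per mashup row using Python's standard library. It illustrates the file format only and says nothing about how Karma generates its layers. The `rows_to_kml` helper, the row fields, and the coordinates are hypothetical; in Karma the coordinates would come from the geocoding step.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

KML_NS = "http://www.opengis.net/kml/2.2"


def rows_to_kml(rows: List[Dict[str, Union[str, float]]]) -> str:
    """Serialize rows with 'name', 'lat' and 'lon' fields as a KML document."""
    ET.register_namespace("", KML_NS)            # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in rows:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(row["name"])
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f'{row["lon"]},{row["lat"]}'
    return ET.tostring(kml, encoding="unicode")


# Made-up evacuation centers with made-up coordinates.
kml_text = rows_to_kml([
    {"name": "Evacuation Center E1", "lat": 34.15, "lon": -118.14},
    {"name": "Evacuation Center E2", "lat": 34.18, "lon": -118.31},
])
print(kml_text)
```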


KML LAYERS CAN BE OPENED  IN GOOGLE EARTH 

The generated KML layer can be viewed in GIS software such as Google Earth


RESULTS CAN BE STORED IN A DB

•  The final mashup data can also be saved into a database table by providing the database details (location, username, password, etc.) in Karma
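As a rough illustration of this export step, the Python sketch below writes mashup rows into a table. It uses an in-memory SQLite database as a stand-in, so no location, username, or password is needed; the slides do not specify this particular target. The `save_rows` helper, the table name, and the rows are hypothetical.

```python
import sqlite3
from typing import Dict, List


def save_rows(conn: sqlite3.Connection, table: str, rows: List[Dict[str, str]]) -> None:
    """Create the table from the first row's columns and insert every row."""
    columns = list(rows[0].keys())
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Identifiers cannot be bound as parameters, so they are quoted instead.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")   # stand-in for the user's database
mashup = [
    {"EvacCenter_ID": "E1", "City": "Pasadena", "Name": "A. Coordinator"},
    {"EvacCenter_ID": "E2", "City": "Burbank", "Name": ""},
]
save_rows(conn, "mashup", mashup)
saved = conn.execute('SELECT COUNT(*) FROM "mashup"').fetchone()[0]
print(saved)
```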


EVALUATION 

•  Baseline: a combination of Dapper/Pipes
•  Claims:
   1.  Users with no programming experience can build all four mashup types
   2.  Karma takes less time to complete each subtask and scales better as the tasks get harder
   3.  Overall, the user takes less time to build the same mashup in Karma compared to Dapper/Pipes
•  Users:
   –  Programmers (20)
   –  Non-programmers (3)


EVALUATION: SETUP 


Familiarization
-  Programmers: 2 assignments on DP (Dapper/Pipes)
-  Review package
-  30-minute tutorial

Practice
-  2-3 tasks using Karma

Test (3 tasks)
-  Programmers: alternate between Karma and DP for each task
-  Non-programmers: use only Karma
-  Screens are recorded using video capture software

5-minute cut-off time


EVALUATION: TASKS 

•  Claim 1: Users with no programming experience can build all four mashup types
•  Claim 2: When the mashup subtask is difficult, Karma takes less time to complete that subtask
•  Claim 3: Overall, the user takes less time to build the same mashup in Karma compared to Dapper/Pipes

Task No. | Mashup Type          | Data Extraction | Source Modeling | Data Cleaning | Data Integration
1        | 1 (1 source)         | Moderate        | Simple          | Difficult     | N/A
2        | 2, 3 (union + form)  | Difficult       | Simple          | Simple        | Union (simple)
3        | 4 (join 2 sources)   | Simple          | Simple          | N/A           | Join (difficult)


Claim 1: Users with no programming experience can build all four mashup types


EVALUATION: NON‐PROGRAMMERS 


Claim 2: Karma takes less time to complete each subtask


EVALUATION: EXTRACTION 


•  As the extraction task gets more difficult, with Dapper/Pipes
   -  subjects take longer
   -  more subjects fail to complete the task (11% for moderate and 25% for difficult)

[Chart legend: Dapper/Pipes, Karma, Karma (programmer)]


EVALUATION: SOURCE MODELING 


•  Karma performed worse in tasks 1 and 2
   -  only a 30-second difference
   -  subjects take time selecting attributes
   -  the savings are realized in the data integration step
•  Karma performed better in task 3 because it can automatically identify the attribute

[Chart legend: Dapper/Pipes, Karma]


EVALUATION: DATA CLEANING 


•  Karma performed better in both tasks
•  When the cleaning task gets harder, more subjects fail in Dapper/Pipes (35% for the simple task and 83% for the hard task)

[Chart legend: Dapper/Pipes, Karma]


EVALUATION: DATA INTEGRATION 


•  Because of the table structure, subjects can specify a union indirectly by dropping data into the right cell
•  The time spent in the source modeling step allows Karma to suggest the linking source
•  Dapper/Pipes: 30% fail in the union case and 95% fail in the join case

[Chart legend: Dapper/Pipes, Karma]


Claim 3: Overall, the user takes less time to build the same mashup in Karma compared to Dapper/Pipes


EVALUATION: OVERALL 


EVALUATION: AVERAGE 

[Chart: average results, Dapper/Pipes vs. Karma. Ratio labels on the chart: 2.22x, 0.67x, 4.16x, 6.49x, 3.32x]


RELATED WORK:  MASHUP BUILDING TOOLS 

1: Extraction, 2: Union, 3: Form-based Interaction, 4: Join

[Table: comparison of mashup building tools; only the remarks survive]
-  Require an expert
-  Mainly focus on extraction / linear
-  Q/A approach / linear / scalability
-  Widgets: fancier UI / more widgets
-  Fewer widgets / confusion on workflow
-  Early work; focus on DOM, too basic
-  Create points on map
-  RDF / manually specify data int
-  Tuple = card; drawing links for relations


RELATED WORK:  DATA EXTRACTION 

•  Automatic extraction: tables and lists only
   –  RoadRunner (exploits HTML structure) [Crescenzi et al., 2001]
   –  Adel (grammar induction to detect rows) [Lerman+ 2001]
   –  VisualWeb (OCR technique to detect tables) [Gatterbauer+ 2007]
•  Semi-automatic: require more labeled examples
   –  WIEN (inductive, less expressive than Stalker) [Kushmerick 1997]
   –  Stalker (co-testing) [Muslea+ 1999]
   –  SoftMealy (finite state transducer) [Hsu 1998]
   –  WHISK (rigid format, exact delimiter) [Soderland 1998]
•  DOM: rely on well-formed HTML and less labeling
   –  Simile [Huynh+ 2005]
   –  Dapper
   –  Interactive Wrapper Generation (ML + prediction on DOM) [Irmak+ 2006]
   –  PLOW (adds natural language) [Allen+ 2007]
   –  Cards [Dontcheva+ 2007]
   –  Karma [Tuchinda+ 2008]


RELATED WORK:  SOURCE MODELING 

•  1:1 mapping, N:M mapping
   –  Schema-level match
      •  TranScm [Milo+ 98]
      •  DIKE [Palopoli+ 99]
      •  Artemis [Castano+ 01]
      •  Delta [Clifton+ 97]
   –  + Instance-based matcher
      •  SemInt [Li 00]
      •  LSD [Doan 01]
      •  ILA [Etzioni 95]
      •  iMapp [Dhamanka 04]
      •  Clio (interactive) [Ling 01]
      •  Inducing Source Description [Carman 07]
•  Karma leverages existing techniques to narrow candidate matches
   –  String similarities [Cohen+ 2003]


RELATED WORK:  DATA CLEANING 

•  Commercial tools: focus on writing transformations
   –  ACR/Data, Migration Architect [Chaudhuri+ 1997]
•  Discrepancy detection: used as a stepping stone for record linkage and cleaning systems
   –  Levenshtein distance [Needleman+ 70]
   –  Vector based [Baeza-Yates+ 99]
   –  EM [Ristad+ 98]
   –  SVM [Bilenko+ 03]
•  Record linkage & cleaning systems: focus on ranking [Winkler 06]
   –  Fuzzy Match [Chaudhuri+ 03]
   –  Apollo [Michalowski+ 05]
   –  Phoebus [Michelson+ 07]
   –  Potter's Wheel [Raman+ 01]
•  Karma
   –  Gains reference sources through the source modeling process
   –  Provides predefined transformations


RELATED WORK:  DATA INTEGRATION 

•  Universal Relation: makes it easier to formulate the query, but users still need to formulate the query [Ullman 1980, 1988]
•  Query by example: need to know which data sources to use, and the query may not return results
   –  QBE [Zloof 1975]
•  Retrieval by formulation: need to understand the domain model to formulate a partial description
   –  Helgon [Fischer 1989], RABBIT [Williams 1982]
•  Graphical query language: users still need to navigate through sources (graphs)
   –  Gql [Benzi 1998, Haw 1994, Papantonakis 1988]
•  Question-answering techniques: understanding of database operations required
   –  Agent Wizard [Tuchinda+ 2004]
•  Interactive schema/data integration: understanding of the source schema required
   –  Clio [Ling 01]
•  Karma is based on Programming by Demonstration [Cyper 2001; Lau 2001]


CONCLUSION

•  Mashups are a fast-growing area
   –  Need an efficient way for casual web users to build them
•  Contributions
   –  A PBD approach that uses a single table for building a mashup
   –  An integrated approach that solves the various mashup-building issues
   –  A query formulation technique that allows users to specify examples to build complicated queries
•  Evaluated the validity of the Karma approach
   –  Subjects were able to complete mashup-building tasks in Karma
   –  The overall improvement is at least a factor of 3.5


FUTURE WORK 

•  Learn and generalize over the task
   –  Store the integration plan so that it can be re-executed on current data
•  Support the integration of geospatial data types (i.e., vector layers, raster layers)
•  Improve the techniques for automatic source modeling
•  Learn new transformations from examples for data cleaning


PAPERS

•  Building geospatial mashups to visualize information for crisis management. Shubham Gupta and Craig A. Knoblock. In Proceedings of the 7th International Conference on Information Systems for Crisis Response and Management, 2010.
•  Interactive data integration through smart copy & paste. Zachary G. Ives, Craig A. Knoblock, Steven Minton, Marie Jacob, Partha Pratim Talukdar, Rattapoom Tuchinda, Jose Luis Ambite, Maria Muslea, and Cenk Gazen. Fourth Biennial Conference on Innovative Data Systems Research (CIDR), 2009.
•  Building mashups by example. Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock. Proceedings of the 2008 International Conference on Intelligent User Interfaces, 2008.
•  Building data integration queries by demonstration. Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock. In Proceedings of the International Conference on Intelligent User Interfaces, 2007.
