Page 1: Structured Data in Web Search

Structured Data on the Web

Alon Halevy, Google

May 23, 2014

Joint work with: Jayant Madhavan, Cong Yu, Fei Wu, Hongrae Lee, Warren Shen, Anish Das Sarma, Rahul Gupta, Boulos Harb, Zack Ives, Afshin Rostamizadeh, Sree Balakrishnan, Anno Langen, Steven Whang, Mohamed Yahya, and others

Structured Data in Search Results

Set Queries

Chicago restaurants

Association Queries

Data in Movies!

The Knowledge Graph

[Figure: a Knowledge Graph fragment linking Brazil to its capital, Brasilia, with population and mayor attributes]

Query Reformulation

[Figure: a Knowledge Graph fragment linking Brazil to its capital, Brasilia, with population and mayor attributes]

• Brazil capital

• What is the capital of Brazil

• “Google, tell me the capital of Brazil”

• Brazil nuts

• Culture of Brazil

• “Google, will Brazil win the World Cup?”
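
The slide's point is that many differently worded queries should resolve to one Knowledge Graph lookup, while look-alike queries ("Brazil nuts", "Culture of Brazil") should not. A minimal Python sketch of that idea, assuming a hypothetical dictionary-backed `KG` that holds only the Brazil/Brasilia fact from the slide and a few hypothetical hand-written templates (`PATTERNS`); it is an illustration, not how Google's query understanding works.

```python
import re

# Hypothetical toy Knowledge Graph: entity -> {attribute: value}.
# It holds only the fact shown on the slide.
KG = {"brazil": {"capital": "Brasilia"}}

# Hypothetical templates; each captures an attribute and a one-word entity.
PATTERNS = [
    re.compile(r"what is the (?P<attr>\w+) of (?P<ent>\w+)\??"),
    re.compile(r"(?:google, )?tell me the (?P<attr>\w+) of (?P<ent>\w+)"),
    re.compile(r"(?P<ent>\w+) (?P<attr>\w+)"),
]


def reformulate(query):
    """Map a query to an (entity, attribute) pair known to KG, or return None."""
    q = query.lower().strip().strip('"“”')
    for pattern in PATTERNS:
        m = pattern.fullmatch(q)
        if m and m.group("attr") in KG.get(m.group("ent"), {}):
            return m.group("ent"), m.group("attr")
    return None


def answer(query):
    """Answer directly from KG when the query reformulates to a known lookup."""
    pair = reformulate(query)
    return KG[pair[0]][pair[1]] if pair else None
```

With this sketch, "Brazil capital", "What is the capital of Brazil" and "Google, tell me the capital of brazil" all return `"Brasilia"`, while "Brazil nuts" and "Culture of Brazil" return `None` because they do not reach a known attribute.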

Other Sources of Data

[Figure: a Knowledge Graph fragment linking Brazil to its capital, Brasilia, with population and mayor attributes]

Brazil capital

The population of Brasilia is 2,207,718 according to the GeoNames geographical database.

[Figure panels: Tables, Text]
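
Answering from text means pulling a structured fact out of a sentence such as the GeoNames one on this slide. A deliberately narrow sketch, assuming a single hypothetical template (`FACT_PATTERN`) for sentences of the form "The <attribute> of <entity> is <number> according to the <source>"; real extraction would need a parser rather than one regular expression.

```python
import re

# Hypothetical template: "The <attribute> of <entity> is <number> [according to the <source>]".
FACT_PATTERN = re.compile(
    r"The (?P<attr>[\w ]+?) of (?P<ent>[\w ]+?) is (?P<value>\d[\d,]*)"
    r"(?: according to the (?P<source>[\w ]+))?"
)


def extract_fact(sentence):
    """Return the fact as a dict, or None if the sentence does not fit the template."""
    m = FACT_PATTERN.search(sentence)
    if m is None:
        return None
    return {
        "entity": m.group("ent"),
        "attribute": m.group("attr"),
        "value": int(m.group("value").replace(",", "")),
        "source": m.group("source"),  # None when no source is cited
    }
```

Keeping the source alongside the value matters for the later point that an answer "better be right": a fact extracted from text is only as trustworthy as the page it came from.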

Answer Queries Directly from Web?

Brazil capital

The population of Brasilia is 2,207,718 according to the GeoNames geographical database.

[Figure panels: Tables, Text]

[Figure: a Knowledge Graph fragment linking Brazil to its capital, Brasilia, with population and mayor attributes]

The Web vs. the Knowledge Graph

Tables, Tables

Brazil capital

The population of Brasilia is 2,207,718 according to the GeoNames geographical database.

[Figure panels: Tables, Text]

[Figure: a Knowledge Graph fragment linking Brazil to its capital, Brasilia, with population and mayor attributes]

Fusion Tables: Enabling a broad range of users to create tabular content

WebTables: Finding good HTML tables on the Web
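
Finding "good" HTML tables means separating the small fraction of tables that hold relational data from the many used only for page layout. The sketch below uses Python's standard `html.parser` to collect table cells and then applies a hypothetical hand-tuned rule (enough rows, one consistent column count, few empty cells); the published WebTables system relied on a trained classifier, so treat `looks_relational` and its thresholds as placeholders. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the <tr> being read
        self._cell = None  # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def looks_relational(rows, min_rows=3, min_cols=2, max_empty=0.2):
    """Hypothetical thresholds: enough rows, one consistent width, few empty cells."""
    if len(rows) < min_rows:
        return False
    widths = {len(row) for row in rows}
    if len(widths) != 1 or min(widths) < min_cols:
        return False
    cells = [cell for row in rows for cell in row]
    return sum(1 for cell in cells if not cell) / len(cells) <= max_empty


def good_tables(html):
    """Return only the tables in `html` that pass the relational heuristic."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return [t for t in parser.tables if looks_relational(t)]
```

A one-cell navigation table is rejected, while a small country/capital table with a header row is kept.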

• City planning

• Sustainability: water, coffee, …

• Crisis response

• Advancing public discourse (e.g., gun control)

• Data philanthropy: corporations are encouraged to contribute data for the good of society.

Background for Coffee Examples

Fusion Tables: google.com/fusiontables

[SIGMOD 2010, SIGMOD 2012]

• Goal: an easy-to-use database system that is integrated with the Web.

• Key: support common workflows
  – Easy upload (CSV, KML, spreadsheets)
  – Sharing (even outside your company)
  – Visualizations front and center
  – Easy publishing

• Goal 2: Fusion in the data cloud: discover others’ data and combine it with yours.
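
Combining your data with someone else's usually comes down to a join on a shared key column, such as country in the coffee production and consumption examples. A minimal inner-join sketch over lists of dicts, assuming a hypothetical `normalize` rule (case-folding and whitespace collapsing) for matching keys; Fusion Tables' own merge feature is not reproduced here. The rows in the usage lines are placeholders, not real coffee statistics.

```python
def normalize(name):
    """Hypothetical key normalization: case-fold and collapse whitespace."""
    return " ".join(name.lower().split())


def fuse(left, right, key):
    """Inner-join two tables (lists of dicts) on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(normalize(row[key]), []).append(row)
    fused = []
    for row in left:
        for match in index.get(normalize(row[key]), []):
            merged = dict(row)  # keep the left table's spelling of the key
            merged.update({k: v for k, v in match.items() if k != key})
            fused.append(merged)
    return fused


# Placeholder rows for illustration only.
production = [{"country": "Country A", "production": 10},
              {"country": "Country B", "production": 20}]
consumption = [{"country": " country a", "per_capita": 1.5},
               {"country": "Country C", "per_capita": 2.0}]
print(fuse(production, consumption, "country"))
# [{'country': 'Country A', 'production': 10, 'per_capita': 1.5}]
```

Key normalization is where real joins get hard: deciding whether two rows name the same place (the "What is a City?" question) needs far more than case-folding.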

Coffee Producing Countries

Coffee Consumption Per Capita

Big Data for Regular People

Table Facts:

English poverty rates: 32,000 wards with a total of 1.8 million vertices; colors indicate poverty levels

2011 riots: 2,100 incidents; colors indicate the addresses of riots and rioters

Best UK Internet Journalist

Knight-Batten Award for Innovations in Journalism

Crowdsourcing

Data Integration as Search

Join with Population Data: What is a City?

Big Data Integration

Table Facts:

Texas Counties, 2010 Census: 254 counties with 543,000 vertices, colored by various demographics

See SIGMOD 2012 paper for details on scaling map visualizations

Crowdsourcing Cafes

HTML Tables

Search Engine for Data Sets

research.google.com/tables [VLDB 2008, 2011, 2014]
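
A search engine for data sets has to decide which tables best match a keyword query. A minimal ranking sketch, assuming each table is a hypothetical dict with a `title` and a list of `headers`, and a made-up weighting in which header matches count double; the actual ranking signals behind research.google.com/tables are not described in this talk.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric terms of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_tables(query, tables):
    """Sort tables by overlap between the query terms and each table's headers and title."""
    q = tokens(query)

    def score(table):
        # Hypothetical weighting: a header match counts twice as much as a title match.
        header_terms = tokens(" ".join(table["headers"]))
        return 2 * len(q & header_terms) + len(q & tokens(table["title"]))

    return sorted(tables, key=score, reverse=True)
```

Weighting headers above the title reflects the intuition that a column named "Production" says more about a table's contents than a word in its caption.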

Give Answers from Tables

It Better Be Right!

Answer with a Visualization

Long Term Goal: A Data-Guided Decision Engine

• Support decision making:
  – Healthcare debate
  – Should I install solar in my house?
  – Which charity should I contribute to?

• Show relevant data
  – Expose facets of the decision and enable drilldown
  – Show opposing views

• Manually curated examples of decision engines:
  – Justfacts.com, followthemoney.com, decide.com

WebTables on google.com!

HTML Lists

See Elmeleegy et al., VLDB 2009
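
Turning an HTML list into a table means splitting each list item into fields consistently across items. The sketch below is a much cruder stand-in for the method of Elmeleegy et al.: it tries a hypothetical set of candidate delimiters (`CANDIDATE_DELIMITERS`) and keeps the one that splits every item into the same number of fields, preferring more fields.

```python
# Hypothetical candidate delimiters, tried in order.
CANDIDATE_DELIMITERS = [" | ", " - ", "; ", ", ", "\t"]


def list_to_table(items):
    """Split list items into rows using the delimiter that gives every item
    the same number of fields (more than one); prefer the most fields."""
    best = None
    for delim in CANDIDATE_DELIMITERS:
        rows = [[field.strip() for field in item.split(delim)] for item in items]
        widths = {len(row) for row in rows}
        if len(widths) == 1:
            width = widths.pop()
            if width > 1 and (best is None or width > len(best[0])):
                best = rows
    return best
```

For example, `["Casablanca - 1942 - Michael Curtiz", "Vertigo - 1958 - Alfred Hitchcock"]` becomes a three-column table, while a list of plain menu labels returns `None`. Real lists are messier: fields can be missing or contain the delimiter, which is why a fixed delimiter rule is only a starting point.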

Tree Search

Amish quilts

Parking tickets in India

Horses

The Deep Web [Madhavan et al., VLDB 2008]

Other Sources of Data

• Spreadsheets
• CSV files
• Tables embedded in PDF
• XML, RDF
• Visualizations
• Online databases (Fusion Tables, Tableau, …)

Each source has its particularities, but most problems are common to all.

Non-Tabular Data in HTML

Vertical Tables

Data Optimized for Page Layout

Tabular Data Optimized for Site Layout

See [Ling et al., IJCAI 2013] for stitching together tables within a site.

Semantics Can Be Brittle

Semantics are in Text

The Big Challenge

• Analyze natural language text as it pertains to structured data.

• This differs from (open) information extraction, which builds databases entirely from text.

• Good news: natural language parsing technology is now scalable.

First Step: Annotating Columns [Venetis et al., VLDB 2011]
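
Annotating a column means choosing a class (for example, country or city) that describes most of its cell values, typically by looking the values up in an isA database mined from the Web. The sketch below replaces the paper's scoring with a plain majority vote over a tiny hypothetical `IS_A` dictionary; `min_support` is an assumed threshold, not a value from Venetis et al.

```python
from collections import Counter

# Hypothetical isA database: instance -> classes it belongs to.
IS_A = {
    "brazil": ("country",),
    "france": ("country",),
    "el salvador": ("country",),
    "brasilia": ("city",),
    "paris": ("city",),
}


def annotate_column(values, min_support=0.5):
    """Return the class shared by at least `min_support` of the cells, else None."""
    if not values:
        return None
    votes = Counter()
    for value in values:
        votes.update(IS_A.get(value.strip().lower(), ()))
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count / len(values) >= min_support else None
```

Requiring a minimum share of cells keeps a column from being labeled on the strength of one or two recognized values, at the cost of leaving sparsely covered columns unlabeled.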

Step 2: Understanding Relationships

Dictionary of Attributes

• I want the list of all attributes that countries may have.

• Freebase doesn’t have coffee production.

• Is this an ontology?
  – Not quite! I want an ontology suited for search.
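
One way to approach a dictionary of attributes suited to search is to look at what people actually ask about entities of a class. The sketch below assumes a hypothetical list of queries and a list of known country names (lower case), strips the entity and a few hypothetical filler words from each query, and keeps leftover phrases asked about more than one entity; it illustrates the idea only and is not Biperpedia's pipeline.

```python
import re
from collections import defaultdict

# Hypothetical filler words dropped from what remains of a query.
FILLER = {"what", "is", "the", "of", "in"}


def mine_attributes(queries, entities, min_entities=2):
    """Return attribute phrases that appear in queries about at least
    `min_entities` distinct entities (digits such as years are ignored)."""
    support = defaultdict(set)  # attribute phrase -> entities it was asked about
    for query in queries:
        query = query.lower()
        for entity in entities:
            pattern = rf"\b{re.escape(entity)}\b"
            if re.search(pattern, query):
                rest = re.sub(pattern, " ", query)
                words = [w for w in re.findall(r"[a-z]+", rest) if w not in FILLER]
                if words:
                    support[" ".join(words)].add(entity)
    return sorted(a for a, ents in support.items() if len(ents) >= min_entities)
```

Given queries such as "coffee production el salvador 2013", "Coffee production of Brazil", "el salvador capital", "capital of Brazil" and "brazil nuts", this yields `["capital", "coffee production"]`; "nuts" is dropped because it is only ever asked about one country.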

Biperpedia: An Ontology for Search Applications [VLDB 2014]

Comparing to Freebase Coverage

Tower of Babel: Internet Style

In 2013, the coffee production of El Salvador dropped by 20% due to the coffee rust disease.

Coffee production el salvador 2013

El Salvador exports coffee 2013

[Figure panels: Knowledge Graph, Tables, Text]

Conclusions

• This was a talk about Big Data:
  – Millions of people creating data sets
  – Billions of people seeing the data and being affected by it

• Get out there and find your favorite application.

• Dreams do come true:
  – At least as it pertains to structured data on the Web!

References

• Fusion Tables: SIGMOD 2010, 2012
• WebTables: VLDB 2008, 2009, 2011

