TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables
Besnik Fetahu, Avishek Anand, Maria Koutraki
@FetahuBesnik
https://github.com/bfetahu/wiki_tables
The complete code to be published soon!!
Why tables?
Rich Factual Information in Tables
100 meters — Running Race
Season's bests
What is the time difference for the best time in Women’s 100 Meter Race in 1974 and 2018?
• No single source can answer such a complex question.
• Factual information in tables is scattered in isolated tables across different articles.
Answer: 0.64s
Tables in Wikipedia
• Tables are one of the richest sources of factual information in Wikipedia and on the Web:
  • ~530k Wikipedia articles contain tables
  • ~3M extracted tables
  • Resulting in > 32M rows
• Tables have the potential to cover hundreds of millions of facts and, if they can be interlinked, can be used to assess fact consistency and validity.
Extractor code: https://github.com/bfetahu/wiki_tables
Challenges and Potential of Tables
Challenges
• Extraction and canonicalization problems:
  • Lack of explicit schemas (what do the columns mean?)
  • Non-standard authoring practices
  • Tables are optimized for human readability and display
• Isolated factual information in tables:
  • Tables do not contain explicit relations to other related tables
  • Tables often subsume or are equivalent to other tables
  • Joining different tables can provide a richer picture of the factual information they contain
• Alignment challenges:
  • Table columns are ambiguous outside the context in which they appear (e.g. “Name” for actors, scientists, animals, race types, etc.)
  • A subject (or key) column, or a combination of columns, is necessary for any two tables to be considered for alignment
  • Large number of tables as candidates for alignment
TableNet Approach
TableNet: Objectives
Automatically extract tables from Wikipedia and efficiently align tables with high accuracy and coverage with fine-grained relation types.
t1: Continental records

| Area | Men: Time | Men: Athlete | Men: Nation | Women: Time | Women: Athlete | Women: Nation |
|---|---|---|---|---|---|---|
| Africa | 9.85 | Olusoji Fasuba | Nigeria | 10.78 | Murielle Ahoure | Ivory Coast |
| Asia | 9.91 | Femi Ogunode | Qatar | 10.79 | Li Xuemei | China |
| Europe | 9.86 | Francis Obikwelu | Portugal | 10.73 | Christine Arron | France |
| South America | 10.00 | Robson da Silva | Brazil | 11.01 | Ana Cláudia Lemos | Brazil |

t2: All-time top 25 women

| Rank | Time | Athlete | Country | Date |
|---|---|---|---|---|
| 1 | 10.49 | Florence G.-Joyner | United States | 16.07.1988 |
| 2 | 10.64 | Carmelita Jeter | United States | 20.09.2009 |
| 3 | 10.65 | Marion Jones | United States | 12.09.1998 |
| 4 | 10.70 | Shelly-Ann F.-Pryce | Jamaica | 29.06.2012 |

t3: All-time top 25 men

| Rank | Time | Athlete | Country | Date |
|---|---|---|---|---|
| 1 | 9.58 | Usain Bolt | Jamaica | 16.08.2009 |
| 2 | 9.69 | Tyson Gay | United States | 20.09.2009 |
| 2 | 9.69 | Yohan Blake | Jamaica | 23.08.2012 |
| 4 | 9.72 | Asafa Powell | Jamaica | 23.08.2012 |

t4: Top 10 Junior (under-20) men

| Rank | Time | Athlete | Country | Date |
|---|---|---|---|---|
| 1 | 9.97 | Trayvon Bromell | United States | 13.06.2014 |
| 2 | 10.00 | Trentavis Friday | United States | 05.07.2014 |
| 3 | 10.01 | Darrel Brown | Trinidad and Tobago | 24.08.2003 |
| 3 | 10.01 | Jeff Demps | Jamaica | 28.06.2008 |
| 3 | 10.01 | Yoshihide Kiryu | Japan | 29.04.2013 |

Schemas:
• t1: Area/Nation:Location, Athlete:Person{M,F}
• t2: Date:Date, Country:Location, Athlete:Person{F}
• t3: Date:Date, Country:Location, Athlete:Person{M}
• t4: Date:Date, Country:Location, Athlete:Person{M, age<20}

Table relations:
• (t1, t2): rel_1 = genderRestriction(t1, t2), rel_2 = topWomanRecords(t2, t1)
• (t1, t3): rel_1 = topMenRecords(t3, t1), rel_2 = genderRestriction(t1, t2)
• (t1, t4): rel_1 = genderRestriction(t1, t4), rel_2 = ageRestriction(t4, t1)
• (t3, t4): rel_1 = ageRestriction(t4, t3)
TableNet Overview
Table Extraction and Schema Description Generation
Candidate Pair Generation
Table Alignment
TableNet
Candidate Pair Generation
TableNet: Candidate Generation
• More than 530k Wikipedia articles contain tables.
• Considering all article pairs as relevant is infeasible: 530k articles yield roughly 1.4 × 10^11 unordered pairs (n(n−1)/2).
• Efficient algorithms are needed to filter out irrelevant article pairs.
• We propose an efficient approach that reduces the number of irrelevant pairs while maintaining high coverage of relevant article pairs, i.e. those whose tables can be aligned.
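To get a feel for the scale, the number of unordered article pairs can be computed directly. This is a minimal sketch: the article count is the approximate figure from the slide, and the function name is only illustrative.

```python
from math import comb


def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n items, i.e. n * (n - 1) / 2."""
    return comb(n, 2)


n_articles = 530_000  # approximate number of articles with tables (from the slide)
print(f"{unordered_pairs(n_articles):,}")  # prints 140,449,735,000
```

Even at this quadratic rate, scoring every pair with an expensive alignment model is impractical, which is why a cheap filtering step comes first.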
Article Abstract Features
• doc2Vec similarity between abstracts
• Avg. word2Vec abstract vector similarity
• tf-idf similarity between abstracts
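As an illustration of the last feature, tf-idf similarity between two abstracts can be sketched in plain Python. The whitespace tokenization, the smoothed idf formula, and the example abstracts are assumptions made for this sketch; the slides do not specify how the feature was computed.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so terms present in every document keep a non-zero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up abstracts, for illustration only.
abstracts = [
    "the 100 metres is a sprint race in track and field",
    "the 200 metres is a sprint race in track and field",
    "the violin is a wooden string instrument",
]
vecs = tfidf_vectors(abstracts)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_related > sim_unrelated)
```

The two sprint abstracts share most of their terms and therefore score higher than the sprint and violin pair, which only share function words.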
Categories and KBs Features
Wikipedia categories can be of low quality, so category embeddings are computed with graph embedding approaches:
• Similarity between categories in the embedding space
• Overlap of direct and parent categories
• DBpedia type overlap …
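One possible reading of the category overlap feature is a Jaccard coefficient over category sets, computed once on the direct categories and once after adding parent categories. The Jaccard measure, the helper names, and the toy hierarchy are assumptions for this sketch, not details given on the slides.

```python
def with_parents(categories, parent_of):
    """Expand a set of direct categories with their immediate parent categories."""
    expanded = set(categories)
    for cat in categories:
        expanded.update(parent_of.get(cat, ()))
    return expanded


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy hierarchy (child -> parents), not real Wikipedia data.
parent_of = {
    "100 metres": {"Sprint (running)"},
    "200 metres": {"Sprint (running)"},
    "Marathons": {"Long-distance running"},
}
direct_a = {"100 metres"}
direct_b = {"200 metres"}

direct_overlap = jaccard(direct_a, direct_b)
parent_overlap = jaccard(
    with_parents(direct_a, parent_of), with_parents(direct_b, parent_of)
)
print(direct_overlap, parent_overlap)
```

In this toy case the two articles share no direct category, but they overlap once the shared parent is taken into account, which is the kind of signal parent categories can add.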
Table Features
• Column title similarity
• Column title distance
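The slides do not define these two table features. As a rough sketch, assuming an edit-distance-based definition, a title distance and a normalized title similarity could look as follows (all function names are hypothetical):

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete cs
                    curr[j - 1] + 1,           # insert ct
                    prev[j - 1] + (cs != ct),  # substitute (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]: 1 minus the normalized edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def best_match_similarity(cols_a, cols_b) -> float:
    """Average, over the columns of A, of the best title similarity to any column of B."""
    if not cols_a or not cols_b:
        return 0.0
    return sum(max(title_similarity(a, b) for b in cols_b) for a in cols_a) / len(cols_a)


t2_cols = ["Rank", "Time", "Athlete", "Country", "Date"]
t1_cols = ["Area", "Time", "Athlete", "Nation"]
print(best_match_similarity(t2_cols, t1_cols))
```

Note that `best_match_similarity` is asymmetric; a symmetric variant could average both directions.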
Table Alignment
TableNet: Table Alignment
[Figure: TableNet alignment model. Each column (a, b, …, n) of tables ti and tj is represented by its column description, instance values, and column type; the column representations of both tables, separated by a delimiter, are fed to LSTM cells, followed by col-by-col attention and an output layer.]
For a table pair predict their relation type
r(ti, tj) ⟶ {subPartOf, equivalent, none}
Table Column Representations
• Column Description
  • Represent the column description tokens by their word embeddings (GloVe)
  • Disadvantage: column descriptions can be ambiguous (e.g. a Title column for Books or Movies)
• Instance Values
  • Avg. embedding of the cell values based on graph embeddings (node2vec trained on the Wikipedia anchor graph)
• Column Type
  • Represent the LCA (lowest common ancestor) category through graph embeddings
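The instance-value representation amounts to averaging the embeddings of a column's cells. A minimal sketch, assuming a plain dictionary of embeddings and skipping cells without one (the 2-dimensional vectors are toy values, not node2vec output):

```python
def average_embedding(cell_values, embeddings, dim):
    """Mean of the embeddings of all cell values that have one; zero vector if none do."""
    vectors = [embeddings[v] for v in cell_values if v in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th components of all vectors together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Hypothetical 2-dimensional embeddings; real ones would come from a trained model.
toy_embeddings = {
    "Jamaica": [1.0, 0.0],
    "United States": [0.0, 1.0],
}
country_column = ["United States", "United States", "United States", "Jamaica", "Unknown"]
column_vector = average_embedding(country_column, toy_embeddings, dim=2)
print(column_vector)  # prints [0.25, 0.75]
```

Cells without an embedding ("Unknown" here) are ignored rather than counted as zero vectors, so they do not pull the average toward the origin; this is a design choice of the sketch.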
Evaluation Setup
Ground-truth Data
• Random sample of 50 Wikipedia (source) articles and their tables
• Ground-truth considerations:
  • Coverage: ensure that, for the tables of each source article, we have all relevant tables for alignment
  • Efficiency: iteratively and manually construct filters to remove articles whose tables cannot yield any relation for the tables of interest
  • Labelling: crowdsource the remaining pairs for labelling (3 annotators per table)
  • Labelling quality: comprehensive worker training through detailed instructions and examples before joining the task
• Ground-truth stats for the 17k crowdsourced table pairs:
  • 52% of pairs with no alignment
  • 24% of pairs with equivalent
  • 23% of pairs with subPartOf
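The slides do not say how the three crowd judgments were combined into one label. One common choice is a strict majority vote, sketched here with a hypothetical fallback to "none" when the annotators disagree completely:

```python
from collections import Counter


def majority_label(votes, fallback="none"):
    """Return the label chosen by a strict majority of annotators, else the fallback.

    Assumes at least one vote is given.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else fallback


print(majority_label(["equivalent", "equivalent", "subPartOf"]))  # prints equivalent
print(majority_label(["equivalent", "subPartOf", "none"]))        # prints none
```

Other aggregation schemes, such as weighting annotators by their agreement with gold questions, would fit the same interface.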
Candidate Generation
TableNet: Candidate Generation Results
[Figure: curves P and Rµ (score, 0.0 to 1.0) plotted against the threshold τ (0.0 to 0.9); annotations Rµ = 0.81, ∆ = 0.98. Point labels on the plot: 876, 876, 823, 780, 762, 710, 604, 517, 420, 307 and 66950, 15450, 8652, 5047, 2987, 1854, 927, 515, 309, 206.]
Use the computed features for pre-filtering, then apply a random forest (RF) classifier, tuned to increase recall, to classify candidates as relevant or irrelevant.
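The trade-off behind threshold-based pre-filtering can be sketched as follows: pairs scoring below τ are dropped, and precision and recall are measured on what remains. The scores and the set of relevant pairs are invented for illustration, and the subsequent random forest step is not shown.

```python
def precision_recall_at(threshold, scored_pairs, relevant):
    """Keep pairs with score >= threshold; return (precision, recall, number kept)."""
    kept = {pair for pair, score in scored_pairs.items() if score >= threshold}
    true_positives = len(kept & relevant)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall, len(kept)


# Hypothetical similarity scores for article pairs (not results from the slides).
scored_pairs = {("a", "b"): 0.9, ("a", "c"): 0.6, ("a", "d"): 0.3, ("a", "e"): 0.1}
relevant = {("a", "b"), ("a", "d")}

for tau in (0.0, 0.2, 0.5):
    p, r, n = precision_recall_at(tau, scored_pairs, relevant)
    print(f"tau={tau:.1f} kept={n} precision={p:.2f} recall={r:.2f}")
```

Raising τ shrinks the candidate set but eventually discards relevant pairs, so a recall-oriented setup would pick the largest τ that still keeps recall at the desired level.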
Table Alignment
Table Alignment Results
TableNet, based on a BiLSTM with column-by-column attention, determines fine-grained relation types with an accuracy of 83%.
Conclusions
• Contributions:
• TableNet - a knowledge graph of aligned tables
• Fine grained relation types between tables: equivalent, subPartOf
• Improvement over existing works, with fine grained table relations
• Exhaustive ground truth from 50 Wikipedia articles, resulting in 17k table pairs
• Resources for TableNet
• Data & Code: https://github.com/bfetahu/wiki_tables
• Note: The candidate feature generation code and the table alignment code will be published before TheWebCon 2019.
Thank you! Questions?