im2013vit

SKIMMR: Making Knowledge Discovery Easier

Vít Nováček ([email protected])

February 8th, 2013 @ DERI meeting



Outline

- Introduction
- SKIMMR
  - KB Computation
  - KB Utilisation
- Demo
- Evaluation
  - Evaluated Features
  - Evaluation Methodology
- Conclusions



Machine-Aided Skim Reading

Traditional (Skim) Reading
- full reading: deep insights (slow)
- skim reading: superficial overview (quicker)

How Can Automation Help?
- going deep is hard
- large-scale shallow processing is more feasible

What Kind of Automation?
- extraction (text and data mining)
- augmentation (computing more complex content)
- indexing and querying
- presentation of the results

Related Work
- processing: text mining, graph analysis, distributional semantics, fuzzy IR
- presentation: GoPubMed, Textpresso, IVEA, CORAAL, Exhibit, ...

Image source: http://a-pieceofpaper.blogspot.com



Input/Extraction Pipelines

Text Extraction
- preprocessing (tokenization, tagging, shallow parsing)
- NE recognition
- relation extraction
- co-occurrence analysis + statistics (PMI, TF/IDF, ...)

Digesting Linked Data
- graph decomposition
- cluster analysis
- co-occurrence analysis + statistics (PMI, TF/IDF, ...)

Extraction Results
- (s, p, o, r, w) statements
- subject, predicate, object, provenance, weight
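The co-occurrence step can be sketched in a few lines of Python. This is a minimal illustration, not SKIMMR's actual code: the sentence-level co-occurrence window, the `related_to` predicate and the function name `cooccurrence_statements` are assumptions made for the example, and the PMI weight is computed corpus-wide and then attached to every provenance in which the pair was seen.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_statements(docs):
    """Turn sentence-level co-occurrences into (s, p, o, r, w) statements.

    docs: mapping provenance id -> list of sentences, each a list of terms.
    The weight w is the pointwise mutual information (PMI) of the term pair.
    """
    term_count = Counter()   # number of sentences containing a term
    pair_count = Counter()   # number of sentences containing both terms
    pair_prov = {}           # term pair -> set of provenance ids
    n_sentences = 0
    for prov, sentences in docs.items():
        for sentence in sentences:
            n_sentences += 1
            terms = sorted(set(sentence))
            term_count.update(terms)
            for a, b in combinations(terms, 2):
                pair_count[(a, b)] += 1
                pair_prov.setdefault((a, b), set()).add(prov)

    statements = []
    for (a, b), n_ab in pair_count.items():
        # PMI = log2( p(a, b) / (p(a) * p(b)) ), with sentence-level counts
        pmi = math.log2(n_ab * n_sentences / (term_count[a] * term_count[b]))
        for prov in sorted(pair_prov[(a, b)]):
            # "related_to" is a hypothetical placeholder predicate
            statements.append((a, "related_to", b, prov, pmi))
    return statements
```

PMI is unbounded, so a real pipeline would presumably rescale the weights before they enter indices that expect values in [0, 1]; how SKIMMR does that is not stated on the slides.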

Image source: http://atyoursurveys.blogspot.com



Computing the Knowledge Base

Distributional Representation
- aggregated co-occurrence/relation statements
- statements → tensor representation
- every element still linked to its provenance
- matrix perspectives of the tensor

Augmentation
- perspectives give rise to emergent patterns like:
  - semantic similarity
  - concept clusters and taxonomies
  - IF-THEN rules
  - concept ordering and relative relevance
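A rough sketch of the tensor idea, under stated assumptions: the tensor is stored sparsely as a dictionary keyed by (subject, predicate, object), weights of repeated statements are summed, and semantic similarity is taken to be the cosine between rows of the subject perspective. None of these choices is confirmed by the slides, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def build_tensor(statements):
    """Aggregate (s, p, o, r, w) statements into a sparse 3-way tensor.

    Returns the tensor and, per element, the set of provenances behind it.
    Summing the weights is an assumption made for this sketch.
    """
    tensor = defaultdict(float)
    provenance = defaultdict(set)
    for s, p, o, r, w in statements:
        tensor[(s, p, o)] += w
        provenance[(s, p, o)].add(r)
    return dict(tensor), dict(provenance)


def subject_perspective(tensor):
    """Matrix perspective: one row per subject, one column per (p, o) pair."""
    rows = defaultdict(dict)
    for (s, p, o), w in tensor.items():
        rows[s][(p, o)] = w
    return dict(rows)


def cosine(u, v):
    """Cosine similarity between two sparse row vectors (dicts)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Other perspectives (for example rows per object, or per predicate) follow the same pattern by regrouping the tensor keys.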

Image source: www.bystonline.org



Indexing the Knowledge Base

Term Index ($w_{i,j} \in [0, 1]$)

        T1        T2        ...   Tn
T1      w_{1,1}   w_{1,2}   ...   w_{1,n}
T2      w_{2,1}   w_{2,2}   ...   w_{2,n}
...     ...       ...       ...   ...
Tn      w_{n,1}   w_{n,2}   ...   w_{n,n}

Statement Index ($c_{i,j} \in \{0, 1\}$)

        S1        S2        ...   Sm
T1      c_{1,1}   c_{1,2}   ...   c_{1,m}
T2      c_{2,1}   c_{2,2}   ...   c_{2,m}
...     ...       ...       ...   ...
Tn      c_{n,1}   c_{n,2}   ...   c_{n,m}

Provenance Index ($w_{i,j} \in [0, 1]$)

        P1        P2        ...   Pq
S1      w_{1,1}   w_{1,2}   ...   w_{1,q}
S2      w_{2,1}   w_{2,2}   ...   w_{2,q}
...     ...       ...       ...   ...
Sm      w_{m,1}   w_{m,2}   ...   w_{m,q}
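One possible way to populate the three indices from (s, p, o, r, w) statements is sketched below. It assumes weights already lie in [0, 1], stores each matrix sparsely as nested dictionaries, treats the term index as symmetric, and keeps the maximum weight when a cell is written more than once. These are illustrative choices, not details taken from the slides, and `build_indices` is a hypothetical name.

```python
def build_indices(statements):
    """Build sparse term, statement and provenance indices.

    statements: iterable of (s, p, o, r, w) with w assumed to be in [0, 1].
    """
    term_index = {}        # term -> {term: weight in [0, 1]}
    statement_index = {}   # term -> {statement id: 1}
    provenance_index = {}  # statement id -> {provenance: weight in [0, 1]}
    for s, p, o, r, w in statements:
        sid = (s, p, o)
        # symmetric term-term weights; keep the strongest one seen
        for a, b in ((s, o), (o, s)):
            row = term_index.setdefault(a, {})
            row[b] = max(row.get(b, 0.0), w)
        # binary membership of subject and object terms in the statement
        for term in (s, o):
            statement_index.setdefault(term, {})[sid] = 1
        # statement-provenance weights
        prow = provenance_index.setdefault(sid, {})
        prow[r] = max(prow.get(r, 0.0), w)
    return term_index, statement_index, provenance_index
```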

Auxiliary Fulltext Index
- user’s entry point
- increasing robustness
- “keys”: queries
- values: term identifiers
- fairly standard IR:
  - OKAPI BM25F
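For orientation, here is a small sketch of ranking term identifiers against a free-text query. It implements plain Okapi BM25 over a single field, not the field-weighted BM25F variant named on the slide, and it uses the non-negative IDF form common in search libraries. The `labels` mapping (term identifier to the tokens of its label) and the function name are hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query, labels, k1=1.2, b=0.75):
    """Rank term identifiers by BM25 score for a tokenised query.

    query: list of query tokens.
    labels: mapping term identifier -> list of tokens describing the term.
    """
    n_docs = len(labels)
    avg_len = sum(len(tokens) for tokens in labels.values()) / n_docs
    doc_freq = Counter()
    for tokens in labels.values():
        doc_freq.update(set(tokens))

    scores = {}
    for term_id, tokens in labels.items():
        tf = Counter(tokens)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5) + 1)
            # length normalisation: longer labels are penalised
            norm = tf[q] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[q] * (k1 + 1) / norm
        if score > 0:
            scores[term_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The returned identifiers would then serve as the entry points into the term index.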

Image source: http://teptdataservices.blogspot.com



Querying the Knowledge Base

Initial Result Term Set
- example query: $? \leftrightarrow T_x$ AND ($? \leftrightarrow T_y$ OR $? \leftrightarrow T_z$)
- term index look-up:
  - $F_x = \{(T_1, w_{x,1}), (T_2, w_{x,2}), \ldots, (T_n, w_{x,n})\}$
  - $F_y = \{(T_1, w_{y,1}), (T_2, w_{y,2}), \ldots, (T_n, w_{y,n})\}$
  - $F_z = \{(T_1, w_{z,1}), (T_2, w_{z,2}), \ldots, (T_n, w_{z,n})\}$
- combining atomic results: $F_x \cap (F_y \cup F_z)$
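The combination of atomic results can be illustrated with fuzzy sets represented as dictionaries from term to weight. The slides do not say which fuzzy operators are used; this sketch assumes the standard minimum for intersection and maximum for union, and the function names are made up for the example.

```python
def lookup(term_index, term):
    """Row of the term index for one query term, as a fuzzy set."""
    return dict(term_index.get(term, {}))


def fuzzy_and(f, g):
    """Fuzzy intersection with the minimum operator (an assumption)."""
    return {t: min(w, g[t]) for t, w in f.items() if t in g}


def fuzzy_or(f, g):
    """Fuzzy union with the maximum operator (an assumption)."""
    return {t: max(f.get(t, 0.0), g.get(t, 0.0)) for t in f.keys() | g.keys()}


def example_query(term_index):
    """Evaluate Fx AND (Fy OR Fz) for the query terms Tx, Ty and Tz."""
    fx = lookup(term_index, "Tx")
    fy = lookup(term_index, "Ty")
    fz = lookup(term_index, "Tz")
    return fuzzy_and(fx, fuzzy_or(fy, fz))
```

Terms missing from a dictionary are treated as having weight 0, so they drop out of intersections.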

Complete Results
- terms: $R_T = \{(T_1, w^T_1), (T_2, w^T_2), \ldots, (T_n, w^T_n)\}$, where $w^T_i$ are the weights resulting from the combination
- statements: $R_S = \{(S_1, w^S_1), (S_2, w^S_2), \ldots, (S_m, w^S_m)\}$, where $w^S_i = f_\nu\left(\sum_{j=1}^{n} w^T_j c_{j,i}\right)$
- provenances: $R_P = \{(P_1, w^P_1), (P_2, w^P_2), \ldots, (P_q, w^P_q)\}$, where $w^P_i = f_\nu\left(\sum_{j=1}^{m} w^S_j w_{j,i}\right)$
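The two weighted sums amount to multiplying a weight vector by an index matrix and then normalising. The sketch below does this over sparse nested dictionaries. The slides do not define the normalisation function f_ν, so dividing by the maximum score is used here purely as a stand-in; `normalise` and `propagate` are hypothetical names.

```python
def normalise(scores):
    """Stand-in for f_nu: scale scores so that the largest becomes 1."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {key: 0.0 for key in scores}
    return {key: value / top for key, value in scores.items()}


def propagate(weights, index):
    """Weighted sum over an index, followed by normalisation.

    weights: row label -> weight (e.g. term weights w^T).
    index: row label -> {column label: value} (e.g. the statement index).
    """
    totals = {}
    for row, weight in weights.items():
        for column, value in index.get(row, {}).items():
            totals[column] = totals.get(column, 0.0) + weight * value
    return normalise(totals)


# Example with made-up indices: terms -> statements -> provenances
term_weights = {"T1": 0.2, "T2": 0.5}
statement_index = {"T1": {"S1": 1}, "T2": {"S1": 1, "S2": 1}}
provenance_index = {"S1": {"P1": 0.5}, "S2": {"P1": 1.0, "P2": 0.2}}

statement_weights = propagate(term_weights, statement_index)
provenance_weights = propagate(statement_weights, provenance_index)
```

Chaining the two calls mirrors the slide: term weights yield statement weights, which in turn yield provenance weights.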

Image source: http://nuget.org



Let’s Learn About Some Grim Stuff!



What to Evaluate?

Quality of the Extracted/Computed Content
- “noise-to-signal” ratio
- relevance of results w.r.t. queries
- information value (obvious vs. enlightening)

User Experience
- usability of SKIMMR
  - general
  - domain-specific
- performance benefits (over a base-line)

Image source: http://voguepay.com



How to Evaluate?

Quality of the Extracted/Computed Content
- identification (or creation) of a gold standard
- generalised IR measures
- committee-based annotation of the results
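The slides do not spell out which generalised IR measures are meant. One common reading is precision and recall generalised to graded (non-binary) relevance, which fits weighted results scored against a gold standard. The sketch below follows that reading; the function name and the [0, 1] relevance scale are assumptions.

```python
def generalised_precision_recall(retrieved, gold):
    """Precision and recall with graded relevance.

    retrieved: list of distinct result identifiers returned for a query.
    gold: mapping identifier -> graded relevance in [0, 1];
          identifiers missing from gold count as irrelevant.
    """
    gained = sum(gold.get(item, 0.0) for item in retrieved)
    precision = gained / len(retrieved) if retrieved else 0.0
    total = sum(gold.values())
    recall = gained / total if total else 0.0
    return precision, recall
```

With binary relevance values this reduces to the usual set-based precision and recall.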

User Experience
- SUS survey
- domain-specific survey
- user performance analysis (SKIMMR vs. base-line)
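For reference, a System Usability Scale (SUS) questionnaire is scored as follows: ten items answered on a 1 to 5 scale, odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score between 0 and 100. The helper below is only an illustration of that standard scoring; the function name is made up.

```python
def sus_score(responses):
    """Compute the SUS score for one respondent.

    responses: ten answers on a 1-5 scale, in questionnaire order.
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for position, answer in enumerate(responses, start=1):
        # odd items are positively worded, even items negatively worded
        total += (answer - 1) if position % 2 == 1 else (5 - answer)
    return total * 2.5
```

Averaging the per-respondent scores gives a single usability figure for the sample users.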

Image source: http://www.123rf.com



Conclusions and Future Work

Current Status
- machine-aided skim reading notion coined
- basic theoretical background proposed
- a prototype implemented (general and biomedical versions)
  - http://pypi.python.org/pypi/skimmr_gt/0.1-a1
  - http://pypi.python.org/pypi/skimmr_bm/0.1-a1

Next Steps
- evaluation (with a gold standard and sample users)
- dissemination and follow-ups (write-up, proposals)
- back-end extensions:
  - more (complex) types of relations
  - proper APIs (development, web service, ...)
  - database and/or cloud storage
- front-end extensions:
  - smoother transition between the graphs
  - complex querying
  - additional visualisations (trends, focused provenances, ...)

Image source: http://support.pacifichost.com
