im2013vit
DESCRIPTION
Unlike full reading, 'skim-reading' means looking quickly over information to cover more material while still retaining a superficial view of the underlying content. In this work, we emulate this natural human activity by providing a dynamic graph-based view of entities automatically extracted from text using superficial text parsing and processing techniques. We provide a preliminary web-based tool (called 'SKIMMR') that generates a network of inter-related concepts from a set of documents. In SKIMMR, a user may browse the network to investigate the lexically-driven information space extracted from the documents. When a particular area of that space looks interesting to a user, the tool can then display the documents that are most relevant to the displayed concepts. We present this as a simple, viable methodology for browsing a document collection (such as a collection of scientific research articles) in an attempt to limit the information overload of examining that document collection.
TRANSCRIPT
SKIMMR: Making Knowledge Discovery Easier
Vít Novácek ([email protected])
February 8th, 2013 @ DERI meeting
Introduction SKIMMR Demo Evaluation Conclusions
Outline
  - Introduction
  - SKIMMR
      - KB Computation
      - KB Utilisation
  - Demo
  - Evaluation
      - Evaluated Features
      - Evaluation Methodology
  - Conclusions
Machine-Aided Skim Reading
Traditional (Skim) Reading
  - full reading – deep insights (slow)
  - skim reading – superficial overview (quicker)
How Can Automation Help?
  - going deep is hard
  - large-scale shallow processing is more feasible
What Kind of Automation?
  - extraction (text and data mining)
  - augmentation (computing more complex content)
  - indexing and querying
  - presentation of the results
Related Work
  - processing: text mining, graph analysis, distributional semantics, fuzzy IR
  - presentation: GoPubMed, Textpresso, IVEA, CORAAL, Exhibit, ...
Image source: http://a-pieceofpaper.blogspot.com
Input/Extraction Pipe-Lines
Text Extraction
  - preprocessing (tokenization, tagging, shallow parsing)
  - NE recognition
  - relation extraction
  - co-occurrence analysis + statistics (PMI, TF/IDF, ...)
Digesting Linked Data
  - graph decomposition
  - cluster analysis
  - co-occurrence analysis + statistics (PMI, TF/IDF, ...)
Extraction Results
  - (s, p, o, r, w) statements
  - subject, predicate, object, provenance, weight
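The co-occurrence step can be illustrated with a minimal Python sketch. It assumes that terms co-occur when they share a sentence, that the statement weight is the pointwise mutual information (PMI) of the pair, and that only positively associated pairs are kept. The function name, the `related_to` predicate and the toy data are hypothetical and are not taken from SKIMMR itself.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_statements(docs):
    """Turn sentence-level co-occurrence into (s, p, o, r, w) tuples.

    docs maps a provenance id to a list of sentences, each sentence
    being a list of already recognised terms (an assumed input format).
    """
    term_counts = Counter()   # number of sentences containing each term
    pair_counts = Counter()   # number of sentences containing each pair
    pair_sources = {}         # pair -> set of provenance ids
    n_sentences = 0
    for doc_id, sentences in docs.items():
        for terms in sentences:
            n_sentences += 1
            unique = sorted(set(terms))
            term_counts.update(unique)
            for pair in combinations(unique, 2):
                pair_counts[pair] += 1
                pair_sources.setdefault(pair, set()).add(doc_id)

    statements = []
    for (a, b), n_ab in pair_counts.items():
        # PMI = log2( p(a, b) / (p(a) * p(b)) ) with sentence-level estimates
        pmi = math.log2(n_ab * n_sentences / (term_counts[a] * term_counts[b]))
        if pmi > 0:  # assumption: keep only positively associated pairs
            for doc_id in sorted(pair_sources[(a, b)]):
                # "related_to" is a hypothetical predicate name
                statements.append((a, "related_to", b, doc_id, pmi))
    return statements


docs = {
    "doc1": [["parkinson", "dopamine"], ["dopamine", "neuron"]],
    "doc2": [["parkinson", "dopamine"], ["gene", "protein"]],
}
statements = cooccurrence_statements(docs)
```

Each resulting tuple keeps its provenance, so a statement seen in two documents appears once per source.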
Image source: http://atyoursurveys.blogspot.com
Computing the Knowledge Base
Distributional Representation
  - aggregated co-occurrence/relation statements
  - statements → tensor representation
  - every element still linked to its provenance
  - matrix perspectives of the tensor
Augmentation
  - perspectives give rise to emergent patterns like:
      - semantic similarity
      - concept clusters and taxonomies
      - IF-THEN rules
      - concept ordering and relative relevance
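A rough Python sketch of the path from statements to a similarity score follows. It assumes that weights of repeated statements are summed during aggregation, that one "perspective" is the matrix whose rows are subjects and whose columns are (predicate, object) pairs, and that semantic similarity is the cosine of two rows. All function names and data are illustrative, not SKIMMR's own.

```python
import math
from collections import defaultdict


def build_tensor(statements):
    """Aggregate (s, p, o, r, w) statements into a sparse 3-way tensor."""
    tensor = defaultdict(float)     # (s, p, o) -> aggregated weight
    provenance = defaultdict(set)   # (s, p, o) -> provenance ids
    for s, p, o, r, w in statements:
        tensor[(s, p, o)] += w      # assumption: aggregate by summing
        provenance[(s, p, o)].add(r)
    return dict(tensor), dict(provenance)


def subject_perspective(tensor):
    """Flatten the tensor: one sparse row per subject, columns are (p, o)."""
    rows = defaultdict(dict)
    for (s, p, o), w in tensor.items():
        rows[s][(p, o)] = w
    return dict(rows)


def cosine(u, v):
    """Cosine similarity of two sparse row vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


statements = [
    ("aspirin", "treats", "headache", "doc1", 1.0),
    ("ibuprofen", "treats", "headache", "doc2", 1.0),
    ("aspirin", "treats", "headache", "doc3", 1.0),
    ("insulin", "treats", "diabetes", "doc4", 1.0),
]
tensor, provenance = build_tensor(statements)
rows = subject_perspective(tensor)
similar = cosine(rows["aspirin"], rows["ibuprofen"])
dissimilar = cosine(rows["aspirin"], rows["insulin"])
```

Other perspectives (for example rows indexed by objects) would be built the same way by choosing a different key to group on.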
Image source: www.bystonline.org
Indexing the Knowledge Base
Term Index
          T_1       T_2       ...   T_n
  T_1     w_{1,1}   w_{1,2}   ...   w_{1,n}
  T_2     w_{2,1}   w_{2,2}   ...   w_{2,n}
  ...     ...       ...       ...   ...
  T_n     w_{n,1}   w_{n,2}   ...   w_{n,n}
  w_{i,j} ∈ [0, 1]
Statement Index
          S_1       S_2       ...   S_m
  T_1     c_{1,1}   c_{1,2}   ...   c_{1,m}
  T_2     c_{2,1}   c_{2,2}   ...   c_{2,m}
  ...     ...       ...       ...   ...
  T_n     c_{n,1}   c_{n,2}   ...   c_{n,m}
  c_{i,j} ∈ {0, 1}
Provenance Index
          P_1       P_2       ...   P_q
  S_1     w_{1,1}   w_{1,2}   ...   w_{1,q}
  S_2     w_{2,1}   w_{2,2}   ...   w_{2,q}
  ...     ...       ...       ...   ...
  S_m     w_{m,1}   w_{m,2}   ...   w_{m,q}
  w_{i,j} ∈ [0, 1]
Auxiliary Fulltext Index
  - user’s entry point
  - increasing robustness
  - “keys”: queries
  - values: term identifiers
  - fairly standard IR:
      - OKAPI BM25F
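To make the role of the full-text index concrete, here is a minimal Python sketch that maps a free-text query to ranked term identifiers. It uses plain single-field BM25 rather than the field-weighted BM25F named on the slide, and the class name, the parameter defaults (`k1`, `b`) and the sample labels are assumptions made for illustration only.

```python
import math
from collections import Counter


class LabelIndex:
    """Toy full-text index: free-text queries in, term identifiers out."""

    def __init__(self, labels, k1=1.2, b=0.75):
        # labels maps a term identifier to its (non-empty) textual label
        self.k1, self.b = k1, b
        self.docs = {tid: Counter(text.lower().split())
                     for tid, text in labels.items()}
        self.lengths = {tid: sum(tf.values()) for tid, tf in self.docs.items()}
        self.avg_len = sum(self.lengths.values()) / len(self.docs)
        self.df = Counter()  # token -> number of labels containing it
        for tf in self.docs.values():
            self.df.update(tf.keys())

    def idf(self, token):
        n = len(self.docs)
        return math.log(1 + (n - self.df[token] + 0.5) / (self.df[token] + 0.5))

    def search(self, query):
        """Return (term_id, score) pairs, best match first."""
        tokens = query.lower().split()
        scores = {}
        for tid, tf in self.docs.items():
            score = 0.0
            for token in tokens:
                f = tf.get(token, 0)
                if f == 0:
                    continue
                length_norm = 1 - self.b + self.b * self.lengths[tid] / self.avg_len
                score += self.idf(token) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
            if score > 0:
                scores[tid] = score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = LabelIndex({
    "T1": "parkinson disease",
    "T2": "alzheimer disease",
    "T3": "dopamine",
})
hits = index.search("parkinson")
```

The identifiers returned by such a look-up would then serve as the entry points into the term index.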
Image source: http://teptdataservices.blogspot.com
Querying the Knowledge Base
Initial Result Term Set
  - example query: ? ↔ T_x AND (? ↔ T_y OR ? ↔ T_z)
  - term index look-up:
      F_x = {(T_1, w_{x,1}), (T_2, w_{x,2}), ..., (T_n, w_{x,n})}
      F_y = {(T_1, w_{y,1}), (T_2, w_{y,2}), ..., (T_n, w_{y,n})}
      F_z = {(T_1, w_{z,1}), (T_2, w_{z,2}), ..., (T_n, w_{z,n})}
  - combining atomic results: F_x ∩ (F_y ∪ F_z)
Complete Results
  - terms: R^T = {(T_1, w^T_1), (T_2, w^T_2), ..., (T_n, w^T_n)}, where the w^T_i are the weights resulting from the combination
  - statements: R^S = {(S_1, w^S_1), (S_2, w^S_2), ..., (S_m, w^S_m)}, where w^S_i = f_ν(∑_{j=1}^{n} w^T_j c_{j,i})
  - provenances: R^P = {(P_1, w^P_1), (P_2, w^P_2), ..., (P_q, w^P_q)}, where w^P_i = f_ν(∑_{j=1}^{m} w^S_j w_{j,i})
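The query evaluation above can be traced with a small Python sketch over toy indices. Several points are assumptions, not statements from the slides: fuzzy intersection and union are taken to be min and max, f_ν is taken to be a normalisation that divides by the largest value, and all index contents are invented for illustration.

```python
def fuzzy_or(f, g):
    """Fuzzy union (assumed max): a term keeps the larger of its weights."""
    return {t: max(f.get(t, 0.0), g.get(t, 0.0)) for t in f.keys() | g.keys()}


def fuzzy_and(f, g):
    """Fuzzy intersection (assumed min): only terms present in both survive."""
    return {t: min(f[t], g[t]) for t in f.keys() & g.keys()}


def normalise(weights):
    """Assumed stand-in for f_nu: scale so that the largest weight is 1."""
    top = max(weights.values(), default=0.0)
    if top <= 0:
        return dict(weights)
    return {k: w / top for k, w in weights.items()}


def propagate(weights, index):
    """Compute f_nu(sum_j weights[j] * index[j][i]) for every column i."""
    totals = {}
    for row, w in weights.items():
        for col, c in index.get(row, {}).items():
            totals[col] = totals.get(col, 0.0) + w * c
    return normalise(totals)


# Toy sparse indices (row -> {column: value}); all values are made up.
term_index = {
    "Tx": {"T1": 0.8, "T2": 0.4},
    "Ty": {"T1": 0.5},
    "Tz": {"T2": 1.0, "T3": 0.6},
}
statement_index = {"T1": {"S1": 1, "S2": 1}, "T2": {"S2": 1}, "T3": {"S3": 1}}
provenance_index = {"S1": {"P1": 1.0}, "S2": {"P1": 0.5, "P2": 1.0}}

# ? <-> Tx AND (? <-> Ty OR ? <-> Tz)
r_t = fuzzy_and(term_index["Tx"], fuzzy_or(term_index["Ty"], term_index["Tz"]))
r_s = propagate(r_t, statement_index)    # statement weights w^S_i
r_p = propagate(r_s, provenance_index)   # provenance weights w^P_i
```

Sorting `r_s` and `r_p` by weight would give the ranked statements and source documents for the query.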
Image source: http://nuget.org
Let’s Learn About Some Grim Stuff!
What to Evaluate?
Quality of the Extracted/Computed Content
  - “noise-to-signal” ratio
  - relevance of results w.r.t. queries
  - information value (obvious vs. enlightening)
User Experience
  - usability of SKIMMR
      - general
      - domain-specific
  - performance benefits (over a base-line)
Image source: http://voguepay.com
How to Evaluate?
Quality of the Extracted/Computed Content
  - identification (or creation) of a gold standard
  - generalised IR measures
  - committee-based annotation of the results
User Experience
  - SUS survey
  - domain-specific survey
  - user performance analysis (SKIMMR vs. base-line)
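Assuming "SUS" refers to the standard ten-item System Usability Scale, the usual scoring rule can be sketched in Python as follows. The function name and the sample answers are illustrative and do not come from any SKIMMR evaluation.

```python
def sus_score(responses):
    """Score one System Usability Scale questionnaire on a 0-100 scale.

    responses holds the ten answers in questionnaire order, each an
    integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each answer must be between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral = sus_score([3] * 10)
```

Averaging such scores over the sample users would give one usability figure per evaluated version of the tool.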
Image source: http://www.123rf.com
Conclusions and Future Work
Current Status
  - machine-aided skim reading notion coined
  - basic theoretical background proposed
  - a prototype implemented (general and biomedical versions)
      - http://pypi.python.org/pypi/skimmr_gt/0.1-a1
      - http://pypi.python.org/pypi/skimmr_bm/0.1-a1
Next Steps
  - evaluation (with a gold standard and sample users)
  - dissemination and follow-ups (write-up, proposals)
  - back-end extensions:
      - more (complex) types of relations
      - proper APIs (development, web service, ...)
      - database and/or cloud storage
  - front-end extensions:
      - smoother transition between the graphs
      - complex querying
      - additional visualisations (trends, focused provenances, ...)
Image source: http://support.pacifichost.com