ailab.ijs.si



Page 1: Marko  Grobelnik Jasna Škrbec Jozef  Stefan Institute

Marko Grobelnik

Jasna Škrbec

Jozef Stefan Institute

Social Context as a part of News-Archive-Explorer

Web application for exploratory browsing of news streams and archives

Page 2: Introduction

News publishers generate content archives. The goal is to build a system that makes such archives usable through text mining and visualization.

Archive characteristics:

Large corpora (up to a few million articles)

Rich metadata (specific to each archive)

Different input formats (XML structure)

Poor search interfaces (not specialized for archives)
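
Because each archive arrives in its own XML structure, an import step has to map every format onto one common record. The following is a minimal sketch of that idea, not the application's actual importer: the archive names, element names and the `FIELD_MAPS` table are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-archive field maps: common field -> element path in that archive's XML.
FIELD_MAPS = {
    "archive_a": {"title": "headline", "date": "pubdate", "body": "text"},
    "archive_b": {"title": "meta/title", "date": "meta/date", "body": "content"},
}

def normalize(xml_string, archive):
    """Map one article's XML onto a common record using the archive's field map."""
    root = ET.fromstring(xml_string)
    record = {"archive": archive}
    for field, path in FIELD_MAPS[archive].items():
        node = root.find(path)
        # Missing or empty elements become None instead of raising an error.
        record[field] = node.text.strip() if node is not None and node.text else None
    return record

# Two made-up documents with the same content but different structure.
doc_a = ("<article><headline>Budget talks</headline>"
         "<pubdate>1996-11-05</pubdate><text>Some text.</text></article>")
doc_b = ("<doc><meta><title>Budget talks</title><date>1996-11-05</date></meta>"
         "<content>Some text.</content></doc>")

rec_a = normalize(doc_a, "archive_a")
rec_b = normalize(doc_b, "archive_b")
```

Keeping the archive-specific knowledge in a lookup table means a new archive would only need a new entry, while the rest of the pipeline sees the same fields.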

Page 3: What do we want?

An application to:

help the user search and browse through archives

help the user read more about topics related to the search

visualize how things are connected in time, place, stories, etc.

draw the user’s attention and interest to other related issues

tell more about the searched content

Page 4: Architecture

[Architecture diagram; labels: Archive, Preprocessing, Enrycher, SQL, Server, Server side, Client side]

Page 5: Database model

Page 6: Already done

Import of archive XML files:

New York Times archive (15M articles)

NYTimes LDC (1.7M articles)

Nature (300k articles)

Reuters (830k articles)

Server side:

Import into the database (PostgreSQL)

Preprocessing with Enrycher

Client side:

Faceted search interface (author, entity, keyword, publish date, category)

Showing context around the searched content or article
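
A faceted search interface typically shows, next to the result list, how many matching articles fall under each facet value. The sketch below illustrates that with one grouped SQL query per facet. It is an assumption-laden example: the table and column names are hypothetical (the actual database model is not given here), the rows are made up, and an in-memory SQLite database stands in for PostgreSQL so that the example runs on its own.

```python
import sqlite3

# In-memory SQLite as a stand-in for the PostgreSQL database; hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT,
                          category TEXT, publish_date TEXT);
    CREATE TABLE article_entity (article_id INTEGER, entity TEXT);
""")
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?, ?, ?)",
    [  # made-up sample rows
        (1, "Budget vote delayed", "A. Writer", "Politics", "1996-03-01"),
        (2, "Election campaign opens", "B. Reporter", "Politics", "1996-09-12"),
        (3, "Markets react to budget", "A. Writer", "Business", "1996-03-02"),
    ],
)
conn.executemany(
    "INSERT INTO article_entity VALUES (?, ?)",
    [(1, "Clinton"), (2, "Clinton"), (2, "NATO"), (3, "Clinton")],
)

def facet_counts(conn, entity, facet):
    """Count articles mentioning `entity`, grouped by one facet column."""
    # Column names cannot be bound as parameters, so check them against a whitelist.
    if facet not in {"author", "category", "publish_date"}:
        raise ValueError(f"unknown facet: {facet}")
    sql = f"""
        SELECT a.{facet}, COUNT(*) FROM article a
        JOIN article_entity e ON e.article_id = a.id
        WHERE e.entity = ?
        GROUP BY a.{facet} ORDER BY a.{facet}
    """
    return conn.execute(sql, (entity,)).fetchall()

by_category = facet_counts(conn, "Clinton", "category")
by_author = facet_counts(conn, "Clinton", "author")
```

Selecting a facet value in the interface would then add one more `WHERE` condition and recompute the remaining counts.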

Page 7: Current version of the GUI

Page 8: Showing relationships between entities

Page 9: Plans for the future

Improve search (with narrowing criteria, suggestions)

Adding visualizations to show content in time, space and other contexts

Adding links to similar content (stories)

Adding links to outside resources (like DBpedia), or bringing these resources into the application

Integrate with tools developed in AILab to improve search and presentation of articles (SearchPoint, DocAtlas, …)

Improve usability & appearance of user interface

Page 10: Topic landscape of the query “Clinton” from Reuters news 1996-1997

Query

Search Results

Topic Map

Selected group of news

Selected story

Page 11: Visualization of social relationships between “Clinton” and other entities

Query

Named entities in relation
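
The slides do not say how the relationships between the query and other entities are derived. One simple possibility, shown here purely as an assumption, is to count how often another named entity appears in the same article as the query. The input data below is made up.

```python
from collections import Counter

# Hypothetical input: for each article, the set of named entities found in it.
article_entities = [
    {"Clinton", "NATO", "Russia"},
    {"Clinton", "US Budget"},
    {"Clinton", "NATO"},
    {"Russia", "NATO"},
]

def related_entities(query, article_entities):
    """Count how often each other entity shares an article with `query`."""
    counts = Counter()
    for entities in article_entities:
        if query in entities:
            counts.update(entities - {query})  # every co-occurring entity gets +1
    return counts

related = related_entities("Clinton", article_entities)
top = related.most_common(1)
```

The resulting counts could serve as edge weights in a graph centered on the query entity.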

Page 12: Topic Trends Tracking of the documents including “Clinton”

Query

Topic Trends Visualization

Topics description

US Elections

US Budget

Mid-East conflict

NATO-Russia

Result set
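
A topic trends view needs, for each topic, a count of matching documents per time interval. The following sketch shows that aggregation under stated assumptions: the documents are already labeled with a topic (how the topics are obtained is not covered here), the interval is one month, and the dates in the sample result set are invented.

```python
from collections import defaultdict

# Hypothetical result set: (publish date as YYYY-MM-DD, topic label) per document.
results = [
    ("1996-10-03", "US Elections"),
    ("1996-10-21", "US Elections"),
    ("1996-11-05", "US Elections"),
    ("1996-11-18", "US Budget"),
    ("1997-03-20", "NATO-Russia"),
]

def topic_trends(results):
    """Return {topic: {YYYY-MM: document count}}, ready for plotting over time."""
    trends = defaultdict(lambda: defaultdict(int))
    for date, topic in results:
        trends[topic][date[:7]] += 1  # the first 7 characters are the YYYY-MM bucket
    return {topic: dict(months) for topic, months in trends.items()}

trends = topic_trends(results)
```

Each inner dictionary corresponds to one band in a trend chart, with months on the horizontal axis and document counts on the vertical axis.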

Page 13: WW2 query “Pearl Harbor” against the NYTimes archive

Dec 7th 1941

Page 14: WW2 query “Belgrade” against the NYTimes archive

Apr 6th 1941

Page 15: WW2 query “Normandy” against the NYTimes archive

June 1944