Disaster Monitoring with Wikipedia and Online Social Networking Sites: Structured Data and Linked Data Fragments to the Rescue?

Thomas Steiner *
Google Germany GmbH
ABC Str. 19, D-20355 Hamburg, Germany
[email protected]

Ruben Verborgh
Multimedia Lab – Ghent University – iMinds
Gaston Crommenlaan 8 bus 201, B-9050 Ledeberg-Ghent, Belgium
[email protected]

Abstract

In this paper, we present the first results of our ongoing early-stage research on a realtime disaster detection and monitoring tool. Based on Wikipedia, it is language-agnostic and leverages user-generated multimedia content shared on online social networking sites to help disaster responders prioritize their efforts. We make the tool and its source code publicly available as we make progress on it. Furthermore, we strive to publish detected disasters and accompanying multimedia content following the Linked Data principles to facilitate its wide consumption, redistribution, and evaluation of its usefulness.

1 Introduction

1.1 Disaster Monitoring: A Global Challenge

According to a study (Laframboise and Loko 2012) published by the International Monetary Fund (IMF), about 700 disasters were registered worldwide between 2010 and 2012, affecting more than 450 million people. According to the study, "[d]amages have risen from an estimated US$20 billion on average per year in the 1990s to about US$100 billion per year during 2000–10." The authors expect this upward trend to continue "as a result of the rising concentration of people living in areas more exposed to disasters, and climate change." In consequence, disaster monitoring will become more and more crucial in the future.
National agencies like the Federal Emergency Management Agency (FEMA) 1 in the United States of America or the Bundesamt für Bevölkerungsschutz und Katastrophenhilfe (BBK, 2 "Federal Office of Civil Protection and Disaster Assistance") in Germany work to ensure the safety of the population on a national level, combining and providing relevant tasks and information in a single place. The United Nations Office for the Coordination of Humanitarian Affairs (OCHA) 3 is a United Nations (UN) body formed to strengthen the UN's response to complex emergencies and disasters. The Global Disaster Alert and Coordination System (GDACS) 4 is "a cooperation framework between the United Nations, the European Commission, and disaster managers worldwide to improve alerts, information exchange, and coordination in the first phase after major sudden-onset disasters." Global companies like Facebook, 5 Airbnb, 6 or Google 7 have dedicated crisis response teams that work on making critical emergency information accessible in times of disaster. As can be seen from the non-exhaustive list above, disaster detection and response is a problem tackled on national, international, and global levels, both from the public and private sectors.

* Second affiliation: CNRS, Université de Lyon, LIRIS – UMR5205, Université Lyon 1, France
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
1 FEMA: http://www.fema.gov/
2 BBK: http://www.bbk.bund.de/
3 OCHA: http://www.unocha.org/

1.2 Hypotheses and Research Questions

In this paper, we present the first results of our ongoing early-stage research on a realtime comprehensive Wikipedia-based monitoring system for the detection of disasters around the globe. This system is language-agnostic and leverages multimedia content shared on online social networking sites, striving to help disaster responders prioritize their efforts.
Structured data about detected disasters is made available in the form of Linked Data to facilitate its consumption. An earlier version of this paper, without the focus on multimedia content from online social networking sites and Linked Data, was published in (Steiner 2014b). For the present and further extended work, we are steered by the following hypotheses.

H1 Content about disasters gets added very fast to Wikipedia and online social networking sites by people in the neighborhood of the event.

H2 Disasters being geographically constrained, textual and multimedia content about them on Wikipedia and social networking sites appear first in local language, perhaps only later in English.

4 GDACS: http://www.gdacs.org/
5 Facebook Disaster Relief: https://www.facebook.com/DisasterRelief
6 Airbnb Disaster Response: https://www.airbnb.com/disaster-response
7 Google Crisis Response: https://www.google.org/crisisresponse/

arXiv:1501.06329v1 [cs.SI] 26 Jan 2015


H3 Link structure dynamics of Wikipedia provide for a meaningful way to detect future disasters, i.e., disasters unknown at system creation time.

These hypotheses lead us to the following research questions that we strive to answer in the near future.

Q1 How timely and accurate is content from Wikipedia and online social networking sites for the purpose of disaster detection and ongoing monitoring, compared to content from authoritative and government sources?

Q2 To what extent can the disambiguated nature of Wikipedia (things identified by URIs) improve on keyword-based disaster detection approaches, e.g., via online social network sites or search logs?

Q3 How much noise is introduced by full-text searches (which are not based on disambiguated URIs) for multimedia content on online social networking sites?

The remainder of the article is structured as follows. First, we discuss related work and enabling technologies in the next section, followed by our methodology in Section 3. We then describe an evaluation strategy, and finally conclude with an outlook on future work.

2 Related Work and Enabling Technologies

2.1 Disaster Detection

Digitally crowdsourced data for disaster detection and response has gained momentum in recent years, as the Internet has proven resilient in times of crises compared to other infrastructure. Ryan Falor, Crisis Response Product Manager at Google in 2011, remarks in (Falor 2011) that "a substantial [...] proportion of searches are directly related to the crises, and people continue to search and access information online even while traffic and search levels drop temporarily during and immediately following the crises." In the following, we provide a non-exhaustive list of related work on digitally crowdsourced disaster detection and response. Sakaki, Okazaki, and Matsuo (2010) consider each user of the online social networking (OSN) site Twitter 8 a sensor for the purpose of earthquake detection in Japan. Goodchild and Glennon (2010) show how crowdsourced geodata from Wikipedia and Wikimapia, 9 "a multilingual open-content collaborative map," can help complete authoritative data about disasters. Abel et al. (2012) describe a crisis monitoring system that extracts relevant content about known disasters from Twitter. Liu et al. (2008) examine common patterns and norms of disaster coverage on the photo sharing site Flickr. 10 Ortmann et al. (2011) propose to crowdsource Linked Open Data for disaster management, and also provide a good overview of well-known crowdsourcing tools like Google Map Maker, 11 OpenStreetMap, 12 and Ushahidi (Okolloh 2009). We have developed a monitoring system (Steiner 2014c) that detects news events from concurrent Wikipedia edits and auto-generates related multimedia galleries based on content from various OSN sites and Wikimedia Commons. 13 Finally, Lin and Mishne (2012) examine realtime search query churn on Twitter, including in the context of disasters.

8 Twitter: https://twitter.com/
9 Wikimapia: http://wikimapia.org/
10 Flickr: https://www.flickr.com/
11 Google Map Maker: http://www.google.com/mapmaker
12 OpenStreetMap: http://www.openstreetmap.org/

2.2 The Common Alerting Protocol

To facilitate collaboration, a common protocol is essential. The Common Alerting Protocol (CAP, Westfall 2010) is an XML-based general data format for exchanging public warnings and emergencies between alerting technologies. CAP allows a warning message to be consistently disseminated simultaneously over many warning systems to many applications. The protocol increases warning effectiveness and simplifies the task of activating a warning for officials. CAP also provides the capability to include multimedia data such as photos, maps, or videos. Alerts can be geographically targeted to a defined warning area. An exemplary flood warning CAP feed stemming from GDACS is shown in Listing 1. The step from trees to graphs can be taken through Linked Data, which we introduce in the next section.

2.3 Linked Data and Linked Data Principles

Linked Data (Berners-Lee 2006) defines a set of agreed-on best practices and principles for interconnecting and publishing structured data on the Web. It uses Web technologies like the Hypertext Transfer Protocol (HTTP, Fielding et al. 1999) and Uniform Resource Identifiers (URIs, Berners-Lee, Fielding, and Masinter 2005) to create typed links between different sources. The portal http://linkeddata.org/ defines Linked Data as being "about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods." Tim Berners-Lee (2006) defined the four rules for Linked Data in a W3C Design Issue as follows:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs, so that they can discover more things.

Linked Data uses RDF (Klyne and Carroll 2004) to create typed links between things in the world. The result is oftentimes referred to as the Web of Data. RDF encodes statements about things in the form of (subject, predicate, object) triples. Heath and Bizer (2011) speak of RDF links.

2.4 Linked Data Fragments

Various access mechanisms to Linked Data exist on the Web, each of which comes with its own trade-offs regarding query performance, freshness of data, and server cost/availability. To retrieve information about a specific subject, you can dereference its URL. SPARQL endpoints allow clients to execute

13 Wikimedia Commons: https://commons.wikimedia.org/

<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>GDACS_FL_4159_1</identifier>
  <sender>info@gdacs.org</sender>
  <sent>2014-07-14T23:59:59-00:00</sent>
  <status>Actual</status>
  <msgType>Alert</msgType>
  <scope>Public</scope>
  <incidents>4159</incidents>
  <info>
    <category>Geo</category>
    <event>Flood</event>
    <urgency>Past</urgency>
    <severity>Moderate</severity>
    <certainty>Unknown</certainty>
    <senderName>Global Disaster Alert and Coordination System</senderName>
    <headline/>
    <description/>
    <web>http://www.gdacs.org/reports.aspx?eventype=FL&amp;eventid=4159</web>
    <parameter><valueName>eventid</valueName><value>4159</value></parameter>
    <parameter><valueName>currentepisodeid</valueName><value>1</value></parameter>
    <parameter><valueName>glide</valueName><value/></parameter>
    <parameter><valueName>version</valueName><value>1</value></parameter>
    <parameter><valueName>fromdate</valueName><value>Wed, 21 May 2014 22:00:00 GMT</value></parameter>
    <parameter><valueName>todate</valueName><value>Mon, 14 Jul 2014 21:59:59 GMT</value></parameter>
    <parameter><valueName>eventtype</valueName><value>FL</value></parameter>
    <parameter><valueName>alertlevel</valueName><value>Green</value></parameter>
    <parameter><valueName>alerttype</valueName><value>automatic</value></parameter>
    <parameter><valueName>link</valueName><value>http://www.gdacs.org/report.aspx?eventtype=FL&amp;eventid=4159</value></parameter>
    <parameter><valueName>country</valueName><value>Brazil</value></parameter>
    <parameter><valueName>eventname</valueName><value/></parameter>
    <parameter><valueName>severity</valueName><value>Magnitude 7.44</value></parameter>
    <parameter><valueName>population</valueName><value>0 killed and 0 displaced</value></parameter>
    <parameter><valueName>vulnerability</valueName><value/></parameter>
    <parameter><valueName>sourceid</valueName><value>DFO</value></parameter>
    <parameter><valueName>iso3</valueName><value/></parameter>
    <parameter>
      <valueName>hazardcomponents</valueName>
      <value>FL,dead=0,displaced=0,main_cause=Heavy Rain,severity=2,sqkm=25656.457</value>
    </parameter>
    <parameter><valueName>datemodified</valueName><value>Mon, 01 Jan 0001 00:00:00 GMT</value></parameter>
    <area><areaDesc>Polygon</areaDesc><polygon>100</polygon></area>
  </info>
</alert>

Listing 1: Common Alerting Protocol feed via the Global Disaster Alert and Coordination System (http://www.gdacs.org/xml/gdacs_cap.xml, 2014-07-16)

complex queries on RDF data, but they are not always available. While endpoints are more convenient for clients, individual requests are considerably more expensive for servers. Alternatively, a data dump allows you to query locally.

Linked Data Fragments (Verborgh et al. 2014) provide a uniform view on all such possible interfaces to Linked Data, by describing each specific type of interface by the kind of fragments through which it allows access to the dataset. Each fragment consists of three parts:

data: all triples of this dataset that match a specific selector;

metadata: triples that describe the dataset and/or the Linked Data Fragment;

controls: hypermedia links and/or forms that lead to other Linked Data Fragments.

This view allows us to describe new interfaces with different trade-off combinations. One such interface is triple pattern fragments (Verborgh et al. 2014), which enables users to host Linked Data on low-cost servers with higher availability than public SPARQL endpoints. Such a lightweight mechanism is ideal to expose live disaster monitoring data.
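A triple pattern fragments interface is queried with plain HTTP requests in which a client fills in the subject, predicate, and/or object slots of a triple pattern. The sketch below shows how such a request URL could be assembled; the endpoint URL and the disaster URI are illustrative, not actual services:

function buildFragmentUrl(endpoint, pattern) {
  // Assemble a triple pattern fragment request URL from the
  // subject, predicate, and object slots that are provided.
  var params = [];
  ['subject', 'predicate', 'object'].forEach(function(slot) {
    if (pattern[slot]) {
      params.push(slot + '=' + encodeURIComponent(pattern[slot]));
    }
  });
  return params.length ? endpoint + '?' + params.join('&') : endpoint;
}

// Example: ask a (hypothetical) fragments server for all triples
// about a detected disaster.
var url = buildFragmentUrl('http://example.org/disasters', {
  subject: 'http://ex.org/disaster/en/Hurricane_Gonzalo'
});

Unfilled slots are simply omitted, so the same function covers everything from a fully specified triple down to the empty pattern that selects the whole dataset.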

3 Proposed Methodology

3.1 Leveraging Wikipedia Link Structure

Wikipedia is an international online encyclopedia, currently available in 287 languages, 14 with these characteristics:

1. Articles in one language are interlinked with versions of the same article in other languages, e.g., the article "Natural disaster" on the English Wikipedia (http://en.wikipedia.org/wiki/Natural_disaster) links to 74 versions of this article in different languages. 15 We note that there exist similarities and differences among Wikipedias, with "salient information" that is unique to each language, as well as more widely shared facts (Bao et al. 2012).

2. Each article can have redirects, i.e., alternative URLs that point to the article. For the English "Natural disaster" article, there are eight redirects, 16 e.g., "Natural Hazard" (synonym), "Examples of natural disaster" (refinement), or "Natural disasters" (plural).

3. For each article, the list of back links that link to the current article is available, i.e., inbound links other than redirects. The article "Natural disaster" has more than 500 articles that link to it. 17 Likewise, the list of outbound links, i.e., other articles that the current article links to, is available. 18

By combining an article's in- and outbound links, we determine the set of mutual links, i.e., the set of articles that the current article links to (outbound links) and at the same time receives links from (inbound links).

14 All Wikipedias: http://meta.wikimedia.org/wiki/List_of_Wikipedias
15 Article language links: http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&lllimit=max&titles=Natural_disaster
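The mutual-link computation is a plain set intersection of the inbound and outbound link lists; a minimal JavaScript sketch (function name and example titles are illustrative):

function mutualLinks(inboundLinks, outboundLinks) {
  // Mutual links: articles that appear both as inbound and as
  // outbound links of the current article.
  var inboundSet = {};
  inboundLinks.forEach(function(title) { inboundSet[title] = true; });
  return outboundLinks.filter(function(title) {
    return inboundSet[title] === true;
  });
}

// Illustrative example:
mutualLinks(
    ['2014 Pacific typhoon season', 'Typhoon Rammasun (2014)'],
    ['2014 Pacific typhoon season', 'Tropical storm']);
// yields ['2014 Pacific typhoon season']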

3.2 Identification of Wikipedia Articles for Monitoring

Starting with the well-curated English seed article "Natural disaster", we programmatically follow each of the therein contained links of type "Main article", which leads to an exhaustive list of English articles of concrete types of disasters, e.g., "Tsunami" (http://en.wikipedia.org/wiki/Tsunami), "Flood" (http://en.wikipedia.org/wiki/Flood), "Earthquake" (http://en.wikipedia.org/wiki/Earthquake), etc. In total, we obtain links to 20 English articles about different types of disasters. 19 For each of these English disasters articles, we obtain all versions of each article in different languages [step (i) above], and of the resulting list of international articles in turn all their redirect URLs [step (ii) above]. The intermediate result is a complete list of all (currently 1,270) articles in all Wikipedia languages, and all their redirects, that have any type of disaster as their subject. We call this list the "disasters list" and make it publicly available in different formats (.txt, .tsv, and .json), where the JSON version is the most flexible and recommended one. 20

Finally, we obtain for each of the 1,270 articles in the "disasters list" all their back links, i.e., their inbound links [step

16 Article redirects: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&blfilterredir=redirects&bllimit=max&bltitle=Natural_disaster
17 Article inbound links: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&bllimit=max&blnamespace=0&bltitle=Natural_disaster
18 Article outbound links: http://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&format=json&pllimit=max&titles=Natural_disaster
19 "Avalanche", "Blizzard", "Cyclone", "Drought", "Earthquake", "Epidemic", "Extratropical cyclone", "Flood", "Gamma-ray burst", "Hail", "Heat wave", "Impact event", "Limnic eruption", "Meteorological disaster", "Solar flare", "Tornado", "Tropical cyclone", "Tsunami", "Volcanic eruption", "Wildfire"
20 "Disasters list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-natural-disaster-detection/data/disasters-list.json

(iii) above], which serves to detect instances of disasters unknown at system creation time. For example, the article "Typhoon Rammasun (2014)" (http://en.wikipedia.org/wiki/Typhoon_Rammasun_(2014)), which as a concrete instance of a disaster of type tropical cyclone is not contained in our "disasters list", links back to "Tropical cyclone" (http://en.wikipedia.org/wiki/Tropical_cyclone), so we can identify "Typhoon Rammasun (2014)" as related to tropical cyclones (but not necessarily identify it as a tropical cyclone), even if at the system's creation time the typhoon did not exist yet. Analog to the inbound links, we obtain all outbound links of all articles in the "disasters list", e.g., "Tropical cyclone" has an outbound link to "2014 Pacific typhoon season" (http://en.wikipedia.org/wiki/2014_Pacific_typhoon_season), which also happens to be an inbound link of "Tropical cyclone", so we have detected a mutual, circular link structure. Figure 1 shows the example in its entirety, starting from the seed level to the disaster type level to the in-/outbound link level. The end result is a large list, called the "monitoring list", of all articles in all Wikipedia languages that are somehow related, via a redirect, inbound, or outbound link (or resulting mutual link), to any of the articles in the "disasters list". We make a snapshot of this dynamic "monitoring list" available for reference, 21 but note that it will be out-of-date soon and should be regenerated on a regular basis. The current version holds 141,001 different articles.
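The "monitoring list" can be thought of as an index from articles to the disaster-type articles they are linked with and in which role; the published JSON file's exact schema is not shown here, so the structure below is a simplified assumption for illustration:

function buildMonitoringList(disasterArticles) {
  // For each article related to a disaster-type article, record
  // which disaster type it is an inbound and/or outbound link of.
  // The resulting data structure is illustrative, not the exact
  // format of the published monitoring-list.json.
  var monitoringList = {};
  disasterArticles.forEach(function(disaster) {
    var add = function(article, role) {
      if (!monitoringList[article]) {
        monitoringList[article] = {};
      }
      if (!monitoringList[article][disaster.title]) {
        monitoringList[article][disaster.title] = [];
      }
      monitoringList[article][disaster.title].push(role);
    };
    disaster.inboundLinks.forEach(function(a) { add(a, 'inbound'); });
    disaster.outboundLinks.forEach(function(a) { add(a, 'outbound'); });
  });
  return monitoringList;
}

var list = buildMonitoringList([{
  title: 'en:Tropical cyclone',
  inboundLinks: ['en:Typhoon Rammasun (2014)',
      'en:2014 Pacific typhoon season'],
  outboundLinks: ['en:2014 Pacific typhoon season']
}]);
// list['en:2014 Pacific typhoon season']['en:Tropical cyclone']
// yields ['inbound', 'outbound'], i.e., a mutual link

An article carrying both roles for the same disaster type is exactly the mutual, circular link structure described above, which is why the role list is kept per disaster-type article.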

3.3 Monitoring Process

In the past, we have worked on a Server-Sent Events (SSE) API (Steiner 2014a) capable of monitoring realtime editing activity on all language versions of Wikipedia. This API allows us to easily analyze Wikipedia edits by reacting on events fired by the API. Whenever an edit event occurs, we check if it is for one of the articles on our "monitoring list". We keep track of the historic one-day-window editing activity for each article on the "monitoring list", including their versions in other languages, and upon a sudden spike of editing activity, trigger an alert about a potential new instance of a disaster type that the spiking article is an inbound or outbound link of (or both). To illustrate this: if, e.g., the German article "Pazifische Taifunsaison 2014", including all of its language links, is spiking, we can infer that this is related to a disaster of type "Tropical cyclone", due to the detected mutual link structure mentioned earlier (Figure 1).

In order to detect spikes, we apply exponential smoothing to the last n edit intervals (we require n ≥ 5) that occurred in the past 24 hours, with a smoothing factor α = 0.5. The therefore required edit events are retrieved programmatically via the Wikipedia API. 22 As a spike occurs when an edit interval gets "short enough" compared to historic editing activity, we report a spike whenever the latest edit interval is shorter than half a standard deviation, 0.5 × σ.

21 "Monitoring list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list.json
22 Wikipedia last revisions: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=6&rvprop=timestamp|user&titles=Typhoon_Rammasun_(2014)

[Figure 1: Extracted Wikipedia link structure (tiny excerpt), starting from the seed article "Natural disaster". The figure shows the seed level, the disaster type level, and the in-/outbound link level, with English and German nodes such as en:Natural disaster, en:Tropical cyclone, en:Typhoon Rammasun (2014), en:2014 Pacific typhoon season, and de:Pazifische Taifunsaison 2014, connected by redirect, language, inbound, outbound, and mutual links.]
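The spike criterion can be sketched as follows. The thresholds match the description (α = 0.5, at least n = 5 intervals, latest interval shorter than 0.5 × σ), but the paper leaves the exact interplay of smoothing and the deviation test open, so this function is a simplified illustration, not the production code of the demo:

function isSpiking(editIntervals, alpha) {
  // editIntervals: time gaps between consecutive edits, in seconds,
  // oldest first, all within the past 24 hours.
  alpha = alpha || 0.5;
  if (editIntervals.length < 5) {
    return false; // we require n >= 5 intervals
  }
  // Exponentially smooth the interval series to damp outliers.
  var smoothedSeries = [];
  var smoothed = editIntervals[0];
  editIntervals.forEach(function(interval) {
    smoothed = alpha * interval + (1 - alpha) * smoothed;
    smoothedSeries.push(smoothed);
  });
  // Standard deviation of the smoothed series.
  var mean = smoothedSeries.reduce(function(a, b) { return a + b; }, 0) /
      smoothedSeries.length;
  var variance = smoothedSeries.reduce(function(sum, value) {
    return sum + Math.pow(value - mean, 2);
  }, 0) / smoothedSeries.length;
  var sigma = Math.sqrt(variance);
  // Spike: the latest edit interval is "short enough", i.e.,
  // shorter than half a standard deviation.
  var latest = editIntervals[editIntervals.length - 1];
  return latest < 0.5 * sigma;
}

// Five roughly hourly intervals, then edits seconds apart:
isSpiking([3600, 3500, 3700, 3600, 10, 5]); // → true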

A subset of all Wikipedia articles are geo-referenced, 23 so when we detect a spiking article, we try to obtain geo coordinates for the article itself (e.g., "Pazifische Taifunsaison 2014") or any of its language links that, as a consequence of the assumption in H2, may provide more local details (e.g., "2014 Pacific typhoon season" in English or "2014年太平洋颱風季" in Chinese). We then calculate the center point of all obtained latitude/longitude pairs.
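A simple way to realize this center point is the arithmetic mean of the latitude/longitude pairs; the sketch below deliberately ignores spherical geometry and antimeridian wrap-around, which is acceptable for roughly co-located coordinates (the coordinates in the example are illustrative):

function centerPoint(coordinates) {
  // Calculate the center point of a list of {lat, lon} pairs as the
  // arithmetic mean (a simplification that ignores the Earth's
  // curvature and the antimeridian).
  if (!coordinates.length) {
    return null;
  }
  var sum = coordinates.reduce(function(acc, pair) {
    return { lat: acc.lat + pair.lat, lon: acc.lon + pair.lon };
  }, { lat: 0, lon: 0 });
  return {
    lat: sum.lat / coordinates.length,
    lon: sum.lon / coordinates.length
  };
}

centerPoint([
  { lat: 16.0, lon: 134.0 },
  { lat: 14.0, lon: 136.0 }
]); // → { lat: 15, lon: 135 }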

3.4 Multimedia Content from Online Social Networking Sites

In the past, we have worked on an application called Social Media Illustrator (Steiner 2014c) that provides a social multimedia search framework that enables searching for, and extraction of, multimedia data from the online social networking sites Google+, 24 Facebook, 25 Twitter, 26 Instagram, 27 YouTube, 28 Flickr, 29 MobyPicture, 30 TwitPic, 31 and Wikimedia Commons. 32 In a first step, it deduplicates exact- and near-duplicate social multimedia data, based on a previously described algorithm (Steiner et al. 2013). It then ranks social multimedia data by social signals (Steiner 2014c), based on an abstraction layer on top of the online social networking sites mentioned above, and in a final step allows for the creation of media galleries following aesthetic principles (Steiner 2014c) of the two kinds Strict Order, Equal Size and Loose Order, Varying Size defined in (Steiner 2014c). We have ported crucial parts of the code of Social Media Illustrator from the client-side to the server-side, enabling us now to create media galleries at scale and on demand, based on the titles of spiking Wikipedia articles that are used as separate search terms for each language. The social media content therefore does not have to link to Wikipedia. One exemplary media gallery can be seen in Figure 2; each individual media item in the gallery is clickable and links back to the original post on the particular online social networking site, allowing crisis responders to monitor the media gallery as a whole, to investigate interesting media items at the source, and potentially to get in contact with the originator.

23 Article geo coordinates: http://en.wikipedia.org/w/api.php?action=query&prop=coordinates&format=json&colimit=max&coprop=dim|country|region|globe&coprimary=all&titles=September_11_attacks
24 Google+: https://plus.google.com/
25 Facebook: https://www.facebook.com/
26 Twitter: https://twitter.com/
27 Instagram: http://instagram.com/
28 YouTube: http://www.youtube.com/
29 Flickr: http://www.flickr.com/
30 MobyPicture: http://www.mobypicture.com/
31 TwitPic: http://twitpic.com/

3.5 Linked Data Publication

In a final step, once a given confidence threshold has been reached, and upon human inspection, we plan to send out a notification according to the Common Alerting Protocol, following the format that (for GDACS) can be seen in Listing 1. While Common Alerting Protocol messages are generally well understood, additional synergies can be unlocked by leveraging Linked Data sources like DBpedia, Wikidata, and Freebase, and interlinking them with detected potentially relevant multimedia data from online social networking sites. Listing 2 shows an early-stage proposal for doing so. The alerts can be exposed as triple pattern fragments to enable live querying at low cost. This can also include push, pull, and streaming models, as Linked Data Fragments (Verborgh et al. 2014) allow for all. A further approach consists in converting CAP messages to Linked Data by transforming the CAP eXtensible Markup Language (XML) format to the Resource Description Framework (RDF) and publishing it.

32 Wikimedia Commons: http://commons.wikimedia.org/wiki/Main_Page

3.6 Implementation Details

We have created a publicly available prototypal demo application, deployed 33 at http://disaster-monitor.herokuapp.com/, that internally connects to the SSE API from (Steiner 2014a). It is implemented in Node.js on the server, and as a JavaScript Web application on the client. This application uses an hourly refreshed version of the "monitoring list" from Section 3.2, and whenever an edit event sent through the SSE API matches any of the articles in the list, it checks if, given this article's and its language links' edit history of

33 Source code: https://github.com/tomayac/postdoc/tree/master/demos/disaster-monitor

Figure 2: Screenshot of the Disaster Monitor application prototype, available at http://disaster-monitor.herokuapp.com/, showing detected past disasters on a heatmap and a media gallery for a currently spiking disaster around "Hurricane Gonzalo"

the past 24 hours, the current edit event shows spiking behavior, as outlined in Section 3.3. The core source code of the monitoring loop can be seen in Listing 3; a screenshot of the application is shown in Figure 2.

(function() {
  // fired whenever an edit event happens on any Wikipedia
  var parseWikipediaEdit = function(data) {
    var article = data.language + ':' + data.article;
    var disasterObj = monitoringList[article];
    // the article is on the monitoring list
    if (disasterObj) {
      showCandidateArticle(data.article, data.language, disasterObj);
    }
  };

  // fired whenever an article is on the monitoring list
  var showCandidateArticle = function(article, language, roles) {
    getGeoData(article, language, function(err, geoData) {
      getRevisionsData(article, language, function(err, revisionsData) {
        if (revisionsData.spiking) {
          // spiking article
          if (geoData.averageCoordinates.lat) {
            // geo-referenced article, create map
          }
          // trigger alert if article is spiking
        }
      });
    });
  };

  getMonitoringList(seedArticle, function(err, data) {
    // get the initial monitoring list
    if (err) return console.log('Error initializing the app');
    monitoringList = data;
    console.log('Monitoring ' + Object.keys(monitoringList).length +
        ' candidate Wikipedia articles');
    // start monitoring process once we have a monitoring list
    var wikiSource = new EventSource(wikipediaEdits);
    wikiSource.addEventListener('message', function(e) {
      return parseWikipediaEdit(JSON.parse(e.data));
    });
    // auto-refresh monitoring list every hour
    setInterval(function() {
      getMonitoringList(seedArticle, function(err, data) {
        if (err) return console.log('Error refreshing monitoring list');
        monitoringList = data;
        console.log('Monitoring ' + Object.keys(monitoringList).length +
            ' candidate Wikipedia articles');
      });
    }, 1000 * 60 * 60);
  });
})();

Listing 3: Monitoring loop of the disaster monitor

<http://ex.org/disaster/en/Hurricane_Gonzalo> owl:sameAs
    <http://en.wikipedia.org/wiki/Hurricane_Gonzalo> ,
    <http://live.dbpedia.org/page/Hurricane_Gonzalo> ,
    <http://www.freebase.com/m/0123kcg5> ;
  ex:relatedMediaItems _:video1 ;
  ex:relatedMediaItems _:photo1 .

_:video1 ex:mediaUrl <https://mtc.cdn.vine.co/r/videos/82796227091134303173323251712_2ca88ba54445116698738182474199804.mp4> ;
  ex:micropostUrl <http://twitter.com/gpessoao/status/527603540860997632> ;
  ex:posterUrl <https://v.cdn.vine.co/r/thumbs/231E0009CF1134303174572797952_25116698738182474199804.mp4.jpg> ;
  ex:publicationDate "2014-10-30T03:15:01Z" ;
  ex:socialInteractions [ ex:likes 1 ; ex:shares 0 ] ;
  ex:timestamp 1414638901000 ;
  ex:type "video" ;
  ex:userProfileUrl <http://twitter.com/alejandroriano> ;
  ex:micropost [
    ex:html "Here’s Hurricane Gonzalo as seen from the Space_Station as it orbited above today https://t.co/RpJt0P2bXa" ;
    ex:plainText "Here’s Hurricane Gonzalo as seen from the Space_Station as it orbited above today" ] .

_:photo1 ex:mediaUrl <https://upload.wikimedia.org/wikipedia/commons/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo.jpg> ;
  ex:micropostUrl <https://commons.wikimedia.org/wiki/File:Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg> ;
  ex:posterUrl <https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_%2822.10.2014%29_01.jpg/500px-Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg> ;
  ex:publicationDate "2014-10-24T08:40:16Z" ;
  ex:socialInteractions [ ex:shares 0 ] ;
  ex:timestamp 1414140016000 ;
  ex:type "photo" ;
  ex:userProfileUrl <https://commons.wikimedia.org/wiki/User:Huhu_Uet> ;
  ex:micropost [
    ex:html "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ;
    ex:plainText "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ] .

Listing 2: Exemplary Linked Data for Hurricane Gonzalo using a yet to-be-defined vocabulary (potentially HXL, http://hxl.humanitarianresponse.info/ns/index.html, or MOAC, http://observedchange.com/moac/ns/) that interlinks the disaster with several other Linked Data sources and relates it to multimedia content on online social networking sites

4 Proposed Steps Toward an Evaluation

We recall our core research questions, which were Q1: “How timely and accurate, for the purpose of disaster detection and ongoing monitoring, is content from Wikipedia compared to authoritative sources mentioned above?” and Q2: “Does the disambiguated nature of Wikipedia surpass keyword-based disaster detection approaches, e.g., via online social networking sites or search logs?” Regarding Q1, only a manual comparison, covering several months’ worth of disaster data, of the relevant authoritative data sources mentioned in Section 1.1 with the output of our system can help respond to the question. Regarding Q2, we propose an evaluation strategy for the OSN site Twitter, loosely inspired by the approach of Sakaki et al. in (Sakaki, Okazaki, and Matsuo 2010). We choose Twitter as a data source due to the publicly available user data through its streaming APIs,34 which would be considerably harder, if not impossible, to obtain with other OSNs or search logs due to privacy concerns and API limitations. Based on the articles in the “monitoring list”, we put forward using article titles as search terms, but without disambiguation hints in parentheses; e.g., instead of the complete article title “Typhoon Rammasun (2014)”, we suggest using “Typhoon Rammasun” alone. We advise monitoring the sample

34 Twitter streaming APIs: https://dev.twitter.com/docs/streaming-apis/streams/public

stream35 for the appearance of any of the search terms, as the filtered stream36 is too limited regarding the number of supported search terms. In order to avoid ambiguity issues with the international multi-language tweet stream, we recommend matching search terms only if the Twitter-detected tweet language equals the search term’s language, e.g., English as in “Typhoon Rammasun”.
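The language-matching rule above can be sketched in a few lines; the matchesSearchTerm helper is ours, and tweet.lang stands in for the Twitter-detected tweet language:

```javascript
// Sketch: only count a tweet as a match if the Twitter-detected tweet
// language equals the language of the matched search term.
function matchesSearchTerm(tweet, searchTerms) {
  return searchTerms.some(function(term) {
    return tweet.lang === term.lang &&
        tweet.text.toLowerCase().indexOf(term.text.toLowerCase()) !== -1;
  });
}

var searchTerms = [{ text: 'Typhoon Rammasun', lang: 'en' }];
// English tweet mentioning the term: match.
matchesSearchTerm({ lang: 'en', text: 'Typhoon Rammasun hits the coast' },
    searchTerms); // → true
// Same words, but detected as another language: no match.
matchesSearchTerm({ lang: 'tl', text: 'Typhoon Rammasun' },
    searchTerms); // → false
```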

5 Conclusions and Future Work

In this paper, we have presented the first steps of our ongoing research on the creation of a Wikipedia-based disaster monitoring system. In particular, we finished its underlying code scaffolding and connected the system to several online social networking sites, allowing for the automatic generation of media galleries. Further, we propose to publish data about detected and monitored disasters as live queryable Linked Data, which can be made accessible in a scalable and ad hoc manner using triple pattern fragments (Verborgh et al. 2014), by leveraging free cloud hosting offers (Matteis and Verborgh 2014). While the system itself already functions, a good chunk of work still lies ahead with the fine-tuning of

35 Twitter sample stream: https://dev.twitter.com/docs/api/1.1/get/statuses/sample
36 Twitter filtered stream: https://dev.twitter.com/docs/api/1.1/post/statuses/filter

its parameters. A first example is the exponential smoothing parameters of the revision intervals, responsible for determining whether an article is spiking, and thus a potential new disaster, or not. A second example is the role that disasters play with articles: they can be inbound, outbound, or mutual links, and their importance for actual occurrences of disasters will vary. Future work will mainly focus on finding answers to our research questions Q1 and Q2 and on the verification of the hypotheses H1–H3. We will focus on the evaluation of the system’s usefulness, accuracy, and timeliness in comparison to other keyword-based approaches. An interesting aspect of our work is that the monitoring system is not limited to natural disasters: using an analogous approach, we can monitor for human-made disasters (called “Anthropogenic hazard” on Wikipedia) like terrorism, war, power outages, air disasters, etc. We have created an exemplary “monitoring list” and made it available.37

Concluding, we are excited about this research and look forward to putting the final system into operational practice in the weeks and months to come. Be safe!

37 Anthropogenic hazard “monitoring list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list-anthropogenic-hazard.json

References

[Abel et al. 2012] Abel, F.; Hauff, C.; Houben, G.-J.; Stronkman, R.; and Tao, K. 2012. Twitcident: Fighting Fire with Information from Social Web Streams. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW ’12 Companion, 305–308. New York, NY, USA: ACM.

[Bao et al. 2012] Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; and Gergle, D. 2012. Omnipedia: Bridging the Wikipedia Language Gap. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, 1075–1084. New York, NY, USA: ACM.

[Berners-Lee, Fielding, and Masinter 2005] Berners-Lee, T.; Fielding, R. T.; and Masinter, L. 2005. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF.

[Berners-Lee 2006] Berners-Lee, T. 2006. Linked Data. http://www.w3.org/DesignIssues/LinkedData.html.

[Falor 2011] Falor, R. 2011. Search data reveals people turn to the internet in crises. http://blog.google.org/2011/08/search-data-reveals-people-turn-to.html.

[Fielding et al. 1999] Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; and Berners-Lee, T. 1999. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, IETF.

[Goodchild and Glennon 2010] Goodchild, M. F., and Glennon, J. A. 2010. Crowdsourcing Geographic Information for Disaster Response: A Research Frontier. International Journal of Digital Earth 3(3):231–241.

[Heath and Bizer 2011] Heath, T., and Bizer, C. 2011. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool.

[Klyne and Carroll 2004] Klyne, G., and Carroll, J. J. 2004. Resource Description Framework (RDF): Concepts and Abstract Syntax. Recommendation, W3C.

[Laframboise and Loko 2012] Laframboise, N., and Loko, B. 2012. Natural Disasters: Mitigating Impact, Managing Risks. IMF Working Paper, International Monetary Fund. http://www.imf.org/external/pubs/ft/wp/2012/wp12245.pdf.

[Lin and Mishne 2012] Lin, J., and Mishne, G. 2012. A Study of “Churn” in Tweets and Real-Time Search Queries (Extended Version). CoRR abs/1205.6855.

[Liu et al. 2008] Liu, S. B.; Palen, L.; Sutton, J.; Hughes, A. L.; and Vieweg, S. 2008. In Search of the Bigger Picture: The Emergent Role of On-line Photo Sharing in Times of Disaster. In Proceedings of the Information Systems for Crisis Response and Management Conference (ISCRAM).

[Matteis and Verborgh 2014] Matteis, L., and Verborgh, R. 2014. Hosting Queryable and Highly Available Linked Data for Free. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 13–18.

[Okolloh 2009] Okolloh, O. 2009. Ushahidi, or “Testimony”: Web 2.0 Tools for Crowdsourcing Crisis Information. Participatory Learning and Action 59(1):65–70.

[Ortmann et al. 2011] Ortmann, J.; Limbu, M.; Wang, D.; and Kauppinen, T. 2011. Crowdsourcing Linked Open Data for Disaster Management. In Grütter, R.; Kolas, D.; Koubarakis, M.; and Pfoser, D., eds., Proceedings of the Terra Cognita Workshop on Foundations, Technologies and Applications of the Geospatial Web, in conjunction with the International Semantic Web Conference (ISWC 2011), volume 798, 11–22. Bonn, Germany: CEUR Workshop Proceedings.

[Sakaki, Okazaki, and Matsuo 2010] Sakaki, T.; Okazaki, M.; and Matsuo, Y. 2010. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, 851–860. New York, NY, USA: ACM.

[Steiner et al. 2013] Steiner, T.; Verborgh, R.; Gabarro, J.; Mannens, E.; and Van de Walle, R. 2013. Clustering Media Items Stemming from Multiple Social Networks. The Computer Journal.

[Steiner 2014a] Steiner, T. 2014a. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration, OpenSym ’14, 25:1–25:7. New York, NY, USA: ACM.

[Steiner 2014b] Steiner, T. 2014b. Comprehensive Wikipedia Monitoring for Global and Realtime Natural Disaster Detection. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 86–95.

[Steiner 2014c] Steiner, T. 2014c. Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks. Ph.D. Dissertation, Universitat Politècnica de Catalunya.

[Verborgh et al. 2014] Verborgh, R.; Hartig, O.; De Meester, B.; Haesendonck, G.; De Vocht, L.; Vander Sande, M.; Cyganiak, R.; Colpaert, P.; Mannens, E.; and Van de Walle, R. 2014. Querying Datasets on the Web with High Availability. In Mika, P.; Tudorache, T.; Bernstein, A.; Welty, C.; Knoblock, C.; Vrandečić, D.; Groth, P.; Noy, N.; Janowicz, K.; and Goble, C., eds., The Semantic Web – ISWC 2014, volume 8796 of Lecture Notes in Computer Science, 180–196. Springer International Publishing.

[Westfall 2010] Westfall, J. 2010. Common Alerting Protocol Version 1.2. Standard, OASIS. http://docs.oasis-open.org/emergency/cap/v1.2/CAP-v1.2-os.doc.


H3 Link structure dynamics of Wikipedia provide for a meaningful way to detect future disasters, i.e., disasters unknown at system creation time.

These hypotheses lead us to the following research questions that we strive to answer in the near future.

Q1 How timely and accurate is content from Wikipedia and online social networking sites for the purpose of disaster detection and ongoing monitoring, compared to content from authoritative and government sources?

Q2 To what extent can the disambiguated nature of Wikipedia (things identified by URIs) improve on keyword-based disaster detection approaches, e.g., via online social networking sites or search logs?

Q3 How much noise is introduced by full-text searches (which are not based on disambiguated URIs) for multimedia content on online social networking sites?

The remainder of the article is structured as follows: first, we discuss related work and enabling technologies in the next section, followed by our methodology in Section 3. We describe an evaluation strategy in Section 4 and finally conclude with an outlook on future work in Section 5.

2 Related Work and Enabling Technologies

2.1 Disaster Detection

Digitally crowdsourced data for disaster detection and response has gained momentum in recent years, as the Internet has proven resilient in times of crises compared to other infrastructure. Ryan Falor, Crisis Response Product Manager at Google in 2011, remarks in (Falor 2011) that “a substantial [...] proportion of searches are directly related to the crises, and people continue to search and access information online even while traffic and search levels drop temporarily during and immediately following the crises.” In the following, we provide a non-exhaustive list of related work on digitally crowdsourced disaster detection and response. Sakaki, Okazaki, and Matsuo (2010) consider each user of the online social networking (OSN) site Twitter8 a sensor for the purpose of earthquake detection in Japan. Goodchild and Glennon (2010) show how crowdsourced geodata from Wikipedia and Wikimapia,9 “a multilingual open-content collaborative map”, can help complete authoritative data about disasters. Abel et al. (2012) describe a crisis monitoring system that extracts relevant content about known disasters from Twitter. Liu et al. (2008) examine common patterns and norms of disaster coverage on the photo sharing site Flickr.10 Ortmann et al. (2011) propose to crowdsource Linked Open Data for disaster management and also provide a good overview of well-known crowdsourcing tools like Google Map Maker,11 OpenStreetMap,12

and Ushahidi (Okolloh 2009). We have developed a monitoring system (Steiner 2014c) that detects news events from

8 Twitter: https://twitter.com/
9 Wikimapia: http://wikimapia.org/
10 Flickr: https://www.flickr.com/
11 Google Map Maker: http://www.google.com/mapmaker
12 OpenStreetMap: http://www.openstreetmap.org/

concurrent Wikipedia edits and auto-generates related multimedia galleries based on content from various OSN sites and Wikimedia Commons.13 Finally, Lin and Mishne (2012) examine realtime search query churn on Twitter, including in the context of disasters.

2.2 The Common Alerting Protocol

To facilitate collaboration, a common protocol is essential. The Common Alerting Protocol (CAP, Westfall 2010) is an XML-based general data format for exchanging public warnings and emergencies between alerting technologies. CAP allows a warning message to be consistently disseminated simultaneously over many warning systems to many applications. The protocol increases warning effectiveness and simplifies the task of activating a warning for officials. CAP also provides the capability to include multimedia data, such as photos, maps, or videos. Alerts can be geographically targeted to a defined warning area. An exemplary flood warning CAP feed stemming from GDACS is shown in Listing 1. The step from trees to graphs can be taken through Linked Data, which we introduce in the next section.

2.3 Linked Data and Linked Data Principles

Linked Data (Berners-Lee 2006) defines a set of agreed-on best practices and principles for interconnecting and publishing structured data on the Web. It uses Web technologies like the Hypertext Transfer Protocol (HTTP, Fielding et al. 1999) and Uniform Resource Identifiers (URIs, Berners-Lee, Fielding, and Masinter 2005) to create typed links between different sources. The portal http://linkeddata.org/ defines Linked Data as being “about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” Tim Berners-Lee (2006) defined the four rules for Linked Data in a W3C Design Issue as follows:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs, so that they can discover more things.

Linked Data uses RDF (Klyne and Carroll 2004) to create typed links between things in the world. The result is oftentimes referred to as the Web of Data. RDF encodes statements about things in the form of (subject, predicate, object) triples. Heath and Bizer (2011) speak of RDF links.
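As a minimal illustration of the triple model, a single RDF statement and its N-Triples serialization might look like this (the subject URI follows the example scheme of Listing 2, and the toNTriple helper is ours):

```javascript
// A statement about a thing as a (subject, predicate, object) triple;
// the subject URI is an invented example, owl:sameAs is the real OWL
// predicate for identity links.
var triple = {
  subject: 'http://ex.org/disaster/en/Hurricane_Gonzalo',
  predicate: 'http://www.w3.org/2002/07/owl#sameAs',
  object: 'http://en.wikipedia.org/wiki/Hurricane_Gonzalo'
};

// Serialize the triple as one N-Triples line.
function toNTriple(t) {
  return '<' + t.subject + '> <' + t.predicate + '> <' + t.object + '> .';
}

console.log(toNTriple(triple));
```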

2.4 Linked Data Fragments

Various access mechanisms to Linked Data exist on the Web, each of which comes with its own trade-offs regarding query performance, freshness of data, and server cost/availability. To retrieve information about a specific subject, you can dereference its URL. SPARQL endpoints allow executing

13 Wikimedia Commons: https://commons.wikimedia.org/

<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>GDACS_FL_4159_1</identifier>
  <sender>info@gdacs.org</sender> <sent>2014-07-14T23:59:59-00:00</sent>
  <status>Actual</status> <msgType>Alert</msgType>
  <scope>Public</scope> <incidents>4159</incidents>
  <info>
    <category>Geo</category><event>Flood</event>
    <urgency>Past</urgency><severity>Moderate</severity>
    <certainty>Unknown</certainty>
    <senderName>Global Disaster Alert and Coordination System</senderName>
    <headline /><description />
    <web>http://www.gdacs.org/reports.aspx?eventype=FL&amp;eventid=4159</web>
    <parameter><valueName>eventid</valueName><value>4159</value></parameter>
    <parameter><valueName>currentepisodeid</valueName><value>1</value></parameter>
    <parameter><valueName>glide</valueName><value /></parameter>
    <parameter><valueName>version</valueName><value>1</value></parameter>
    <parameter><valueName>fromdate</valueName><value>Wed, 21 May 2014 22:00:00 GMT</value></parameter>
    <parameter><valueName>todate</valueName><value>Mon, 14 Jul 2014 21:59:59 GMT</value></parameter>
    <parameter><valueName>eventtype</valueName><value>FL</value></parameter>
    <parameter><valueName>alertlevel</valueName><value>Green</value></parameter>
    <parameter><valueName>alerttype</valueName><value>automatic</value></parameter>
    <parameter><valueName>link</valueName><value>http://www.gdacs.org/report.aspx?eventtype=FL&amp;eventid=4159</value></parameter>
    <parameter><valueName>country</valueName><value>Brazil</value></parameter>
    <parameter><valueName>eventname</valueName><value /></parameter>
    <parameter><valueName>severity</valueName><value>Magnitude 7.44</value></parameter>
    <parameter><valueName>population</valueName><value>0 killed and 0 displaced</value></parameter>
    <parameter><valueName>vulnerability</valueName><value /></parameter>
    <parameter><valueName>sourceid</valueName><value>DFO</value></parameter>
    <parameter><valueName>iso3</valueName><value /></parameter>
    <parameter>
      <valueName>hazardcomponents</valueName><value>FL,dead=0,displaced=0,main_cause=Heavy Rain,severity=2,sqkm=256564.57</value>
    </parameter>
    <parameter><valueName>datemodified</valueName><value>Mon, 01 Jan 0001 00:00:00 GMT</value></parameter>
    <area><areaDesc>Polygon</areaDesc><polygon>100</polygon></area>
  </info>
</alert>

Listing 1: Common Alerting Protocol feed via the Global Disaster Alert and Coordination System (http://www.gdacs.org/xml/gdacs_cap.xml, 2014-07-16)

complex queries on RDF data, but they are not always available. While endpoints are more convenient for clients, individual requests are considerably more expensive for servers. Alternatively, a data dump allows you to query locally.

Linked Data Fragments (Verborgh et al 2014) providea uniform view on all such possible interfaces to LinkedData by describing each specific type of interface by thekind of fragments through which it allows access to thedataset Each fragment consists of three parts

data all triples of this dataset that match a specific selector

metadata triples that describe the dataset andor the LinkedData Fragment

controls hypermedia links andor forms that lead to otherLinked Data Fragments

This view allows to describe new interfaces with differenttrade-off combinations One such interface is triple patternfragments (Verborgh et al 2014) which enables users tohost Linked Data on low-cost servers with higher availability

than public SPARQL endpoints Such a light-weight mech-anism is ideal to expose live disaster monitoring data
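A client requests a triple pattern fragment by filling a triple pattern into URL parameters; as a sketch (the fragmentUrl helper is ours, and the subject/predicate/object parameter names are those that typical triple pattern fragments servers advertise through their hypermedia controls):

```javascript
// Sketch: build a triple pattern fragment request URL from a (partial)
// triple pattern. Unset pattern positions act as wildcards.
function fragmentUrl(base, pattern) {
  var params = ['subject', 'predicate', 'object']
      .filter(function(key) { return pattern[key]; })
      .map(function(key) {
        return key + '=' + encodeURIComponent(pattern[key]);
      });
  return base + (params.length ? '?' + params.join('&') : '');
}

// All triples about Hurricane Gonzalo, against DBpedia's public
// triple pattern fragments interface:
fragmentUrl('http://fragments.dbpedia.org/2014/en', {
  subject: 'http://dbpedia.org/resource/Hurricane_Gonzalo'
});
```

The server answers with the matching data triples plus metadata and controls, so the client can page through results and compose more complex queries locally.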

3 Proposed Methodology

3.1 Leveraging Wikipedia Link Structure

Wikipedia is an international online encyclopedia, currently available in 287 languages,14 with these characteristics:

1. Articles in one language are interlinked with versions of the same article in other languages, e.g., the article “Natural disaster” on the English Wikipedia (http://en.wikipedia.org/wiki/Natural_disaster) links to 74 versions of this article in different languages.15 We note that there exist similarities and differences among Wikipedias,

14 All Wikipedias: http://meta.wikimedia.org/wiki/List_of_Wikipedias
15 Article language links: http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&lllimit=max&titles=Natural_disaster

with “salient information” that is unique to each language, as well as more widely shared facts (Bao et al. 2012).

2. Each article can have redirects, i.e., alternative URLs that point to the article. For the English “Natural disaster” article, there are eight redirects,16 e.g., “Natural Hazard” (synonym), “Examples of natural disaster” (refinement), or “Natural disasters” (plural).

3. For each article, the list of back links that link to the current article is available, i.e., inbound links other than redirects. The article “Natural disaster” has more than 500 articles that link to it.17 Likewise, the list of outbound links, i.e., other articles that the current article links to, is available.18

By combining an article’s in- and outbound links, we determine the set of mutual links, i.e., the set of articles that the current article links to (outbound links) and at the same time receives links from (inbound links).
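The mutual-link computation just described is a plain set intersection; as a sketch (the mutualLinks helper name is ours):

```javascript
// Sketch: mutual links are the articles that appear both among the
// inbound and among the outbound links of the current article.
function mutualLinks(inbound, outbound) {
  var inboundSet = {};
  inbound.forEach(function(title) { inboundSet[title] = true; });
  return outbound.filter(function(title) { return inboundSet[title]; });
}

// Example from the text: "2014 Pacific typhoon season" both links to
// and is linked from "Tropical cyclone".
mutualLinks(
    ['2014 Pacific typhoon season', 'Typhoon Rammasun (2014)'],
    ['2014 Pacific typhoon season', 'Natural disaster']);
// → ['2014 Pacific typhoon season']
```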

3.2 Identification of Wikipedia Articles for Monitoring

Starting with the well-curated English seed article “Natural disaster”, we programmatically follow each of the therein contained links of type “Main article”, which leads to an exhaustive list of English articles of concrete types of disasters, e.g., “Tsunami” (http://en.wikipedia.org/wiki/Tsunami), “Flood” (http://en.wikipedia.org/wiki/Flood), “Earthquake” (http://en.wikipedia.org/wiki/Earthquake), etc. In total, we obtain links to 20 English articles about different types of disasters.19 For each of these English disasters articles, we obtain all versions of each article in different languages [step (i) above] and, of the resulting list of international articles, in turn all their redirect URLs [step (ii) above]. The intermediate result is a complete list of all (currently 1,270) articles in all Wikipedia languages, and all their redirects, that have any type of disaster as their subject. We call this list the “disasters list” and make it publicly available in different formats (txt, tsv, and json), where the JSON version is the most flexible and recommended one.20

Finally, we obtain for each of the 1,270 articles in the “disasters list” all their back links, i.e., their inbound links [step

16 Article redirects: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&blfilterredir=redirects&bllimit=max&bltitle=Natural_disaster
17 Article inbound links: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&bllimit=max&blnamespace=0&bltitle=Natural_disaster
18 Article outbound links: http://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&format=json&pllimit=max&titles=Natural_disaster
19 “Avalanche”, “Blizzard”, “Cyclone”, “Drought”, “Earthquake”, “Epidemic”, “Extratropical cyclone”, “Flood”, “Gamma-ray burst”, “Hail”, “Heat wave”, “Impact event”, “Limnic eruption”, “Meteorological disaster”, “Solar flare”, “Tornado”, “Tropical cyclone”, “Tsunami”, “Volcanic eruption”, “Wildfire”
20 “Disasters list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-natural-disaster-detection/data/disasters-list.json

(iii) above], which serves to detect instances of disasters unknown at system creation time. For example, the article “Typhoon Rammasun (2014)” (http://en.wikipedia.org/wiki/Typhoon_Rammasun_(2014))—which, as a concrete instance of a disaster of type tropical cyclone, is not contained in our “disasters list”—links back to “Tropical cyclone” (http://en.wikipedia.org/wiki/Tropical_cyclone), so we can identify “Typhoon Rammasun (2014)” as related to tropical cyclones (but not necessarily identify it as a tropical cyclone), even if at the system’s creation time the typhoon did not exist yet. Analog to the inbound links, we obtain all outbound links of all articles in the “disasters list”, e.g., “Tropical cyclone” has an outbound link to “2014 Pacific typhoon season” (http://en.wikipedia.org/wiki/2014_Pacific_typhoon_season), which also happens to be an inbound link of “Tropical cyclone”, so we have detected a mutual, circular link structure. Figure 1 shows the example in its entirety, from the seed level to the disaster type level to the in-/outbound link level. The end result is a large list, called the “monitoring list”, of all articles in all Wikipedia languages that are somehow—via a redirect, inbound, or outbound link (or resulting mutual link)—related to any of the articles in the “disasters list”. We make a snapshot of this dynamic “monitoring list” available for reference,21 but note that it will be out-of-date soon and should be regenerated on a regular basis. The current version holds 141,001 different articles.
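The link lists themselves come from the standard MediaWiki API calls given in the footnotes; as a sketch, the request URL for an article's inbound links can be assembled like this (the inboundLinksUrl helper is ours; the query parameters are those from the footnotes, with format=json added):

```javascript
// Build the Wikipedia API request for an article's inbound links
// (list=backlinks; blnamespace=0 restricts results to articles).
function inboundLinksUrl(language, title) {
  return 'http://' + language + '.wikipedia.org/w/api.php' +
      '?action=query&list=backlinks&bllimit=max&blnamespace=0' +
      '&bltitle=' + encodeURIComponent(title) + '&format=json';
}

inboundLinksUrl('en', 'Natural disaster');
```

Outbound links work analogously with prop=links, and redirects with blfilterredir=redirects, so the whole "monitoring list" can be regenerated from a handful of such requests per article.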

3.3 Monitoring Process

In the past, we have worked on a Server-Sent Events (SSE) API (Steiner 2014a) capable of monitoring realtime editing activity on all language versions of Wikipedia. This API allows us to easily analyze Wikipedia edits by reacting on events fired by the API. Whenever an edit event occurs, we check if it is for one of the articles on our “monitoring list”. We keep track of the historic one-day-window editing activity for each article on the “monitoring list”, including their versions in other languages, and, upon a sudden spike of editing activity, trigger an alert about a potential new instance of a disaster type that the spiking article is an inbound or outbound link of (or both). To illustrate this: if, e.g., the German article “Pazifische Taifunsaison 2014”, including all of its language links, is spiking, we can infer that this is related to a disaster of type “Tropical cyclone”, due to the detected mutual link structure mentioned earlier (Figure 1).

In order to detect spikes, we apply exponential smoothing to the last n edit intervals (we require n ≥ 5) that occurred in the past 24 hours, with a smoothing factor α = 0.5. The therefore required edit events are retrieved programmatically via the Wikipedia API.22 As a spike occurs when an edit interval gets “short enough” compared to historic editing activity, we report a spike whenever the latest edit interval is shorter than half a standard deviation, 0.5 × σ.

21 “Monitoring list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list.json
22 Wikipedia last revisions: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=6&rvprop=timestamp|user&titles=Typhoon_Rammasun_(2014)

Figure 1: Extracted Wikipedia link structure (tiny excerpt), starting from the seed article “Natural disaster”
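Under one plausible reading of the spike test above, it can be sketched as follows. The isSpiking helper and the exact way the smoothed intervals feed into the threshold are our assumptions; α = 0.5 and the 0.5 × σ threshold are the parameters from the text.

```javascript
// Sketch of the spike test: exponentially smooth the historic edit
// intervals (alpha = 0.5) and report a spike when the latest interval
// is shorter than half their standard deviation (0.5 * sigma).
function isSpiking(timestamps, alpha) {
  alpha = alpha || 0.5;
  var intervals = [];
  for (var i = 1; i < timestamps.length; i++) {
    intervals.push(timestamps[i] - timestamps[i - 1]);
  }
  // We require the latest interval plus at least n >= 5 historic ones.
  if (intervals.length < 6) return false;
  var latest = intervals.pop();
  // Exponential smoothing of the historic intervals.
  var smoothed = [intervals[0]];
  for (var j = 1; j < intervals.length; j++) {
    smoothed.push(alpha * intervals[j] + (1 - alpha) * smoothed[j - 1]);
  }
  var mean = smoothed.reduce(function(a, b) { return a + b; }, 0) /
      smoothed.length;
  var sigma = Math.sqrt(smoothed.reduce(function(sum, value) {
    return sum + Math.pow(value - mean, 2);
  }, 0) / smoothed.length);
  return latest < 0.5 * sigma;
}

// Roughly hourly edits followed by one ten-second gap: a spike.
isSpiking([0, 3000, 7000, 10500, 14100, 17500, 17510]); // → true
// Another roughly hourly edit instead: no spike.
isSpiking([0, 3000, 7000, 10500, 14100, 17500, 21100]); // → false
```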

A subset of all Wikipedia articles are geo-referenced,23 so when we detect a spiking article, we try to obtain geo coordinates for the article itself (e.g., “Pazifische Taifunsaison 2014”) or any of its language links that—as a consequence of the assumption in H2—may provide more local details (e.g., “2014 Pacific typhoon season” in English or “2014年太平洋颱風季” in Chinese). We then calculate the center point of all obtained latitude/longitude pairs.
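The center point calculation can be sketched as a per-axis arithmetic mean (a naive average that ignores spherical geometry and antimeridian wrap-around, which may be acceptable for coordinates of closely related articles; the centerPoint helper is ours):

```javascript
// Naive center point of latitude/longitude pairs: arithmetic mean per
// axis. Ignores spherical geometry and antimeridian wrap-around.
function centerPoint(coordinates) {
  var sum = coordinates.reduce(function(acc, pair) {
    return { lat: acc.lat + pair.lat, lon: acc.lon + pair.lon };
  }, { lat: 0, lon: 0 });
  return {
    lat: sum.lat / coordinates.length,
    lon: sum.lon / coordinates.length
  };
}

centerPoint([{ lat: 10, lon: 130 }, { lat: 20, lon: 140 }]);
// → { lat: 15, lon: 135 }
```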

3.4 Multimedia Content from Online Social Networking Sites

In the past, we have worked on an application called Social Media Illustrator (Steiner 2014c) that provides a social multimedia search framework enabling the search for, and extraction of, multimedia data from the online social networking sites Google+,24 Facebook,25 Twitter,26 Instagram,27 YouTube,28 Flickr,29 MobyPicture,30 TwitPic,31 and

23 Article geo coordinates: http://en.wikipedia.org/w/api.php?action=query&prop=coordinates&format=json&colimit=max&coprop=dim|country|region|globe&coprimary=all&titles=September_11_attacks
24 Google+: https://plus.google.com/
25 Facebook: https://www.facebook.com/
26 Twitter: https://twitter.com/
27 Instagram: http://instagram.com/
28 YouTube: http://www.youtube.com/
29 Flickr: http://www.flickr.com/
30 MobyPicture: http://www.mobypicture.com/
31 TwitPic: http://twitpic.com/

Wikimedia Commons.32 In a first step, it deduplicates exact- and near-duplicate social multimedia data based on a previously described algorithm (Steiner et al. 2013). It then ranks social multimedia data by social signals (Steiner 2014c), based on an abstraction layer on top of the online social networking sites mentioned above, and, in a final step, allows for the creation of media galleries following aesthetic principles (Steiner 2014c) of the two kinds Strict Order, Equal Size and Loose Order, Varying Size, defined in (Steiner 2014c). We have ported crucial parts of the code of Social Media Illustrator from the client-side to the server-side, enabling us now to create media galleries at scale and on demand, based on the titles of spiking Wikipedia articles that are used as separate search terms for each language. The social media content therefore does not have to link to Wikipedia. One exemplary media gallery can be seen in Figure 2; each individual media item in the gallery is clickable and links back to the original post on the particular online social networking site, allowing crisis responders to monitor the media gallery as a whole, to investigate interesting media items at the source, and to potentially get in contact with the originator.

3.5 Linked Data Publication

In a final step, once a given confidence threshold has been reached and upon human inspection, we plan to send out a notification according to the Common Alerting Protocol, following the format that (for GDACS) can be seen in Listing 1. While Common Alerting Protocol messages are generally well understood, additional synergies can be unlocked by leveraging Linked Data sources like DBpedia, Wikidata, and Freebase, and interlinking them with detected, potentially relevant multimedia data from online social networking sites. Listing 2 shows an early-stage proposal for doing so. The alerts can be exposed as triple pattern fragments to enable live querying at low cost. This can also include push, pull, and streaming models, as Linked Data Fragments (Verborgh et al. 2014) allow for all of them. A further approach consists in converting CAP messages to Linked Data by transforming the CAP eXtensible Markup Language (XML) format to the Resource Description Framework (RDF) and publishing the result.

32 Wikimedia Commons: http://commons.wikimedia.org/wiki/Main_Page
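In its simplest form, the CAP-to-RDF conversion mentioned above could look as follows. This is a hedged sketch: the `ex:` vocabulary is a placeholder (as in Listing 2), the alert object is assumed to have already been parsed out of the CAP XML, and only a handful of fields are mapped.

```javascript
// Sketch: serialize a (pre-parsed) CAP alert as Turtle triples. A production
// converter would parse the CAP XML first and use an agreed-upon vocabulary
// such as HXL or MOAC instead of the placeholder "ex:" namespace.
function capToTurtle(alert) {
  var subject = '<http://ex.org/alert/' + alert.identifier + '>';
  return [
    subject + ' ex:sender "' + alert.sender + '" ;',
    '  ex:sent "' + alert.sent + '" ;',
    '  ex:event "' + alert.event + '" ;',
    '  ex:alertLevel "' + alert.alertLevel + '" .'
  ].join('\n');
}

var turtle = capToTurtle({
  identifier: 'GDACS_FL_4159_1',
  sender: 'info@gdacs.org',
  sent: '2014-07-14T23:59:59-00:00',
  event: 'Flood',
  alertLevel: 'Green'
});
console.log(turtle);
```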

3.6 Implementation Details

We have created a publicly available prototypal demo application, deployed33 at http://disaster-monitor.herokuapp.com/, that internally connects to the SSE API from (Steiner 2014a). It is implemented in Node.js on the server and as a JavaScript Web application on the client. This application uses an hourly refreshed version of the "monitoring list" from Section 3.2, and, whenever an edit event sent through the SSE API matches any of the articles in the list, it checks if, given this article's and its language links' edit history of the past 24 hours, the current edit event shows spiking behavior, as outlined in Section 3.3. The core source code of the monitoring loop can be seen in Listing 3; a screenshot of the application is shown in Figure 2.

33 Source code: https://github.com/tomayac/postdoc/tree/master/demos/disaster-monitor

Figure 2: Screenshot of the Disaster Monitor application prototype, available at http://disaster-monitor.herokuapp.com/, showing detected past disasters on a heatmap and a media gallery for a currently spiking disaster around "Hurricane Gonzalo"

(function() {
  // fired whenever an edit event happens on any Wikipedia
  var parseWikipediaEdit = function(data) {
    var article = data.language + ':' + data.article;
    var disasterObj = monitoringList[article];
    // the article is on the monitoring list
    if (disasterObj) {
      showCandidateArticle(data.article, data.language, disasterObj);
    }
  };

  // fired whenever an article is on the monitoring list
  var showCandidateArticle = function(article, language, roles) {
    getGeoData(article, language, function(err, geoData) {
      getRevisionsData(article, language, function(err, revisionsData) {
        if (revisionsData.spiking) {
          // spiking article
        }
        if (geoData.averageCoordinates.lat) {
          // geo-referenced article, create map
        }
        // trigger alert if article is spiking
      });
    });
  };

  getMonitoringList(seedArticle, function(err, data) {
    // get the initial monitoring list
    if (err) { return console.log('Error initializing the app'); }
    monitoringList = data;
    console.log('Monitoring ' + Object.keys(monitoringList).length +
        ' candidate Wikipedia articles');
    // start monitoring process once we have a monitoring list
    var wikiSource = new EventSource(wikipediaEdits);
    wikiSource.addEventListener('message', function(e) {
      return parseWikipediaEdit(JSON.parse(e.data));
    });
    // auto-refresh monitoring list every hour
    setInterval(function() {
      getMonitoringList(seedArticle, function(err, data) {
        if (err) { return console.log('Error refreshing monitoring list'); }
        monitoringList = data;
        console.log('Monitoring ' + Object.keys(monitoringList).length +
            ' candidate Wikipedia articles');
      });
    }, 1000 * 60 * 60);
  });
})();

Listing 3: Monitoring loop of the disaster monitor

<http://ex.org/disaster/en/Hurricane_Gonzalo> owl:sameAs
    <http://en.wikipedia.org/wiki/Hurricane_Gonzalo>,
    <http://live.dbpedia.org/page/Hurricane_Gonzalo>,
    <http://www.freebase.com/m/0123kcg5> ;
  ex:relatedMediaItems _:video1 ;
  ex:relatedMediaItems _:photo1 .

_:video1 ex:mediaUrl "https://mtc.cdn.vine.co/r/videos/82796227091134303173323251712_2ca88ba54445116698738182474199804.mp4" ;
  ex:micropostUrl <http://twitter.com/gpessoao/status/527603540860997632> ;
  ex:posterUrl "https://v.cdn.vine.co/r/thumbs/231E0009CF1134303174572797952_25116698738182474199804.mp4.jpg" ;
  ex:publicationDate "2014-10-30T03:15:01Z" ;
  ex:socialInteractions [ ex:likes 1 ; ex:shares 0 ] ;
  ex:timestamp 1414638901000 ;
  ex:type "video" ;
  ex:userProfileUrl <http://twitter.com/alejandroriano> ;
  ex:micropost [
    ex:html "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today https://t.co/RpJt0P2bXa" ;
    ex:plainText "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today" ] .

_:photo1 ex:mediaUrl "https://upload.wikimedia.org/wikipedia/commons/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo.jpg" ;
  ex:micropostUrl <https://commons.wikimedia.org/wiki/File:Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg> ;
  ex:posterUrl "https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_%2822.10.2014%29_01.jpg/500px-Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg" ;
  ex:publicationDate "2014-10-24T08:40:16Z" ;
  ex:socialInteractions [ ex:shares 0 ] ;
  ex:timestamp 1414140016000 ;
  ex:type "photo" ;
  ex:userProfileUrl <https://commons.wikimedia.org/wiki/User:Huhu_Uet> ;
  ex:micropost [
    ex:html "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ;
    ex:plainText "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ] .

Listing 2: Exemplary Linked Data for Hurricane Gonzalo, using a yet to-be-defined vocabulary (potentially HXL, http://hxl.humanitarianresponse.info/ns/index.html, or MOAC, http://observedchange.com/moac/ns/), that interlinks the disaster with several other Linked Data sources and relates it to multimedia content on online social networking sites

4 Proposed Steps Toward an Evaluation

We recall our core research questions, which were: Q1: How timely and accurate, for the purpose of disaster detection and ongoing monitoring, is content from Wikipedia, compared to the authoritative sources mentioned above? and Q2: Does the disambiguated nature of Wikipedia surpass keyword-based disaster detection approaches, e.g., via online social networking sites or search logs? Regarding Q1, only a manual comparison, covering several months' worth of disaster data, of the relevant authoritative data sources mentioned in Section 1.1 with the output of our system can help respond to the question. Regarding Q2, we propose an evaluation strategy for the OSN site Twitter, loosely inspired by the approach of Sakaki et al. in (Sakaki, Okazaki, and Matsuo 2010). We choose Twitter as a data source due to the publicly available user data through its streaming APIs,34 which would be considerably harder, if not impossible, with other OSNs or search logs, due to privacy concerns and API limitations. Based on the articles in the "monitoring list", we put forward using article titles as search terms, but without disambiguation hints in parentheses; e.g., instead of the complete article title "Typhoon Rammasun (2014)", we suggest using "Typhoon Rammasun" alone. We advise monitoring the sample stream35 for the appearance of any of the search terms, as the filtered stream36 is too limited regarding the number of supported search terms. In order to avoid ambiguity issues with the international, multi-language tweet stream, we recommend matching search terms only if the Twitter-detected tweet language equals the search term's language, e.g., English as in "Typhoon Rammasun".

34 Twitter streaming APIs: https://dev.twitter.com/docs/streaming-apis/streams/public
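The proposed matching rule can be sketched as a small predicate; the tweet shape (`text`, `lang`) follows the fields delivered by Twitter's streaming API, while the search term shape is our own assumption.

```javascript
// A sampled tweet only matches if it contains the search term and its
// Twitter-detected language (the tweet's "lang" field) equals the search
// term's language, avoiding cross-language false positives.
function matchesSearchTerm(tweet, term) {
  return tweet.lang === term.language &&
      tweet.text.toLowerCase().indexOf(term.text.toLowerCase()) !== -1;
}

var term = { text: 'Typhoon Rammasun', language: 'en' };
var match1 = matchesSearchTerm(
    { text: 'Typhoon Rammasun hits the Philippines', lang: 'en' }, term);
var match2 = matchesSearchTerm(
    { text: 'Typhoon Rammasun update', lang: 'tl' }, term); // language mismatch
console.log(match1, match2); // true false
```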

5 Conclusions and Future Work

In this paper, we have presented the first steps of our ongoing research on the creation of a Wikipedia-based disaster monitoring system. In particular, we finished its underlying code scaffolding and connected the system to several online social networking sites, allowing for the automatic generation of media galleries. Further, we propose to publish data about detected and monitored disasters as live queryable Linked Data, which can be made accessible in a scalable and ad hoc manner using triple pattern fragments (Verborgh et al. 2014) by leveraging free cloud hosting offers (Matteis and Verborgh 2014). While the system itself already functions, a good chunk of work still lies ahead with the fine-tuning of its parameters. A first example is the set of exponential smoothing parameters of the revision intervals, responsible for determining whether an article is spiking, and thus a potential new disaster, or not. A second example is the role that disasters play with articles: they can be inbound, outbound, or mutual links, and their importance for actual occurrences of disasters will vary. Future work will mainly focus on finding answers to our research questions Q1 and Q2 and on the verification of the hypotheses H1–H3. We will focus on the evaluation of the system's usefulness, accuracy, and timeliness, in comparison to other keyword-based approaches. An interesting aspect of our work is that the monitoring system is not limited to disasters. Using an analogous approach, we can monitor for human-made disasters (called "Anthropogenic hazard" on Wikipedia) like terrorism, war, power outages, air disasters, etc. We have created an exemplary "monitoring list" and made it available.37

Concluding, we are excited about this research and look forward to putting the final system into operational practice in the weeks and months to come. Be safe.

35 Twitter sample stream: https://dev.twitter.com/docs/api/1.1/get/statuses/sample
36 Twitter filtered stream: https://dev.twitter.com/docs/api/1.1/post/statuses/filter

References

[Abel et al. 2012] Abel, F.; Hauff, C.; Houben, G.-J.; Stronkman, R.; and Tao, K. 2012. Twitcident: Fighting Fire with Information from Social Web Streams. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, 305–308. New York, NY, USA: ACM.

[Bao et al. 2012] Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; and Gergle, D. 2012. Omnipedia: Bridging the Wikipedia Language Gap. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, 1075–1084. New York, NY, USA: ACM.

[Berners-Lee, Fielding, and Masinter 2005] Berners-Lee, T.; Fielding, R. T.; and Masinter, L. 2005. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF.

[Berners-Lee 2006] Berners-Lee, T. 2006. Linked Data. http://www.w3.org/DesignIssues/LinkedData.html.

[Falor 2011] Falor, R. 2011. Search data reveals people turn to the internet in crises. http://blog.google.org/2011/08/search-data-reveals-people-turn-to.html.

[Fielding et al. 1999] Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; and Berners-Lee, T. 1999. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, IETF.

[Goodchild and Glennon 2010] Goodchild, M. F., and Glennon, J. A. 2010. Crowdsourcing Geographic Information for Disaster Response: A Research Frontier. International Journal of Digital Earth 3(3):231–241.

37 Anthropogenic hazard "monitoring list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list-anthropogenic-hazard.json

[Heath and Bizer 2011] Heath, T., and Bizer, C. 2011. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool.

[Klyne and Carroll 2004] Klyne, G., and Carroll, J. J. 2004. Resource Description Framework (RDF): Concepts and Abstract Syntax. Recommendation, W3C.

[Laframboise and Loko 2012] Laframboise, N., and Loko, B. 2012. Natural Disasters: Mitigating Impact, Managing Risks. IMF Working Paper, International Monetary Fund. http://www.imf.org/external/pubs/ft/wp/2012/wp12245.pdf.

[Lin and Mishne 2012] Lin, J., and Mishne, G. 2012. A Study of "Churn" in Tweets and Real-Time Search Queries (Extended Version). CoRR abs/1205.6855.

[Liu et al. 2008] Liu, S. B.; Palen, L.; Sutton, J.; Hughes, A. L.; and Vieweg, S. 2008. In Search of the Bigger Picture: The Emergent Role of On-line Photo Sharing in Times of Disaster. In Proceedings of the Information Systems for Crisis Response and Management Conference (ISCRAM).

[Matteis and Verborgh 2014] Matteis, L., and Verborgh, R. 2014. Hosting Queryable and Highly Available Linked Data for Free. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 13–18.

[Okolloh 2009] Okolloh, O. 2009. Ushahidi, or "Testimony": Web 2.0 Tools for Crowdsourcing Crisis Information. Participatory Learning and Action 59(1):65–70.

[Ortmann et al. 2011] Ortmann, J.; Limbu, M.; Wang, D.; and Kauppinen, T. 2011. Crowdsourcing Linked Open Data for Disaster Management. In Grütter, R.; Kolas, D.; Koubarakis, M.; and Pfoser, D., eds., Proceedings of the Terra Cognita Workshop on Foundations, Technologies and Applications of the Geospatial Web, in conjunction with the International Semantic Web Conference (ISWC 2011), volume 798, 11–22. Bonn, Germany: CEUR Workshop Proceedings.

[Sakaki, Okazaki, and Matsuo 2010] Sakaki, T.; Okazaki, M.; and Matsuo, Y. 2010. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, 851–860. New York, NY, USA: ACM.

[Steiner et al. 2013] Steiner, T.; Verborgh, R.; Gabarro, J.; Mannens, E.; and Van de Walle, R. 2013. Clustering Media Items Stemming from Multiple Social Networks. The Computer Journal.

[Steiner 2014a] Steiner, T. 2014a. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration, OpenSym '14, 251–257. New York, NY, USA: ACM.

[Steiner 2014b] Steiner, T. 2014b. Comprehensive Wikipedia Monitoring for Global and Realtime Natural Disaster Detection. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 86–95.

[Steiner 2014c] Steiner, T. 2014c. Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks. Ph.D. Dissertation, Universitat Politècnica de Catalunya.

[Verborgh et al. 2014] Verborgh, R.; Hartig, O.; De Meester, B.; Haesendonck, G.; De Vocht, L.; Vander Sande, M.; Cyganiak, R.; Colpaert, P.; Mannens, E.; and Van de Walle, R. 2014. Querying Datasets on the Web with High Availability. In Mika, P.; Tudorache, T.; Bernstein, A.; Welty, C.; Knoblock, C.; Vrandečić, D.; Groth, P.; Noy, N.; Janowicz, K.; and Goble, C., eds., The Semantic Web – ISWC 2014, volume 8796 of Lecture Notes in Computer Science, 180–196. Springer International Publishing.

[Westfall 2010] Westfall, J. 2010. Common Alerting Protocol Version 1.2. Standard, OASIS. http://docs.oasis-open.org/emergency/cap/v1.2/CAP-v1.2-os.doc.


<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>GDACS_FL_4159_1</identifier>
  <sender>info@gdacs.org</sender> <sent>2014-07-14T23:59:59-00:00</sent>
  <status>Actual</status> <msgType>Alert</msgType>
  <scope>Public</scope> <incidents>4159</incidents>
  <info>
    <category>Geo</category><event>Flood</event>
    <urgency>Past</urgency><severity>Moderate</severity>
    <certainty>Unknown</certainty>
    <senderName>Global Disaster Alert and Coordination System</senderName>
    <headline/><description/>
    <web>http://www.gdacs.org/reports.aspx?eventype=FL&amp;eventid=4159</web>
    <parameter><valueName>eventid</valueName><value>4159</value></parameter>
    <parameter><valueName>currentepisodeid</valueName><value>1</value></parameter>
    <parameter><valueName>glide</valueName><value/></parameter>
    <parameter><valueName>version</valueName><value>1</value></parameter>
    <parameter><valueName>fromdate</valueName><value>Wed, 21 May 2014 22:00:00 GMT</value></parameter>
    <parameter><valueName>todate</valueName><value>Mon, 14 Jul 2014 21:59:59 GMT</value></parameter>
    <parameter><valueName>eventtype</valueName><value>FL</value></parameter>
    <parameter><valueName>alertlevel</valueName><value>Green</value></parameter>
    <parameter><valueName>alerttype</valueName><value>automatic</value></parameter>
    <parameter><valueName>link</valueName><value>http://www.gdacs.org/report.aspx?eventtype=FL&amp;eventid=4159</value></parameter>
    <parameter><valueName>country</valueName><value>Brazil</value></parameter>
    <parameter><valueName>eventname</valueName><value/></parameter>
    <parameter><valueName>severity</valueName><value>Magnitude 7.44</value></parameter>
    <parameter><valueName>population</valueName><value>0 killed and 0 displaced</value></parameter>
    <parameter><valueName>vulnerability</valueName><value/></parameter>
    <parameter><valueName>sourceid</valueName><value>DFO</value></parameter>
    <parameter><valueName>iso3</valueName><value/></parameter>
    <parameter>
      <valueName>hazardcomponents</valueName><value>FL,dead=0,displaced=0,main_cause=Heavy Rain,severity=2,sqkm=25656457</value>
    </parameter>
    <parameter><valueName>datemodified</valueName><value>Mon, 01 Jan 0001 00:00:00 GMT</value></parameter>
    <area><areaDesc>Polygon</areaDesc><polygon>100</polygon></area>
  </info>
</alert>

Listing 1: Common Alerting Protocol feed via the Global Disaster Alert and Coordination System (http://www.gdacs.org/xml/gdacs_cap.xml, 2014-07-16)

complex queries on RDF data, but they are not always available. While endpoints are more convenient for clients, individual requests are considerably more expensive for servers. Alternatively, a data dump allows you to query locally.

Linked Data Fragments (Verborgh et al. 2014) provide a uniform view on all such possible interfaces to Linked Data by describing each specific type of interface by the kind of fragments through which it allows access to the dataset. Each fragment consists of three parts:

data: all triples of this dataset that match a specific selector;

metadata: triples that describe the dataset and/or the Linked Data Fragment;

controls: hypermedia links and/or forms that lead to other Linked Data Fragments.

This view makes it possible to describe new interfaces with different trade-off combinations. One such interface is triple pattern fragments (Verborgh et al. 2014), which enables users to host Linked Data on low-cost servers with higher availability than public SPARQL endpoints. Such a lightweight mechanism is ideal to expose live disaster monitoring data.
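The three-part structure of a fragment can be illustrated over an in-memory set of triples; this is purely didactic and not the actual Linked Data Fragments server implementation.

```javascript
// Sketch of a triple pattern fragment: given a pattern with optional
// subject/predicate/object components, return the three fragment parts
// (data, metadata, controls) over an in-memory dataset.
function triplePatternFragment(dataset, pattern) {
  var data = dataset.filter(function(t) {
    return (!pattern.s || t.s === pattern.s) &&
           (!pattern.p || t.p === pattern.p) &&
           (!pattern.o || t.o === pattern.o);
  });
  return {
    data: data,                                   // matching triples
    metadata: { totalTriples: data.length },      // describes the fragment
    controls: { template: '/fragment{?s,p,o}' }   // hypermedia form to others
  };
}

var dataset = [
  { s: 'ex:Gonzalo',  p: 'rdf:type',             o: 'ex:TropicalCyclone' },
  { s: 'ex:Gonzalo',  p: 'ex:relatedMediaItems', o: '_:video1' },
  { s: 'ex:Rammasun', p: 'rdf:type',             o: 'ex:TropicalCyclone' }
];
var fragment = triplePatternFragment(dataset, { p: 'rdf:type' });
console.log(fragment.metadata.totalTriples); // 2
```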

3 Proposed Methodology

3.1 Leveraging Wikipedia Link Structure

Wikipedia is an international online encyclopedia, currently available in 287 languages,14 with these characteristics:

1. Articles in one language are interlinked with versions of the same article in other languages; e.g., the article "Natural disaster" on the English Wikipedia (http://en.wikipedia.org/wiki/Natural_disaster) links to 74 versions of this article in different languages.15 We note that there exist similarities and differences among Wikipedias, with "salient information" that is unique to each language, as well as more widely shared facts (Bao et al. 2012).

2. Each article can have redirects, i.e., alternative URLs that point to the article. For the English "Natural disaster" article, there are eight redirects,16 e.g., "Natural Hazard" (synonym), "Examples of natural disaster" (refinement), or "Natural disasters" (plural).

3. For each article, the list of back links that link to the current article is available, i.e., inbound links other than redirects. The article "Natural disaster" has more than 500 articles that link to it.17 Likewise, the list of outbound links, i.e., other articles that the current article links to, is available.18

By combining an article's in- and outbound links, we determine the set of mutual links, i.e., the set of articles that the current article links to (outbound links) and at the same time receives links from (inbound links).

14 All Wikipedias: http://meta.wikimedia.org/wiki/List_of_Wikipedias
15 Article language links: http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&lllimit=max&titles=Natural_disaster
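The mutual-link computation described above is a plain set intersection of inbound and outbound link titles, which can be sketched as:

```javascript
// Mutual links are the intersection of an article's inbound and outbound
// link sets: articles it links to that also link back to it.
function mutualLinks(inbound, outbound) {
  var inboundSet = {};
  inbound.forEach(function(title) { inboundSet[title] = true; });
  return outbound.filter(function(title) { return inboundSet[title]; });
}

var inbound = ['2014 Pacific typhoon season', 'Tropical storm', 'Disaster'];
var outbound = ['2014 Pacific typhoon season', 'Cyclone'];
console.log(mutualLinks(inbound, outbound)); // [ '2014 Pacific typhoon season' ]
```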

3.2 Identification of Wikipedia Articles for Monitoring

Starting with the well-curated English seed article "Natural disaster", we programmatically follow each of the therein contained links of type "Main article", which leads to an exhaustive list of English articles on concrete types of disasters, e.g., "Tsunami" (http://en.wikipedia.org/wiki/Tsunami), "Flood" (http://en.wikipedia.org/wiki/Flood), "Earthquake" (http://en.wikipedia.org/wiki/Earthquake), etc. In total, we obtain links to 20 English articles about different types of disasters.19 For each of these English disaster articles, we obtain all versions of the article in different languages [step (i) above], and, for the resulting list of international articles, in turn all their redirect URLs [step (ii) above]. The intermediate result is a complete list of all (currently 1,270) articles in all Wikipedia languages, and all their redirects, that have any type of disaster as their subject. We call this list the "disasters list" and make it publicly available in different formats (.txt, .tsv, and .json), where the JSON version is the most flexible and recommended one.20

Finally, we obtain for each of the 1,270 articles in the "disasters list" all their back links, i.e., their inbound links [step (iii) above], which serves to detect instances of disasters unknown at system creation time. For example, the article "Typhoon Rammasun (2014)" (http://en.wikipedia.org/wiki/Typhoon_Rammasun_(2014))—which, as a concrete instance of a disaster of type tropical cyclone, is not contained in our "disasters list"—links back to "Tropical cyclone" (http://en.wikipedia.org/wiki/Tropical_cyclone), so we can identify "Typhoon Rammasun (2014)" as related to tropical cyclones (but not necessarily identify it as a tropical cyclone), even if at the system's creation time the typhoon did not exist yet. Analogous to the inbound links, we obtain all outbound links of all articles in the "disasters list"; e.g., "Tropical cyclone" has an outbound link to "2014 Pacific typhoon season" (http://en.wikipedia.org/wiki/2014_Pacific_typhoon_season), which also happens to be an inbound link of "Tropical cyclone", so we have detected a mutual, circular link structure. Figure 1 shows the example in its entirety, from the seed level, to the disaster type level, to the in-/outbound link level. The end result is a large list, called the "monitoring list", of all articles in all Wikipedia languages that are somehow—via a redirect, inbound, or outbound link (or resulting mutual link)—related to any of the articles in the "disasters list". We make a snapshot of this dynamic "monitoring list" available for reference,21 but note that it will soon be out-of-date and should be regenerated on a regular basis. The current version holds 141,001 different articles.

16 Article redirects: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&blfilterredir=redirects&bllimit=max&bltitle=Natural_disaster
17 Article inbound links: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&bllimit=max&blnamespace=0&bltitle=Natural_disaster
18 Article outbound links: http://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&format=json&pllimit=max&titles=Natural_disaster
19 "Avalanche", "Blizzard", "Cyclone", "Drought", "Earthquake", "Epidemic", "Extratropical cyclone", "Flood", "Gamma-ray burst", "Hail", "Heat wave", "Impact event", "Limnic eruption", "Meteorological disaster", "Solar flare", "Tornado", "Tropical cyclone", "Tsunami", "Volcanic eruption", "Wildfire"
20 "Disasters list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-natural-disaster-detection/data/disasters-list.json
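The "monitoring list" construction can be sketched as a pure expansion over pre-fetched link data. In the real system, the inbound and outbound maps are filled via the Wikipedia API calls given in the footnotes; the function name and data shapes below are illustrative assumptions.

```javascript
// Sketch: expand a "disasters list" into a "monitoring list" keyed by
// article title, recording for each candidate article which disaster-type
// article it is linked to and in which direction.
function buildMonitoringList(disastersList, links) {
  var monitoringList = {};
  disastersList.forEach(function(article) {
    var inbound = links.inbound[article] || [];
    var outbound = links.outbound[article] || [];
    inbound.forEach(function(title) {
      monitoringList[title] = monitoringList[title] || { roles: [] };
      monitoringList[title].roles.push({ via: article, type: 'inbound' });
    });
    outbound.forEach(function(title) {
      monitoringList[title] = monitoringList[title] || { roles: [] };
      monitoringList[title].roles.push({ via: article, type: 'outbound' });
    });
  });
  return monitoringList;
}

var list = buildMonitoringList(['en:Tropical cyclone'], {
  inbound: { 'en:Tropical cyclone': ['en:Typhoon Rammasun (2014)',
                                     'en:2014 Pacific typhoon season'] },
  outbound: { 'en:Tropical cyclone': ['en:2014 Pacific typhoon season'] }
});
// "en:2014 Pacific typhoon season" gets both roles, i.e. it is a mutual link
console.log(list['en:2014 Pacific typhoon season'].roles.length); // 2
```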

3.3 Monitoring Process

In the past, we have worked on a Server-Sent Events (SSE) API (Steiner 2014a) capable of monitoring realtime editing activity on all language versions of Wikipedia. This API allows us to easily analyze Wikipedia edits by reacting to events fired by the API. Whenever an edit event occurs, we check if it is for one of the articles on our "monitoring list". We keep track of the historic one-day-window editing activity for each article on the "monitoring list", including their versions in other languages, and, upon a sudden spike of editing activity, trigger an alert about a potential new instance of a disaster type that the spiking article is an inbound or outbound link of (or both). To illustrate this: if, e.g., the German article "Pazifische Taifunsaison 2014", including all of its language links, is spiking, we can infer that this is related to a disaster of type "Tropical cyclone", due to the detected mutual link structure mentioned earlier (Figure 1).

In order to detect spikes, we apply exponential smoothing to the last n edit intervals (we require n ≥ 5) that occurred in the past 24 hours, with a smoothing factor α = 0.5. The edit events required for this are retrieved programmatically via the Wikipedia API.22 As a spike occurs when an edit interval gets "short enough" compared to historic editing activity, we report a spike whenever the latest edit interval is shorter than half a standard deviation, 0.5 × σ.

21 "Monitoring list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list.json
22 Wikipedia last revisions: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=6&rvprop=timestamp|user&titles=Typhoon_Rammasun_(2014)

Figure 1: Extracted Wikipedia link structure (tiny excerpt), starting from the seed article "Natural disaster". The diagram spans the seed level, the disaster type level, and the in-/outbound link level, with nodes such as en:Natural disaster (seed article), en:Tropical cyclone, en:Typhoon Rammasun (2014), en:2014 Pacific typhoon season, en:Tropical storm, en:Disaster preparedness, and de:Pazifische Taifunsaison 2014, connected by redirect, language, inbound, outbound, and mutual links.
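One possible reading of this spike test can be sketched as follows (interval lengths in milliseconds; the deployed parameters and exact threshold rule may differ): the historic intervals are exponentially smoothed with α = 0.5, and a spike is reported when the latest interval undercuts the smoothed expectation by more than 0.5 × σ.

```javascript
// Sketch of the spike test over edit intervals from the past 24 hours.
function isSpiking(historicIntervalsMs, latestIntervalMs) {
  if (historicIntervalsMs.length < 5) return false; // we require n >= 5
  var alpha = 0.5;
  // exponentially smoothed "expected" interval over the historic window
  var smoothed = historicIntervalsMs[0];
  for (var i = 1; i < historicIntervalsMs.length; i++) {
    smoothed = alpha * historicIntervalsMs[i] + (1 - alpha) * smoothed;
  }
  // standard deviation of the historic intervals
  var mean = historicIntervalsMs.reduce(function(a, b) { return a + b; }, 0) /
      historicIntervalsMs.length;
  var sigma = Math.sqrt(historicIntervalsMs.reduce(function(acc, x) {
    return acc + (x - mean) * (x - mean);
  }, 0) / historicIntervalsMs.length);
  // spike: the latest interval is more than half a standard deviation
  // shorter than the smoothed expectation
  return latestIntervalMs < smoothed - 0.5 * sigma;
}

// Five roughly hourly intervals, then an edit only one second later: spiking.
console.log(isSpiking([3600000, 3500000, 3700000, 3600000, 3400000], 1000));
```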

A subset of all Wikipedia articles is geo-referenced,23 so when we detect a spiking article, we try to obtain geo coordinates for the article itself (e.g., "Pazifische Taifunsaison 2014") or for any of its language links that—as a consequence of the assumption in H2—may provide more local details (e.g., "2014 Pacific typhoon season" in English, or "2014年太平洋颱風季" in Chinese). We then calculate the center point of all obtained latitude/longitude pairs.
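The center point computation is the per-axis arithmetic mean of the collected coordinates; the naive sketch below ignores antimeridian wrapping, and the sample coordinates are hypothetical.

```javascript
// Naive center point of latitude/longitude pairs: arithmetic mean per axis.
// Note: coordinates that straddle the antimeridian (±180°) would need
// special handling, which this sketch omits.
function centerPoint(coordinates) {
  var sumLat = 0, sumLon = 0;
  coordinates.forEach(function(c) {
    sumLat += c.lat;
    sumLon += c.lon;
  });
  return { lat: sumLat / coordinates.length,
           lon: sumLon / coordinates.length };
}

var center = centerPoint([
  { lat: 15.0, lon: 130.0 }, // hypothetical article coordinates
  { lat: 17.0, lon: 134.0 }
]);
console.log(center); // { lat: 16, lon: 132 }
```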


extimestamp 1414638901000

extype video

exuserProfileUrl httptwittercomalejandroriano

exmicropost [

exhtml Herersquos Hurricane Gonzalo as seen from the Space_Station as it orbited above today httpstcoRpJt0P2bXa

explainText Herersquos Hurricane Gonzalo as seen from the Space_Station as it orbited above today ]_photo1 exmediaUrl httpsuploadwikimediaorgwikipediacommonsbbbSchiffsanleger_Wittenbergen_-_Orkan_Gonzalojpg

exmicropostUrl httpscommonswikimediaorgwikiFileSchiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22102014)_01jpg

exposterUrl httpsuploadwikimediaorgwikipediacommonsthumbbbbSchiffsanleger_Wittenbergen_-_Orkan_Gonzalo_

282210201429_01jpg500px-Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22102014)_01jpg

expublicationDate 2014-10-24T084016Z

exsocialInteractions [ exshares 0 ]

extimestamp 1414140016000

extype photo

exuserProfileUrl httpscommonswikimediaorgwikiUserHuhu Uet

exmicropost [

exhtml Schiffsanleger Wittenbergen - Orkan Gonzalo (22102014) 01

explainText Schiffsanleger Wittenbergen - Orkan Gonzalo (22102014) 01 ]

Listing 2 Exemplary Linked Data for Hurricane Gonzalo using a yet to-be-defined vocabulary (potentially HXL httphxl

humanitarianresponseinfonsindexhtml or MOAC httpobservedchangecommoacns) that interlinks the disaster withseveral other Linked Data sources and relates it to multimedia content on online social networking sites

4 Proposed Steps Toward an EvaluationWe recall our core research questions that were Q1 Howtimely and accurate for the purpose of disaster detection andongoing monitoring is content from Wikipedia compared toauthoritative sources mentioned above and Q2 Does thedisambiguated nature of Wikipedia surpass keyword-baseddisaster detection approaches eg via online social net-working sites or search logs Regarding Q1 only a manualcomparison covering several months worth of disaster dataof the relevant authoritative data sources mentioned in Sec-tion 11 with the output of our system can help respond to thequestion Regarding Q2 we propose an evaluation strategyfor the OSN site Twitter loosely inspired by the approachof Sakaki et al in (Sakaki Okazaki and Matsuo 2010) Wechoose Twitter as a data source due to the publicly avail-able user data through its streaming APIs34 which wouldbe considerably harder if not impossible with other OSNsor search logs due to privacy concerns and API limitationsBased on the articles in the ldquomonitoring listrdquo we put forwardusing article titles as search terms but without disambigua-tion hints in parentheses eg instead of the complete articletitle ldquoTyphoon Rammasun (2014)rdquo we suggest using ldquoTy-phoon Rammasunrdquo alone We advise monitoring the sample

34Twitter streaming APIs httpsdevtwittercomdocs

streaming-apisstreamspublic

stream35 for the appearance of any of the search terms asthe filtered stream36 is too limited regarding the number ofsupported search terms In order to avoid ambiguity issueswith the international multi-language tweet stream we rec-ommend matching search terms only if the Twitter-detectedtweet language equals the search termrsquos language eg En-glish as in ldquoTyphoon Rammasunrdquo

5 Conclusions and Future WorkIn this paper we have presented the first steps of our ongoingresearch on the creation of a Wikipedia-based disaster mon-itoring system In particular we finished its underlying codescaffolding and connected the system to several online so-cial networking sites allowing for the automatic generationof media galleries Further we propose to publish data aboutdetected and monitored disasters as live queryable LinkedData which can be made accessible in a scalable and adhoc manner using triple pattern fragments (Verborgh et al2014) by leveraging free cloud hosting offers (Matteis andVerborgh 2014) While the system itself already functionsa good chunk of work still lies ahead with the fine-tuning of

35Twitter sample stream httpsdevtwittercomdocsapi

11getstatusessample36Twitter filtered stream httpsdevtwittercomdocsapi

11poststatusesfilter

its parameters A first examples are the exponential smooth-ing parameters of the revision intervals responsible for de-termining whether an article is spiking and thus a potentialnew disaster or not A second example is the role that dis-asters play with articles they can be inbound outbound ormutual links and their importance for actual occurrences ofdisasters will vary Future work will mainly focus on findinganswers to our research questions Q1 and Q2 and the verifi-cation of the hypotheses H1ndashH3 We will focus on the eval-uation of the systemrsquos usefulness accuracy and timelinessin comparison to other keyword-based approaches An inter-esting aspect of our work is that the monitoring system is notlimited to disasters Using an analogous approach we canmonitor for human-made disasters (called ldquoAnthropogenichazardrdquo on Wikipedia) like terrorism war power outagesair disasters etc We have created an exemplary ldquomonitor-ing listrdquo and made it available37

Concluding we are excited about this research and lookforward to putting the final system into operational practicein the weeks and months to come Be safe

References[Abel et al 2012] Abel F Hauff C Houben G-JStronkman R and Tao K 2012 Twitcident Fighting Firewith Information from Social Web Streams In Proceedingsof the 21st International Conference Companion on WorldWide Web WWW rsquo12 Companion 305ndash308 New YorkNY USA ACM

[Bao et al 2012] Bao P Hecht B Carton S Quaderi MHorn M and Gergle D 2012 Omnipedia Bridging thewikipedia language gap In Proceedings of the SIGCHI Con-ference on Human Factors in Computing Systems CHI rsquo121075ndash1084 New York NY USA ACM

[Berners-Lee Fielding and Masinter 2005] Berners-Lee TFielding R T and Masinter L 2005 Uniform ResourceIdentifier (URI) Generic Syntax RFC 3986 IETF

[Berners-Lee 2006] Berners-Lee T 2006 Linked Datahttpwwww3orgDesignIssuesLinkedDatahtml

[Falor 2011] Falor R 2011 Search data re-veals people turn to the internet in criseshttpbloggoogleorg201108search-data-reveals-people-turn-tohtml

[Fielding et al 1999] Fielding R Gettys J Mogul JFrystyk H Masinter L Leach P and Berners-Lee T1999 Hypertext Transfer Protocol ndash HTTP11 RFC 2616IETF

[Goodchild and Glennon 2010] Goodchild M F and Glen-non J A 2010 Crowdsourcing Geographic Informationfor Disaster Response A Research Frontier InternationalJournal of Digital Earth 3(3)231ndash241

[Heath and Bizer 2011] Heath T and Bizer C 2011Linked Data Evolving the Web into a Global Data Space

37Anthropogenic hazard ldquomonitoring listrdquo httpsgithub

comtomayacpostdocblobmasterpaperscomprehensive-

wikipedia-monitoring-for-global-and-realtime-disaster-

detectiondatamonitoring-list-anthropogenic-hazardjson

Synthesis Lectures on the Semantic Web Theory and Tech-nology Morgan amp Claypool

[Klyne and Carroll 2004] Klyne G and Carroll J J 2004Resource Description Framework (RDF) Concepts and Ab-stract Syntax Recommendation W3C

[Laframboise and Loko 2012] Laframboise N and LokoB 2012 Natural Disasters Mitigating Impact Man-aging Risks IMF Working Paper International Mon-etary Fund httpwwwimforgexternalpubsftwp

2012wp12245pdf[Lin and Mishne 2012] Lin J and Mishne G 2012 AStudy of ldquoChurnrdquo in Tweets and Real-Time Search Queries(Extended Version) CoRR abs12056855

[Liu et al 2008] Liu S B Palen L Sutton J HughesA L and Vieweg S 2008 In Search of the Bigger Pic-ture The Emergent Role of On-line Photo Sharing in Timesof Disaster In Proceedings of the Information Systems forCrisis Response and Management Conference (ISCRAM)

[Matteis and Verborgh 2014] Matteis L and Verborgh R2014 Hosting queryable and highly available Linked Datafor free In Proceedings of the ISWC Developers Workshop2014 co-located with the 13th International Semantic WebConference (ISWC 2014) Riva del Garda Italy October 192014 13ndash18

[Okolloh 2009] Okolloh O 2009 Ushahidi or ldquotestimonyrdquoWeb 20 Tools for Crowdsourcing Crisis Information Par-ticipatory learning and action 59(1)65ndash70

[Ortmann et al 2011] Ortmann J Limbu M Wang Dand Kauppinen T 2011 Crowdsourcing Linked OpenData for Disaster Management In Gruumltter R Kolas DKoubarakis M and Pfoser D eds Proceedings of theTerra Cognita Workshop on Foundations Technologies andApplications of the Geospatial Web In conjunction with theInternational Semantic Web Conference (ISWC2011) vol-ume 798 11ndash22 Bonn Germany CEUR Workshop Pro-ceedings

[Sakaki Okazaki and Matsuo 2010] Sakaki T OkazakiM and Matsuo Y 2010 Earthquake Shakes Twitter UsersReal-time Event Detection by Social Sensors In Proceed-ings of the 19th International Conference on World WideWeb WWW rsquo10 851ndash860 New York NY USA ACM

[Steiner et al 2013] Steiner T Verborgh R Gabarro JMannens E and Van de Walle R 2013 Clustering Me-dia Items Stemming from Multiple Social Networks TheComputer Journal

[Steiner 2014a] Steiner T 2014a Bots vs WikipediansAnons vs Logged-Ins (Redux) A Global Study of Edit Ac-tivity on Wikipedia and Wikidata In Proceedings of TheInternational Symposium on Open Collaboration OpenSymrsquo14 251ndash257 New York NY USA ACM

[Steiner 2014b] Steiner T 2014b Comprehensive Wiki-pedia monitoring for global and realtime natural disaster de-tection In Proceedings of the ISWC Developers Workshop2014 co-located with the 13th International Semantic WebConference (ISWC 2014) Riva del Garda Italy October 192014 86ndash95

[Steiner 2014c] Steiner T 2014c Enriching UnstructuredMedia Content About Events to Enable Semi-AutomatedSummaries Compilations and Improved Search by Lever-aging Social Networks PhD Dissertation UniversitatPolitegravecnica de Catalunya

[Verborgh et al 2014] Verborgh R Hartig O De MeesterB Haesendonck G De Vocht L Vander Sande M Cy-ganiak R Colpaert P Mannens E and Van de Walle R2014 Querying datasets on the web with high availability InMika P Tudorache T Bernstein A Welty C KnoblockC Vrandecic D Groth P Noy N Janowicz K andGoble C eds The Semantic Web ndash ISWC 2014 volume8796 of Lecture Notes in Computer Science Springer Inter-national Publishing 180ndash196

[Westfall 2010] Westfall J 2010 Common Alerting Pro-tocol Version 12 Standard OASIS httpdocsoasis-

openorgemergencycapv12CAP-v12-osdoc


with “salient information” that is unique to each language, as well as more widely shared facts (Bao et al. 2012).

2. Each article can have redirects, i.e., alternative URLs that point to the article. For the English “Natural disaster” article, there are eight redirects,16 e.g., “Natural Hazard” (synonym), “Examples of natural disaster” (refinement), or “Natural disasters” (plural).

3. For each article, the list of back links that link to the current article is available, i.e., inbound links other than redirects. The article “Natural disaster” has more than 500 articles that link to it.17 Likewise, the list of outbound links, i.e., other articles that the current article links to, is available.18

By combining an article’s in- and outbound links, we determine the set of mutual links, i.e., the set of articles that the current article links to (outbound links) and at the same time receives links from (inbound links).
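The mutual-link computation described above is a plain set intersection. A minimal sketch in JavaScript; the helper name and the "language:title" input format are our own illustration, not code from the paper:

```javascript
// Given an article's inbound and outbound links (arrays of
// "language:title" strings), the mutual links are the articles
// that appear in both lists.
function mutualLinks(inboundLinks, outboundLinks) {
  const inbound = new Set(inboundLinks);
  return outboundLinks.filter(function(title) {
    return inbound.has(title);
  });
}
```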

3.2 Identification of Wikipedia Articles for Monitoring

Starting with the well-curated English seed article “Natural disaster,” we programmatically follow each of the therein contained links of type “Main article,” which leads to an exhaustive list of English articles on concrete types of disasters, e.g., “Tsunami” (http://en.wikipedia.org/wiki/Tsunami), “Flood” (http://en.wikipedia.org/wiki/Flood), “Earthquake” (http://en.wikipedia.org/wiki/Earthquake), etc. In total, we obtain links to 20 English articles about different types of disasters.19 For each of these English disaster articles, we obtain all versions of the article in different languages [step (i) above] and, for the resulting list of international articles, in turn all their redirect URLs [step (ii) above]. The intermediate result is a complete list of all (currently 1,270) articles in all Wikipedia languages, and all their redirects, that have any type of disaster as their subject. We call this list the “disasters list” and make it publicly available in different formats (txt, tsv, and json), of which the JSON version is the most flexible and the recommended one.20

Finally, we obtain for each of the 1,270 articles in the “disasters list” all their back links, i.e., their inbound links [step (iii) above], which serves to detect instances of disasters unknown at system creation time. For example, the article “Typhoon Rammasun (2014)” (http://en.wikipedia.org/wiki/Typhoon_Rammasun_(2014)), which, as a concrete instance of a disaster of type tropical cyclone, is not contained in our “disasters list,” links back to “Tropical cyclone” (http://en.wikipedia.org/wiki/Tropical_cyclone), so we can identify “Typhoon Rammasun (2014)” as related to tropical cyclones (but not necessarily identify it as a tropical cyclone), even if at the system’s creation time the typhoon did not exist yet. Analogous to the inbound links, we obtain all outbound links of all articles in the “disasters list”; e.g., “Tropical cyclone” has an outbound link to “2014 Pacific typhoon season” (http://en.wikipedia.org/wiki/2014_Pacific_typhoon_season), which also happens to be an inbound link of “Tropical cyclone,” so we have detected a mutual, circular link structure. Figure 1 shows the example in its entirety, from the seed level to the disaster type level to the in-/outbound link level. The end result is a large list, called the “monitoring list,” of all articles in all Wikipedia languages that are somehow related (via a redirect, inbound, or outbound link, or a resulting mutual link) to any of the articles in the “disasters list.” We make a snapshot of this dynamic “monitoring list” available for reference,21 but note that it will soon be out of date and should be regenerated on a regular basis. The current version holds 141,001 different articles.

16 Article redirects: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&blfilterredir=redirects&bllimit=max&bltitle=Natural_disaster
17 Article inbound links: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&bllimit=max&blnamespace=0&bltitle=Natural_disaster
18 Article outbound links: http://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&format=json&pllimit=max&titles=Natural_disaster
19 “Avalanche,” “Blizzard,” “Cyclone,” “Drought,” “Earthquake,” “Epidemic,” “Extratropical cyclone,” “Flood,” “Gamma-ray burst,” “Hail,” “Heat wave,” “Impact event,” “Limnic eruption,” “Meteorological disaster,” “Solar flare,” “Tornado,” “Tropical cyclone,” “Tsunami,” “Volcanic eruption,” “Wildfire”
20 “Disasters list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-natural-disaster-detection/data/disasters-list.json
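The link retrieval above reduces to parameterized calls against the MediaWiki API, as shown in footnotes 16–18. As a small sketch, a backlinks request URL can be assembled like this; the helper function is our own illustration (the parameter names are those from footnote 17, not an invention):

```javascript
// Build the MediaWiki API URL that lists an article's inbound
// links (backlinks), as used to grow the "monitoring list".
function backlinksUrl(language, title) {
  const params = {
    action: 'query',
    list: 'backlinks',
    bllimit: 'max',
    blnamespace: '0',
    bltitle: title,
    format: 'json'
  };
  const query = Object.keys(params).map(function(key) {
    return key + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return 'https://' + language + '.wikipedia.org/w/api.php?' + query;
}
```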

3.3 Monitoring Process

In the past, we have worked on a Server-Sent Events (SSE) API (Steiner 2014a) capable of monitoring realtime editing activity on all language versions of Wikipedia. This API allows us to easily analyze Wikipedia edits by reacting to events fired by the API. Whenever an edit event occurs, we check whether it concerns one of the articles on our “monitoring list.” We keep track of the historic one-day-window editing activity for each article on the “monitoring list,” including their versions in other languages, and, upon a sudden spike of editing activity, trigger an alert about a potential new instance of a disaster type that the spiking article is an inbound or outbound link of (or both). To illustrate this: if, e.g., the German article “Pazifische Taifunsaison 2014,” including all of its language links, is spiking, we can infer that this is related to a disaster of type “Tropical cyclone,” due to the detected mutual link structure mentioned earlier (Figure 1).

In order to detect spikes, we apply exponential smoothing to the last n edit intervals (we require n ≥ 5) that occurred in the past 24 hours, with a smoothing factor α = 0.5. The required edit events are retrieved programmatically via the Wikipedia API.22 As a spike occurs when an edit interval gets “short enough” compared to historic editing activity, we report a spike whenever the latest edit interval is shorter than half a standard deviation, 0.5 × σ.

21 “Monitoring list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list.json
22 Wikipedia last revisions: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=6&rvprop=timestamp|user&titles=Typhoon_Rammasun_(2014)

[Figure 1: Extracted Wikipedia link structure (tiny excerpt) starting from the seed article “Natural disaster.” The figure shows the seed level, the disaster type level, and the in-/outbound link level, with English and German nodes such as en:“Natural disaster,” en:“Tropical cyclone,” en:“2014 Pacific typhoon season,” en:“Typhoon Rammasun (2014),” de:“Tropischer Wirbelsturm,” and de:“Pazifische Taifunsaison 2014,” connected by redirect, language, inbound, outbound, and mutual links.]
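Under our reading of this description, the spike test can be sketched as follows. The function name, the input format (edit intervals in seconds, oldest first, latest last), and the exact way the standard deviation is taken over the smoothed series are our assumptions, not the paper's code:

```javascript
// Sketch of the spike test: exponentially smooth the last n edit
// intervals and report a spike when the latest interval is shorter
// than half a standard deviation (0.5 * sigma) of the smoothed series.
function isSpiking(editIntervals, alpha) {
  alpha = alpha || 0.5;
  if (editIntervals.length < 5) return false;  // require n >= 5
  // exponential smoothing of the interval series
  const smoothed = [editIntervals[0]];
  for (let i = 1; i < editIntervals.length; i++) {
    smoothed.push(alpha * editIntervals[i] + (1 - alpha) * smoothed[i - 1]);
  }
  const mean = smoothed.reduce((a, b) => a + b) / smoothed.length;
  const variance = smoothed.reduce(
      (a, b) => a + Math.pow(b - mean, 2), 0) / smoothed.length;
  const sigma = Math.sqrt(variance);
  // spike: latest raw interval shorter than 0.5 * sigma
  return editIntervals[editIntervals.length - 1] < 0.5 * sigma;
}
```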

A subset of all Wikipedia articles are geo-referenced,23 so when we detect a spiking article, we try to obtain geo coordinates for the article itself (e.g., “Pazifische Taifunsaison 2014”) or any of its language links, which, as a consequence of the assumption in H2, may provide more local details (e.g., “2014 Pacific typhoon season” in English or “2014年太平洋颱風季” in Chinese). We then calculate the center point of all obtained latitude/longitude pairs.
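The center-point step is a plain arithmetic mean over the collected coordinate pairs. A sketch with a helper of our own (a naive mean; coordinate sets spanning the antimeridian would need special handling):

```javascript
// Average the latitude/longitude pairs obtained for a spiking
// article and its language links into a single center point.
function centerPoint(coordinates) {
  const sum = coordinates.reduce(function(acc, c) {
    return { lat: acc.lat + c.lat, lng: acc.lng + c.lng };
  }, { lat: 0, lng: 0 });
  return {
    lat: sum.lat / coordinates.length,
    lng: sum.lng / coordinates.length
  };
}
```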

3.4 Multimedia Content from Online Social Networking Sites

In the past, we have worked on an application called Social Media Illustrator (Steiner 2014c) that provides a social multimedia search framework enabling the search for, and extraction of, multimedia data from the online social networking sites Google+,24 Facebook,25 Twitter,26 Instagram,27 YouTube,28 Flickr,29 MobyPicture,30 TwitPic,31 and Wikimedia Commons.32 In a first step, it deduplicates exact and near-duplicate social multimedia data, based on a previously described algorithm (Steiner et al. 2013). It then ranks the social multimedia data by social signals (Steiner 2014c), based on an abstraction layer on top of the online social networking sites mentioned above, and in a final step allows for the creation of media galleries following aesthetic principles (Steiner 2014c) of the two kinds Strict Order, Equal Size and Loose Order, Varying Size, as defined in (Steiner 2014c). We have ported crucial parts of the code of Social Media Illustrator from the client side to the server side, which now enables us to create media galleries at scale and on demand, based on the titles of spiking Wikipedia articles, used as separate search terms for each language. The social media content therefore does not have to link to Wikipedia. One exemplary media gallery can be seen in Figure 2; each individual media item in the gallery is clickable and links back to the original post on the particular online social networking site, allowing crisis responders to monitor the media gallery as a whole, to investigate interesting media items at the source, and potentially to get in contact with the originator.

23 Article geo coordinates: http://en.wikipedia.org/w/api.php?action=query&prop=coordinates&format=json&colimit=max&coprop=dim|country|region|globe&coprimary=all&titles=September_11_attacks
24 Google+: https://plus.google.com/
25 Facebook: https://www.facebook.com/
26 Twitter: https://twitter.com/
27 Instagram: http://instagram.com/
28 YouTube: http://www.youtube.com/
29 Flickr: http://www.flickr.com/
30 MobyPicture: http://www.mobypicture.com/
31 TwitPic: http://twitpic.com/
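Social Media Illustrator's actual ranking is described in (Steiner 2014c); purely as a rough illustration of ranking media items by social signals, a naive scoring-and-sorting step could look like this (the item fields `likes` and `shares` are our own invented abstraction, not the tool's real data model):

```javascript
// Naive illustration: score each media item by its likes plus
// shares and return a new array sorted by descending score.
function rankBySocialSignals(mediaItems) {
  return mediaItems.slice().sort(function(a, b) {
    const scoreA = (a.likes || 0) + (a.shares || 0);
    const scoreB = (b.likes || 0) + (b.shares || 0);
    return scoreB - scoreA;
  });
}
```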

3.5 Linked Data Publication

In a final step, once a given confidence threshold has been reached and upon human inspection, we plan to send out a notification according to the Common Alerting Protocol, following the format that (for GDACS) can be seen in Listing 1. While Common Alerting Protocol messages are generally well understood, additional synergies can be unlocked by leveraging Linked Data sources like DBpedia, Wikidata, and Freebase, and by interlinking them with detected, potentially relevant multimedia data from online social networking sites. Listing 2 shows an early-stage proposal for doing so. The alerts can be exposed as triple pattern fragments to enable live querying at low cost; this can also include push, pull, and streaming models, as Linked Data Fragments (Verborgh et al. 2014) allow for all of them. A further approach consists in converting CAP messages to Linked Data by transforming the CAP eXtensible Markup Language (XML) format to the Resource Description Framework (RDF) and publishing the result.

32 Wikimedia Commons: http://commons.wikimedia.org/wiki/Main_Page

3.6 Implementation Details

We have created a publicly available prototypal demo application, deployed33 at http://disaster-monitor.herokuapp.com/, that internally connects to the SSE API from (Steiner 2014a). It is implemented in Node.js on the server and as a JavaScript Web application on the client. The application uses an hourly refreshed version of the “monitoring list” from Section 3.2, and whenever an edit event sent through the SSE API matches any of the articles in the list, it checks whether, given this article’s and its language links’ edit history of the past 24 hours, the current edit event shows spiking behavior, as outlined in Section 3.3. The core source code of the monitoring loop can be seen in Listing 3; a screenshot of the application is shown in Figure 2.

33 Source code: https://github.com/tomayac/postdoc/tree/master/demos/disaster-monitor

Figure 2: Screenshot of the Disaster Monitor application prototype, available at http://disaster-monitor.herokuapp.com/, showing detected past disasters on a heatmap and a media gallery for a currently spiking disaster around “Hurricane Gonzalo.”

(function() {
  // fired whenever an edit event happens on any Wikipedia
  var parseWikipediaEdit = function(data) {
    var article = data.language + ':' + data.article;
    var disasterObj = monitoringList[article];
    // the article is on the monitoring list
    if (disasterObj) {
      showCandidateArticle(data.article, data.language, disasterObj);
    }
  };

  // fired whenever an article is on the monitoring list
  var showCandidateArticle = function(article, language, roles) {
    getGeoData(article, language, function(err, geoData) {
      getRevisionsData(article, language, function(err, revisionsData) {
        if (revisionsData.spiking) {
          // spiking article
        }
        if (geoData.averageCoordinates.lat) {
          // geo-referenced article, create map
        }
        // trigger alert if article is spiking
      });
    });
  };

  // get the initial monitoring list
  getMonitoringList(seedArticle, function(err, data) {
    if (err) return console.log('Error initializing the app');
    monitoringList = data;
    console.log('Monitoring ' + Object.keys(monitoringList).length +
        ' candidate Wikipedia articles');
    // start monitoring process once we have a monitoring list
    var wikiSource = new EventSource(wikipediaEdits);
    wikiSource.addEventListener('message', function(e) {
      return parseWikipediaEdit(JSON.parse(e.data));
    });
    // auto-refresh monitoring list every hour
    setInterval(function() {
      getMonitoringList(seedArticle, function(err, data) {
        if (err) return console.log('Error refreshing monitoring list');
        monitoringList = data;
        console.log('Monitoring ' + Object.keys(monitoringList).length +
            ' candidate Wikipedia articles');
      });
    }, 1000 * 60 * 60);
  });
})();

Listing 3: Monitoring loop of the disaster monitor

<http://ex.org/disaster/en/Hurricane_Gonzalo> owl:sameAs
    <http://en.wikipedia.org/wiki/Hurricane_Gonzalo>,
    <http://live.dbpedia.org/page/Hurricane_Gonzalo>,
    <http://www.freebase.com/m/0123kcg5> ;
  ex:relatedMediaItems _:video1 ;
  ex:relatedMediaItems _:photo1 .

_:video1
  ex:mediaUrl <https://mtc.cdn.vine.co/r/videos/82796227091134303173323251712_2ca88ba54445116698738182474199804.mp4> ;
  ex:micropostUrl <http://twitter.com/gpessoao/status/527603540860997632> ;
  ex:posterUrl <https://v.cdn.vine.co/r/thumbs/231E0009CF1134303174572797952_25116698738182474199804.mp4.jpg> ;
  ex:publicationDate "2014-10-30T03:15:01Z" ;
  ex:socialInteractions [ ex:likes 1 ; ex:shares 0 ] ;
  ex:timestamp 1414638901000 ;
  ex:type "video" ;
  ex:userProfileUrl <http://twitter.com/alejandroriano> ;
  ex:micropost [
    ex:html "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today https://t.co/RpJt0P2bXa" ;
    ex:plainText "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today" ] .

_:photo1
  ex:mediaUrl <https://upload.wikimedia.org/wikipedia/commons/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo.jpg> ;
  ex:micropostUrl <https://commons.wikimedia.org/wiki/File:Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg> ;
  ex:posterUrl <https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_%2822.10.2014%29_01.jpg/500px-Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_%2822.10.2014%29_01.jpg> ;
  ex:publicationDate "2014-10-24T08:40:16Z" ;
  ex:socialInteractions [ ex:shares 0 ] ;
  ex:timestamp 1414140016000 ;
  ex:type "photo" ;
  ex:userProfileUrl <https://commons.wikimedia.org/wiki/User:Huhu_Uet> ;
  ex:micropost [
    ex:html "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ;
    ex:plainText "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ] .

Listing 2: Exemplary Linked Data for Hurricane Gonzalo using a yet to-be-defined vocabulary (potentially HXL, http://hxl.humanitarianresponse.info/ns/index.html, or MOAC, http://observedchange.com/moac/ns) that interlinks the disaster with several other Linked Data sources and relates it to multimedia content on online social networking sites.

4 Proposed Steps Toward an Evaluation

We recall our core research questions, which were Q1: How timely and accurate, for the purpose of disaster detection and ongoing monitoring, is content from Wikipedia compared to the authoritative sources mentioned above? and Q2: Does the disambiguated nature of Wikipedia surpass keyword-based disaster detection approaches, e.g., via online social networking sites or search logs? Regarding Q1, only a manual comparison, covering several months’ worth of disaster data, of the relevant authoritative data sources mentioned in Section 1.1 with the output of our system can help respond to the question. Regarding Q2, we propose an evaluation strategy for the OSN site Twitter, loosely inspired by the approach of Sakaki et al. in (Sakaki, Okazaki, and Matsuo 2010). We choose Twitter as a data source due to the user data publicly available through its streaming APIs,34 which would be considerably harder, if not impossible, with other OSNs or search logs, due to privacy concerns and API limitations. Based on the articles in the “monitoring list,” we put forward using article titles as search terms, but without disambiguation hints in parentheses; e.g., instead of the complete article title “Typhoon Rammasun (2014),” we suggest using “Typhoon Rammasun” alone. We advise monitoring the sample stream35 for the appearance of any of the search terms, as the filtered stream36 is too limited regarding the number of supported search terms. In order to avoid ambiguity issues with the international, multi-language tweet stream, we recommend matching search terms only if the Twitter-detected tweet language equals the search term’s language, e.g., English as in “Typhoon Rammasun.”

34 Twitter streaming APIs: https://dev.twitter.com/docs/streaming-apis/streams/public
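The recommended matching rule can be sketched as a simple predicate. The tweet's `lang` field mirrors the language attribute Twitter attaches to tweets in its streaming API; the search-term object and the substring match are our own illustration:

```javascript
// Count a tweet as a match only if it contains the search term
// and Twitter's detected tweet language equals the language of
// the Wikipedia article the term was derived from.
function matchesSearchTerm(tweet, term) {
  return tweet.lang === term.language &&
      tweet.text.toLowerCase().indexOf(term.text.toLowerCase()) !== -1;
}
```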

5 Conclusions and Future Work

In this paper, we have presented the first steps of our ongoing research on the creation of a Wikipedia-based disaster monitoring system. In particular, we finished its underlying code scaffolding and connected the system to several online social networking sites, allowing for the automatic generation of media galleries. Further, we propose to publish data about detected and monitored disasters as live, queryable Linked Data, which can be made accessible in a scalable and ad hoc manner using triple pattern fragments (Verborgh et al. 2014) by leveraging free cloud hosting offers (Matteis and Verborgh 2014). While the system itself already functions, a good chunk of work still lies ahead with the fine-tuning of its parameters. A first example is the exponential smoothing parameters of the revision intervals, responsible for determining whether an article is spiking, and thus a potential new disaster, or not. A second example is the roles that disasters play with articles: they can be inbound, outbound, or mutual links, and their importance for actual occurrences of disasters will vary. Future work will mainly focus on finding answers to our research questions Q1 and Q2 and on the verification of the hypotheses H1–H3. We will focus on the evaluation of the system’s usefulness, accuracy, and timeliness in comparison to other keyword-based approaches. An interesting aspect of our work is that the monitoring system is not limited to natural disasters: using an analogous approach, we can monitor for human-made disasters (called “Anthropogenic hazard” on Wikipedia) like terrorism, war, power outages, air disasters, etc. We have created an exemplary “monitoring list” for these and made it available.37

35 Twitter sample stream: https://dev.twitter.com/docs/api/1.1/get/statuses/sample
36 Twitter filtered stream: https://dev.twitter.com/docs/api/1.1/post/statuses/filter

Concluding, we are excited about this research and look forward to putting the final system into operational practice in the weeks and months to come. Be safe!
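As background for the Linked Data publication proposed above, the following is a minimal in-memory sketch of the single triple pattern interface that a triple pattern fragments server exposes; the `ex:`-namespaced triples echo the hypothetical Hurricane Gonzalo example from Listing 2, and a real deployment would of course serve fragments over HTTP rather than from a Python list.

```python
# Minimal in-memory sketch of the triple pattern fragments interface:
# a client asks for all triples matching one (subject, predicate, object)
# pattern, with None as a wildcard. Triples are illustrative, after Listing 2.
triples = [
    ("ex:disaster/en/Hurricane_Gonzalo", "owl:sameAs",
     "http://en.wikipedia.org/wiki/Hurricane_Gonzalo"),
    ("ex:disaster/en/Hurricane_Gonzalo", "ex:relatedMediaItems", "_:video1"),
    ("_:video1", "ex:type", "video"),
]

def fragment(s=None, p=None, o=None):
    """Answer one triple pattern query, the core Linked Data Fragments
    operation; clients evaluate richer queries by combining fragments."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

same_as = fragment(s="ex:disaster/en/Hurricane_Gonzalo", p="owl:sameAs")
video = fragment(s="_:video1")
```

Because the server only ever answers such single-pattern requests, hosting stays cheap and cacheable, which is what makes the free-hosting route of Matteis and Verborgh (2014) feasible.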

References

[Abel et al. 2012] Abel, F.; Hauff, C.; Houben, G.-J.; Stronkman, R.; and Tao, K. 2012. Twitcident: Fighting Fire with Information from Social Web Streams. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, 305–308. New York, NY, USA: ACM.

[Bao et al. 2012] Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; and Gergle, D. 2012. Omnipedia: Bridging the Wikipedia Language Gap. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, 1075–1084. New York, NY, USA: ACM.

[Berners-Lee, Fielding, and Masinter 2005] Berners-Lee, T.; Fielding, R. T.; and Masinter, L. 2005. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF.

[Berners-Lee 2006] Berners-Lee, T. 2006. Linked Data. http://www.w3.org/DesignIssues/LinkedData.html

[Falor 2011] Falor, R. 2011. Search data reveals people turn to the internet in crises. http://blog.google.org/2011/08/search-data-reveals-people-turn-to.html

[Fielding et al. 1999] Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; and Berners-Lee, T. 1999. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, IETF.

[Goodchild and Glennon 2010] Goodchild, M. F., and Glennon, J. A. 2010. Crowdsourcing Geographic Information for Disaster Response: A Research Frontier. International Journal of Digital Earth 3(3):231–241.

[Heath and Bizer 2011] Heath, T., and Bizer, C. 2011. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool.

37 Anthropogenic hazard "monitoring list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list-anthropogenic-hazard.json

[Klyne and Carroll 2004] Klyne, G., and Carroll, J. J. 2004. Resource Description Framework (RDF): Concepts and Abstract Syntax. Recommendation, W3C.

[Laframboise and Loko 2012] Laframboise, N., and Loko, B. 2012. Natural Disasters: Mitigating Impact, Managing Risks. IMF Working Paper, International Monetary Fund. http://www.imf.org/external/pubs/ft/wp/2012/wp12245.pdf

[Lin and Mishne 2012] Lin, J., and Mishne, G. 2012. A Study of "Churn" in Tweets and Real-Time Search Queries (Extended Version). CoRR abs/1205.6855.

[Liu et al. 2008] Liu, S. B.; Palen, L.; Sutton, J.; Hughes, A. L.; and Vieweg, S. 2008. In Search of the Bigger Picture: The Emergent Role of On-line Photo Sharing in Times of Disaster. In Proceedings of the Information Systems for Crisis Response and Management Conference (ISCRAM).

[Matteis and Verborgh 2014] Matteis, L., and Verborgh, R. 2014. Hosting Queryable and Highly Available Linked Data for Free. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 13–18.

[Okolloh 2009] Okolloh, O. 2009. Ushahidi, or "Testimony": Web 2.0 Tools for Crowdsourcing Crisis Information. Participatory Learning and Action 59(1):65–70.

[Ortmann et al. 2011] Ortmann, J.; Limbu, M.; Wang, D.; and Kauppinen, T. 2011. Crowdsourcing Linked Open Data for Disaster Management. In Grütter, R.; Kolas, D.; Koubarakis, M.; and Pfoser, D., eds., Proceedings of the Terra Cognita Workshop on Foundations, Technologies and Applications of the Geospatial Web, in conjunction with the International Semantic Web Conference (ISWC 2011), volume 798, 11–22. Bonn, Germany: CEUR Workshop Proceedings.

[Sakaki, Okazaki, and Matsuo 2010] Sakaki, T.; Okazaki, M.; and Matsuo, Y. 2010. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, 851–860. New York, NY, USA: ACM.

[Steiner et al. 2013] Steiner, T.; Verborgh, R.; Gabarró, J.; Mannens, E.; and Van de Walle, R. 2013. Clustering Media Items Stemming from Multiple Social Networks. The Computer Journal.

[Steiner 2014a] Steiner, T. 2014a. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration, OpenSym '14, 25:1–25:7. New York, NY, USA: ACM.

[Steiner 2014b] Steiner, T. 2014b. Comprehensive Wikipedia Monitoring for Global and Realtime Natural Disaster Detection. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 86–95.

[Steiner 2014c] Steiner, T. 2014c. Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks. Ph.D. Dissertation, Universitat Politècnica de Catalunya.

[Verborgh et al. 2014] Verborgh, R.; Hartig, O.; De Meester, B.; Haesendonck, G.; De Vocht, L.; Vander Sande, M.; Cyganiak, R.; Colpaert, P.; Mannens, E.; and Van de Walle, R. 2014. Querying Datasets on the Web with High Availability. In Mika, P.; Tudorache, T.; Bernstein, A.; Welty, C.; Knoblock, C.; Vrandečić, D.; Groth, P.; Noy, N.; Janowicz, K.; and Goble, C., eds., The Semantic Web – ISWC 2014, volume 8796 of Lecture Notes in Computer Science. Springer International Publishing. 180–196.

[Westfall 2010] Westfall, J. 2010. Common Alerting Protocol Version 1.2. Standard, OASIS. http://docs.oasis-open.org/emergency/cap/v1.2/CAP-v1.2-os.doc



[Falor 2011] Falor R 2011 Search data re-veals people turn to the internet in criseshttpbloggoogleorg201108search-data-reveals-people-turn-tohtml

[Fielding et al 1999] Fielding R Gettys J Mogul JFrystyk H Masinter L Leach P and Berners-Lee T1999 Hypertext Transfer Protocol ndash HTTP11 RFC 2616IETF

[Goodchild and Glennon 2010] Goodchild M F and Glen-non J A 2010 Crowdsourcing Geographic Informationfor Disaster Response A Research Frontier InternationalJournal of Digital Earth 3(3)231ndash241

[Heath and Bizer 2011] Heath T and Bizer C 2011Linked Data Evolving the Web into a Global Data Space

37Anthropogenic hazard ldquomonitoring listrdquo httpsgithub

comtomayacpostdocblobmasterpaperscomprehensive-

wikipedia-monitoring-for-global-and-realtime-disaster-

detectiondatamonitoring-list-anthropogenic-hazardjson

Synthesis Lectures on the Semantic Web Theory and Tech-nology Morgan amp Claypool

[Klyne and Carroll 2004] Klyne G and Carroll J J 2004Resource Description Framework (RDF) Concepts and Ab-stract Syntax Recommendation W3C

[Laframboise and Loko 2012] Laframboise N and LokoB 2012 Natural Disasters Mitigating Impact Man-aging Risks IMF Working Paper International Mon-etary Fund httpwwwimforgexternalpubsftwp

2012wp12245pdf[Lin and Mishne 2012] Lin J and Mishne G 2012 AStudy of ldquoChurnrdquo in Tweets and Real-Time Search Queries(Extended Version) CoRR abs12056855

[Liu et al 2008] Liu S B Palen L Sutton J HughesA L and Vieweg S 2008 In Search of the Bigger Pic-ture The Emergent Role of On-line Photo Sharing in Timesof Disaster In Proceedings of the Information Systems forCrisis Response and Management Conference (ISCRAM)

[Matteis and Verborgh 2014] Matteis L and Verborgh R2014 Hosting queryable and highly available Linked Datafor free In Proceedings of the ISWC Developers Workshop2014 co-located with the 13th International Semantic WebConference (ISWC 2014) Riva del Garda Italy October 192014 13ndash18

[Okolloh 2009] Okolloh O 2009 Ushahidi or ldquotestimonyrdquoWeb 20 Tools for Crowdsourcing Crisis Information Par-ticipatory learning and action 59(1)65ndash70

[Ortmann et al 2011] Ortmann J Limbu M Wang Dand Kauppinen T 2011 Crowdsourcing Linked OpenData for Disaster Management In Gruumltter R Kolas DKoubarakis M and Pfoser D eds Proceedings of theTerra Cognita Workshop on Foundations Technologies andApplications of the Geospatial Web In conjunction with theInternational Semantic Web Conference (ISWC2011) vol-ume 798 11ndash22 Bonn Germany CEUR Workshop Pro-ceedings

[Sakaki Okazaki and Matsuo 2010] Sakaki T OkazakiM and Matsuo Y 2010 Earthquake Shakes Twitter UsersReal-time Event Detection by Social Sensors In Proceed-ings of the 19th International Conference on World WideWeb WWW rsquo10 851ndash860 New York NY USA ACM

[Steiner et al 2013] Steiner T Verborgh R Gabarro JMannens E and Van de Walle R 2013 Clustering Me-dia Items Stemming from Multiple Social Networks TheComputer Journal

[Steiner 2014a] Steiner T 2014a Bots vs WikipediansAnons vs Logged-Ins (Redux) A Global Study of Edit Ac-tivity on Wikipedia and Wikidata In Proceedings of TheInternational Symposium on Open Collaboration OpenSymrsquo14 251ndash257 New York NY USA ACM

[Steiner 2014b] Steiner T 2014b Comprehensive Wiki-pedia monitoring for global and realtime natural disaster de-tection In Proceedings of the ISWC Developers Workshop2014 co-located with the 13th International Semantic WebConference (ISWC 2014) Riva del Garda Italy October 192014 86ndash95

[Steiner 2014c] Steiner T 2014c Enriching UnstructuredMedia Content About Events to Enable Semi-AutomatedSummaries Compilations and Improved Search by Lever-aging Social Networks PhD Dissertation UniversitatPolitegravecnica de Catalunya

[Verborgh et al 2014] Verborgh R Hartig O De MeesterB Haesendonck G De Vocht L Vander Sande M Cy-ganiak R Colpaert P Mannens E and Van de Walle R2014 Querying datasets on the web with high availability InMika P Tudorache T Bernstein A Welty C KnoblockC Vrandecic D Groth P Noy N Janowicz K andGoble C eds The Semantic Web ndash ISWC 2014 volume8796 of Lecture Notes in Computer Science Springer Inter-national Publishing 180ndash196

[Westfall 2010] Westfall J 2010 Common Alerting Pro-tocol Version 12 Standard OASIS httpdocsoasis-

openorgemergencycapv12CAP-v12-osdoc

  • 1 Introduction
    • 11 Disaster Monitoring A Global Challenge
    • 12 Hypotheses and Research Questions
      • 2 Related Work and Enabling Technologies
        • 21 Disaster Detection
        • 22 The Common Alerting Protocol
        • 23 Linked Data and Linked Data Principles
        • 24 Linked Data Fragments
          • 3 Proposed Methodology
            • 31 Leveraging Wikipedia Link Structure
            • 32 Identification of Wikipedia Articles for Monitoring
            • 33 Monitoring Process
            • 34 Multimedia Content from Online Social Networking Sites
            • 35 Linked Data Publication
            • 36 Implementation Details
              • 4 Proposed Steps Toward an Evaluation
              • 5 Conclusions and Future Work
Page 7: Disaster Monitoring with Wikipedia

<http://ex.org/disaster/en/Hurricane_Gonzalo> owl:sameAs
    <http://en.wikipedia.org/wiki/Hurricane_Gonzalo>,
    <http://live.dbpedia.org/page/Hurricane_Gonzalo>,
    <http://www.freebase.com/m/0123kcg5> ;
  ex:relatedMediaItems _:video1 ;
  ex:relatedMediaItems _:photo1 .

_:video1 ex:mediaUrl <https://mtc.cdn.vine.co/r/videos/82796227091134303173323251712_2ca88ba54445116698738182474199804.mp4> ;
  ex:micropostUrl <http://twitter.com/gpessoao/status/527603540860997632> ;
  ex:posterUrl <https://v.cdn.vine.co/r/thumbs/231E0009CF1134303174572797952_25116698738182474199804.mp4.jpg> ;
  ex:publicationDate "2014-10-30T03:15:01Z" ;
  ex:socialInteractions [ ex:likes 1 ; ex:shares 0 ] ;
  ex:timestamp 1414638901000 ;
  ex:type "video" ;
  ex:userProfileUrl <http://twitter.com/alejandroriano> ;
  ex:micropost [
    ex:html "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today https://t.co/RpJt0P2bXa" ;
    ex:plainText "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today" ] .

_:photo1 ex:mediaUrl <https://upload.wikimedia.org/wikipedia/commons/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo.jpg> ;
  ex:micropostUrl <https://commons.wikimedia.org/wiki/File:Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg> ;
  ex:posterUrl <https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_%2822.10.2014%29_01.jpg/500px-Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg> ;
  ex:publicationDate "2014-10-24T08:40:16Z" ;
  ex:socialInteractions [ ex:shares 0 ] ;
  ex:timestamp 1414140016000 ;
  ex:type "photo" ;
  ex:userProfileUrl <https://commons.wikimedia.org/wiki/User:Huhu_Uet> ;
  ex:micropost [
    ex:html "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ;
    ex:plainText "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ] .

Listing 2: Exemplary Linked Data for Hurricane Gonzalo, using a yet to-be-defined vocabulary (potentially HXL, http://hxl.humanitarianresponse.info/ns/index.html, or MOAC, http://observedchange.com/moac/ns#), that interlinks the disaster with several other Linked Data sources and relates it to multimedia content on online social networking sites.
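To illustrate the query primitive behind the triple pattern fragments interface that we use to publish such data, the following minimal, purely illustrative Python sketch (not part of our implementation; the in-memory data and function names are hypothetical) shows how a server answers a triple pattern request, where unbound positions act as wildcards:

```python
# Illustrative sketch: a triple pattern fragments server answers
# requests of the form (s, p, o), where None is a wildcard, with
# all matching triples. Data loosely mirrors the listing above.
EX = "http://ex.org/"

triples = [
    (EX + "disaster/en/Hurricane_Gonzalo", EX + "relatedMediaItems", "_:video1"),
    (EX + "disaster/en/Hurricane_Gonzalo", EX + "relatedMediaItems", "_:photo1"),
    ("_:video1", EX + "type", "video"),
    ("_:photo1", EX + "type", "photo"),
]

def fragment(s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# All media items related to Hurricane Gonzalo:
media = fragment(s=EX + "disaster/en/Hurricane_Gonzalo",
                 p=EX + "relatedMediaItems")
print(media)
```

A real triple pattern fragments server additionally pages its results and attaches count metadata and hypermedia controls, but the selector logic is exactly this simple pattern match, which is what keeps the interface cheap to host.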

4 Proposed Steps Toward an Evaluation

We recall our core research questions, which were Q1: "How timely and accurate, for the purpose of disaster detection and ongoing monitoring, is content from Wikipedia, compared to the authoritative sources mentioned above?" and Q2: "Does the disambiguated nature of Wikipedia surpass keyword-based disaster detection approaches, e.g., via online social networking sites or search logs?" Regarding Q1, only a manual comparison, covering several months' worth of disaster data, of the relevant authoritative data sources mentioned in Section 1.1 with the output of our system can help respond to the question. Regarding Q2, we propose an evaluation strategy for the OSN site Twitter, loosely inspired by the approach of Sakaki et al. (Sakaki, Okazaki, and Matsuo 2010). We choose Twitter as a data source due to the publicly available user data through its streaming APIs,34 which would be considerably harder, if not impossible, with other OSNs or search logs due to privacy concerns and API limitations. Based on the articles in the "monitoring list," we put forward using article titles as search terms, but without disambiguation hints in parentheses; e.g., instead of the complete article title "Typhoon Rammasun (2014)," we suggest using "Typhoon Rammasun" alone. We advise monitoring the sample

34 Twitter streaming APIs: https://dev.twitter.com/docs/streaming-apis/streams/public

stream35 for the appearance of any of the search terms, as the filtered stream36 is too limited regarding the number of supported search terms. In order to avoid ambiguity issues with the international, multi-language tweet stream, we recommend matching search terms only if the Twitter-detected tweet language equals the search term's language, e.g., English as in "Typhoon Rammasun".
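The two steps of this evaluation strategy, stripping disambiguation hints from article titles and language-aware term matching, can be sketched as follows; this is an illustrative sketch of the proposed procedure, not our implementation, and the function names are our own:

```python
import re

def search_term(article_title: str) -> str:
    """Strip a trailing disambiguation hint in parentheses from a
    Wikipedia article title, e.g. "Typhoon Rammasun (2014)" becomes
    "Typhoon Rammasun"."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", article_title)

def matches(tweet_text: str, tweet_lang: str, term: str, term_lang: str) -> bool:
    """Count a tweet as a match only if it contains the search term
    and the Twitter-detected tweet language equals the term's
    language, to avoid cross-language ambiguity."""
    return tweet_lang == term_lang and term.lower() in tweet_text.lower()

term = search_term("Typhoon Rammasun (2014)")
print(term)  # Typhoon Rammasun
print(matches("Typhoon Rammasun hits the Philippines", "en", term, "en"))  # True
print(matches("Typhoon Rammasun", "tl", term, "en"))  # False: language mismatch
```

In practice, each tweet from the sample stream would be passed through `matches` once per term on the "monitoring list".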

5 Conclusions and Future Work

In this paper, we have presented the first steps of our ongoing research on the creation of a Wikipedia-based disaster monitoring system. In particular, we finished its underlying code scaffolding and connected the system to several online social networking sites, allowing for the automatic generation of media galleries. Further, we propose to publish data about detected and monitored disasters as live, queryable Linked Data, which can be made accessible in a scalable and ad hoc manner using triple pattern fragments (Verborgh et al. 2014) by leveraging free cloud hosting offers (Matteis and Verborgh 2014). While the system itself already functions, a good chunk of work still lies ahead with the fine-tuning of

35 Twitter sample stream: https://dev.twitter.com/docs/api/1.1/get/statuses/sample
36 Twitter filtered stream: https://dev.twitter.com/docs/api/1.1/post/statuses/filter

its parameters. A first example is the exponential smoothing parameters of the revision intervals, which are responsible for determining whether an article is spiking, and thus a potential new disaster, or not. A second example is the role that disasters play with articles: they can be inbound, outbound, or mutual links, and their importance for actual occurrences of disasters will vary. Future work will mainly focus on finding answers to our research questions Q1 and Q2 and on the verification of the hypotheses H1–H3. We will focus on the evaluation of the system's usefulness, accuracy, and timeliness in comparison to other keyword-based approaches. An interesting aspect of our work is that the monitoring system is not limited to natural disasters. Using an analogous approach, we can monitor for human-made disasters (called "Anthropogenic hazard" on Wikipedia) like terrorism, war, power outages, air disasters, etc. We have created an exemplary "monitoring list" and made it available.37
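As a sketch of the first tuning example, the following illustrative Python function (not our actual implementation; `alpha` and `threshold` are made-up placeholder values) flags an article as spiking when the exponentially smoothed inter-edit interval drops well below its long-run average, i.e., when edits suddenly arrive much faster than usual:

```python
def is_spiking(intervals, alpha=0.3, threshold=0.5):
    """Exponentially smooth an article's inter-edit intervals (in
    seconds) and flag a spike when the smoothed interval falls below
    `threshold` times the overall mean interval. `alpha` and
    `threshold` are illustrative values that would need tuning."""
    if not intervals:
        return False
    smoothed = intervals[0]
    for x in intervals[1:]:
        # Standard exponential smoothing update.
        smoothed = alpha * x + (1 - alpha) * smoothed
    mean = sum(intervals) / len(intervals)
    return smoothed < threshold * mean

# Hourly edits that suddenly become one edit per minute: a spike.
print(is_spiking([3600] * 10 + [60] * 10))  # True
# A steady hourly edit rate: no spike.
print(is_spiking([3600] * 20))  # False
```

The point of the smoothing is that a single fast edit does not trigger a spike; only a sustained burst pulls the smoothed interval down far enough.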

Concluding, we are excited about this research and look forward to putting the final system into operational practice in the weeks and months to come. Be safe!

37 Anthropogenic hazard "monitoring list": https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list-anthropogenic-hazard.json

References

[Abel et al. 2012] Abel, F.; Hauff, C.; Houben, G.-J.; Stronkman, R.; and Tao, K. 2012. Twitcident: Fighting Fire with Information from Social Web Streams. In Proceedings of the 21st International Conference Companion on World Wide Web, WWW '12 Companion, 305–308. New York, NY, USA: ACM.

[Bao et al. 2012] Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; and Gergle, D. 2012. Omnipedia: Bridging the Wikipedia Language Gap. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, 1075–1084. New York, NY, USA: ACM.

[Berners-Lee, Fielding, and Masinter 2005] Berners-Lee, T.; Fielding, R. T.; and Masinter, L. 2005. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF.

[Berners-Lee 2006] Berners-Lee, T. 2006. Linked Data. http://www.w3.org/DesignIssues/LinkedData.html.

[Falor 2011] Falor, R. 2011. Search data reveals people turn to the internet in crises. http://blog.google.org/2011/08/search-data-reveals-people-turn-to.html.

[Fielding et al. 1999] Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; and Berners-Lee, T. 1999. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, IETF.

[Goodchild and Glennon 2010] Goodchild, M. F., and Glennon, J. A. 2010. Crowdsourcing Geographic Information for Disaster Response: A Research Frontier. International Journal of Digital Earth 3(3):231–241.

[Heath and Bizer 2011] Heath, T., and Bizer, C. 2011. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool.

[Klyne and Carroll 2004] Klyne, G., and Carroll, J. J. 2004. Resource Description Framework (RDF): Concepts and Abstract Syntax. Recommendation, W3C.

[Laframboise and Loko 2012] Laframboise, N., and Loko, B. 2012. Natural Disasters: Mitigating Impact, Managing Risks. IMF Working Paper, International Monetary Fund. http://www.imf.org/external/pubs/ft/wp/2012/wp12245.pdf.

[Lin and Mishne 2012] Lin, J., and Mishne, G. 2012. A Study of "Churn" in Tweets and Real-Time Search Queries (Extended Version). CoRR abs/1205.6855.

[Liu et al. 2008] Liu, S. B.; Palen, L.; Sutton, J.; Hughes, A. L.; and Vieweg, S. 2008. In Search of the Bigger Picture: The Emergent Role of On-line Photo Sharing in Times of Disaster. In Proceedings of the Information Systems for Crisis Response and Management Conference (ISCRAM).

[Matteis and Verborgh 2014] Matteis, L., and Verborgh, R. 2014. Hosting queryable and highly available Linked Data for free. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 13–18.

[Okolloh 2009] Okolloh, O. 2009. Ushahidi, or "testimony": Web 2.0 Tools for Crowdsourcing Crisis Information. Participatory Learning and Action 59(1):65–70.

[Ortmann et al. 2011] Ortmann, J.; Limbu, M.; Wang, D.; and Kauppinen, T. 2011. Crowdsourcing Linked Open Data for Disaster Management. In Grütter, R.; Kolas, D.; Koubarakis, M.; and Pfoser, D., eds., Proceedings of the Terra Cognita Workshop on Foundations, Technologies and Applications of the Geospatial Web, in conjunction with the International Semantic Web Conference (ISWC 2011), volume 798, 11–22. Bonn, Germany: CEUR Workshop Proceedings.

[Sakaki, Okazaki, and Matsuo 2010] Sakaki, T.; Okazaki, M.; and Matsuo, Y. 2010. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, 851–860. New York, NY, USA: ACM.

[Steiner et al. 2013] Steiner, T.; Verborgh, R.; Gabarro, J.; Mannens, E.; and Van de Walle, R. 2013. Clustering Media Items Stemming from Multiple Social Networks. The Computer Journal.

[Steiner 2014a] Steiner, T. 2014a. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration, OpenSym '14, 251–257. New York, NY, USA: ACM.

[Steiner 2014b] Steiner, T. 2014b. Comprehensive Wikipedia monitoring for global and realtime natural disaster detection. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, 86–95.

[Steiner 2014c] Steiner, T. 2014c. Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks. Ph.D. Dissertation, Universitat Politècnica de Catalunya.

[Verborgh et al. 2014] Verborgh, R.; Hartig, O.; De Meester, B.; Haesendonck, G.; De Vocht, L.; Vander Sande, M.; Cyganiak, R.; Colpaert, P.; Mannens, E.; and Van de Walle, R. 2014. Querying Datasets on the Web with High Availability. In Mika, P.; Tudorache, T.; Bernstein, A.; Welty, C.; Knoblock, C.; Vrandecic, D.; Groth, P.; Noy, N.; Janowicz, K.; and Goble, C., eds., The Semantic Web – ISWC 2014, volume 8796 of Lecture Notes in Computer Science, 180–196. Springer International Publishing.

[Westfall 2010] Westfall, J. 2010. Common Alerting Protocol Version 1.2. Standard, OASIS. http://docs.oasis-open.org/emergency/cap/v1.2/CAP-v1.2-os.doc.
