


On-Demand Information Portals for Disaster Situations

Yiming Ma, Dmitri V. Kalashnikov, Ram Hariharan, Sharad Mehrotra, Nalini Venkatasubramanian, Naveen Ashish, Jay Lickfett

Dept. of Information and Computer Science, University of California, Irvine, USA

Abstract—This paper describes our work on developing technology for rapidly assembling information portals that provide integrated access to, and analysis of, information from multiple sources in the case of any disaster. Many recent disasters (the S.E. Asian tsunami, the London subway bombings, and Hurricane Katrina, to name a few) have demonstrated that a wealth of valuable information becomes available in the hours and days immediately following a disaster, and that such information can significantly aid disaster managers and even citizens in their response. In this paper we describe our work on developing information portals for disasters in general; we identify the key information-processing capabilities and challenges that we consider important in such portals and describe our approach to developing these capabilities.

I. INTRODUCTION

A deluge of information, potentially very useful in analysis and response, becomes available in online and other sources in the hours and days immediately following practically any major disaster. The information may appear in online news reports, community blogs, and specialized web sites put up for tasks such as locating resources, disseminating current situation reports, and re-establishing communications. Such services and the information they provide can be critical to different communities, but they can also be inefficient and inconsistent during large-scale disaster situations. For instance, during the first three days of the tsunami disaster in Southeast Asia, more than 4,000 reports were gathered by the Google News service; within a month of the incident, we had collected more than 200,000 distinct news reports from several online sources. The content obtained from online sources was diverse and heterogeneous, spanning text, multimedia (e.g., images, video), and GIS information (e.g., satellite imagery, maps). The availability of all such information (originally scattered across multiple places), combined with other (possibly prior) information such as area satellite maps, sensor data, and audio-visual footage related to the disaster, in one place and in an integrated fashion can prove very useful to disaster managers, analysts, and even ordinary citizens in response and planning during or after the disaster. This motivates us in the direction of an “Information Portal”: a web-based, easy-to-access, one-stop point for retrieval and analysis of all such disaster-related information.

In this paper we describe our concept of, and work in progress on, developing such information portals. (This research was sponsored by NSF Awards 0331707 and 0331690 to the RESCUE project.)

Fig. 1. System Architecture of PSAP

We are interested in generic capabilities that prove useful across a variety of disasters and disaster instances. To be of practical use, it is also important that a portal for any disaster can be assembled relatively quickly, so as to assist in timely decisions. The key capabilities we expect of such a portal include identifying and obtaining information from multiple online sources; effective browsing, search, and querying of the integrated information; and information visualization. Ideally, the Information Portal activates at the onset of a disaster and assembles itself in an “on-demand” fashion. We have implemented many of these concepts in two specific applications: one is an information portal for the S.E. Asian tsunami disaster, and the other is an emergency information portal, under development, for the City of Ontario.

II. RELATED WORK

There are several portals and sites currently deployed that are similar in spirit to what we want to achieve. The main differences between these portals and ours lie in three areas: 1) the information sources, 2) the target users, and 3) the analysis tasks. Our portal deals mainly with information streamed from Web sources. We aim to help individuals (citizens or first responders) quickly gather personalized information through various search and visualization interfaces; our users use this information to accomplish their individual tasks. Therefore, our portal system potentially deals with larger and more diverse information sources, and tries to enable


individualized and “deeper” analysis. In the next section, based on our research in the disaster-response area, we summarize the various research components that achieve these goals.

The portal web site developed by the Homeland Security Department [3] (https://www.disasterhelp.gov/) helps different agencies and communities gain situational awareness and locate resources during and after disasters. Its information sources are mainly situational reports from different government agencies (a much smaller domain than ours), and its main purpose is to help coordinate rescue operations rather than to support analysis tasks. Other government web sites offer similar functionality, e.g., the Katrina Information Portal [6] (http://katrina.louisiana.gov/).

AlertEarth [1] (http://www.alertearth.org/) is another portal system that monitors different disasters and disseminates information to subscribers. Its information sources are not general Web sources; it focuses on a few well-established disaster centers and therefore cannot effectively adapt to the potentially important sources that emerge during disaster situations.

Other work relevant to building our portal system is reviewed in the next section.

III. SYSTEM ARCHITECTURE

In this section we describe the key technical components of such an information portal and how they connect into an integrated system. Figure 1 illustrates the modules and the interactions between them. PSAP (Personalized Situation Awareness Portal) employs a standard client/server architecture, and information access is provided through a convenient web-based interface. As described below, a variety of data management and processing technologies, such as crawling, indexing, extraction, integration, visualization, and information filtering, come into play.

Data Collection and Access: This component collects and stores information from multiple online information sources (news reports, blogs, satellite images, government databases, etc.) and also provides real-time access to sources where information must be accessed live (for instance, weather sources). It provides capabilities for dynamically registering new information sources, configurable crawling of relevant online sources, and automated download of information from various sites. The data collected, primarily online documents and other multimedia data, are stored in an IBM DB2 database (“Raw Reports” in Figure 1), along with meta-information such as source identification and time-stamps. The multimedia data downloaded and stored in the DB2 database include maps and satellite imagery from organizations such as the USGS, audio-visual footage related to the disaster, etc. Some of this data can be downloaded and stored in advance; for instance, maps of an area are invariably useful, so we may store them ahead of a disaster. Other online information must be identified, crawled, and obtained on demand as described above.
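To make the registration-and-crawl workflow concrete, the following sketch stores raw reports together with source and time-stamp meta-information. The names (`SourceRegistry`, `raw_reports`) are illustrative rather than the actual PSAP implementation, and an in-memory SQLite table stands in for the IBM DB2 store.

```python
import sqlite3
from datetime import datetime, timezone

class SourceRegistry:
    """Dynamically registered online sources feeding a raw-report store."""

    def __init__(self, db):
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS raw_reports "
            "(source TEXT, fetched_at TEXT, content TEXT)")
        self.sources = {}  # source name -> zero-argument fetch callable

    def register(self, name, fetch_fn):
        """Register a new information source at runtime."""
        self.sources[name] = fetch_fn

    def crawl_all(self):
        """Fetch each registered source and store its raw content plus
        meta-information (source id, time-stamp)."""
        for name, fetch in self.sources.items():
            for doc in fetch():
                self.db.execute(
                    "INSERT INTO raw_reports VALUES (?, ?, ?)",
                    (name, datetime.now(timezone.utc).isoformat(), doc))
        self.db.commit()

db = sqlite3.connect(":memory:")
registry = SourceRegistry(db)
# A stub feed standing in for a crawled news site.
registry.register("news_feed", lambda: ["Quake report A", "Quake report B"])
registry.crawl_all()
count = db.execute("SELECT COUNT(*) FROM raw_reports").fetchone()[0]
print(count)  # 2
```

In a real deployment the fetch callables would wrap site-specific crawlers; the registry pattern is what allows new sources to be added while the portal is running.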

Topic Modeling for Browsing and Search: Through automatic topic-modeling tools (e.g., Figure 3), we can quickly organize data by topic. We have implemented the topic-modeling approaches proposed in [9]. In the current implementation, topics are formed from features such as sets of keywords. Keyword-based topics can be used to support topic-driven browsing and search over the document collection; for instance, instead of matching documents against raw keywords, we can use topics to enhance search performance. Figure 3 shows an example of a search over the topics related to the keyword “earthquake”. Some topics containing “earthquake” are also related to the keywords “sumatra” and “missing”. If the user is looking for the earthquake at Sumatra island, the documents in those topics can be retrieved directly. In this way, the user can exploit topic search to effectively enhance his or her search.

Given the large number of documents that constantly stream in, valid and up-to-date topics can help an analyst gain insights at a semantically enriched level. Through an interactive process, an analyst can further examine these topics and integrate them into an existing knowledge base.
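The topic-driven search described above can be sketched as follows; the topics, keyword sets, and documents are invented for illustration and are not drawn from the actual portal.

```python
# Each topic is a set of keywords produced by a topic model (as in [9]).
topics = {
    "quake_sumatra": {"earthquake", "sumatra", "missing"},
    "quake_relief":  {"earthquake", "aid", "ship"},
    "weather":       {"storm", "rainfall"},
}

documents = {
    1: {"earthquake", "sumatra", "tsunami"},
    2: {"aid", "ship", "supplies"},
    3: {"storm", "rainfall", "flood"},
}

def topic_search(query_keyword):
    """Find the topics containing the keyword, then retrieve documents
    that overlap any matching topic -- rather than matching the raw
    keyword against the documents directly."""
    matching = [kws for kws in topics.values() if query_keyword in kws]
    hits = set()
    for doc_id, doc_kws in documents.items():
        if any(doc_kws & t for t in matching):
            hits.add(doc_id)
    return hits

print(sorted(topic_search("earthquake")))  # [1, 2]
```

Note that document 2 is retrieved even though it never mentions “earthquake”: it overlaps a topic that does, which is exactly the enhancement over raw keyword matching.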

Information Extraction: Topic modeling alone is not sufficient to capture the knowledge and concepts embedded in the collected documents. For instance, deeper semantic concepts, such as casualty and aid situations at specific places and times, cannot be efficiently represented by keyword-based topics; they require deeper information-extraction techniques. Different extraction tasks, however, can be of varying complexity, and require us to build customized extraction systems. We have been using off-the-shelf text-extraction systems such as GATE [2], which come with capabilities for tasks such as named-entity extraction and are further extensible for developing more sophisticated extraction applications. Extraction beyond named entities or tokens includes tasks such as relation or event extraction from text, where besides extracting useful tokens or entities we are also interested in extracting the relationships between sets of such entities. In a news story, an example of a (reported) event is, say, a mention of a relief ship having arrived at some place with certain supplies. We are interested in extracting the significant details of this event: the particular ship referred to, the supplies, the location it arrived at, and the date and time of arrival. The extraction of such details is referred to as “slot-filling”. We are currently developing a next-generation information-extraction system that exploits available lower-level text extractors such as GATE for identifying tokens or entities, and provides deductive rule-based capabilities on top for the slot-filling required for relation or event extraction. Initial results on the efficacy of this system for extracting events from a corpus of news stories related to the S.E. Asian tsunami are presented in a separate paper currently under review.
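As a toy illustration of slot-filling, the rule below fills the four slots of a ship-arrival event. In our system the entity spans would be tagged by a lower-level extractor such as GATE and the rules are deductive; the single regular expression here is a stand-in invented for this sketch, as is the example sentence.

```python
import re

# A simplified rule for a "relief ship arrival" event. In practice the
# spans (ship, location, date) would come from a lower-level entity
# extractor; here one regular expression stands in for the pipeline.
ARRIVAL_RULE = re.compile(
    r"(?P<ship>[A-Z][\w ]+?) arrived at (?P<location>[A-Z][\w ]+?) "
    r"on (?P<date>\w+ \d{1,2}) carrying (?P<supplies>[\w ]+)\."
)

def extract_arrival_events(text):
    """Fill the slots (ship, location, date, supplies) for each
    arrival event mentioned in the text."""
    return [m.groupdict() for m in ARRIVAL_RULE.finditer(text)]

story = ("USNS Mercy arrived at Banda Aceh on January 5 "
         "carrying medical supplies.")
events = extract_arrival_events(story)
print(events[0])
```

Each matched event yields one filled template, which can then be stored and queried like any structured record.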

Modeling Uncertain Information: Information and knowledge extracted from the raw reports may not always be precise, due to incomplete knowledge representations or extraction errors. We need to model the uncertainty in such information and efficiently support analysis queries over it. We adopt the probabilistic modeling framework


Fig. 2. Main Interface
Fig. 3. Topic Search Interface
Fig. 4. Ranked Documents
Fig. 5. GIS-Enabled Video Display

proposed in [5] to capture the uncertainty in spatial information; we believe our techniques can be extended to other, non-spatial domains as well. As an illustrative example, consider an information portal for analyzing the response to the September 11, 2001 WTC attacks. The following are excerpts from two real reports1 filed by Port Authority Police Department (PAPD) officers:

1) “. . . the PAPD Mobile Command Post was located on West St. north of WTC and there was equipment being staged there . . .”

2) “. . . a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St. . . .”

These two reports refer to the same location, i.e., the same command post, a point location in the Manhattan area of New York. However, neither report specifies the exact location of the events; the two do not even mention the same street names. Our objective is to represent and index such reports in a manner that enables efficient evaluation of spatial queries and subsequent analysis using the spatial data. Merely storing the location in the database as free text is clearly not sufficient either to answer spatial queries or to disambiguate reports based on spatial locations: in the above example, a spatial query such as “retrieve events near WTC”, based on keywords alone, would retrieve only the first report.

The probabilistic modeling framework in [5] defines the semantics of uncertain spatial descriptors (e.g., “near”, “around”, etc.). We project the spatial properties of the event described in a report onto a two-dimensional domain Ω and answer queries within this domain. We model uncertain event locations as random variables with associated probability density functions (pdfs). Assisted by GIS and probabilistic modeling tools, we map uncertain textual locations into corresponding pdfs defined over Ω.

Based on these probabilistic semantics, we can answer spatial queries, such as range, nearest-neighbor, and spatial-join queries, more accurately. For instance, the representation must enable us to retrieve events in a given geographical region (e.g., around the World Trade Center). Likewise, it should enable us to determine the similarity between reports based on their spatial properties; e.g., we should be able to determine that the above events might refer to the same location (assuming a temporal correlation of the events). Given that a large number

1 Original audio data available in converted text form.

Fig. 6. Probabilistic Spatial Models: (a) two individual models, Near(Bld1) and Near(Bld2); (b) the combined model Near(Bld1) ∧ Near(Bld2). (Surface plots of histogram pdfs over x and y, in meters.)

Fig. 7. Visualizing Spatial Models: (a) heat-map display; (b) zoomed-in display.

of spatially uncertain events can potentially arise during crisis situations2, the scalable algorithms proposed in [4] can be used to efficiently process such analytical queries.
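A minimal sketch of the histogram-pdf representation and of a probabilistic range query follows. The grid resolution and the Gaussian shape assumed for “near” are illustrative choices made for this sketch; [5] defines the actual probabilistic semantics.

```python
import math

GRID = 20     # discretize the domain Omega into GRID x GRID cells
CELL = 50.0   # each cell is 50m x 50m (a 1km x 1km domain)

def near_pdf(cx, cy, spread=150.0):
    """Histogram pdf for 'near (cx, cy)': a Gaussian bump sampled at
    cell centers, normalized so the cell probabilities sum to 1."""
    grid = [[0.0] * GRID for _ in range(GRID)]
    total = 0.0
    for i in range(GRID):
        for j in range(GRID):
            x, y = (i + 0.5) * CELL, (j + 0.5) * CELL
            v = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * spread ** 2))
            grid[i][j] = v
            total += v
    return [[v / total for v in row] for row in grid]

def range_query_prob(pdf, x0, y0, x1, y1):
    """Probability that the uncertain event lies in the rectangle
    [x0,x1] x [y0,y1]: the sum of the cell probabilities inside it."""
    p = 0.0
    for i in range(GRID):
        for j in range(GRID):
            x, y = (i + 0.5) * CELL, (j + 0.5) * CELL
            if x0 <= x <= x1 and y0 <= y <= y1:
                p += pdf[i][j]
    return p

pdf = near_pdf(500.0, 500.0)
# An event "near (500, 500)" is very likely within 300m of that point.
p_in = range_query_prob(pdf, 200, 200, 800, 800)
print(round(p_in, 2))
```

The same cell-summing scheme underlies nearest-neighbor and join queries; the indexing techniques of [4] exist precisely to avoid scanning every cell of every pdf.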

Such a spatial-reasoning capability is also under development for the aforementioned Information Portal for the City of Ontario. Our collaborators at the Ontario Fire Department have expressed the need for a capability whereby location/spatial references can be extracted from the (transcribed) audio communications between fire-department responders and plotted on maps of the area in question; we are working on developing and incorporating such a capability into their portal.

Data Visualization: We have developed capabilities for users to perceive spatial, temporal, and topical contexts in a visual, GIS-based form. Figure 2 illustrates the main interface of the Information Portal. With this interface, a user can zoom in to get more detail or zoom out to get a broader view of the entire disaster area. The analyst can interactively examine the returned documents from the map interface based on the geographical information stored in the documents. Other

2 For instance, more than 1,000 such events can be extracted from just 164 reports filed by police officers who participated in the disaster of September 11, 2001.


materials that match the user's interests are also displayed on the map. For example, a user can view GIS-tagged video associated with an event via appropriate interface extensions (Figure 5). The portal also provides an integrated ranking component that ranks documents by their similarity to the user profile (Figure 4).

We have also developed tools to help users visualize the probabilistic spatial models described earlier (Figure 7(a)). If the location of an event is certain, it is represented as a dot on a GIS map. If it is uncertain, it is represented as a “heat map,” with more intense colors corresponding to areas where the event is more likely to be located (Figure 7(b)). That probability is computed in a principled way, detailed in [5]. For instance, if the text of a report mentions an event that happened “near building1 and building2”, the system first computes the individual pdf models, one for Near(bld1) and the other for Near(bld2), as illustrated in Figure 6(a). That figure shows two peaks, one for Near(bld1) and one for Near(bld2); areas with higher pdf values correspond to a higher chance of the event being located there. Next, using these two pdfs, the system computes the overall pdf for the event being located near bld1 and near bld2 at the same time (Figure 6(b)). With the help of this probabilistic visualization tool, the analyst obtains a more accurate and meaningful display of the possible locations of events.
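One way to realize the combination step of Figure 6, sketched here as an assumption rather than the exact method of [5], is a cell-wise product of the two histogram pdfs, renormalized to sum to one:

```python
def combine_pdfs(pdf_a, pdf_b):
    """Cell-wise product of two histogram pdfs, renormalized so the
    result is again a valid pdf: probability mass concentrates where
    BOTH 'near' conditions assign high probability."""
    prod = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(pdf_a, pdf_b)]
    total = sum(sum(row) for row in prod)
    return [[v / total for v in row] for row in prod]

# Two toy 2x2 histogram pdfs standing in for Near(bld1) and Near(bld2).
near_b1 = [[0.7, 0.1],
           [0.1, 0.1]]
near_b2 = [[0.4, 0.4],
           [0.1, 0.1]]
both = combine_pdfs(near_b1, near_b2)
# The top-left cell dominates: it is likely under both conditions.
print(round(both[0][0], 3))
```

Rendering the resulting grid with color intensity proportional to cell probability yields exactly the heat-map display of Figure 7.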

Personalized Retrieval Using Profiles: To achieve personalized information dissemination during disaster situations, a portal system must have flexible and accurate mechanisms for representing and updating the information needs of different users. In our portal prototype, user characteristics are captured as user filters (profiles). As new information streams by, it is scored by the individual filters; if the relevance score exceeds a certain threshold, the item is disseminated to the user. We can also prioritize these events by relevance score, so that the user receives them in order of preference.
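The scoring-and-threshold loop can be sketched as follows; the keyword-overlap scorer is a deliberately simple stand-in for the similarity filters, and all names and data are illustrative.

```python
def score(event_keywords, profile_keywords):
    """Fraction of the profile's keywords that the event mentions --
    a simple stand-in for the similarity filters described in the text."""
    if not profile_keywords:
        return 0.0
    return len(event_keywords & profile_keywords) / len(profile_keywords)

def disseminate(events, profile, threshold=0.5):
    """Score each streaming event against the user profile; deliver
    those above the threshold, highest-scoring first."""
    scored = [(score(kws, profile), eid) for eid, kws in events]
    passing = [(s, eid) for s, eid in scored if s >= threshold]
    return [eid for s, eid in sorted(passing, reverse=True)]

profile = {"earthquake", "sumatra", "aid"}
stream = [
    ("e1", {"earthquake", "sumatra", "aid"}),  # score 1.0
    ("e2", {"earthquake", "storm"}),           # score 1/3 -> filtered out
    ("e3", {"sumatra", "aid", "ship"}),        # score 2/3
]
print(disseminate(stream, profile))  # ['e1', 'e3']
```

Raising the threshold trades recall for precision per user, which is exactly the knob the profile-refinement machinery below is meant to tune automatically.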

Uncertainty can arise not only from the different information sources but also when a user specifies his or her profile. Effectively helping the user specify vaguely defined concepts, and updating those concepts as more information becomes available, is one of the major challenges in building a personalized information portal. In our previously proposed similarity retrieval and refinement framework [7], [8], we address this challenge from two directions: 1) the framework allows the user to express an initial profile using similarity semantics, and 2) it automatically enhances the profile based on user feedback.

In the similarity retrieval and refinement framework [7], a similarity filter (profile) is represented as a similarity search condition, which consists of a conditional expression, a set of weights, and a ranking function with a cut-off value. Conditional expressions are composed of conditional predicates. The predicates (essentially “atomic” search conditions) can themselves be imprecise (for instance, “near WTC”). We have the notion of similarity functions, which take two data attribute values and return a similarity score between 0 and 1. We also

have similarity predicates, which take an attribute, a target value, and a similarity function, and return a similarity score between an attribute value and the target value. Similarity search differs from traditional exact-match search in that it generates domain-dependent scores that indicate how closely a property of an object (tuple) matches some target value from a query. A conditional expression is a DNF (Disjunctive Normal Form) expression over similarity predicates. A ranking function aggregates the scores from the individual similarity predicates according to the structure of the DNF search condition and a corresponding set of weights that indicate the relative importance of each similarity predicate. The resulting overall score represents the degree of match between a data instance and the filter (profile). The weights form a template that mirrors the structure of the conditional expression, i.e., weights are associated with each predicate in a conjunction (called predicate weights) and with each conjunction in the overall disjunction (called clause weights).
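The ranking function can be sketched as below. The particular aggregation (a weighted sum of predicate scores within each conjunction, and a weighted sum of clause scores across the disjunction) is one plausible instantiation, not necessarily the exact one used in [7]; the profile and scores are invented.

```python
def clause_score(pred_scores, pred_weights):
    """Score of one conjunction: weighted sum of its predicate scores
    (predicate weights are assumed to sum to 1 within the clause)."""
    return sum(w * s for w, s in zip(pred_weights, pred_scores))

def dnf_score(pred_scores_per_clause, pred_weights_per_clause, clause_weights):
    """Overall filter score: clause weights aggregate the conjunction
    scores across the disjunction."""
    clause_scores = [clause_score(s, w) for s, w in
                     zip(pred_scores_per_clause, pred_weights_per_clause)]
    return sum(cw * cs for cw, cs in zip(clause_weights, clause_scores))

# Hypothetical profile: (near_WTC AND about_casualties) OR (mentions_PAPD).
# Similarity-predicate scores for one incoming report:
pred_scores = [[0.9, 0.6],   # clause 1: near WTC = 0.9, casualties = 0.6
               [0.2]]        # clause 2: mentions PAPD = 0.2
pred_weights = [[0.7, 0.3], [1.0]]
clause_weights = [0.8, 0.2]
overall = dnf_score(pred_scores, pred_weights, clause_weights)
print(round(overall, 3))  # 0.688
```

The overall score is what gets compared against the cut-off value to decide dissemination.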

When a user receives disseminated events, he or she can provide relevance feedback to the system. There are different ways to capture this feedback, as illustrated in [7], [8]; essentially, the feedback indicates how well the system ranked the delivered events. Based on the feedback, we can use various learning mechanisms to calibrate the filters so that they produce more accurate scores for future events.
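As a sketch of one such learning mechanism (an assumption for illustration, not the specific algorithm of [7], [8]), the update below multiplicatively boosts the weights of predicates that agreed with the user's feedback and then renormalizes:

```python
def update_weights(weights, pred_scores, relevant, lr=0.2):
    """Multiplicatively boost the weights of predicates that scored
    high on a relevant event (or low on an irrelevant one), then
    renormalize so the weights still sum to 1."""
    agreement = [s if relevant else (1.0 - s) for s in pred_scores]
    raw = [w * (1.0 + lr * a) for w, a in zip(weights, agreement)]
    total = sum(raw)
    return [w / total for w in raw]

weights = [0.5, 0.5]
# The user marks as relevant an event on which predicate 1 scored 0.9
# and predicate 2 scored 0.1: weight shifts toward predicate 1.
new_w = update_weights(weights, [0.9, 0.1], relevant=True)
print([round(w, 3) for w in new_w])  # [0.536, 0.464]
```

Repeated over many feedback rounds, the weight template drifts toward the predicates that best predict the user's judgments, which is the calibration effect described above.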

IV. CONCLUSION

This paper has described our work on on-demand information portals for disaster situations. Besides many standard useful capabilities, we have addressed, and continue to address, more challenging tasks in the areas of information extraction, information analysis and personalized retrieval, and spatial reasoning. Our collaboration with first-responder organizations further strengthens the practical validity of the directions being pursued.

REFERENCES

[1] AlertEarth (http://www.alertearth.org/).
[2] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), 2002.
[3] Homeland Security Department. DisasterHelp (https://www.disasterhelp.gov/).
[4] D. V. Kalashnikov, Y. Ma, S. Mehrotra, and R. Hariharan. Index for Fast Retrieval of Uncertain Spatial Point Data. In Proc. of Int'l Symposium on Advances in Geographic Information Systems (ACM GIS 2006), Arlington, VA, USA, November 10-11, 2006.
[5] D. V. Kalashnikov, Y. Ma, S. Mehrotra, and R. Hariharan. Modeling and Querying Uncertain Spatial Information for Situational Awareness Applications. In Proc. of ACM GIS, 2006.
[6] State of Louisiana. Katrina Information Portal (http://katrina.louisiana.gov/).
[7] Y. Ma, S. Mehrotra, D. Seid, and Q. Zhong. RAF: An Activation Framework for Refining Similarity Queries Using Learning Techniques. In Proceedings of DASFAA, 2006.
[8] Y. Ma, D. Seid, and S. Mehrotra. Interactive Refinement of Filtering Queries on Streaming Intelligence Data. In IEEE ISI, 2006.
[9] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic Author-Topic Models for Information Discovery. In ACM SIGKDD, 2004.