

European Project Nº: FP7-614154
Brazilian Project Nº: CNPq-490084/2013-3

Project Acronym: RESCUER

Project Title: Reliable and Smart Crowdsourcing Solution for Emergency and Crisis Management

Instrument: Collaborative Project

European Call Identifier: FP7-ICT-2013-EU-Brazil
Brazilian Call Identifier: MCTI/CNPq 13/2012

Deliverable D3.2.1

Data Analysis Method Description 1

Due date of deliverable: PM12
Actual submission date: January 31, 2015

Start date of the project: October 1, 2013 (Europe) | February 1, 2014 (Brazil)
Duration: 30 months
Organization name of lead contractor for this deliverable: Universidade de São Paulo (USP)

Dissemination level:
PU – Public
PP – Restricted to other program participants (including Commission Services)
RE – Restricted to a group specified by the consortium (including Commission Services)
CO – Confidential, only for members of the consortium (including Commission Services)


Executive Summary

This document presents deliverable D3.2.1 (Data Analysis Method Description 1) of project FP7-614154 | CNPq-490084/2013-3 (RESCUER), a Collaborative Project supported by the European Commission and MCTI/CNPq (Brazil). Full information on this project is available online at http://www.rescuer-project.org.

Deliverable D3.2.1 provides the first results of RESCUER Task 3.2 (Data Analysis), which is concerned with the implementation of a module for semi-automatic analysis of image, video, and text data. The module will receive data previously filtered and aggregated with fusion methods by another module under development in Task 3.1 (Data Fusion and Filtering); it will produce data (labelled images/videos/text) for the Emergency Response Toolkit (ERTK) of Work Package 4.

List of Authors

Agma Traina – USP
Jose F. Rodrigues Jr. – USP
Willian Oliveira – USP
Mirela Cazzolato – USP
Letrícia Avalhais – USP
Daniel Chino – USP
Jonathan Ramos – USP
Marcos Bedo – USP
Juan Torres – UPM
Ana Mejía – UPM
Andreas Poxrucker – DFKI
Tobias Franke – DFKI

List of Internal Reviewers

Laia Gasparin – VOMATEC
Vaninha Vieira – UFBA


Contents

1 Introduction
 1.1 Purpose
 1.2 Partners’ Roles and Contributions
 1.3 Document Overview
2 Image Analysis
 2.1 Overview
 2.2 Pre-Selection and ROI Selection
 2.3 Feature Extraction
 2.4 ROI/GRID Classification
3 Video Analysis
 3.1 Goals
 3.2 State of the art
 3.3 Proposal
4 Text Analysis
 4.1 Goals
 4.2 State of the art
 4.3 Characteristics of Incident Reports and Their Content
 4.4 The RESCUER Text Analysis Module
 4.5 Next steps
5 Integration of Image, Video, and Text solutions
6 Conclusions
References
Glossary


1 Introduction

Disasters in large-scale events and industrial areas can have a great impact on human life, property, and the environment. An adequate response is essential to prevent physical injuries as well as damage to the public image of the organizations involved. As a result, many actions are taken, and periodically reinforced, to ensure effective crisis management. The use of software systems is among these actions, along with training, simulation exercises, and team actions coordinated by a command and control centre. The challenge for the command and control centre is to respond to the emergency quickly and to ensure correct decisions. Decisions based on incorrect or missing information have the potential to cause even more damage [67].

To produce accurate decisions in a timely manner, automatic processing of image, video, and text is a desired feature. However, there is a lack of resources for the automatic analysis of data collected from different media sources [30], especially in the context of crisis management. The Data Analysis Solution component of the RESCUER project aims to fill this gap.

1.1 Purpose

Within the RESCUER project, Work Package 3 (Data Analysis Solutions) has four parts: Task 3.1 Data Fusion and Filtering, Task 3.2 Data Analysis, Task 3.3 Data Integration, and Task 3.4 Data Usage Control. This document concerns Task 3.2, whose goal is to provide multimedia descriptions of crisis situations by means of analytical techniques for image data (led by USP), aimed at detecting fire and smoke; video data (led by UPM), aimed at detecting explosions and crowds; and text data (led by DFKI). The goal is to monitor fire, smoke, explosions, and crowds as detected from images, videos, and texts in order to help control centres identify the places, times, victims, and circumstances of a crisis. Such monitoring may improve decision making by increasing the amount of useful information available to rescue teams.

More specifically, Task 3.2 has three deliverables:

- D3.2.1 [month 12]: design of the image, video, and text modules, and the definition of how the filtered data is aggregated;
- D3.2.2 [month 20]: second version of the modules, along with the first prototype in preparation for integration with the RESCUER solution;
- D3.2.3 [month 28]: final version of the modules, with a validated prototype prepared for system integration.

Task 3.2 is thus contextualized in the goals of the RESCUER project concerning automatic data analysis. The goal of the task is to build a data analysis module able to assist the Emergency Response Toolkit (ERTK) by providing:

1) Monitoring mechanisms especially designed for emergencies;
2) Significantly improved situational awareness for the command and control centre, delivered in a timely manner through methods of multimedia data analysis; and
3) Assistance in the generation of semi-automatic alerts to the community and the general public.

For this part of the project, it is assumed that images, videos, and texts gathered from the Mobile Crowdsourcing Solution will have been previously filtered and aggregated by the modules of Task 3.1 (Data Fusion and Filtering).


It is also assumed that the data will contain geolocation (tag “where”), timestamp (tag “when”), and authorship (tag “who”). The modules of Task 3.2 shall add an extra tag describing the type of the reported situation; this new information corresponds to tag “what”, which can take the values fire, smoke, fire and smoke, explosion, or crowd. The values for tag “what” were chosen according to the occurrences initially foreseen for the scenarios addressed by the project, that is, massive events and industrial areas; nevertheless, this set might eventually be expanded to handle other possibilities. The challenge here is to tag the incoming data with the most accurate value for tag “what”.

The result of our processing is to be used by the Emergency Response Toolkit (Work Package 4).
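For illustration only, a report after Task 3.2 processing could be represented as in the following Python sketch. The field names and values are assumptions for illustration; the concrete schema is not fixed in this deliverable.

```python
# Hypothetical structure of a crowdsourced report after analysis
# (illustrative only; the concrete schema is not defined in this deliverable).
report = {
    "who":   "user-4711",                     # authorship, attached upstream
    "when":  "2015-01-31T14:22:05Z",          # timestamp, attached upstream
    "where": {"lat": -12.97, "lon": -38.50},  # geolocation, attached upstream
    "what":  "fire and smoke",                # added by the Task 3.2 modules;
                                              # one of: fire, smoke,
                                              # fire and smoke, explosion, crowd
}
```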

1.2 Partners’ Roles and Contributions

The partners involved in this task are UPM, USP, and DFKI. UPM is concerned with video analysis, USP with image analysis, and DFKI with text analysis techniques. As a first step, each partner involved in this task will identify analysis techniques, taking into consideration the information needs identified in Task 1.1 (Requirements Engineering) and the type of data to be analysed.

1.3 Document Overview

This document presents the techniques and architecture of the data analysis modules, explained in Chapters 2, 3, and 4. The remainder of this document is structured as follows:

• Chapter 2 presents the image analysis module;
• Chapter 3 discusses the video analysis module;
• Chapter 4 focuses on the text analysis module;
• Chapter 5 addresses the integration of the image, video, and text solutions; and
• Chapter 6 presents the conclusions of this deliverable.

Each chapter discusses the corresponding requirements and their implications for the RESCUER project.


2 Image Analysis

2.1 Overview

This chapter presents the architecture for crowdsourced image data analysis based on colour methods, classification techniques, and similarity queries. The architecture determines the pipeline of its processing modules, their components and intercommunication, as well as the interaction of the module with the Visualization Mechanisms for Emergency Coordination. Figure 1 shows the architecture overview and the data flow between the modules: initially, the module receives reports from the filtering module (T3.1); these reports are transferred to the classification module, which proceeds with pre-selection, ROI selection, feature extraction, ROI classification, and reporting. The final product is the automatically defined tag “what”. The storage and retrieval module assists the classification module during the whole process because, as explained further on, the classification relies on similarity retrieval for instance-based learning. In crisis scenarios, the image data analysis module aims to support the decision-making process by summarizing a large amount of image data and quickly classifying it with respect to alert situations.

Figure 1: The data-flow in the Image Data Analysis architecture. Each submodule (box in blue) is designed to perform operations that together sum up to the system’s functionalities

In detail, the Classification Module is based on the following set of components:

• Pre-Selection: the Pre-Selection component filters images based solely on their content; this stage saves processing time by avoiding the analysis of unrelated images.
• ROI Selection: the ROI Selection component identifies possible Regions of Interest (ROIs). This can be achieved by a Region Growing technique involving the colour spectrum of fire and smoke.
• Feature Extraction: this component aims at representing the multimedia data (image or video) by summarizing its information according to specific criteria, in a form adequate for further analysis or classification.
• ROI Classification: this component implements a set of classifiers so that each ROI is classified according to the detected incident (e.g., ''fire'', ''smoke'', or ''regular region'').
• Report Fusion: this component fuses the classified ROIs with the original image.

The following sections present a summarized review of the literature and an initial proposal to provide the services defined for the components of the Classification Module.

2.2 Pre-Selection and ROI Selection

Goals

Since there will be a great amount of image data, it is relevant to analyse only images containing information that is actually useful for understanding the incidents. Hence, the pre-selection component will detect regions of the image containing fire and smoke (regions of interest); if no emergency-related information is detected, the image can be discarded. Then, using the pre-selection output, the ROI selection component will extract only the regions of the images with relevant information.

State of the art

Pre-Selection and ROI Selection will classify regions of the image that correspond to fire and smoke. There are various approaches to these tasks. Yamagishi et al. [89] proposed a fire detection algorithm based on the HSV (hue, saturation, value) colour space and a neural network. An auto-adaptive method to detect the edges of irregular flames was proposed by Qiu et al. [61], which discovers a continuous edge and removes irrelevant borders. A method to detect fire in forests was proposed by Chen et al. [16]; it segments the image using colour information and is able to detect both fire and smoke. It is also possible to segment fire in urban and forest images using a Support Vector Machine (SVM) [19]. For smoke detection, an interesting method was introduced by Calderara et al. [12] that uses image energy and colour information. There are other approaches using chromatic information, such as extracting chromatic features of smoke according to a set of decision rules [14], or extracting Local Binary Patterns to detect smoke. Maruta et al. [50] detect smoke by treating it as a fractal, extracting smoke regions by calculating the Hurst exponent. Another way to detect smoke in images is to consider not only colour features but also shape features extracted through wavelet transformations [15].

The state of the art reveals that colour, more than shape, is the main feature for fire/smoke detection. Therefore, we shall proceed with this same course of action. Furthermore, the review of works also reveals that machine learning plays an important role in this kind of analysis; accordingly, we shall use this kind of technology.


Proposal

The pre-selection component will perform a global classification, using colour histograms to rapidly discard non-relevant images. After this, the ROI Selection component will segment regions of interest, which may contain relevant information about fire and smoke. Figure 2 shows an example of the pre-selection module: a burning house image is given as input, and the ROI Selection component detects possible fire and smoke regions of the image and marks their positions on it, red for fire and grey for smoke. To detect these events, the module will segment regions using colour spaces (HSV, Lab, or YCbCr) to model fire and smoke pixels. The segmentation can be done using Region Growing, SuperPixel, grid division, thresholding, or pixel classification with machine learning techniques such as SVM. A comparative test will evaluate and define the best approach. It is important to note that the objective at this stage is only to discard true negatives; false positives will be discarded in later steps.

Figure 2: Pre-selection component outputs the regions of the image with incident-related information

With the incident-related regions marked on the image, the ROI selection component can extract the ROIs. Figure 3 shows an example: a grid of image regions is generated and, using the information from the output of the Pre-Selection module, only the cells that intersect the regions marked as emergency information are extracted. It is also possible for different regions of interest to overlap.

Figure 3: ROI extraction focusing only on the incident regions (fire in this example) of the image
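The sketch below illustrates the two steps just described: a colour-based pre-selection mask followed by grid-based ROI extraction. It assumes the OpenCV and NumPy libraries; the HSV thresholds and grid parameters are illustrative placeholders, not the tuned values that the comparative test is meant to determine.

```python
import cv2
import numpy as np

def fire_mask(image_bgr):
    """Pre-selection: flag candidate fire pixels by colour alone.
    The HSV thresholds below are illustrative, not tuned values."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 120, 150])    # red/orange hues, strong saturation
    upper = np.array([35, 255, 255])
    return cv2.inRange(hsv, lower, upper)

def grid_rois(image_bgr, mask, grid=8, min_hits=0.05):
    """ROI selection: split the image into a grid and keep the cells
    that intersect the pre-selection mask."""
    h, w = mask.shape
    ch, cw = h // grid, w // grid
    rois = []
    for i in range(grid):
        for j in range(grid):
            cell = mask[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            if cell.mean() / 255.0 >= min_hits:  # fraction of flagged pixels
                rois.append(image_bgr[i * ch:(i + 1) * ch,
                                      j * cw:(j + 1) * cw])
    return rois
```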

2.3 Feature Extraction

Goals

The feature extraction aims at representing the multimedia data (image or video) in a numerical domain adequate for further analysis or classification. The main challenges are near real-time execution, low memory usage and, for the RESCUER project context, the capability of representing coloured multimedia data, particularly describing fire, smoke and explosion.


State of the art

Classifier models are based on a numerical representation achieved through a computational function able to represent the original data (instances) S in an adequate domain F. Formally, a Feature-Extractor Method (FEM) can be defined as follows: given an image domain S and a feature domain F, a feature-extractor method FEM: S → F is a computational function able to represent the original image as its summarization in F. The resulting element of F is referred to as a Feature Vector (FV).

FEMs play an important role because they represent the instances in a numerical domain with minimal loss of information. In the specific domain of images (or video frames) acquired from risk situations, another requirement is near real-time processing, which demands fast computation.

In particular, the MPEG-7 standard [74] provides a hierarchical standard for the analysis of coloured images and solid definitions of FEMs and their requirements, which prevents architecture dependence. With standardized, robust descriptors, MPEG-7 provides a means of navigating through visual content that is useful for similarity queries and classification tasks. We highlight the low-level descriptors designed by MPEG-7 that are able to capture colour, texture, and shape:

1. Edge Histogram: the original Edge Histogram represented the local edge distribution in a way dependent on the image size. To overcome this problem, several variations have been proposed that identify global and semi-local edges generated from local histogram bins. By using a proper similarity measure, these features are able to represent the absolute location of edges as well as their global comparison, improving the classification hit ratio and allowing the use of this extractor for similarity queries [58];

2. Scalable Colour: the goal of Scalable Colour is to allow a representation of the colour composition of an image that is scalable both in the number of coefficients it contains and in the number of bits each coefficient is assigned. The extractor usually describes the image as a quantized Hue-Saturation-Value (HSV) colour histogram. Scalable Colour represents colour distribution well for both classification and similarity queries [57];

3. Dominant Colour: this extractor is specifically suited for similarity queries due to its fast extraction, which consists in describing representative colours and their percentages by region. In this approach, a particular distance function measures the distance between two images based on the quadratic colour histogram measure. The resulting representation can be indexed in a three-dimensional colour space. Previous experiments have demonstrated gains of orders of magnitude of this extractor over the traditional colour histogram [24];

4. Colour Layout: this extractor captures the spatial distribution of colour based on a grid technique, where the grid can play the ''super-pixel'' role. After splitting an image into a definite number of grid cells, a discrete cosine transformation (DCT) represents the features of the most prominent colours by region of the image. Colour Layout has proven to be a precise strategy to detect similar colour patterns [24];

5. Total Colour: a simple extractor that captures the normalized frequency through histograms of each band of the RGB colour system. It is easy to obtain and can be used for classification/clustering [24].

Texture can also be represented by FEMs. The textures of images can be described, for example, by the roughness and homogeneity of their surfaces. Although it is trivial for a person to recognize an object’s texture, it is not easy to describe textures qualitatively [48], since they can be differently


interpreted by each person. In this context, Saipullah and Kim [68] defined texture as an intrinsic feature of the image that represents its roughness and homogeneity. One of the main texture extractors is Haralick's texture feature [33], which computes the contrast, entropy, and homogeneity of the image's co-occurrence matrix, among other measurements.

Finally, FEMs that describe shape can be considered the closest approximation to human perception [90]. These methods describe objects in the image by their shapes. This kind of FEM depends on a pre-processing step that segments the image and detects the borders of its objects. There are two types of shape description methods [96]: region-based and boundary-based. Region-based methods analyse the object as a whole to extract the features, while boundary-based methods consider only the pixels at the border of the object. There are various methods to extract shape features, such as Zernike moments [36], Fourier descriptors [15], and contour salience descriptors [81].

Proposal

The feature extraction module should be able to:

1) Represent coloured images through extractors, fulfilling the requirements of inexpensive computation and the capability of representing coloured images;
2) Integrate all FEMs into an atomic module to be used along with the other modules, such as classification or similarity queries;
3) Determine the best FEM to represent the classes ''Fire'', ''Smoke'', ''Fire and Smoke'', and ''Explosion'' through several experiments and comparisons, in order to obtain statistical significance.
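As an illustration of requirement 1), the sketch below implements a Total-Colour-style extractor: one normalized histogram per RGB band, concatenated into a single feature vector. It assumes NumPy; the number of bins is an arbitrary illustrative choice.

```python
import numpy as np

def total_colour_fv(image_rgb, bins=16):
    """Total-Colour-style FEM: one normalized histogram per RGB band,
    concatenated into a single feature vector (the summarization in F)."""
    bands = []
    for c in range(3):
        hist, _ = np.histogram(image_rgb[..., c], bins=bins, range=(0, 256))
        bands.append(hist / hist.sum())   # normalized frequencies per band
    return np.concatenate(bands)          # feature vector of length 3 * bins
```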

2.4 ROI/GRID Classification

Goals

According to Solomon and Breckon [76], in the context of image processing, the goal of classification is to identify features, patterns or structures and to assign a given image to a particular class. For this work, we consider the flow presented in Figure 4.

Figure 4 presents a flow diagram adapted from [76] with the main steps of a classification process. After starting, it is necessary to define which classes are needed (1). Then, at step (2), we choose the variables that allow the discrimination of images. To define the feature space (3), the appropriate variables are selected to form the feature vectors. At step (4), the model is built based on the training data, and at step (5) the accuracy of this model is computed. Finally, at step (6), we verify whether the classification performance is good enough. Various methods, such as SVM, k-means, decision trees, and association rules, have been studied for image classification with features such as colour, shape, and texture [82]. Some of the best-known image classifiers in the literature are briefly presented in this section.


Figure 4: Flow diagram representing the steps in a classification process

State of the art

Support Vector Machines (SVM) [83] solve large classification problems as an implementation of statistical learning theory [9]. They are effective at learning complex relationships among features. Unlike classical methods that merely minimize the empirical training error, SVM aims at minimizing an upper bound of the generalization error by maximizing the margin between the separating hyperplane and the data. In this method, the input vectors are mapped into a higher-dimensional space, and an optimal separating hyperplane is constructed. SVMs have been successfully applied to a number of applications, ranging from bioinformatics to text categorization and face or fingerprint identification [86].

Instance-Based Learning (IBL) is a framework [3] that generates classification predictions using only specific instances. In instance-based learning, the training examples are stored verbatim, and a distance function is used to determine which member of the training set is closest to an unknown test instance. Once the nearest training instance has been located, its class is predicted for the test instance. According to [88], most instance-based learners use the Euclidean distance.

According to [63], several ensemble classification methods have been proposed in recent years for improved classification accuracy. In ensemble classification, various classifiers are trained and their results are combined through a voting process. Boosting and bagging are among the most widely used of such methods. Random Decision Forests [35] build multiple trees in randomly selected subspaces of the feature space. According to the author, trees in different subspaces generalize their classification in a complementary way, and their combined classification can be monotonically improved. Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.



A Multilayer Perceptron (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. MLP is the most widely used neural network [65]; its structure is layered and consists of an input layer, an output layer, and one or more hidden layers between these two (it may also have no hidden layer). Figure 5 shows an example of a 3-layer MLP, extracted from [65]. The input-layer neurons act like buffers, and their main task is feeding the network inputs to the next layer's neurons in the training and operational phases. According to [4], MLP networks are general-purpose, flexible, nonlinear models consisting of a number of units organized into multiple layers. Given enough hidden units and enough data, it has been shown that MLPs can approximate virtually any function to any desired accuracy.

Figure 5: Example of a three-layer MLP

The Naïve Bayes (NB) classifier is a simple probabilistic classifier based on Bayes' theorem with strong independence assumptions. According to [69], NB is based on building a feature-independent probability model, assuming that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable. NB is the simplest form of Bayesian network, in which all attributes are independent given the value of the class variable. This assumption, called conditional independence, is rarely true in most real-world applications [96]. Parameter estimation for the NB classifier is done using maximum likelihood, with the classifier trained through supervised learning. One of the advantages of this classifier is that it does not need a large number of samples for good training.

Decision Trees (DT) are used to extract knowledge by inferring decision-making rules from the available information [59]. Some of the advantages of DTs are the ability to deal with continuous and categorical attributes, robustness in the presence of noise, and the fact that decision trees are easy to interpret. On the other hand, some disadvantages are that irrelevant attributes may affect the construction of the tree, and that there may be difficulties involved in designing an optimal decision tree. A DT is built from a training dataset consisting of objects, each of which is completely described by a set of attributes and a class label. Several methods have been proposed to construct decision trees. These methods generally use the recursive-partitioning algorithm, whose input requires a set of training examples, a splitting rule, and a stopping rule [2].


Proposal

In the analysis architecture, the ROI/GRID classification component is responsible for indicating whether the ROI/GRID contains fire, smoke, or both. This component is illustrated in Figure 6. A classifier will be used to train a model using a set of training data, which is already labelled. The component receives as input the features of an ROI/GRID and classifies it using the current model. The ROI/GRID is then labelled as “fire”, “smoke”, or “fire and smoke”, and this result is given as the output of the module. We observe that explosion and crowd detection are left to the video analysis, as motion is necessary for their identification.

Figure 6: The Classification Module
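A minimal sketch of such a classification component follows, assuming scikit-learn and using an SVM (one of the candidate methods reviewed above; the actual classifier will be chosen experimentally):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_roi_classifier(train_fvs, train_labels):
    """Train a model on labelled ROI feature vectors (e.g., Total Colour).
    Labels: "fire", "smoke", or "fire and smoke"."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(train_fvs, train_labels)
    return model

# Usage sketch: tag each incoming ROI with the detected incident type.
# model = build_roi_classifier(X_train, y_train)
# labels = model.predict(incoming_roi_fvs)
```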


3 Video Analysis

3.1 Goals

The main goal of the Video Analysis Module is to provide a multimedia description of the emergency incident based on the collected videos. In particular, this module aims to identify the type of incident that is happening in the scenario in order to assist the operational forces. The main challenge of this module is to propose a detection algorithm for non-static and complex environments, especially the industrial and large-event scenarios of the RESCUER project.

The input of the Video Analysis Module is the raw video sequences – acquired by the Mobile Crowdsourcing Solution – and the outputs of the Data Filtering Component: smoothed rotation vectors and the priority level of each file, as Figure 7 shows. In fact, the raw data will be analysed according to the priority level that the Data Filtering Component assigned to it.

Figure 7. Video Analysis Module in the RESCUER platform

The Video Analysis task is related to the following research fields, which are explained in detail in Section 3.2 in order to understand the video analysis system proposed in Section 3.3:


• Crowd density and flow estimation.
• Flame detection and expansion velocity estimation.
• Smoke and gas leak detection.

Finally, the multimedia description of the emergency scenario provided by the Video Analysis Module is based on the temporal density analysis of the mass in motion, independently of the content of the scene. The outputs of this analysis component are described in Table 1, and the visual multimedia description provided by the Video Analysis Module is detailed in Figure 8.

Figure 8: Example of a visual multimedia description of the Video Analysis Module

Table 1. Video Analysis Module output

Parameter | Definition | Source
Incident description: kind of incident | Crowd, fire, smoke, or a combination. | Video Analysis Module
Location | Geo-reference. | Raw data
Timestamp | Temporal reference. | Raw data
Objective measure: people/m² | Estimation of this measure. | Video Analysis Module
Risk of congestion alarm | Notification of the risk of congestion, based on the temporal density analysis. | Video Analysis Module
Fire/smoke expansion velocity | Estimation of the fire/smoke expansion velocity, based on the temporal density analysis. | Video Analysis Module

The following subsections present the different works available in the literature on these research problems. The design of the Video Analysis Module and the main stages of its procedure are then described.


3.2 State of the art

The Video Analysis Module relates to several research fields, owing to the several objectives it must achieve. The aim of this section is to offer an overview of these fields in order to understand the available algorithms and, consequently, to formulate the RESCUER proposal.

Therefore, this section is divided into three main fields corresponding to the three kinds of events that this module will identify: crowd, fire, and smoke. After the state of the art, the RESCUER Video Analysis Component proposal is presented.

3.2.1 Crowd Analysis

There are many proposals in the literature related to crowd event analysis in video sequences, whose efforts are focused on analysing the scene motion to extract anomalous behaviours. They are usually based on motion estimation and tracking algorithms, so this section offers the main considerations about these algorithms and a general overview of them. In particular, reference [94] offers a useful survey of crowd analysis methodologies, which is the basis of this section.

Therefore, the first step is to analyse the target scenarios in order to identify those features that could directly influence the algorithm design. The main features are the following:

• Camera typology and topology:
  o Static or moving camera.
  o Number of cameras.
  o Type of video sequence: greyscale, colour.
• Environmental conditions:
  o Indoor/outdoor.
  o Level of clutter.
  o Light conditions.
• Scene topology:
  o Individual. In this case, some related information to provide can be location, velocity, or appearance.
  o Collective. Crowd density or the average speed of the crowd are some measures that the algorithm can offer in this kind of scenario.

On the one hand, the crowd density measurement that can be obtained is the level of service for a pedestrian flow, whose unit of measurement is the number of pedestrians per unit area. In order to obtain this measure, some proposals employ background removal techniques [46], and others use image processing and pattern recognition techniques [91]. Moreover, some of the high-level measurements provided are the detection of congestion and of anomalous behaviours in crowded scenarios.

On the other hand, the fields where crowd analysis is applied differ depending on the purpose. Detection and tracking are also important for crowd dynamics modelling, as they provide the location and velocity features of the dynamics; but the detection of individuals in crowded scenes involves a higher degree of complexity than that associated with conventional


detection techniques. However, some proposals in the following fields have concentrated on methodologies for crowded situations:

• Face and head recognition. This kind of work sometimes requires a long training phase.
• Pedestrian and crowd recognition. Several methods have been proposed for pedestrian and crowd recognition, such as combining chromatic and shape information or using background removal. Three categories have been identified regarding pedestrian detection in crowded scenes:
  1. Occlusion handling. This is a huge challenge in the field, given the complexity of crowded scenarios.
  2. Moving views. A different kind of method is required for moving-platform applications, such as on-board vision systems in cars.
  3. Spatial-temporal methods. In this case, the pedestrians are considered as moving entities with spatial-temporal feature similarities.

Additionally, tracking performance in complex scenes has been studied, and the following methodologies have been highlighted:

• Likelihood. A statistical model for a set of parameter values is created with the objective of evaluating the probability that a pixel value belongs to a particular object.
• Human body model.
• Inference strategies. These have been developed for the problem of tracking multiple objects and are called particle filter techniques; some examples are the condensation technique [39] and Monte Carlo methods [25].
• Optical flow algorithms. This is one of the most relevant techniques in the field, since it achieves good results for crowd density and direction; the solutions [5] and [41] propose systems that analyse crowd disaster videos based on an optical flow algorithm that produces a vector map, as Figure 9 shows (see also the sketch after the figure). Additionally, camera motion can be obtained from the optical flow by the RANSAC algorithm presented in [23]; then, the relative movement between the regions of interest and the camera is extracted, generating a saliency map based on camera parameters, as [1] presents.

Figure 9: Example of optical flow map [5]
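As an illustration of the optical flow technique mentioned above, the following sketch computes a dense flow map between two consecutive greyscale frames. It assumes OpenCV's Farnebäck implementation; the parameter values are illustrative defaults, not tuned choices.

```python
import cv2

def flow_map(prev_gray, next_gray):
    """Dense optical flow between two consecutive (stabilized) frames;
    returns an HxWx2 map of per-pixel displacement vectors."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```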


Finally, other references analyse how crowd models from sociological and physiological approaches can contribute to different computer vision techniques, and they confirm that combining the different models might yield intelligent systems capable of understanding crowd behaviours automatically. In the RESCUER project, the video sources are mobile cameras with different fields of view and variable quality, which increases the complexity of the analysis algorithm.

3.2.2 Flame Detection

Fire scenes are another scenario to be analysed by the Video Analysis Module. Many techniques have been proposed in recent years, some of which use colour information [60]. In addition, that work uses motion information with the aim of locating fire more accurately.

Figure 10: Boundary roughness of the blob [8]

However, additional dynamic features are used in [8] and [45], i.e., randomness of area size or spatial distribution. In particular, [8] uses further features such as size, surface coarseness, boundary roughness, and skewness within the estimated fire regions (Figure 10), and it evaluates the behavioural change of each of these features, exploiting the flickering and random characteristics of fire. Additionally, [87] extracts the motion and detects pixels with high intensity values (hot-spot detection technique); the pixels are then grouped into blobs (Binary Large OBjectS). Afterwards, spatial-temporal structural feature extraction is performed (Figure 11): the blobs are accumulated over a period of time to generate the accumulated motion mask (AMM) and the accumulated intensity template (AIT). AMM and AIT capture, respectively, the following properties of flames: flickering, and the ring structure of fire with regions of different temperature. The final feature vector associated with each blob is composed of the histogram of gradients of the AMM and the spatial ring structure of the AIT.
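A minimal sketch of the accumulation idea behind the AMM, assuming NumPy and per-frame binary motion masks as input (this illustrates the principle only, not the implementation of [87]):

```python
import numpy as np

def accumulated_motion_mask(binary_masks):
    """Accumulate per-frame binary motion masks over a time window.
    Flickering flame pixels toggle often and accumulate high values,
    which is the cue the AMM of [87] exploits."""
    stack = np.stack(binary_masks).astype(np.float32)  # T x H x W
    return stack.sum(axis=0) / len(binary_masks)       # H x W, in [0, 1]
```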

On the other hand, some proposals focus on complex backgrounds, in which common problems such as shadows, object movements, or light changes must be resolved; along these lines, [45] proposes an adaptive brightness threshold method for fire detection against complex backgrounds.

Many proposals in the literature are based on videos acquired by static surveillance cameras; in fact, industrial environments are common in this field. In addition, the references whose work is based on spatial-temporal information have been taken into account in the Video Analysis Module


definition, since the motion of the scene is the main feature of this task and colour alone is not enough to develop an efficient classifier that distinguishes between crowd, fire, and smoke regions in the scene.

Figure 11: Spatial-temporal structural feature extraction [87]

3.2.3 Smoke Detection

Smoke detection techniques have been proposed in the literature as supporting methods for fire detection, strengthening the detection module. Some proposals assume that the scene is stationary, so that the motion of the smoke mass can be analysed [80]. This reference proposes a wavelet-based technique that analyses the periodic behaviour of the smoke boundaries and the convexity of smoke regions, based on the fact that smoke decreases the sharpness of the edges in the image, as the blurring effect in Figure 12 shows.

Figure 12: Smoke detection by [80]. Blurring in the edges is visible

However, smoke detection techniques usually have a high false alarm rate; [93] tries to improve this performance. That work presents an accumulative motion model based on the integral image, whose objective is to estimate the motion orientation of the smoke, as Figure 13 shows. The work concludes that this technique is able to mostly eliminate the disturbance of artificial lights and


non-smoke moving objects. Additionally, this reference includes the link [11], where smoke and fire videos are available for testing.

Figure 13: Example of results of reference [93]. (a) Motion orientation. (b) Orientation accumulation. (c) Histogram of accumulation.

Most of the smoke detection algorithms found in the literature focus on video sequences without camera motion, and even then the false alarm rate is high, so the challenge addressed in the RESCUER project is considerable. Therefore, the motion compensation task is significant in the Video Analysis Module in order to reduce the false alarm rate in the final classification.

3.3 Proposal

The design of the Video Analysis Module is proposed in this section. The main stages of the procedure are described, as well as the current state of the task.

Before describing the system proposal, some previous considerations should be taken into account in order to understand every step that has been identified:

• A single analysis algorithm for unknown content should be proposed.
• The variability of content in the video data influences the design of this system considerably:
  o People (crowd).
  o Flames.
  o Smoke and gas leaks.
  These contents can, however, be considered as a deformable mass in motion in the video sequences, so the analysis to be applied can be similar.
• Regarding the detection of which kind(s) of content are present in the scene, the algorithm can be based on colour information and/or structural features.

Therefore, the first proposal of the Video Analysis Module is shown in Figure 14. The main steps of the Video Analysis Module have been defined according to the previous requirements. Note that


the high-level measures that the Video Analysis Module provides (Table 1) are sent to the Emergency Response Toolkit, which is in charge of visualizing the analysed results. In general, all of the processing steps are related to crowd analysis techniques in which the main source of information is the motion vector map.

First, the inputs have been identified with the objective of specifying their usefulness for the analysis task. They are listed as follows:

• Clusters of raw videos based on their location. The video cluster informs the Video Analysis Component about content-based similarities between all videos of that cluster, so this consideration should be taken into account to provide more robust results in this analysis component.
• Per video:
  o Analysis priority level. According to [64], the data from the Mobile Crowdsourcing Solution is ordered by the Data Filtering Component based on its objective quality, so the result of that component is the analysis priority level of every video. The analysis component therefore evaluates data according to the priority order, since the highest-priority data is most likely to offer valid analysis results.
  o Smoothed rotation vector of every frame. In the previous filtering stage, the rotation vectors associated with each frame are smoothed with the objective of reducing shaky movements. This new rotation is significant for stabilizing the frames of the video sequence.

Figure 14: RESCUER Video Analysis Module proposal

Then, the stages of the Video Analysis Module proposal are described as follows:

• Frame Stabilization. The stabilization process is carried out in the analysis task in order to reduce the computational cost of the video filtering task. Every frame of the video sequences will be stabilized according to the new rotation vector. Figure 15 presents an initial diagram of the Frame Stabilization module.


First, the homography module aims to obtain the camera model in order to relate the camera's 2D image plane to the real world and thereby compute some high-level measures. The complete frame transformation equation is calculated taking the new rotation matrix as an input. Then, the rotation transformation of every frame of the video takes place in the module called Transformation, where a mapping procedure is carried out: new coordinates are calculated, and these are interpolated with the objective of offering a better visual appearance of the transformed frame.

Figure 15: Stabilization Module in the Video Analysis Component
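A minimal sketch of such a rotation-based stabilization step follows, assuming OpenCV/NumPy, a known camera intrinsics matrix K, and a correction rotation R derived from the smoothed rotation vector (all of which are assumptions for illustration):

```python
import cv2
import numpy as np

def stabilize_frame(frame, R_correct, K):
    """Warp a frame by a pure-rotation homography H = K * R * K^-1,
    where R_correct rotates from the raw to the smoothed orientation
    and K is the (assumed known) camera intrinsics matrix."""
    H = K @ R_correct @ np.linalg.inv(K)
    h, w = frame.shape[:2]
    # Bilinear interpolation gives the transformed frame a smooth appearance.
    return cv2.warpPerspective(frame, H, (w, h), flags=cv2.INTER_LINEAR)
```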

• Motion Vector Map. This module calculates a motion vector map associated with each stabilized frame of the video sequences in order to detect homogeneous areas with motion similarities. The particular algorithm has not been developed yet, but optical flow will be evaluated in the following iteration of the RESCUER project. Moreover, a supporting camera-motion algorithm should be evaluated at this point of the processing if the sequence is not stabilized enough.

• ROI Detection. Regions of interest in the RESCUER scenarios are detected from the motion vector map: the vectors are clustered into regions that share, for example, motion direction or speed (see the sketch after this list). At this stage of the processing, the kind of event taking place is not yet known, so the features that characterize the ROIs should be common to the three categories: crowd, fire, and smoke.

• Identification/Classification. This stage classifies the previously detected ROIs. Some spatial features can be taken into account to distinguish crowd, fire, and smoke regions in the scene. Since the videos acquired in the RESCUER project provide colour frames, this information can also be analysed in order to detect fire.

• Tracking. The tracking algorithm obtains high-level measures about the crowd. It uses the motion vectors obtained in the previous stage as input and provides pixel-based tracking information. This module will be studied in detail in the following iteration of the project, and it will depend on the evolution of the previous modules of this analysis component.

• Measurements. Finally, the outputs of the Video Analysis Component for every evaluated video will be the kind of event(s) present in the scene and crowd measurements such as the density of people and the risk of congestion. For fire or smoke events, the expansion velocity will be provided too.
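As a sketch of the ROI detection idea referenced in the list above, the following groups pixels by position and motion vector. It assumes scikit-learn and NumPy; the number of clusters k and the magnitude threshold min_mag are illustrative choices, not project parameters.

```python
import numpy as np
from sklearn.cluster import KMeans

def motion_rois(flow, k=3, min_mag=1.0):
    """Cluster moving pixels into candidate ROIs using their position
    (x, y) and flow vector (dx, dy) as features."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    mag = np.linalg.norm(flow, axis=2)
    moving = mag > min_mag                      # ignore near-static pixels
    feats = np.column_stack([xs[moving], ys[moving],
                             flow[..., 0][moving], flow[..., 1][moving]])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    return feats, labels                        # per-pixel cluster assignment
```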


4 Text Analysis

4.1 Goals

Text-based communication can be an important means of communication in case of catastrophic incidents. It has a number of advantages over classical phone-based emergency communication. First and most important, it does not necessarily rely on the “classical” telephone infrastructure. Considering the self-organizing emergency network infrastructure developed as part of the RESCUER project, sending text messages to the command and control centre and vice versa is possible even if the surrounding cellular transmitter stations are overstretched or damaged by the emergency. The command and control centre can thus receive incident reports and situation updates from the crowd and first responders even in case of a breakdown of the surrounding communication infrastructure, and is not partially “blinded”.

Additionally, it is possible to establish bidirectional communication between the command centre and the crowd at the place of the emergency without the need for a constant connection such as a phone network. While eyewitnesses report on the incident by sending text messages, the command and control centre in turn can either request additional information to refine the situational overview using message-based communication, or broadcast behavioural directives or important instructions, such as evacuation routes. In this context, text-based communication may also be considered an alternative to traditional radio communication for the operational forces.

Finally, eyewitnesses can report the course of an incident to the command and control centre faster and in significantly higher numbers than is possible with normal emergency hotlines. Emergency call centres usually have a limited capacity. With every phone call taking at least several seconds, this capacity is likely to be exhausted sooner or later in the case of a mass emergency. This may lead to crucial information not being delivered to the command and control centre in time, or at all. Additionally, there is no need for the people working at the command and control centre to log incident facts from phone calls: the available information is already present in textual form and only has to be evaluated and exploited. This reduces the risk of missing important pieces of information and of misunderstandings due to the high level of noise at the place of the incident.

Although text-based communication circumvents the problems of an overstretched infrastructure and occupied emergency phones, it still requires a significant amount of human resources to analyse the incoming reports. In case of a disaster during a mass event, hundreds or thousands of messages are likely to arrive at the command and control centre at the same time. Analysing their content and prioritizing those containing valuable information while discarding less useful texts is a very time- and human-resource-consuming process.

The text analysis module is intended to support the process of information extraction from incident reports and should assist operational forces at the command and control centre in obtaining relevant information faster. We aim to develop a text analysis module that automatically deduces the content of text messages, focusing on the three essential aspects that are important for operational forces in every kind of emergency: What is happening, where is it happening, and who is affected by the incident? Additionally, the when information may be considered, although we believe that RESCUER, as an emergency platform, deals with current messages anyway.


Although we want to apply state-of-the-art means to extract information from texts, the techniques cannot be guaranteed to be 100% accurate or free from errors. As the application in emergency situations is critical, the final decision on whether a message and the information it contains are useful must nevertheless not be made automatically, but has to involve human interaction. The text analysis module will assist this process by providing automatically generated annotations indicating the key content of incident reports, giving an immediate overview of what information a message probably contains.

4.2 State of the art

4.2.1 Information Extraction and Natural Language Processing

The process of automatically extracting structured information from unstructured data is called information extraction (IE) [71]. Unstructured data does not follow a strictly defined data model and is not organized in a predefined way. This part of the document focuses on information extraction from machine-readable texts written in natural languages, although the term IE may also be extended to cover content extraction from multimedia data such as images, videos [40], or audio. The basic task of IE from texts is to extract information about entities, attributes of entities, and relations among them, which can be stored in some kind of database [21] or used for further processing tasks.

A related research area is information retrieval (IR). It is concerned with obtaining resources from a database that are likely to contain certain information of interest. Depending on the database and application, these resources can be text documents or arbitrary other multimedia objects such as images, videos, or sound files. Although they share some common processing (sub-)tasks, the purpose of IE is different from that of IR. While IE is about finding information within texts, IR is about determining suitable documents in a database that may contain certain information.

Other related areas of information extraction focusing on textual data are automatic summarization, text clustering, and text classification. The first is concerned with creating summaries of texts, distinguishing the key content from irrelevant parts [32]. The latter two are commonly found in the IR context: text clustering is about identifying similarities between texts, often considered on a semantic level, while text classification is concerned with assigning one or more class labels to text documents.

Information extraction from texts involves means of natural language processing (NLP) to a large extent. According to [44], natural language processing is “a theoretically motivated range of computational techniques for analysing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications”. In other words, NLP is concerned with making computers understand human-written texts to a certain extent for some kind of application. It combines techniques from linguistics, computer science, and artificial intelligence. The list of research tasks in NLP is long and diverse. Some tasks that are important in IE are named entity recognition [55; 79] (recognizing named entities such as person names, organizations, and locations), part-of-speech tagging [10; 72] (assigning the part of speech of words in sentences), and parsing [47; 53] (determining the parse tree of a sentence).
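To make these building blocks concrete, the snippet below runs part-of-speech tagging and named entity recognition with an off-the-shelf NLP library. spaCy and its small English model are assumptions chosen for illustration, not components prescribed by this deliverable.

```python
import spacy

# Assumes spaCy is installed and the small English model has been
# downloaded (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Huge fire near gate 3 of the stadium, two people injured.")

for token in doc:
    print(token.text, token.pos_)    # part-of-speech tagging
for ent in doc.ents:
    print(ent.text, ent.label_)      # named entity recognition
```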


4.2.2 Knowledge Engineering vs. Automatic Learning Approaches

A literature review yields two main approaches to extracting information from texts written in natural languages. The work presented in [7] distinguishes between the so-called knowledge engineering approach and automatically trainable systems. The first is concerned with creating extraction rules manually, using the knowledge and skills of an expert; in the second, these rules are created automatically based on a statistical model that is trained beforehand using machine learning techniques.

The knowledge engineering approach assumes a “knowledge engineer”, i.e., a person who is familiar with the techniques to create rules, grammars, etc. to extract information from texts. The knowledge engineer usually needs access to a sufficiently large corpus, i.e., a collection of text documents to extract information from, and deduces a set of rules based on his expert knowledge and experience. During this process, he may be assisted by an expert from the domain of the information to extract. Examples of handcrafted rule-based systems can be found in [66; 43]. In general, knowledge-engineered systems perform very well, although the quality of the extracted information highly depends on the skills and experience of the engineer. However, creating rules and grammars manually can be a very time-consuming and error-prone process. Furthermore, considerable effort may be needed to adapt existing rules if the underlying specification changes. A toy example of such a handcrafted rule is sketched below.
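As a minimal illustration of the knowledge engineering approach, the following Java sketch encodes a single expert-written extraction rule as a regular expression. The keyword list, preposition set, and location pattern are illustrative assumptions, not rules actually used in RESCUER.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy handcrafted extraction rule: an expert's observation that incident
 *  keywords are often followed by a locative prepositional phrase. */
public class HandcraftedRuleExample {
    private static final Pattern INCIDENT_AT_LOCATION = Pattern.compile(
        "(?i)\\b(fire|smoke|explosion|gas leak)\\b" +  // incident keyword (illustrative list)
        "\\s+(?:at|in|near|on)\\s+" +                  // locative preposition
        "([\\w ]{1,40})");                             // rough location phrase

    public static void main(String[] args) {
        Matcher m = INCIDENT_AT_LOCATION.matcher("Heavy smoke near the main entrance");
        if (m.find()) {
            System.out.println("what:  " + m.group(1)); // smoke
            System.out.println("where: " + m.group(2)); // the main entrance
        }
    }
}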

Automatic learning approaches, on the other hand, work with an annotated corpus, i.e., a collection of text documents, to either obtain information extraction rules automatically [20; 75] or train an underlying statistical model, e.g., HMMs [27; 28] or Conditional Random Fields [42; 70]. The advantage of automatic learning methods over the manual engineering approach is obvious: the performance mainly depends on the training set and not on the knowledge and skills of an expert obtaining information extraction rules manually. Such systems can thus be created with less effort and are more flexible in being applied to new information domains. The most important disadvantage of the learning approach, however, is that a large amount of training data is needed. This data may be difficult to obtain in large quantities and has to be annotated manually in most cases. Furthermore, a change of the underlying specification may require an effortful re-annotation of the whole training corpus.

Due to a lack of data, i.e., a corpus of emergency text messages that were actually created during an emergency event, it is hardly possible for us to train machine-learning-based processing components on our own. Apart from using already existing ready-to-use components, we will thus mainly follow the knowledge engineering approach for now whenever new or adapted extraction rules are required. It is, however, important to state that we will design the architecture of our processing pipeline in such a way that it allows changing the implementation of the individual components, making it easy to replace handcrafted, rule-based ones with statistical ones. If it becomes possible to obtain a sufficiently large corpus of emergency text messages, and the engineering approach becomes too complex or performs poorly, it would be possible to experiment with learning-based components.


4.2.3 Situational Awareness from Text Messages During Catastrophic Incidents

An important source of information about incidents and disasters can be social media, which has gained increasing popularity over the last years as a means of communication and information sharing. Recent research has focused strongly on using texts from social media to obtain information about an incident and its course. In [85], the authors investigate how Twitter can be used to contribute to situational awareness during natural disaster events. To this end, they analysed micro-blogging posts collected during the Oklahoma Grassfires (2009) and the Red River Floods (2009) to identify features of valuable information that enhance situational awareness for operational forces. According to the authors, the outcome was promising and has led to the development of a framework for the design and implementation of emergency IE software systems. The work in [84] presents a classifier that is able to automatically detect, in a stream of incoming tweets, Twitter messages that may contain valuable information increasing situational awareness. To do so, they use a combination of hand-annotated and automatically extracted linguistic features.

According to the DoW, RESCUER focuses on textual messages coming from the mobile smartphone application, as we are particularly interested in current messages coming directly from the place of the incident. Facebook, Twitter, or similar platforms, however, are more likely to be used by people who are at a safe distance or who are not affected at all. In any case, the currentness of such reports is not guaranteed, which contradicts the aim of RESCUER as a platform for overviewing incidents while they are happening. At this point, however, it is important to underline that the text analysis module could be applied to social media texts as well with some effort, as they exhibit very similar characteristics.

4.3 Characteristics of Incident Reports and Their Content

4.3.1 The RESCUER Mobile Application

The incident reports we are focusing on are sent by people using the RESCUER mobile smartphone application. Having reported an incident, e.g., a fire, explosion, or gas leak, to the command and control centre by choosing from a predefined list of emergency events, people can supply additional information using free-text messages. These messages may describe the incident itself (e.g., “the fire is spreading towards…“), provide location details (e.g., “on the second floor of the building”), or report casualties. Furthermore, users can add textual descriptions to other multimedia objects like images and video.

We thereby distinguish between three groups of people who may use the application to supply textual reports. Civilians, on the one hand, are persons who are directly at, or at least very close to, the place where the incident happens and do not have special training in how to behave in case of an emergency. They are very likely to be overwhelmed by the situation or to panic, which will certainly be reflected in the way they report, as discussed in Section 4.3.3. Members of the operational forces, e.g., fire fighters, and other specialists, on the other hand, are trained in how to behave in case of catastrophic incidents and, in particular, know how to report events effectively to the command and control centre.


4.3.2 Information to Extract from Incident Reports

In the RESCUER text analysis module, we consider information extraction from texts coming from the domains of emergency and catastrophic incidents. The main purpose of our module is to extract the most important parts of the information in such texts and present them in the form of annotations, to allow a quicker distinction between more and less valuable messages. During every emergency, three aspects are most crucial for the operational forces:

1. What is happening?

First, it is important to identify what is happening or, in other words, the type of the incident, e.g., fire, smoke, or panic. Identifying the kind and severity of an incident is crucial for quick and efficient situation handling and for alerting appropriate types and numbers of operational forces.

2. Where is it happening?

The next important piece of information is where an event takes place. “Where” in this context refers to the place of the incident, and not the position of the user, who may observe the situation from a distance and whose position is given by the GPS receiver of the mobile device. We distinguish between three different ways to specify a location in free text:

- Named locations: e.g., “Allianz Arena”
- Conceptual locations: e.g., “building”, “entrance”, “stage”, “stands”
- Numerical positions: e.g., GPS coordinates

Depending on the type, a different processing approach may be necessary, e.g., named entity recognition for named locations or keyword spotting for conceptual ones. Prepositional phrases in front of a location, e.g., “in front of”, “on top of”, “below”, may give additional information.

3. Who is involved in the reported event?

Finally, it is important to know about the people who are involved in the incident. The RESCUER system is thereby particularly interested in people who are affected by the incident, e.g., people who got injured. Descriptions of people can be enriched with adjectives or quantitative measures, e.g., “many”, “a lot of”, “one hundred”, “group of”, but also figures such as “10000”.

One note on temporal (when) information: for now, we will rely on the timestamps of the incident reports to evaluate their currentness, assuming that a report arriving at the text analysis module is a “live” description of the situation and has been sent recently. In case the extraction of temporal information turns out to be valuable, it is possible to extend the processing pipeline we present in Section 4.4 with this feature. A hypothetical illustration of the target annotations follows.
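To make the target output of the module concrete, the following hypothetical Java container models the three annotation categories for a sample message. The class name, fields, and the sample message are illustrative assumptions; the actual report format is defined by the integration described in Section 5.

import java.util.List;

/** Hypothetical container for the three annotation categories of a report. */
public class EmergencyAnnotations {

    final List<String> what;   // incident types, e.g. "fire"
    final List<String> where;  // location phrases
    final List<String> who;    // involved/affected people

    EmergencyAnnotations(List<String> what, List<String> where, List<String> who) {
        this.what = what;
        this.where = where;
        this.who = who;
    }

    public static void main(String[] args) {
        // For "Fire on the second floor of the arena, many people injured!":
        EmergencyAnnotations a = new EmergencyAnnotations(
                List.of("fire"),
                List.of("second floor", "arena"),
                List.of("many people injured"));
        System.out.println(a.what + " | " + a.where + " | " + a.who);
    }
}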


4.3.3 Characteristics of Incident Reports

Incident messages are mostly obtained from eyewitnesses who are likely to be within a short distance of the incident. It is assumed that they are overwhelmed by the situation, stressed, shocked, and probably in panic. These circumstances are very likely to be reflected in the way people report free-text information using their smartphones. The characteristics that incident reports are expected to exhibit can be compared to typical properties of SMS texts, some of which can be found in [62]:

• Short length: The messages are expected to be short in length as people who are close to a catastrophic incident are likely to limit their descriptions to an essential minimum.

• Unusual grammar: If texts are written under stress, in panic, or in a hurry, they will possibly exhibit unusual grammar.

• Unusual punctuation
• Orthographic errors, including spelling errors and typing errors
• Slang expressions and “SMS language”
• Emoticons

4.4 The RESCUER Text Analysis Module

4.4.1 Requirements

Considering the application at mass emergency events and in industrial areas, a number of requirements that our text analysis module should meet can be deduced. The following listing presents the most important ones.

R1. The text analysis component should be able to deal with the special characteristics of emergency text messages presented in Section 4.3.3.

R2. The extraction of information from incoming texts should be done as fast and as efficiently as possible. Although there is no hard real-time deadline to meet, the faster information can be processed, the sooner it can be provided to the command and control centre and contribute to the situation overview.

R3. Closely related to fast and efficient processing is scalability. With the RESCUER platform being applied during mass events involving a large number of people (> 10,000), it is important for the text analysis module to be easily integrated into a scalable system architecture.

R4. It will possibly be necessary to experiment with different realizations of processing components. The text analysis module should therefore be designed flexibly enough to allow adding, exchanging, and removing components without large effort.


4.4.2 Conceptual Architecture of the Text Analysis Module

The conceptual architecture of the text analysis module is depicted in Figure 16. As input, it gets incident reports sent by users of the RESCUER mobile application. The analysis module then applies a series of processing steps to extract the what, where, and who information from the text, and annotates the report with the keywords and phrases found. In contrast to the image and video analysis module, no explicit quality-filtering step is applied before the actual processing of texts. The reason for this is that reasonable filtering based on grammar, word choice, etc. already involves a number of processing steps required for the actual information extraction, making an explicit division between filtering and IE steps questionable. Furthermore, in contrast to image and video processing, the processing of short text messages does not require a significant amount of time; it can be done in far below 100 ms (for the example architecture depicted in Figure 17).

Figure 16: Conceptual architecture of the RESCUER text analysis module. As input, the module gets incident reports containing free-text descriptions of the incident created by people who are at the place of the event. As output, the semantic what, where, and who information is added to the original report, forming an emergency report.

Text processing is very often done using a pipeline of subsequently applied processing components, starting from low-level syntactic pre-processing like tokenization or POS tagging and moving up to high-level tasks like the extraction of the semantic information of interest. Every component thereby has an exactly specified interface, making it possible to change its implementation without changing the overall architecture. Although some researchers argue that pre-processing should be done using a single component, it has been decided to go with the pipeline approach, as it is versatile and adequate for modularization. It allows a flexible design of the text analysis process, as the individual processing steps are separated from each other: new components can be added, existing ones removed, and the implementation of a processing component changed, as long as the interface stays the same. This is very advantageous for experimental tasks, in particular. Figure 17 shows an example of a typical text-processing pipeline. Its components are explained in the following paragraphs.


Figure 17: Example text-processing pipeline containing a tokenizer, gazetteer, sentence splitter, POS tagger, chunker, and the semantic annotator component extracting semantic information

a) Tokenizer

The first step of almost every text information extraction system is tokenization. It is a means of text segmentation and denotes the task of splitting text into logical text units called tokens. The corresponding processing component is called a tokenizer. Commonly, tokenization is applied to distinguish individual words, numbers, and punctuation marks in a text. For many European languages this task is basically very simple, as one can largely use whitespace characters and punctuation marks to identify individual tokens. However, some language-dependent special cases need to be taken into account, like inverted commas in English or accents in French. For Asian languages such as Chinese [38; 37], or for Arabic [31], this task is more complicated and needs special and careful consideration.
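A minimal regex-based tokenizer in this whitespace-and-punctuation spirit might look as follows; the token pattern is an illustrative sketch, not the tokenizer actually used in the pipeline.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal tokenizer sketch: words (including apostrophes) or single
 *  punctuation marks become tokens. */
public class SimpleTokenizer {
    private static final Pattern TOKEN = Pattern.compile("\\w+(?:'\\w+)?|[^\\w\\s]");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Fire on the 2nd floor, it's spreading!"));
        // -> [Fire, on, the, 2nd, floor, ,, it's, spreading, !]
    }
}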

b) Gazetteer

Gazetteers look up words or word phrases from a text in a set of predefined lists of keywords and can assign them an arbitrary set of annotations. Sometimes, these lists themselves are also referred to as gazetteers. They are commonly used to identify names of entities such as names of people (e.g., “Peter”, “Andy”) or places (e.g., “Allianz Arena”, “Frankfurt Airport”, “Empire State Building”), but can be applied to find any set of words as long as they are contained in one of the keyword lists. In the simplest case, these gazetteer lists can be created manually or from an existing database. Novel approaches aim at generating them automatically [56; 95].
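The following sketch shows a greedy longest-match gazetteer lookup over a token list. The keyword lists and category names are illustrative, not the project's actual gazetteers.

import java.util.*;

/** Minimal gazetteer sketch: multi-word phrases are matched longest-first
 *  against keyword lists and annotated with the list's category. */
public class SimpleGazetteer {
    private final Map<String, String> entries = new HashMap<>();
    private int maxLen = 1;

    public void add(String phrase, String category) {
        String key = phrase.toLowerCase();
        entries.put(key, category);
        maxLen = Math.max(maxLen, key.split("\\s+").length);
    }

    public List<String> annotate(List<String> tokens) {
        List<String> annotations = new ArrayList<>();
        for (int i = 0; i < tokens.size(); ) {
            int matched = 0;
            for (int len = Math.min(maxLen, tokens.size() - i); len > 0; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len)).toLowerCase();
                String cat = entries.get(phrase);
                if (cat != null) {
                    annotations.add(phrase + " -> " + cat);
                    matched = len;
                    break;
                }
            }
            i += (matched > 0) ? matched : 1; // skip matched phrase or advance one token
        }
        return annotations;
    }

    public static void main(String[] args) {
        SimpleGazetteer g = new SimpleGazetteer();
        g.add("Allianz Arena", "named-location");
        g.add("entrance", "conceptual-location");
        g.add("fire", "incident-type");
        System.out.println(g.annotate(List.of(
                "Fire", "at", "the", "entrance", "of", "the", "Allianz", "Arena")));
        // -> [fire -> incident-type, entrance -> conceptual-location, allianz arena -> named-location]
    }
}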


c) Sentence Splitter

As the name implies, sentence splitters identify sentences in texts. In properly written texts this task is basically very simple, as sentences usually end with punctuation marks like dots or exclamation marks. However, special attention must be paid to the use of dots in abbreviations like “Dr.”. More sophisticated splitters use gazetteer lists to look up abbreviations before the actual splitting in order to solve this problem. The identification of sentences is necessary for the subsequent application of the POS tagger, which determines the type of each token.
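A minimal splitter in this spirit is sketched below, with an illustrative abbreviation list standing in for the gazetteer lookup described above.

import java.util.*;

/** Minimal sentence splitter sketch: splits on ., !, ? unless the word
 *  is a known abbreviation. The abbreviation list is illustrative. */
public class SimpleSentenceSplitter {
    private static final Set<String> ABBREVIATIONS = Set.of("Dr.", "Mr.", "e.g.", "etc.");

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.split("\\s+")) {
            current.append(word).append(' ');
            boolean endsSentence = word.matches(".*[.!?]") && !ABBREVIATIONS.contains(word);
            if (endsSentence) {
                sentences.add(current.toString().trim());
                current.setLength(0);
            }
        }
        if (current.length() > 0) sentences.add(current.toString().trim());
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("Dr. Smith saw the fire. It is spreading!"));
        // -> [Dr. Smith saw the fire., It is spreading!]
    }
}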

d) POS Tagger

POS (“part of speech”) taggers determine the word class, or lexical class, of each word, which depends on its syntactic and morphological behaviour. Example classes that are common in almost every language are noun and verb. In European languages, further categories such as adjective, adverb, or preposition exist. Another factor influencing the part of speech of a word is the semantic context it is used in. The word “help”, for example, can be used either as a noun or as a verb, and the exact classification depends on the context. Well-known taggers are the rule-based Brill tagger [10] and an improved version of it proposed by Hepple [34].
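As a sketch of how such a tagger is typically invoked, the following uses the Apache OpenNLP API (referenced in Section 4.4.3); it assumes that the pre-trained English model file en-pos-maxent.bin, distributed separately by the OpenNLP project, is available locally.

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

/** POS tagging sketch with Apache OpenNLP's maximum-entropy tagger. */
public class PosTaggingExample {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(in));
            String[] tokens = {"The", "fire", "is", "spreading", "fast"};
            String[] tags = tagger.tag(tokens);
            for (int i = 0; i < tokens.length; i++)
                System.out.println(tokens[i] + "/" + tags[i]); // e.g. fire/NN, spreading/VBG
        }
    }
}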

e) Parser and Chunker

Parsing in computational linguistics is the process of analysing a sentence to identify its constituents and their syntactic relations with respect to the grammatical rules of the language. The result is a so-called parse tree. In contrast to a POS tagger, a parser does not only provide part-of-speech tags for individual words, but describes how these words can be grouped into phrases and how the phrases in turn form the overall sentence. Examples of parsers can be found in [13; 17; 18].

Chunking, or shallow parsing, is less complex than parsing. It denotes the process of finding the constituents of a sentence, e.g., noun phrases or verb phrases. In contrast to parsing, however, it does not create a complete parse tree, which reduces the runtime complexity; it only identifies the constituents of a sentence. Different approaches to chunking exist [22; 54; 73].
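A shallow-parsing sketch with Apache OpenNLP's chunker follows, under the assumption that the pre-trained en-chunker.bin model is available and that POS tags have been produced beforehand (e.g., by the tagger sketched above).

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

/** Chunking sketch: tokens plus their POS tags yield BIO chunk labels. */
public class ChunkingExample {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("en-chunker.bin")) {
            ChunkerME chunker = new ChunkerME(new ChunkerModel(in));
            String[] tokens = {"The", "fire", "is", "spreading", "fast"};
            String[] tags   = {"DT", "NN", "VBZ", "VBG", "RB"};
            String[] chunks = chunker.chunk(tokens, tags);
            // BIO labels, e.g. B-NP I-NP B-VP I-VP B-ADVP
            for (String c : chunks) System.out.print(c + " ");
        }
    }
}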

f) Semantic Annotation

In the last step, semantic annotations covering one or more domains of interest (e.g., locations or incident types) can be created based on the annotations obtained in the previous steps. In the simplest case, the results of the gazetteer lookup step may already contain semantic information, e.g., “football stadium” may have already been tagged as a location or “fire” as an incident. More sophisticated annotations covering a range of words, e.g., word phrases, can be created using rules and grammars, which can be obtained either manually, following the knowledge engineering approach, or automatically, from annotated training data and trained statistical models. A minimal rule sketch in GATE's JAPE language is given below.
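In GATE, which we adopt in Section 4.4.3, such rules can be written in the JAPE grammar language (see also Section 4.5). The following fragment is a minimal, illustrative sketch assuming a gazetteer that tags location keywords with majorType "location"; it is not one of the project's actual grammars.

Phase: WhereAnnotation
Input: Lookup
Options: control = appelt

// Promote gazetteer location lookups to a semantic "Where" annotation.
Rule: ConceptualLocation
(
  {Lookup.majorType == "location"}
):loc
-->
:loc.Where = {kind = "conceptual", rule = "ConceptualLocation"}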

One example of semantic annotation is the task of named entity recognition (NER), which is concerned with finding elements in texts and classifying them into predefined categories. Classes commonly found in the context of NER are persons, organizations, and locations, as well as dates, monetary values, and percentages. Named entity recognition is a well-researched NLP problem for which many different approaches exist [26; 51; 52; 97].

4.4.3 The RESCUER Processing Pipeline

As information extraction from texts is already a very well researched area with many ready-to-use applications available, it is not reasonable to develop a new information extraction system and its components completely from scratch. To create a first prototype version of the text analysis pipeline, the GATE text engineering toolbox [29] is used. It offers a large number of ready-to-use processing resources as well as complete text analysis solutions. Well-known bundled tools include the OpenNLP processing pipeline [6], parts of the StanfordNLP [77] text-processing tools, and many other plugins offering a wide range of functionalities. As all components have standardized interfaces, it is even possible to combine processing resources of different toolboxes to create new, customized pipelines. Furthermore, existing components based on statistical models can be trained with new data to specialize them to a certain information domain, like emergencies or disasters, in case the existing versions perform insufficiently. If certain functionality is not included in GATE, it is easy to extend it with new components. Figure 18 shows the GATE GUI with a short example message in the upper central frame and a set of annotations to add to the text on the left.

Figure 18: GATE graphical user interface with example message

Considering the implementation of a final RESCUER prototype, processing pipelines created with the GATE GUI can be easily integrated into other applications. To this end, GATE offers a comprehensive Java API to load existing processing resources, including complete pipelines, and other linguistic resources such as lookup tables, dictionaries, etc. Considering scalability, it is possible to create several instances of processing pipelines sharing common resources. With respect to the architecture of the RESCUER data analysis module, all that needs to be done is converting the texts to analyse into the document representation of GATE. Because of the possibility to create a prototype from existing resources, the ability to modify existing components and easily create completely new ones, as well as the integration into other applications without large effort, we consider GATE a good starting point for our next steps towards a RESCUER text analysis module. A minimal embedding example is sketched below.
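The following sketch shows the usual GATE Embedded pattern for running a pipeline saved from the GUI inside a Java application. The file name rescuer-pipeline.gapp and the sample message are illustrative assumptions.

import java.io.File;

import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.Gate;
import gate.util.persistence.PersistenceManager;

/** Sketch: load a GATE application (.gapp) and run it over one report. */
public class GatePipelineExample {
    public static void main(String[] args) throws Exception {
        Gate.init(); // initialise GATE Embedded

        // Load a pipeline (plus its resources) saved from the GATE GUI.
        CorpusController pipeline = (CorpusController)
                PersistenceManager.loadObjectFromFile(new File("rescuer-pipeline.gapp"));

        // Wrap the incoming incident report in a GATE document and corpus.
        Document report = Factory.newDocument("Fire on the second floor, many injured!");
        Corpus corpus = Factory.newCorpus("incident-reports");
        corpus.add(report);

        pipeline.setCorpus(corpus);
        pipeline.execute();

        System.out.println(report.getAnnotations()); // default annotation set
    }
}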

We picked up the pipeline processing approach described above and use components of the ANNIE (A Nearly-New Information Extraction System), OpenNLP, and TwitIE pipelines included in GATE to create a preliminary pipeline following the architecture presented above.

4.5 Next steps

In the upcoming project months we will focus on three major points:

1. Concretizing the processing pipeline and implementing a first version of the RESCUER text analysis module. The processing pipeline we presented in Section 4.4 is an early example of the approach we want to take and needs to be refined. In particular, we need to focus on the semantic annotation step, which we plan to realize using the JAPE (Java Annotations Pattern Engine) framework of GATE.

2. To evaluate the text-processing application reasonably, we need a large corpus of sample messages from real emergencies. However, up to the time this deliverable was due, we had not found an appropriate source of such text messages. We thus need to make further efforts to find a way to evaluate our module.

3. Integration of the text analysis module into the RESCUER platform. As mentioned before, it is easily possible to load an application developed in GATE into another Java application. Some effort, however, is necessary to integrate the module into the RESCUER communication infrastructure used to receive incident reports from users of the mobile application and to send processed reports to the ERTK.


5 Integration of Image, Video, and Text solutions

The partners have agreed to use the technologies JSON-RabbitMQ-HTTP to integrate the modules of image, video, and text analysis, internally and externally (with the Task 3.1 module and WP4). The details of this integration are given in Figure 19, which shows that the Data Analysis Solution will comprise three submodules: Text, Video, and Image. The communication of the three modules with the filtering module will occur by means of the message broker technology RabbitMQ (Middleware in the diagram). In a few words, following the RabbitMQ protocol, each module will “sign up” with the message broker as a publisher, in order to post its output (input for others), and as a subscriber, in order to read the output of other modules (as input to itself). In this specific case, all three modules will subscribe to the filtering module, from which they will get all their input. The three modules will then work independently, posting their outputs to the Emergency Response Toolkit, which will subscribe to all three modules. Furthermore, the image and video modules will access files using HTTP, which will abstract a remote storage solution (BLOB storage in the diagram). A sketch of the publish side of this integration follows.
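The sketch below uses the standard RabbitMQ Java client; the broker host, exchange name, and the hand-written JSON payload are illustrative placeholders rather than the agreed project configuration (try-with-resources assumes a 4.x or newer client, where Connection and Channel are AutoCloseable).

import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

/** Sketch of a module publishing its output to the message broker. */
public class TextAnalysisPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder broker address

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Durable fanout exchange the ERTK would subscribe to (name is illustrative).
            channel.exchangeDeclare("text-analysis-output", "fanout", true);

            // Annotated report as JSON (hand-written here for brevity).
            String report = "{\"what\":\"fire\",\"where\":\"second floor\",\"who\":\"many injured\"}";
            channel.basicPublish("text-analysis-output", "", null,
                    report.getBytes(StandardCharsets.UTF_8));
        }
    }
}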

Figure 19: Diagram depicting the integration of the Data Analysis Solution by means of the technology RabbitMQ (Message Broker Middleware)


6 Conclusions

Automatic data analysis concerning image, video, and text is one of the key features expected from RESCUER. This capability shall improve the capacity of interpreting emergency situations, enhancing the decision-making process. The goal is to have the computer filter out irrelevant data while highlighting potentially useful data, which is a necessary feature since crowdsourcing may produce far more information than human beings can monitor and interpret.

Reviewed techniques

In D3.2.1, the literature has been extensively reviewed in order to identify the most advanced techniques for automatic data analysis. The initial architectural designs have been delineated, according to which the investigated techniques shall integrate both internally to Task 3.2 and externally to the whole RESCUER solution. A broad set of possibilities to work with has been assembled; currently, we are coding and evaluating these possibilities, which share similar Application Programming Interfaces and can therefore be considered interchangeably along the next iterations.

Data aggregation¹

Data aggregation will be done based on the metadata that is part of each data element (image/video/text); that is, the solution will consider location (“where”) and time (“when”), as these data directly determine which data elements are related. Aggregation will be done on an incident basis, by proximity and time of occurrence, considering simple Euclidean distance and time windowing, which are simple and well-established techniques. A minimal sketch of this criterion follows.
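As a minimal sketch of this criterion, the following Java fragment groups two reports into the same incident when they are close in space and time; the thresholds are illustrative placeholders, not tuned project parameters.

/** Sketch of the aggregation criterion: Euclidean distance on the GPS
 *  metadata plus a fixed time window. Thresholds are illustrative. */
public class AggregationRule {
    static final double MAX_DISTANCE = 0.002;             // roughly 200 m, in degrees
    static final long   TIME_WINDOW_MS = 10 * 60 * 1000;  // 10-minute window

    static boolean sameIncident(double lat1, double lon1, long t1,
                                double lat2, double lon2, long t2) {
        double d = Math.hypot(lat1 - lat2, lon1 - lon2); // Euclidean, as in the text
        return d <= MAX_DISTANCE && Math.abs(t1 - t2) <= TIME_WINDOW_MS;
    }

    public static void main(String[] args) {
        System.out.println(sameIncident(48.2188, 11.6247, 0,
                                        48.2190, 11.6250, 120_000)); // true
    }
}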

In the next deliverable, D3.2.2, we will present a finer selection from the techniques we have investigated so far. The selected techniques will be implemented in a first prototype that will serve for experimentation and for early integration efforts. In the last iteration, D3.2.3, we expect to have validated techniques, a more mature prototype, and satisfactory integration with the RESCUER solution. The criterion is to have components that, together, satisfy the requirements of the project.

¹ In computer science, aggregation may have many different meanings; however, the project description does not define it precisely. Here, we consider that aggregation means the grouping of related data, as is desirable given the project characteristics.


References

[1] Abdollahian, G.; Taskiran, C.M.; Pizlo, Z.; Delp, E.J.; “Camera Motion-Based Analysis of User Generated Video”, IEEE Transactions on Multimedia, Jan. 2010.

[2] Agarwal, C. & Sharma, A. (2011), Image understanding using decision tree based machine learning, in 'Information Technology and Multimedia (ICIM), 2011 International Conference on', pp. 1-8.

[3] Aha, D. W.; Kibler, D. & Albert, M. K. (1991), 'Instance-Based Learning Algorithms', Mach. Learn. 6(1), 37-66.

[4] Alsmadi, M.; Bin Omar, K.; Noah, S. & Almarashdah, I. (2009), Performance Comparison of Multi-layer Perceptron (Back Propagation, Delta Rule and Perceptron) algorithms in Neural Networks, in 'Advance Computing Conference, 2009. IACC 2009. IEEE International', pp. 296-299.

[5] Andrade, E.L.; Blunsden, S.; Fisher, R.B.; “Modelling Crowd Scenes for Event Detection”, 18th International Conference on Pattern Recognition, 2006. ICPR 2006.

[6] Apache OpenNLP, Retrieved from https://opennlp.apache.org, Accessed 2014-08-19 [7] Appelt, D. E. (1999). Introduction to information extraction. Ai Communications, 12(3), 161-172. [8] Borges, P.V.K.; Izquierdo, E.; “A Probabilistic Approach for Vision-Based Fire Detection in Videos”,

IEEE Transactions on Circuits and Systems for Video Technology, May 2010. [9] Braun, A.; Weidner, U. & Hinz, S. (2011), Support vector machines, import vector machines and

relevance vector machines for hyperspectral classification x2014; A comparison, in 'Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2011 3rd Workshop on', pp. 1-4.

[10] Brill, E. (1992, February). A simple rule-based part of speech tagger. In Proceedings of the workshop on Speech and Natural Language (pp. 112-116). Association for Computational Linguistics.

[11] Bilkent Signal Processing Group, Ankara, Turkey, Smoke and fire videos for testing: http://signal.ee.bilkent.edu.tr/VisiFire/Demo/SampleClips.html.

[12] Calderara, S.; Piccinini, P. & Cucchiara, R. (2011), 'Vision based smoke detection system using image energy and colour information', Machine Vision and Applications 22(4), 705-719.

[13] Charniak, E. (2000, April). A maximum-entropy-inspired parser. In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference (pp. 132-139). Association for Computational Linguistics.

[14] Chen, T. ; Yin, Y.; Huang, S.; Ye, Y (2006). ‘The Smoke Detection for Early Fire-Alarming System Base on Video Processing’, Intelligent Information Hiding and Multimedia Signal Processing, International Conference on, p. 427-430

[15] Chen, C.-S.; Yeh, C.-W. & Yin, P.-Y. (2009), 'A novel Fourier descriptor based image alignment algorithm for automatic optical inspection', J. Visual Communication and Image Representation 20(3), 178-189.

[16] Chen, R.; Luo, Y. & Alsharif, M. R. (2013), 'Forest Fire Detection Algorithm Based on Digital Image', JSW 8(8), 1897-1905.


[17] Church, K. W. (1988, February). A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the second conference on Applied natural language processing (pp. 136-143). Association for Computational Linguistics.

[18] Collins, M. J. (1996, June). A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th annual meeting on Association for Computational Linguistics (pp. 184-191). Association for Computational Linguistics.

[19] Collumeau, J.-F.; Laurent, H.; Hafiane, A. & Chetehouna, K. (2011), Fire scene segmentations for forest fire characterization: A comparative study, in 'Image Processing (ICIP), 2011 18th IEEE International Conference on', pp. 2973-2976.

[20] Cordeiro, J., & Brazdil, P. (2004). Learning Text Extraction Rules, without Ignoring Stop Words. In PRIS (pp. 128-138).

[21] Cowie, J., & Lehnert, W. (1996). Information extraction. Communications of the ACM, 39(1), 80-91.

[22] Daelemans, W., Buchholz, S., & Veenstra, J. (1999). Memory-based shallow parsing. arXiv preprint cs/9906005.

[23] del-Blanco, C.R.; Jaureguizar, F.; Salgado, L.; Garcia, N.; “Target Detection Through Robust Motion Segmentation and Tracking Restrictions in Aerial Flir Images”, IEEE International Conference on Image Processing, 2007. ICIP 2007.

[24] Deng, Y.; Manjunath, B. S.; Kenney, C. S.; Moore, M. S. & Shin, H. (2001), 'An efficient colour representation for image retrieval', IEEE Transactions on Image Processing 10(1), 140-147.

[25] Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for Bayesian filtering (2000).

[26] Florian, R., Ittycheriah, A., Jing, H., & Zhang, T. (2003, May). Named entity recognition through classifier combination. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 (pp. 168-171). Association for Computational Linguistics.

[27] Freitag, D., & McCallum, A. (1999, July). Information extraction with HMMs and shrinkage. In Proceedings of the AAAI-99 workshop on machine learning for information extraction (pp. 31-36).

[28] Freitag, D., & McCallum, A. (2000). Information extraction with HMM structures learned by stochastic optimization. AAAI/IAAI, 2000, 584-589.

[29] GATE: General Architecture for Text Engineering, Retrieved from http://gate.ac.uk, Accessed 2014-08-19

[30] Gunasekaran, A. & McGaughey, R. E. (2009), 'Mobile Commerce; Issues and Obstacles', Int. J. Bus. Inf. Syst. 4(2), 245--261.

[31] Habash, N., & Rambow, O. (2005, June). Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 573-580). Association for Computational Linguistics.

[32] Hahn, U., & Mani, I. (2000). The challenges of automatic summarization. Computer, 33(11), 29-36.

[33] Haralick, R.; Shanmugam, K. & Dinstein, I. (1973), 'Textural Features for Image Classification', Systems, Man and Cybernetics, IEEE Transactions on SMC-3(6), 610-621.


[34] Hepple, M. (2000, October). Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (pp. 277-278). Association for Computational Linguistics.

[35] Ho, T. K. (1995), Random decision forests, in 'Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on', pp. 278-282 vol.1.

[36] Hosny, K. M. (2008), 'Fast computation of accurate Zernike moments', J. Real-Time Image Processing 3(1-2), 97-107.

[37] Huang, C. R., & Xue, N. (2012). Words without Boundaries: Computational Approaches to Chinese Word Segmentation. Language and Linguistics Compass, 6(8), 494-505.

[38] Huang, C. R., Šimon, P., Hsieh, S. K., & Prévot, L. (2007, June). Rethinking Chinese word segmentation: tokenization, character classification, or wordbreak identification. In Proceedings of the 45th annual meeting of the acl on interactive poster and demonstration sessions (pp. 69-72). Association for Computational Linguistics.

[39] Isard, M., Blake, A.: A mixed-state CONDENSATION tracker with automatic model-switching. In: IEEE International Conference on Computer Vision, pp. 107–112 (1998).

[40] Jung, K., In Kim, K., & K Jain, A. (2004). Text information extraction in images and video: a survey. Pattern recognition, 37(5), 977-997.

[41] Barbara Krausz; Christian Bauckhage; “Loveparade 2010: Automatic video analysis of a crowd disaster”, Computer Vision and Image Understanding Volume 116, Issue 3, March 2012, Pages 307–319.

[42] Kristjansson, T., Culotta, A., Viola, P., & McCallum, A. (2004, July). Interactive information extraction with constrained conditional random fields. In AAAI (Vol. 4, pp. 412-418).

[43] Lehnert, W., McCarthy, J., Soderland, S., Riloff, E., Cardie, C., Peterson, J., ... & Goldman, S. (1993, September). Umass/hughes: Description of the circus system used for tipster text. In Proceedings of a workshop on held at Fredericksburg, Virginia: September 19-23, 1993 (pp. 241-256). Association for Computational Linguistics.

[44] Liddy, E. D. (2001). Natural language processing.

[45] Qing Liu; Sun'an Wang; XiaoHui Zhang; Yun Hou; “Flame Recognition Algorithm Research under Complex Background”, IEEE 10th International Conference on Computer and Information Technology (CIT), 2010.

[46] Ma, R., Li, L., Huang, W., Tian, Q.: On pixel count based crowd density estimation for visual surveillance. IEEE Conf. Cybernet. Intell. Syst. 1 (2004).

[47] Magerman, D. M. (1994). Natural language parsing as statistical pattern recognition. arXiv preprint cmp-lg/9405009.

[48] Manjunath, B. S. & Ma, W. Y. (1996), 'Texture Features for Browsing and Retrieval of Image Data', IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837--842.

[49] Manjunath, B. S.; Ohm, J. R.; Vasudevan, V. V. & Yamada, A. (2001), 'Colour and Texture Descriptors', IEEE Trans. Cir. and Sys. for Video Technol. 11(6), 703--715.

[50] Maruta, H.; Nakamura, A.; Yamamichi, T. & Kurokawa, F. (2010), Image based smoke detection with local Hurst exponent, in 'Image Processing (ICIP), 2010 17th IEEE International Conference on', pp. 4653-4656.

[51] McCallum, A., & Li, W. (2003, May). Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 (pp. 188-191). Association for Computational Linguistics.

[52] Mikheev, A., Moens, M., & Grover, C. (1999, June). Named entity recognition without gazetteers. In Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics (pp. 1-8). Association for Computational Linguistics.

[53] Mooney, R. J. (2007). Learning for semantic parsing. In Computational Linguistics and Intelligent Text Processing (pp. 311-324). Springer Berlin Heidelberg.

[54] Munoz, M., Punyakanok, V., Roth, D., & Zimak, D. (2000). A learning approach to shallow parsing. arXiv preprint cs/0008022.

[55] Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), 3-26.

[56] Nadeau, D., Turney, P., & Matwin, S. (2006). Unsupervised named-entity recognition: Generating gazetteers and resolving ambiguity.

[57] Ojala, T.; Aittola, M. & Matinmikko, E. (2002), Empirical evaluation of MPEG-7 XM colour descriptors in content-based retrieval of semantic image categories, in 'Pattern Recognition, 2002. Proceedings. 16th International Conference on', pp. 1021-1024 vol.2.

[58] Park, D. K.; Jeon, Y. S. & Won, C. S. (2000), Efficient Use of Local Edge Histogram Descriptor, in 'Proceedings of the 2000 ACM Workshops on Multimedia', ACM, New York, NY, USA, pp. 51--54.

[59] Patel, S.; Mehta, T. & Pradhan, S. (2009), A novel approach using transformation techniques and decision tree algorithm on images for performing Digital Watermarking, in 'Internet Technology and Secured Transactions, 2009. ICITST 2009. International Conference for', pp. 1-6.

[60] Phillips III. W.; Shah M.; da Vitoria Lobo N.; “Flame Recognition in Video”, Pattern Recognition Letters, Volume 23, Issues 1–3, January 2002, Pages 319–327.

[61] Qiu, T.; Yan, Y. & Lu, G. (2012), 'An Autoadaptive Edge-Detection Algorithm for Flame and Fire Image Processing', IEEE T. Instrumentation and Measurement 61(5), 1486-1493.

[62] Rafi, M. S. (2008). SMS text analysis: Language, gender and current practices. Online Journal of TESOL France. Retrieved on December, 22, 2011.

[63] Ramirez, J.; Gorriz, J.; Chaves, R.; Lopez, M.; Salas-Gonzalez, D.; Alvarez, I. & Segovia, F. (2009), 'SPECT image classification using random forests', Electronics Letters 45(12), 604-605.

[64] RESCUER deliverable, “D3.1.1- Data Fusion and Filtering Method Description 1”, September 2014.

[65] Rezvani, R.; Katiraee, M.; Jamalian, A. H.; Mehrabi, S. & Vezvaei, A. (2012), A new method for hardware design of Multi-Layer Perceptron neural networks with online training, in 'Cognitive Informatics Cognitive Computing (ICCI*CC), 2012 IEEE 11th International Conference on', pp. 527-534.

[66] Riloff, E. (1993, July). Automatically constructing a dictionary for information extraction tasks. In AAAI (pp. 811-816).

[67] Saaty, T. L. (1990), 'How to make a decision: The analytic hierarchy process ', European Journal of Operational Research 48(1), 9 - 26.

[68] Saipullah, K. & Kim, D.-H. (2012), 'A robust texture feature extraction using the localized angular phase', Multimedia Tools and Applications 59(3), 717-747.


[69] Sami, M.; El-Bendary, N. & Hassanien, A. (2012), Automatic image annotation via incorporating Naive Bayes with particle swarm optimization, in 'Information and Communication Technologies (WICT), 2012 World Congress on', pp. 790-794.

[70] Sarawagi, S., & Cohen, W. W. (2004). Semi-markov conditional random fields for information extraction. In Advances in Neural Information Processing Systems (pp. 1185-1192).

[71] Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261-377.

[72] Schmid, H. (1994, September). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing (Vol. 12, pp. 44-49).

[73] Sha, F., & Pereira, F. (2003, May). Shallow parsing with conditional random fields. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1 (pp. 134-141). Association for Computational Linguistics.

[74] Sikora, T. (2001), 'The MPEG-7 Visual Standard for Content Description-an Overview', IEEE Trans. Cir. and Sys. for Video Technol. 11(6), 696--702.

[75] Soderland, S. (1999). Learning information extraction rules for semi-structured and free text. Machine learning, 34(1-3), 233-272.

[76] Solomon, C. & Breckon, T. (2010), Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab, Wiley-Blackwell.

[77] The Stanford Natural Language Processing Group, Retrieved from http://nlp.stanford.edu, Accessed 2014-08-19

[78] Stehling, R. O.; Nascimento, M. A. & Falcão, A. X. (2002), A Compact and Efficient Image Retrieval Approach Based on Border/Interior Pixel Classification, in 'Proceedings of the Eleventh International Conference on Information and Knowledge Management', ACM, New York, NY, USA, pp. 102--109.

[79] Tjong Kim Sang, E. F., & De Meulder, F. (2003, May). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 (pp. 142-147). Association for Computational Linguistics.

[80] BU Toreyin, Y. Dedeoglu, and AE Cetin, “Wavelet based real-time smoke detection in video”, EUSIPCO '05, 2005.

[81] da Silva Torres, R. & Falcão, A. X. (2007), 'Contour salience descriptors for effective image retrieval and analysis', Image Vision Comput. 25(1), 3-13.

[82] Tseng, V. S.; Lee, C.-J. & Su, J.-H. (2005), Classify By Representative Or Associations (CBROA): A Hybrid Approach for Image Classification, in 'Proceedings of the 6th International Workshop on Multimedia Data Mining: Mining Integrated Media and Complex Data', ACM, New York, NY, USA, pp. 61--69.

[83] Vapnik, V. N. (1995), The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc.

[84] Verma, S., Vieweg, S., Corvey, W. J., Palen, L., Martin, J. H., Palmer, M., ... & Anderson, K. M. (2011, July). Natural Language Processing to the Rescue? Extracting “Situational Awareness” Tweets During Mass Emergency. In ICWSM.


[85] Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010, April). Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1079-1088). ACM.

[86] Wang, Y.; Wang, S. & Lai, K. (2005), 'A new fuzzy support vector machine to evaluate credit risk', Fuzzy Systems, IEEE Transactions on 13(6), 820-831.

[87] Wang H., “Spatial-Temporal Structural and Dynamics Features for Video Fire Detection”, Proceedings of the 2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013.

[88] Witten, I. H.; Frank, E. & Hall, M. A. (2011), Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[89] Yamagishi, H. & Yamaguchi, J. (1999), Fire flame detection algorithm using a colour camera, in 'Micromechatronics and Human Science, 1999. MHS'99. Proceedings of 1999 International Symposium on', pp. 255--260.

[90] Yang, M.; Kpalma, K.; Ronsin, J. & others (2008), 'A survey of shape feature extraction techniques', Pattern recognition, 43--90.

[91] Yin, J., Velastin, S., Davies, A.: Image Processing Techniques for Crowd Density Estimation Using a Reference Image. Proc. 2nd Asia-Pacific Conf. Comput. Vis. 3, 6–10 (1995).

[92] Yuan, F. (2011), 'Video-based smoke detection with histogram sequence of LBP and LBPV pyramids', Fire Safety Journal 46(3), 132-139.

[93] Feiniu Yuan, “A fast accumulative motion orientation model based on integral image for video smoke detection”, Pattern Recognition Letters, Volume 29, Issues 7, May 2008, Pages 925–932.

[94] Zhan B.; N. Monekosso, D.; Remagnino, P.; A. Velastin, S.; Xu, L.-Q.; “Crowd analysis: a survey”, Machine Vision and Applications, October 2008, Volume 19, Issue 5-6, pp 345-357.

[95] Zhang, Z., & Iria, J. (2009, August). A novel approach to automatic gazetteer generation using wikipedia. In Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources (pp. 1-9). Association for Computational Linguistics.

[96] Zhang, D. & Lu, G. (2004), 'Review of shape representation and description techniques', Pattern Recognition 37(1), 1-19.

[97] Zhou, G., & Su, J. (2002, July). Named entity recognition using an HMM-based chunk tagger. In proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 473-480). Association for Computational Linguistics.


Glossary

Abbreviations

CBIR Content-Based Image Retrieval

DT Decision Trees

FEM Feature-Extractor Method

FV Feature Vector

IBL Instance-Based Learning

MLP Multilayer Perceptron

NB Naïve Bayes

ROI Region of Interest

SVM Support Vector Machine
