

International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 11, November 2017, pp. 368–375, Article ID: IJMET_08_11_041

Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11

ISSN Print: 0976-6340 and ISSN Online: 0976-6359

© IAEME Publication Scopus Indexed

ACTION RECOGNITION IN VIDEO SURVEILLANCE USING HIPI AND MAPREDUCE MODEL

Ushapreethi P

School of Information Technology and Engineering, VIT University, Vellore, India

Balajee Jeyakumar

School of Information Technology and Engineering, VIT University, Vellore, India

BalaKrishnan P

School of Computing Sciences and Engineering, VIT University, Vellore, India

ABSTRACT

Action recognition in videos is possible using edge detection techniques such as the Canny and Sobel edge detection algorithms. Surveillance videos are particularly complex to analyse, yet they yield valuable results such as face recognition and action recognition, and their analysis is essential for real-time applications such as safety and security systems. To increase the scalability of the edge-detection-based action recognition algorithm, the features of the video are grouped together using the Hadoop Image Processing Interface (HIPI). A MapReduce algorithm is proposed to parallelize the action recognition algorithm so that large-scale videos can be processed in a minimum amount of time. The video is first converted into frames, and the resulting group of images is assembled into a HIPI image bundle so that the video can be analysed effectively.

Keywords: Action recognition, Edge detection, Hadoop Image Processing Interface, MapReduce, Image bundle

Cite this Article: Ushapreethi P, Balajee Jeyakumar and BalaKrishnan P, Action Recognition in Video Surveillance Using HIPI and MapReduce Model, International Journal of Mechanical Engineering and Technology 8(11), 2017, pp. 368–375.

http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11


1. INTRODUCTION

In recent years, a large amount of surveillance video data has been accumulating. When processing this mass of data, standalone computers face many bottlenecks, such as limited computational power, insufficient storage and poor efficiency, so distributed systems are used for such tasks. In the current scenario, the Hadoop MapReduce framework is a suitable platform for this distributed processing. A video can be reduced to a number of frames for analysis, and the resulting group of images can be analysed using the Hadoop Image Processing Interface (HIPI). This interface provides tools for computing basic information about a group of images, such as the average pixel value, the spatial dimensions and other image metadata. The HIPI framework hides most of the technical details from the user and allows a user who is new to distributed environments to build Hadoop image processing applications, removing most of the cumbersome learning curve.

Edge detection algorithms are useful for identifying actions in images [1]. The Canny edge detection algorithm is used for edge detection in this work, and the difference between adjacent images is calculated to identify the action. Action recognition consists of four phases: feature extraction, codebook generation, feature encoding and optimization [2]. Initially, the video is divided into video shots, each consisting of a single key frame. In the first phase, local features are extracted from the key frames; the extracted features are known as bases or code words. In the second phase, codebook generation (or dictionary creation), a codebook or dictionary is generated from the set of code words, typically using a clustering technique such as the K-Means algorithm [3]. These dictionaries represent the visual descriptors. Each descriptor activates a number of code words and generates a coding vector using a coding technique; this phase is called feature encoding, and the length of the coding vector equals the number of code words. Several encoding techniques are used, such as vector quantization [4], soft coding [5] and sparse coding [6]. The last phase, optimization or pooling, creates a compact signature or feature vector for a given sample; max pooling [7, 8, 18] is the most common technique. This paper organizes the work into the following modules:

• Converting video into frames

• Image Analysis Using HIPI (Mapreduce)

• Edge detection based action recognition

Figure 1 shows the overall idea of the proposed work. The main advantage of this work is parallel processing: the HIPI framework makes the interface very easy to use and supports distributed processing up to step 6 of Figure 1. Each module of the work is discussed in the following sections.

Figure 1 Overview of the proposed work


2. RELATED WORK

With the rapid increase in the use of social media and surveillance videos, multimedia data is growing exponentially, and the need for analysis methods is correspondingly high. Most researchers concentrate on analysing such data with various technologies built on distributed processing. White et al. [9] present a clustering-based classifier using distributed processing; the work mainly moves through image pre-processing and produces object recognition as its result. Pereira et al. [10] discuss the inconveniences of Hadoop processing for image-based video analytics and suggest some cloud-based technologies. Lv et al. [11] describe naïve classification algorithms in a Hadoop environment, using satellite images for colour-based analysis.

Human activity is categorized by level of complexity. Conceptually, the four levels identified by [12] are gestures, actions, interactions and group activities. A gesture is a body movement intended to express some meaning; it can be communicated with the hands, arms or body. Examples of gestures include head movements for expressing yes or no, and eye movements such as winking or rolling the eyes. An action is a goal-directed motion sequence, such as picking up a stone from the ground or a golf swing. Actions are lengthy compared with gestures and have clear start and end points. Both gestures and actions are performed by a single subject. Most of the earlier research work addresses gesture recognition and has reported results that fit the benchmarks well, and several surveys on gesture recognition are available [13]; the research community has now moved to the next level, action recognition. Interactions and group activities involve more than one subject and are harder to recognize, so the point of interest of this paper is action recognition. Human activity recognition approaches are represented hierarchically in Figure 2.

Figure 2 Human activity recognition approaches

3. METHODOLOGY

3.1. DIVIDING VIDEOS INTO FRAMES

Hadoop supports several programming languages, such as Java and Python. The basic MapReduce programs of this work are written in Java, and the Java Virtual Machine (JVM) supports their execution. Similarly, the video is converted into frames (images) using Java. The accumulated images are then grouped into image bundles using the HIPI framework, and the acquired parameters are passed to the edge detection algorithm.
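As a concrete illustration of this step, the following is a minimal sketch of frame extraction using the JCodec library mentioned in Section 3.4. The input file name, the output naming scheme and the exact class locations (which differ slightly between JCodec versions) are assumptions for illustration, not details taken from the paper.

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.jcodec.api.FrameGrab;
import org.jcodec.common.io.NIOUtils;
import org.jcodec.common.model.Picture;
import org.jcodec.scale.AWTUtil;

public class VideoToFrames {
    public static void main(String[] args) throws Exception {
        File video = new File("surveillance.mp4");  // hypothetical input clip
        // FrameGrab decodes the video and returns one Picture per frame
        FrameGrab grab = FrameGrab.createFrameGrab(NIOUtils.readableChannel(video));
        Picture picture;
        int index = 0;
        while ((picture = grab.getNativeFrame()) != null) {
            BufferedImage frame = AWTUtil.toBufferedImage(picture);
            // Each decoded frame is written out as an image, ready to be bundled by HIPI
            ImageIO.write(frame, "png", new File(String.format("frame_%05d.png", index++)));
        }
    }
}

Each extracted frame then becomes one record in the HIPI image bundle described in Section 3.3.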


3.2. CANNY EDGE DETECTOR

The Canny edge detector is a Gaussian-filter-based edge detector: the Gaussian filter removes noise from the image, and the intensity gradients of the image are then used to locate edges. The Canny edge detection algorithm does not depend on any context-specific metadata, which makes it very effective compared with other edge detection methods. Its detection mechanism is useful for extracting the structural information of an image, which supports action recognition from the key frames. Figure 3 shows the result of the Canny edge detection algorithm applied to an MRI image (left), with the detected edges highlighted (right).

Figure 3 Result of the Canny edge detection algorithm applied to an MRI image (left) and the image with the edges highlighted (right)
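Since HIPI integrates with OpenCV (Section 3.3), the Canny step can be sketched with the OpenCV Java bindings as below. The Gaussian kernel size, sigma and the two hysteresis thresholds are illustrative values only, not parameters reported in the paper.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class CannyEdges {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }  // load the OpenCV native library

    public static Mat edgeMap(String framePath) {
        Mat gray = Imgcodecs.imread(framePath, Imgcodecs.IMREAD_GRAYSCALE);
        Mat blurred = new Mat();
        Imgproc.GaussianBlur(gray, blurred, new Size(5, 5), 1.4);  // Gaussian filter removes noise
        Mat edges = new Mat();
        Imgproc.Canny(blurred, edges, 50, 150);  // low/high hysteresis thresholds (example values)
        return edges;
    }

    public static void main(String[] args) {
        Imgcodecs.imwrite("edges_00001.png", edgeMap("frame_00001.png"));
    }
}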

3.3. HADOOP IMAGE PROCESSING INTERFACE (HIPI)

HIPI is an image processing library designed to be used with the Apache Hadoop MapReduce

parallel programming framework. HIPI facilitates efficient and high-throughput image

processing with MapReduce style parallel programs typically executed on a cluster. It

provides a solution for how to store a large collection of images on the Hadoop Distributed

File System (HDFS) and make them available for efficient distributed processing. HIPI also

provides integration with OpenCV, a popular open-source library that contains many

computer vision algorithms [14, 15].

The HIPI distribution includes several useful tools for creating HIBs, including a MapReduce program that builds a HIB from a list of images downloaded from the Internet. The first processing stage of a HIPI program is a culling step that allows the images in a HIB to be filtered on a variety of user-defined conditions, such as spatial resolution or criteria related to the image metadata; images that survive the culling step are then presented to the user-defined map tasks. The records emitted by the Mapper are collected and transmitted to the Reducer according to the built-in MapReduce shuffle algorithm, which attempts to minimize network traffic. Finally, the user-defined reduce tasks are executed in parallel and their output is aggregated and written to HDFS. Figure 4 shows the conversion of video frames into an image bundle; the MapReduce tasks are indicated in Figure 1.


Figure 4 Conversion of video frames into image bundle.
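To make the mapper and reducer roles concrete, the sketch below follows the well-known HIPI "average pixel" pattern: each map call receives one decoded image from the HIB, applies a simple cull, and emits a 1x1 image holding that frame's mean colour under a shared key so a single reducer can combine them. Class and method names follow the HIPI 2.x API and may differ in other versions; the three-band RGB assumption and the minimum-resolution cull are illustrative, not criteria specified in the paper.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.hipi.image.FloatImage;
import org.hipi.image.HipiImageHeader;

// Each record read from the HIB is one decoded image plus its header.
public class AveragePixelMapper
        extends Mapper<HipiImageHeader, FloatImage, IntWritable, FloatImage> {

    @Override
    public void map(HipiImageHeader header, FloatImage image, Context context)
            throws IOException, InterruptedException {
        // Cull damaged or very small frames (assumed criterion)
        if (image == null || image.getWidth() < 64 || image.getHeight() < 64) {
            return;
        }
        int w = image.getWidth(), h = image.getHeight();
        float[] pixels = image.getData();           // interleaved samples; 3-band RGB assumed
        float[] avg = new float[] {0f, 0f, 0f};
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                avg[0] += pixels[(y * w + x) * 3];
                avg[1] += pixels[(y * w + x) * 3 + 1];
                avg[2] += pixels[(y * w + x) * 3 + 2];
            }
        }
        for (int b = 0; b < 3; b++) {
            avg[b] /= (w * h);
        }
        // A shared key sends every per-frame average to the same reducer
        context.write(new IntWritable(0), new FloatImage(1, 1, 3, avg));
    }
}

A matching reducer sums the emitted 1x1 images, divides by their count and writes the overall average pixel value to HDFS.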

3.4. Action Recognition Using Hadoop

For the conversion of video into frames, JCodec, an open-source Java library for video codecs and formats, is used; other tools for digitally transcoding video data into frames, such as Xuggler, are also available. Frames cannot simply be placed in HDFS using the put command [16]. To store the images or frames in HDFS, each frame is converted into a stream of bytes and then written to HDFS. Hadoop provides the ability to read and write binary files, so almost anything that can be converted into bytes can be stored in HDFS.
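A minimal sketch of that byte-stream step is shown below, using only the standard Hadoop FileSystem API and javax.imageio; the HDFS path and the PNG format are illustrative assumptions.

import java.awt.image.BufferedImage;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FrameWriter {
    // Serialise one decoded frame to PNG bytes and stream it straight into HDFS.
    public static void writeFrame(BufferedImage frame, String hdfsPath) throws IOException {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(new Path(hdfsPath), true)) {
            ImageIO.write(frame, "png", out);       // PNG encoder writes binary data into the HDFS stream
        }
    }
}

For example, writeFrame(frame, "/user/hadoop/frames/frame_00001.png") stores one frame; frames stored this way are what the bundling step described next packs into a single HIB.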

After transcoding, the images are combined into a single large file that can be easily managed and analysed. The bunch of images is stored in a HIPI image bundle: each mapper generates a HIPI bundle, and the reducer merges these bundles into a single large bundle. MapReduce jobs then run on these image bundles for image analysis [17, 19, 20].


The Hadoop MapReduce parallel programming framework makes it possible to carry out this processing on a large number of images, and HIPI delivers efficient, high-throughput image processing with MapReduce-style parallel programs on a cluster.

4. EXPERIMENTAL SETUP

The experimental setup uses 4–6 1 TB hard disks in a JBOD configuration (1 for the OS, 2 for the FS image [RAID 1], 1 for Apache ZooKeeper, and 1 for the JournalNode), 2 quad-/hex-/octo-core CPUs running at 2–2.5 GHz or faster, 64–128 GB of RAM, and bonded Gigabit Ethernet connecting the machines. The Hadoop Image Processing Interface and Gradle (the Java build tool used to compile HIPI) are installed. Image bundles are created for image processing in Hadoop using HIPI, and the MapReduce programming model for image analysis is implemented. The average pixel values and the edge information of the sample images are calculated using the MapReduce model, and finally the actions are recognized based on edge detection.
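The MapReduce job for this analysis can be wired up roughly as follows. This is a sketch in the style of the standard HIPI job driver: AveragePixelMapper is the mapper sketched in Section 3.3, AveragePixelReducer is a hypothetical reducer that combines the per-frame averages, and the HIPI 2.x package name for HibInputFormat is assumed; none of the class names or paths are taken from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.hipi.image.FloatImage;
import org.hipi.imagebundle.mapreduce.HibInputFormat;

public class AveragePixelJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "hib-average-pixel");
        job.setJarByClass(AveragePixelJob.class);

        job.setInputFormatClass(HibInputFormat.class);          // records come straight out of the .hib bundle
        FileInputFormat.setInputPaths(job, new Path(args[0]));  // e.g. /user/hadoop/frames.hib

        job.setMapperClass(AveragePixelMapper.class);
        job.setReducerClass(AveragePixelReducer.class);         // hypothetical reducer combining per-frame averages
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(FloatImage.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(FloatImage.class);

        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}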

4.1. Steps for Action Recognition

Figure 6 shows the steps for converting the sample images in Figure 5 into a HIPI Image Bundle (HIB) after installing Gradle (the Java build tool for Hadoop and HIPI) and Hadoop. The metadata of the sample images is shown in Figure 7, and Figure 8 shows the edge-detection-based action recognition.

Figure 5 Sample Images

Figure 6 Converting images to HIB (HIPI Image Bundle)


Figure 7 The metadata information of the images in the image bundle

Figure 8 Action Recognition by edge detection – sample images
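The paper does not spell out the comparison rule, but the idea from Section 1 of calculating the difference between adjacent images can be sketched with OpenCV on the Canny edge maps as below; the changed-pixel score and any threshold applied to it are illustrative assumptions.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;

public class EdgeDifference {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }  // load the OpenCV native library

    // Fraction of pixels whose edge response differs between two consecutive edge maps.
    public static double motionScore(String edgeMapA, String edgeMapB) {
        Mat a = Imgcodecs.imread(edgeMapA, Imgcodecs.IMREAD_GRAYSCALE);
        Mat b = Imgcodecs.imread(edgeMapB, Imgcodecs.IMREAD_GRAYSCALE);
        Mat diff = new Mat();
        Core.absdiff(a, b, diff);                   // per-pixel absolute difference of the edge maps
        int changed = Core.countNonZero(diff);
        return (double) changed / (a.rows() * a.cols());
    }

    public static void main(String[] args) {
        double score = motionScore("edges_00001.png", "edges_00002.png");
        // A score above a tuned threshold flags the frame pair as containing an action
        System.out.println("changed-pixel fraction = " + score);
    }
}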

5. CONCLUSIONS

In order to achieve efficient action recognition for large-scale video data, a MapReduce-based parallel algorithm is proposed. Image analysis is the key step of this action recognition: the video is converted into images, the images are analysed, and the average pixel values and the edges of the sample images are identified using HIPI. The actions themselves are then recognized using edge detection. Extending the framework with other image analysis techniques is the future work of this paper.

REFERENCES

[1] Ushapreethi, P. and Lakshmipriya, G.G., 2017. Survey on Video Big Data: Analysis

Methods and Applications. International Journal of Applied Engineering Research,

12(10), pp.2221-2231.

[2] Mohammad, A.B., Qigang, G., Sergio, E., Thomas, B.M., Huamin, R. and Elham, E. (2017). Locality regularized group sparse coding for action recognition. Computer Vision and Image Understanding, 158, pp. 106–114.

[3] Lloyd, S.P., 1982. Least squares quantization in PCM. IEEE Trans. Inf. Theory, 28(2), pp. 129–137.

[4] Sivic, J, Zisserman, A, 2003. Video google: A text retrieval approach to object matching

in videos. In: IEEE International Conference on Computer Vision, pp. 1470–1477.

[5] Liu, L., Wang, L. and Liu, X., 2011. In defense of soft-assignment coding. In: IEEE International Conference on Computer Vision, pp. 2486–2493.

[6] Van Gemert, J.C, Veenman, C.J, Smeulders, A.W, Geusebroek, J.-M. , 2010. Visual word

ambiguity. IEEE Trans. Pattern Anal. Mach. Intell. 32 (7), 1271–1283.


[7] Yang, J, Yu, K, Gong, Y, Huang, T, 2009. Linear spatial pyramid matching using sparse

coding for image classification. In: IEEE Conference on Computer Vision and Pattern

Recognition, pp. 1794–1801.

[8] Yao, T., Wang, Z., Xie, Z., Gao, J. and Feng, D.D. (2017). Learning universal multiview dictionary for human action recognition. Pattern Recognition, 64, pp. 236–244.

[9] B. White, T. Yeh, J. Lin, and L. Davis, Web-scale computer vision using map reduce for

multimedia data mining, in Proceedings of the Tenth International Workshop on

Multimedia Data Mining, ser. MDMKDD ’10. New York, NY, USA: ACM, 2010, pp.

9:1–9:10.

[10] R. Pereira, M. Azambuja, K. Breitman, and M. Endler, An architecture for distributed

high performance video processing in the cloud, in Cloud Computing (CLOUD), 2010

IEEE 3rd International Conference on, July 2010, pp. 482–489.

[11] Z. Lv, Y. Hu, H. Zhong, J. Wu, B. Li, and H. Zhao, Parallel k-means clustering of remote

sensing images based on mapreduce, in Proceedings of the 2010 International Conference

on Web Information Systems and Mining, ser. WISM’10. Berlin, Heidelberg: Springer-

Verlag, 2010, pp. 162–170.

[12] Aggarwal JK, Ryoo MS (2011) Human activity analysis: a review. ACM Comput Surv

43:1–43

[13] M. A. Ranzato, F.-J. Huang, Y. Boureau, and Y. LeCun. Unsupervised Learning of

Invariant Feature Hierarchies with Applications to Object Recognition. In CVPR, 2007.

[14] Ding, S., Li, G., Li, Y., Li, X., Zhai, Q., Champion, A. C. & Zheng, Y. F. (2017).

Survsurf: human retrieval on large surveillance video data. Multimedia Tools and

Applications, 76(5), 6521-6549.

[15] Xu, Z., Mei, L., Hu, C., & Liu, Y. (2016). The big data analytics and applications of the

surveillance system using video structured description technology. Cluster Computing, 19(3), 1283-1292.

[16] Verma, B., Zhang, L., & Stockwell, D. (2017). Roadside Video Data Analysis

Framework. In Roadside Video Data Analysis (pp. 13-39). Springer Singapore.

[17] Wang, K., Mi, J., Xu, C., Shu, L., & Deng, D. J. (2016, July). Real-time big data analytics

for multimedia transmission and storage. In Communications in China (ICCC), 2016

IEEE/CIC International Conference on (pp. 1-6). IEEE.

[18] Jeyakumar, B., Durai, M. S., & Lopez, D. (2018). Case Studies in Amalgamation of Deep

Learning and Big Data. In HCI Challenges and Privacy Preservation in Big Data Security

(pp. 159-174). IGI Global.

[19] Srinivasa Raghava, S., Janarthanan, Y. and Balajee, J.M. (2016). Content based video retrieval and analysis using image processing: A review. International Journal of Pharmacy and Technology, 8(4), pp. 5042–5048.

[20] Kamalakannan, S. G., Balajee, J. and Srinivasa Raghavan (2015). Superior content-based video retrieval system according to query image. International Journal of Applied Engineering Research, 10(3), pp. 7951–7957.

[21] Reena Jangra and Abhishek Bhatnagar. Comparison Analysis of Sensitivity of Noise B/W

Various Edge Detection Technique by Estimating Their PSNR Value. International

Journal of Computer Engineering and Technology, 6(10), 2015, pp. 01-12.

[22] Bhupendra Fataniya, Mekhala Kar, Grishma Joshi, Dr. Tanish Zaveri and Dr. Sanjeev

Acharya. Edge Detection of Microscopic Image, International Journal of Electronics and

Communication Engineering & Technology, 7(3), 2016, pp. 01–10

[23] Ms. Sonali Meghare & Roshani Talmale, Developing and Comparing an Encoding System

Using Vector Quantization & Edge Detection, International Journal of Computer

Engineering & Technology (IJCET), Volume 4, Issue 3, May-June (2013), pp. 503-511