big data (security issue)
8/29/2015
SECURITY ISSUES ASSOCIATED WITH BIG DATA IN CLOUD COMPUTING
Seminar Advance Topics One
Submitted by: Md. Mehedi Hassan
Supervisor: Sajjad Waheed, Associate Professor, Dept. of ICT, MBSTU
Outline
- Introduction
- Big Data
- Why Big Data
- Cloud Computing
- How Big Data is Related with Cloud Computing
- Why Choose Big Data as a Thesis Topic
- Introduction to Hadoop
- MapReduce
- Hadoop Distributed File System (HDFS)
- Application
- Advantages of Big Data
- Alternative of Big Data
- Security Issue of Big Data
- Motivation and Related Work
- Issues and Challenges
- The Proposed Approaches
- Conclusions
Introduction
To analyze complex data and identify patterns, it is very important to securely store, manage, and share large amounts of complex data (big data).
Big data applications are a great benefit to organizations, businesses, and both large-scale and small-scale industries.
Cloud resources are needed to support big data storage and projects, and big data is a strong business case for moving to the cloud.
The main focus of this seminar is on the security issues in cloud computing that are associated with big data.
Big Data
Big Data is the term used to describe volumes of structured and unstructured data so large that they are very difficult to process using traditional databases and software technologies.
[Figure: Big Data sources]
Big Data
- Volume: many factors contribute to increasing volume, such as stored transactions, live streaming, and data collected from sensors.
- Variety:
  - Structured: relational data
  - Semi-structured: XML data
  - Unstructured: Word, PDF, text, media logs
- Velocity: the pace at which data flows in from sources and human interaction.
Why Big Data
- Speed, capacity, and scalability of cloud storage
- End users can visualize data
- Manage data better
- Companies can find new business opportunities
- Data analysis methods and capabilities will evolve
Cloud Computing
Cloud Computing is a technology that relies on sharing computing resources rather than having local servers or personal devices handle the applications.
In Cloud Computing, the word “Cloud” means “The Internet”, so Cloud Computing means a type of computing in which services are delivered through the Internet.
How Big Data is Related with Cloud Computing
Cloud computing is a powerful technology to perform massive-scale and complex computing.
It eliminates the need to maintain expensive computing hardware, dedicated space, and software.
Big Data needs large on-demand compute power and distributed storage to crunch the 3V (volume, variety, velocity) data problem, and the cloud provides this elastically and on demand.
Why Choose Big Data as a Thesis Topic
- As a software developer, I have handled large volumes of data for banking transactions.
- I have observed how much time a particular SELECT statement or analytical SQL query takes to execute.
- The system is very slow when all branches are processing in parallel.
- This problem can be overcome using Big Data concepts.
- Big Data is already used by Facebook, Google, IBM, etc.
- Open-source tooling (Hadoop) is available.
- For these reasons, I chose Big Data as my topic.
Introduction to Hadoop
Hadoop: an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
- Named after a toy belonging to Doug Cutting's son
Hadoop Architecture: two major layers
- Processing layer: MapReduce
- Storage layer: Hadoop Distributed File System (HDFS)
[Figure: Hadoop architecture, showing MapReduce (distributed computation), HDFS (distributed storage), the YARN framework, and Common utilities]
Introduction to Hadoop (cont.)
How Hadoop works
- Core tasks run across a cluster of computers.
- Data is divided into directories and files.
- Files are then distributed across various cluster nodes.
- HDFS supervises the processing.
- Blocks are replicated.
- A sort takes place between the map and reduce stages.
- The sorted data is sent to a certain computer.
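The "files are distributed" and "blocks are replicated" steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_into_blocks` and `place_replicas` are hypothetical, the block size is tiny for readability, and the round-robin placement is a stand-in, since real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Assumption: simple round-robin placement, NOT the rack-aware
    policy a real HDFS Name Node applies.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical file contents and node names, for illustration only.
file_bytes = b"abcdefghij"
blocks = split_into_blocks(file_bytes, block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on several nodes, losing one node still leaves copies of each block elsewhere, which is the basis of the fault tolerance listed among the advantages.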
Advantages
- Low-cost alternative to building bigger servers
- Fault tolerance and high availability
- Dynamic clustering
- Automatic data distribution
- Open source
MapReduce
What is MapReduce: a processing technique and a programming model for distributed computing, based on Java.
- Mapper
- Shuffle
- Reducer
- Java based
- Key-value
MapReduce (cont.)
MapReduce Example
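A word count is the usual illustration of the mapper, shuffle, and reducer stages. The sketch below is a single-process Python stand-in rather than Hadoop code (Hadoop jobs are normally written against its Java API); the function names `mapper`, `shuffle`, `reducer`, and `word_count` are chosen here for illustration only.

```python
from collections import defaultdict


def mapper(line: str):
    """Map stage: emit a (word, 1) key-value pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle stage: group values by key and sort the keys,
    mirroring the sort that happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key: str, values: list[int]):
    """Reduce stage: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical input lines, for illustration only.
counts = word_count(["big data in the cloud", "big data security"])
print(counts)
```

On a real cluster the mapper and reducer calls run on different nodes, and the shuffle moves each key's values across the network to the node that reduces that key.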
Hadoop Distributed File System(HDFS)
HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
Features
- Distributed storage and processing
- Name Node
- Data Node
- Interface in Hadoop
- Streaming access
- Cluster status check
Hadoop Distributed File System(cont.)
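[Figure: HDFS architecture. The Name Node holds metadata (name, replica…), e.g. /home/foo/data, 3…. A client sends metadata operations to the Name Node and reads and writes blocks on Data Nodes in Rack 1 and Rack 2; the Name Node issues block operations to the Data Nodes, and blocks are replicated between Data Nodes.]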
Application
- Homeland security
- Smarter healthcare
- Multi-channel sales
- Telecom
- Manufacturing
- Traffic control
- Trading analytics
- Search quality
Advantages of Big Data
- Cost reduction
- Faster, better decision making
- New products and services
- Risk analysis
Alternative of Big Data
- Apache Spark (less secure than Hadoop)
- Cluster MapReduce (slower and less secure than Hadoop)
Issues and Challenges
- Network level
  - Distributed nodes
  - Distributed data
  - Internode communication
- Authentication level
  - Data protection
  - Administrative rights for nodes
  - Authentication of applications and nodes
  - Logging
- Data level
  - Confidentiality
  - Integrity
  - Availability
- Generic types
  - Traditional security tools
  - Use of different technologies
The Proposed Approaches
- File encryption
- Network encryption
- Logging
- Software format and node maintenance
- Nodes authentication
- Rigorous system testing of MapReduce jobs
- Honeypot nodes
- Layered framework for assuring cloud
- Third-party secure data publication to cloud
- Access control
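Of the approaches listed, nodes authentication is easy to illustrate with the Python standard library. The sketch below shows one possible challenge-response scheme using an HMAC over a pre-shared key; it is an assumption made for illustration, not the mechanism proposed in the seminar and not how Hadoop itself authenticates nodes (secured Hadoop clusters typically rely on Kerberos). The names `new_challenge`, `sign_challenge`, `verify_node`, and the node IDs are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Master side: generate a fresh random challenge for a joining node."""
    return secrets.token_bytes(16)


def sign_challenge(shared_key: bytes, node_id: str, challenge: bytes) -> str:
    """Node side: prove knowledge of the shared key by signing the challenge."""
    message = node_id.encode("utf-8") + b"|" + challenge
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def verify_node(shared_key: bytes, node_id: str, challenge: bytes, response: str) -> bool:
    """Master side: recompute the signature and compare in constant time."""
    expected = sign_challenge(shared_key, node_id, challenge)
    return hmac.compare_digest(expected, response)


# Hypothetical key and node name, for illustration only.
key = secrets.token_bytes(32)
challenge = new_challenge()
response = sign_challenge(key, "datanode-1", challenge)
print(verify_node(key, "datanode-1", challenge, response))
```

A node that lacks the key, or that replays a response issued for a different node ID or an older challenge, fails verification, so it can be kept out of the cluster before it receives any data blocks.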
Conclusions
- I have highlighted the main advantages and applications of Big Data with cloud computing.
- I have summarized the security issues associated with big data in cloud computing.
- I propose that cloud environments can be secured for complex business operations.
- I propose approaches for Big Data security.
Future Works
- Implement a data chaptering algorithm with data security
- Secure, confidential data flow from Hadoop to the cloud
Q & A