Cloud Computing & Big Data

M0214 ADVANCED TOPICS OF INFORMATION SYSTEMS Cloud Computing and Big Data Bina Nusantara University Jakarta 2014

Upload: felly-cia

Post on 24-Jan-2016



Page 1: Cloud Computing & Big Data

M0214

ADVANCED TOPICS OF INFORMATION SYSTEMS

Cloud Computing and Big Data

Bina Nusantara University

Jakarta

2014


Abstract

The PURPOSE of this paper is to give more insight into cloud computing and big data. We
discuss cloud computing services, their types, and their providers, and explain big data
and its usage.

The METHODOLOGY used in this research is library and internet research, conducted by
looking for references in textbooks, journals, articles, and various sources on the internet.
First, we determine keywords related to our research topic; these keywords help us find the
textbooks and scientific journals we need more easily. Second, we select the information
based on our research objectives. The information must also be analysed, since it comes
from various sources.

THE EXPECTED OUTCOME is to improve the readers’ understanding of Big Data and
Cloud Computing, their implementation in enterprises, and how they can be used to improve
a company’s operations.

CONCLUSION of this paper is that in cloud computing, the word cloud (also phrased as

"the cloud") is used as a metaphor for "the Internet," so the phrase cloud computing means "a

type of Internet-based computing," where different services such as servers, storage and

applications are delivered to an organization's computers and devices through the Internet.

There are several cloud computing service models, such as SaaS, PaaS, and IaaS. Indonesia
also has providers that offer cloud services, such as Biznet, Lintas Media Danawa, and
Telkomsigma. Cloud computing is still considered new, and not many companies use it
because they are still concerned about internet speed, vendor dependency, bandwidth, and
security and privacy. Big data has three characteristics that must be considered in
managing it: volume, velocity, and variety.

Keywords: Cloud Computing, Big Data, Structured Data, Unstructured Data, Repository,

Provider


Chapter 1

INTRODUCTION

1. Background

In the last three years, cloud computing has become a phenomenon in the world of IT
business. Surveys from industry analysts such as IDC, Gartner, and Forrester Research
consistently rank cloud computing among the most important topics discussed by IT
managers in companies around the world. Although it may still be a controversial subject
today, in retrospect it is likely to be seen as a revolutionary technology. There are
several reasons why cloud computing already appeals to people. The main reason it is
becoming popular with businesses is that it helps them cut costs. Operational expenses
are reduced significantly because you pay only for what you use. This keeps your expenses
in check and converts capital expenditure (capex) into operating expenditure (opex). In
business, time translates to money. Because cloud computing becomes functional faster
than other systems, businesses save time during set-up. It also supports fast recovery,
so businesses don’t lose time unnecessarily. The cost of setting up a cloud system is
also modest: you don’t need additional hardware or software for the installation, and
implementation can be done remotely.

Cloud computing also offers a high level of automation, making life easier for
organizations. You don’t need to set up a team to handle system updates and back-ups,
which frees internal resources for other high-priority work. The cloud also allows you to
work from anywhere in the world: your employees can access work-related information
wherever they are. Cloud computing holds promise for the future. It may take a little
more time to make it more secure and robust, but we believe the technology can already
deliver substantial benefits for businesses today.

Besides cloud computing, this paper discusses another technology, “Big Data”. Big Data
applies to information that can’t be processed or analyzed using traditional processes or
tools. Organizations today increasingly face Big Data challenges. They have access to a
wealth of information, but they don’t know how to get value out of it because it sits in
its most raw form or in a semistructured or unstructured format; as a result, they don’t
even know whether it’s worth keeping (or whether they are even able to keep it). For that
reason, we chose Cloud Computing and Big Data as the topic of this paper.

2. Scope

In this paper we limit the scope of our topic so that it is not too general. The analysis
and discussion cover:

Samples of cloud computing services

Cloud computing providers in Indonesia

The fee structures the providers offer for cloud computing

Big Data characteristics (volume, velocity, variety)

Structured, unstructured, and semistructured data

3. Objectives & Benefits

Purpose: This paper gives more insight into cloud computing and big data. It discusses
cloud computing services, their types, and their providers, and explains big data and
its usage.

Benefit:

The writers and readers gain an overview and working knowledge of cloud computing and
big data, and are able to explain the usage and importance of cloud computing and big
data today.

4. Methodology

Data is collected mainly through a literature study. The literature study is done by
collecting data and information available from many sources, such as books, the internet,
television, and other media that provide information relevant to the object of research.
The materials found serve as the theoretical basis for the discussion that follows.


5. Structure of the Paper

Chapter 1: Introduction

This chapter explains the background of the research, its scope, its purpose and
benefits, the research methodology, and the structure of the paper.

Chapter 2: Literature Review

This chapter explains the theories used in the research, which serve as the framework
for writing and organizing it.

Chapter 3: Discussion

This chapter discusses samples of cloud computing services, cloud computing providers
in Indonesia, the fee structures the providers offer, and Big Data.

Chapter 4: Conclusion and Suggestion

This chapter presents the conclusions drawn from the research and suggestions for the
technology.


Chapter 2

LITERATURE REVIEW

2.1 Cloud Computing

In cloud computing, the word cloud (also phrased as "the cloud") is used as a

metaphor for "the Internet," so the phrase cloud computing means "a type of Internet-

based computing," where different services — such as servers, storage and

applications — are delivered to an organization's computers and devices through the

Internet. (Webopedia)

2.2 Big Data

“Big Data” is a bit of a misnomer since it implies that pre-existing data is somehow

small (it isn’t) or that the only challenge is its sheer size (size is one of them, but there

are often more). In short, the term Big Data applies to information that can’t be

processed or analyzed using traditional processes or tools. (Zikopoulos, 2012)

2.3 Software as a Service (SaaS)

Software as a Service (SaaS) is the delivery of computer applications over the Internet.

(Hurwitz, 2013)

2.4 Infrastructure as a Service (IaaS)

Infrastructure as a Service (IaaS) means Infrastructure, including a management

interface and associated software, provided to companies from the cloud as a service.

(Hurwitz, 2013)

2.5 Platform as a Service (PaaS)

Platform as a Service (PaaS) is a cloud service that abstracts the computing services,

including the operating software and the development and deployment and

management life cycle. It sits on top of Infrastructure as a Service. (Hurwitz, 2013)


2.6 Repository

Repository is a database for software and components, with an emphasis on revision

control and configuration management (where they keep the good stuff, in other

words). (Hurwitz, 2013)

2.7 Structured Data

Structured data is data that has a defined length and format. (Hurwitz, 2013)

2.8 Unstructured Data

Unstructured data is data that does not follow a specified data format. (Hurwitz, 2013)

2.9 Information and Communication Technology

According to (Wikipedia, Information and communication technology - Wikipedia, the

free encyclopedia, 2014), “Information and communications technology (ICT) is often

used as an extended synonym for information technology (IT), but is a more specific

term that stresses the role of unified communications and the integration of

telecommunications (telephone lines and wireless signals), computers as well as

necessary enterprise software, middleware, storage, and audio-visual systems, which

enable users to access, store, transmit, and manipulate information”

2.10 System

According to (Satzinger, Jackson, & Burd, 2005, p. 4), “System is a collection of

interrelated components that function together to achieve some outcome.”

According to (Wikipedia, System - Wikipedia, the free encyclopedia, 2014), “A

system is a set of interacting or interdependent components forming an integrated

whole or a set of elements and relationships which are different from relationships of

the set or elements to other elements or sets.”


A system is a group of interrelated components working together toward a common goal
by accepting inputs and producing outputs in an organized transformation process.
(O'Brien, 2004)

2.11 Internet

The Internet is a global system of interconnected computer networks that use the

standard Internet protocol suite (TCP/IP) to link several billion devices worldwide. It

is a network of networks that consists of millions of private, public, academic,

business, and government networks, of local to global scope, that are linked by a broad

array of electronic, wireless, and optical networking technologies. (Wikipedia)

2.12 Provider

According to the dictionary, a provider is a group or company that provides a specified
service. (Merriam Webster)


Chapter 3

DISCUSSION

3.1 Sample of Cloud Computing Services

a. SaaS (Software as a Service)

Software as a Service (SaaS) is a software delivery method that provides

access to software and its functions remotely as a Web-based service. Software as a

Service allows organizations to access business functionality at a cost typically less

than paying for licensed applications since SaaS pricing is based on a monthly fee.

Also, because the software is hosted remotely, users don't need to invest in additional

hardware. Software as a Service removes the need for organizations to handle the

installation, set-up and often daily upkeep and maintenance.

Even new users can use the applications anywhere and at any time, subject to the service
provider's policy (for example, its fees). Stored data can likewise be accessed anywhere
and at any time, as long as there is an internet connection. SaaS is not only storage: it
can also open some file types without the corresponding application being installed on
the computer. Samples of SaaS services:

I. Google Drive

This software is made by Google. One strength of the application is that it can open
about 30 kinds of files in a browser without the application normally needed to read them
being installed. For example, Google Drive can open Photoshop files even when Photoshop
is not installed on the computer. Another strength is its OCR capability for image files
uploaded to Google Drive, which makes an image searchable by the words or sentences in
it. Because Google Drive is a Google application, it is integrated with other Google
applications such as Google Docs.

II. Dropbox

Dropbox is also a SaaS service for storing files online. Users get 2 GB of free storage.
Like Google Drive, Dropbox can also open some file types without the corresponding
application being installed.

III. Social Media

Social media is also one of the SaaS services. On social media, users can also store
files of many types. Facebook and Twitter are examples of social media, where users can
store pictures, text, video, and so on. SoundCloud is also a SaaS service; it can store
audio files (MP3, WAV, etc.).

IV. Apple iCloud

This application is similar to the others. The difference is that Apple iCloud can be
used only by Apple users; it is not an open application. Its functions are the same: it
can store music, pictures, videos, Word documents, and so on.

V. SugarSync

SugarSync is a cloud service that enables active synchronization of files across

computers and other devices for file backup, access, syncing, and sharing from a

variety of operating systems, such as Android, BlackBerry OS, iOS, Mac OS X,

Samsung SmartTV, Symbian, Windows, and Windows Mobile devices. For Linux,

only a discontinued unofficial third-party client is available. The program

automatically refreshes its sync by constantly monitoring changes to files (additions,

deletions, edits) and syncs these changes with any other linked devices as well as the

SugarSync servers. Originally offering a free 5GB plan and several paid plans, the

company transitioned to a paid-only model on February 8th, 2014.
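To illustrate the idea of monitoring additions, deletions, and edits, here is a minimal sketch in Python. It is our own illustration, not SugarSync's implementation: the function names `snapshot` and `diff_snapshots` are hypothetical, and comparing SHA-256 digests of two directory snapshots is an assumed technique chosen for simplicity.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to a SHA-256 digest of its content."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify the changes between two snapshots as additions, deletions, or edits."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "edited": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

A sync client built along these lines would take a snapshot periodically, compare it with the previous one, and send only the files listed under "added" and "edited" to the server and the other linked devices.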

b. PaaS (Platform as a Service)

Platform as a Service (PaaS) is a way to rent hardware, operating systems, storage

and network capacity over the Internet. The service delivery model allows the

customer to rent virtualized servers and associated services for running existing

applications or developing and testing new ones.

Platform as a Service (PaaS) is an outgrowth of Software as a Service (SaaS), a

software distribution model in which hosted software applications are made available


to customers over the Internet. PaaS has several advantages for developers. With PaaS,

operating system features can be changed and upgraded frequently. Geographically

distributed development teams can work together on software development projects.

Services can be obtained from diverse sources that cross international boundaries.

Initial and ongoing costs can be reduced by the use of infrastructure services from a

single vendor rather than maintaining multiple hardware facilities that often perform

duplicate functions or suffer from incompatibility problems. Overall expenses can also

be minimized by unification of programming development efforts.

On the downside, PaaS involves some risk of "lock-in" if offerings require

proprietary service interfaces or development languages. Another potential pitfall is

that the flexibility of offerings may not meet the needs of some users whose

requirements rapidly evolve.

Here are the examples of PaaS:

I. Apprenda

Apprenda’s Enterprise Platform as a Service (PaaS) delivers significant cost savings &

massive improvements in productivity by freeing app development from internal

infrastructure & IT. In addition, its .NET framework and Java support allow

businesses to take their applications with them, with no fear of their web applications

becoming ‘locked-in’ on the platform. Uniquely, Apprenda combines the development

freedom of traditional Software-as-a-Service models with the individual level of

customization expected of a PaaS environment along with a very competitive cost.

Apprenda has been identified as a private cloud leader according to a recent Gartner

report.

II. IBM

Traditionally a powerhouse of computing development, IBM was slow to catch on to

the PaaS service and cloud-computing in general. That said, IBM is offering up a pilot

PaaS service aimed largely at the IT market. While you don’t need to be an

independent software vendor in order to use IBM’s new PaaS service, the company

has structured their existing cloud-services footprint to serve the software vendor

sector best. Partnering with IBM could prove risky for some clients however, as

discovered by Chase Bank. Their architecture is proprietary, meaning that leaving

IBM can render your web-applications useless, and their service is still new. Without


much support or confirmation as to the direction IBM will take their platform in, it’s

anyone’s guess how IBM will handle the service needs of clients in the next few years.

III. VCE’s Vblock

Another proprietary service provider, VCE is proud of the flexibility and usefulness of

their Vblock platform to the modern business. VCE uses industry-standard pricing

models to help business cope with ‘predictable costs’ for deployment and

development. The user-friendliness of the platform has sometimes been questioned

however, and VCE’s business model incorporates revenue streams from advisory and

implementation services while other companies provide these services as part of their

flat-cost PaaS package.

IV. OpenShift

A new offering by Red Hat Inc., OpenShift is a PaaS marketed towards clients who

wish to use open source technologies. The platform itself is compatible with Ruby,

Python, Java and Perl and offers a variety of open source frameworks for customers.

As with all Linux-centered technologies however, OpenShift suffers from an

underwhelming support base and far-reaching inaccessibility problems. Those

developers who aren’t intimately familiar with the extant Linux environment might

find the idea of cloud-computing through a command line intimidating, if not

completely alien.

V. Google App Engine

Google App Engine (often referred to as GAE or simply App Engine) is a platform as

a service (PaaS) cloud computing platform for developing and hosting web

applications in Google-managed data centers. Applications are sandboxed and run across
multiple servers. App Engine offers automatic scaling for web applications: as the number
of requests increases for an application, App Engine automatically allocates more
resources for the web application to handle the additional demand.

Google App Engine is free up to a certain level of consumed resources. Fees are

charged for additional storage, bandwidth, or instance hours required by the

application. It was first released as a preview version in April 2008, and came out of

preview in September 2011.
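The idea of automatic scaling can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not App Engine's actual algorithm: the function `instances_needed`, the capacity of 50 requests per second per instance, and the instance limits are all hypothetical.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float = 50.0,
                     min_instances: int = 1,
                     max_instances: int = 20) -> int:
    """Return how many instances are needed to serve the current load.

    All parameters are hypothetical; real platforms also look at signals
    such as latency, CPU load, and pending-request queues.
    """
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp the result between the configured minimum and maximum.
    return max(min_instances, min(max_instances, wanted))


for load in (10, 120, 400, 5000):
    print(load, "req/s ->", instances_needed(load), "instance(s)")
```

As the load grows from 10 to 400 requests per second, the sketch allocates 1, 3, and 8 instances; at 5000 requests per second it stops at the configured maximum of 20.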


c. IaaS (Infrastructure as a Service)

IaaS (Infrastructure as a Service) is the virtual delivery of computing resources in

the form of hardware, networking, and storage services. It may also include the

delivery of operating systems and virtualization technology to manage the resources.

Rather than buying and installing the required resources in their own data center,

companies rent these resources as needed.

Many companies with a hybrid environment are likely to include IaaS in some

form because IaaS is a highly practical solution for companies with various IT

resource challenges. Whether a company needs additional resources to support a

temporary development project, an on-going dedicated development testing

environment, or disaster recovery, paying for infrastructure services on a per-use basis

can be highly cost-effective.

Compared to SaaS and PaaS, IaaS users are responsible for managing more:

applications, data, runtime, middleware, and O/S. Vendors still manage virtualization,

servers, hard drives, storage, and networking. What users gain with IaaS is

infrastructure on top of which they can install any required platforms. Users are

responsible for updating these if new versions are released. Here are the samples of

IaaS Services:

I. Amazon Elastic Compute Cloud (EC2)

EC2 is a central part of Amazon.com's cloud computing platform, Amazon Web

Services (AWS). EC2 allows users to rent virtual computers on which to run their own

computer applications. EC2 allows scalable deployment of applications by providing a

Web service through which a user can boot an Amazon Machine Image to create a

virtual machine, which Amazon calls an "instance", containing any software desired.

A user can create, launch, and terminate server instances as needed, paying by the

hour for active servers, hence the term "elastic". EC2 provides users with control over

the geographical location of instances that allows for latency optimization and high

levels of redundancy.

II. Rackspace

Rackspace Inc. is an IT hosting company based in Windcrest, Texas, USA, a suburb of

San Antonio, Texas. The company also has offices in Australia, the United Kingdom,

Switzerland, Israel, The Netherlands, India, and Hong Kong, and data centers

operating in Texas, Illinois, Virginia, the United Kingdom, Australia, and Hong Kong.

The company's email and apps division operates from Blacksburg, VA; other offices


are located in Austin, Texas and San Francisco, California. Rackspace has two main

service-level segments: Managed and Intensive. Both service levels receive support

via e-mail, telephone, live chat, and ticket systems, but they are designed to fit the

needs of different businesses.

The Managed support level consists of "on-demand" support where proactive services

are provided, but the customer can contact Rackspace when they need additional

assistance. The Intensive support level consists of "proactive" support where many

proactive services are provided, and customers receive additional consultations about

their server configuration. Highly customized implementations generally fall under

this level of support. Some services and products are only available for certain support

levels.

III. Green House Data

Green House Data is a data center services provider headquartered in Cheyenne,

Wyoming. Cheyenne is home to a data center, administrative offices, and technical

support. The company also has locations in Oregon, New Jersey, and sales and

marketing offices in Laramie and in Denver, Colorado.

As a whole, the data center industry has been highly criticized for heavy electrical use,

and in recent years has actively tried to reduce power consumption by improving

facility design and increasing server virtualization. As a key element of their business

model, Green House Data purchases renewable energy credits, or RECs, for wind

power and documents purchases with the EPA's Green Power Partnership. In 2013,

Green House Data was part of EPA's "Leadership Club" for sustainable power

purchases. A common measure for data center power consumption is Power usage

effectiveness, often abbreviated PUE.
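PUE is the ratio of the total energy used by the facility to the energy used by the IT equipment alone, so a value closer to 1.0 means less overhead for cooling, lighting, and power distribution. A minimal sketch in Python follows; the function name and the energy figures are hypothetical and are not Green House Data's measurements.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / IT equipment energy (both over the same period)."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical month: 1500 kWh drawn by the whole facility, 1000 kWh by the servers.
print(power_usage_effectiveness(1500.0, 1000.0))  # 1.5
```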

3.2 Cloud Computing Provider in Indonesia

There are several cloud computing providers in Indonesia. Some of them are:

Lintas Media Danawa

PT Aplikanusa Lintasarta (Lintasarta) and its subsidiary, PT Lintas Media Danawa (LMD),
collaborate to offer a complete cloud computing solution. Lintasarta currently markets
its Infrastructure as a Service (IaaS) solutions successfully to various industrial
sectors, while LMD offers a Software as a Service (SaaS) solution.


Telkomsigma

Established in 1987, PT Sigma Cipta Caraka (Telkomsigma) has been a leading integrated
end-to-end ICT solutions company in Indonesia for more than 26 years. Telkomsigma offers
comprehensive information technology services comprising consulting services, managed IT
services, software development services, and integrated data center operations in the
banking (conventional and sharia-based), financial, telecommunications, manufacturing,
distribution, and other sectors. Its solutions portfolio comprises Managed Services
(internationally certified Data Center, Cloud Computing, E-Transaction, Telco Managed
Services, and Edutainment Media and Communication Services), Financial & Banking
Software Development Services, Consulting, and System Integration.

Biznet

Biznet Networks was established in 2000 as an Internet Service Provider serving the
Internet needs of business customers. In 2000, Biznet used Wireless and In-Building
Ethernet technology. Supported by a strong technical team and full commitment, Biznet
Networks aims to become one of the leading Network Service Providers in Indonesia.

3.3 Fee Structures the Providers Offer for Cloud Computing

Lintas Media Danawa

Can be checked at: http://www.lintasmediadanawa.com/cloud-infrastructure-service/cozy-on-demand-cloud-pricing

Usage             Price         Note
Computing (RAM)   IDR 1.000     per hour per 1 GB
Storage           IDR 2.000     per month per 1 GB
Public IP         IDR 100.000   per month per unit
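To show how this pay-per-use pricing adds up, here is a small worked example in Python. The rates are taken from the Lintas Media Danawa price list; we assume that "IDR 1.000" uses the Indonesian thousands separator (one thousand rupiah). The function name `monthly_cost` and the two usage scenarios are our own hypothetical illustration.

```python
# Rates in IDR, read from the Lintas Media Danawa price list.
# Assumption: "1.000" means one thousand (Indonesian thousands separator).
RAM_PER_GB_HOUR = 1_000
STORAGE_PER_GB_MONTH = 2_000
PUBLIC_IP_PER_MONTH = 100_000


def monthly_cost(ram_gb: int, hours_running: int, storage_gb: int, public_ips: int) -> int:
    """Estimate one month's bill in IDR for a single server."""
    compute = ram_gb * hours_running * RAM_PER_GB_HOUR
    storage = storage_gb * STORAGE_PER_GB_MONTH
    addresses = public_ips * PUBLIC_IP_PER_MONTH
    return compute + storage + addresses


print(monthly_cost(4, 720, 100, 1))  # running all month (720 hours): 3180000
print(monthly_cost(4, 200, 100, 1))  # running 200 hours only: 1100000
```

The second scenario shows the benefit of paying only for what is used: a server that is switched off outside working hours costs about a third of one that runs all month.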

Biznet

Can be checked at: http://www.biznetnetworks.com/id/enterprise/cloud-computing-enterprise/


Feature                     Cloud Computing Enterprise
Application                 All applications that are based on Windows & Linux,
                            large-scale critical
Virtualization Technology   VMware ESXi
Operating System            Windows & Linux
Flexibility                 Each system gets a Virtual Data Center (VDC), so the
                            system can be partitioned into multiple servers
                            according to need
Support                     24x7x365
Contract duration           Minimum 6 months
Monthly Fee                 Starts from IDR 2,250,000 per month

Biznet Cloud Server Enterprise

Service                                                         Monthly Fee (IDR)   Setup Fee (IDR)
Cloud Server Enterprise 1 Core, 1 GB RAM, 100 GB SAN Storage    2,250,000           2,000,000
Cloud Server Enterprise 2 Core, 2 GB RAM, 100 GB SAN Storage    3,000,000           2,000,000
Cloud Server Enterprise 4 Core, 4 GB RAM, 100 GB SAN Storage    4,000,000           2,000,000
Cloud Server Enterprise 8 Core, 8 GB RAM, 100 GB SAN Storage    5,750,000           2,000,000
Cloud Server Enterprise 8 Core, 16 GB RAM, 100 GB SAN Storage   9,000,000           2,000,000
Cloud Server Enterprise 8 Core, 32 GB RAM, 100 GB SAN Storage   14,500,000          2,000,000

Biznet Cloud Storage Enterprise

Service                           Monthly Fee (IDR)   Setup Fee (IDR)
Cloud Storage Enterprise 1 TB     3,000,000           2,000,000
Cloud Storage Enterprise 5 TB     12,500,000          2,000,000
Cloud Storage Enterprise 10 TB    22,500,000          2,000,000
Cloud Storage Enterprise 25 TB    50,000,000          2,000,000
Cloud Storage Enterprise 50 TB    75,000,000          2,000,000
Cloud Storage Enterprise 100 TB   125,000,000         2,000,000
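The storage price list can be compared more easily by working out the monthly fee per terabyte for each plan. The short Python calculation below uses only the figures from the Biznet Cloud Storage Enterprise list; the names `STORAGE_PLANS` and `fee_per_tb` are our own.

```python
# (capacity in TB, monthly fee in IDR) from the Biznet Cloud Storage Enterprise list
STORAGE_PLANS = [
    (1, 3_000_000),
    (5, 12_500_000),
    (10, 22_500_000),
    (25, 50_000_000),
    (50, 75_000_000),
    (100, 125_000_000),
]


def fee_per_tb(capacity_tb: int, monthly_fee: int) -> float:
    """Monthly fee divided by capacity, in IDR per TB."""
    return monthly_fee / capacity_tb


for capacity, fee in STORAGE_PLANS:
    print(f"{capacity:>3} TB: IDR {fee_per_tb(capacity, fee):,.0f} per TB per month")
```

The fee per terabyte falls from IDR 3,000,000 for the 1 TB plan to IDR 1,250,000 for the 100 TB plan, so larger plans are cheaper per unit of storage.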

3.4 Big Data

Big data is defined by three characteristics: volume, velocity, and variety.


VOLUME

The sheer volume of data being stored today is exploding. In the year 2000, 800,000
petabytes (PB) of data were stored in the world, and IBM expects this number to reach 35
zettabytes (ZB) by 2020. Of course, a lot of the data that’s being created today isn’t
analyzed at all, and that’s another problem IBM is trying to address with BigInsights.
Twitter alone generates more than 7 terabytes (TB) of data every day, Facebook 10 TB, and
some enterprises generate terabytes of data every hour of every day of the year. It’s no
longer unheard of for individual enterprises to have storage clusters holding petabytes
of data.
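To put these figures on one scale, the following Python lines convert them using decimal (SI) prefixes, where 1 PB is 10^15 bytes and 1 ZB is 10^21 bytes. The variable names are our own, and the choice of decimal rather than binary prefixes is an assumption.

```python
PB = 10**15  # bytes in a petabyte (decimal SI prefix)
ZB = 10**21  # bytes in a zettabyte

stored_2000 = 800_000 * PB
forecast_2020 = 35 * ZB

print(stored_2000 / ZB)             # 0.8 (ZB stored in 2000)
print(forecast_2020 / stored_2000)  # 43.75 (growth factor, 2000 to 2020)
print(7 * 365)                      # 2555 (TB per year at 7 TB per day)
```

In other words, the 800,000 PB stored in 2000 is 0.8 ZB, the forecast for 2020 is almost 44 times larger, and 7 TB a day amounts to roughly 2.5 PB a year.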

Figure: Big data is characterized by its volume, velocity and variety – or simply V3.

VARIETY

The volume associated with the Big Data phenomenon brings along new challenges for data

centers trying to deal with it: its variety. With the explosion of sensors, and smart devices, as

well as social collaboration technologies, data in an enterprise has become complex, because

it includes not only traditional relational data, but also raw, semistructured, and unstructured

data from web pages, web log files (including click-stream data), search indexes, social media

forums, e-mail, documents, sensor data from active and passive systems, and so on. What’s

more, traditional systems can struggle to store and perform the required analytics to gain

understanding from the contents of these logs because much of the information being

generated doesn’t lend itself to traditional database technologies. In our experience, although

some companies are moving down the path, by and large, most are just beginning to

understand the opportunities of Big Data (and what’s at stake if it’s not considered).

Variety represents all types of data—a fundamental shift in analysis requirements from

traditional structured data to include raw, semistructured, and unstructured data as part of the

decision-making and insight process. Traditional analytic platforms can’t handle variety.


However, an organization’s success will rely on its ability to draw insights from the various

kinds of data available to it, which includes both traditional and nontraditional.

To capitalize on the Big Data opportunity, enterprises must be able to analyze all types of

data, both relational and nonrelational: text, sensor data, audio, video, transactional, and more.

Structured Data

The term structured data generally refers to data that has a defined length and format.

Examples of structured data include numbers, dates, and groups of words and numbers called

strings (for example, a customer’s name, address, and so on).

Most experts agree that this kind of data accounts for about 20 percent of the data that is out

there. Structured data is the data that you’re probably used to dealing with. It’s usually stored

in a database. You can query it using a language like structured query language (SQL).
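As a small illustration of querying structured data with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `customer` table, its columns, and its rows are made-up sample data for this example only.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, signup_date TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Ani", "Jakarta", "2014-01-15", 31),
        ("Budi", "Bandung", "2014-02-03", 45),
        ("Citra", "Jakarta", "2014-03-22", 27),
    ],
)

# Every row has the same fields with defined types, so a query can filter and sort on them.
rows = conn.execute(
    "SELECT name, age FROM customer WHERE city = ? ORDER BY age", ("Jakarta",)
).fetchall()
print(rows)  # [('Citra', 27), ('Ani', 31)]
conn.close()
```

Because each record follows the same defined format, the database can answer the question "which customers are in Jakarta, youngest first" directly. The same question is much harder to ask of free-form text.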

Structured data is taking on a new role in the world of big data. The evolution of technology

provides newer sources of structured data being produced — often in real time and in large

volumes. The sources of data are divided into two categories:

✓ Computer- or machine-generated: Machine-generated data generally refers to data that

is created by a machine without human intervention.

✓ Human-generated: This is data that humans, in interaction with computers, supply.

MACHINE-GENERATED STRUCTURED DATA can include the following:

✓ Sensor data:

Examples include radio frequency ID (RFID) tags, smart meters, medical devices, and Global

Positioning System (GPS) data. For example, RFID is rapidly becoming a popular technology.

It uses tiny computer chips to track items at a distance. An example of this is tracking

containers of produce from one location to another. When information is transmitted from the

receiver, it can go into a server and then be analyzed. Companies are interested in this for

supply chain management and inventory control. Another example of sensor data is

smartphones that contain sensors like GPS that can be used to understand customer behavior

in new ways.


✓ Web log data:

When servers, applications, networks, and so on operate, they capture all kinds of data about

their activity. This can amount to huge volumes of data that can be useful, for example, to

deal with service-level agreements or to predict security breaches.

✓ Point-of-sale data:

When the cashier swipes the bar code of any product that you are purchasing, all that data

associated with the product is generated. Just think of all the products across all the people

who purchase them, and you can understand how big this data set can be.

✓ Financial data:

Lots of financial systems are now programmatic; they are operated based on predefined rules

that automate processes. Stock-trading data is a good example of this. It contains structured

data such as the company symbol and dollar value. Some of this data is machine generated,

and some is human generated.

STRUCTURED HUMAN-GENERATED DATA might include the following:

✓ Input data:

This is any piece of data that a human might input into a computer, such as name, age,

income, non-free-form survey responses, and so on. This data can be useful to understand

basic customer behavior.

✓ Click-stream data:

Data is generated every time you click a link on a website. This data can be analyzed to

determine customer behavior and buying patterns.

✓ Gaming-related data:

Every move you make in a game can be recorded. This can be useful in understanding how

end users move through a gaming portfolio.
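As a minimal sketch of what querying structured data with SQL looks like, the example below loads a few point-of-sale records into an in-memory SQLite table and totals the revenue per product. The table name, column names, and sample rows are hypothetical and chosen only for illustration; they do not come from any real system.

```python
import sqlite3

# Hypothetical point-of-sale records: (product, quantity, unit_price).
SALES = [
    ("apple", 3, 0.50),
    ("bread", 1, 2.25),
    ("apple", 2, 0.50),
    ("milk", 1, 1.20),
]


def revenue_by_product(rows):
    """Load rows into an in-memory table and total the revenue per product with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (product TEXT, quantity INTEGER, unit_price REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT product, SUM(quantity * unit_price) AS revenue "
            "FROM sales GROUP BY product ORDER BY product"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for product, revenue in revenue_by_product(SALES):
        print(product, revenue)
```

Because every record has the same fixed columns, a single declarative query is enough to aggregate the whole data set. That is the property that distinguishes structured data from the other types discussed in this section.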

Unstructured Data

Unstructured data is data that does not follow a specified format. Unstructured data is

everywhere. In fact, most individuals and organizations conduct their lives around

unstructured data. Just as with structured data, unstructured data is either machine generated

or human generated.

MACHINE-GENERATED UNSTRUCTURED DATA examples:

✓ Satellite images: This includes weather data or the data that the government captures in its

satellite surveillance imagery. Just think about Google Earth, and you get the picture.

✓ Scientific data: This includes seismic imagery, atmospheric data, and high energy physics.

✓ Photographs and video: This includes security, surveillance, and traffic video.

✓ Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic

profiles.

HUMAN-GENERATED UNSTRUCTURED DATA examples:

✓ Text internal to your company:

Think of all the text within documents, logs, survey results, and e-mails. Enterprise

information actually represents a large percent of the text information in the world today.

✓ Social media data:

This data is generated from the social media platforms such as YouTube, Facebook, Twitter,

LinkedIn, and Flickr.

✓ Mobile data: This includes data such as text messages and location information.

✓ Website content:

This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.

Semi-structured data

Semi-structured data is a kind of data that falls between structured and unstructured data.

Semi-structured data does not necessarily conform to a fixed schema (that is, structure) but

may be self-describing and may have simple label/value pairs.

Examples of semi-structured data include EDI, SWIFT, and XML.
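To make "self-describing" concrete, here is a small sketch that reads an XML fragment with Python's standard library and turns each record into label/value pairs. The `<customers>` document and its fields are hypothetical. Note that the two records do not share the same fields, which a fixed relational schema would not allow without empty columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: both are <customer> elements, but they carry different fields.
DOC = """
<customers>
  <customer><name>Ana</name><age>31</age></customer>
  <customer><name>Budi</name><email>budi@example.com</email></customer>
</customers>
"""


def to_label_value_pairs(xml_text):
    """Turn each <customer> element into a dict mapping tag name to text."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in customer}
        for customer in root.findall("customer")
    ]


if __name__ == "__main__":
    for record in to_label_value_pairs(DOC):
        print(record)
```

The tags travel with the values, so a program can discover which fields a record has while reading it, rather than relying on a schema defined in advance.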

This is illustrated further in the figure below:

VELOCITY

A conventional understanding of velocity typically considers how quickly the data

arrives and is stored, and its associated rates of retrieval. While managing all of that quickly is

good (and the volumes of data that we are looking at are a consequence of how quickly the

data arrives), we believe the idea of velocity is actually something far more compelling than

these conventional definitions.

To accommodate velocity, a new way of thinking about a problem must start at the

inception point of the data. Rather than confining the idea of velocity to the growth rates

associated with your data repositories, we suggest you apply this definition to data in motion:

The speed at which the data is flowing. After all, we’re in agreement that today’s enterprises

are dealing with petabytes of data instead of terabytes, and the increase in RFID sensors and

other information streams has led to a constant flow of data at a pace that has made it

impossible for traditional systems to handle.
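One way to picture analysing data in motion, rather than data already at rest in a repository, is a sliding-window counter that updates as each event arrives. The sketch below is illustrative only: the class name and the ten-second window are assumptions, and it expects timestamps (in seconds) to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record one event and return how many events fall inside the current window."""
        self._timestamps.append(timestamp)
        # Drop events that are `window_seconds` or more older than the newest one.
        while self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=10)
    for t in [0, 1, 5, 11, 12, 30]:
        print(t, counter.add(t))
```

Only the events inside the window are kept in memory, so the rate can be tracked continuously without first storing the whole stream.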

3.5 How to Use Big Data to Benefit a Company

There are many ways in which big data can be applied in a company. For instance, it

can be used for log analytics, detecting fraud patterns, analyzing social media patterns, and many

more. In this part of the discussion, we break down some of these uses in order to

understand why big data is important and how it can help a company grow and advance.

IT for IT Log Analytics

This is one of the common uses for an inaugural Big Data project. All of the logs

and trace data generated by the operation of the common IT solutions implemented in a

company are considered data exhaust.

Enterprises have a lot of data exhaust, and it is little more than a pollutant if it is merely left around for

a couple of hours or days in case it is needed and then

purged. The problem is that this data might hold

concentrated value, and IT shops need to figure out a way to store it and extract value from

it. IT departments today must be able to store logs efficiently, since logs

are typically kept only for emergencies and discarded as soon as possible. Stored logs can also be

used to investigate rare problems.

Log histories are usually retained for only several days or weeks,

because there is too much data for conventional systems to store. This makes it

impossible to identify trends and issues that extend beyond such a limited time period.

These logs are semi-structured and raw by nature, so they are not always suited to

traditional database processing. Log formats are constantly changing due to software

and hardware upgrades, so they cannot be tied to a strict, inflexible analysis paradigm.

Enterprises are trying to get better insight into how their systems are running and

when and how things break down. IBM helped them leverage a Big Data platform that is

able to analyze approximately 1 TB of log data each day. They are now able to

decipher what is happening across the entire stack with each and every transaction.

From this they can start to build a base of knowledge to anticipate and

understand the interaction between failures, generate best-practice remediation

steps in the event of a specific problem, or even retune the infrastructure to eliminate

such problems.
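Because log formats drift over time, one pragmatic approach is to try several patterns per line and set aside whatever none of them recognises, rather than forcing every line into one fixed schema. The sketch below assumes two hypothetical log layouts and a made-up `SAMPLE_LINES` list; it is not the method of any particular product.

```python
import re
from collections import Counter

# Two hypothetical log layouts, standing in for formats that drift across upgrades.
PATTERNS = [
    re.compile(
        r"^(?P<time>\S+) \[(?P<level>\w+)\] (?P<component>\w+): (?P<message>.*)$"
    ),
    re.compile(
        r"^time=(?P<time>\S+) level=(?P<level>\w+) "
        r"component=(?P<component>\w+) msg=(?P<message>.*)$"
    ),
]

SAMPLE_LINES = [
    "2014-05-01T10:00:00 [ERROR] db: connection lost",
    "2014-05-01T10:00:02 [INFO] web: request served",
    "time=2014-05-02T09:15:00 level=error component=db msg=timeout",
    "garbage line",
]


def parse_line(line):
    """Return a dict of fields for the first pattern that matches, or None."""
    for pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None


def errors_by_component(lines):
    """Count ERROR records per component and collect lines no pattern recognises."""
    counts, unparsed = Counter(), []
    for line in lines:
        record = parse_line(line.strip())
        if record is None:
            unparsed.append(line)
        elif record["level"].upper() == "ERROR":
            counts[record["component"]] += 1
    return counts, unparsed


if __name__ == "__main__":
    counts, unparsed = errors_by_component(SAMPLE_LINES)
    print(dict(counts), unparsed)
```

Keeping the unparsed lines instead of discarding them means a new log format shows up as a growing backlog that can be given its own pattern later.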

Fraud Detection Pattern

Almost anywhere a financial transaction is involved, there is a

potential for misuse and the ubiquitous specter of fraud. By leveraging a Big Data

platform, an enterprise has the opportunity to identify fraud, or even stop it from happening.

Figure 3-1 Modern-day fraud detection ecosystem synergizes a Big Data platform with

traditional processes.

A modern-day fraud detection ecosystem provides a low-cost Big Data platform for

exploratory modelling and discovery. This data can be leveraged by traditional

systems either directly or through integration into existing data quality and governance

protocols. The addition of InfoSphere Streams also provides the ecosystem with analytics

for data-in-motion and data-at-rest.

In one enterprise implementation, it was discovered that the platform not only

sped up how quickly the fraud detection models could be built and refreshed,

but also provided broader and more accurate insight. A process that

once took about three weeks, from the moment a transaction hit the transaction switch until

potential fraud was identified, was reduced to a latency of just a couple of hours. The fraud

detection models were also built on a data set roughly 50 percent broader than the previous set of

data.

Social Media Pattern

We can use Big Data to find out how customers rate an enterprise, and this

can be used to gauge the impact of decisions made by the executives of the

enterprise and of the way they engage their customers. Specifically, we can determine

how sentiment is impacting sales, the effectiveness or receptiveness of marketing

campaigns, the accuracy of the marketing mix, and more. This data can be processed

to give basic insight into people's opinions and sentiment.

But in the end, the more important question is why people say what they say

and why they behave in such a way. Answering it requires enriching the social

media feeds with additional and differently shaped information that is likely residing in

other enterprise systems. To do that, the enterprise has to look beyond social media alone;

it has to look at how what people are saying interacts with their behaviour,

financial trends, actual transactions, and so on.

3.6 Why Don’t Many Indonesian Companies Use Cloud Computing?

Cloud computing is still new in Indonesia, and some people may not yet

know about it. Because it is still new, there are some issues in

introducing cloud computing to Indonesian users. For example, cloud computing

delivers services such as data storage over the internet, so the internet connection is essential.

If there is a problem with the internet connection, work on the computer

becomes slower because every operation takes longer.

Another issue is that a company that uses cloud computing for data storage

becomes dependent on the vendor (the provider of the cloud computing service), because

the company does not have direct access to the servers in the cloud. If the

vendor has a poor backup service or a broken server, the company will suffer losses.

If a company plans to use cloud computing, it must also provide ample bandwidth,

which is needed to support the data that is transferred and stored.

Security and privacy are further issues: because the company's data travels

over the internet, it may be seen by other people (including people outside the company), and if

it is managed badly, the consequences can be severe. A lack of support from

other departments for using cloud computing services is another reason why Indonesian

companies do not use cloud computing.

Besides that, crackers or hackers may access the data without

permission and take important information from it. Cloud computing vendors are therefore still

working to manage the resources used in their cloud computing services.

3.7 Measuring the Value of a Big Data Investment

For an enterprise that has implemented Big Data, the most likely way to determine the

value of the investment is to look at the impact of the implementation. Big Data improves

efficiency in analyzing varied data at the same time, but it is capable of far more than that. If an

enterprise is able to analyze every detail of the data it has, it could win the market, because

it would be able to determine and direct market sentiment to its benefit.

Chapter 4

CONCLUSION AND SUGGESTION

4.1 CONCLUSION

In cloud computing, the word cloud (also phrased as "the cloud") is used as a

metaphor for "the Internet," so the phrase cloud computing means "a type of Internet-

based computing," where different services such as servers, storage and applications are

delivered to an organization's computers and devices through the Internet. There are

several cloud computing service models, such as SaaS, PaaS, and IaaS. Indonesia also has

providers that offer cloud services, such as Biznet, Lintas Media Danawa, and Telkomsigma.

Cloud computing is still considered new, and not many companies use it

because they are still concerned about internet speed, vendor dependency,

bandwidth, and security and privacy matters. Big data has three characteristics,

namely volume, velocity, and variety, that must be considered in managing it.

4.2 SUGGESTION

In using the cloud, companies are still concerned about internet speed, vendor dependency,

bandwidth, and security and privacy matters. What we recommend is that

companies buy their own private cloud network to prevent security and privacy issues.

The bandwidth should be matched to the estimated bandwidth that will be used. The

bigger the company, the more bandwidth, storage, and privacy protection it needs.

As for big data: organizations today are facing more and

more Big Data challenges. They have access to a wealth of information, but they don’t

know how to get value out of it because it is sitting in its most raw form or in a

semi-structured or unstructured format; as a result, they don’t even know whether

it’s worth keeping (or whether they are even able to keep it, for that matter). By using big data,

a company can, for example, perform IT log analytics. We think that adopting big data would be a good

investment for the company.

REFERENCES

Books:

Hurwitz, J. S. (2013). Big Data For Dummies®. New Jersey: John Wiley & Sons, Inc.

O'Brien, J. A. (2004). Management Information Systems: Managing Information

Technology in the Business Enterprise (6th ed.). New York: McGraw-Hill.

Satzinger, J. W., Jackson, R. B., & Burd, S. D. (2005). Object-Oriented Analysis and Design

with the Unified Process. Boston: Course Technology, Cengage Learning.

Zikopoulos, P. C. (2012). Understanding Big Data: Analytics for Enterprise Class Hadoop

and Streaming Data. United States: McGraw-Hill.

Websites:

Investopedia. (n.d.). Bank Definition | Investopedia. Retrieved from Investopedia:

http://www.investopedia.com/terms/b/bank.asp

Merriam Webster. (n.d.). Retrieved from http://www.merriam-webster.com/dictionary/provider

Webopedia. (n.d.). Retrieved from

http://www.webopedia.com/TERM/C/cloud_computing.html

Wikipedia. (n.d.). Retrieved from http://en.wikipedia.org/wiki/Internet

Wikipedia. (2014, April 18). Business - Wikipedia, the free encyclopedia. Retrieved from

Wikipedia: http://en.wikipedia.org/wiki/Business

Wikipedia. (2014, April 3). Company - Wikipedia, the free encyclopedia. Retrieved from

Wikipedia: http://en.wikipedia.org/wiki/Company

Wikipedia. (2014, May 21). Information and communication technology - Wikipedia, the free

encyclopedia. Retrieved from Wikipedia, the free encyclopedia:

en.wikipedia.org/wiki/Information_and_communications_technology

Wikipedia. (2014, April 19). System - Wikipedia, the free encyclopedia. Retrieved from

Wikipedia: http://en.wikipedia.org/wiki/System