Big Data

Upload: rishi-kashyap

Post on 19-Jan-2015

Category: Technology



TRANSCRIPT

Page 1: Big data

Big Data

Page 2: Big data

What is Big Data?

“Big Data” is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.

Most analysts and practitioners currently refer to data sets ranging from 30-50 terabytes (1,000 gigabytes per terabyte) to multiple petabytes (1,000 terabytes per petabyte) as big data.

Page 3: Big data

Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.

Who’s Generating Big Data?

Social media and networks (all of us are generating data)

Scientific instruments (collecting all sorts of data)

Mobile devices (tracking all objects all the time)

Sensor technology and networks (measuring all kinds of data)

Page 4: Big data

Big Data: 3V’s

Volume: The massive scale and growth of unstructured data outstrip traditional storage and analytical solutions.

Velocity: Data is generated in real time, with demands for usable information to be served up immediately.

Variety: Data is generated in many forms: relational data, text data, semi-structured data, graph data, etc.

Page 5: Big data

The Statistics

There were 5 billion mobile phones in use in 2010.

Global data generated is projected to grow by 40% per year, versus 5% growth in global IT spending.

The US Library of Congress had collected 235 terabytes of data by April 2011.

15 out of 17 major business sectors in the United States have more data stored per company than the US Library of Congress.

Page 6: Big data

The Problem

The complex nature of big data is primarily driven by the unstructured nature of much of the data generated by modern technologies, such as web logs, radio-frequency identification (RFID), sensors embedded in devices, machinery, vehicles, Internet searches, social networks such as Facebook, portable computers, smartphones and other cell phones, GPS devices, and call center records.

In most cases, big data can only be used effectively when it is combined with structured data (typically from a relational database) from a more conventional business application, such as Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).
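As a minimal sketch of such a combination, the snippet below parses free-form web log lines and joins them with a small customer table in an in-memory SQLite database. The log format, the `customers` and `page_views` tables, and all names and values are hypothetical and chosen only for illustration; SQLite stands in for whatever relational system a CRM would really use.

```python
import re
import sqlite3

# Hypothetical raw web log lines (semi-structured text).
LOG_LINES = [
    "2015-01-10 12:00:01 user=101 GET /products/42",
    "2015-01-10 12:00:05 user=102 GET /cart",
    "2015-01-10 12:01:17 user=101 GET /checkout",
    "corrupted line without a user id",
]

# Assumed log layout: "user=<id> GET <path>" somewhere in the line.
LOG_PATTERN = re.compile(r"user=(\d+) GET (\S+)")


def parse_log(lines):
    """Extract (customer_id, path) pairs, skipping lines that do not match."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            rows.append((int(match.group(1)), match.group(2)))
    return rows


def page_views_per_customer(lines):
    """Join parsed log events with a hypothetical CRM table and count views."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(101, "Alice"), (102, "Bob")],
    )
    conn.execute("CREATE TABLE page_views (customer_id INTEGER, path TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", parse_log(lines))
    result = conn.execute(
        "SELECT c.name, COUNT(*) FROM page_views v "
        "JOIN customers c ON c.id = v.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return result


print(page_views_per_customer(LOG_LINES))
```

The unparseable line is simply dropped here; a real pipeline would more likely log or quarantine such records.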

Page 7: Big data

Global market for big data

Industry size:

Today every organisation across the globe faces unprecedented growth in data.

The digital universe of data was expected to expand to 2.7 zettabytes (ZB) by the end of 2012. It is then predicted to double every two years, reaching 8 ZB by 2015. It’s hard to conceptualize this quantity of information.

The US Library of Congress holds 462 terabytes (TB) of digital data. By that measure, 8 ZB is equivalent to almost 18 million Libraries of Congress.

That translates to a ten-fold increase over the last five years and an astounding 29-fold increase over the next ten years.

This year, the world’s digital information is expected to grow by 57%. Within that, internet traffic is growing by 35% and mobile data traffic by 110%, according to Cisco. The big data industry is worth somewhere between $30bn and $200bn.
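The size figures above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and reads "by 2015" as three years after the end of 2012. The function names are illustrative.

```python
TB = 10**12  # bytes per terabyte (decimal units, an assumption)
ZB = 10**21  # bytes per zettabyte (decimal units, an assumption)


def projected_size(start_zb, years, doubling_period_years=2.0):
    """Size after `years` if the volume doubles every `doubling_period_years`."""
    return start_zb * 2 ** (years / doubling_period_years)


def libraries_of_congress(total_bytes, library_bytes=462 * TB):
    """How many 462 TB collections fit into `total_bytes`."""
    return total_bytes / library_bytes


# 2.7 ZB at the end of 2012, doubling every two years, three years later.
print(round(projected_size(2.7, years=3), 2))

# 8 ZB expressed in units of one 462 TB library, in millions.
print(round(libraries_of_congress(8 * ZB) / 1e6, 1))
```

Under these assumptions the projection comes to roughly 7.6 ZB, consistent with the rounded 8 ZB, and the ratio comes to roughly 17.3 million libraries, in the same range as the "almost 18 million" quoted above. The gap may come from rounding or from a different unit convention.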

Page 8: Big data

Growth drivers

Smartphones, tablets, sensors, social networks, online games, video streams and mobile payments will all drive big data for many years to come.

Page 9: Big data

Big Players of Big Data

Internet companies:

Amazon, Apple, Facebook, Google, Microsoft

The big Internet companies control where the data comes from and where it goes to.

Amazon, Baidu, Facebook and Google may one day make a lucrative side business from selling their proprietary distributed database technologies, competing with IBM and Oracle.

Page 10: Big data

Data storage, networking and hardware companies:

ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo, NetApp, Seagate

Many hardware makers like Cisco, Dell, Lenovo and HP are investing heavily in big data appliances

Data storage companies are likely to continue to beat earnings expectations as the data deluge goes into overdrive

Page 11: Big data

Enterprise software companies:

Adobe, Citrix Systems, IBM, Fujitsu, Informatica, Oracle, Red Hat, SAP, Salesforce.com

Hadoop is fast becoming the industry-standard enterprise data platform.

Cloud database services are likely to be the fastest growth sector this year within the enterprise software space

Page 12: Big data

A wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize big data.

BIG DATA TECHNIQUES:

A/B testing: A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate.

This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site.
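As an illustration, one common way to evaluate a simple A/B test with a single control and a single variant is a two-proportion z-test on the conversion rates. The sketch below assumes that setup; the visitor and conversion counts are hypothetical, and other statistical tests are equally valid choices.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control converts 200 of 5000 visitors, variant 260 of 5000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone. With several test groups, as in the definition above, a correction for multiple comparisons would also be needed.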

Association rule learning: A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules.

One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). Association rule learning is used for data mining.
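To make the idea of testing a candidate rule concrete, the sketch below computes three standard measures (support, confidence, and lift) for the rule "diapers -> beer". The baskets are hypothetical and tiny; real algorithms such as Apriori also handle generating the candidate rules, which this sketch does not.

```python
# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), baskets)
    confidence = both / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return both, confidence, lift


s, c, l = rule_metrics({"diapers"}, {"beer"}, BASKETS)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 indicates that the two items appear together more often than would be expected if they were bought independently.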

Big Data techniques & technologies

Page 13: Big data

Cluster analysis: A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance.
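One widely used clustering method is k-means, sketched below in plain Python. It is only one of many cluster analysis techniques. The sample points are hypothetical, the number of clusters `k` has to be chosen by the caller, and the starting centroids are simply the first `k` points (an assumption made to keep the example deterministic).

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means: returns (centroids, labels) for a list of 2-D points."""
    # Assumption: use the first k points as starting centroids (deterministic).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical data: two loose groups of points.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(POINTS, k=2)
print(labels)
```

The groups are discovered from the distances between points alone; no labels describing what makes objects similar are supplied in advance.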

Crowdsourcing: A technique for collecting data submitted by a large group of people or community through an open call, usually through networked media such as the Web.

Statistics: The science of the collection, organization, and interpretation of data, including the design of surveys and experiments.

BIG DATA TECHNOLOGIES:

There is a growing number of technologies used to aggregate, manipulate, manage, and analyze big data.

Page 14: Big data

Bigtable: A proprietary distributed database system built on the Google File System. Tables are split into multiple tablets. When the data grows beyond set size limits, tablets are compressed using algorithms such as Snappy.

Business intelligence (BI): A type of application software designed to report, analyze, and present data. BI tools are often used to read data that have been previously stored in a data warehouse or data mart. BI tools can also be used to create standard reports that are generated on a periodic basis, or to display information on real-time management dashboards, i.e., integrated displays of metrics that measure the performance of a system.

Page 15: Big data

Data warehouse: Specialized database optimized for reporting, often used for storing large amounts of structured data. Data is uploaded using ETL (extract, transform, and load) tools from operational data stores, and reports are often generated using business intelligence tools.

Extract, transform, and load (ETL): Software tools used to extract data from outside sources, transform them to fit operational needs, and load them into a database or data warehouse.
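The three ETL stages can be sketched with the Python standard library alone. In this minimal example the CSV export, the column names, and the `orders` table are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system (the "outside source").
RAW_CSV = """order_id,customer,amount_usd
1, alice ,19.99
2,BOB,5.00
3,alice,not-a-number
"""


def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, convert amounts to cents, drop bad rows."""
    clean = []
    for row in rows:
        try:
            cents = round(float(row["amount_usd"]) * 100)
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), cents))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping the three stages as separate functions mirrors how dedicated ETL tools let each stage be configured and monitored on its own.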

Hadoop: An open source (free) software framework for processing huge datasets, for certain kinds of problems, on a distributed system. Its development was inspired by Google’s MapReduce and Google File System.
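The MapReduce programming model that inspired Hadoop can be illustrated with the classic word count. The sketch below is a single-process simulation in plain Python, not Hadoop's API: the function names and input strings are illustrative, and on a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over all input splits."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input splits; on a real cluster each would live on a different node.
print(word_count(["big data big value", "data at scale"]))
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread across machines without shared state, which is what makes the model suit very large datasets.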

HBase: An open source (free), distributed, non-relational database modeled on Google’s Bigtable. It provides a fault-tolerant way of storing large quantities of data.
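The non-relational data model used by Bigtable-style stores can be pictured as a sorted map from row key to columns to timestamped cell versions. The toy class below is a hypothetical in-memory illustration of that shape only: it is not the HBase client API, and it provides none of the distribution or fault tolerance of a real deployment.

```python
from collections import defaultdict


class ToyWideColumnTable:
    """In-memory toy: row key -> 'family:qualifier' -> timestamped versions."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Store a new version of a cell; older versions are kept."""
        self._rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._rows.get(row_key, {}).get(column, [])
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key


# Hypothetical rows and columns, for illustration only.
table = ToyWideColumnTable()
table.put("user#101", "info:name", "Alice", timestamp=1)
table.put("user#101", "info:name", "Alice K.", timestamp=2)
table.put("user#102", "info:name", "Bob", timestamp=1)
print(table.get("user#101", "info:name"))
print(list(table.scan("user#100", "user#102")))
```

Rows need not share the same columns, and range scans over sorted row keys take the place of the joins and secondary indexes of a relational database.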

Page 16: Big data

OPPORTUNITIES AND CHALLENGES

Opportunities:

Data intent and capacity
• The data revolution
• Intent in an age of growing volatility

Social Science and Policy Applications

Challenges:

Data
• Privacy
• Access and Sharing

Analysis
• Defining and Detecting Anomalies in Human Ecosystems

Page 17: Big data

BIG DATA: THE NEED OF THE HOUR

• HP’s Big Data strategy and Vertica
• CSC Buys Infochimps for Big Data, Analytics Expertise
• Market Intelligence Provider FirstRain Unveils New Big Data Tool, Market Insights

Page 18: Big data

Risks Involved

Investment risks: Whilst big data industry revenues are certain to grow, investors face significant risks.

Bandwidth risk: Today, internet bandwidth prices are capped, effectively making internet bandwidth a free resource for big data companies. But without substantial investment by the world’s mobile operators, big data is likely to grow far faster than the ability of the network to carry it.

As networks get overloaded, network latency rises, reducing the speed and efficiency of analytical engines, especially those powered through the cloud. The coming mobile bandwidth shortage will shift competitive advantage from technology companies to telecom operators.

Page 19: Big data

Open source risk: With the source code free, barriers to entry remain low. In the longer term, this may depress the database industry’s margins.

Patent risk: Ever since Apple took on the mobile phone industry, and won, with barely a handful of mobile patents to its name, a patent war has erupted across the technology sector. Were a patent war to break out in the big data space, technological progress could be slowed down.

Whilst regulators are unlikely to allow any hoarding of patents on anti-competitive grounds, the risk remains. Oracle, a leader in big data, is well known for filing multi-billion dollar patent infringement lawsuits against its competitors.

Cyber risk: Last month Global Payments, a credit card transaction processor, admitted that hackers had stolen the details of 1.5m North American card holders. This is the latest in a string of security breaches that have hit companies dealing in big data. Apple, EMC, Google, Oracle and Sony are all recent hacking victims. As the level of cyber-crime rises, so does the risk of dealing with big data. Just as the Fukushima incident dampened prospects for the nuclear sector, so a large cyber-attack could adversely impact big data industry profits.

Page 20: Big data

Big Data - An alternate view

Big data is often misunderstood and ill-applied.

The question is not “how big is your data?”, it is “what are you doing with your data?”

The big data industry often fails to supply its customers with products that solve business problems.

Companies searching for data solutions are often confused by all the big data marketing hype and sometimes end up wasting resources.

Page 21: Big data

THANK YOU!