big data
![Page 1: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/1.jpg)
Big Data
![Page 2: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/2.jpg)
What is Big Data?
“Big Data” is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.
Most analysts and practitioners currently refer to data sets from 30-50 terabytes (1,000 gigabytes per terabyte) to multiple petabytes (1,000 terabytes per petabyte) as big data.
![Page 3: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/3.jpg)
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
Who’s Generating Big Data?
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)
![Page 4: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/4.jpg)
Big Data: 3V’s
Volume: The massive scale and growth of unstructured data outstrips traditional storage and analytical solutions.
Velocity: Data is generated in real time, with demands for usable information to be served up immediately.
Variety: Data is generated in many forms: relational data, text data, semi-structured data, graph data, etc.
![Page 5: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/5.jpg)
The Statistics
There were 5 billion mobile phones in use in 2010.
There is a 40% projected growth in global data generated per year vs. 5% growth in global IT spending.
There were 235 terabytes of data collected by the US Library of Congress in April 2011.
15 out of 17 major business sectors in the United States have more data stored per company than the US Library of Congress.
![Page 6: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/6.jpg)
The Problem
The complex nature of big data is primarily driven by the unstructured nature of much of the data generated by modern technologies, such as web logs, radio-frequency identification (RFID), sensors embedded in devices, machinery, vehicles, Internet searches, social networks such as Facebook, portable computers, smartphones and other cell phones, GPS devices, and call center records.
In most cases, in order to effectively utilize big data, it must be combined with structured data (typically from a relational database) from a more conventional business application, such as Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).
![Page 7: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/7.jpg)
Global market for big data
Industry Size:
Today every organisation across the globe is faced with an unprecedented growth in data.
The digital universe of data was expected to expand to 2.7 zettabytes (ZB) by the end of 2012. It is then predicted to double every two years, reaching 8 ZB by 2015. It's hard to conceptualize this quantity of information.
The US Library of Congress holds 462 terabytes (TB) of digital data. On that basis, 8 ZB is equivalent to almost 18 million Libraries of Congress.
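The figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes decimal units (1 ZB = 10^21 bytes, 1 TB = 10^12 bytes), which matches the "1000 terabytes per petabyte" convention used earlier in the deck.

```python
# Decimal (SI) units are an assumption of this sketch.
TB = 10**12  # bytes in one terabyte
ZB = 10**21  # bytes in one zettabyte


def projected_size_zb(start_zb, start_year, year, doubling_years=2.0):
    """Size in ZB if the volume doubles every `doubling_years` years."""
    return start_zb * 2 ** ((year - start_year) / doubling_years)


def libraries_of_congress(size_bytes, library_bytes=462 * TB):
    """How many 462 TB collections fit into `size_bytes`."""
    return size_bytes / library_bytes


if __name__ == "__main__":
    size_2015 = projected_size_zb(2.7, 2012, 2015)
    print(f"2015 projection: {size_2015:.1f} ZB")
    millions = libraries_of_congress(8 * ZB) / 1e6
    print(f"8 ZB = {millions:.1f} million Libraries of Congress")
```

Under these assumptions, doubling every two years from 2.7 ZB gives about 7.6 ZB for 2015, close to the quoted 8 ZB, and 8 ZB works out to roughly 17.3 million collections of 462 TB, in the same range as the slide's "almost 18 million".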
That translates to a ten-fold increase over the last five years and an astounding 29-fold increase over the next ten years.
This year, the world’s digital information is expected to grow by 57%. Within that, internet traffic is growing by 35%, and mobile data traffic at 110%, according to Cisco. The big data industry is worth somewhere between $30bn and $200bn.
![Page 8: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/8.jpg)
Growth drivers
Smartphones, tablets, sensors, social networks, online games, video streams and mobile payments will all drive big data for many years to come.
![Page 9: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/9.jpg)
Big Players of Big Data
Internet companies:
Amazon, Apple, Facebook, Google, Microsoft
The big Internet companies control where the data comes from and where it goes.
Amazon, Baidu, Facebook and Google may one day make a lucrative side business from selling their proprietary distributed database technologies, competing with IBM and Oracle.
![Page 10: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/10.jpg)
Data storage, networking and hardware companies:
ARM, Brocade, Cisco, Dell, EMC, HP, Intel, Lenovo, NetApp, Seagate
Many hardware makers like Cisco, Dell, Lenovo and HP are investing heavily in big data appliances
Data storage companies are likely to continue to beat earnings expectations as the data deluge goes into overdrive
![Page 11: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/11.jpg)
Enterprise software companies:
Adobe, Citrix Systems, IBM, Fujitsu, Informatica, Oracle, Red Hat, SAP, Salesforce.com
Hadoop is fast becoming the industry-standard enterprise database platform.
Cloud database services are likely to be the fastest growth sector this year within the enterprise software space
![Page 12: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/12.jpg)
A wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize big data.
BIG DATA TECHNIQUES:
A/B testing: A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate.
This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site.
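As a rough illustration of how an A/B test result might be evaluated, the sketch below compares the conversion rate of a control group with one test group using a two-proportion z-test. The z-test is one common choice rather than something the slides prescribe, and the function name and visitor counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for rate B versus rate A."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the hypothesis that A and B do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control page vs. a page with new copy text.
    z, p = ab_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone. With several test groups, each would be compared against the control, ideally with a correction for multiple comparisons.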
Association rule learning: A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules.
One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). Used for data mining.
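Market basket analysis can be sketched on a toy scale as follows. The transactions are invented, and the code simply counts item pairs by brute force; production algorithms such as Apriori prune the search space far more carefully. A rule "X -> Y" is kept when the pair appears in enough baskets (support) and Y appears in enough of the baskets that contain X (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, echoing the diapers-and-beer example.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Yield (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                yield x, y, support, confidence


if __name__ == "__main__":
    for x, y, support, confidence in pair_rules(baskets):
        print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```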
Big Data techniques & technologies
![Page 13: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/13.jpg)
Cluster analysis: A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance.
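One widely used clustering method is k-means. The slides do not name a specific algorithm, so the sketch below is just one possible illustration: a naive k-means that starts from the first k points (real implementations choose starting points more carefully), run on invented two-dimensional data.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Naive k-means; returns (centroids, labels). Starts from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers described by two numeric features.
    data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
    centroids, labels = k_means(data, k=2)
    print(labels)
    print(centroids)
```

The groups are not labelled in advance: the algorithm only receives the points and the number of clusters, and discovers which points are similar.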
Crowdsourcing: A technique for collecting data submitted by a large group of people or community through an open call, usually through networked media such as the Web.
Statistics: The science of the collection, organization, and interpretation of data, including the design of surveys and experiments
BIG DATA TECHNOLOGIES:
There is a growing number of technologies used to aggregate, manipulate, manage, and analyze big data.
![Page 14: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/14.jpg)
Bigtable: Proprietary distributed database system built on the Google File System. Tables are split into multiple tablets; as a tablet grows beyond a size limit it is split further, and tablet data can be compressed with algorithms such as Snappy.
Business intelligence (BI): A type of application software designed to report, analyze, and present data. BI tools are often used to read data that have been previously stored in a data warehouse or data mart. BI tools can also be used to create standard reports that are generated on a periodic basis, or to display information on real-time management dashboards, i.e., integrated displays of metrics that measure the performance of a system.
![Page 15: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/15.jpg)
Data warehouse: Specialized database optimized for reporting, often used for storing large amounts of structured data. Data is uploaded using ETL (extract, transform, and load) tools from operational data stores, and reports are often generated using business intelligence tools.
Extract, transform, and load (ETL): Software tools used to extract data from outside sources, transform them to fit operational needs, and load them into a database or data warehouse.
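The three ETL stages can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the CSV export, the `orders` table, and the cleaning rules. An in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (the "outside source").
RAW_CSV = """order_id,customer,amount,currency
1, alice ,19.99,usd
2,BOB,5.00,usd
3,carol,not_a_number,usd
"""


def extract(text):
    """Extract: parse CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, convert amounts to cents, drop bad rows."""
    clean = []
    for row in rows:
        try:
            cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), cents))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT customer, cents FROM orders ORDER BY order_id").fetchall())
```

Keeping the three stages as separate functions mirrors how ETL tools are usually configured: sources, transformation rules, and targets can each change independently.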
Hadoop: An open-source (free) software framework for processing huge datasets, for certain kinds of problems, on a distributed system. Its development was inspired by Google’s MapReduce and Google File System.
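The MapReduce programming model that inspired Hadoop can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop's actual API; the function names are chosen for this example only. In a real cluster, the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle between them.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of input documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data tools"]))
```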
HBase: An open-source (free), distributed, non-relational database modeled on Google’s Bigtable. It provides a fault-tolerant way of storing large quantities of data.
![Page 16: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/16.jpg)
OPPORTUNITIES AND CHALLENGES
Opportunities:
Data intent and capacity
• The data revolution
• Intent in an age of growing volatility
Social Science and Policy Applications
Challenges:
Data
• Privacy
• Access and Sharing
Analysis
• Defining and Detecting Anomalies in Human Ecosystems
![Page 17: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/17.jpg)
BIG DATA : THE NEED OF THE HOUR
• HP’s Big Data strategy and Vertica
• CSC Buys Infochimps for Big Data, Analytics Expertise
• Market Intelligence Provider FirstRain Unveils New Big Data Tool, Market Insights
![Page 18: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/18.jpg)
Risks Involved
Investment risks:
Whilst big data industry revenues are certain to grow, investors face significant risks.
Bandwidth risk
Today, internet bandwidth prices are capped, effectively making internet bandwidth a free resource for big data companies. But without substantial investment by the world’s mobile operators, big data is likely to grow far faster than the ability of the network to carry it.
As networks get overloaded, network latency rises, reducing the speed and efficiency of analytical engines, especially those powered through the cloud. The coming mobile bandwidth shortage will shift competitive advantage from technology companies to telecom operators.
![Page 19: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/19.jpg)
Open source risk
With the source code free, barriers to entry remain low. In the longer term, this may depress the database industry’s margins.
Patent risk
Ever since Apple took on the mobile phone industry, and won, with barely a handful of mobile patents to its name, a patent war has erupted across the technology sector. Were a patent war to break out in the big data space, technological progress could be slowed down.
Whilst regulators are unlikely to allow any hoarding of patents on anti-competitive grounds, the risk remains. Oracle, a leader in big data, is well known for filing multi-billion dollar patent infringement lawsuits against its competitors.
Cyber risk
Last month Global Payments, a credit card transaction processor, admitted that hackers had stolen the details of 1.5m North American card holders. This is the latest in a string of security breaches that have hit companies dealing in big data. Apple, EMC, Google, Oracle and Sony are all recent hacking victims. As the level of cyber-crime rises, so does the risk of dealing with big data. Just as the Fukushima incident dampened prospects for the nuclear sector, so a large cyber-attack could adversely impact big data industry profits.
![Page 20: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/20.jpg)
Big Data - An alternate view
Big data is often misunderstood and ill-applied.
The question is not “how big is your data?”; it is “what are you doing with your data?”
The big data industry fails to supply its customers with products that solve business problems.
Companies searching for data solutions are often confused by all the big data marketing hype and sometimes end up wasting resources.
![Page 21: Big data](https://reader035.vdocuments.site/reader035/viewer/2022070304/54bca7104a7959c04b8b457e/html5/thumbnails/21.jpg)
THANK YOU!