BIG DATA TERABYTES, PETABYTES, EXABYTES, AND ZETTABYTES – OH MY !! COMPILED BY HOWIE BAUM


Post on 14-Jul-2020


Page 1: BIG DATA Terabytes, Petabytes, Exabytes, and Zettabytes ... · Hadoop MapReduce be enough? Is it better to store data in Cassandra or HBase? Finding the answers can be tricky. And

BIG DATA

TERABYTES, PETABYTES, EXABYTES, AND ZETTABYTES – OH MY !!

COMPILED BY HOWIE BAUM


Big Data is a phrase that gets talked about a lot in the media, in the boardroom, and everywhere in between.

It’s been used, overused and used incorrectly so many times that it’s become difficult to know what it really means.

Is it a tool?

Is it a technology?

Is it just a buzzword used by data scientists to scare us?

Is it really going to change the world? Or ruin it?

First, let’s just say that Big Data is getting bigger every day, and it is getting bigger fast.

SO FAST THAT 90% OF THE WORLD’S DIGITAL DATA WAS CREATED IN THE LAST TWO YEARS !!


WHAT IS BIG DATA ?

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (from Wikipedia)

The first ideas about its value for businesses appeared in 2006.

The challenges include:

Data capture

Storage

Search

Sharing

Transfer

Analysis

Visualization

Some have defined big data as an amount of data that exceeds a petabyte (one million gigabytes)!


WHY HAS BIG DATA BECOME SO POPULAR NOW?

Only in the last few years have companies come to realize that the availability of lower-priced, faster computing power (or “big computing”) is the real change that has opened the door to big opportunity.

Big computing at small prices allows companies to look at, and deal with, data in ways not possible before. It's this computational capacity that has the real potential to transform data from a compliance burden into a business asset.

Organizations have always collected data, but until recently, large-scale cluster computing and analytic algorithms that could perform at scale were cost-prohibitive. That's no longer the case, and many organizations are now experimenting with big data.


BETWEEN 3 AND 8 “V”s ARE USED TO DESCRIBE BIG DATA

What’s important to keep in mind is that Big Data isn’t just about the amount of data we’re generating; it’s also about all the different types of data:

Text

Video

Search logs

Sensor logs

Customer transactions

In most big data circles, big data is described by the four V’s: volume, variety, velocity, and veracity. (You might consider a fifth V, value.)

The following pages show comparisons of 4, 7, and 8 V’s that are part of Big Data.


REVIEWING THE 7 “V”s OF BIG DATA

Volume: With the dramatic growth of the internet, mobile devices, social media, and Internet of Things (IoT) technology, the amount of data generated by all these sources has grown accordingly.

Velocity: In addition to getting bigger, the generation of data and organizations’ ability to process it are accelerating.

Variety: In earlier times, most data types could be neatly captured in rows on a structured table.

In the Big Data world, data often comes in unstructured formats like social media posts, server log data, latitude and longitude geo-coordinates, photos, audio, video and free text.


Variability: The meaning of words in unstructured data can change based on context.

Veracity: With many different data types and data sources, data quality issues invariably pop up in Big Data sets. Veracity deals with exploring a data set for data quality and systematically cleansing that data so that it is useful for analysis.

Visualization: Once data has been analyzed, it needs to be presented in a visualization for end users to understand and act upon.

Value: Data must be combined with rigorous processing and analysis to be useful.
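The Veracity step (exploring a data set for quality problems and cleansing it) can be sketched in a few lines of plain Python. The cleansing rules below (trim whitespace, lower-case text, drop rows missing a required field, remove duplicates) and the field names `email` and `city` are illustrative assumptions, not a standard; real pipelines pick rules that fit their own data.

```python
def clean_records(records: list[dict], required: tuple[str, ...] = ("email",)) -> list[dict]:
    """Normalize text fields, drop incomplete rows, and remove duplicates.

    Assumes every value is hashable (strings, numbers, None) so rows can be de-duplicated.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Normalize: trim whitespace and lower-case every string value.
        row = {k: v.strip().lower() if isinstance(v, str) else v for k, v in record.items()}
        # Drop rows missing any required field (absent, None, or empty string).
        if any(not row.get(field) for field in required):
            continue
        # Skip rows that became exact duplicates once formatting differences were removed.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    # Hypothetical raw input with a formatting duplicate and a missing email.
    raw = [
        {"email": "Ann@Example.com ", "city": "Boston"},
        {"email": "ann@example.com", "city": "boston"},
        {"email": "", "city": "Denver"},
        {"email": "bob@example.com", "city": None},
    ]
    print(clean_records(raw))
```

Normalizing before de-duplicating matters here: the first two rows only look different because of capitalization and a trailing space.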


A GREAT 5-MINUTE VIDEO ABOUT IT (GO TO 4:02 MINUTES)

https://www.youtube.com/watch?v=bAyrObl7TYE


WHAT ARE THE UNITS IN BIG DATA ?


THE AMOUNT OF MEMORY IN COMPUTERS

The smallest unit of memory is a bit, which represents a 0 or a 1.

A memory location, or byte, is made of 8 bits and usually stores one character, such as a letter, number, or symbol.

Therefore, a computer with 8 megabytes of memory can store approximately 8 million characters.

One megabyte can hold approximately 768 pages of text information.

1 Byte = 8 bits = 1 letter, number, or symbol

– Kilobyte (KB) = 1 Thousand Bytes

– Megabyte (MB) = 1 Million Bytes

– Gigabyte (GB) = 1 Billion Bytes

– Terabyte (TB) = 1 Trillion Bytes
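The list above uses decimal (SI) prefixes, in which each unit is 1,000 times the one before it; some software reports sizes in binary units of 1,024 instead. The Python sketch below assumes the decimal definitions used in this presentation and continues the pattern up to exabytes and zettabytes. `human_readable` is a hypothetical helper name, not a library function.

```python
# Decimal (SI) byte units, as used in this presentation: each step is 1,000x.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    # Anything at or above 1,000 EB is reported in zettabytes.
    return f"{value:g} {UNITS[-1]}"


if __name__ == "__main__":
    print(human_readable(8_000_000))  # the 8-megabyte computer memory example
    print(human_readable(2.5e18))     # 2.5 billion gigabytes produced per day
    print(human_readable(10**21))     # one zettabyte
```

The second call shows that "2.5 billion gigabytes" and "2.5 exabytes" are the same quantity.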


MEGABYTE = 1 MILLION BYTES

1 MEGABYTE IS ENOUGH TEXT FOR A 400 PAGE BOOK

1 MEGABYTE = 768 PAGES OF TYPED TEXT

2 MEGABYTES FOR AN AVERAGE PHOTO

5 MEGABYTES = ONE 4-MINUTE SONG

700 MEGABYTES = 1 CD WITH 80 MINUTES OF MUSIC


GIGABYTE = 1,000 MEGABYTES

1 GIGABYTE = 10 YARDS OF BOOKS ON A SHELF

1 GIGABYTE = DATA OF THE MUSIC OF BEETHOVEN’S 5TH SYMPHONY

1 GIGABYTE = STACK OF TYPED PAGES 262 FEET HIGH

4.7 GIGABYTES = 1 DVD WITH MOVIES ON IT

7 GIGABYTES = HOW MUCH DATA YOU USE PER HOUR STREAMING A NETFLIX VIDEO IN ULTRA HIGH DEFINITION (4K)

2.5 BILLION GIGABYTES OF DATA ARE PRODUCED EVERY DAY !!


TERABYTE = 1,000 GIGABYTES

1 TERABYTE OF PRINTED PAGES IS 51 MILES HIGH

1 TERABYTE IS THE DATA ON ALL OF THE X-RAY IMAGES IN A LARGE HOSPITAL, PER YEAR

1 TERABYTE IS 200,000 FIVE-MINUTE SONGS OR 310,000 PICTURES

10 TERABYTES IS THE INFORMATION IN THE PRINTED COLLECTION IN THE LIBRARY OF CONGRESS

24 TERABYTES OF VIDEOS ARE UPLOADED TO YOUTUBE, EVERY DAY


PETABYTE = 1,000 TERABYTES

1 PETABYTE = A STACK OF 500 BILLION PAGES OF TYPED TEXT THAT IS 52,000 MILES HIGH, WHICH IS ¼ THE DISTANCE BETWEEN THE EARTH AND THE MOON

1.5 PETABYTES = 10 BILLION PHOTOS ON FACEBOOK

2 PETABYTES OF PRINTED INFORMATION IS IN ALL THE UNITED STATES ACADEMIC LIBRARIES

20 PETABYTES OF DATA IS PROCESSED BY GOOGLE, EVERY DAY


EXABYTE = 1,000 PETABYTES

1 EXABYTE OF TYPED PAGES IS 52 MILLION MILES HIGH WHICH IS TWICE THE DISTANCE BETWEEN THE EARTH AND THE PLANET VENUS

2 EXABYTES = ALL OF THE WORLD’S INFORMATION IN A YEAR

5 EXABYTES = ALL OF THE WORDS EVER SPOKEN BY HUMANS

1 EXABYTE = 11 MILLION 4K VIDEOS

15 EXABYTES = AN ESTIMATE OF ALL OF THE INFORMATION HELD BY GOOGLE
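Every step in these lists is a factor of 1,000, so the printed-page comparisons should scale by 1,000 as well. The Python sketch below takes only the petabyte figures quoted above (500 billion pages, 52,000 miles) as inputs and derives the other heights; the constant names are illustrative. It gives 52 miles for a terabyte, 52 million miles for an exabyte, and about 275 feet for a gigabyte, close to (but not exactly) the rounded 51 miles and 262 feet quoted on the earlier slides.

```python
# Inputs quoted on the petabyte slide; everything else is derived from them.
PAGES_PER_PB = 500e9      # 500 billion typed pages in one petabyte
MILES_PER_PB = 52_000     # height of that stack, in miles
FEET_PER_MILE = 5_280
BYTES_PER_PB = 10**15     # decimal petabyte

# Each decimal unit is 1,000x the one below it, so the stack height is too.
SCALE_VS_PB = {"GB": 1e-6, "TB": 1e-3, "PB": 1.0, "EB": 1e3}


def stack_height_miles(unit: str) -> float:
    """Height, in miles, of one `unit` of data printed as typed pages."""
    return MILES_PER_PB * SCALE_VS_PB[unit]


def bytes_per_page() -> float:
    """Amount of text per page implied by the petabyte figures."""
    return BYTES_PER_PB / PAGES_PER_PB


if __name__ == "__main__":
    for unit in SCALE_VS_PB:
        miles = stack_height_miles(unit)
        print(f"1 {unit}: {miles:,.3f} miles ({miles * FEET_PER_MILE:,.0f} feet)")
    print(f"Implied text per page: about {bytes_per_page():,.0f} bytes")
```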


Backblaze is a data storage provider. It offers two products:

1) B2 Cloud Storage - An object storage service similar to Amazon's S3.

2) Computer Backup - An online backup tool that allows Windows and macOS users to back up their data to offsite data centers.

The service is designed for businesses and end-users, providing unlimited storage space and supporting unlimited file sizes.


AN EXPLOSION OF DATA – WHERE DOES IT ALL COME FROM?

The arrival of the internet, social media, and the digitization of everything around the world have led to a massive amount of data being generated every second.

Retail sales and inventory databases

Logistics – truck and train movement of goods

Financial services

Healthcare

Extracting meaningful information from still images, video and audio that people see and listen to

Smart objects (sensors) and the Internet of Things.

Social media

Personnel files

Location data and online activities.

Machine generated data

Computer and network logs.


DATA FROM THE INTERNET OF THINGS


USES OF BIG DATA


Benefits of Big Data

Using big data, Netflix saves $1 billion per year on customer retention (TechJury)

$1 trillion – the amount businesses will save from IoT by 2020 (Grazziti)

$40 billion – the projected financial impact of AI by 2025 (Tractica)

8–10% – profit increase for businesses that use big data. (Entrepreneur)

Data wrapped in stories are 22x more memorable than bare facts (Chicago Analytics Group)

70% of businesses believe that data warehouse optimization is critical to their success (Forbes)

Data analytics top 4 benefits:

25% faster innovation cycles

17% improved business efficiencies/higher productivity

13% more effective R&D

12% product/service (Chicago Analytics Group)


USES OF BIG DATA AND WHERE IT IS USED

1) Health Care, 2) Detecting Fraud, 3) Social Media Analysis, 4) Weather, and 5) the Public Sector

1) Contribution of Big Data to Health Care

Its role in health care has grown a lot. With medical advances came the need to store large amounts of data about each patient’s health history.

This data can be used to analyze a patient’s health condition and to prevent health failures in the future.

Google famously showed that they could predict flu outbreaks based upon when and where people were searching for flu-related terms.


GENERAL ELECTRIC HEALTH INFOSCOPE

When you get a sore throat do you also end up getting an ear infection?

Health Infoscope is a compilation of 72 million electronic records and shows the connection of one disease with another.

It also shows the strength of the connection and the likelihood of catching one disease due to the other.
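The slide does not say how Health Infoscope measures these links, so the Python sketch below is only an illustration under assumed definitions. It uses a tiny, made-up set of patient records and computes the conditional probability P(B | A), the share of patients with condition A who also have condition B, plus "lift" (P(B | A) divided by P(B)) as one possible measure of the strength of a connection. The function names and records are hypothetical.

```python
from itertools import permutations

# Hypothetical toy records: each set lists one patient's diagnosed conditions.
RECORDS = [
    {"sore throat", "ear infection"},
    {"sore throat", "ear infection", "fever"},
    {"sore throat", "fever"},
    {"fever"},
    {"fever", "cough"},
    {"sore throat"},
]


def conditional_probability(a: str, b: str, records=RECORDS) -> float:
    """P(b | a): fraction of patients with condition a who also have b."""
    with_a = [r for r in records if a in r]
    if not with_a:
        return 0.0
    return sum(b in r for r in with_a) / len(with_a)


def lift(a: str, b: str, records=RECORDS) -> float:
    """P(b | a) / P(b): values above 1 mean a and b co-occur more often than chance."""
    p_b = sum(b in r for r in records) / len(records)
    return conditional_probability(a, b, records) / p_b if p_b else 0.0


if __name__ == "__main__":
    conditions = sorted(set().union(*RECORDS))
    for a, b in permutations(conditions, 2):
        print(f"P({b} | {a}) = {conditional_probability(a, b):.2f}, lift = {lift(a, b):.2f}")
```

Note that the relationship is not symmetric: in this toy data every ear-infection patient also had a sore throat, but only half of the sore-throat patients had an ear infection.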


2) DETECTING FRAUD

Fraud detection and prevention is one of the many uses of Big Data today.

Credit card companies face a lot of fraud, and big data technologies are used to detect and prevent it.

Previously, credit card companies kept track of all transactions, and if a suspicious transaction was found, they would call the buyer to confirm that the buyer had made it.

Now buying patterns are monitored and fraud-affected areas are analyzed using Big Data analytics.
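One simple way to monitor buying patterns is to flag transactions that sit far outside a cardholder’s usual spending. The Python sketch below applies a z-score rule to purchase amounts only; the 3-standard-deviation threshold, the function name, and the sample amounts are illustrative assumptions, and production fraud systems combine many more signals (merchant, location, timing).

```python
from statistics import mean, stdev


def flag_unusual(history: list[float], new_amounts: list[float], threshold: float = 3.0) -> list[float]:
    """Return the new amounts whose z-score against past spending exceeds `threshold`.

    `history` needs at least two purchases so a sample standard deviation exists.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past purchase was identical: treat any different amount as unusual.
        return [x for x in new_amounts if x != mu]
    return [x for x in new_amounts if abs(x - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical purchase history for one cardholder, in dollars.
    past = [42.0, 18.5, 60.0, 35.25, 27.0, 51.0, 44.0, 30.0]
    incoming = [39.99, 55.0, 1250.0]
    print(flag_unusual(past, incoming))
```

A flagged amount is a prompt for review (for example, the confirmation call described above), not proof of fraud.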


3) SOCIAL MEDIA ANALYSIS

One of the best use cases for big data is the data that keeps flowing on social media networks like Facebook, Twitter, etc.

The data is collected and observed in the form of comments, images, social statuses, etc.

Companies use big data techniques to understand their customers’ requirements and check what they say on social media.

This helps companies analyze the data and come up with strategies that will benefit the company’s growth.


4) WEATHER

Big Data technologies are used to forecast the weather.

Large amounts of climate data from ground sensors and satellites are fed into computers, and an average is taken to predict the weather.

This can be useful to predict natural calamities such as floods, etc.


5) PUBLIC SECTOR

Big Data is used in a lot of government applications as well as in public sectors.

It provides helpful information to a lot of facilities such as electric and natural gas power generation, water utilities, investigation, economic promotion, etc.

It is used in many other areas, such as the Education sector, Insurance services, Transportation, Security Intelligence, etc.

Big data has become an important part of analysis and is needed to understand the growth of businesses and build strategies to help them grow further.


WHY DO WE WANT TO COLLECT BIG DATA?

WHAT ARE THE BENEFITS?


WHO COLLECTS BIG DATA ?


THE ‘SCARY’ SEVEN: BIG DATA CHALLENGES AND WAYS TO SOLVE THEM

1) INSUFFICIENT UNDERSTANDING AND ACCEPTANCE OF BIG DATA

A. Some companies fail to grasp even the basics: what big data actually is, what its benefits are, what infrastructure is needed, etc. If they don’t set it up properly, the project is doomed to failure, and they may waste a lot of time and resources.

B. Big Data can be a huge change for a company, so it needs to be accepted by top management first and then down the ladder, but it shouldn’t be overdone, or it will have an adverse effect on those who have to implement it.

C. To ensure big data understanding and acceptance at all levels, IT departments need to organize numerous trainings and workshops.


2) CONFUSING VARIETY OF BIG DATA TECHNOLOGIES

It can be easy to get lost in the variety of big data technologies now available on the market.

Do you need Spark or would the speeds of Hadoop MapReduce be enough?

Is it better to store data in Cassandra or HBase?

Finding the answers can be tricky. And it’s even easier to choose poorly if you are exploring the ocean of technological opportunities without a clear view of what you need.

Solution: Draw on the skills in your IT department, or seek professional help by hiring an expert.


Big Data requires a set of tools and techniques for analysis to gain insights from it.

Hadoop helps in storing and processing large data sets

Spark helps with in-memory computation

Storm helps with faster processing of unbounded (streaming) data

Apache Cassandra provides a highly available and scalable database

MongoDB provides cross-platform capabilities
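To make the Hadoop entry above more concrete: Hadoop’s MapReduce model splits a job into a map step, a shuffle that groups intermediate results by key, and a reduce step. The single-process Python sketch below imitates those three phases for the classic word-count example. It does not use Hadoop’s API and runs on one machine, whereas Hadoop distributes the same phases across a cluster; the function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word.strip(".,!?"), 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine each key's values, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    # On a real cluster, each document (input split) would be mapped in parallel.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    print(word_count(["Big data is big.", "Data about data!"]))
```

Because each map call looks at only one document and each reduce call looks at only one key, both phases can be spread over many machines, which is what lets the model scale.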


Big Data analysis is now commonly used by many companies to predict market trends, personalize customer experiences, speed up company workflows, etc.

Big Data can be processed using different tools such as MapReduce, Spark, Hadoop, Pig, Hive, Cassandra and Kafka. Each of these tools has its own advantages and disadvantages, which determine how companies might decide to use them.


THREE COMMONLY USED WAYS OF FORMATTING DATA

Unstructured = unorganized data (e.g., videos).

Semi-structured = the data is organized, but not in a fixed format (e.g., JSON).

Structured = the data is stored in a fixed, structured format (e.g., an RDBMS).
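The three formats can be contrasted in a few lines of Python using only the standard library. In this sketch, all sample data and function names are made up for illustration: CSV text stands in for a structured database table, a JSON string is the semi-structured record, and a free-text comment is the unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like an RDBMS table).
STRUCTURED = "customer_id,amount\n1,19.99\n2,5.00\n"

# Semi-structured: self-describing keys, but fields can vary between records.
SEMI_STRUCTURED = '{"customer_id": 3, "amount": 12.5, "tags": ["promo", "mobile"]}'

# Unstructured: free text with no schema at all.
UNSTRUCTURED = "Loved the new sneakers, but the cap was sold out again!"


def total_structured(text: str) -> float:
    """Every row has an 'amount' column, so summing it is straightforward."""
    return sum(float(row["amount"]) for row in csv.DictReader(io.StringIO(text)))


def tags_semi_structured(text: str) -> list[str]:
    """Optional fields must be looked up defensively, since not every record has them."""
    return json.loads(text).get("tags", [])


def mentions_unstructured(text: str, keywords: set[str]) -> set[str]:
    """Free text has to be tokenized before even simple questions can be asked."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words & keywords
```

The work needed grows as structure shrinks: the structured case is a one-line sum, the semi-structured case must handle missing fields, and the unstructured case needs its own tokenizing before it can answer anything.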


3) PAYING LOADS OF MONEY

Big Data adoption projects entail lots of expenses.

If you opt for an on-premises solution, you’ll have the costs of new hardware, new hires, electricity and the need to pay for the development, setup, configuration and maintenance of new software.

If you decide on a cloud-based big data solution, you’ll still need to hire staff and pay for cloud services, big data solution development as well as setup and maintenance of needed frameworks.


4) COMPLEXITY OF MANAGING DATA QUALITY

Sooner or later, you’ll run into the problem of data integration, since the data you need to analyze comes from diverse sources in a variety of different formats.

For instance, ecommerce companies need to analyze data from website logs, call-centers, competitors’ website ‘scans’ and social media.

Unreliable data

There is a whole bunch of techniques dedicated to cleansing data. But first things first: your big data needs a proper model. Only after you have created one can you go ahead and analyze it.

But keep in mind that big data is never 100% accurate. You have to know it and deal with it.
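The integration problem described above (the same kind of fact arriving in different shapes from different sources) is often handled by mapping every source onto one common schema before analysis. A minimal Python sketch, assuming two hypothetical sources, a website log line and a call-center export, whose formats and field names are invented for illustration:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interaction:
    """Common schema that every source is mapped onto before analysis."""
    customer_id: str
    channel: str
    timestamp: datetime


def from_web_log(line: str) -> Interaction:
    # Hypothetical log format: "2020-07-14T09:30:00 cust=42 GET /product/7"
    stamp, customer, *_ = line.split()
    return Interaction(customer.removeprefix("cust="), "web", datetime.fromisoformat(stamp))


def from_call_center(record: dict) -> Interaction:
    # Hypothetical call-center export with its own field names and date format.
    when = datetime.strptime(record["CallDate"], "%m/%d/%Y %H:%M")
    return Interaction(str(record["CustomerNo"]), "phone", when)


if __name__ == "__main__":
    events = [
        from_web_log("2020-07-14T09:30:00 cust=42 GET /product/7"),
        from_call_center({"CustomerNo": 42, "CallDate": "07/14/2020 10:05"}),
    ]
    # Once unified, records from both sources can be sorted and analyzed together.
    for event in sorted(events, key=lambda e: e.timestamp):
        print(event)
```

Converting the customer number to a string in both adapters is what lets the two records be recognized as the same customer.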


5) DANGEROUS BIG DATA SECURITY HOLES

Quite often, big data adoption projects put security off till later stages, but this is not a smart move.

IT staff hope that security will be handled at the application level, but what can happen instead is that big data security gets cast aside.

Solution: The best precaution against possible big data security challenges is putting security first.

It is particularly important at the stage of designing your solution’s architecture.

If you don’t deal with big data security from the very start, it’ll bite you when you least expect it.


6) TRICKY PROCESS OF CONVERTING BIG DATA INTO VALUABLE INSIGHTS

On Instagram, a certain soccer player posts his new look, and the two characteristic things he’s wearing are white Nike sneakers and a beige cap.

He looks good in them, and people who see that want to look this way too. Thus, they rush to buy a similar pair of sneakers and a similar cap.

But in your store, you have only the sneakers. As a result, you lose revenue and maybe some loyal customers.

Solution: The reason you failed to have the needed items in stock is that your big data tool doesn’t analyze data from social networks or competitors’ web stores.

Your rival’s big data tool, on the other hand, tracks trends in social media in near-real time, among other things. And their shop has both items and even offers a 15% discount if you buy both.


7) TROUBLES OF UPSCALING

The most typical feature of big data is its dramatic ability to grow. And one of the most serious challenges of big data is associated exactly with this.

Your solution’s design may be thought through and adjusted to upscaling with no extra effort. But the real problem isn’t the actual process of introducing new processing and storage capacity. It lies in the complexity of scaling up so that your system’s performance doesn’t decline and you stay within budget.

Solution: The first and foremost precaution against challenges like this is a decent architecture for your big data solution.

As long as your big data solution has one, fewer problems are likely to occur later.

Another highly important thing to do is to design your big data algorithms with future upscaling in mind.


INTERESTING EXAMPLES OF USING BIG DATA


Mt. Sinai Hospital in NYC created a computer-based project they call Deep Patient.

They fed in the medical records of 700,000 people with 500 data points per patient and let the machine iterate on the data.

The machine was given no information about how the human body works or how diseases affect us.

It found correlations that let it predict the onset of some diseases more accurately than ever before, and predict others, such as schizophrenia, for the first time.

It does this by creating a vast network of weighted connections that is just too complex for us to understand.


MOST POPULAR WEBSITES 1996 - 2019

https://www.youtube.com/watch?v=2Uj1A9AguFs


https://www.youtube.com/watch?v=a3w8I8boc_I


MOST POPULAR MUSIC STYLES 1910 - 2019

https://www.youtube.com/watch?v=eP88FUL7d_8


WINDYTY’S GLOBAL WEATHER VISUALIZATION

Extremely simple and elegant, Windyty animates wind, temperature, clouds/rain, waves, snow, and air pressure patterns across the globe, drawing on data from the Global Forecast System’s weather model.

Users can drag and zoom to their location and play an animated projection of the forecasted weather for two weeks.

A snippet covering a two-day period is shown.


HANS ROSLING’S WEALTH AND HEALTH OF NATIONS


THE END