THE ERA OF BIG DATA: From IoT to NewSQL
Daniela Barreiro Claro
Outline
The era of Big Data
RDBMS
NOSQL
NewSQL
Big Data Analytics
Where is our course?
Introduction
Are you ready for the BigData era?
Big Data
RDBMS
NOSQL
NewSQL
Data Analytics
Our course
Big Data = cloud+social+mobile
What is BIG DATA?
Big data is data that exceeds the processing
capacity of conventional database systems.
The data is too big, moves too fast, or doesn’t fit
the structures of a database architecture.
The term became a buzzword around 2012.
Internet of Things
Physical Objects + Controllers, Sensors, and Actuators + Internet = Internet of Things
1. Adrian McEwen & Hakim Cassimally. Designing the Internet of Things.
Integrate things into the existing web
HTML and REST
Smart things
RDBMSs are 25-year-old legacy code lines
that should be retired in favor of a collection
of from-scratch specialized engines
(Stonebraker et al.)
Are we really prepared for the death of the
relational era?
RDBMS
One-size-fits-all
If you wanted to build:
an e-commerce shop
a banking core
a car rental website
Database skills:
You needed to know a single RDBMS in depth
Drawbacks:
Experts in only one database technology
Vertical scalability
Horizontal scalability is hard and costly
Models do not fit all cases
Structured
Does not deal well with unstructured data
Strengths:
Experts in only one database technology
Standard
SQL
Security (ACID)
Triggers
Joins
Composite keys
Structured
ACID properties are essential for most operational systems and
online transaction processing systems, including retail, banking,
and finance.
ACID compliance may not be important to
a search engine that may return different results to two users
simultaneously, or
to Amazon when returning sets of different reviews to two users.
In these applications, speed and performance trump the
consistency of the results.
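
To make the banking case concrete, here is a minimal sketch of an atomic transfer using Python's built-in sqlite3 module. The accounts table, the account names, and the helper functions are hypothetical and are not part of the slides; the point is only that both updates commit together or not at all.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return all balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

A failed transfer leaves both balances untouched, which is the guarantee an operational system relies on and a search engine can usually do without.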
NOSQL
First "No SQL", then "Not Only SQL"
Unstructured data
Eventual consistency
CAP theorem (Consistency, Availability, Partition
tolerance)
Main memory
Data stored in graph, key-value, or column formats
Strengths:
High performance
Horizontal scalability
Diversity of models
Flexible schema
High availability
Manages unstructured data and big data well
Drawbacks:
Flexible schema
Weak security
Eventual consistency
No standard query language
3-4 “V”s
Volume
Variety
Velocity
Value
CAP theorem:
You can only have two out of three:
Consistency, Availability, Partition tolerance
[Figure: CAP diagram; one region labeled "Most NOSQL lives here", another labeled "Few solutions are here"]
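
As an illustration of what giving up strict consistency looks like, here is a toy sketch of eventual consistency between two replicas. The Replica class, the version numbers, and the sync function are assumptions made for this example (a simple last-writer-wins scheme), not the behavior of any particular NOSQL product.

```python
class Replica:
    """A toy replica that keeps (version, value) per key."""

    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Last-writer-wins: accept the write only if it is newer than what we hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so each replica ends with the newest version."""
    for key, (version, value) in list(a.store.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.store.items()):
        a.write(key, value, version)
```

Both replicas stay available for reads and writes, but a read on one replica may return stale data until sync runs; after that the replicas agree.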
Analytical queries
SELECT SUM(salary)
FROM customerperson;
Compression
Poor compression ratio (low repetition)
Good compression ratio (high repetition)
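
The link between repetition and compression ratio can be sketched with run-length encoding. This is an illustrative choice of technique, not necessarily the scheme the slide had in mind, and the ratio below counts values against runs rather than bytes, which is a simplification.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def compression_ratio(values):
    """Number of original values divided by number of runs (expects a non-empty input)."""
    return len(values) / len(rle_encode(values))


# Hypothetical data: a highly repetitive sequence versus unique identifiers.
countries = ["BR"] * 6 + ["US"] * 4
ids = list(range(10))
```

Here compression_ratio(countries) is 5.0 because ten values collapse into two runs, while compression_ratio(ids) is 1.0 because no value repeats.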
Insertion
INSERT INTO customerperson VALUES (...)
NewSQL
A problem situation
Perhaps you have gigabytes to terabytes of data that needs high-speed
transactional access.
You have an incoming event stream (sensors, mobile phones, network access
points) and need per-event transactions to compute responses and
analytics in real time.
Your problem follows a pattern of “ingest, analyze, decide,” where the
analytics and the decisions must be calculated per-request and not
post hoc in batch processing.
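
The "ingest, analyze, decide" pattern can be sketched as a single per-event transaction. The example uses Python's built-in sqlite3 module only as a stand-in for a real NewSQL engine; the events table, the device names, and the spending limit are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (device TEXT, amount REAL)")
    return conn


def handle_event(conn, device, amount, limit=100.0):
    """Ingest one event, analyze the device's running total, and decide, all in one transaction."""
    with conn:  # the insert and the aggregate are committed together
        conn.execute("INSERT INTO events VALUES (?, ?)", (device, amount))  # ingest
        (total,) = conn.execute(
            "SELECT SUM(amount) FROM events WHERE device = ?", (device,)
        ).fetchone()  # analyze
        return "block" if total > limit else "allow"  # decide
```

Each call returns its decision immediately, per request, instead of waiting for a batch job to aggregate the events later.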
A new concept, dating from 2011
Brings together the best of relational
databases and the best of NOSQL
More tables…distributed database
Strengths:
ACID
SQL
Standard
Structured
High performance
Horizontal scalability
High availability
Drawbacks:
Model does not fit all cases
Does not handle unstructured data well
Structured
New concept (2011)
Fewer resources and tools than relational and NOSQL systems
NuoDB
a cluster-first SQL database with a focus on the cloud:
runs on many nodes across many datacenters
lets the underlying system manage data locality and consistency for you
Of the NewSQL systems, NuoDB is the closest to being
eventually consistent
Hekaton
adds sophisticated in-memory processing to the more traditional
Microsoft SQL Server.
MemSQL
often offers faster OLAP analytics than all-in-one OldSQL systems,
with higher concurrency and the ability to update data as it’s
being analyzed
focus on clustered analytics
Distributed, with MySQL compatibility
VoltDB
the most mature of these systems, combines streaming
analytics, strong ACID guarantees and native clustering
VoltDB
Is the system-of-record for data-intensive applications, while
offering an integrated high-throughput, low-latency
ingestion engine.
It’s a great choice for policy enforcement, fraud/anomaly
detection, or other fast-decisioning apps
RDBMS x NOSQL x NewSQL
Data Analytics
Traditional approach
Decision makers wait for reports from disparate OLTP systems
Put it all together in a spreadsheet
Highly manual process
In the Web context
Data capture at the user interaction level:
in contrast to the client transaction level in the Enterprise context
As a consequence the amount of data increases significantly
Greater need to analyze such data to understand user behaviors
Scalability to large data volumes:
Scan 100 TB on 1 node @ 50 MB/sec = 23 days
Scan on 1000-node cluster = 33 minutes
Divide-And-Conquer (i.e., data partitioning)
Cost-efficiency:
Commodity nodes (cheap, but unreliable)
Commodity network
Automatic fault-tolerance (fewer admins)
Easy to use (fewer programmers)
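
The scan-time figures above can be checked with a short calculation. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), the same 50 MB/sec rate on every node, and perfectly linear scaling across nodes; the function name is made up for this example.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6   # bytes, decimal units assumed


def scan_seconds(total_bytes, bytes_per_second_per_node, nodes=1):
    """Time to scan the data when it is split evenly across the nodes."""
    return total_bytes / (bytes_per_second_per_node * nodes)


single = scan_seconds(100 * TB, 50 * MB)               # one node
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)  # 1000-node cluster

print(f"1 node: {single / 86400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

One node needs 2,000,000 seconds, a little over 23 days; spread across 1000 nodes the same scan takes 2000 seconds, about 33 minutes. Real clusters lose some of this to coordination and failures, which is why automatic fault tolerance matters.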
Evolution
Operational: Reporting
Tactical: Data Analysis
Strategic: Mining & Statistics
Future: Learning?
Big Data Analytics
Big data analytics is the process of examining large data sets
containing a variety of data types (i.e. Big Data) to discover hidden
patterns, unknown correlations, market trends, customer preferences
and other useful business information.
It is used to analyze large volumes of transaction data, as well as other forms
of data.
Examples: Web server logs and Internet stream data, social media content and social
network activity reports, text from customer emails and survey responses,
mobile-phone call detail records and machine data captured by sensors connected to the
Internet of Things.
Traditional analytical tools comprise basic business intelligence:
they examine historical data
Tools for advanced analytics
focus on forecasting future events and behaviors, allowing businesses to conduct
what-if analyses to predict the effects of potential changes in business strategies.
Predictive analytics, data mining, big data analytics, and location
intelligence are just some of the analytical categories that fall under
the heading of advanced analytics.
These technologies are widely used in industries including marketing,
healthcare, risk management, and economics.
Where is our course?
Data Analytics
Big Data Analytics
Data Mining for Structured Data
/formasresearchgroup /formasresearchgroup
www.formas.ufba.br
Semantic Applications and Formalisms Research Group
Prof. Daniela Barreiro Claro
Email: [email protected]