Java BigData Full Stack Development (version 2.0)
Post on 11-Apr-2017
Java BigData Full Stack
Development as is ...
Alexey Zinovyev, Java Trainer at EPAM
About
With IT since 2007
With Java since 2009
With Hadoop since 2012
With EPAM since 2015
Java Big Data Full Stack Development
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs
The Good Old Days
HRs & RMs are looking for Java developers
Is the Java Dream Team waiting for you?
Required Skills
• Advanced SQL
• Basic Linux
• Core Java & JVM
• Backend Development Experience
• Basic Computer Science Level
REAL WORLD
Let’s just use JavaScript in the frontend ONLY
In the frontend ONLY?
Cruel world
Do you know any ML library for JS?
Wild animals everywhere
And here is what I tell you
It’s time for a Java superhero, yeah!
Before discovering patterns you should ...
• Select small pieces
• Define default values for missing data
• Remove strange signals from the data
• Merge some tables into one if required
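The four preparation steps above can be sketched in plain Java. This is only an illustration under assumptions: the class name, the method names, the "first n rows" sampling rule and the fixed [min, max] range for "strange signals" are all hypothetical and not taken from the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PreprocessingSketch {

    // Step 1: select a small piece of the data (here simply the first n rows).
    public static <T> List<T> sample(List<T> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    // Step 2: replace missing values (null) with a default value.
    public static List<Double> fillMissing(List<Double> values, double defaultValue) {
        return values.stream()
                .map(v -> v == null ? defaultValue : v)
                .collect(Collectors.toList());
    }

    // Step 3: drop "strange signals", assumed here to be values outside [min, max].
    public static List<Double> removeOutliers(List<Double> values, double min, double max) {
        return values.stream()
                .filter(v -> v >= min && v <= max)
                .collect(Collectors.toList());
    }

    // Step 4: merge two tables keyed by id into one (an inner join on the key).
    public static Map<Integer, String> merge(Map<Integer, String> names,
                                             Map<Integer, Double> scores) {
        Map<Integer, String> merged = new TreeMap<>();
        for (Map.Entry<Integer, String> row : names.entrySet()) {
            Double score = scores.get(row.getKey());
            if (score != null) {
                merged.put(row.getKey(), row.getValue() + ":" + score);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Double> raw = new ArrayList<>();
        raw.add(1.0);
        raw.add(null);    // a missing value
        raw.add(1000.0);  // a strange signal
        raw.add(3.0);
        List<Double> clean = removeOutliers(fillMissing(sample(raw, 4), 0.0), 0.0, 100.0);
        System.out.println(clean);
    }
}
```

In a real pipeline each step would typically be a Spark or SQL transformation; the in-memory lists only keep the sketch self-contained.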
How it really works
• Share your data with us
• Our magic manipulations
• Building an answering machine
• PROFIT!!!
How to start?
WHAT IS BIG DATA?
Joke about Excel
5V
Every 60 seconds…
From Mobile Devices
From Industry
We started to keep and handle stupid new things!
10^6 rows
in MySQL
GB->TB->PB->?
Is BigData about PBs?
It’s hard to …
• .. store
• .. handle
• .. search in
• .. visualize
• .. send in network
Likes in Classmates: how do we count them?
Crazy Zoo
2012
Crazy Zoo
2016
What this training will cover
NOSQL
What’s the problem with RDBMSs?
• Caching
• Master/Slave
• Cluster
• Table Partitioning
• Sharding
Family
Database
party
Spring Data
How to start?
Java MongoDB Driver + Robomongo
BIG DATA TOOL MASTER
VS
DATA SCIENTIST
TRAIN
MODEL
Datasets
• Facebook users, tweets
• Trade transactions
• Government
• Medicine (genomic data)
• Telecommunications
Data Sources
• Relational Databases
• Data warehouses (Historical data)
• Files in CSV or in binary format
• Internet or email
• Scientific and research tools (R, Octave, MATLAB)
Hey, man, predict something!
Man or sofa?
Typical questions for DM
• Which loan applicants are high-risk?
• How do we detect phone card fraud?
• What is the revenue prediction for next year?
• Can you recommend music for users?
Is the green circle a blue square or a red triangle? Let’s ask its neighbors!
kNN (k-nearest neighbors)
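A minimal sketch of the "ask the neighbors" idea in plain Java: the unknown point gets the label that wins a majority vote among its k closest training points. The class name, the 2-D `Point` type and the sample coordinates are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnnSketch {

    /** A labeled 2-D point, e.g. a "blue square" or a "red triangle". */
    public static final class Point {
        final double x;
        final double y;
        final String label;

        public Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Squared Euclidean distance is enough for ranking neighbors.
    static double squaredDistance(Point p, double x, double y) {
        double dx = p.x - x;
        double dy = p.y - y;
        return dx * dx + dy * dy;
    }

    // Returns the majority label among the k training points nearest to (x, y).
    // On a tie, the label that reached the top vote count first wins.
    public static String classify(List<Point> training, double x, double y, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> squaredDistance(p, x, y)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(p.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = new ArrayList<>();
        training.add(new Point(1, 1, "blue square"));
        training.add(new Point(1, 2, "blue square"));
        training.add(new Point(2, 1, "blue square"));
        training.add(new Point(5, 5, "red triangle"));
        training.add(new Point(6, 5, "red triangle"));
        training.add(new Point(5, 6, "red triangle"));

        // The "green circle" sits at (2, 2); its 3 nearest neighbors are all blue squares.
        System.out.println(classify(training, 2, 2, 3)); // prints "blue square"
    }
}
```

Sorting the whole training set costs O(n log n) per query, which is fine for a demo; large datasets usually call for spatial indexes or approximate search.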
Collaborative Filtering
Machine Learning vs Traditional Programming
Data
Science
Can a Java programmer be a Data Scientist?
Sexy Data Scientist
Real Data Scientist
How to start?
Weka
HADOOP
Hadoop and Data Knights
Hadoop
MapReduce in different languages
MapReduce for WordCount
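The slide image is not part of this transcript, so here is a stand-in sketch of the WordCount idea. It imitates the map, shuffle and reduce phases with plain Java collections and does not use the Hadoop API; the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (the framework does this in Hadoop).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence on a single machine.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not", "to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

In a real Hadoop job the map and reduce functions run on different nodes and the shuffle moves data across the network; only the shape of the computation is the same here.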
Hadoop
Jobs
Hadoop frameworks
• Universal (MapReduce, Tez, RDD in Spark)
• Abstract (Pig, Spark Pipeline API)
• SQL-like (Hive, Impala, Spark SQL)
• Graph processing (Giraph, GraphX)
• Machine Learning (Mahout, MLlib)
• Stream processing (Spark Streaming, Storm)
SPARK
SPARK: the bloody son of MR
• MapReduce in memory
• Up to 50x faster than Hadoop MapReduce
• RDD is the basic building block (an immutable distributed collection of objects)
• Pipeline API (no need for Pig)
Spark
Family
MLlib supports
• Classification and regression
• Collaborative filtering
• Clustering
• Dimensionality reduction
• Optimization
Code sample MLlib (K-Means)
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;

// Assumes parsedData is a JavaRDD<Vector> of feature vectors
// and sc is a JavaSparkContext created earlier
// Cluster the data into two classes using KMeans
int numClusters = 2;
int numIterations = 20;
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
// Evaluate clustering by computing Within Set Sum of Squared Errors
double WSSSE = clusters.computeCost(parsedData.rdd());
System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
// Save and load model
clusters.save(sc.sc(), "myModelPath");
KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
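To show what training and the Within Set Sum of Squared Errors mean without a Spark cluster, here is a dependency-free sketch of Lloyd's algorithm on 1-D data. It is not how MLlib is implemented: the class name, the methods and the "first k points as initial centers" rule are assumptions made for brevity.

```java
import java.util.Arrays;

public class KMeansSketch {

    // Index of the center closest to the value (ties go to the lower index).
    static int nearest(double value, double[] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(value - centers[c]) < Math.abs(value - centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's algorithm on 1-D data. Assumes data.length >= k and uses
    // the first k points as the initial centers.
    public static double[] train(double[] data, int k, int iterations) {
        double[] centers = Arrays.copyOf(data, k);
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            // Assignment step: attach every point to its nearest center.
            for (double value : data) {
                int c = nearest(value, centers);
                sum[c] += value;
                count[c]++;
            }
            // Update step: move each non-empty center to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) {
                    centers[c] = sum[c] / count[c];
                }
            }
        }
        return centers;
    }

    // Within Set Sum of Squared Errors: squared distance of each point to its center.
    public static double wssse(double[] data, double[] centers) {
        double cost = 0;
        for (double value : data) {
            double d = value - centers[nearest(value, centers)];
            cost += d * d;
        }
        return cost;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 10, 11, 12};
        double[] centers = train(data, 2, 20);
        System.out.println(Arrays.toString(centers)); // prints [2.0, 11.0]
        System.out.println("Within Set Sum of Squared Errors = " + wssse(data, centers));
    }
}
```

A lower WSSSE means tighter clusters, which is why it is often compared across different values of k.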
MLlib
• .. borrows ideas from scikit-learn (Python lib) and Mahout
• .. runs fully on Spark and supports Spark’s Pipeline API
• .. represents a dataset as Spark SQL’s SchemaRDD (renamed DataFrame in Spark 1.3)
• .. supports Hive as an external data source
• .. works well for large datasets and parallelized algorithms
It solves all problems!
How to start?
HDP Zoo
Ok, Google!
AWS Amazon
Infrastructure issues are waiting for YOU!
DEEP LEARNING
Deep Learning helps us build a NEW FUTURE
HOW TO LEARN?
DIFFERENT WAYS
1. Read books and write ‘pet’ projects
2. Become a mentee in Mentoring Process
3. Take a MOOC
4. Take a training course
5. Visit conferences
Recommended Books
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs