SE2016 BigData Denis Reznik "Data driven future"
TRANSCRIPT
![Page 1: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/1.jpg)
Data-Driven Future
What to Learn and What to Expect?
Denis Reznik
Data Architect at Intapp Kyiv
Microsoft Data Platform MVP
![Page 2: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/2.jpg)
About me
•Denis Reznik
•Kyiv, Ukraine
•Data Architect at Intapp, Inc.
•Microsoft Data Platform MVP
•Co-Founder of Ukrainian Data Community
![Page 3: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/3.jpg)
Agenda
•Data is a new Oil (c)
•Data and Science
•Data in Big Companies
•Data and Application Development
•Data-Driven Future
![Page 4: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/4.jpg)
Data is a New Oil
“Data is the new oil. It’s valuable, but if unrefined it
cannot really be used. It has to be changed into gas,
plastic, chemicals, etc to create a valuable entity that
drives profitable activity; so must data be broken
down, analyzed for it to have value.”
(c) Clive Humby, UK Mathematician
![Page 5: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/5.jpg)
Data and Science
•Thousands of years
  •Empirical
•Last few hundred years
  •Theoretical
•Last fifty years
  •Computational
  •“Query the world”
•Last twenty years
  •eScience (Data Science)
  •“Download the world”
![Page 6: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/6.jpg)
Machine Learning
•Supervised Learning
  •Classification
  •Regression
•Unsupervised Learning
![Page 7: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/7.jpg)
Linear Regression
Training Data -> Learning Algorithm -> h
h - Hypothesis
Ocean Temperature -> h -> Whales Population
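The slide's idea (training data goes into a learning algorithm, which produces a hypothesis h that maps ocean temperature to whale population) can be sketched roughly as follows. This is a minimal ordinary least-squares fit in plain Python, not the code from the talk's demo; the function names and the numbers in the usage example are made up for illustration.

```python
def fit_line(xs, ys):
    """Learning algorithm: ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_hypothesis(slope, intercept):
    """Build h, the function that predicts y for a new x."""
    return lambda x: slope * x + intercept


# Hypothetical training data: ocean temperature -> whale population.
temperatures = [10.0, 12.0, 14.0]
populations = [80.0, 76.0, 72.0]

slope, intercept = fit_line(temperatures, populations)
h = make_hypothesis(slope, intercept)
predicted = h(16.0)  # prediction for a temperature not in the training data
```

With real measurements the points would not lie exactly on a line, and h would only approximate the trend.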
![Page 8: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/8.jpg)
DEMO
Linear Regression
![Page 9: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/9.jpg)
Data in Big Companies
![Page 10: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/10.jpg)
Parallel Processing
Temperature Sensor Datasets (n Items)
Q: How many times was the temperature above the norm during the last week?
A: 5
Time: 2 sec
Algorithmic Complexity: O(n)
![Page 11: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/11.jpg)
Parallel Processing
Temperature Sensor Datasets (k datasets, n/k Items in each one)
Q: How many times was the temperature above the norm during the last week?
A: 1
Time: 0.5 sec
Algorithmic Complexity: O(n/k)
A: 0 A: 3 A: 4
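A rough sketch of the same idea in Python: split the readings into chunks, count the above-norm readings in each chunk concurrently, then add the partial answers. The readings, the norm of 40, and the function names are assumptions for illustration. In CPython, threads will not speed up this CPU-bound loop; the sketch only shows the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def count_above(readings, norm):
    """Sequential O(n) count of readings above the norm."""
    return sum(1 for r in readings if r > norm)


def split(readings, parts):
    """Split readings into at most `parts` chunks of near-equal size."""
    size = max(1, -(-len(readings) // parts))  # ceiling division
    return [readings[i:i + size] for i in range(0, len(readings), size)]


def parallel_count_above(readings, norm, parts=4):
    """Count each chunk in its own worker, then sum the partial answers."""
    chunks = split(readings, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial = list(pool.map(lambda chunk: count_above(chunk, norm), chunks))
    return partial, sum(partial)


# Hypothetical sensor readings for one week.
readings = [38, 41, 39, 42, 40, 45, 37, 43]
partial_counts, total = parallel_count_above(readings, norm=40, parts=4)
```

Each worker touches only its own chunk, which is where the O(n/k) per-worker cost comes from.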
![Page 12: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/12.jpg)
Map-Reduce
A: 1
Map -> COUNT(*) WHERE Value > 40
A: 0 A: 3 A: 4
Reduce -> COUNT(*)
A: 5
Reduce
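The two phases can be sketched in Python as below. The map step counts the values above 40 inside one partition, and the reduce step combines the partial counts by adding them. The partition contents and function names are hypothetical; this is not the code from the talk's demo and it runs in a single process.

```python
from functools import reduce


def map_partition(partition, threshold=40):
    """Map: COUNT(*) WHERE Value > threshold, for a single partition."""
    return sum(1 for value in partition if value > threshold)


def reduce_counts(left, right):
    """Reduce: combine two partial counts into one."""
    return left + right


def map_reduce(partitions, threshold=40):
    mapped = [map_partition(p, threshold) for p in partitions]
    return reduce(reduce_counts, mapped, 0)


# Hypothetical partitions of temperature readings.
partitions = [[41, 12], [5, 7], [42, 43, 44], [39]]
result = map_reduce(partitions)
```

Because addition is associative, the reduce step can also be applied in stages, combining a few partial counts at a time before the final result.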
![Page 13: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/13.jpg)
DEMO
Map-Reduce
![Page 14: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/14.jpg)
Data and Application Development
source: https://www.youtube.com/watch?v=t6kM2EM6so4
![Page 15: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/15.jpg)
Index (B-Tree) - Seek
Root: 1 .. 1M
Intermediate: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M
Leaf: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K | …
SELECT * FROM Users WHERE Id = 523
![Page 16: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/16.jpg)
Index (B-Tree) - Scan
Root: 1 .. 1M
Intermediate: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M
Leaf: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K | …
SELECT * FROM Users
![Page 17: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/17.jpg)
Index (B-Tree) - Range Scan
Root: 1 .. 1M
Intermediate: 1 .. 2K | 2K+1 .. 4K | … | 1M-2K .. 1M
Leaf: 1 .. 300 | 301 .. 800 | 801 .. 1.5K | 1.5K+1 .. 2K | …
SELECT * FROM Users WHERE Id BETWEEN 700 AND 1700
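The three access patterns (seek, scan, range scan) can be illustrated with a sorted Python list and binary search. This is a stand-in, not a real B-tree: a sorted list gives the same logarithmic lookup and ordered traversal, but a database index stores keys in pages and handles inserts very differently. The function names are made up for this sketch.

```python
from bisect import bisect_left, bisect_right


def seek(ids, target):
    """Seek: binary search for one key, O(log n). Returns its position or None."""
    i = bisect_left(ids, target)
    return i if i < len(ids) and ids[i] == target else None


def scan(ids):
    """Scan: visit every key in order, O(n)."""
    return list(ids)


def range_scan(ids, low, high):
    """Range scan: seek to `low`, then read in order up to `high` (inclusive)."""
    start = bisect_left(ids, low)
    end = bisect_right(ids, high)
    return ids[start:end]


ids = list(range(1, 2001))        # sorted keys, like the Id column of Users
position = seek(ids, 523)         # WHERE Id = 523
everything = scan(ids)            # no WHERE clause
between = range_scan(ids, 700, 1700)  # WHERE Id BETWEEN 700 AND 1700
```

The seek touches only a handful of keys, the range scan touches only the matching slice, and the scan touches everything.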
![Page 18: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/18.jpg)
Hashtable
Keys: John Dow, John Snow, Jack Snack
Buckets: 0, 1, 2, 3, 4
Hash Function:
John Dow -> 0
Jack Snack -> 2
John Snow -> 0
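A minimal sketch of a hash table with five buckets, where keys that hash to the same bucket are kept in a small list (chaining). The hash function here, the sum of character codes modulo the bucket count, is a toy assumption chosen to be easy to follow, so the bucket numbers it produces differ from those on the slide. The class name and stored values are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, bucket_count=5):
        self.buckets = [[] for _ in range(bucket_count)]

    def bucket_index(self, key):
        # Toy hash function (an assumption): sum of character codes mod bucket count.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self.bucket_index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key, possibly a collision

    def get(self, key):
        for existing_key, value in self.buckets[self.bucket_index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable()
for user_id, name in enumerate(["John Dow", "John Snow", "Jack Snack"], start=1):
    table.put(name, user_id)
```

With this toy function, "John Dow" and "John Snow" land in the same bucket, so a lookup for either one has to compare keys inside that bucket.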
![Page 19: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/19.jpg)
Data-Driven Future
• The amount of data is growing, and this is cool
• More and more decisions are based on data
• More and more applications are developed
• It is exciting to be a Software Engineer now!
![Page 20: SE2016 BigData Denis Reznik "Data driven future"](https://reader031.vdocuments.site/reader031/viewer/2022030313/58eca7241a28ab1a3e8b46cd/html5/thumbnails/20.jpg)
Thank you!
Denis Reznik
Twitter: @denisreznik
Email: [email protected]
Blog: http://reznik.uneta.com.ua
Facebook: https://www.facebook.com/denis.reznik.5
LinkedIn: http://ua.linkedin.com/pub/denis-reznik/3/502/234