Big Data Tech Stack
![Page 2: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/2.jpg)
Me :)
![Page 3: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/3.jpg)
Graduated from @HU
![Page 4: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/4.jpg)
PhD Student @METU
![Page 5: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/5.jpg)
Ex-Entrepreneur
I had 3 start-ups
![Page 6: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/6.jpg)
Senior Software Engineer @Udemy
![Page 7: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/7.jpg)
Founder and Organizer of
meetup.com/ankara-big-data-meetup
![Page 8: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/8.jpg)
What's Big Data?
Big data is data that exceeds the processing capacity of conventional database systems.
![Page 9: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/9.jpg)
What's Big Data?
Big data is when the data itself becomes part of the problem.
![Page 10: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/10.jpg)
4V's of Big Data
![Page 11: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/11.jpg)
![Page 12: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/12.jpg)
![Page 13: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/13.jpg)
![Page 14: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/14.jpg)
![Page 15: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/15.jpg)
![Page 16: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/16.jpg)
![Page 17: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/17.jpg)
![Page 18: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/18.jpg)
Multitude of Data Types
- Structured
- Semi-structured
- Unstructured
![Page 19: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/19.jpg)
Data Data Data
![Page 20: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/20.jpg)
![Page 21: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/21.jpg)
![Page 22: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/22.jpg)
![Page 23: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/23.jpg)
![Page 24: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/24.jpg)
What Do We Need?
- Store
- Join
- Index
- Analytics
- Aggregate
- Visualize
![Page 25: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/25.jpg)
Challenge
The challenge in big data analytics is to dig deeply, quickly (real time?), and widely.
![Page 26: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/26.jpg)
"ilities" or NFR?
- Availability
- Scalability
- Security
- Performance
- ...
![Page 27: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/27.jpg)
Solution?
![Page 28: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/28.jpg)
Big Data Tech Stack
![Page 29: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/29.jpg)
What are the essential components?
![Page 30: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/30.jpg)
Data Sources
![Page 31: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/31.jpg)
Multiple internal & external data sources
![Page 32: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/32.jpg)
Creates a data lake
![Page 33: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/33.jpg)
![Page 34: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/34.jpg)
![Page 35: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/35.jpg)
![Page 36: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/36.jpg)
Different Volume, Variety, Velocity
![Page 37: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/37.jpg)
The aim is to create a funnel after proper validation and cleaning
![Page 38: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/38.jpg)
Ingestion Layer
![Page 39: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/39.jpg)
Signal-to-noise ratio: 10:90
![Page 40: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/40.jpg)
Separate the noise from relevant info
![Page 41: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/41.jpg)
It has the capability to:
- Validate
- Cleanse
- Transform
- Reduce
- Integrate
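A minimal sketch of what those ingestion steps could look like on a handful of records. The event fields, sample values, and function names are hypothetical and chosen only for illustration; a real ingestion layer would run on tools such as Flume or Sqoop rather than plain Python.

```python
# Hypothetical raw events from two sources; field names are illustrative only.
RAW_EVENTS = [
    {"user": " alice ", "amount": "10.5", "source": "web"},
    {"user": "", "amount": "3", "source": "web"},           # invalid: empty user
    {"user": "BOB", "amount": "oops", "source": "mobile"},  # invalid: bad amount
    {"user": "bob", "amount": "4.5", "source": "mobile"},
    {"user": "Alice", "amount": "2", "source": "mobile"},
]


def validate(event):
    """Validate: require a non-blank user and a numeric amount."""
    if not event["user"].strip():
        return False
    try:
        float(event["amount"])
    except ValueError:
        return False
    return True


def cleanse(event):
    """Cleanse: trim whitespace and normalize the user name to lower case."""
    return {**event, "user": event["user"].strip().lower()}


def transform(event):
    """Transform: convert the amount string to a float."""
    return {**event, "amount": float(event["amount"])}


def ingest(events):
    """Reduce and integrate: sum amounts per user across all sources."""
    totals = {}
    for event in events:
        if not validate(event):
            continue  # noise is dropped before it reaches storage
        event = transform(cleanse(event))
        totals[event["user"]] = totals.get(event["user"], 0.0) + event["amount"]
    return totals


TOTALS = ingest(RAW_EVENTS)
print(TOTALS)
```

Two of the five records are rejected as noise, and the remaining three are merged into one total per user regardless of source.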
![Page 42: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/42.jpg)
![Page 43: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/43.jpg)
Distributed Storage Layer
![Page 44: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/44.jpg)
Fault tolerance
Parallelization
![Page 45: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/45.jpg)
HDFS
Massively scalable distributed file system
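To make the idea concrete, here is a toy sketch of how a distributed file system can split a file into blocks and keep several replicas of each block on different nodes. The 128 MB block size and replication factor of 3 are assumed here as commonly cited HDFS defaults, the node names are made up, and the round-robin placement is a simplification: it is not HDFS's actual rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: commonly cited Hadoop 2 default block size
REPLICATION = 3      # assumption: commonly cited HDFS default replication factor


def place_blocks(file_size_mb, nodes):
    """Split a file into blocks and assign each block to REPLICATION nodes.

    Placement is simple round-robin, for illustration only.
    Assumes len(nodes) >= REPLICATION so replicas land on distinct nodes.
    """
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement = {}
    for block in range(n_blocks):
        placement[block] = [
            nodes[(block + replica) % len(nodes)] for replica in range(REPLICATION)
        ]
    return placement


NODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical data node names
PLACEMENT = place_blocks(300, NODES)
for block, replicas in PLACEMENT.items():
    print(block, replicas)
```

Because every block lives on three distinct nodes, losing any single node still leaves two copies of each block, and different blocks can be read in parallel from different nodes.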
![Page 46: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/46.jpg)
HDFS
![Page 47: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/47.jpg)
HDFS Architecture
![Page 48: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/48.jpg)
Non-relational, distributed data?
![Page 49: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/49.jpg)
NoSQL
![Page 50: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/50.jpg)
CAP theorem
Consistency, Availability, Partition Tolerance
![Page 51: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/51.jpg)
![Page 52: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/52.jpg)
![Page 53: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/53.jpg)
Ingestion to DFS
Sqoop, Flume, MapReduce, ETL
![Page 54: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/54.jpg)
Infrastructure & Platform Layer
![Page 55: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/55.jpg)
Computing & Scalability
![Page 56: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/56.jpg)
Hadoop?
![Page 57: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/57.jpg)
Vertical Scaling
![Page 58: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/58.jpg)
Vertical Scaling
![Page 59: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/59.jpg)
Vertical Scaling
![Page 60: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/60.jpg)
Horizontal Scaling
![Page 61: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/61.jpg)
Horizontal Scaling
![Page 62: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/62.jpg)
Horizontal Scaling
![Page 63: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/63.jpg)
![Page 64: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/64.jpg)
MapReduce is the main computation paradigm
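The paradigm can be sketched with the classic word count, run here in a single process. The sample documents and function names are illustrative assumptions; on a real cluster the map and reduce calls would run in parallel on many nodes and the shuffle would move data over the network.

```python
from collections import defaultdict

# Hypothetical input split into two "documents".
DOCUMENTS = ["big data tech stack", "big data is big"]


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


mapped = [pair for document in DOCUMENTS for pair in map_phase(document)]
COUNTS = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(COUNTS)
```

Each map call only sees its own split and each reduce call only sees one key, which is what lets the framework spread the work across machines.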
![Page 65: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/65.jpg)
MapReduce
![Page 66: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/66.jpg)
![Page 67: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/67.jpg)
Hadoop 2
![Page 68: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/68.jpg)
What's new?
![Page 69: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/69.jpg)
What's new?
![Page 70: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/70.jpg)
H1 vs. H2
![Page 71: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/71.jpg)
One cluster, distributed storage, distributed scheduler, many types of applications.
![Page 72: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/72.jpg)
Blueprints
- NoSQL with HBase
- Stream Processing with Storm/Spark
- Graph Processing with Giraph
- SQL on Hadoop with Impala
- Columnar Data Formats
![Page 73: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/73.jpg)
Security Layer
![Page 74: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/74.jpg)
Data needs to be protected
- Meet compliance requirements
- Individuals' privacy
![Page 75: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/75.jpg)
Proper authorization and authentication needed
![Page 76: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/76.jpg)
What can we do?
- Use an authentication protocol like Kerberos
- Enable file-layer encryption
- Use SSL, certificates and trusted keys
- Provision with tools like Chef, Puppet or Ansible
- Log all communication to detect anomalies
- Monitor the whole system
![Page 77: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/77.jpg)
Monitoring Layer
![Page 78: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/78.jpg)
Get a complete picture of our Big Data tech stack
![Page 79: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/79.jpg)
Satisfy SLAs with minimal downtime
![Page 80: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/80.jpg)
DataDog
![Page 81: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/81.jpg)
New Relic (Overview)
![Page 82: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/82.jpg)
New Relic (Databases)
![Page 83: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/83.jpg)
Analytics Engine
![Page 84: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/84.jpg)
Co-Existence with Traditional BI
- Data warehouse in the traditional way
- Distributed MR processing on big data stores
![Page 85: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/85.jpg)
Mediate data in either direction
e.g. use Hive/HBase with Sqoop
![Page 86: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/86.jpg)
Real-time analysis can leverage low-latency NoSQL stores
e.g. Cassandra, Vertica, ...
![Page 87: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/87.jpg)
R may be used for complex statistical algorithms
![Page 88: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/88.jpg)
Search Engines
![Page 89: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/89.jpg)
Huge volume and variety of data
![Page 90: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/90.jpg)
“needle in a haystack”
![Page 91: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/91.jpg)
![Page 92: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/92.jpg)
Need a blazing-fast search mechanism to index and search for big data analytics
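One common way to get fast search is an inverted index, which maps each term to the documents that contain it so a query never has to scan every document. The sketch below is a toy version under assumed sample documents and function names; real engines add tokenization, ranking, and distribution on top of this idea.

```python
from collections import defaultdict

# Hypothetical documents keyed by id.
DOCS = {
    1: "big data tech stack",
    2: "search big logs fast",
    3: "tech talk",
}


def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing all query terms (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_index(DOCS)
print(search(INDEX, "big tech"))
```

A lookup touches only the posting sets of the query terms, so its cost depends on how many documents match, not on the total size of the haystack.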
![Page 93: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/93.jpg)
![Page 94: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/94.jpg)
Elasticsearch, Solr, ...
![Page 95: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/95.jpg)
Real-time Processing
![Page 96: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/96.jpg)
In memory?
![Page 98: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/98.jpg)
![Page 99: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/99.jpg)
![Page 100: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/100.jpg)
![Page 101: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/101.jpg)
Storm, Kinesis, Flink, ...
![Page 102: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/102.jpg)
Visualization Layer
![Page 103: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/103.jpg)
Gain insight faster
Look at different aspects of data visually
![Page 104: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/104.jpg)
![Page 105: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/105.jpg)
Tableau
![Page 106: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/106.jpg)
ChartIO
![Page 107: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/107.jpg)
Lambda Architecture
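As a rough illustration of the idea, a Lambda Architecture answers queries by combining a batch view, recomputed periodically from the full dataset, with a speed view that covers events received since the last batch run. The page-view events and names below are hypothetical, and both layers are collapsed into one process purely for readability.

```python
from collections import Counter

# Hypothetical page-view events.
BATCH_EVENTS = ["home", "pricing", "home"]  # already covered by the last batch run
RECENT_EVENTS = ["home", "docs"]            # arrived after the last batch run

# Batch layer: view recomputed from the full (immutable) dataset.
batch_view = Counter(BATCH_EVENTS)

# Speed layer: view updated incrementally as each new event arrives.
speed_view = Counter()
for event in RECENT_EVENTS:
    speed_view[event] += 1


def query(page):
    """Serving layer: merge the batch view with the real-time view."""
    return batch_view[page] + speed_view[page]


print(query("home"))
```

When the next batch run absorbs the recent events, the speed view for that period can be discarded, which keeps the incremental state small.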
![Page 109: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/109.jpg)
Don't forget
![Page 110: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/110.jpg)
There is no "One Size Fits All" solution
![Page 111: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/111.jpg)
We need Continuous Development
![Page 112: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/112.jpg)
![Page 113: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/113.jpg)
![Page 114: Big Data Tech Stack](https://reader034.vdocuments.site/reader034/viewer/2022050613/587e852f1a28abd6038b735d/html5/thumbnails/114.jpg)
Thank You :)