![Page 1: Big Data and Analytics - ADA Universityaadamov/sources/slides/bigdata/week-4-BDA-Hadoop-Ecosystem.pdfHadoop 2.0 vs Hadoop 1.0 – Processing The Hadoop Ecosystem Hadoop. Hortonworks](https://reader033.vdocuments.site/reader033/viewer/2022042011/5e724f0903b64244ab403ab6/html5/thumbnails/1.jpg)
Big Data and Analytics: Hadoop Ecosystem
Dr. Abzetdin Adamov
School of Information Technology and Engineering
ADA University
http://site.ada.qu.edu.az/~aadamov
Previously Covered Topics
• Key differences of Traditional and Big Data Architecture
• Transferring Computation Power vs Transferring Data
• Schema on Read vs Schema on Write
• Hadoop Core – Storage: HDFS Architecture
• Hadoop Core – Processing: MapReduce Architecture
Objectives
• Vagrant + Provisioning + VirtualBox = Repeatable Multi-VM Environments
• Hadoop 2.0 vs Hadoop 1.0
• Hadoop Ecosystem Components Classification
• Hadoop Ecosystem Components Key Features
Hadoop Ecosystem Components
Companies building on top of Hadoop
• Amazon Web Services
• Cloudera
• Hortonworks
• IBM
• Intel
• MapR Technologies
• Microsoft
• Pivotal Software
• Teradata
Powered by Apache Hadoop
• https://wiki.apache.org/hadoop/PoweredBy
• Thousands of companies and organizations run Hadoop clusters ranging from a few nodes to tens of thousands of nodes (40,000 at Yahoo)
Hadoop Core = Storage + Compute
[Figure: the storage of the cluster nodes is managed by the Hadoop Distributed File System (HDFS); their CPU and RAM are managed by Yet Another Resource Negotiator (YARN)]
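As a rough illustration of how HDFS spreads one file's storage across many nodes, the sketch below splits a file into fixed-size blocks and assigns every block to several distinct DataNodes. It assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; the node names and the round-robin placement are hypothetical simplifications, not the NameNode's actual rack-aware placement policy.

```python
from dataclasses import dataclass

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default dfs.blocksize in Hadoop 2.x
REPLICATION = 3        # default dfs.replication


@dataclass
class Block:
    index: int
    size: int
    replicas: list


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and give each block `replication` distinct nodes.

    Placement is plain round-robin (a hypothetical simplification); real HDFS
    lets the NameNode choose nodes with a rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    blocks = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        size = min(block_size, file_size - offset)  # the last block may be smaller
        replicas = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, size, replicas))
        offset += size
    return blocks


if __name__ == "__main__":
    for block in plan_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
        print(block.index, block.size // MB, "MB", block.replicas)
```

A 300 MB file becomes three blocks (128, 128 and 44 MB), each stored on three different nodes, so losing any single node loses no data.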
Hadoop 2.0 vs Hadoop 1.0
Hadoop 1.0 Bottlenecks: HDFS/MapReduce
Hadoop 2.0 Architecture
YARN/MRv2 vs MRv1 Architecture
Hadoop 2.0 vs Hadoop 1.0 – Processing
The Hadoop Ecosystem
Hortonworks Hadoop Distribution
Classification of Hadoop Ecosystem Components
[Figure: layers labeled Administration and Server Coordination, Data Management, Workflow Engine, Distributed Storage, Resource Management, Processing Framework, API and Analytics, with the components Hue, Ambari, ZooKeeper, Flume, Sqoop, Oozie, Avro, HDFS, YARN, MapReduce, MapReduce v2, Tez, Hoya, Pig, HBase, Hive and Mahout]
Classification of Hadoop Ecosystem Components
Hadoop Ecosystem Components
Data Management Frameworks

| Framework | Description |
|---|---|
| Hadoop Distributed File System (HDFS) | A Java-based, distributed file system that provides scalable, reliable, high-throughput access to application data stored across commodity servers |
| Yet Another Resource Negotiator (YARN) | A framework for cluster resource management and job scheduling |
Operations Frameworks

| Framework | Description |
|---|---|
| Ambari | A web-based framework for provisioning, managing, and monitoring Hadoop clusters |
| ZooKeeper | A high-performance coordination service for distributed applications |
| Cloudbreak | A tool for provisioning and managing Hadoop clusters in the cloud |
| Oozie | A server-based workflow engine used to execute Hadoop jobs |
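ZooKeeper's coordination role is easiest to see in its classic leader-election recipe: every candidate creates an ephemeral, sequential znode under a shared parent, and the candidate holding the lowest sequence number is the leader. The sketch below simulates that recipe in memory. `FakeZooKeeper` and its methods are hypothetical stand-ins, not the ZooKeeper client API, and watches and session timeouts are left out.

```python
class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, illustration only)."""

    def __init__(self):
        self._next_seq = 0
        self._znodes = {}  # znode path -> session that created it

    def create_ephemeral_sequential(self, prefix, session):
        # ZooKeeper appends a zero-padded, 10-digit, increasing counter to the name.
        path = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self._znodes[path] = session
        return path

    def children(self, prefix):
        return sorted(path for path in self._znodes if path.startswith(prefix))

    def owner(self, path):
        return self._znodes[path]

    def close_session(self, session):
        # Ephemeral znodes vanish when the session that created them ends.
        self._znodes = {p: s for p, s in self._znodes.items() if s != session}


def current_leader(zk, prefix="/election/n_"):
    """The candidate holding the lowest sequence number is the leader."""
    candidates = zk.children(prefix)
    return zk.owner(candidates[0]) if candidates else None


if __name__ == "__main__":
    zk = FakeZooKeeper()
    for server in ["server-a", "server-b", "server-c"]:
        zk.create_ephemeral_sequential("/election/n_", server)
    print(current_leader(zk))  # server-a
    zk.close_session("server-a")  # simulate the leader crashing
    print(current_leader(zk))  # server-b
```

Because the znodes are ephemeral, a crashed leader's node disappears automatically and the next-lowest candidate takes over without any extra cleanup step.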
Ambari Web UI (REST)
Data Access Frameworks

| Framework | Description |
|---|---|
| Pig | A high-level platform for extracting, transforming, or analyzing large datasets |
| Hive | A data warehouse infrastructure that supports ad hoc SQL queries |
| HCatalog | A table information, schema, and metadata management layer supporting Hive, Pig, MapReduce, and Tez processing |
| Cascading | An application development framework for building data applications, abstracting the details of complex MapReduce programming |
| HBase | A scalable, distributed NoSQL database that supports structured data storage for large tables |
| Phoenix | A client-side SQL layer over HBase that provides low-latency access to HBase data |
| Accumulo | A low-latency, large table data storage and retrieval system with cell-level security |
| Storm | A distributed computation system for processing continuous streams of real-time data |
| Solr | A distributed search platform capable of indexing petabytes of data |
| Spark | A fast, general-purpose processing engine used to build and run sophisticated SQL, streaming, machine learning, or graph applications |
Governance and Integration Frameworks

| Framework | Description |
|---|---|
| Falcon | A data governance tool providing workflow orchestration, data lifecycle management, and data replication services |
| WebHDFS | A REST API that uses the standard HTTP verbs to access, operate, and manage HDFS |
| HDFS NFS Gateway | A gateway that enables access to HDFS as an NFS-mounted file system |
| Flume | A distributed, reliable, and highly available service that efficiently collects, aggregates, and moves streaming data |
| Sqoop | A set of tools for importing and exporting data between Hadoop and RDBMS systems |
| Kafka | A fast, scalable, durable, and fault-tolerant publish-subscribe messaging system |
| Atlas | A scalable and extensible set of core governance services enabling enterprises to meet compliance and data integration requirements |
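WebHDFS addresses every file by a URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>` and maps each operation to a standard HTTP verb. The sketch below only builds the request (nothing is sent). It assumes the Hadoop 2.x default NameNode HTTP port 50070 (Hadoop 3 uses 9870) and simple authentication through the `user.name` query parameter; the host name and the small set of supported operations are illustrative choices.

```python
from urllib.parse import quote, urlencode

# HTTP verb used by a few common WebHDFS operations.
OP_METHODS = {
    "OPEN": "GET",
    "GETFILESTATUS": "GET",
    "LISTSTATUS": "GET",
    "MKDIRS": "PUT",
    "RENAME": "PUT",
    "DELETE": "DELETE",
}


def webhdfs_request(host, path, op, port=50070, user=None, **params):
    """Build (method, url) for a WebHDFS call; nothing is sent over the network."""
    op = op.upper()
    if op not in OP_METHODS:
        raise ValueError(f"operation not covered by this sketch: {op}")
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = {"op": op}
    if user:
        query["user.name"] = user  # simple (non-Kerberos) authentication
    query.update(params)           # e.g. destination=... for RENAME
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"
    return OP_METHODS[op], url


if __name__ == "__main__":
    print(webhdfs_request("namenode", "/user/alice", "liststatus", user="alice"))
```

Reading a file with `op=OPEN` returns an HTTP redirect to a DataNode, which then streams the data, so the NameNode never carries file contents itself.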
Security Frameworks

| Framework | Description |
|---|---|
| HDFS | A storage management service providing file and directory permissions, even more granular file and directory access control lists, and transparent data encryption |
| YARN | A resource management service with access control lists controlling access to compute resources and YARN administrative functions |
| Hive | A data warehouse infrastructure service providing granular access controls to table columns and rows |
| Falcon | A data governance tool providing access control lists that limit who may submit Hadoop jobs |
| Knox | A gateway providing perimeter security to a Hadoop cluster |
| Ranger | A centralized security framework offering fine-grained policy controls for HDFS, Hive, HBase, Knox, Storm, Kafka, and Solr |
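HDFS file and directory permissions follow a POSIX-like model: each inode has an owner, a group and mode bits, and only the most specific class (owner, then group, then other) is checked. The sketch below models just that base check; extended ACLs and encryption are left out, and the superuser name `hdfs` is an assumption (HDFS treats whichever user started the NameNode as the superuser).

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o750


def is_permitted(inode, user, user_groups, wanted, superuser="hdfs"):
    """POSIX-style check: only the most specific class (owner, group, other) applies."""
    if user == superuser:
        return True  # the superuser bypasses permission checks
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7
    else:
        bits = inode.mode & 0o7
    return (bits & wanted) == wanted


if __name__ == "__main__":
    report = Inode(owner="alice", group="analysts", mode=0o750)
    print(is_permitted(report, "bob", {"analysts"}, READ))   # True: group may read
    print(is_permitted(report, "bob", {"analysts"}, WRITE))  # False: group may not write
```

With mode `0o750` the owner gets `rwx`, members of the group get `r-x`, and everyone else gets nothing.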
Ecosystem Component Versions
Hadoop Ecosystem Components’ Key Features
HADOOP ECOSYSTEM COMPONENTS
It's important to understand the components of the Hadoop ecosystem in order to build the right solution for a given business problem.
Classification of the Hadoop Ecosystem Components
Hadoop is the straightforward answer for processing Big Data.
The Hadoop ecosystem is a combination of technologies that together are well suited to solving data-oriented business problems.
CORE HADOOP
• Hadoop Distributed File System (HDFS): used for storing and managing big data sets with high Volume, Velocity and Variety.
• MapReduce: used for processing high volumes of distributed data.
• Yet Another Resource Negotiator (YARN): used for resource management, job scheduling and monitoring.
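The MapReduce programming model can be sketched as three phases: a map function emits key/value pairs, a shuffle groups all values by key, and a reduce function combines each group. The single-process word count below only mimics that data flow; in Hadoop the mappers run in parallel on the nodes holding the input splits, and the framework performs the shuffle across the network. The function names are illustrative, not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "YARN schedules Hadoop jobs"]
    print(word_count(docs))
```

Because each reduce call sees only one key's values, reducers for different keys can also run in parallel.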
DATA ACCESS
• Apache Pig: a high-level language built on top of MapReduce for analyzing large data sets and expressing data flows.
• Apache Hive: a high-level query language and data warehouse infrastructure built on top of Hadoop for data summarization, querying and analysis.
DATA STORAGE
• Apache HBase: a NoSQL database built on top of Hadoop for hosting very large tables with billions of rows and millions of columns.
• Cassandra: a NoSQL database based on a key-value model, designed for linear scalability and high availability.
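HBase's data model can be pictured as a sparse, sorted map: row key, then column (`family:qualifier`), then timestamped versions of a value. The toy class below models only that shape; `TinyTable` is a hypothetical name, not the HBase client API, and real HBase persists sorted files in HDFS and splits tables into regions served by RegionServers. The number of kept versions is a single table-wide setting here, whereas HBase configures it per column family.

```python
import time
from collections import defaultdict


class TinyTable:
    """Toy model of HBase's data model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families, max_versions=1):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.families = set(families)  # column families are fixed up front
        self.max_versions = max_versions
        self.rows = defaultdict(dict)  # only cells that were written exist (sparse)

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        versions = self.rows[row].setdefault(column, {})
        versions[ts] = value
        # Drop the oldest cells beyond max_versions.
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start_row, stop_row):
        """Yield rows in sorted row-key order, start_row inclusive, stop_row exclusive."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, self.rows[row]
```

Keeping rows sorted by key is what makes range scans over adjacent row keys cheap, which is why row-key design matters so much in HBase schemas.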
INTERACTION-VISUALIZATION-DEVELOPMENT
• HCatalog: exposes Hive metadata to other Hadoop applications such as Pig and MapReduce.
• Lucene: a high-performance, full-featured text search engine library written entirely in Java.
• Hama: a distributed framework based on Bulk Synchronous Parallel (BSP) computing for massive scientific computations such as matrix, graph and network algorithms.
• Crunch: a framework for writing, testing and running MapReduce pipelines.
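In the Bulk Synchronous Parallel model that Hama is built on, work proceeds in supersteps: each peer computes locally, sends messages, and then waits at a barrier; messages sent in one superstep are only read in the next. The plain-Python simulation below (not Hama's Java API) propagates the maximum value around a graph of peers and stops once no messages are in flight.

```python
def bsp_max(values, neighbors, max_supersteps=100):
    """Spread the maximum value through a graph of peers, one BSP superstep at a time.

    values:    peer id -> initial value
    neighbors: peer id -> list of peer ids it sends messages to
    Returns (final values, number of supersteps run).
    """
    current = dict(values)
    inbox = {peer: [] for peer in current}
    superstep = 0
    while superstep < max_supersteps:
        outbox = {peer: [] for peer in current}
        for peer in current:
            # Local computation on this peer's state and the messages it received.
            new_value = max([current[peer], *inbox[peer]])
            changed = new_value != current[peer]
            current[peer] = new_value
            # Communication: send in the first superstep, afterwards only after a change.
            if superstep == 0 or changed:
                for n in neighbors.get(peer, []):
                    outbox[n].append(new_value)
        # Barrier synchronization: messages sent now are read in the next superstep.
        inbox = outbox
        superstep += 1
        if not any(inbox.values()):
            break  # no messages in flight: every peer votes to halt
    return current, superstep


if __name__ == "__main__":
    ring = {0: [1], 1: [2], 2: [3], 3: [0]}
    print(bsp_max({0: 3, 1: 7, 2: 1, 3: 5}, ring))
```

The barrier is what keeps BSP programs easy to reason about: within a superstep no peer can observe another peer's partial progress.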
DATA INTELLIGENCE
• Apache Drill: a low-latency SQL query engine for Hadoop and NoSQL.
• Apache Mahout: a scalable machine learning library designed for building predictive analytics on Big Data. Mahout now has implementations on Apache Spark for faster in-memory computing.
DATA INTEGRATION
• Apache Sqoop: a tool for transferring bulk data between Hadoop and relational databases, in both directions (import and export).
• Apache Flume: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
• Apache Chukwa: a scalable log collector used for monitoring large distributed systems.
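A Flume agent moves events from a source through a channel to a sink, where each event carries headers and a body. The sketch below mimics that pipeline with a bounded in-memory buffer; `MemoryChannel` and `run_agent` are hypothetical names, Flume's transactional channel semantics are left out, and a Python list stands in for the real destination (for example an HDFS directory).

```python
from collections import deque


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink (simplified: no transactions)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            return False  # channel full: the source has to retry later
        self.events.append(event)
        return True

    def take(self, batch_size):
        batch = []
        while self.events and len(batch) < batch_size:
            batch.append(self.events.popleft())
        return batch


def run_agent(log_lines, channel, sink, batch_size=2):
    """Move log lines source -> channel -> sink until everything is delivered."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = deque(log_lines)
    while pending or channel.events:
        # Source: wrap each line as an event (headers + body) while the channel has room.
        while pending and channel.put({"headers": {}, "body": pending[0]}):
            pending.popleft()
        # Sink: drain one batch and write it to the destination.
        sink.extend(event["body"] for event in channel.take(batch_size))
    return sink
```

The channel decouples the two ends: a slow sink makes the channel fill up and the source back off, instead of events being dropped.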
MANAGEMENT, MONITORING and ORCHESTRATION
• Apache Ambari: simplifies Hadoop management by providing an interface for provisioning, managing and monitoring Apache Hadoop clusters.
• Apache ZooKeeper: maintains configuration information and naming, provides distributed synchronization, and provides group services.
• Apache Oozie: schedules workflows to manage Apache Hadoop jobs.
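An Oozie workflow is a directed acyclic graph (DAG) of actions, with parallel branches expressed through fork and join nodes. The sketch below shows the scheduling idea behind that: a topological sort groups actions into waves, where every action in a wave has all its dependencies finished and could run in parallel. It uses Python's standard `graphlib` module (Python 3.9+), not Oozie's XML workflow format, and the action names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_waves(dependencies):
    """Group workflow actions into waves; actions in one wave could run in parallel.

    dependencies: action name -> set of actions that must finish first.
    Raises graphlib.CycleError if the workflow is not a DAG.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical pipeline: import, clean, two parallel branches, then a report.
    workflow = {
        "sqoop-import": set(),
        "pig-clean": {"sqoop-import"},
        "hive-aggregate": {"pig-clean"},
        "mahout-train": {"pig-clean"},
        "report": {"hive-aggregate", "mahout-train"},
    }
    for step, wave in enumerate(execution_waves(workflow), start=1):
        print(step, wave)
```

The third wave (`hive-aggregate` and `mahout-train`) corresponds to what an Oozie workflow would express as a fork followed by a join before `report`.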
Where Can We Use Machine Learning (Data Science)
Healthcare
• Predict diagnosis
• Prioritize screenings
• Reduce readmission rates
Financial services
• Fraud detection/prevention
• Predict underwriting risk
• New account risk screens
Public Sector
• Analyze public sentiment
• Optimize resource allocation
• Law enforcement & security
Retail
• Product recommendation
• Inventory management
• Price optimization
Telco/mobile
• Predict customer churn
• Predict equipment failure
• Customer behavior analysis
Oil & Gas
• Predictive maintenance
• Seismic data management
• Predict well production levels
YARN as a Data Operating System
Applications Run Natively IN Hadoop
• Engines: BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), EXISTING (Slider), SEARCH (Solr)
• YARN (Cluster Resource Management)
• HDFS2 (Redundant, Reliable Storage)
Applications now run “in” Hadoop, instead of “on” Hadoop.
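What lets MapReduce, Tez, Storm, Spark and the other engines share one cluster is that YARN hands out resources as containers (a slice of memory and CPU cores) on NodeManagers. The sketch below places container requests with simple first-fit; the names and sizes are hypothetical, and it is not the algorithm of YARN's Capacity or Fair schedulers, which also consider queues, locality and fairness.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)


def allocate(nodes, requests):
    """First-fit placement of container requests (app, memory_mb, vcores) onto nodes.

    Returns the requests that could not be placed yet (they would wait in the queue).
    """
    waiting = []
    for app, mem, cores in requests:
        for node in nodes:
            if node.free_mb >= mem and node.free_vcores >= cores:
                node.free_mb -= mem
                node.free_vcores -= cores
                node.containers.append((app, mem, cores))
                break
        else:  # no node had room for this container
            waiting.append((app, mem, cores))
    return waiting


if __name__ == "__main__":
    cluster = [NodeManager("nm1", 8192, 4), NodeManager("nm2", 4096, 2)]
    asks = [("mapreduce", 2048, 1), ("spark", 4096, 2), ("storm", 4096, 2),
            ("tez", 2048, 1), ("spark", 4096, 2)]
    print("waiting:", allocate(cluster, asks))
    for nm in cluster:
        print(nm.name, [c[0] for c in nm.containers])
```

Because every engine asks for the same kind of container, batch, streaming and in-memory workloads can run side by side on the same nodes instead of on separate clusters.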
Modern Data Applications Approach to Insights
Traditional Analytics: Structured & Repeatable
• Structure built to store data
• Start with hypothesis; test against selected data
• Analyze after landing…
Next Generation Analytics: Iterative & Exploratory
• Data is the structure
• Data leads the way; explore all data, identify correlations
• Analyze in motion…
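The "data is the structure" side of this contrast relies on schema on read: raw records are stored as they arrive, and a schema is applied only when a question is asked, so different analyses can read the same data differently. The sketch below illustrates this on a small made-up CSV log; the sample records and the `read_with_schema` helper are hypothetical.

```python
import csv
import io

# Made-up raw log, stored exactly as it arrived (no schema enforced on write).
RAW_LOG = """2024-01-05,alice,login,
2024-01-05,bob,purchase,19.99
2024-01-06,alice,purchase,5.00
"""


def read_with_schema(raw, schema):
    """Schema on read: split and type the raw text only when a query runs.

    schema: list of (column_name, converter); extra raw fields are ignored.
    """
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        record = {}
        for (name, convert), value in zip(schema, fields):
            record[name] = convert(value) if value else None  # empty field -> None
        rows.append(record)
    return rows


if __name__ == "__main__":
    # One analyst reads the log as typed purchase events...
    sales = read_with_schema(RAW_LOG, [("day", str), ("user", str), ("event", str), ("amount", float)])
    print(sum(r["amount"] for r in sales if r["event"] == "purchase"))
    # ...another reads the same bytes as a simple activity log.
    print(read_with_schema(RAW_LOG, [("day", str), ("user", str)]))
```

With schema on write, the table definition would have to be agreed on before loading; here a new question only needs a new schema, not a reload of the data.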
Q&A?
Abzetdin Adamov, Assoc. Prof.
Email me at: [email protected]: @
Link to me at: www.linkedin.com/in/adamov
Visit my blog at: aadamov.wordpress.com