![Page 1: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/1.jpg)
© Hortonworks Inc. 2011–2016. All Rights Reserved
Hortonworks Data Cloud
Enterprise ready Hadoop on the cloud
蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks, @uprush
December 14, 2016
![Page 2: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/2.jpg)
About Me
蒋 逸峰 (しょう いつほう / Yifeng Jiang)
- Solutions Engineer, Hortonworks
- Apache HBase book author
- I like hiking & running
- Twitter: @uprush
![Page 3: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/3.jpg)
Hortonworks Data Platform (HDP)
![Page 4: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/4.jpg)
What's Missing?
- Ambari makes deploying HDP super easy, but...
  - It is not easy to get there
  - Cluster sizing
  - HW purchase, setup in DC, network
  - OS setup
- Takes three weeks on average, or even more
![Page 5: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/5.jpg)
![Page 6: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/6.jpg)
Introducing Hortonworks Data Cloud for AWS
- A new cloud product from Hortonworks
  - Powered by Hortonworks Data Platform
- Offers Pay-As-You-Go (PAYG) pricing
- Delivered and sold via AWS Marketplace
- Handles most common big data use cases with Apache Hadoop, Spark, and Hive
  - Choose from a set of prescriptive cluster types
- Focuses on ease of use and business agility
  - Avoids infinite configurability and customization
- Optional Free Community Support**

**Enterprise Support option coming soon
![Page 7: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/7.jpg)
DEMO
![Page 8: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/8.jpg)
Architecture
[Architecture diagram, running on Amazon Web Services]
- Cloudbreak Services: Cloud controller (aka Cloudbreak), Cloudbreak DB, Connector (AWS, GCE, Azure, OpenStack)
- Cloudbreak Deployer
- Access tools: Shell, REST API, Web UI
- HDP Cluster: ETL/EDW
  - Master Group: Hive, Spark
  - Ambari, Slave Group, Blueprint, S3a FileSystem
- HDP Cluster: Analytics
  - Master Group: LLAP, Zeppelin
  - Ambari, Slave Group, Blueprint, S3a FileSystem
![Page 9: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/9.jpg)
Hortonworks Data Cloud - Summary
- Launch and manage clusters by workload type
  - ETL/EDW, Data science, Business analytics
- Use highly scalable, durable storage for data (S3) & metadata (RDS)
- Share data and metadata among multiple ephemeral clusters
- Scale up and down at the click of a button
- Secure clusters with IAM roles, security groups, etc.
![Page 10: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/10.jpg)
Improving Enterprise Readiness
![Page 11: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/11.jpg)
Enterprise Readiness
Improving enterprise readiness in the cloud
- Cloud storage
- Security and governance
- Reliability and fault tolerance
![Page 12: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/12.jpg)
Matching Hadoop with the Cloud
Datacenter
- Data locality
- Consistent storage
- Single cluster administration

Cloud
- Scalable storage
- Customizability
- Cost effective compute

Combined
- Scalable storage with performance and consistency
- Customizability with ease of administration
- Cost effective compute with SLA policies
![Page 13: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/13.jpg)
Cloud Storage access facts
[Diagram: interaction models between Application and HDFS; labels: Input, Output, tmp, Copy]
- Cloud storage optimizes for scale
  - S3 data is replicated for high scale access, durability
- Data access is remote
  - Data locality is lost
  - Costlier metadata operations (e.g. hadoop fs -mv is actually a copy and delete)
- Eventual consistency
  - It takes time for the effect of modification operations to permeate to all copies
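
To make the cost of the "move" point concrete, here is a minimal sketch in plain Python. It assumes a toy in-memory object store (the `ObjectStore` class and `rename_prefix` helper are hypothetical, not part of Hadoop or any AWS SDK) and only counts simulated remote calls; it is an illustration of the copy-and-delete pattern, not how S3A is implemented.

```python
class ObjectStore:
    """Toy in-memory object store: flat keys, no real directories (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self.requests = 0  # number of simulated remote calls

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def copy(self, src, dst):
        self.requests += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests += 1
        del self.objects[key]

    def list(self, prefix):
        self.requests += 1
        return sorted(k for k in self.objects if k.startswith(prefix))


def rename_prefix(store, src_prefix, dst_prefix):
    """Emulate a directory rename: copy every object, then delete the source."""
    for key in store.list(src_prefix):
        new_key = dst_prefix + key[len(src_prefix):]
        store.copy(key, new_key)
        store.delete(key)


store = ObjectStore()
for i in range(3):
    store.put(f"tmp/part-{i}", b"rows")

before = store.requests
rename_prefix(store, "tmp/", "output/")
rename_calls = store.requests - before  # one list + (copy + delete) per object

print("remote calls for rename:", rename_calls)
print(store.list("output/"))
```

In this toy model the number of calls grows with the number of objects under the prefix, whereas a rename on HDFS is a single metadata operation.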
![Page 14: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/14.jpg)
Performance with Scalability
- General strategy: Optimize by workload types
- ETL workloads
  - Typical pipeline: Bring in data => Transform => Repair partitions => Compute statistics
  - Multiple metadata calls: Batched and issued in parallel for performance gains
- DistCp
  - Optimized buffer management for transferring large files
  - Randomize input to DistCp to avoid hot-spotting S3 nodes
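
The two ideas on this slide, parallel metadata calls and randomized copy order, can be sketched roughly as follows. This is an illustration under assumptions, not the HDP or DistCp code: `fetch_status` is a hypothetical stand-in for one remote metadata request, and the path names are made up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fetch_status(path):
    """Hypothetical stand-in for one remote metadata call (e.g. a HEAD request)."""
    return {"path": path, "length": len(path)}


def fetch_statuses_parallel(paths, max_workers=8):
    """Issue the metadata calls concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_status, paths))


def randomized_copy_list(paths, seed=None):
    """Return the paths in random order so neighbouring keys are not copied back to back."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Made-up, lexicographically sorted input, as a directory listing would produce.
paths = [f"logs/2016-12-{day:02d}/part-0" for day in range(1, 11)]

statuses = fetch_statuses_parallel(paths)
copy_order = randomized_copy_list(paths, seed=42)

print(len(statuses), "statuses fetched")
print(copy_order[:3])
```

The intuition behind shuffling is that a sorted file list sends consecutive requests to keys sharing a prefix, which may land on the same storage nodes; a random order spreads the load.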
![Page 15: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/15.jpg)
Performance with Scalability
- Analytics workloads
  - ORC file related optimizations
  - Support fast random access reads (both directions) by avoiding tearing down S3 HTTP connections
  - Pass index information to compute tasks as part of split data to avoid re-computation
- Status: Available, but performance optimizations never stop :)

https://hortonworks.github.io/hdp-aws/s3-performance/index.html
![Page 16: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/16.jpg)
Correctness with strong consistency
- Write operations followed by a read may not return correct results
  - Issues for data pipelines, multi-stage jobs, etc.
- S3Guard project: Intermediate, consistent metadata store
- Write calls from S3AFileSystem update both S3 and metadata store
- S3AFileSystem automatically tries to reconcile metadata between S3 and metadata store on subsequent reads
  - Inconsistencies are handled based on policy
- Status: In progress

https://issues.apache.org/jira/browse/HADOOP-13345
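
The idea of pairing an eventually consistent store with a consistent metadata store can be sketched in a few lines of Python. This is a toy model under assumptions, not the S3Guard implementation: both classes are hypothetical, the listing lag is simulated, and the reconciliation policy shown (trust the metadata store when the object store lags) is just one possible choice.

```python
class EventuallyConsistentStore:
    """Toy object store whose listing lags behind writes until sync() is called."""

    def __init__(self):
        self.objects = {}    # data that has been written
        self.listed = set()  # keys the listing currently shows

    def put(self, key, data):
        self.objects[key] = data  # stored, but not yet visible in listings

    def list(self):
        return set(self.listed)

    def sync(self):
        self.listed = set(self.objects)  # modifications have propagated


class GuardedFileSystem:
    """Writes go to both the object store and a consistent metadata table."""

    def __init__(self, store):
        self.store = store
        self.metadata = set()

    def write(self, key, data):
        self.store.put(key, data)
        self.metadata.add(key)

    def list(self):
        # Reconcile on read: union of what the store lists and what the
        # metadata table records (policy: trust the table when the store lags).
        return sorted(self.store.list() | self.metadata)


store = EventuallyConsistentStore()
fs = GuardedFileSystem(store)
fs.write("stage1/part-0", b"rows")

raw_listing = sorted(store.list())  # what a second job stage would see directly
guarded_listing = fs.list()         # what it sees through the metadata store

print("raw:", raw_listing)
print("guarded:", guarded_listing)
```

A multi-stage job that lists the output of the previous stage through the guarded path sees the new file immediately, while the raw listing misses it until the store catches up.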
![Page 17: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/17.jpg)
Securing data access via IAM Roles
- Integration with cloud provider
- Provide an IAM role as instance profile for a cluster
- Attach policies for accessing S3 to the role
  - E.g. read-only access for BI cluster to specific buckets
- Status: Available
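
As an illustration of the read-only example, the sketch below builds an IAM policy document for a set of buckets. The bucket name `example-bi-data` and the helper `read_only_s3_policy` are assumptions made for this example; the exact set of actions a BI cluster needs may differ, so treat this as a starting point rather than a recommended policy.

```python
import json


def read_only_s3_policy(bucket_names):
    """Build an IAM policy document granting read-only access to the given buckets."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself (for listing)
        resources.append(f"arn:aws:s3:::{name}/*")  # the objects inside the bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy = read_only_s3_policy(["example-bi-data"])
print(json.dumps(policy, indent=2))
```

The resulting JSON would be attached to the IAM role that is given to the cluster as its instance profile, so no access keys need to be stored on the nodes.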
![Page 18: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/18.jpg)
Data Security in Hadoop
Apache Ranger
- Fine grained, role-based access policies to data
  - Table/column level ACL
- Audit access information
- Row level filtering
- Dynamic data masking
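
To show what row level filtering and dynamic data masking mean for a query result, here is a conceptual sketch in plain Python. It does not use Ranger's policy format or API; the functions `mask_value` and `apply_policy`, the column names, and the sample rows are all hypothetical.

```python
def mask_value(value, visible=4):
    """Show only the last `visible` characters; replace the rest with 'x'."""
    text = str(value)
    if len(text) <= visible:
        return text
    return "x" * (len(text) - visible) + text[-visible:]


def apply_policy(rows, user_region, masked_columns):
    """Apply a row-level filter on 'region', then mask the listed columns."""
    result = []
    for row in rows:
        if row["region"] != user_region:
            continue  # row-level filtering: other regions are never returned
        visible_row = dict(row)
        for column in masked_columns:
            visible_row[column] = mask_value(row[column])
        result.append(visible_row)
    return result


# Made-up sample data.
rows = [
    {"region": "JP", "card": "4111111111111111", "amount": 120},
    {"region": "US", "card": "5500000000000004", "amount": 80},
]

visible = apply_policy(rows, user_region="JP", masked_columns=["card"])
print(visible)
```

The stored data is unchanged; the filter and mask are applied when the result is produced, which is why the masking is called dynamic.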
![Page 19: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/19.jpg)
Data Governance in Hadoop
Apache Atlas
- Auto discover & index metadata
- Tag data
- Track data lineage
![Page 20: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/20.jpg)
Data governance technical architecture - On Premise
[Diagram: On Premise HDP Cluster with Ranger Admin (Policy), Atlas Admin (Metadata), a Governed HDP Component (e.g. Hive) with Ranger Plugin and Atlas Plugin, LDAP/AD, and a Data Steward]
![Page 21: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/21.jpg)
Data Governance in the Cloud: Ease of administration with flexibility
- No longer a single compute cluster generating/accessing data
- Data & Metadata are still single and shared
- Evolve Atlas and Ranger to be data lake centric rather than cluster centric
  - Shared long running Admin components
  - Ephemeral plugins on compute clusters
- Status: Available as a Tech Preview

https://github.com/hortonworks/hdc-cli/blob/master/shared_cluster.md
![Page 22: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/22.jpg)
Shared Ranger/Atlas admin services
Available in Tech Preview in Hortonworks Data Cloud
[Diagram: an ETL-EDW Cluster and a Data Analytics Cluster, each with a Governed HDP Component (e.g. Hive), Ranger Plugin, and Atlas Plugin; Shared Enterprise Services with Ranger Admin (Policy) and Atlas Admin (Metadata); Cloud Controller, LDAP/AD, and a Data Steward]
![Page 23: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/23.jpg)
HDP Cloud Compute nodes on AWS
- Regular EC2 instances
- Can attach EBS volumes or ephemeral storage disks
- Grouped according to functionality/access requirements
- Opportunistic provisioning: spot instances (work in progress)

[Diagram: HDP Cluster with Gateway node (Ambari), Master Group, Group #1, Group #2, and Cloud Controller]
![Page 24: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/24.jpg)
HDP Cloud Compute nodes on AWS
![Page 25: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/25.jpg)
Reliability with cost benefits
- HDP host instances could become unhealthy
  - Unreliable underlying infrastructure
  - Spot instances are transient, dependent on bid price
  - SLA impact for workloads
- Automatically replace unhealthy nodes
  - No costs incurred if node is not functional
  - Replace unhealthy instances to maintain a desired capacity
- Status: Work in progress
![Page 26: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/26.jpg)
Auto-recovery of slave nodes
- Use Ambari to detect unhealthy status & notify Cloudbreak
- Decommission and terminate unhealthy instances
- Provision new instances and add to cluster

[Diagram: HDP Cluster with Gateway node (Ambari), Master Group, Group #1, Group #2, and Cloud Controller]
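
The detect, terminate, and provision steps can be read as a small reconciliation loop that keeps a node group at its desired capacity. The sketch below is a simplified illustration, not Cloudbreak or Ambari code: `recover`, `provision`, the node names, and the health table are all hypothetical, and real decommissioning would happen where the comment indicates.

```python
import itertools


def recover(cluster, desired_capacity, is_healthy, provision):
    """Drop unhealthy nodes and provision replacements up to desired_capacity."""
    healthy = [node for node in cluster if is_healthy(node)]
    removed = [node for node in cluster if not is_healthy(node)]
    # A real system would decommission and terminate each removed node here.
    while len(healthy) < desired_capacity:
        healthy.append(provision())
    return healthy, removed


# Hypothetical health status as a monitoring component might report it.
statuses = {"worker-a": True, "worker-b": False, "worker-c": True}
counter = itertools.count(1)


def provision():
    """Hypothetical stand-in for launching a new instance."""
    return f"replacement-{next(counter)}"


new_cluster, removed = recover(
    list(statuses), 3, lambda node: statuses.get(node, True), provision
)

print("removed:", removed)
print("cluster:", new_cluster)
```

Because the loop targets a capacity rather than a specific node, the same logic also covers spot instances that disappear when the bid price is exceeded.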
![Page 27: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/27.jpg)
Summary
![Page 28: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/28.jpg)
Our Connected Data Platform Solutions
Hortonworks: Powering the Future of Data (Every business is a data business, master value of data via open approach)
- Modern Data Applications (Cyber Security, IoT, Partners, Custom, etc.)
- Connected Data Platforms (Manage All Data: data-at-rest, data-in-motion, data center & cloud)
- Training | Consulting | Community Connection | Partnerworks
- Data Center Solutions: HDP, HDF, Syncsort, AtScale, Pivotal HDB, Others
- Cloud Solutions: Hortonworks Data Cloud for AWS, Azure HDInsight, Rackspace, Accenture, Others
- Enterprise Subscription: SmartSense operational svc's, 24x7 Support, Maintenance, etc.
![Page 29: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/29.jpg)
http://hortonworks.com/info/aws-marketplace-credits-signup/
![Page 30: Introduction to Hortonworks Data Cloud for AWS](https://reader034.vdocuments.site/reader034/viewer/2022042605/587151fe1a28ab8e5b8b464d/html5/thumbnails/30.jpg)
THANK YOU