Introduction to Hortonworks Data Cloud for AWS
Post on 08-Jan-2017
TRANSCRIPT
© Hortonworks Inc. 2011–2016. All Rights Reserved
Hortonworks Data Cloud
Enterprise-ready Hadoop on the cloud
蒋 逸峰 (しょう いつほう / Yifeng Jiang), Solutions Engineer, Hortonworks
@uprush
December 14, 2016
About Me
蒋 逸峰 (しょう いつほう / Yifeng Jiang)
• Solutions Engineer, Hortonworks
• Apache HBase book author
• I like hiking & running
• Twitter: @uprush
Hortonworks Data Platform (HDP)
What's Missing?
• Ambari makes deploying HDP super easy, but...
  – It is not easy to get there
  – Cluster sizing
  – HW purchase, setup in DC, network
  – OS setup
• Average three weeks or even more
Introducing Hortonworks Data Cloud for AWS
• A new cloud product from Hortonworks
  – Powered by Hortonworks Data Platform
• Offers Pay-As-You-Go (PAYG) pricing
• Delivered and sold via AWS Marketplace
• Handles most common big data use cases with Apache Hadoop, Spark, and Hive
  – Choose from a set of prescriptive cluster types
• Focuses on ease of use and business agility
  – Avoids infinite configurability and customization
• Optional Free Community Support**
** Enterprise Support option coming soon
DEMO
Architecture
[Architecture diagram. Labels: Amazon Web Services; Cloudbreak Services: cloud controller (aka Cloudbreak), Cloudbreak DB, connectors (AWS, GCE, Azure, OpenStack); Cloudbreak Deployer; access tools (Shell, REST API, Web UI); HDP Cluster: ETL/EDW (master group: Hive, Spark; Ambari; slave group; Blueprint; S3A FileSystem); HDP Cluster: Analytics (master group: LLAP, Zeppelin; Ambari; slave group; Blueprint; S3A FileSystem)]
Hortonworks Data Cloud - Summary
• Launch and manage clusters by workload type
  – ETL/EDW, Data science, Business analytics
• Use highly scalable, durable storage for data (S3) & metadata (RDS)
• Share data and metadata among multiple ephemeral clusters
• Scale up and down at the click of a button
• Secure clusters with IAM roles, security groups, etc.
Improving Enterprise Readiness
Enterprise Readiness
Improving enterprise readiness in the cloud
• Cloud storage
• Security and governance
• Reliability and fault tolerance
Matching Hadoop with the Cloud
Datacenter
• Data locality
• Consistent storage
• Single cluster administration
Cloud
• Scalable storage
• Customizability
• Cost effective compute
Hadoop matched with the cloud
• Scalable storage with performance and consistency
• Customizability with ease of administration
• Cost effective compute with SLA policies
Cloud Storage access facts
[Diagram: interaction models. Labels: Application, HDFS, Input, Output, tmp, Copy]
• Cloud storage optimizes for scale
  – S3 data is replicated for high scale access, durability
• Data access is remote
  – Loss of data locality
  – Costlier metadata operations (e.g. hadoop fs -mv is actually a copy and delete)
• Eventual consistency
  – Takes time for the effect of modification operations to permeate to all copies
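To make the cost of the metadata operation mentioned above concrete, here is a minimal Python sketch of a directory move on a flat object store, done as a copy followed by a delete of every object under the prefix. `ToyObjectStore` and its methods are hypothetical and in-memory only; this is an illustration of the idea, not S3 or Hadoop code.

```python
class ToyObjectStore:
    """In-memory stand-in for a flat key/value object store (hypothetical)."""

    def __init__(self):
        self.objects = {}       # key -> bytes
        self.bytes_copied = 0   # total data copied by "move" operations

    def put(self, key, data):
        self.objects[key] = data

    def move_prefix(self, src_prefix, dst_prefix):
        """Emulate a directory rename: copy each object to its new key, then delete the old key."""
        for key in [k for k in self.objects if k.startswith(src_prefix)]:
            new_key = dst_prefix + key[len(src_prefix):]
            self.objects[new_key] = self.objects[key]    # copy
            self.bytes_copied += len(self.objects[key])
            del self.objects[key]                        # delete


store = ToyObjectStore()
store.put("tmp/part-0", b"a" * 10)
store.put("tmp/part-1", b"b" * 20)
store.move_prefix("tmp/", "output/")

print(sorted(store.objects))   # ['output/part-0', 'output/part-1']
print(store.bytes_copied)      # 30
```

In this toy model the work done by a move grows with the amount of data under the prefix, whereas a rename on HDFS is a metadata-only change.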
Performance with Scalability
• General strategy: Optimize by workload types
• ETL workloads
  – Typical pipeline: Bring in data => Transform => Repair partitions => Compute statistics
  – Multiple metadata calls: Batched and issued in parallel for performance gains
• DistCp
  – Optimized buffer management for transferring large files
  – Randomize input to DistCp to avoid hot-spotting S3 nodes
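The randomization idea above can be sketched in a few lines of Python: shuffle the list of paths to copy so that consecutive copy tasks are less likely to work on keys that share a prefix. This is only an illustration of the idea, not DistCp's actual implementation; `randomize_copy_list`, the bucket name, and the paths are hypothetical.

```python
import random


def randomize_copy_list(paths, seed=None):
    """Return a shuffled copy of the path list; the input list is left untouched."""
    rng = random.Random(seed)   # a seed makes the order reproducible for testing
    shuffled = list(paths)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical source listing: files grouped by date prefix, in sorted order.
paths = [
    f"s3a://example-bucket/logs/2016-12-{day:02d}/part-{n}"
    for day in range(1, 4)
    for n in range(3)
]

shuffled = randomize_copy_list(paths, seed=42)
for path in shuffled:
    print(path)
```

The shuffled list contains exactly the same paths as the input, only in a different order, so the set of files copied does not change.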
Performance with Scalability
• Analytics workloads
  – ORC file related optimizations
  – Support fast random access reads (both directions) by avoiding tearing down S3 HTTP connections
  – Pass index information to compute tasks as part of split data to avoid re-computation
• Status: Available, but performance optimizations never stop :)
https://hortonworks.github.io/hdp-aws/s3-performance/index.html
Correctness with strong consistency
• Write operations followed by a read may not return correct results
  – Issues for data pipelines, multi-stage jobs, etc.
• S3Guard project: Intermediate, consistent metadata store
• Write calls from S3AFileSystem update both S3 and the metadata store
• S3AFileSystem automatically tries to reconcile metadata between S3 and the metadata store on subsequent reads
  – Inconsistencies are handled based on policy
• Status: In progress
https://issues.apache.org/jira/browse/HADOOP-13345
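The approach described above can be illustrated with a toy Python model: every write updates both the object store and a consistent metadata store, and a listing merges the two. All class and method names here are hypothetical, and the reconciliation policy (trust the metadata store for keys the object store has not listed yet) is an assumption chosen for the example; it is not the S3Guard code.

```python
class LaggingObjectStore:
    """Toy eventually consistent store: new keys appear in listings only after settle()."""

    def __init__(self):
        self._visible = {}
        self._pending = {}

    def put(self, key, data):
        self._pending[key] = data

    def settle(self):
        """Simulate the modification reaching all copies."""
        self._visible.update(self._pending)
        self._pending.clear()

    def list_keys(self):
        return set(self._visible)


class GuardedFileSystem:
    """Writes go to both the store and a consistent metadata set; listings merge both."""

    def __init__(self, store):
        self.store = store
        self.metadata = set()

    def write(self, key, data):
        self.store.put(key, data)
        self.metadata.add(key)

    def list_keys(self):
        listed = self.store.list_keys()
        missing_from_store = self.metadata - listed   # written, but not yet listed
        self.metadata |= listed                       # reconcile on read
        return sorted(listed | missing_from_store)


store = LaggingObjectStore()
fs = GuardedFileSystem(store)
fs.write("warehouse/sales/part-0", b"rows")

raw_listing = sorted(store.list_keys())   # the store alone does not show the new key yet
guarded_listing = fs.list_keys()          # the guarded view already includes it
print(raw_listing, guarded_listing)
```

In this model a job that lists its input right after an upstream write would miss the new file when reading the raw listing, but not when reading the guarded one.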
Securing data access via IAM Roles
• Integration with cloud provider
• Provide an IAM role as instance profile for a cluster
• Attach policies for accessing S3 to the role
  – E.g. read-only access for a BI cluster to specific buckets
• Status: Available
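As a rough illustration of the read-only example above, the following Python sketch builds an IAM policy document that allows listing and reading, but not writing, a given set of buckets. The function name and the bucket name `example-bi-reports` are hypothetical; the policy would still need to be attached to the role used as the cluster's instance profile, which is not shown here.

```python
import json


def read_only_s3_policy(bucket_names):
    """Build an IAM policy document granting list and read access to the given buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing is a bucket-level action, so it targets the bucket ARN.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arns},
            # Reading is an object-level action, so it targets the objects in the bucket.
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": object_arns},
        ],
    }


policy = read_only_s3_policy(["example-bi-reports"])
print(json.dumps(policy, indent=2))
```

Because no write or delete actions are listed, a cluster using a role with only this policy could read the named buckets but not modify them.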
Data Security in Hadoop
Apache Ranger
• Fine grained, role-based access policies to data
  – Table/column level ACL
• Audit access information
• Row level filtering
• Dynamic data masking
Data Governance in Hadoop
Apache Atlas
• Auto discover & index metadata
• Tag data
• Track data lineage
Data governance technical architecture – On Premise
[Diagram: On Premise HDP Cluster. Labels: Ranger Admin (Policy), Atlas Admin (Metadata), Governed HDP Component (e.g. Hive) with Ranger Plugin and Atlas Plugin, LDAP/AD, Data Steward]
Data Governance in the Cloud: Ease of administration with flexibility
• No longer a single compute cluster generating/accessing data
• Data & metadata are still single and shared
• Evolve Atlas and Ranger to be data lake centric rather than cluster centric
  – Shared long running Admin components
  – Ephemeral plugins on compute clusters
• Status: Available as a Tech Preview
https://github.com/hortonworks/hdc-cli/blob/master/shared_cluster.md
Shared Ranger / Atlas admin services
Available in Tech Preview in Hortonworks Data Cloud
[Diagram: an ETL-EDW Cluster and a Data Analytics Cluster, each with a Governed HDP Component (e.g. Hive), a Ranger Plugin, and an Atlas Plugin; Shared Enterprise Services with Ranger Admin (Policy) and Atlas Admin (Metadata). Other labels: LDAP/AD, Cloud Controller, Data Steward]
HDP Cloud Compute nodes on AWS
• Regular EC2 instances
• Can attach EBS volumes or ephemeral storage disks
• Grouped according to functionality / access requirements
• Opportunistic provisioning – spot instances (work in progress)
[Diagram: HDP Cluster. Labels: Master Group, Group #1, Gateway node: Ambari, Group #2, Cloud Controller]
Reliability with cost benefits
• HDP host instances could become unhealthy
  – Unreliable underlying infrastructure
  – Spot instances are transient, dependent on bid price
  – SLA impact for workloads
• Automatically replace unhealthy nodes
  – No costs incurred if a node is not functional
  – Replace unhealthy instances to maintain a desired capacity
• Status: Work in progress
Auto-recovery of slave nodes
• Use Ambari to detect unhealthy status & notify Cloudbreak
• Decommission and terminate unhealthy instances
• Provision new instances and add to cluster
[Diagram: HDP Cluster. Labels: Master Group, Group #1, Gateway node: Ambari, Group #2, Cloud Controller]
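The recovery steps above can be sketched as a small Python function: find unhealthy nodes, remove them from the cluster, then provision replacements until the desired capacity is restored. The function, the health check, and the provisioning callback are all hypothetical stand-ins; in the product these roles are played by Ambari and Cloudbreak, whose APIs are not shown here.

```python
import itertools


def recover_cluster(cluster, desired_capacity, is_healthy, provision):
    """Replace unhealthy nodes in place and return the list of nodes that were removed."""
    unhealthy = [node for node in cluster if not is_healthy(node)]
    for node in unhealthy:
        cluster.remove(node)            # stands in for decommission + terminate
    while len(cluster) < desired_capacity:
        cluster.append(provision())     # stands in for provisioning and joining a new instance
    return unhealthy


# Hypothetical three-node slave group in which worker-2 has failed.
cluster = ["worker-1", "worker-2", "worker-3"]
failed = {"worker-2"}
next_id = itertools.count(4)

replaced = recover_cluster(
    cluster,
    desired_capacity=3,
    is_healthy=lambda node: node not in failed,
    provision=lambda: f"worker-{next(next_id)}",
)

print(replaced)   # ['worker-2']
print(cluster)    # ['worker-1', 'worker-3', 'worker-4']
```

Driving the loop from a desired capacity rather than a one-for-one swap means the same function also tops the group back up if several nodes fail at once.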
Summary
Our Connected Data Platform Solutions
Hortonworks: Powering the Future of Data (Every business is a data business, master value of data via open approach)
Modern Data Applications (Cyber Security, IoT, Partners, Custom, etc.)
Connected Data Platforms (Manage All Data: data-at-rest, data-in-motion, data center & cloud)
Training | Consulting | Community Connection | Partnerworks
Data Center Solutions: HDP, HDF, Syncsort, AtScale, Pivotal HDB, Others
Cloud Solutions: Hortonworks Data Cloud for AWS, Azure HDInsight, Rackspace, Accenture, Others
Enterprise Subscription: SmartSense, operational svc's, 24x7 Support, Maintenance, etc.
http://hortonworks.com/info/aws-marketplace-credits-signup/
THANK YOU