Introduction to Hortonworks Data Cloud for AWS

Post on 08-Jan-2017


© Hortonworks Inc. 2011–2016. All Rights Reserved

Hortonworks Data Cloud
Enterprise ready Hadoop on the cloud

蒋 逸峰 (しょう いつほう / Yifeng Jiang)
Solutions Engineer, Hortonworks
@uprush
December 14, 2016

About Me

蒋 逸峰 (しょう いつほう / Yifeng Jiang)
• Solutions Engineer, Hortonworks
• Apache HBase book author
• I like hiking & running
• Twitter: @uprush

Hortonworks Data Platform (HDP)

What's Missing?

→ Ambari makes deploying HDP super easy, but...
  – It is not easy to get there
  – Cluster sizing
  – HW purchase, setup in DC, network
  – OS setup
→ On average, this takes three weeks or even more


Introducing Hortonworks Data Cloud for AWS

→ A new cloud product from Hortonworks
  – Powered by Hortonworks Data Platform
→ Offers Pay-As-You-Go (PAYG) pricing
→ Delivered and sold via AWS Marketplace
→ Handles most common big data use cases with Apache Hadoop, Spark, and Hive
  – Choose from a set of prescriptive cluster types
→ Focuses on ease of use and business agility
  – Avoids infinite configurability and customization
→ Optional Free Community Support**

** Enterprise Support option coming soon


DEMO

Architecture

[Diagram: Cloudbreak services and HDP clusters on Amazon Web Services. Labels:]
• Cloudbreak Services: Cloud controller (aka Cloudbreak), Cloudbreak DB, Cloudbreak Deployer
• Connector: AWS, GCE, Azure, OpenStack
• Access tools: Shell, REST API, Web UI
• HDP Cluster: ETL/EDW: Master Group (Hive, Spark), Ambari, Slave Group, Blueprint, S3a FileSystem
• HDP Cluster: Analytics: Master Group (LLAP, Zeppelin), Ambari, Slave Group, Blueprint, S3a FileSystem

Hortonworks Data Cloud - Summary

→ Launch and manage clusters by workload type
  – ETL/EDW, Data science, Business analytics
→ Use highly scalable, durable storage for data (S3) & metadata (RDS)
→ Share data and metadata among multiple ephemeral clusters
→ Scale up and down at the click of a button
→ Secure clusters with IAM roles, security groups, etc.

Improving Enterprise Readiness

Enterprise Readiness

Improving enterprise readiness in the cloud
→ Cloud storage
→ Security and governance
→ Reliability and fault tolerance

Matching Hadoop with the Cloud

Datacenter
• Data Locality
• Consistent Storage
• Single cluster administration

Cloud
• Scalable storage
• Customizability
• Cost effective compute

• Scalable storage with performance and consistency
• Customizability with ease of administration
• Cost effective compute with SLA policies

Cloud Storage access facts

[Diagram: Interaction models. Labels: Application; HDFS; Input, Output, tmp; Copy]

→ Cloud storage optimizes for scale
  – S3 data is replicated for high scale access, durability
→ Data access is remote
  – Data locality
  – Costlier metadata operations (e.g. hadoop fs -mv is actually a copy and delete)
→ Eventual consistency
  – Takes time for the effect of modification operations to permeate to all copies
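
The point about costlier metadata operations can be illustrated with a toy model. The sketch below is not an S3 client; `ToyObjectStore` and its methods are hypothetical names for an in-memory key/value store with no real directories, where a "rename" has to copy every object under a prefix and then delete the originals.

```python
class ToyObjectStore:
    """In-memory stand-in for a flat object store (hypothetical, not an S3 client)."""

    def __init__(self):
        self.objects = {}       # key -> bytes
        self.bytes_copied = 0   # total data moved by copy operations

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)

    def delete(self, key):
        del self.objects[key]

    def rename_prefix(self, src_prefix, dst_prefix):
        """Emulate a directory rename: copy each object, then delete the originals."""
        keys = [k for k in self.objects if k.startswith(src_prefix)]
        for key in keys:
            self.copy(key, dst_prefix + key[len(src_prefix):])
        for key in keys:
            self.delete(key)
        return len(keys)


store = ToyObjectStore()
store.put("tmp/part-0000", b"a" * 1000)
store.put("tmp/part-0001", b"b" * 2000)
moved = store.rename_prefix("tmp/", "output/")
```

In this model the cost of the rename grows with the amount of data under the prefix (`store.bytes_copied`), whereas a rename on HDFS is a metadata-only operation.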

Performance with Scalability

→ General strategy: Optimize by workload types
→ ETL workloads
  – Typical pipeline: Bring in data => Transform => Repair partitions => Compute statistics
  – Multiple metadata calls: Batched and issued in parallel for performance gains
→ Distcp
  – Optimized buffer management for transferring large files
  – Randomize input to Distcp to avoid hot-spotting S3 nodes
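
Two of the ideas above, batching metadata calls and issuing them in parallel, and randomizing the copy order, can be sketched in a few lines. This is an illustration under assumptions, not the HDP or Distcp implementation: `fetch_partition_metadata` is a hypothetical stand-in for one remote metadata call, and the partition names are made up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fetch_partition_metadata(partition):
    """Hypothetical stand-in for one remote metadata call (e.g. one listing request)."""
    return {"partition": partition, "files": 1}


def fetch_all_metadata(partitions, batch_size=4, max_workers=8):
    """Split the calls into batches and run the batches in parallel threads."""
    batches = [partitions[i:i + batch_size]
               for i in range(0, len(partitions), batch_size)]

    def run_batch(batch):
        return [fetch_partition_metadata(p) for p in batch]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves batch order, so results line up with the input.
        results = pool.map(run_batch, batches)
        return [item for batch in results for item in batch]


def randomized_copy_order(paths, seed=None):
    """Return the paths shuffled so neighbouring keys are not copied back to back."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled


partitions = [f"dt=2016-12-{day:02d}" for day in range(1, 11)]
metadata = fetch_all_metadata(partitions)
copy_order = randomized_copy_order(partitions, seed=42)
```

Threads suit this kind of work because each call mostly waits on the network. The shuffle spreads consecutive requests across key ranges instead of walking them in sorted order.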

Performance with Scalability

→ Analytics workloads
  – ORC file related optimizations
  – Support fast random access reads (both directions) by avoiding tearing down S3 HTTP connections
  – Pass index information to compute tasks as part of split data to avoid re-computation
→ Status: Available, but performance optimizations never stop :)

https://hortonworks.github.io/hdp-aws/s3-performance/index.html

Correctness with strong consistency

→ Write operations followed by a read may not return correct results
  – Issues for data pipelines, multi-stage jobs, etc.
→ S3Guard project: Intermediate, consistent metadata store
→ Write calls from S3AFileSystem update both S3 and metadata store
→ S3AFileSystem automatically tries to reconcile metadata between S3 and metadata store on subsequent reads
  – Inconsistencies are handled based on policy
→ Status: In progress

https://issues.apache.org/jira/browse/HADOOP-13345
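
The idea of a consistent metadata store sitting beside an eventually consistent object store can be shown with a toy model. The sketch below is not S3Guard's code or API. `LaggingStore` and `GuardedFileSystem` are hypothetical classes, and the reconciliation policy (trust the metadata store for keys the listing has not caught up on) is an assumption chosen for illustration.

```python
class LaggingStore:
    """Toy object store whose listing only shows keys after sync()."""

    def __init__(self):
        self._data = {}
        self._visible = set()

    def put(self, key, data):
        self._data[key] = data      # data is stored immediately...

    def get(self, key):
        return self._data[key]

    def list_keys(self):
        return set(self._visible)   # ...but listings lag behind

    def sync(self):
        self._visible = set(self._data)


class GuardedFileSystem:
    """Writes update both the store and a metadata store; listings reconcile the two."""

    def __init__(self, store):
        self.store = store
        self.metadata = set()       # hypothetical consistent metadata store

    def write(self, key, data):
        self.store.put(key, data)
        self.metadata.add(key)

    def list_keys(self):
        listed = self.store.list_keys()
        # Assumed policy: trust the metadata store for keys the listing lacks.
        return sorted(listed | self.metadata)


store = LaggingStore()
fs = GuardedFileSystem(store)
fs.write("stage1/part-0000", b"rows")
raw_listing = store.list_keys()
guarded_listing = fs.list_keys()
```

Here a second pipeline stage that lists through `fs` sees the file written by the first stage, while a raw listing of the lagging store still returns nothing.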

Securing data access via IAM Roles

→ Integration with cloud provider
→ Provide an IAM role as instance profile for a cluster
→ Attach policies for accessing S3 to the role
  – E.g. Read-only access for BI cluster to specific buckets
→ Status: Available
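
As an illustration of the read-only example, the sketch below builds an IAM policy document that allows listing and reading from specific buckets only. The helper name `read_only_s3_policy` and the bucket name `example-bi-reports` are made up. The policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`) and the `s3:ListBucket` and `s3:GetObject` actions are standard IAM elements.

```python
import json


def read_only_s3_policy(bucket_names):
    """Build an IAM policy document granting list and read access to the given buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{name}" for name in bucket_names],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{name}/*" for name in bucket_names],
            },
        ],
    }


# "example-bi-reports" is a made-up bucket name.
policy = read_only_s3_policy(["example-bi-reports"])
policy_json = json.dumps(policy, indent=2)
```

Because no write actions are granted, a cluster whose instance profile role carries only this policy can read the listed buckets but cannot modify them.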

Data Security in Hadoop

Apache Ranger
→ Fine grained, role-based access policies to data
  – Table/column level ACL
→ Audit access information
→ Row level filtering
→ Dynamic data masking

Data Governance in Hadoop

Apache Atlas
→ Auto discover & index metadata
→ Tag data
→ Track data lineage

Data governance technical architecture – On Premise

[Diagram: On Premise HDP Cluster. Labels: Ranger Admin, Policy, Atlas Admin, Metadata, Governed HDP Component (e.g. Hive), Ranger Plugin, Atlas Plugin, LDAP/AD, Data Steward]

Data Governance in the Cloud: Ease of administration with flexibility

→ No longer a single compute cluster generating/accessing data
→ Data & Metadata are still single and shared
→ Evolve Atlas and Ranger to be data lake centric rather than cluster centric
  – Shared long running Admin components
  – Ephemeral plugins on compute clusters
→ Status: Available as a Tech Preview

https://github.com/hortonworks/hdc-cli/blob/master/shared_cluster.md

Shared Ranger/Atlas admin services

Available in Tech Preview in Hortonworks Data Cloud

[Diagram labels: ETL-EDW Cluster and Data Analytics Cluster, each with Governed HDP Component (e.g. Hive), Ranger Plugin, Atlas Plugin; LDAP/AD; Shared Enterprise Services: Ranger Admin, Policy, Atlas Admin, Metadata; Cloud Controller; Data Steward]

HDP Cloud Compute nodes on AWS

→ Regular EC2 instances
→ Can attach EBS volumes or ephemeral storage disks
→ Grouped according to functionality/access requirements
→ Opportunistic provisioning – spot instances (work in progress)

[Diagram: HDP Cluster. Labels: Master Group, Group #1, Group #2, Gateway node: Ambari, Cloud Controller]


Reliability with cost benefits

→ HDP host instances could become unhealthy
  – Unreliable underlying infrastructure
  – Spot instances are transient, dependent on bid price
  – SLA impact for workloads
→ Automatically replace unhealthy nodes
  – No costs incurred if node is not functional
  – Replace unhealthy instances to maintain a desired capacity
→ Status: Work in progress

Auto-recovery of slave nodes

→ Use Ambari to detect unhealthy status & notify Cloudbreak
→ Decommission and terminate unhealthy instances
→ Provision new instances and add to cluster

[Diagram: HDP Cluster. Labels: Master Group, Group #1, Group #2, Gateway node: Ambari, Cloud Controller]
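
The recovery steps on this slide can be sketched as a small loop. This is a toy model rather than the Ambari or Cloudbreak implementation: nodes are plain dictionaries, `provision_node` is a hypothetical stand-in for launching an instance, and the health flag is set by hand where a real system would receive an alert.

```python
import itertools

_ids = itertools.count(1)


def provision_node():
    """Hypothetical stand-in for launching a new instance."""
    return {"id": f"node-{next(_ids)}", "healthy": True}


def recover(cluster, desired_capacity):
    """Remove unhealthy nodes, then provision replacements up to the desired capacity."""
    unhealthy = [n for n in cluster if not n["healthy"]]
    for node in unhealthy:
        cluster.remove(node)            # decommission and terminate
    while len(cluster) < desired_capacity:
        cluster.append(provision_node())
    return [n["id"] for n in unhealthy]


cluster = [provision_node() for _ in range(3)]
cluster[1]["healthy"] = False           # simulate a health alert for one node
replaced = recover(cluster, desired_capacity=3)
```

Expressing the goal as a desired capacity keeps the loop idempotent: running it again on a healthy, full-size cluster changes nothing.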


Summary

Our Connected Data Platform Solutions

Hortonworks: Powering the Future of Data (Every business is a data business, master value of data via open approach)

Modern Data Applications (Cyber Security, IoT, Partners, Custom, etc.)

Connected Data Platforms (Manage All Data: data-at-rest, data-in-motion, data center & cloud)

Training | Consulting | Community Connection | Partnerworks

Data Center Solutions | Cloud Solutions

Hortonworks Data Cloud for AWS

Azure HDInsight

Rackspace, Accenture, Others

HDP, HDF, Syncsort, AtScale

Pivotal HDB, Others

Enterprise Subscription

SmartSense operational svc's, 24x7 Support, Maintenance, etc.


http://hortonworks.com/info/aws-marketplace-credits-signup/

THANK YOU
