BigDataTech 2015: Is Hadoop enterprise ready?
Posted on 15-Apr-2017
ISP promotes ambitious goals (actuals 2014)
• WPC: 85%
• FTEs: 489.3
• Headcount: 490
• Process maturity: 4 (assessed by Ernst & Young)
• Average systems availability
• 100% of KPIs on target
• General IT controls: 86%
ISP has been growing as a solid business partner
• Countries: 18
• SLAs: 191
• Business partners: 35+
• Customer satisfaction: 8.1 (1-10 scale, Q4 2014)
• Services: security monitoring, remote management, system hosting, security services
The "A" team
• Don't hire, train them!
• Break out of the silo mentality
• DevOps
• Agile
• Let them choose their own tools
• Automation
http://www.pragmatictestlabs.com/
Cloud vs on-premise
• Legal and regulatory issues (e.g. data locality, limited responsibility)
• Network speed (we are talking BIG data)
• Time to market
• Initial costs
http://www.softwarefit.com/cloud-erp-vs-on-premise-erp/
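To put "we are talking BIG data" into numbers, here is a back-of-the-envelope sketch of how long a bulk upload to a cloud provider would take. It assumes decimal units and an optional, hypothetical `efficiency` factor for protocol overhead; real throughput is usually below line rate.

```python
def transfer_seconds(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move data_bytes over a link of link_gbps (10^9 bit/s).

    efficiency (0 < efficiency <= 1) is an assumed fraction of the line rate
    actually achieved after protocol overhead and contention.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    TB = 1e12  # decimal terabyte
    for gbps in (1, 10):
        days = transfer_seconds(100 * TB, gbps) / 86400
        print(f"100 TB over {gbps} Gbit/s at full line rate: {days:.1f} days")
```

At full line rate this prints about 9.3 days for 100 TB over 1 Gbit/s and 0.9 days over 10 Gbit/s, before any overhead.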
Basic network principles
• Machines should be on a network isolated from the rest of the data center
• Machines should have static IPs
• Reverse DNS should be set up
• Top-of-rack switches – Hadoop servers are quite chatty
• Multi-homed networks are tricky
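A quick way to verify the static-IP and reverse-DNS points before installing anything is to check that each node's FQDN resolves forward and back to the same name. This sketch uses the standard library resolver functions; the resolver parameters exist only so the function can be tested without DNS, and the host names in the usage note are hypothetical.

```python
import socket
from typing import Callable, Optional


def check_dns(fqdn: str,
              forward: Callable[[str], str] = socket.gethostbyname,
              reverse: Callable[[str], tuple] = socket.gethostbyaddr) -> Optional[str]:
    """Return None if fqdn resolves to an IP whose reverse (PTR) lookup
    gives back the same name, otherwise a short description of the problem."""
    try:
        ip = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"forward lookup failed: {exc}"
    try:
        name, aliases, _ = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return f"reverse lookup for {ip} failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    names = {n.lower().rstrip(".") for n in [name, *aliases]}
    if fqdn.lower().rstrip(".") not in names:
        return f"{fqdn} -> {ip} -> {name} (mismatch)"
    return None
```

Usage, with hypothetical hosts: `for host in ["node01.hadoop.example.com", "node02.hadoop.example.com"]: print(host, check_dns(host) or "OK")`.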
VLAN configuration example
VLAN         | Fabric | NIC port | Function                      | Failover
vlan160_mgmt | A      | eth0     | Management, user connectivity | Fabric failover to B
vlan12_HDFS  | B      | eth1     | Hadoop                        | Fabric failover to A
vlan11_DATA  | A      | eth2     | SAN/NAS access, ETL           | Fabric failover to B
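With one NIC per VLAN as in the table, each node ends up with three statically addressed interfaces. A minimal sketch that renders RHEL `ifcfg-*` files for them; the addresses and /24 prefixes are hypothetical, and gateway and DNS settings are intentionally left out.

```python
from typing import Dict, List


def ifcfg(device: str, ip: str, prefix: int) -> str:
    """Render a static-IP RHEL network-scripts file (ifcfg-<device>)."""
    lines: List[str] = [
        f"DEVICE={device}",
        "BOOTPROTO=none",  # static addressing, no DHCP
        "ONBOOT=yes",
        f"IPADDR={ip}",
        f"PREFIX={prefix}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical addressing plan for one node, one NIC per VLAN.
NODE_PLAN: Dict[str, str] = {
    "eth0": "10.160.0.11",  # vlan160_mgmt
    "eth1": "10.12.0.11",   # vlan12_HDFS
    "eth2": "10.11.0.11",   # vlan11_DATA
}

if __name__ == "__main__":
    for dev, addr in NODE_PLAN.items():
        print(f"# /etc/sysconfig/network-scripts/ifcfg-{dev}")
        print(ifcfg(dev, addr, 24))
```

Adding `GATEWAY` to only one of these files (typically the management interface) keeps a single default route, which is one reason multi-homed hosts need care.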
Cisco reference architecture
Linux general recommendations
• Use FQDNs – required by Ambari and Kerberos
• Disable iptables – since we are within an isolated network
• Disable SELinux – enabling it can be very challenging
• Set swappiness to 1
• Set ulimits to 64k
• Disable Transparent Huge Pages
• Disable atime updates (mount with noatime)
• Enable NTP
• JBOD for Hadoop drives
• RAID1 for system drives (if dedicated)
http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
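Several of these settings can be verified from plain file contents. A sketch that checks only swappiness, Transparent Huge Pages and noatime, assuming data disks are mounted under a hypothetical `/grid` prefix. On a node you would pass in the contents of `/proc/sys/vm/swappiness`, `/sys/kernel/mm/transparent_hugepage/enabled` (RHEL 6 kernels use `/sys/kernel/mm/redhat_transparent_hugepage/enabled` instead) and `/etc/fstab`.

```python
import re
from typing import Dict, List


def selected_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed choice from e.g. 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if not match:
        raise ValueError(f"unexpected THP format: {sysfs_text!r}")
    return match.group(1)


def mounts_missing_noatime(fstab_text: str, mount_prefix: str) -> List[str]:
    """List mount points under mount_prefix whose fstab options lack noatime."""
    missing = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank, short and comment lines.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point.startswith(mount_prefix) and "noatime" not in options:
            missing.append(mount_point)
    return missing


def audit(swappiness: str, thp_enabled: str, fstab: str,
          data_prefix: str = "/grid") -> Dict[str, bool]:
    """Compare raw file contents against a subset of the recommendations."""
    return {
        "swappiness is 1": swappiness.strip() == "1",
        "THP disabled": selected_thp_mode(thp_enabled) == "never",
        "data disks mounted noatime": not mounts_missing_noatime(fstab, data_prefix),
    }


if __name__ == "__main__":
    sample_fstab = "/dev/sdb1 /grid/0 ext4 defaults,noatime 0 0\n"
    print(audit("1\n", "always madvise [never]\n", sample_fstab))
```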
What else do we need?
• Code repository, e.g. Stash, GitLab
• Open-source package repositories for Python (pip), Perl (CPAN), R (CRAN), Maven Repository Manager …
• Continuous integration tools, e.g. Jenkins
• Stepping-stone (edge) server
• Another RDBMS to store aggregates, e.g. MySQL, PostgreSQL
• Data scientists' server – RStudio, IPython etc.
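For the Python package repository, pip can be pointed at an internal mirror through `/etc/pip.conf`. A sketch that renders such a file; the mirror URL is hypothetical, and `trusted-host` is only needed when the mirror is not served over HTTPS with a certificate pip can verify.

```python
import configparser
import io


def pip_conf(index_url: str, trusted_host: str = "") -> str:
    """Render a pip.conf that points pip at an internal package mirror."""
    # interpolation=None so '%' characters in URLs are written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["global"] = {"index-url": index_url}
    if trusted_host:
        cfg["global"]["trusted-host"] = trusted_host
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(pip_conf("http://repo.example.local/pypi/simple", "repo.example.local"))
```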
Hadoop DR strategy
• No built-in cross-data-center replication
• DistCp can be used for large inter-/intra-cluster copying
• Data can be ingested into two separate Hadoop clusters
• WANdisco Non-Stop Hadoop
https://www.wandisco.com/system/files/documentation/WD-Datasheet-NonStop-Hadoop-HortonWorks-WEB.pdf
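A sketch of an incremental DistCp run from a production to a DR cluster, building the argument list so it can be logged or passed to `subprocess.run`. The NameNode addresses and the map count are hypothetical; `-update`, `-p`, `-m` and `-delete` are standard DistCp options.

```python
import shlex
from typing import List


def distcp_command(src: str, dst: str, maps: int = 20, delete: bool = False) -> List[str]:
    """Build an incremental DistCp invocation between two clusters."""
    cmd = ["hadoop", "distcp",
           "-update",         # copy only files that differ at the destination
           "-p",              # preserve file attributes (owner, group, permissions, ...)
           "-m", str(maps)]   # maximum number of simultaneous map tasks
    if delete:
        cmd.append("-delete")  # remove target files absent from the source
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    print(shlex.join(distcp_command("hdfs://nn-prod.example.com:8020/data/events",
                                    "hdfs://nn-dr.example.com:8020/data/events")))
```

Run periodically (for example from cron on an edge node), `-update` keeps each pass incremental; `-delete` additionally mirrors deletions and should be enabled deliberately.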
RHEL
• Kickstart installation
• BladeLogic jobs to provision software components, e.g. monitoring agents and security monitoring components
• BladeLogic jobs to harden RHEL security according to best practices
• Red Hat Satellite as the package distribution and versioning center
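Kickstart is a natural place to bake in the Linux recommendations listed earlier. A sketch that generates a partial RHEL 6/7-style kickstart file; the time zone, addresses and host name are hypothetical, and partitioning, the root password and the BladeLogic/Satellite steps are left out.

```python
from typing import List


def kickstart_fragment(hostname: str, ip: str, netmask: str,
                       gateway: str, nameserver: str) -> str:
    """Render a partial kickstart file: static IP with FQDN hostname,
    SELinux and firewall off, NTP on, swappiness set in %post."""
    lines: List[str] = [
        "lang en_US.UTF-8",
        "keyboard us",
        "timezone --utc Europe/Warsaw",  # hypothetical time zone
        f"network --bootproto=static --ip={ip} --netmask={netmask} "
        f"--gateway={gateway} --nameserver={nameserver} --hostname={hostname}",
        "selinux --disabled",
        "firewall --disabled",
        "services --enabled=ntpd",
        "%packages",
        "@core",
        "ntp",
        "%end",
        "%post",
        "echo 'vm.swappiness = 1' >> /etc/sysctl.conf",
        "%end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical management-VLAN addressing for one node.
    print(kickstart_fragment("node01.hadoop.example.com", "10.160.0.11",
                             "255.255.255.0", "10.160.0.1", "10.160.0.2"))
```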
UCS Manager - organisation
• Let the Hadoop team manage servers themselves – create an organization
• Create a service profile template
• Create service profiles from the template
Ambari blueprint example
{
  "configurations" : [
    {
      "configuration-type" : {
        "property-name" : "property-value",
        "property-name2" : "property-value"
      }
    },
    {
      "configuration-type2" : {
        "property-name" : "property-value"
      }
    }
    ...
  ],
  "host_groups" : [
    {
      "name" : "host-group-name",
      "components" : [...
https://cwiki.apache.org/confluence/display/AMBARI/Blueprints
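A sketch of a minimal blueprint and the matching cluster creation template, built as Python dictionaries. The group names, host names and stack version are hypothetical, and Ambari validates component dependencies, so a real blueprint will likely need more components (clients, history and timeline servers) than shown. The blueprint is registered with a POST to `/api/v1/blueprints/BLUEPRINT_NAME`, and the cluster is created by POSTing the template to `/api/v1/clusters/CLUSTER_NAME`.

```python
import json
from typing import Dict, List


def blueprint(stack_version: str, masters: List[str], workers: List[str]) -> Dict:
    """Minimal Ambari blueprint with one master and one worker host group."""
    def comps(names: List[str]) -> List[Dict[str, str]]:
        return [{"name": n} for n in names]

    return {
        "configurations": [],
        "host_groups": [
            {"name": "master", "components": comps(masters), "cardinality": "1"},
            {"name": "worker", "components": comps(workers), "cardinality": "1+"},
        ],
        "Blueprints": {"stack_name": "HDP", "stack_version": stack_version},
    }


def cluster_template(blueprint_name: str, hosts: Dict[str, List[str]]) -> Dict:
    """Map each host group to concrete hosts, identified by FQDN."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in fqdns]}
            for group, fqdns in hosts.items()
        ],
    }


if __name__ == "__main__":
    bp = blueprint("2.2",
                   ["NAMENODE", "SECONDARY_NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"],
                   ["DATANODE", "NODEMANAGER"])
    print(json.dumps(bp, indent=2))
    print(json.dumps(cluster_template("hdp-small", {
        "master": ["master01.hadoop.example.com"],
        "worker": ["worker01.hadoop.example.com", "worker02.hadoop.example.com"],
    }), indent=2))
```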
Ambari REST API
Start HDFS:
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo": {"context": "Start HDFS via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' \
  http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/HDFS

List the master and slave components on host CLUSTERNODE, with their state:
curl -u admin:$PASSWORD -H 'X-Requested-By: ambari' -X GET \
  "http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/components/?ServiceComponentInfo/category.in(SLAVE,MASTER)&host_components/HostRoles/host_name=CLUSTERNODE&fields=host_components/HostRoles/component_name,host_components/HostRoles/state"
https://cwiki.apache.org/confluence/display/AMBARI/API+usage+scenarios%2C+troubleshooting%2C+and+other+FAQs
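The PUT from the first curl call, sketched with `urllib` from the Python standard library so it can be embedded in automation scripts. The server, cluster and credentials are placeholders; setting the state to `INSTALLED` instead of `STARTED` stops the service.

```python
import base64
import json
import urllib.request


def service_state_request(server: str, cluster: str, service: str, state: str,
                          user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that changes an Ambari service state."""
    body = {
        "RequestInfo": {"context": f"Set {service} to {state} via REST"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    url = f"http://{server}:8080/api/v1/clusters/{cluster}/services/{service}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "X-Requested-By": "ambari",  # Ambari expects this on modifying calls
            "Authorization": f"Basic {token}",
        },
    )
```

Calling `urllib.request.urlopen(req)` on the returned object sends it.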
Did you know?
• Upgrading the Hadoop stack can still be a painful process (an 80-page manual): http://docs.hortonworks.com/HDPDocuments/Ambari-1.7.0.0/Ambari_Upgrade_v170/Ambari_Upgrade_v170.pdf
• Automated rolling upgrade process: TBD, https://issues.apache.org/jira/browse/AMBARI-7804
Hadoop metrics monitoring
http://hakunamapdata.com/ganglia-configuration-for-a-small-hadoop-cluster-and-some-troubleshooting/
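Alongside Ganglia, every Hadoop daemon also exposes its metrics as JSON over HTTP at `/jmx`. A sketch that summarises the NameNode's FSNamesystem bean, for example as fetched from `http://NAMENODE_HOST:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem`; the attribute names are those of Hadoop 2.x and are worth checking against your own NameNode's output.

```python
import json
from typing import Dict


FSNAMESYSTEM = "Hadoop:service=NameNode,name=FSNamesystem"


def fsnamesystem_summary(jmx_json: str) -> Dict[str, float]:
    """Extract a few capacity/health values from a NameNode /jmx response."""
    beans = json.loads(jmx_json)["beans"]
    fs = next(b for b in beans if b.get("name") == FSNAMESYSTEM)
    used, total = fs["CapacityUsed"], fs["CapacityTotal"]
    return {
        "capacity_used_pct": 100.0 * used / total if total else 0.0,
        "missing_blocks": fs["MissingBlocks"],
        "under_replicated_blocks": fs["UnderReplicatedBlocks"],
    }
```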
Hadoop security
• Hadoop is not a single product; choose your components wisely
• Until recently there was no single point for user management
• Maintaining ACLs in HDFS is a painful process
• No out-of-the-box Active Directory integration
http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/
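One way to keep HDFS ACLs manageable is to script them per directory and POSIX group rather than editing them by hand. A sketch that only builds `hdfs dfs` command lines; the path and group names are hypothetical, and the ACL commands require `dfs.namenode.acls.enabled=true` on the NameNode.

```python
from typing import List


def group_access_commands(path: str, owner_group: str,
                          read_groups: List[str]) -> List[List[str]]:
    """hdfs dfs commands that give owner_group full access to a directory tree
    and read-only access to each group in read_groups via HDFS ACLs."""
    cmds = [
        ["hdfs", "dfs", "-chgrp", "-R", owner_group, path],
        # HDFS ignores the x bit on files, so 770 is harmless for them.
        ["hdfs", "dfs", "-chmod", "-R", "770", path],
    ]
    for group in read_groups:
        # Access ACL entry on everything that already exists.
        cmds.append(["hdfs", "dfs", "-setfacl", "-R", "-m", f"group:{group}:r-x", path])
        # Default ACL on the top directory so new children inherit the entry
        # (default ACLs are only valid on directories, hence no -R here).
        cmds.append(["hdfs", "dfs", "-setfacl", "-m", f"default:group:{group}:r-x", path])
    return cmds


if __name__ == "__main__":
    for cmd in group_access_commands("/data/sales", "sales", ["analysts"]):
        print(" ".join(cmd))
```

The default ACL is set on the top directory only, so subdirectories that already exist do not inherit it.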
Typical Flow – Add Wire and File Encryption
http://www.slideshare.net/hortonworks/hdp-security-overview
Is there anything we can do? Start simple!
1. Do not store sensitive data within Hadoop
2. Place the Hadoop environment in a separate network zone (dedicated VLANs, firewall-filtered traffic)
3. Kerberize the cluster environment
   a) Watch for unkerberized components
   b) Keep your keytabs safe
4. LDAP for central user management
5. Manage your ACLs – start simple with POSIX groups
6. Auditing
7. Automated HDP cluster kerberization (TBD): https://issues.apache.org/jira/secure/attachment/12671235/12671235_AmbariClusterKerberization.pdf
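"Keep your keytabs safe" can be checked mechanically. A sketch that flags keytabs with an unexpected owner or an overly permissive mode; the owner-only policy is an assumption, and `/etc/security/keytabs` is the usual HDP location but may differ on your cluster.

```python
import glob
import os
import stat
from typing import List


def keytab_problems(path: str, expected_uid: int,
                    allow_group_read: bool = False) -> List[str]:
    """Return reasons why a keytab file looks unsafe (empty list means OK).

    Assumed policy: only the owning service account may read the file; set
    allow_group_read=True if keytabs are shared with a service group (0440).
    """
    st = os.stat(path)
    problems: List[str] = []
    if st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected {expected_uid}")
    mode = stat.S_IMODE(st.st_mode)
    group_bits = (stat.S_IWGRP | stat.S_IXGRP) if allow_group_read else stat.S_IRWXG
    if mode & (group_bits | stat.S_IRWXO):
        problems.append(f"mode {mode:o} is wider than the policy allows")
    return problems


if __name__ == "__main__":
    # Checks permissions only: each file is compared against its current owner.
    for kt in glob.glob("/etc/security/keytabs/*.keytab"):
        print(kt, keytab_problems(kt, os.stat(kt).st_uid) or "OK")
```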
IPA
At the most basic level, Red Hat Identity Management is a domain controller for Linux and Unix machines.
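With IdM/IPA users and groups exposed to the operating system (for example through SSSD), Hadoop's default group mapping resolves group membership through the OS of the NameNode host. A sketch that lists what the OS reports for a user; running it on the NameNode approximates the groups HDFS will use for permission checks.

```python
import grp
import os
import pwd
from typing import List


def unix_groups(username: str) -> List[str]:
    """Group names the OS (NSS, e.g. SSSD backed by IdM) reports for a user."""
    primary_gid = pwd.getpwnam(username).pw_gid
    names = []
    for gid in os.getgrouplist(username, primary_gid):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:  # gid without a resolvable name
            names.append(str(gid))
    return sorted(set(names))


if __name__ == "__main__":
    print(unix_groups(pwd.getpwuid(os.getuid()).pw_name))
```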
Where to continue from here?
• Hadoop distribution best practices
• Reference architecture papers
• http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/Cluster_Plan_Gd_v22/Cluster_Plan_Gd_v22.pdf
• http://hortonworks.com/get-started/
• http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
• http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox
• http://www.slideshare.net/Hadoop_Summit/radia-srinivas-june261120amroom210c
• http://www.slideshare.net/KevinMinder/knox-hadoopsummit20140505v6pub
• http://blog.sequenceiq.com/blog/2014/12/04/multinode-ambari-1-7-0/
Q&A
krzysztof.adamski@ingservicespolska.pl
http://pl.linkedin.com/in/adamskikrzysztof