Adjusting Carbon Topology to Match High Availability Scenario Requirements

Afkham Azeez, Director of Architecture, WSO2 Inc

Upload: wso2

Post on 18-Nov-2014

TRANSCRIPT

  • 1. Adjusting Carbon Topology to Match High Availability Scenario Requirements. Afkham Azeez, Director of Architecture, WSO2 Inc
  • 2. About Me
      • PMC member, Apache Axis; committer, Synapse & Web Services
      • Member, Apache Software Foundation
      • Co-author, Axis2 Web Services
      • Director of Architecture, WSO2 Inc
      • Blog: http://blog.afkham.org
      • Twitter: afkham_azeez
  • 3. Agenda
      • A brief look at the WSO2 platform
      • Carbon clustering for availability
      • Cost of availability & related topologies
  • 4. WSO2 Offerings
      • WSO2 Carbon: a full platform of servers for deployment on-premise, or in a private or public cloud. Products share a consistent architecture and core platform services (e.g. logging, management, security, identity, caching) through OSGi and the Carbon Core. Includes ESB, AppServer, Data Services, Governance, Identity, Business Process, and more.
      • WSO2 Stratos: a Platform-as-a-Service (PaaS) foundation. Supports running servers as elastic, metered, billed and multi-tenant, with self-service. Includes all Carbon servers, PHP, Jetty, and a growing list through a standard Cartridge model.
      • WSO2 StratosLive (http://stratoslive.wso2.com): WSO2's public PaaS. An instance of Stratos running in the cloud with all Carbon servers available.
  • 5. Consistent Architecture
      • Carbon: a consistent set of class-leading enterprise servers. The same products run either on-premise or in the cloud, single-tenant or multi-tenant, and use the same Carbon core runtime for a seamless experience.
      • Stratos: a cloud platform for enterprise, hybrid and public deployment. Extends the deployment to support full self-service, elastic scaling, metering and billing. Supports Carbon and native server runtimes, including Java and non-Java servers such as Jetty and PHP. Re-uses the same core Carbon architecture to offer core PaaS services including Identity, Logging, File, Relational Storage, Column Storage, Code Deployment, etc.
      • Both projects share a common set of OSGi modules and a core runtime architecture.
  • 6. WSO2 SOA Platform
  • 7. WSO2 Carbon
  • 8. Availability: "The degree to which a system, subsystem, or equipment is in a specified operable and committable state at the start of a mission, when the mission is called for at an unknown, i.e., a random, time." Simply put, availability is the proportion of time a system is in a functioning condition.
  • 9. Availability
  • 10. Availability
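
The "proportion of time" definition on slide 8 can be written down directly. The following is a minimal sketch, not part of any WSO2 API: the class name `AvailabilityRatio` and both method names are made up for illustration, and a 365-day year is assumed.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not a Carbon or Stratos class.
final class AvailabilityRatio {
    private AvailabilityRatio() {}

    /** Fraction of the observed period during which the system was functioning. */
    static double of(Duration uptime, Duration downtime) {
        long up = uptime.toMillis();
        long down = downtime.toMillis();
        if (up < 0 || down < 0 || up + down == 0) {
            throw new IllegalArgumentException("durations must be non-negative and not both zero");
        }
        return (double) up / (up + down);
    }

    /** Downtime budget per year for a target availability in [0, 1], assuming a 365-day year. */
    static Duration allowedDowntimePerYear(double availability) {
        if (availability < 0.0 || availability > 1.0) {
            throw new IllegalArgumentException("availability must be between 0 and 1");
        }
        long yearMillis = Duration.ofDays(365).toMillis();
        return Duration.ofMillis(Math.round((1.0 - availability) * yearMillis));
    }

    public static void main(String[] args) {
        // 999 hours up, 1 hour down.
        System.out.println(of(Duration.ofHours(999), Duration.ofHours(1)));
        // Yearly downtime budget for "three nines".
        System.out.println(allowedDowntimePerYear(0.999));
    }
}
```

A target such as 99.9% translates into roughly 8.76 hours of permitted downtime per year, which is the kind of number the later cost discussion depends on.
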
  • 11. High Availability (HA)
      • A system that is designed for continuous operation in the event of a failure of one or more components. The system may display some degradation of service, but will continue to perform correctly.
      • The proportion of time during which the service is accessible with reasonable response times should be close to 100%.
      • All single points of failure should be eliminated.
  • 12. HA, CO & CA
      • Continuous Operation (CO): the ability to avoid planned outages. Hardware and software maintenance is carried out while applications remain available to users.
      • Continuous Availability (CA): combines the characteristics of HA and CO to keep the applications running without any noticeable downtime. Hot update / graceful round-robin restart.
  • 13. High Availability Techniques
      • Redundancy
          • Time, e.g. retransmit
          • Data, e.g. parity bits
          • Processing, e.g. redundant nodes
      • Diversity, e.g. hybrid deployments; do the same thing using different implementations
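
To see why processing redundancy helps, consider the standard textbook estimate for a group of redundant nodes where any single live node can serve requests. This is a sketch under a strong assumption that the slides do not state: node failures are independent. The class and method names are hypothetical.

```java
// Hypothetical illustration; assumes independent node failures and that
// one live node is enough to keep the service up.
final class RedundantNodes {
    private RedundantNodes() {}

    /** Combined availability of `nodes` redundant nodes, each with the given availability. */
    static double combinedAvailability(double nodeAvailability, int nodes) {
        if (nodeAvailability < 0.0 || nodeAvailability > 1.0) {
            throw new IllegalArgumentException("nodeAvailability must be between 0 and 1");
        }
        if (nodes < 1) {
            throw new IllegalArgumentException("at least one node is required");
        }
        // The service is down only when every node is down at the same time.
        return 1.0 - Math.pow(1.0 - nodeAvailability, nodes);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " node(s): " + combinedAvailability(0.99, n));
        }
    }
}
```

In practice failures are often correlated (shared power, network or software faults), which is one reason the deck also lists diversity and, later, multi-zone and multi-region topologies.
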
  • 14. How to decide required availability?
      • Average throughput (TPS)
      • Max throughput (TPS)
      • Monetary value of a transaction
      • Average loss & max loss per second of downtime
      • Decide how much to invest in availability based on the cost vs. benefit tradeoff
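
The cost vs. benefit reasoning on this slide can be sketched as simple arithmetic. The model below is an assumption for illustration, not something the slides prescribe: loss per second is taken as throughput times the monetary value of a transaction, and expected yearly loss as that figure times the expected seconds of downtime. All names are hypothetical.

```java
// Hypothetical cost model for illustration; the formulas are assumptions.
final class DowntimeCost {
    private static final double SECONDS_PER_YEAR = 365.0 * 24 * 3600;

    private DowntimeCost() {}

    /** Money lost per second of downtime at the given throughput. */
    static double lossPerSecond(double tps, double valuePerTransaction) {
        return tps * valuePerTransaction;
    }

    /** Expected yearly loss for a deployment with the given availability (0..1). */
    static double expectedAnnualLoss(double availability, double avgTps, double valuePerTransaction) {
        double downtimeSeconds = (1.0 - availability) * SECONDS_PER_YEAR;
        return downtimeSeconds * lossPerSecond(avgTps, valuePerTransaction);
    }

    /** True if the loss avoided by moving to the target availability exceeds the yearly cost of doing so. */
    static boolean worthUpgrading(double currentAvailability, double targetAvailability,
                                  double avgTps, double valuePerTransaction, double annualUpgradeCost) {
        double avoidedLoss = expectedAnnualLoss(currentAvailability, avgTps, valuePerTransaction)
                - expectedAnnualLoss(targetAvailability, avgTps, valuePerTransaction);
        return avoidedLoss > annualUpgradeCost;
    }

    public static void main(String[] args) {
        System.out.println(lossPerSecond(100, 0.5));
        System.out.println(expectedAnnualLoss(0.99, 100, 0.5));
        System.out.println(worthUpgrading(0.99, 0.999, 100, 0.5, 1_000_000));
    }
}
```

Running the same calculation with maximum throughput instead of average throughput gives the max-loss figure the slide mentions.
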
  • 15. Patching Production Deployments. Patch Distribution Coordinator: 1. Check patch list; 2. Pull new patch; 3. Push patch (label repeated four times in the diagram)
  • 16. Patching Production Deployments. Patch Distribution Coordinator, round-robin: 4. Maintenance mode; 5. Graceful restart
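
The five steps on slides 15 and 16 can be sketched as a small simulation. This is an illustrative model only: the `Node` class, its fields and the `rollOut` method are hypothetical and do not correspond to the actual Patch Distribution Coordinator. The sketch assumes that a pushed patch is staged on each node and becomes active on restart, and that nodes are cycled one at a time so the rest keep serving.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a round-robin patch roll-out.
final class RollingPatchSketch {
    /** Hypothetical node state tracked by the coordinator. */
    static final class Node {
        final String name;
        boolean inService = true;
        String stagedPatch = null;
        String activePatch = null;
        Node(String name) { this.name = name; }
    }

    private RollingPatchSketch() {}

    /** Pushes the patch to every node, then restarts the nodes one at a time. Returns an event log. */
    static List<String> rollOut(List<Node> nodes, String patch) {
        List<String> log = new ArrayList<>();
        for (Node n : nodes) {                 // step 3: push patch
            n.stagedPatch = patch;
            log.add("push " + patch + " -> " + n.name);
        }
        for (Node n : nodes) {                 // steps 4 and 5, one node at a time
            n.inService = false;               // maintenance mode: stop taking new requests
            log.add("maintenance " + n.name);
            long out = nodes.stream().filter(x -> !x.inService).count();
            if (out > 1) {
                throw new IllegalStateException("more than one node out of service");
            }
            n.activePatch = n.stagedPatch;     // graceful restart picks up the staged patch
            log.add("restart " + n.name);
            n.inService = true;
            log.add("in-service " + n.name);
        }
        return log;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("node1"), new Node("node2"), new Node("node3"));
        rollOut(nodes, "patch0001").forEach(System.out::println);
    }
}
```

Cycling one node at a time is what lets the cluster approach continuous availability during maintenance, at the cost of a longer roll-out.
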
  • 17. Clustering
      • Clustering for scalability
      • Clustering for availability
  • 18. Clustering for scalability
  • 19. Clustering for availability: group communication channel / state replication
  • 20. Carbon Clustering
      • Membership types: static, dynamic, hybrid
      • Membership modes: multicast, well-known address
  • 21. Static membership
      • Predefined members
      • Other (non-predefined) nodes cannot join
      • Diagram: static group with members M1, M2, M3 and M4, and an outside node N
  • 22. Dynamic membership
      • No predefined members
      • Nodes can join & leave
      • Diagram: dynamic group with members M1, M2, M3 and M4; node N joins
  • 23. Hybrid membership
      • Some predefined (well-known) members, and some dynamic members
      • Nodes can join & leave
      • Membership revolves around the static members
      • Diagram: hybrid group with static and dynamic members (M1 to M7); node N joins with (IP, Port)
  • 24. Multicast based membership management
      • Diagram: members M1, M2, M3 and M4; node N joins with (IP, Port)
  • 25. Well-known Address (WKA) based membership management
      • Diagram: hybrid group with dynamic members M5, M6 and M7 and static members WK1, WK2, WK3 and WK4; node N joins with (IP, Port) and the members are notified
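
The WKA join rule can be sketched as a small in-memory model. This is not the Carbon clustering implementation: `WkaGroupSketch` and its methods are hypothetical, and the model only captures one property, that a new node is admitted if at least one well-known member is reachable.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of WKA-based joining; no networking involved.
final class WkaGroupSketch {
    private final Set<String> wellKnown = new LinkedHashSet<>();
    private final Set<String> alive = new LinkedHashSet<>();

    WkaGroupSketch(Collection<String> wellKnownMembers) {
        wellKnown.addAll(wellKnownMembers);
        alive.addAll(wellKnownMembers);
    }

    /** Marks a member (well-known or dynamic) as failed. */
    void markDown(String member) {
        alive.remove(member);
    }

    /** A joiner tries each well-known member; the first live one admits it to the group. */
    boolean join(String newMember) {
        for (String wk : wellKnown) {
            if (alive.contains(wk)) {
                alive.add(newMember);   // stands in for the "notify" step to existing members
                return true;
            }
        }
        return false;                   // no well-known member reachable: the join fails
    }

    Set<String> liveMembers() {
        return Collections.unmodifiableSet(alive);
    }

    public static void main(String[] args) {
        WkaGroupSketch group = new WkaGroupSketch(Set.of("WK1", "WK2"));
        group.markDown("WK1");
        System.out.println(group.join("M5"));   // one well-known member is still up
        group.markDown("WK2");
        System.out.println(group.join("M6"));   // all well-known members are down
    }
}
```

Existing members stay in the group when the well-known members fail; only new joins are blocked, which is why the well-known addresses need their own failover mechanism.
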
  • 26. Multicast vs. WKA
      • Multicast
          • All nodes should be in the same subnet
          • All nodes should be in the same multicast domain
          • Multicasting should not be blocked
          • No fixed IP addresses or hosts required
          • Failure of any member does not affect membership discovery
          • Does not work on IaaSs such as Amazon EC2
      • WKA
          • Nodes can be in different networks
          • No multicasting requirement
          • At least one well-known IP address or host required
          • New members can join with some WKA nodes down, but not if all WKA nodes are down
          • IaaS-friendly
          • Requires keepalived, elastic IPs or some other mechanism for remapping IP addresses of WK members in cases of failure
  • 27. Multicast vs. WKA: how to decide?
      • Multicast: the cluster is going to be set up in a network where multicasting is allowed
      • WKA: cloud-based deployment; members are distributed across datacenters & regions; multicasting is blocked
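
The decision rule on slide 27 is compact enough to express as a function. The sketch below is one reading of the slide, with hypothetical names; it treats any of the WKA conditions as sufficient to rule out multicast.

```java
// Hypothetical decision helper mirroring slide 27; not a Carbon API.
final class MembershipSchemeChooser {
    enum Scheme { MULTICAST, WKA }

    private MembershipSchemeChooser() {}

    /**
     * Chooses WKA when multicast is blocked, the deployment is cloud-based,
     * or members span datacenters or regions; otherwise multicast.
     */
    static Scheme choose(boolean multicastAllowed, boolean cloudDeployment, boolean spansDatacentersOrRegions) {
        if (!multicastAllowed || cloudDeployment || spansDatacentersOrRegions) {
            return Scheme.WKA;
        }
        return Scheme.MULTICAST;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, false));   // single LAN with multicast enabled
        System.out.println(choose(false, true, false));   // cloud deployment with multicast blocked
    }
}
```
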
  • 28. HTTP Session Replication: catalina-server.xml, web.xml
  • 29. State Replication: JSR-107/JCache
      • A standard Java caching API for use by developers, and a standard SPI ("Service Provider Interface") for use by implementers.
      • Example:
          import javax.cache.*;
          CacheManager cacheMgr = Caching.getCacheManager();
          // Type parameters were lost in the transcript; a String key is assumed here.
          Cache<String, Integer> cache = cacheMgr.getCache(cacheName);
          cache.put(key, sampleValue);
          Integer i = cache.get(key);
  • 30. State Replication
      • CarbonContext based API:
          // Type parameters were lost in the transcript; a String key is assumed here.
          Cache<String, Integer> cache = CarbonContext.getCurrentContext().getCache();
          cache.put(key, sampleValue);
          Integer i = cache.get(key);
      • Axis2 Contexts: using the Axis2 clustering StateManager (axis2.xml)
  • 31. Elastic Load Balancer 2.0
      • New sysadmin-friendly configuration language
      • High performance PassThrough transport
      • Tenant-aware load balancing
      • Ability to dedicate clusters for tenants (private jet mode)
      • Improved auto-scaler
      • Separate IaaS-aware Cloud Controller takes care of spawning new instances on different IaaSs
  • 32. Tenant-aware LB
  • 33. Private Jet Mode: Analogy
      • Economy class: no SLA management, only elasticity
      • Business class: elasticity plus SLA guarantees
      • Private jet: guaranteed isolated VMs or machines for a specific tenant; still elastically scaled
  • 34. Private Jet Mode
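
Tenant-aware routing with dedicated clusters can be sketched as a lookup with a shared fallback. This is a simplified illustration, not how the WSO2 Elastic Load Balancer is implemented or configured; the class, the method names and the cluster identifiers are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tenant-to-cluster lookup illustrating private jet mode.
final class TenantAwareRouter {
    private final String sharedCluster;
    private final Map<String, String> dedicatedClusters = new HashMap<>();

    TenantAwareRouter(String sharedCluster) {
        this.sharedCluster = sharedCluster;
    }

    /** Puts a tenant in private jet mode by giving it an isolated cluster. */
    void dedicate(String tenantDomain, String cluster) {
        dedicatedClusters.put(tenantDomain, cluster);
    }

    /** Tenants without a dedicated cluster are served by the shared one. */
    String clusterFor(String tenantDomain) {
        return dedicatedClusters.getOrDefault(tenantDomain, sharedCluster);
    }

    public static void main(String[] args) {
        TenantAwareRouter router = new TenantAwareRouter("shared-cluster");
        router.dedicate("bigcorp.com", "bigcorp-private-cluster");
        System.out.println(router.clusterFor("bigcorp.com"));
        System.out.println(router.clusterFor("smallshop.com"));
    }
}
```

Each cluster returned by the lookup would still be scaled elastically; the lookup only decides which cluster a tenant's requests may reach.
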
  • 35. Topologies
      • Single node
      • Multi-node with LB
      • Multi-node with elasticity using ELB
      • Management & worker node separated
      • Multi-zone or multi-datacenter deployment
      • Multi-region
  • 36. Single node (diagram; scales for Availability and Cost, each running from LOWEST to HIGHEST)
  • 37. Primary-secondary (diagram: Primary and Secondary nodes; scales for Availability and Cost, LOWEST to HIGHEST)
  • 38. Primary-secondary, multiple LB (diagram: keepalived, Primary and Secondary nodes; scales for Availability and Cost, LOWEST to HIGHEST)
  • 39. Active cluster, multiple LB (diagram: keepalived, three Active nodes; scales for Availability and Cost, LOWEST to HIGHEST)
  • 40. Management & Worker Node Separation
      • Proper separation of concerns: management nodes specialize in management of the setup, while worker nodes specialize in serving requests to deployment artifacts
      • Only management nodes are authorized to add new artifacts into the system or make configuration changes
      • Worker nodes can only deploy artifacts & read configuration
      • Lower memory footprint in the worker nodes, because the management console related OSGi bundles are not loaded
      • Improved security: management nodes can be behind the internal firewall and exposed only to clients running within the organization, while worker nodes can be exposed to external clients
      • Isolation of failures
  • 41. Management & Worker Node Separation (diagram; scales for Availability and Cost, LOWEST to HIGHEST)
  • 42. Regions & Zones
  • 43. Stratos 2.0 Architecture
  • 44. Multi-zone or multi-datacenter Deployment (diagram: Cloud Controller; Zone 1 and Zone 2 in Region X; scales for Availability and Cost, LOWEST to HIGHEST)
  • 45. Multi-region deployment (diagram: Zone 1 and Zone 2 in Region X; Zone 1 and Zone 2 in Region Y; scales for Availability and Cost, LOWEST to HIGHEST)
  • 46. Multi-IaaS Deployment (diagram: Cloud Controller)
  • 47. Multiple IaaS (hybrid) Deployment (diagram: Zone 1 and Zone 2 in each of a private cloud (data center), Amazon EC2 and Rackspace Cloud; scales for Availability and Cost, LOWEST to HIGHEST)
  • 48. Cost of Availability
      • Single node
      • Primary-secondary, single LB
      • Primary-secondary, with multiple LBs
      • Multi-node active cluster, single zone
      • Multi-zone
      • Multi-region
      • Multi-IaaS
  • 49. HA for the Load Balancer
      • Load balancer cluster
      • Keepalived
      • Elastic IP address
      • Round-robin DNS
  • 50. Monitoring Servers
      • Monit: automatically provides alerts & restarts processes when monitored items (e.g. latency) cross certain thresholds
      • New Relic
      • Nagios
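
The alert-then-restart behaviour described for threshold monitoring can be sketched as a small state machine. This is a generic illustration and does not reproduce the configuration or behaviour of Monit, New Relic or Nagios; the class name, the latency threshold and the "restart after N consecutive breaches" policy are assumptions.

```java
// Hypothetical threshold watcher; the escalation policy is an assumption.
final class LatencyWatch {
    enum Action { NONE, ALERT, RESTART }

    private final long thresholdMillis;
    private final int restartAfter;
    private int consecutiveBreaches = 0;

    LatencyWatch(long thresholdMillis, int restartAfter) {
        if (thresholdMillis < 0 || restartAfter < 1) {
            throw new IllegalArgumentException("threshold must be >= 0 and restartAfter >= 1");
        }
        this.thresholdMillis = thresholdMillis;
        this.restartAfter = restartAfter;
    }

    /** Feeds one latency sample and returns what the monitor would do. */
    Action observe(long latencyMillis) {
        if (latencyMillis <= thresholdMillis) {
            consecutiveBreaches = 0;          // healthy sample resets the streak
            return Action.NONE;
        }
        consecutiveBreaches++;
        if (consecutiveBreaches >= restartAfter) {
            consecutiveBreaches = 0;          // start counting afresh after a restart
            return Action.RESTART;
        }
        return Action.ALERT;
    }

    public static void main(String[] args) {
        LatencyWatch watch = new LatencyWatch(200, 3);
        long[] samples = {100, 250, 300, 400, 120};
        for (long s : samples) {
            System.out.println(s + " ms -> " + watch.observe(s));
        }
    }
}
```

Requiring several consecutive breaches before restarting avoids bouncing a process on a single slow sample.
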
  • 51. References
      • Information on tenant-aware load balancing
          • http://sanjeewamalalgoda.blogspot.com/2012/03/tenant-aware-load-balancer-is-upcoming.html
          • http://sanjeewamalalgoda.blogspot.com/2012/05/tenant-aware-load-balancer.html
      • Scaling Stratos
          • http://srinathsview.blogspot.com/2012/06/scaling-wso2-stratos.html
          • http://blog.afkham.org/2011/09/how-to-setup-wso2-elastic-load-balancer.html
          • http://blog.afkham.org/2011/09/wso2-load-balancer-how-it-works.html
  • 52. References (continued)
      • Auto-scaler service deployment: http://nirmalfdo.blogspot.com/2012/07/autoscaler-service-deployment.html
      • Auto-scaler service: http://nirmalfdo.blogspot.com/2012/07/wso2-autoscaler-service-part-i.html
      • Automatic failover for WSO2 ELB: http://gonesimple.org/2012/09/24/automatic-fail-over-for-wso2-elb/
  • 53. Questions? http://www.flickr.com/photos/oberazzi/