Self-Management for Large Scale Distributed Systems
Ahmad Al-Shishtawy
Advisors: Vladimir Vlassov, Seif Haridi
KTH Royal Institute of Technology, Stockholm, Sweden
The 5th EuroSys Doctoral Workshop (EuroDW 2011), April 10, 2011
Introduction · Niche Platform · Robust Management Elements · Future Work
Outline
1 Introduction
2 Niche Platform
3 Robust Management Elements
4 Future Work
Self-Management for Large Scale Distributed Systems (A. Al-Shishtawy) 2/25
1 Introduction
The Problem · Autonomic Computing · The Goal
Dealing with Complexity
Problem
All computing systems need to be managed
Computing systems are getting more and more complex
Complexity means higher administration overheads
Complexity poses a barrier to further development
Solution
The Autonomic Computing initiative by IBM
Self-Management: systems capable of managing themselves
Use Autonomic Managers
[Figure: several Autonomic Managers, each a Monitor, Analyze, Plan, Execute loop around a Knowledge component]
Open Question
How to achieve Self-Management?
The Autonomic Computing Architecture
Managed Resource
Touchpoint (Sensors & Actuators)
Autonomic Manager: Monitor, Analyze, Plan, Execute
Knowledge Source
Communication
Manager Interface
[Figure: an Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) linked through Touchpoints to Managed Resources, with a Manager Interface]
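The Monitor, Analyze, Plan, Execute loop over a Knowledge component can be sketched in a few lines of Python. This is only an illustration, not code from Niche or from IBM: the `Touchpoint` class, the `load` metric, the 0.8 threshold, and the rule that each added replica halves the load are all assumptions invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Touchpoint:
    """Hypothetical touchpoint: one sensor (read_load) and one actuator (add_replica)."""
    load: float = 0.0
    replicas: int = 1

    def read_load(self) -> float:      # sensor
        return self.load

    def add_replica(self) -> None:     # actuator
        self.replicas += 1
        self.load /= 2                 # toy model: each added replica halves the load


@dataclass
class AutonomicManager:
    touchpoint: Touchpoint
    # Knowledge: policy (max_load) plus the samples observed so far.
    knowledge: dict = field(default_factory=lambda: {"max_load": 0.8, "history": []})

    def monitor(self) -> float:
        sample = self.touchpoint.read_load()
        self.knowledge["history"].append(sample)
        return sample

    def analyze(self, sample: float) -> bool:
        return sample > self.knowledge["max_load"]

    def plan(self, overloaded: bool) -> list:
        return ["add_replica"] if overloaded else []

    def execute(self, actions: list) -> None:
        for action in actions:
            getattr(self.touchpoint, action)()

    def step(self) -> list:
        """Run one pass of the control loop and return the actions taken."""
        actions = self.plan(self.analyze(self.monitor()))
        self.execute(actions)
        return actions
```

Calling `step()` repeatedly on a manager whose touchpoint starts overloaded adds replicas until the sampled load drops below `max_load`, after which the loop takes no action.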
The Goal
Large-scale distributed systems
Complex and require self-management
May run on unreliable resources
Major sources of complexity:
Scale (resources, events, users, ...)
Dynamism (resource churn, load changes, ...)
Goal
A platform (concepts, abstractions, algorithms, ...) that facilitates development of self-managing applications in large-scale and/or dynamic distributed environments.
A methodology that helps us to achieve self-management.
Research Plan
Self-Management in large-scale distributed systems. Consists of four main parts:
Part 1: Touchpoints and feedback loops in distributed systems
Part 2: Robust Management
Part 3: Improve management logic
Part 4: Integrate previous parts in a self-managing system.
2 Niche Platform
Niche Overview · Management Part · Runtime Environment
Niche
Niche is a Distributed Component Management System
Niche implements the Autonomic Computing Architecture for large-scale distributed environments
Niche leverages Structured Overlay Networks for communication and for provisioning of basic services (DHT, Publish/Subscribe, Groups, etc.)
[Figure: the Autonomic Computing architecture: an Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) linked through Touchpoints to Managed Resources, with a Manager Interface]
Management Part
Management Elements (MEs): Watchers, Aggregators, Managers, Executors
Communicate through events
Publish/Subscribe
Autonomic Managers (control loops) built as a network of MEs
Sensors and Actuators for components and groups
Actuation API
[Figure: Functional Part and Management Part; the Management Part contains Watchers, an Aggregator, a Manager, and Executors]
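A control loop built as a network of Management Elements that talk only through events can be sketched as follows. This is a toy, single-process illustration and not the Niche API: the `EventBus` class, the topic names (`load`, `avg_load`, `command`), the `scale_up` command, and the averaging rule are assumptions made up for the example.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process stand-in for a publish/subscribe service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in list(self._subscribers[topic]):
            handler(payload)


class Watcher:
    """Turns a sensor reading from one component into a 'load' event."""

    def __init__(self, bus, component):
        self.bus, self.component = bus, component

    def sense(self, load):
        self.bus.publish("load", (self.component, load))


class Aggregator:
    """Publishes the average once every expected component has reported."""

    def __init__(self, bus, expected):
        self.bus, self.expected, self.samples = bus, expected, {}
        bus.subscribe("load", self.on_load)

    def on_load(self, event):
        component, load = event
        self.samples[component] = load
        if len(self.samples) == self.expected:
            average = sum(self.samples.values()) / self.expected
            self.samples = {}
            self.bus.publish("avg_load", average)


class Manager:
    """Decides on an action when the aggregated load exceeds a threshold."""

    def __init__(self, bus, threshold):
        self.bus, self.threshold = bus, threshold
        bus.subscribe("avg_load", self.on_average)

    def on_average(self, average):
        if average > self.threshold:
            self.bus.publish("command", "scale_up")


class Executor:
    """Records commands; a real executor would call the actuation API."""

    def __init__(self, bus):
        self.performed = []
        bus.subscribe("command", self.performed.append)
```

Wiring two Watchers, one Aggregator, one Manager, and one Executor to the same bus gives a complete loop: sensor readings go in at the Watchers and commands come out at the Executor, with no element holding a direct reference to another.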
Runtime Environment
Containers that host components and MEs
Use a Structured Overlay Network for communication
Provide overlay services
[Figure: structured overlay network drawn as a ring of node identifiers 0 to 31]
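How a ring-shaped identifier space can locate the node responsible for a key is sketched below. The slides only show a ring with identifiers 0 to 31; the Chord-style successor rule, the SHA-1 hash, and the function names `key_id` and `successor` are assumptions chosen for illustration, not a description of the overlay Niche actually uses.

```python
import bisect
import hashlib

ID_SPACE = 32  # the ring in the figure uses identifiers 0..31


def key_id(name: str) -> int:
    """Hash a name onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def successor(nodes: list, ident: int) -> int:
    """Return the first live node at or clockwise after ident, wrapping at the end."""
    ring = sorted(nodes)
    index = bisect.bisect_left(ring, ident % ID_SPACE)
    return ring[index % len(ring)]
```

With live nodes `[3, 10, 17, 24, 30]`, identifier 11 maps to node 17. If node 17 leaves, the same identifier maps to node 24, so responsibility moves to the next node on the ring without any change to the identifier itself.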
Dealing with Resource Churn
How to deal with failures?
MEs heal the functional part
How to heal failed MEs?
Programmatically in the management logic
Transparently by the platform
[Figure: overlay ring of node identifiers 0 to 31, annotated "??!!" in the final overlay]
3 Robust Management Elements
Solution Outline · The SMART Reconfiguration Algorithm · Automatic Reconfiguration · Evaluation
Robust Management Elements
A Robust Management Element (RME):
is replicated to ensure fault-tolerance
tolerates continuous churn by automatically restoring failed replicas on other nodes
maintains its state consistent among replicas
provides its service with minimal disruption in spite of resource churn (high availability)
is location transparent, i.e., RME clients communicate with it regardless of the current location of its replicas
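The idea of restoring failed replicas on other nodes can be illustrated with a small helper. It is a deliberately centralised toy and is not the decentralized algorithm the talk refers to; the function name `restore_replicas` and its parameters are hypothetical, and state transfer to the new replicas is left out.

```python
def restore_replicas(replicas, alive, degree):
    """Return the replica set after replacing failed replicas with spare live nodes.

    replicas: nodes currently hosting a replica
    alive:    nodes currently reachable, in order of preference
    degree:   desired number of replicas
    """
    survivors = [node for node in replicas if node in alive]
    spares = [node for node in alive if node not in survivors]
    missing = max(0, degree - len(survivors))
    return survivors + spares[:missing]
```

For example, with replicas on A, B, and C and node B failed, the helper keeps A and C and picks the first spare live node as the third replica. If too few nodes are alive, the returned set is simply smaller than `degree`.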
Solution Outline
Replicated state machine
An algorithm to reconfigure the replicated state machine (we used the SMART algorithm)
Our decentralized algorithm to automate reconfiguration
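The first two ingredients can be illustrated with a toy replicated state machine. It is a sketch under strong assumptions, not the SMART algorithm or the authors' implementation: a single in-process list stands in for the consensus protocol that would normally agree on the command order, the key/value commands are invented for the example, and `reconfigure` simply lets new members replay the whole log.

```python
class Replica:
    """Deterministic state machine: a key/value store driven by a command log."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.state[key] = value
        else:  # "delete"; submit() rejects every other operation
            self.state.pop(key, None)
        self.applied += 1


class ReplicatedStateMachine:
    def __init__(self, names):
        self.log = []  # the agreed order of commands
        self.replicas = {name: Replica() for name in names}

    def submit(self, command):
        if command[0] not in ("set", "delete"):
            raise ValueError(f"unknown operation: {command[0]}")
        # Stand-in for consensus: appending fixes the position of the command.
        self.log.append(command)
        for replica in self.replicas.values():
            self.catch_up(replica)

    def catch_up(self, replica):
        """Apply every log entry the replica has not seen yet, in log order."""
        for command in self.log[replica.applied:]:
            replica.apply(command)

    def reconfigure(self, names):
        """Switch to a new replica set; new members rebuild state from the log."""
        old = self.replicas
        self.replicas = {name: old.get(name, Replica()) for name in names}
        for replica in self.replicas.values():
            self.catch_up(replica)
```

Because every replica applies the same commands in the same order, all replicas hold the same state. After reconfiguring from {A, B, C, D, E} to a set in which two new nodes replace D and E, the new members reach that state by replaying the log.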
SMART
A B C D E
Admin
Self-Management for Large Scale Distributed Systems (A. Al-Shishtawy) 16/25
![Page 42: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/42.jpg)
IntroductionNiche Platform
Robust Management ElementsFuture Work
Solution OutlineThe SMART Reconfiguration AlgorithmAutomatic ReconfigurationEvaluation
SMART
A B C D E
Admin
Self-Management for Large Scale Distributed Systems (A. Al-Shishtawy) 16/25
![Page 43: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/43.jpg)
IntroductionNiche Platform
Robust Management ElementsFuture Work
Solution OutlineThe SMART Reconfiguration AlgorithmAutomatic ReconfigurationEvaluation
SMART
A B C D E
Admin
ConfigurationRepository
Self-Management for Large Scale Distributed Systems (A. Al-Shishtawy) 16/25
![Page 44: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/44.jpg)
SMART
A B C D E
Admin
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
![Page 45: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/45.jpg)
![Page 46: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/46.jpg)
![Page 47: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/47.jpg)
SMART
A B C D E X Y
Admin
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
![Page 48: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/48.jpg)
![Page 49: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/49.jpg)
SMART
A B C D E X Y
Admin
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
{A,X,C,D,E}
![Page 50: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/50.jpg)
SMART
A B C D E X Y
Admin
Configuration Repository
{A,X,C,D,E}
Configuration {A,X,C,D,E}
{A,X,C,D,E}
![Page 51: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/51.jpg)
Creating a Replicated State Machine (RSM)
Any node can create an RSM. Select an ID and a replication degree
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
![Page 52: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/52.jpg)
Creating a Replicated State Machine (RSM)
Any node can create an RSM. Select an ID and a replication degree
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
RSM ID = 10, f=4, N=32
![Page 53: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/53.jpg)
Creating a Replicated State Machine (RSM)
The node uses the symmetric replication scheme to calculate replica IDs
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
RSM ID = 10, f=4, N=32
Replica IDs = 10, 18, 26, 2
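The replica IDs on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming that f is the replication degree, N is the size of the identifier space, and that replica IDs are spaced N/f apart starting from the RSM ID. The function name `symmetric_replica_ids` and the check that f divides N are illustrative choices, not taken from the slides.

```python
def symmetric_replica_ids(rsm_id, f, n):
    """Return f identifiers spaced n // f apart on a ring of n identifiers.

    Assumption: f divides n evenly, so the replicas are spread symmetrically.
    """
    if n % f != 0:
        raise ValueError("replication degree f must divide ring size n")
    step = n // f
    # Start at the RSM ID and step around the ring, wrapping modulo n.
    return [(rsm_id + i * step) % n for i in range(f)]


print(symmetric_replica_ids(10, 4, 32))  # [10, 18, 26, 2]
```

With RSM ID = 10, f = 4 and N = 32 the step is 8, which gives the replica IDs 10, 18, 26, 2 shown on the slide.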
![Page 54: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/54.jpg)
Creating a Replicated State Machine (RSM)
The node uses lookups to find responsible nodes . . .
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
RSM ID = 10, f=4, N=32
Replica IDs = 10, 18, 26, 2
Responsible Node IDs = 14, 20, 29, 7
![Page 55: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/55.jpg)
Creating a Replicated State Machine (RSM)
. . . and gets direct references to them
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
RSM ID = 10, f=4, N=32
Replica IDs = 10, 18, 26, 2
Responsible Node IDs = 14, 20, 29, 7
Configuration = D, F, I, B
![Page 56: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/56.jpg)
Creating a Replicated State Machine (RSM)
The set of direct references forms the configuration
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
RSM ID = 10, f=4, N=32
Replica IDs = 10, 18, 26, 2
Responsible Node IDs = 14, 20, 29, 7
Configuration = D, F, I, B
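The lookup step can be sketched as a successor search on the ring. This is a minimal sketch, assuming that the node responsible for an identifier is the first node at or after it, which is consistent with the IDs on the slide (10 → 14, 18 → 20, 26 → 29, 2 → 7). The positions of B, D, F and I follow from the slide. The positions of A, C, E, G and H are hypothetical, as are the names `NODE_IDS` and `responsible_node`.

```python
from bisect import bisect_left

# Ring positions. B=7, D=14, F=20, I=29 follow from the slide;
# A, C, E, G, H are hypothetical values chosen for illustration.
NODE_IDS = {"A": 1, "B": 7, "C": 9, "D": 14, "E": 16,
            "F": 20, "G": 22, "H": 25, "I": 29}


def responsible_node(key, node_ids):
    """Return (name, id) of the first node at or after key, wrapping around."""
    ring = sorted((node_id, name) for name, node_id in node_ids.items())
    ids = [node_id for node_id, _ in ring]
    index = bisect_left(ids, key) % len(ring)  # wrap past the highest ID
    node_id, name = ring[index]
    return name, node_id


replica_ids = [10, 18, 26, 2]
lookups = [responsible_node(key, NODE_IDS) for key in replica_ids]
responsible_ids = [node_id for _, node_id in lookups]
configuration = [name for name, _ in lookups]
print(responsible_ids)  # [14, 20, 29, 7]
print(configuration)    # ['D', 'F', 'I', 'B']
```

In a real overlay the lookup would be a routed request rather than a local search; the local table here only stands in for that step.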
![Page 57: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/57.jpg)
Creating a Replicated State Machine (RSM)
The node sends a Create message to the configuration
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
RSM ID = 10, f=4, N=32
Replica IDs = 10, 18, 26, 2
Responsible Node IDs = 14, 20, 29, 7
Configuration = D, F, I, B
![Page 58: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/58.jpg)
Creating a Replicated State Machine (RSM)
Now replicas communicate directly using the configuration
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
![Page 59: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/59.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
![Page 60: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/60.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
![Page 61: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/61.jpg)
![Page 62: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/62.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
{A,X,C,D,E}
![Page 63: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/63.jpg)
![Page 64: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/64.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
{A,X,C,D,E} {A,B,C,Y,E}
![Page 65: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/65.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{?,?,?,?,?}
Configuration {?,?,?,?,?}
{A,X,C,D,E} {A,B,C,Y,E}
![Page 66: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/66.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,Y,E}
Configuration {A,B,C,Y,E}
{A,X,C,D,E} {A,B,C,Y,E}
![Page 67: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/67.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
![Page 68: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/68.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
{ ,X, , , }
![Page 69: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/69.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
Proposed Changes { ,X, , , }
{ ,X, , , }
![Page 70: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/70.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
Proposed Changes { ,X, , , }
{ ,X, , , } { , , ,Y, }
![Page 71: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/71.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
Proposed Changes { ,X, ,Y, }
{ ,X, , , } { , , ,Y, }
![Page 72: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/72.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,B,C,D,E}
Configuration {A,B,C,D,E}
Proposed Changes { ,X, ,Y, }
New Configuration {A,X,C,Y,E}
![Page 73: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/73.jpg)
SMART with Multiple Admins
A B C D E X Y
Admin Admin2
Configuration Repository
{A,X,C,Y,E}
Configuration {A,X,C,Y,E}
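The difference between the two multi-admin runs can be illustrated in code. When each admin submits a whole configuration, one request overwrites the other, as in the earlier run that ends with {A,B,C,Y,E}. When each admin submits only the slots it wants changed, the requests can be combined. The following is a minimal sketch of that slot-wise merge, with `None` standing for an empty slot. The function names and the choice to raise an error on conflicting proposals for the same slot are assumptions; the slides do not show how such conflicts are resolved.

```python
def merge_changes(*proposals):
    """Combine slot-wise proposals; None means 'leave this slot unchanged'."""
    merged = [None] * len(proposals[0])
    for proposal in proposals:
        for slot, node in enumerate(proposal):
            if node is None:
                continue
            if merged[slot] is not None and merged[slot] != node:
                # Assumption: conflicting proposals are rejected in this sketch.
                raise ValueError(f"conflicting proposals for slot {slot}")
            merged[slot] = node
    return merged


def apply_changes(configuration, changes):
    """Return a new configuration with every proposed slot replaced."""
    return [new if new is not None else old
            for old, new in zip(configuration, changes)]


current = ["A", "B", "C", "D", "E"]
from_admin = [None, "X", None, None, None]   # { ,X, , , }
from_admin2 = [None, None, None, "Y", None]  # { , , ,Y, }

proposed = merge_changes(from_admin, from_admin2)
new_configuration = apply_changes(current, proposed)
print(proposed)           # [None, 'X', None, 'Y', None]
print(new_configuration)  # ['A', 'X', 'C', 'Y', 'E']
```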
![Page 74: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/74.jpg)
Handling Churn
[Figure: identifier ring with positions 0 to 31 and nodes A to I]
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
![Page 75: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/75.jpg)
![Page 76: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/76.jpg)
Handling Churn
[Figure: identifier ring with positions 0 to 31 and nodes A to I; an additional SM10 r2 label appears on the ring]
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
![Page 77: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/77.jpg)
Handling Churn
[Figure: identifier ring with positions 0 to 31 and nodes A to I; an additional SM10 r2 label appears on the ring]
SM10 r2 = G
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
![Page 78: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/78.jpg)
Handling Churn
[Figure: identifier ring with positions 0 to 31 and nodes A to I; an additional SM10 r2 label appears on the ring]
SM10 r2 = G
{ ,G, , } (replicas 1 to 4)
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
![Page 79: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/79.jpg)
Handling Churn
[Figure: identifier ring with positions 0 to 31 and nodes A, B, C, D, E, G, H, I; F is no longer shown]
{ ,G, , } (replicas 1 to 4)
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
![Page 80: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/80.jpg)
Handling Churn
[Figure: identifier ring with positions 0 to 31 and nodes A, B, C, D, E, G, H, I; F is no longer shown]
{ ,G, , } (replicas 1 to 4)
Configuration_2 (replicas 1 to 4): D, G, I, B
Replicas: SM10 r1, SM10 r2, SM10 r3, SM10 r4
Configuration_1 (replicas 1 to 4): D, F, I, B
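The step from Configuration_1 to Configuration_2 can be sketched by repeating the lookups after the ring has changed. This is a minimal sketch under several assumptions: F (ID 20) is taken to have left the ring, since it no longer appears among the ring labels in the later builds; the responsible node is the first node at or after a replica ID; and every node position except B=7, D=14 and I=29 is hypothetical, including G=22. The function names are illustrative and do not describe how the prototype detects churn.

```python
from bisect import bisect_left


def responsible_node(key, node_ids):
    """Return the name of the first node at or after key, wrapping around."""
    ring = sorted((node_id, name) for name, node_id in node_ids.items())
    index = bisect_left([node_id for node_id, _ in ring], key) % len(ring)
    return ring[index][1]


def proposed_changes(configuration, replica_ids, node_ids):
    """Slot-wise changes: the new responsible node where it differs, else None."""
    changes = []
    for current, key in zip(configuration, replica_ids):
        owner = responsible_node(key, node_ids)
        changes.append(owner if owner != current else None)
    return changes


# Ring after F has left. B, D, I follow the slides; the other IDs are hypothetical.
node_ids = {"A": 1, "B": 7, "C": 9, "D": 14, "E": 16,
            "G": 22, "H": 25, "I": 29}
replica_ids = [10, 18, 26, 2]
configuration_1 = ["D", "F", "I", "B"]

changes = proposed_changes(configuration_1, replica_ids, node_ids)
configuration_2 = [new if new is not None else old
                   for old, new in zip(configuration_1, changes)]
print(changes)          # [None, 'G', None, None]
print(configuration_2)  # ['D', 'G', 'I', 'B']
```

Only the slot for replica 2 changes, which matches the { ,G, , } change and the resulting Configuration_2 on the slides.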
![Page 81: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/81.jpg)
Evaluation
Built a prototype implementation of Robust Management Elements (RME)
Simulation-based performance evaluation
Focused on the effect of the churn rate and replication degree on the request critical path and failure recovery
Used the King latency dataset
![Page 82: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/82.jpg)
Request latency for a single client
[Plot: request latency (ms) versus request number; both axes run from 0 to 8000]
![Page 83: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/83.jpg)
Outline
1 Introduction
2 Niche Platform
3 Robust Management Elements
4 Future Work
![Page 84: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/84.jpg)
Improve Management Logic
Apply control theory to distributed systems
Distributed optimization
Reinforcement Learning
![Page 85: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/85.jpg)
Self-Management in Cloud Applications
Study elastic services in the Cloud
Develop self-management techniques for Cloud applications
Integrate all pieces into an elastic storage system
![Page 86: Self-Management for Large Scale Distributed Systems · Introduction Niche Platform Robust Management Elements Future Work The Problem Autonomic Computing The Goal Outline 1 Introduction](https://reader033.vdocuments.site/reader033/viewer/2022052011/60277627d0fafc4cf54e9997/html5/thumbnails/86.jpg)
Thank you for careful listening :-)
Questions?