andreas fischer (university passau, germany): resilience in networks: elements and approach for a...
Post on 18-Dec-2014
Resilience in Networks: Elements and Approach for a Trustworthy Infrastructure
Andreas Fischer and Hermann de Meer
2
The ResumeNet project
Resilience and Survivability for future networking: framework, mechanisms and experimental evaluation
Contract number: FP7 – 224619
Effort
• 36 months (Sep 2008 – Aug 2011)
• 437 person-months
• Equivalent of 12 scientists working full-time
Website: http://www.resumenet.eu/
Partners
• Swiss Federal Institute of Technology
• University of Passau
• Lancaster University
• Delft University of Technology
• Munich University of Technology
• University of Uppsala
• France Telecom
• University of Liege
• NEC Europe Ltd.
3
What is network resilience?
“The ability of the network to provide and maintain an acceptable level of service in the face of various faults and challenges.”
Figure: resilience disciplines. Labels recovered from the diagram:
• Robustness
• Challenge Tolerance
  Survivability (with Fault Tolerance)
  Disruption Tolerance: Environmental, Energy, Delay, Mobility, Connectivity
  Traffic Tolerance: legitimate, attack
• Trustworthiness
  Dependability: Availability, Integrity, Reliability, Safety, Maintainability
  Security: AAA, Nonrepudiability, Confidentiality
  Performability: QoS measures
4
The ResumeNet approach to resilience
Resilient Networking Architecture – D²R²+DR
• Defend, Detect, Remediate, Repair
  Real-time control loop
  React to changes of the network
• Diagnosis and Refinement
  Long-term actions
  Improve overall resilience
Network and Service Resilience
• Service: Acceptable, impaired, unacceptable
• Network: Normal, partially degraded, severely degraded
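The two state dimensions above can be sketched as a small state model. This is only an illustration: the enum names, the function names, and the rule that remediation starts as soon as either dimension leaves its best state are assumptions, not part of the ResumeNet architecture.

```python
from enum import IntEnum
from typing import List


class ServiceState(IntEnum):
    ACCEPTABLE = 0
    IMPAIRED = 1
    UNACCEPTABLE = 2


class NetworkState(IntEnum):
    NORMAL = 0
    PARTIALLY_DEGRADED = 1
    SEVERELY_DEGRADED = 2


def needs_remediation(network: NetworkState, service: ServiceState) -> bool:
    # Hypothetical rule: act whenever either dimension is not in its best state.
    return network != NetworkState.NORMAL or service != ServiceState.ACCEPTABLE


def control_loop_step(network: NetworkState, service: ServiceState) -> List[str]:
    """One pass of a hypothetical D²R² loop; returns the phases that run."""
    phases = ["defend", "detect"]  # always active in this sketch
    if needs_remediation(network, service):
        phases += ["remediate", "repair"]
    return phases
```

In this sketch, defence and detection run on every pass, while remediation and repair are triggered only when a degradation is observed.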
5
Resilience Control Loop
6
Virtual service migration
Use service virtualisation as resilience enabler
• Virtualisation enables migration of arbitrary network services
• Counter certain types of challenges
  Hardware destruction (e.g. due to natural disaster)
  Communication environment (e.g. unreliable link)
  Unusual but legitimate requests for service (e.g. insufficient CPU power)
Advantages
• Flexible reaction to challenges
• (Mostly) transparent to services
Problems
• Limitation of time
• Limitation of resources
Figure: a service with its hot state and cold state, running on a virtualisation layer on top of a real machine, is migrated to a second real machine with its own virtualisation layer.
ENISA, Cloud Computing research problems:
• Long distance live migration of virtual machines
• Resilience of Cloud Computing
7
Virtual Service migration phases
Dynamic Composite Service Migration
• Compose migration from distinct actions (migration primitives)
• Separate migration of service primitives
Overall goal: Keeping service operational during challenge
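Composing a migration from distinct primitives can be sketched as plain function composition. The primitive names below (stop_service, copy_cold_state, copy_hot_state, start_on_target) and the dictionary used as migration context are hypothetical and serve only to illustrate the idea.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Primitive = Callable[[Context], None]


# Hypothetical migration primitives; each one mutates the shared context.
def stop_service(ctx: Context) -> None:
    ctx["running_on"] = None


def copy_cold_state(ctx: Context) -> None:
    ctx["target_state"].add("cold")


def copy_hot_state(ctx: Context) -> None:
    ctx["target_state"].add("hot")


def start_on_target(ctx: Context) -> None:
    ctx["running_on"] = "target"


def compose(*primitives: Primitive) -> Callable[[Context], List[str]]:
    """Build a migration that runs the primitives in order and returns a trace."""
    def migration(ctx: Context) -> List[str]:
        trace = []
        for primitive in primitives:
            primitive(ctx)
            trace.append(primitive.__name__)
        return trace
    return migration


# A stop-and-copy style migration expressed as a sequence of primitives.
stop_and_copy = compose(stop_service, copy_cold_state, copy_hot_state, start_on_target)
```

Other migrations could reuse the same primitives in a different order, for example copying the cold state before the service is stopped.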
8
State transfer strategies

| Strategy | Description | Pros | Cons | State lost? | Applicable scenario |
|---|---|---|---|---|---|
| None | No migration | No changes introduced | No resilience introduced | All state is lost | No spare resources available |
| Cold spare | Don’t transfer hot state, fall back to cold state | Simple, fast, works with disconnected source and destination | Does not keep hot state | Hot state is lost | Hardware fails unexpectedly |
| Cold migration: Stop and copy | On challenge detection: stop service, copy all state, restart on target machine | Simple | Significant service downtime | State is kept | Service downtime acceptable, challenge still some time off |
| Cold migration: Update hot state | Initially distribute cold state. On challenge detection: stop service, copy hot state, restart on target machine | Simple, total time to recovery only dependent on size of hot state | Requires initial distribution and regular update of cold state | State is kept | Small service downtime acceptable, challenge pending |
| Live migration: Continuous state update | Start service on target host and continuously synchronize state | Almost no service downtime | Complex, uses unnecessary bandwidth, high total time to repair | State is kept | Service downtime should be minimized |
| Live migration: Update state on demand | Start service on target, transfer state as needed | Low service downtime, state is copied only once | Complex, very high total time to repair | State is kept | Initially slow service response is acceptable |
| Hot spare | Synchronize all state during operation | No service downtime | Very complex, uses resources even in absence of challenges | State is kept | Critical service – no downtime acceptable |
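The difference between the two cold migration variants can be made concrete with a rough downtime estimate. The sketch below assumes a constant transfer bandwidth and ignores the time needed to stop and restart the service; the function names and units are illustrative assumptions, not figures from the project.

```python
def transfer_time(size_mb: float, bandwidth_mb_s: float) -> float:
    """Seconds needed to copy size_mb at a constant (assumed) bandwidth."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb / bandwidth_mb_s


def downtime_stop_and_copy(cold_mb: float, hot_mb: float, bandwidth_mb_s: float) -> float:
    # Stop and copy: all state is transferred while the service is stopped.
    return transfer_time(cold_mb + hot_mb, bandwidth_mb_s)


def downtime_update_hot_state(hot_mb: float, bandwidth_mb_s: float) -> float:
    # Update hot state: cold state was distributed earlier, only hot state remains.
    return transfer_time(hot_mb, bandwidth_mb_s)
```

With 8000 MB of cold state, 200 MB of hot state and 100 MB/s of bandwidth, this estimate gives 82 seconds for stop and copy against 2 seconds when only the hot state has to be updated.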
9
Virtual Service migration for resilience
Figure: migration architecture. Labels recovered from the diagram:
• Components: Challenge Analysis, Migration Manager, External Sources, Policies
• Migration Manager: Initiate and supervise migration and recovery
• Challenge Analysis: Provide challenge information
• External Sources: Provide external input
• Policies: Take into account
  Company policies
  Service level agreements (SLA)
  Available migration strategies
• Two real machines, each with a virtualisation layer hosting the service with its hot state and cold state: Provide monitoring data
• Migration of the service from one real machine to the other
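How a migration manager might map challenge information and policy limits to one of the state transfer strategies can be sketched as a simple rule chain. Everything here is an assumption made for illustration: the Situation fields, the order of the rules, and the idea that an SLA is reduced to a single maximum downtime value.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    # All fields are hypothetical inputs from challenge analysis and policies.
    spare_resources: bool         # is there a target machine at all?
    source_reachable: bool        # False if the hardware has already failed
    max_downtime_s: float         # downtime tolerated by policy or SLA
    estimated_full_copy_s: float  # time to copy cold and hot state
    estimated_hot_copy_s: float   # time to copy hot state only


def choose_strategy(s: Situation) -> str:
    """Pick a strategy name using rules loosely based on the scenarios listed for each strategy."""
    if not s.spare_resources:
        return "none"
    if s.max_downtime_s == 0:
        return "hot spare"
    if not s.source_reachable:
        return "cold spare"
    if s.estimated_full_copy_s <= s.max_downtime_s:
        return "cold migration: stop and copy"
    if s.estimated_hot_copy_s <= s.max_downtime_s:
        return "cold migration: update hot state"
    return "live migration"
```

A real manager would also weigh bandwidth cost and the time remaining before the challenge takes effect; the sketch only shows where company policies and SLAs would enter the decision.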
10
Conclusions
Resilience is a wide topic
• Tackles important Future Internet issues
• Needs an understanding of challenges
• Needs a coordinated approach
D²R²+DR provides a design guideline
• Systematic approach to resilience
• Blueprint for designing resilient systems