Chaos Engineering
13.11.2018 Dennis Pietruck
users.informatik.haw-hamburg.de/ubicomp/...
Agenda
● Motivation
● Fundamentals
● Research & Conferences
Motivation
Failure in a monolith
What does the client see?
● Internal Server Error (500)
● Timeout
Failure in a distributed system
What does the client see?
[Diagram: services A, B and C; A accesses a database, B calls A, C calls B]
● Internal Server Error (500)
● Timeout
● OK (200), but with missing data
○ after a long wait
How can this failure play out?
● No error handling
○ A cannot react to database errors
○ B cannot react to errors in A
○ C cannot react to errors in B
● A has no retries configured
● A and B have no timeouts configured
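The last two points can be illustrated with a minimal Python sketch of a client call that has both a timeout and bounded retries. It assumes the downstream call is wrapped in a hypothetical function `fn` that takes a timeout in seconds, enforces it itself (as most HTTP and database clients do) and raises `TimeoutError` or `ConnectionError` on transient failures. The attempt count and backoff values are illustrative, not taken from the slides.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Errors treated as transient and therefore retried; others propagate at once.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def call_with_retries(
    fn: Callable[[float], T],
    attempts: int = 3,
    timeout_s: float = 0.5,
    base_delay_s: float = 0.1,
) -> T:
    """Call fn(timeout_s) and retry transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            # fn (hypothetical) is expected to enforce the timeout itself.
            return fn(timeout_s)
        except TRANSIENT_ERRORS as err:
            last_error = err
            if attempt < attempts - 1:
                # Wait base_delay_s, 2*base_delay_s, 4*base_delay_s, ...
                time.sleep(base_delay_s * 2 ** attempt)
    # All attempts failed: surface the last transient error to the caller.
    assert last_error is not None
    raise last_error
```

Retries only make sense for idempotent requests, and without backoff and an attempt limit they add load to a service that is already failing.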
Distributed systems today
Josh Evans - A Netflix Guide to Microservices
https://de.slideshare.net/JoshEvans2/mastering-chaos-a-netflix-guide-to-microservices
Fundamentals
What is Chaos Engineering?
“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.”
principlesofchaos.org
Testing ≠ Chaos Engineering
Testing: Does the system's behavior match its specification?
Chaos Engineering: What happens when…
● latency increases?
● services crash?
● network components fail?
● regions become unreachable?
● ...
https://www.a1qa.com/blog/interview-with-daniel-knott-upside-down-testing
Origins of Chaos Engineering (1/2)
● 2010: Netflix develops Chaos Monkey for its migration to AWS
● 2011: the Simian Army extends Chaos Monkey
● 2012: Chaos Monkey is published on GitHub
Origins of Chaos Engineering (2/2)
● 2016: Gremlin Inc. offers “Failure as a Service”
● 2017: the book “Chaos Engineering” is published
Approach
Prerequisite: monitoring and tracing
1. Define a measurable steady state
2. Hypothesize that this steady state will continue
3. Simulate failures
4. Check the hypothesis
5. Take countermeasures
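The five steps can be sketched as a small Python function. The callables `measure`, `inject_fault` and `remove_fault` and the relative `tolerance` are hypothetical placeholders for a real metric query and a real fault-injection tool; the sketch only shows the structure: baseline, hypothesis, fault, check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    baseline: float
    during_fault: float
    hypothesis_held: bool


def run_experiment(
    measure: Callable[[], float],
    inject_fault: Callable[[], None],
    remove_fault: Callable[[], None],
    tolerance: float = 0.05,
) -> ExperimentResult:
    # 1. Define a measurable steady state: record the baseline metric.
    baseline = measure()
    # 2. Hypothesis: the metric stays within `tolerance` of the baseline.
    # 3. Simulate the failure, and always remove it again afterwards.
    inject_fault()
    try:
        during = measure()
    finally:
        remove_fault()
    # 4. Check the hypothesis.
    held = abs(during - baseline) <= tolerance * baseline
    # 5. If it did not hold, the caller takes countermeasures.
    return ExperimentResult(baseline, during, held)
```

The `finally` clause removes the injected fault even if the measurement itself fails.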
Monitoring
Layers: application, container, container platform, host OS, (virtual) hardware, network
Whitebox monitoring (RED principle: Rate, Errors, Duration)
Blackbox monitoring (USE principle: Utilization, Saturation, Errors)
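As an illustration of the RED principle (Rate, Errors, Duration), here is a Python sketch that derives the three values for one service from a hypothetical in-memory list of request records. Treating HTTP 5xx responses as errors and using the 99th percentile for duration are assumptions; a real setup would read these values from the monitoring system. USE values (Utilization, Saturation, Errors) are computed analogously per resource, for example from CPU busy time and queue length.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def red_metrics(requests: list[Request], window_s: float) -> dict[str, float]:
    """Rate, Errors and Duration of one service over a time window."""
    n = len(requests)
    if n == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p99_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    durations = [r.duration_ms for r in requests]
    # quantiles(..., n=100) returns 99 cut points; the last one is the p99.
    p99 = quantiles(durations, n=100)[-1] if n > 1 else durations[0]
    return {
        "rate_per_s": n / window_s,  # Rate
        "error_ratio": errors / n,   # Errors
        "p99_ms": p99,               # Duration
    }
```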
Tracing
● Shows communication between services
● Shows latencies
● Supports debugging
[1] A. Blohowiak, A. Basiri, L. Hochstein, and C. Rosenthal, “A Platform for Automating Chaos Experiments”
jaegertracing.io/docs/1.7/architecture/
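A minimal sketch of the data behind such a trace view, assuming a simplified span model (trace ID, span ID, parent span ID, service name, start and end time) similar in spirit to Jaeger's. The `Span` class and `call_tree` function are hypothetical and not part of any Jaeger client API; they only show how parent references turn a flat list of spans into a call tree with latencies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def call_tree(spans: list[Span]) -> list[str]:
    """Render one trace as indented 'service duration' lines, depth-first."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        # Visit child spans in the order they started.
        for sp in sorted(children.get(parent, []), key=lambda x: x.start_ms):
            lines.append(f"{'  ' * depth}{sp.service} {sp.duration_ms:.0f} ms")
            walk(sp.span_id, depth + 1)

    walk(None, 0)
    return lines
```

For a trace in which C calls B, B calls A and A queries a database, the result is one indented line per hop, so the durations show which hop added the latency.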
Defining the steady state
Find metrics that show that the system is still working
At Netflix: “Stream Starts per Second”
[Figure: streams per second in the steady state. Source: A. Basiri et al., “Chaos Engineering”, IEEE Software, vol. 33, no. 3, pp. 35–41, May 2016.]
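A minimal sketch of a steady-state check, assuming the metric (for example stream starts per second) is compared against a band of mean ± k standard deviations computed from historical samples. The band rule and `k = 3` are assumptions for illustration; a metric with a strong daily pattern would in practice be compared with samples from the same time of day.

```python
from statistics import mean, stdev


def steady_state_band(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Acceptable range for a metric: mean +/- k sample standard deviations."""
    mu, sigma = mean(history), stdev(history)  # needs at least 2 samples
    return mu - k * sigma, mu + k * sigma


def in_steady_state(current: float, history: list[float], k: float = 3.0) -> bool:
    """True if the current value lies inside the historical band."""
    low, high = steady_state_band(history, k)
    return low <= current <= high
```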
Simulating failures
● Only for selected components
● Only for a small fraction of users
Possible sources of failure: application, container, container platform, database, host OS, (virtual) hardware, network, availability zone
Tools: Chaos Toolkit, Chaos Monkey, Toxiproxy, Pumba
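Limiting an experiment to selected components and a small fraction of users can be sketched as follows. Users are assigned to the experiment deterministically by hashing their ID, so the same user is affected on every request; the wrapper then injects latency or errors only for those users. The salt, function names and parameters are hypothetical. Tools such as Toxiproxy or Pumba inject comparable faults at the network or container level, and this sketch does not reproduce their APIs.

```python
import hashlib
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def in_experiment(user_id: str, percent: float, salt: str = "exp-1") -> bool:
    """Deterministically assign about `percent` % of users to the experiment."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100


def with_fault(
    call: Callable[[], T],
    user_id: str,
    percent: float,
    added_latency_s: float = 0.0,
    error_rate: float = 0.0,
) -> T:
    """Run `call`, injecting latency and errors only for experiment users."""
    if in_experiment(user_id, percent):
        time.sleep(added_latency_s)       # simulated network latency
        if random.random() < error_rate:  # simulated crash of the dependency
            raise ConnectionError("injected fault")
    return call()
```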
Research topics
● Tooling
○ Failure simulation
○ Automation
■ of experiments
■ of analysis
● Adapting Chaos Engineering to smaller organizations
● Chaos Engineering in non-technical areas
Research topics: papers
● The Business Case for Chaos Engineering (H. Tucker, L. Hochstein, N. Jones, A. Basiri, and C. Rosenthal)
● A Platform for Automating Chaos Experiments (A. Blohowiak, A. Basiri, L. Hochstein, and C. Rosenthal)
● Why is random testing effective for partition tolerance bugs? (R. Majumdar and F. Niksic)
Conferences
● IEEE International Conference on Cloud Computing
● IEEE International Symposium on Software Reliability Engineering
The Business Case for Chaos Engineering
● Quantitative benefits
○ Return on investment: cost-benefit ratio of Chaos Engineering
○ Prerequisite: existing monitoring and incident management
● Qualitative benefits
○ Encourages the development of resilient applications
○ Makes resilience a priority
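The return-on-investment idea can be made concrete with the generic formula ROI = (benefit - cost) / cost, where the benefit is the outage cost avoided. This is a simplified illustration, not the model from Tucker et al.; the incident minutes and cost per minute would come from the existing monitoring and incident management, which is why these are a prerequisite.

```python
def chaos_roi(
    incident_minutes_before: float,
    incident_minutes_after: float,
    cost_per_minute: float,
    program_cost: float,
) -> float:
    """Generic ROI = (benefit - cost) / cost, with benefit = avoided outage cost."""
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    avoided_minutes = incident_minutes_before - incident_minutes_after
    benefit = avoided_minutes * cost_per_minute
    return (benefit - program_cost) / program_cost
```

With hypothetical figures of 300 incident minutes before, 120 after, a cost of 1,000 per outage minute and a program cost of 90,000, `chaos_roi(300, 120, 1000, 90000)` returns `1.0`, a return of 100 % on the program cost.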
Automating Failure Testing Research at Internet Scale
● Chaos Automation Platform (ChAP)
● Configuring an experiment via a dashboard
● Automatic execution of the experiment
● Selecting metrics via a dashboard
Further development:
● Automatic configuration of tests
● Automatic analysis of tests
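Automatic analysis could, for example, compare the success rate of a control group with that of an experiment group that receives the injected fault. The sketch below uses a one-sided two-proportion z-test for this; the test, the significance level and the function name are assumptions chosen for illustration, not a description of how the Chaos Automation Platform evaluates experiments.

```python
from math import sqrt
from statistics import NormalDist


def experiment_degraded(
    control_ok: int,
    control_total: int,
    experiment_ok: int,
    experiment_total: int,
    alpha: float = 0.01,
) -> bool:
    """One-sided two-proportion z-test: is the experiment group's success
    rate significantly lower than the control group's?"""
    p_control = control_ok / control_total
    p_experiment = experiment_ok / experiment_total
    pooled = (control_ok + experiment_ok) / (control_total + experiment_total)
    se = sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / experiment_total))
    if se == 0:  # both groups all succeeded or all failed: identical rates
        return False
    z = (p_control - p_experiment) / se
    # Critical value of the standard normal distribution, about 2.33 for alpha=0.01.
    return z > NormalDist().inv_cdf(1 - alpha)
```

In an automated pipeline, a `True` result could be used to abort the experiment early.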
Sources
A. Basiri et al., “Chaos Engineering”, IEEE Software, vol. 33, no. 3, pp. 35–41, May 2016.
“Principles of Chaos Engineering”. [Online]. Available: https://principlesofchaos.org/. [Accessed: 10-Nov-2018].
A. Blohowiak, A. Basiri, L. Hochstein, and C. Rosenthal, “A Platform for Automating Chaos Experiments”, in 2016 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), 2016, pp. 5–8.
H. Tucker, L. Hochstein, N. Jones, A. Basiri, and C. Rosenthal, “The Business Case for Chaos Engineering”, IEEE Cloud Computing, vol. 5, no. 3, pp. 45–54, May 2018.
R. Majumdar and F. Niksic, “Why is random testing effective for partition tolerance bugs?”, Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, pp. 1–24, Dec. 2017.