

Site Reliability Engineering

DevOps on Steroids

Big Techday 12, 2019-06-07

Maximilian Bode

https://techcrunch.com/2016/03/02/are-site-reliability-engineers-the-next-data-scientists/

https://insights.stackoverflow.com/survey/2019

Site Reliability Engineering

1. Fundamentals

2. Principles

3. Practice

Fundamentals

Source: http://turnoff.us/geek/devops-explained/

Source: https://landing.google.com/sre/books/

Site Reliability Engineering

Ben Treynor, VP of Engineering, Google

Source: https://www.linkedin.com/in/benjamin-treynor-sloss-207120/

Fundamentally, it’s what happens when you ask a software engineer to design an operations function.

DevOps and SRE as competitors?

class SRE implements DevOps{ }

Who practices SRE?

Google

Apple

Twitter

Evernote

Atlassian

The Home Depot

The New York Times

and many more…

Principles

Dealing with Risk

Hope is not a strategy.

Metrics as a Basis for Decisions

Service Level Objectives

SLI

Indicator

SLO

Objective (target)

SLA

Agreement
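The relationship between an indicator and an objective can be sketched in a few lines. This is only an illustration under assumptions: the `RequestWindow` type, the request counts, and the 99.9% and 99.95% targets are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestWindow:
    """Request counts observed over one measurement window (hypothetical data)."""
    total: int
    successful: int


def availability_sli(window: RequestWindow) -> float:
    """SLI: the measured fraction of requests that were served successfully."""
    if window.total == 0:
        return 1.0  # assumption: a window without traffic counts as fully available
    return window.successful / window.total


def meets_slo(window: RequestWindow, slo_target: float) -> bool:
    """SLO: the target value that the measured SLI is compared against."""
    return availability_sli(window) >= slo_target


window = RequestWindow(total=1_000_000, successful=999_200)
print(f"SLI: {availability_sli(window):.4%}")
print("meets 99.9% SLO:", meets_slo(window, 0.999))
print("meets 99.95% SLO:", meets_slo(window, 0.9995))
```

An SLA would typically sit on top of this as a contractual commitment, usually with a looser target than the internal SLO.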

Error Budgets

Balance between development velocity and reliability
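One common way to make this balance concrete is to turn the SLO into an allowed amount of downtime and track how much of it has been spent. The sketch below assumes a time-based availability SLO over a 30-day window; the function names and the release rule are hypothetical, not something prescribed by the talk.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime the SLO tolerates over the window, in minutes."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Unspent error budget in minutes; negative once the budget is overspent."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


def may_release(slo_target: float, downtime_minutes: float,
                window_days: int = 30) -> bool:
    """Assumed policy: risky releases are allowed only while budget remains."""
    return budget_remaining(slo_target, downtime_minutes, window_days) > 0


# A 99.9% SLO over 30 days leaves roughly 43.2 minutes of budget.
print(round(error_budget_minutes(0.999), 1))
print(may_release(0.999, downtime_minutes=30.0))
print(may_release(0.999, downtime_minutes=50.0))
```

While budget remains, development velocity wins; once it is spent, work shifts toward reliability until the window recovers.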

Automation

Eliminating toil

Practice

Project experience

Cloud

Infrastructure as Code

Source: https://www.terraform.io/logos.html

resource "aws_lambda_function" "serverless_test" {
  filename         = "my_code.zip"
  function_name    = "lambda_function_name"
  role             = "${aws_iam_role.iam_for_lambda.arn}"
  handler          = "serverless.handler"
  source_code_hash = "${filebase64sha256("my_code.zip")}"
  runtime          = "python3.7"
}

Containers

Source: https://blog.docker.com/2013/06/announcing-new-docker-style/

Source: https://github.com/cncf/artwork

GitOps

CI/CD

Source: https://about.gitlab.com/press/press-kit/

Monitoring

Three Pillars of Observability

Structured Logging

Metrics

Traces
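As an illustration of the first pillar, structured logging means emitting machine-parseable records rather than free-form text. The sketch below uses only the Python standard library; the `JsonFormatter` class, the `fields` key, and the logger and job names are hypothetical choices for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured context passed via `extra={"fields": {...}}` (assumed convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("deploy")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("job restarted", extra={"fields": {"job": "example-job", "attempt": 3}})
```

Because every record is a JSON object, a log pipeline can filter on fields such as `job` or `attempt` without parsing message text.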

Metrics

Four Golden Signals

Latency

Traffic

Error rate

Saturation
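The four signals can be derived from fairly simple raw data. The following is a minimal sketch, assuming a list of per-request records and an externally measured utilization value; the `Request` type, the choice of the 95th percentile, and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Request:
    """One served request (hypothetical record shape)."""
    duration_ms: float
    ok: bool


def p95(values: Sequence[float]) -> Optional[float]:
    """Nearest-rank 95th percentile; None when there are no samples."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * 95 / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def golden_signals(requests: Sequence[Request], window_seconds: float,
                   utilization: float) -> Dict[str, Optional[float]]:
    """Summarize latency, traffic, error rate, and saturation for one window."""
    total = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    return {
        "latency_p95_ms": p95([r.duration_ms for r in requests]),
        "traffic_rps": total / window_seconds,
        "error_rate": failed / total if total else 0.0,
        "saturation": utilization,  # e.g. CPU or queue utilization, measured elsewhere
    }


# 20 requests in a 10-second window, two of them failed.
requests = [Request(duration_ms=10.0 * i, ok=(i % 10 != 0)) for i in range(1, 21)]
signals = golden_signals(requests, window_seconds=10.0, utilization=0.65)
print(signals)
```

In practice these values would usually come from a metrics system rather than from raw request lists.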

Metrics

Prometheus

Source: https://en.wikipedia.org/wiki/File:Prometheus_software_logo.svg

Dashboards

Alerts

- alert: FlinkJobsMissing
  expr: sum(flink_api_jobs_running) < 2
  for: 3m
  annotations:
    summary: Fewer Flink jobs than expected are running.
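The `for: 3m` clause means the condition has to hold continuously for three minutes before the alert fires, which filters out short blips. The sketch below is a simplified model of that behavior, not Prometheus's actual implementation; the function name, the one-minute sampling interval, and the sample values are assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def firing_times(samples: Iterable[Tuple[int, int]], threshold: int = 2,
                 hold_seconds: int = 180) -> List[int]:
    """Return the timestamps at which the alert would be firing.

    `samples` are (timestamp_seconds, running_jobs) pairs in time order.
    The alert is pending while running_jobs < threshold and fires once
    that has been true for at least `hold_seconds` without interruption.
    """
    pending_since: Optional[int] = None
    firing: List[int] = []
    for timestamp, running_jobs in samples:
        if running_jobs < threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            if timestamp - pending_since >= hold_seconds:
                firing.append(timestamp)
        else:
            pending_since = None  # condition recovered, reset the timer
    return firing


# One sample per minute: jobs drop to 1 at t=60 and recover at t=300.
samples = [(0, 2), (60, 1), (120, 1), (180, 1), (240, 1), (300, 2), (360, 1)]
print(firing_times(samples))
```

The drop that starts at t=360 does not fire, because it has not yet lasted three minutes.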

Team Structure

Ad hoc tasks vs. long-term improvements

Operations Manager

Incident Management

Incident Postmortem

Written record after an incident

Impact

Actions taken

Root cause

Blameless communication (internal & external)

What else?

Microservice architecture

Chaos Engineering

Psychological Safety

Handley Page W.8, 1919

16 passengers

2 pilots

Airbus A380, 2005

853 passengers

2 pilots
