Enforcing Application SLAs with Congress and Monasca
Fabio Giannetti, Ken Owens
April 28, 2016
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Outline
• Vision
• Congress and Monasca implementing:
  • OPS/NOC SLA Policies
  • App Intent SLA Policies
• Current State and Next Steps
Vision
• Application owners/developers do not care about the underlying infrastructure unless it is a problem.
• Microservices-based architectures inherently demand granular application design.
• SLAs for applications must be holistic and independent of the underlying infrastructure.
[Diagram: layered stack of Host, Virtualization, Container, and Srvc (service) boxes, with the services grouped into Application A and Application B]
Enable business/application owners to easily define the aspects that matter for running their applications, within the budget constraints imposed by IT.
Monitoring must now be holistic: it has to consider the various levels of virtualization and harmonize data across the different layers.
Containers are short-lived and move around the available infrastructure.
[Diagram: Host, Virtualization, and Container layers]
Application owners are notified when their soft limits (alarms) are crossed, and actions are performed whenever their hard limits are reached.
OPS/NOC SLA using Congress and Monasca
Underutilized Servers: OPS/NOC Policy Example
If a VM has had less than 5% CPU utilization over the last 2 months, notify its owner via email.

error(vm, email) :-
    nova:server_owner(vm, owner),
    two_months_before_today(start, end),
    ceilometer:statistics(vm, start, end, "cpu-util", cpu),
    cpu < 5,
    keystone:email(owner, email)

two_months_before_today(start, end) :-
    date:today(end),
    date:minus(end, "2 months", start)
Current Solution
[Diagram: Ceilometer API, Ceilometer Datasource, Policy Engine, Congress API. The Ceilometer Datasource polls the Ceilometer API every <n>s.]
GET /v2/meters/cpu_util/statistics?resource_id=…

VM UUID (Resource ID) | CPU
xxxxxxxx-0001-xxxx-xxxxxxxxxxx | 40
xxxxxxxx-0002-xxxx-xxxxxxxxxxx | 30
xxxxxxxx-0003-xxxx-xxxxxxxxxxx | 2
xxxxxxxx-0004-xxxx-xxxxxxxxxxx | 70
xxxxxxxx-0005-xxxx-xxxxxxxxxxx | 55
Current Solution
[Diagram: Congress API and Policy Engine with the Ceilometer, Nova, and Keystone Datasources; the Nova and Keystone Datasources are fed by the Nova API and Keystone API.]

Ceilometer Datasource:
VM UUID (Resource ID) | CPU
xxxxxxxx-0001-xxxx | 40
xxxxxxxx-0002-xxxx | 30
xxxxxxxx-0003-xxxx | 2
xxxxxxxx-0004-xxxx | 70
xxxxxxxx-0005-xxxx | 55

Nova Datasource:
VM | Owner
xxxxxxxx-0001-xxxx | Ann
xxxxxxxx-0002-xxxx | Fabio
xxxxxxxx-0003-xxxx | Fabio
xxxxxxxx-0004-xxxx | Ken
xxxxxxxx-0005-xxxx | Ken

Keystone Datasource:
Owner | Email
Fabio | [email protected]

Policy result, error(vm, email):
VM | Email
xxxxxxxx-0003-xxxx |
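The join that the Policy Engine performs across these three tables can be sketched in plain Python. This is only an illustration of the rule's logic, not how Congress evaluates Datalog; the function name, the dictionary layout, and the example.com addresses are assumptions made for the example.

```python
# Illustrative tables mirroring the slide. The email addresses are made up.
cpu_by_vm = {
    "xxxxxxxx-0001-xxxx": 40,
    "xxxxxxxx-0002-xxxx": 30,
    "xxxxxxxx-0003-xxxx": 2,
    "xxxxxxxx-0004-xxxx": 70,
    "xxxxxxxx-0005-xxxx": 55,
}
owner_by_vm = {
    "xxxxxxxx-0001-xxxx": "Ann",
    "xxxxxxxx-0002-xxxx": "Fabio",
    "xxxxxxxx-0003-xxxx": "Fabio",
    "xxxxxxxx-0004-xxxx": "Ken",
    "xxxxxxxx-0005-xxxx": "Ken",
}
email_by_owner = {
    "Ann": "ann@example.com",
    "Fabio": "fabio@example.com",
    "Ken": "ken@example.com",
}


def underutilized(cpu_by_vm, owner_by_vm, email_by_owner, threshold=5):
    """Return (vm, email) pairs for VMs whose CPU value is below threshold."""
    rows = []
    for vm in sorted(cpu_by_vm):
        if cpu_by_vm[vm] >= threshold:
            continue  # corresponds to the "cpu < 5" condition in the rule
        owner = owner_by_vm.get(vm)
        email = email_by_owner.get(owner)
        if email is not None:  # like a join, rows without a match are dropped
            rows.append((vm, email))
    return rows


errors = underutilized(cpu_by_vm, owner_by_vm, email_by_owner)
print(errors)
```

Only the VM ending in 0003 falls below the 5% threshold, so it is the only row that reaches the error table.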
From Policy to Alarm

error(vm, email) :-
    nova:server_owner(vm, owner),
    two_months_before_today(start, end),
    monasca_alarms:stats(vm, start, end, "cpu.user_perc", cpu),
    cpu < 5,
    keystone:email(owner, email)

two_months_before_today(start, end) :-
    date:today(end),
    date:minus(end, "2 months", start)

{
  "name": "Average CPU percent is less than 5",
  "description": "The average CPU percent is less than 5",
  "expression": "(avg(cpu.user_perc{resource_id=vm}) < 5)",
  "match_by": ["resource_id"],
  "severity": "HIGH",
  "ok_actions": ["action_id_for_ok"],
  "alarm_actions": ["action_id_for_alarm"]
}
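As a rough sketch of how the threshold part of a policy could be turned into an alarm-definition body like this one, the helper below builds the same JSON fields from a metric name, an operator, and a threshold. The function `alarm_definition` and its parameters are hypothetical; only the field names are taken from the slide. The slide's dimension filter (`{resource_id=vm}`) is left out here for simplicity, on the assumption that `match_by` is what yields one alarm per resource.

```python
import json


def alarm_definition(metric, operator, threshold, match_by,
                     ok_action, alarm_action, severity="HIGH"):
    """Build an alarm-definition body shaped like the slide's JSON (hypothetical helper)."""
    words = {"<": "less than", ">": "greater than"}[operator]
    return {
        "name": f"Average {metric} is {words} {threshold}",
        "description": f"The average {metric} is {words} {threshold}",
        "expression": f"(avg({metric}) {operator} {threshold})",
        "match_by": list(match_by),
        "severity": severity,
        "ok_actions": [ok_action],
        "alarm_actions": [alarm_action],
    }


body = alarm_definition(
    "cpu.user_perc", "<", 5, ["resource_id"],
    ok_action="action_id_for_ok",
    alarm_action="action_id_for_alarm",
)
print(json.dumps(body, indent=2))
```

The resulting dictionary would then be sent as the body of the alarm-definition request; the action IDs are the same placeholders used on the slide.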
Proposed Solution (receiving notifications)
[Diagram components]
Monasca: Monasca Agents, Monasca API, Threshold Engine, Notification Engine, Persister, Kafka Cluster, Metrics DB, Settings DB
Congress: Congress API, Policy Engine, Monasca Alarm Datasource

Alarm definition created through the Monasca API: POST /v2.0/alarm-definitions
Notification registration: monasca notification-create congress WEBHOOK http:…/v1/data-sources/monasca_alarm?execute&action=handle_alarm
Webhook: …/v1/data-sources/monasca_alarm?execute&action=handle_alarm
Datasource action: handle_alarm(params)

VM UUID (Resource ID) | CPU
xxxxxxxx-0003-xxxx | 2
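A minimal sketch of what a `handle_alarm` action might do with an incoming notification: keep a table of resources that are currently in alarm. The class name and the flat `params` layout (`resource_id`, `state`, `value`) are assumptions for illustration and do not reflect the actual Monasca webhook body or the Congress datasource driver interface.

```python
class MonascaAlarmTable:
    """In-memory stand-in for the table kept by the Monasca Alarm Datasource."""

    def __init__(self):
        self.rows = {}  # resource_id -> last reported CPU value

    def handle_alarm(self, params):
        """Add the resource on ALARM, remove it on any other state."""
        resource_id = params["resource_id"]
        if params["state"] == "ALARM":
            self.rows[resource_id] = params["value"]
        else:
            self.rows.pop(resource_id, None)
        return dict(self.rows)


table = MonascaAlarmTable()
after_alarm = table.handle_alarm(
    {"resource_id": "xxxxxxxx-0003-xxxx", "state": "ALARM", "value": 2}
)
after_ok = table.handle_alarm(
    {"resource_id": "xxxxxxxx-0003-xxxx", "state": "OK", "value": 12}
)
print(after_alarm, after_ok)
```

Compared with polling, the table only changes when Monasca reports a state transition, so the Policy Engine sees just the rows that matter to the rule.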
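Proposed Solution (receiving notifications)
[Diagram: Congress API and Policy Engine with the Monasca Alarm, Nova, and Keystone Datasources; the Nova and Keystone Datasources are fed by the Nova API and Keystone API.]

Monasca Alarm Datasource:
VM UUID (Resource ID) | CPU
xxxxxxxx-0003-xxxx | 2

Nova Datasource:
VM | Owner
xxxxxxxx-0003-xxxx | Fabio

Keystone Datasource:
Owner | Email
Fabio | [email protected]

Policy result, error(vm, email):
VM | Email
xxxxxxxx-0003-xxxx |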
Application Intent SLA using Congress and Monasca
VM Evacuation for Business-Critical Apps if the Host Has Potential Health Issues: App Intent Policy Example

error(vm) :- nova:show(vm, hostID), monasca_alarm:host_issues(hostID)

The policy flags a VM when its Host has issues, for instance:
1. Unhealthy: cannot be pinged and/or reached over SSH
2. Network errors and packet loss
3. Disk space below a certain threshold
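The rule is a join between the VM-to-host mapping and the set of hosts that currently have an alarm. A small Python sketch of that logic follows; the function name and the sample VM and host names are hypothetical.

```python
def vms_to_evacuate(host_by_vm, hosts_with_issues):
    """Return the VMs whose host appears in the set of hosts with issues."""
    return sorted(
        vm for vm, host in host_by_vm.items() if host in hosts_with_issues
    )


# Hypothetical placement of VMs on hosts.
host_by_vm = {"vm-a": "host-1", "vm-b": "host-2", "vm-c": "host-1"}
flagged = vms_to_evacuate(host_by_vm, hosts_with_issues={"host-1"})
print(flagged)
```

Every VM on a flagged host ends up in the result, which is what makes the rule usable as an evacuation trigger.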
App Intent Policy: Metrics Correlation

error(vm) :- nova:show(vm, hostID), monasca_alarm:host_issues(hostID)

Metric Name | Dimensions | Value
host_alive_status | observer_host=fqdn, hostname=supplied hostname being checked, test_type=ping or ssh | 0=online, 1=offline
disk.space_used_perc | device, mount_point | The percentage of disk space that is being used on a device
net.in_packets_dropped_sec | device | Number of inbound network packets dropped per second
net.out_packets_dropped_sec | device | Number of outbound network packets dropped per second
App Intent Policy: Multi-Alarms #1

{
  "name": "Host is Unhealthy",
  "description": "The host is considered unhealthy",
  "expression": "(host_alive_status{host_id=hostID} > 0)",
  "match_by": ["host_id"],
  ...
}

{
  "name": "Host disk getting full",
  "description": "The host disk is reaching capacity",
  "expression": "(disk.space_used_perc{host_id=hostID} > 90)",
  "match_by": ["host_id"],
  ...
}

Metric Name | Value
host_alive_status | 0=online, 1=offline
disk.space_used_perc | The percentage of disk space that is being used on a device
net.in_packets_dropped_sec | Number of inbound network packets dropped per second
net.out_packets_dropped_sec | Number of outbound network packets dropped per second
App Intent Policy: Multi-Alarms #2

{
  "name": "Host is dropping inbound packets",
  "description": "The host is dropping inbound network packets",
  "expression": "(net.in_packets_dropped_sec{host_id=hostID} > 30)",
  "match_by": ["host_id"],
  ...
}

{
  "name": "Host is dropping outbound packets",
  "description": "The host is dropping outbound network packets",
  "expression": "(net.out_packets_dropped_sec{host_id=hostID} > 30)",
  "match_by": ["host_id"],
  ...
}
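Taken together, the four alarm definitions amount to a set of per-metric checks on a host. The sketch below applies the same thresholds (offline status, disk above 90%, more than 30 dropped packets per second) to one set of sample values. In the proposed design the Monasca Threshold Engine does this evaluation; the `CHECKS` table, the function name, and the sample values here are purely illustrative.

```python
# One predicate per metric, using the thresholds from the alarm expressions.
CHECKS = {
    "host_alive_status": lambda v: v == 1,           # 1 = offline
    "disk.space_used_perc": lambda v: v > 90,
    "net.in_packets_dropped_sec": lambda v: v > 30,
    "net.out_packets_dropped_sec": lambda v: v > 30,
}


def host_issues(samples):
    """Return the sorted names of the metrics whose check fires for this host."""
    return sorted(
        name for name, value in samples.items()
        if name in CHECKS and CHECKS[name](value)
    )


healthy = host_issues({
    "host_alive_status": 0,
    "disk.space_used_perc": 42.0,
    "net.in_packets_dropped_sec": 0,
    "net.out_packets_dropped_sec": 3,
})
failing = host_issues({
    "host_alive_status": 0,
    "disk.space_used_perc": 95.5,
    "net.in_packets_dropped_sec": 31,
    "net.out_packets_dropped_sec": 0,
})
print(healthy, failing)
```

A host with a non-empty result is one that `monasca_alarm:host_issues(hostID)` would report to the policy.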
Current State and Future Work
Overall Architecture
[Diagram components]
Monasca: Monasca Agents, Monasca API, Threshold Engine, Notification Engine, Persister, Kafka Cluster, Metrics DB, Settings DB
Congress: Congress API, Policy Engine, Monasca Alarm Datasource
Also shown: Keystone, In Mem DB
Connection labels: webhook, rpc

Metric | Value
metric1 | val1
metricN | valN
Current Status and Next Steps
• Done:
  • Developed a Monasca Datasource to validate integration.
  • Designed the solution and identified the main integration points.
• To be done:
  • Develop a Monasca Alarm Datasource leveraging the RPC capabilities in Congress.
  • Create a Congress Notification Webhook for Monasca.
  • Develop a policy-to-alarm conversion component for policies prefixed with monasca-alarm.
OpenStack Summit, Austin, Texas 2016
Thank You!