stabilizing the jenga tower: scaling out ceilometer
Post on 11-Aug-2015
Stabilizing the Jenga tower: Scaling out Ceilometer
Gordon Chung & Pradeep Kilambi
Engineers @ Red Hat, Inc.
Our Mission
“To reliably collect measurements of the utilization of physical & virtual resources comprising deployed clouds, persist this data for subsequent retrieval & analysis, and trigger actions when defined criteria are met.”
Overview
● Collect physical and virtual resource data
● Transform data to something measurable
● Publish data to various targets
● Persist data to storage
● Retrieve data via API for further analysis, billing, triggering actions, etc.
Collect → Transform → Publish → Persist → Retrieve
Architecture (Icehouse)
[Diagram. Components: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2 to AgentN) with Pipeline; Polling Agents (Agent1, Agent2 to AgentN) with Pipeline; Collectors (Collector1, Collector2 to CollectorN); Database (Events, Meters, Alarms); Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Partial HA support; Active-Active HA support.]
Ceilometer as it’s perceived
Ceilometer
Cloud Admin
“API response too slow”
“When Ceilometer dies, Glance dies.”
“Ceilometer is leaking memory”
“Ceilometer doesn’t scale”
“HAProxy is messing with MongoDB replica-sets”
“Ceilometer is not Production Ready”
Evolution of Ceilometer
Architecture (Juno)
[Diagram. Components: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2 to AgentN) with Pipeline; Polling Agents (Agent1, Agent2 to AgentN) with Pipeline; Collectors (Collector1, Collector2 to CollectorN); Databases (Alarms; Events and Meters); Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Partial HA support; Active-Active HA support. Label: Active/Active Workload Partitioning.]
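The "Active/Active Workload Partitioning" label in the Juno architecture implies that several agents run at once and each polls only its own share of the resources. A minimal sketch of that idea follows, using plain hash-modulo assignment. This is an illustration only, not Ceilometer's actual coordination code; the agent names, resource IDs, and the `owner`/`my_share` helpers are hypothetical.

```python
import hashlib


def owner(resource_id, agents):
    """Pick the single agent responsible for a resource (hash-modulo partitioning)."""
    digest = hashlib.sha256(resource_id.encode("utf-8")).hexdigest()
    # Sort the agent list so every agent computes the same assignment.
    return sorted(agents)[int(digest, 16) % len(agents)]


def my_share(agent, agents, resources):
    """Return the subset of resources that `agent` should poll."""
    return [r for r in resources if owner(r, agents) == agent]


# Hypothetical agents and resources, for illustration only.
agents = ["agent-1", "agent-2", "agent-3"]
resources = ["instance-%04d" % i for i in range(12)]
shares = {a: my_share(a, agents, resources) for a in agents}
```

Because every agent derives the assignment from the same membership list, no resource is polled twice and none is skipped. A real deployment also has to handle agents joining and leaving, which this sketch ignores.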
Architecture (Kilo)
[Diagram. Components: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2 to AgentN) with Pipeline; Polling Agents (Agent1, Agent2 to AgentN) with Pipeline; Collectors (Collector1, Collector2 to CollectorN); Databases (Alarms; Meters; Events); Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Active-Active HA support.]
Best Practices
Best Practices (Data Collection)
● Modify your pipeline to match requirements
    ○ Collect only the meters you need by tuning pipeline.yaml
    ○ Tweak the polling interval as needed
● Enable jitter on polling (Kilo+)
● Scale out - add agents as load increases (Juno+)
● Use the notifier publisher instead of the rpc publisher (Juno+)
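To illustrate why jitter helps, here is a minimal sketch, assuming each polling agent waits a random delay before its first poll and then polls at a fixed interval. The function names and the 30-second and 600-second values are assumptions chosen for illustration, not Ceilometer settings.

```python
import random


def polling_delays(agent_count, max_jitter_s, seed=None):
    """Draw one random start delay per agent, in seconds, within [0, max_jitter_s]."""
    rng = random.Random(seed)
    return [rng.uniform(0, max_jitter_s) for _ in range(agent_count)]


def next_poll_times(interval_s, delays, cycles=3):
    """For each agent, list the offsets (in seconds) of its first few polls."""
    return [[d + n * interval_s for n in range(cycles)] for d in delays]


# Hypothetical values: 5 agents, up to 30 s of jitter, 600 s polling interval.
delays = polling_delays(agent_count=5, max_jitter_s=30, seed=42)
schedule = next_poll_times(interval_s=600, delays=delays)
```

Without the random delay, every agent would hit the polled services at the same instant in each cycle. With it, the same requests are spread across the jitter window.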
Best Practices (Data Storage)
● Avoid open-ended queries; query on a time range
● Install API behind mod_wsgi
● Tweak WSGIDaemon settings such as threads and processes
● Set a TTL; expire data to minimise database size
● Run mongodb on a separate node
    ○ Use sharding and replica-sets
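As a sketch of the first point, a query can be bounded on both sides by a timestamp filter. The example assumes the v2 API's `q.field` / `q.op` / `q.value` query-parameter form; the host name, the one-hour window, and the `time_range_query` helper are hypothetical, so check the parameter names against the API reference for your release.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def time_range_query(start, end):
    """Build query parameters that bound a request to the range [start, end)."""
    filters = [("timestamp", "ge", start), ("timestamp", "lt", end)]
    params = []
    for field, op, value in filters:
        params += [
            ("q.field", field),
            ("q.op", op),
            ("q.value", value.isoformat()),
        ]
    return urlencode(params)


# Hypothetical one-hour window and endpoint, for illustration only.
end = datetime(2015, 5, 19, 12, 0, 0)
start = end - timedelta(hours=1)
query = time_range_query(start, end)
url = "http://ceilometer.example.com:8777/v2/meters/cpu/statistics?" + query
```

Bounding both ends keeps the database from scanning every stored sample for the meter, which is what an open-ended query forces it to do.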
Different Strokes for Different Folks
Deployment Scenarios (Lambda Design)
[Diagram. Components: Polling / Notification Agents; Queue1; Queue2; Collectors (short-term); Collector (long-term); Short-Term Database; Archive Database.]
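The Lambda design above can be sketched with in-memory stand-ins. The sketch assumes the agents publish every sample to both queues, the short-term collectors feed a database that expires old data, and the long-term collector feeds an archive that keeps everything. The queue-to-database mapping, the 24-hour TTL, and all function names are assumptions; the slide only names the components.

```python
from collections import deque
from datetime import datetime, timedelta


def publish(sample, queues):
    """Fan a sample out to every queue, as an agent with two publishers would."""
    for queue in queues:
        queue.append(sample)


def drain(queue, store):
    """Collector stand-in: move everything from a queue into a store."""
    while queue:
        store.append(queue.popleft())


def expire(store, now, ttl):
    """Keep only the samples that are no older than the TTL."""
    return [s for s in store if now - s["timestamp"] <= ttl]


queue1, queue2 = deque(), deque()
short_term_db, archive_db = [], []
now = datetime(2015, 5, 19, 12, 0)

# Three hypothetical samples aged 1, 30 and 100 hours.
for age_hours in (1, 30, 100):
    sample = {"meter": "cpu", "timestamp": now - timedelta(hours=age_hours)}
    publish(sample, [queue1, queue2])

drain(queue1, short_term_db)   # short-term path (assumed to read Queue1)
drain(queue2, archive_db)      # archive path (assumed to read Queue2)
short_term_db = expire(short_term_db, now, ttl=timedelta(hours=24))
```

The short-term store stays small and fast to query, while the archive keeps the full history for later analysis.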
Deployment Scenarios (Data Segregation)
[Diagram. Components: Polling / Notification Agents; Queue1; Queue2; Collectors (public); Collector (audit); Database; Audit Database.]
Deployment Scenarios (JSON Files)
[Diagram. Components: Polling / Notification Agents; Queue1; Collector; JSON files; Apache Spark.]
Deployment Scenarios (Fraud Detection)
[Diagram. Components: Polling / Notification Agents; Queue; Collector; HTTP; Proprietary Alerting System.]
Deployment Scenarios (Custom consumers)
[Diagram. Components: Polling / Notification Agents; Kafka; Apache Storm.]
Deployment Scenarios (Debugging)
[Diagram. Components: Polling / Notification Agents; Event Queue; Collectors; ElasticSearch; Kibana.]
Deployment Scenarios (Noisy Services)
[Diagram. Components: OpenStack Services; two Notification Buses; Notification Agents (Agent1, Agent2 to AgentN) with Pipeline; Collectors (Collector1, Collector2 to CollectorN); Databases (Alarms; Meters; Events).]
Continual Evolution
Liberty
● Gnocchi Integration
● Building up events
● Declarative data collection
● Minimise the bloat
Gnocchi: Resource Metering as a Service
● Lightweight time-series metadata
● Separate storage and data models for resources and time-series data
● Indexer for metrics and resources
● Eagerly pre-aggregates metric data
● Supports restricted cross-metric aggregation
● Per time-series configurable retention policy
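Two of the points above, eager pre-aggregation and per-series retention, can be sketched together. This is an illustration of the general technique only, not Gnocchi's storage code: the `PreAggregatedSeries` class, the 60-second granularity, and the two-point retention limit are all hypothetical.

```python
from collections import defaultdict


class PreAggregatedSeries:
    """Aggregate measures into fixed-width buckets at write time."""

    def __init__(self, granularity_s, max_points):
        self.granularity_s = granularity_s
        self.max_points = max_points  # retention: number of buckets to keep
        self._buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def add(self, timestamp_s, value):
        """Fold one measure into its bucket, then enforce retention."""
        start = timestamp_s - (timestamp_s % self.granularity_s)
        bucket = self._buckets[start]
        bucket["count"] += 1
        bucket["sum"] += value
        # Drop the oldest buckets once the retention limit is exceeded.
        while len(self._buckets) > self.max_points:
            del self._buckets[min(self._buckets)]

    def means(self):
        """Return the mean of each retained bucket, keyed by bucket start."""
        return {
            start: b["sum"] / b["count"]
            for start, b in sorted(self._buckets.items())
        }


# Hypothetical series: 60 s buckets, keep only the two most recent buckets.
series = PreAggregatedSeries(granularity_s=60, max_points=2)
for ts, value in [(0, 1.0), (30, 3.0), (60, 5.0), (125, 7.0)]:
    series.add(ts, value)
```

Reads then return stored aggregates instead of scanning raw samples, and storage per series is bounded by the retention limit rather than growing with every measure.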
Size matters
ceilometer datapoint (mongodb):
{
  "_id": ObjectId("55103dd3bf4d2c7a7de6e319"),
  "counter_name": "cpu",
  "user_id": "72bd0799d496476f9eed16d49e0b86e9",
  "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
  "timestamp": ISODate("2015-03-23T16:22:43Z"),
  "message_signature": "539736605d14c0aa8c85058e6e9e67a078146f2e80a218d8dc6711c8d6875ae5",
  "message_id": "d559f244-d178-11e4-9fa9-28b2bd01ed52",
  "source": "openstack",
  "counter_unit": "ns",
  "counter_volume": NumberLong("22450000000"),
  "recorded_at": ISODate("2015-03-23T16:22:43.412Z"),
  "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
  "resource_metadata": {
    "status": "active",
    "cpu_number": 1,
    "ephemeral_gb": 0,
    "display_name": "inst-3",
    "name": "instance-00000003",
    "disk_gb": 0,
    "kernel_id": "4e303a91-ae5b-43c7-b823-fd6f2cceab4e",
    "image": {
      "id": "490af6b0-2402-45d8-bcb1-c81376326e8d",
      "links": [
        {
          "href": "http://10.162.32.175:8774/837660dc95324be594a0607d80a22c53/images/490af6b0-2402-45d8-bcb1-c81376326e8d",
          "rel": "bookmark"
        }
      ],
      "name": "cirros-0.3.2-x86_64-uec"
    },
    "ramdisk_id": "7112ea15-3ece-4805-9f23-f6141a6f27b0",
    "vcpus": 1,
    "memory_mb": 64,
    "instance_type": "42",
    …..}
Vs
gnocchi datapoint:
{"2015-03-23T16:22:43Z" : 1 }
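To put a rough number on the comparison above, the two datapoints can be serialised and measured. The sketch below uses only a subset of the fields shown in the slide and compact JSON as a stand-in for the on-disk encoding, so the ratio is indicative rather than a measurement of either backend. The `encoded_size` helper is hypothetical.

```python
import json

# Subset of the Ceilometer sample from the slide (MongoDB-specific types
# such as ObjectId and ISODate are replaced with plain JSON values).
ceilometer_sample = {
    "counter_name": "cpu",
    "user_id": "72bd0799d496476f9eed16d49e0b86e9",
    "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
    "timestamp": "2015-03-23T16:22:43Z",
    "message_id": "d559f244-d178-11e4-9fa9-28b2bd01ed52",
    "source": "openstack",
    "counter_unit": "ns",
    "counter_volume": 22450000000,
    "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
    "resource_metadata": {
        "status": "active",
        "display_name": "inst-3",
        "vcpus": 1,
        "memory_mb": 64,
    },
}

# The Gnocchi datapoint exactly as shown in the slide.
gnocchi_point = {"2015-03-23T16:22:43Z": 1}


def encoded_size(doc):
    """Size in bytes of the document as compact JSON (a proxy, not BSON)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


ratio = encoded_size(ceilometer_sample) / encoded_size(gnocchi_point)
```

Even with most of the resource metadata left out, the Ceilometer sample is more than ten times the size of the Gnocchi datapoint, because it repeats the resource description with every measurement.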
Gnocchi Benchmarks
Discussions
● operators session - May 19, 2015 (12:05pm) Rm 306
● design track - May 20, 2015 (9:00am - 3:30pm)
    ○ event alarms; ceilometer componentisation
● design track - May 21, 2015 (9:00am - 12:30pm)
● speaker session:
    ○ The Anatomy of an Action - May 21, 2015 (1:30pm)
● irc: #openstack-ceilometer
● mailing-list: openstack-dev@lists.openstack.org
Gnocchi
Architecture (Gnocchi)
[Diagram. Components: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2 to AgentN) with Pipeline; Polling Agents (Agent1, Agent2 to AgentN) with Pipeline; Collectors (Collector1, Collector2 to CollectorN); Databases (Alarms; Events); Alarm Evaluator; Alarm Notifier; API; External Systems; a second API box with Metric and Resources. Legend: Active-Active HA support.]
Resources
● https://wiki.openstack.org/wiki/ReleaseNotes/Juno
● https://wiki.openstack.org/wiki/ReleaseNotes/Kilo
● http://nejc.saje.info/ceilometer-central-agent.html
● https://julien.danjou.info/blog/2015/openstack-gnocchi-first-release
● https://blog.sileht.net/writing-a-gnocchi-storage-driver-for-ceph.html
Thank You