OpenStack Trove in Production at HP - Trove Day 2014

Post on 29-Jun-2015


DESCRIPTION

Presentation by Vipul Sabhaya, Software Development Lead, HP Cloud at OpenStack Trove Day 2014

TRANSCRIPT

August 19, 2014

OpenStack Trove Day

Vipul Sabhaya, Software Development Lead, HP Cloud

Trove in Production at HP

tesora.com 2

What is this about?
• Trove
• How to deploy Trove with HA
• How we do config management
• Monitoring Trove
• Operating Trove

8/19/14


Trove
• Database as a Service
  • MySQL
  • MongoDB
  • Cassandra
  • Postgres
  • …
• Integrated OpenStack Project
  • Icehouse Release


Architecture


Which Cloud?
• Trove has only API dependencies
• Overcloud (bare-metal)?
• In-Cloud (VMs)?


HA Trove
• HA OverCloud
  • Availability Zones
• HA Trove Control Plane
  • Control Plane across availability zones
  • Galera Cluster
  • RabbitMQ Cluster
  • Multiple Trove API, TaskManager, Conductors
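The "control plane across availability zones" goal can be checked mechanically. The following is only a minimal sketch: the inventory format and the zone names (`az1`, `az2`, `az3`) are hypothetical, and it assumes you already have some way to list where each control plane process runs.

```python
from collections import defaultdict

# Hypothetical inventory: (service, availability_zone) pairs for running
# control plane processes. Zone names are illustrative only.
INVENTORY = [
    ("trove-api", "az1"), ("trove-api", "az2"), ("trove-api", "az3"),
    ("trove-taskmanager", "az1"), ("trove-taskmanager", "az2"),
    ("trove-conductor", "az1"), ("trove-conductor", "az1"),
]


def zones_per_service(inventory):
    """Map each service name to the set of zones it runs in."""
    zones = defaultdict(set)
    for service, zone in inventory:
        zones[service].add(zone)
    return dict(zones)


def single_zone_services(inventory, min_zones=2):
    """Return, sorted, the services running in fewer than min_zones zones."""
    return sorted(
        service
        for service, zones in zones_per_service(inventory).items()
        if len(zones) < min_zones
    )
```

With the sample inventory, `single_zone_services(INVENTORY)` flags `trove-conductor`, because both of its processes sit in the same zone.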


How did we get here?
• Salt Stack
• Salt-based Trove deployment
  • https://github.com/saurabhsurana/trove-installer/tree/master/saltstack
• Salt-based OpenStack deployment
  • https://github.com/EntropyWorks/salt-openstack


Configuration Management
• Helps define/control
  • Packages and dependencies to be installed
  • Configuration files to be copied
  • Users / groups
• Gives a reproducible state of the infrastructure
• Run highstate on Trove-managed VMs on first boot


Remote Execution
• No SSH
• Can control infrastructure from a single machine
• Can define user-level and resource-level access
• Specifically useful for Trove to help manage DB instances


trove-api.sls

trove:
  user.present:
    - name: trove

trove-package:
  pip.installed:
    - name: trove
    - require:
      - user: trove

/etc/trove/trove.conf:
  file.managed:
    - source: salt://trove/api/trove.conf
    - template: jinja
    - user: trove
    - require:
      - pip: trove-package
      - user: trove

trove-api:
  service:
    - running
    - enable: True
    - watch:
      - pip: trove-package
      - file: /etc/trove/trove.conf


trove.conf

# Number of child processes to run
trove_api_workers = {{ pillar['trove_worker_threads'] }}

# AMQP Connection info
rabbit_password = {{ pillar['trove_rabbit_password'] }}
rabbit_hosts = {{ pillar['trove_rabbit_hosts'] }}
rabbit_userid = {{ pillar['trove_rabbit_userid'] }}

sql_connection = {{ pillar['trove_mysql_connection'] }}

{% if not pillar['devstack_setup'] %}

# Updates service and instance task statuses if instance failed to become active
update_status_on_fail = True

# how long to wait for guest agent to become active (in sec) (default is 300)
usage_sleep_time = 30
usage_timeout = {{ salt['pillar.get']('trove_guestagent_active_timeout', 600) }}

{% endif %}

# Path to the extensions
api_extensions_path = {{ pillar['trove_path'] }}/extensions/routes
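The template reads most values with `pillar['...']`, which has no fallback, and only `trove_guestagent_active_timeout` through `pillar.get` with a default of 600. A pre-flight check along the following lines could catch missing keys before a render. This is a sketch that assumes the pillar data is available as a plain Python dict; it is not a Salt API, and the helper names are made up for illustration.

```python
# Pillar keys the template reads with pillar['...'] (no fallback).
REQUIRED_KEYS = [
    "trove_worker_threads",
    "trove_rabbit_password",
    "trove_rabbit_hosts",
    "trove_rabbit_userid",
    "trove_mysql_connection",
    "devstack_setup",
    "trove_path",
]

# Keys read through pillar.get, paired with the default in the template.
OPTIONAL_DEFAULTS = {"trove_guestagent_active_timeout": 600}


def missing_keys(pillar):
    """Return the required keys that are absent from the pillar dict."""
    return [key for key in REQUIRED_KEYS if key not in pillar]


def effective_values(pillar):
    """Return the optional settings after applying the template defaults."""
    return {
        key: pillar.get(key, default)
        for key, default in OPTIONAL_DEFAULTS.items()
    }
```

An empty result from `missing_keys(pillar)` means every key the template indexes directly is present.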


Trove @ HP Helion
• Image-based Deploys
• TripleO
• Trove Heat Templates
• Trove Image Elements
• Saltcloud / Nova wrapper -> Salt Master -> Trove
• Seed -> Under -> Over -> Heat -> Trove


Operations - SaltStack
• Most of the DBaaS operations are based on SaltStack
• HA Deployment of Salt Masters
• Control the access to infrastructure with Salt Stack
• Control access to customer instances
  • To help debug issues
  • But protect the data and access to the MySQL database
• Each Trove guest instance becomes a minion


Trove Upgrades
• Trove Datastore must be usable during all upgrades
• Upgrades usually involve downtime
• RPC Versioning
• Upgrade Sequence that we follow:
  • Upgrade all the guest agents first (trove service)
  • Upgrade Task Manager and Conductor
  • Upgrade API servers
  • If a new RPC method is introduced, it must be available on the Guest before an API operation is performed
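The upgrade sequence above can be written down as ordered stages, so that tooling refuses to upgrade a component before everything upstream of it is done. A minimal sketch follows; the component names and helper functions are illustrative and not part of Trove.

```python
# Stages from the slide: guest agents first, then Task Manager and
# Conductor together, then the API servers.
UPGRADE_STAGES = [
    {"guestagent"},
    {"taskmanager", "conductor"},
    {"api"},
]


def stage_of(component):
    """Return the index of the stage that contains the component."""
    for index, stage in enumerate(UPGRADE_STAGES):
        if component in stage:
            return index
    raise ValueError(f"unknown component: {component}")


def upgrade_plan(components):
    """Order components by stage; ties are sorted by name for determinism."""
    return sorted(set(components), key=lambda c: (stage_of(c), c))


def safe_to_upgrade(component, already_upgraded):
    """True if every component in earlier stages has been upgraded."""
    earlier = set().union(*UPGRADE_STAGES[:stage_of(component)])
    return earlier <= set(already_upgraded)
```

Because the guest agent is upgraded first, any new RPC method is already understood by the guest by the time the API servers can start calling it.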


Security of key Trove components
• Use SSL
  • Trove API
  • RabbitMQ
• Security Group
  • Database
    • Only Control Plane components need access
  • RabbitMQ
    • Control Plane and all the guest agents need access, but use an address range wherever possible
• Use separate DB and RMQ credentials for each service
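The security group rules above amount to a small access matrix: the database accepts only control plane sources, while RabbitMQ also accepts guest agents. The sketch below models that with Python's `ipaddress` module. It assumes that "range" on the slide means an IP address range, and the two subnets are made-up examples.

```python
import ipaddress

# Hypothetical address ranges; a real deployment would use its own subnets.
CONTROL_PLANE_RANGE = ipaddress.ip_network("10.0.0.0/24")
GUEST_RANGE = ipaddress.ip_network("10.1.0.0/16")

# Source ranges that may reach each backing service.
ALLOWED_SOURCES = {
    "database": [CONTROL_PLANE_RANGE],
    "rabbitmq": [CONTROL_PLANE_RANGE, GUEST_RANGE],
}


def is_allowed(service, source_ip):
    """True if source_ip falls inside a range allowed for the service."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_SOURCES[service])
```

For example, `is_allowed("database", "10.1.2.3")` is false because a guest address must never reach the control plane database directly.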


Monitoring of Trove Service / Instances


• Trove doesn’t ship with monitoring
• Upstart scripts respawn Trove services
• Monitor Trove API ports with Nagios
• Monitor RabbitMQ and DB connectivity from Control plane nodes
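A port check of the kind Nagios runs can be very small. The following is a sketch rather than the check HP used: it attempts a TCP connection and returns the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The function name and the timeout are assumptions.

```python
import socket

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def check_tcp_port(host, port, timeout=3.0):
    """Return (exit_code, message) for a TCP connect attempt."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

A wrapper script would print the message and exit with the returned code, for instance against the host and port that the Trove API is bound to (commonly 8779, but check `bind_port` in your own configuration).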


Monitoring of key Trove components

• RabbitMQ
  • Number of Queues
  • Number of Sockets used
  • Number of Established Connections
  • Cluster Status
  • Failed access attempts
• Database
  • MySQL standard monitoring
  • Cluster status
  • Slow query log
  • error.log for unauthorized/failed access attempts
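Once the RabbitMQ counters are collected, alerting reduces to comparing them with limits. The sketch below assumes the metrics arrive as a plain dict, however you gather them. The metric names and the threshold values are hypothetical and would need tuning per deployment.

```python
# Hypothetical thresholds; sensible values depend on the deployment.
THRESHOLDS = {
    "queues": 5000,
    "sockets_used": 800,
    "connections": 2000,
}


def evaluate(metrics, thresholds=THRESHOLDS):
    """Return alert strings for metrics that are missing or over their limit."""
    alerts = []
    for name, limit in sorted(thresholds.items()):
        value = metrics.get(name)
        if value is None:
            # A metric that stopped reporting is itself worth an alert.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts
```

Treating a missing metric as an alert is a deliberate choice here: a collector that silently stops reporting looks the same as a healthy broker otherwise.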


Monitoring of key Trove components

• Trove Guest Agent Heartbeat status
• Trove Instance Audit (catch failed instances to help identify service issues)
• Connectivity to Trove instances from outside
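Heartbeat monitoring comes down to finding guests whose last heartbeat is too old. A minimal sketch, assuming the last-heartbeat timestamps have already been read into a dict keyed by instance ID; the five-minute cutoff is an arbitrary example, not a Trove default.

```python
from datetime import datetime, timedelta


def stale_instances(heartbeats, now, max_age=timedelta(minutes=5)):
    """Return sorted instance IDs whose last heartbeat is older than max_age.

    heartbeats maps instance ID -> datetime of the last heartbeat received.
    """
    return sorted(
        instance_id
        for instance_id, last_seen in heartbeats.items()
        if now - last_seen > max_age
    )
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test and to run from a scheduled job.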


What did we learn?


OpenStack Trove: RabbitMQ
• RabbitMQ
  • Raise the default socket descriptor limit (it will be exhausted pretty soon otherwise)
  • The number of queues and sockets will keep growing if you don’t enable heartbeats on RabbitMQ connections
  • Monitoring is the key to dealing with a RabbitMQ cluster configured with mirrored queues


OpenStack Trove
• GuestAgent Heartbeats (Service Status notifications) should be monitored for failure
• Upgrading the Guest Agent is tricky on xsmall
• Quota mismatch between Trove and Nova would be the biggest reason for instance failures
• Resource mismatch between Trove and Nova
  • Schedule jobs to correct things
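A scheduled job that corrects Trove and Nova mismatches first needs to find them. The sketch below compares per-tenant instance counts from the two services. How those counts are fetched is left out, and the dict format and function name are assumptions for illustration.

```python
def usage_drift(trove_usage, nova_usage):
    """Compare per-tenant instance counts recorded by Trove and by Nova.

    Both arguments map tenant ID -> instance count. Returns a dict of
    tenant ID -> (trove_count, nova_count) for tenants where they differ.
    """
    drift = {}
    for tenant in sorted(set(trove_usage) | set(nova_usage)):
        # A tenant unknown to one side counts as zero there.
        trove_count = trove_usage.get(tenant, 0)
        nova_count = nova_usage.get(tenant, 0)
        if trove_count != nova_count:
            drift[tenant] = (trove_count, nova_count)
    return drift
```

The job would then decide, per tenant, which side to correct; that policy is deployment-specific and is not covered here.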


Thank you

