Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Azure + DSE Powers O365 Per-User Store © 2015. All Rights Reserved.

Upload: datastax-academy

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Azure + DSE Powers O365 Per-User Store


Page 2: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Overview

1 Introduction
2 What We Built
3 What to Pay Close Attention To
4 Deployment
5 Wrap Up

Page 3: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Introduction

Sean Usher
Office 365
Email: [email protected]
@seanushermsft

Mahesh Thiagarajan
Microsoft Azure
Email: [email protected]
@_cloudguy

Ben Lackey
DataStax
Email: [email protected]

Page 4: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store


Introduction – Office 365
• Email
• Collaboration
• Document Authoring
• Social Networking
• Calendaring
• File Storage
• Business Intelligence
• Etc…

Page 5: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store


Introduction – Azure
Azure is Microsoft’s cloud computing platform, a growing collection of integrated services (analytics, computing, database, mobile, networking, storage, and web) for moving faster, achieving more, and saving money.

Page 6: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store


What We Built - Overview
A way to understand our users and organizations at a deeper level!

Questions:
• Are users happy with the service they are receiving?
• Are users fully utilizing the services they are paying us for?
• Are users hitting issues that we can proactively help them with?
• How has a user’s experience been over their lifetime?
• Can we discover insights that we aren’t even aware of?

This requires ingesting and storing a lot of data. We need to be able to perform fast, scalable analytics on that data, or we will discover issues too late!

Page 7: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store


What We Built – Why Cassandra

The Good
• Low Latency ✓
• Linear Scale ✓
• Highly Available ✓
• Aggregations (Spark/Spark Streaming) ✓
• Machine Learning (Spark ML) ✓
• No Enforcement of Full Consistency ✓ ✓ ✓

The Not-So-Good
• No Hosted Option in Azure ✗
• Have to Install and Configure it Ourselves ✗

Page 8: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What We Built – DSE Clusters

Cluster 1:
• Cassandra: 12 Nodes
• Analytics: 12 Nodes
• VM Size: G4
• Heap Size: 30 GB
• GC: G1
• Ingestion: 20k – 50k events/sec
• Data on ephemeral SSD drives.
• RF = 3 in both DCs

Cluster 2:
• Cassandra: 30 Nodes
• Analytics: 15 Nodes (30 within 1 month)
• VM Size: G4
• Heap Size: 30 GB
• GC: G1
• Ingestion: 200k+ events/sec
• Data on ephemeral SSD drives.
• RF = 3 in both DCs
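To put the ingestion figures in per-node terms, here is a rough back-of-the-envelope sketch. It assumes one write per event and an even token distribution across the nodes of a datacenter; both are simplifying assumptions, not statements from the slides, and the function name is made up for illustration.

```python
def replica_writes_per_node(events_per_sec: int, replication_factor: int, nodes: int) -> float:
    """Average replica writes per second landing on each node of one datacenter.

    Assumes one write per event and an even token distribution
    (simplifying assumptions, not figures from the slides).
    """
    return events_per_sec * replication_factor / nodes


if __name__ == "__main__":
    # (peak events/sec, RF, node count) as listed on the slide.
    datacenters = {
        "Cluster 1, Cassandra DC": (50_000, 3, 12),
        "Cluster 2, Cassandra DC": (200_000, 3, 30),
        "Cluster 2, Analytics DC": (200_000, 3, 15),
    }
    for name, (rate, rf, nodes) in datacenters.items():
        per_node = replica_writes_per_node(rate, rf, nodes)
        print(f"{name}: {per_node:,.0f} replica writes/sec per node")
```

Because RF = 3 applies in both DCs, the smaller analytics DC in Cluster 2 takes roughly twice the per-node write load of the Cassandra DC until it grows to 30 nodes.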

Page 9: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What We Built - Pipeline Evolution

Cluster 1: O365 → REST API → Event Hub → Ingestion Worker (Azure worker role using DataStax C# driver) → C* → Analytics

Cluster 2: O365 → REST API → Kafka → C*/Spark Streaming → Analytics

[Diagram labels: G4 – Local SSD; Kafka: G4 – Data Disk; ZooKeeper: A7 – Data Disk; PaaS Small; G4 – Local SSD]

Page 10: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Azure Disks

VHD Storage: No more than 40 VMs per storage account
“… and for a Standard Tier VM, it is about 40 (20,000/500 IOPS per disk)…..”
https://azure.microsoft.com/en-us/documentation/articles/azure-subscription-service-limits/

Disk Choice:
1. Local SSD (Ephemeral) – Fast but allows data loss.
2. Data Disk (Standard Storage) – No data loss; network-attached, which can add latency. 20k IOPS account limit.
3. Data Disk (Premium Storage) – No data loss; network-attached, which can add latency. Per-disk IOPS limit.
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-how-to-attach-disk/
https://azure.microsoft.com/en-us/documentation/articles/azure-subscription-service-limits/#storage-limits

[Diagram: a VM with its OS disk (OS: /dev/sda) in a storage account, a data disk in a storage account, and the local SSD at /dev/sdb]
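The "about 40" figure is simply the account-level IOPS limit divided by the per-disk IOPS. A minimal sketch of that arithmetic, using the 20,000 and 500 values quoted above (the function name is made up for illustration):

```python
def max_disks_per_storage_account(account_iops_limit: int, iops_per_disk: int) -> int:
    """How many fully busy disks fit under one storage account's IOPS limit."""
    return account_iops_limit // iops_per_disk


# Standard storage figures quoted on the slide: 20,000 IOPS per account, 500 IOPS per disk.
print(max_disks_per_storage_account(20_000, 500))  # 40
```

With one VHD per VM this gives the 40-VM ceiling; a VM with several data disks in the same account uses up that budget proportionally faster.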


Page 11: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Azure VM Size

VM Size: We chose G4 nodes, but are investigating moving to D14 nodes. A larger number of smaller nodes means less data per node, so a failed node can be rebuilt faster, which can reduce recovery time.
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-size-specs/
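The rebuild argument can be sketched with simple arithmetic: rebuild time scales with the amount of data a node holds. Every number below (total data volume, node counts, streaming rate) is hypothetical and chosen only to illustrate the relationship; none of it comes from the slides.

```python
def rebuild_hours(data_per_node_gb: float, stream_mb_per_sec: float) -> float:
    """Hours needed to stream one node's data at a sustained rate."""
    return data_per_node_gb * 1024 / stream_mb_per_sec / 3600


if __name__ == "__main__":
    total_data_gb = 60 * 1024      # hypothetical: 60 TB on disk across the cluster
    stream_rate = 100.0            # hypothetical: 100 MB/s sustained streaming
    for nodes in (30, 60):         # hypothetical: fewer large nodes vs. more small nodes
        per_node_gb = total_data_gb / nodes
        hours = rebuild_hours(per_node_gb, stream_rate)
        print(f"{nodes} nodes: {per_node_gb:.0f} GB per node, about {hours:.1f} h to rebuild")
```

Under these assumptions, doubling the node count halves the data per node and therefore halves the rebuild time.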


Page 12: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Azure Networking

Networking: Virtual Network (VNet) vs Public IP
1. Public IPs – Default limit of 5 per subscription. Allows geo-redundant replication over the Internet.
2. VNet – Define your own subnets and IP ranges. Allows geo-redundant replication via Gateways/Express Route. No bandwidth limit within a VNet.
   1. Standard Gateway – Max 100 Mbps.
   2. High-Performance Gateway – Max 200 Mbps.
   3. Express Route – Max 10 Gbps.

https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-instance-level-public-ip/
https://azure.microsoft.com/en-us/documentation/articles/vpn-gateway-vnet-vnet-rm-ps/
https://msdn.microsoft.com/en-us/library/azure/mt586720.aspx
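To see how these limits interact with ingestion rates, here is a rough sketch that checks which cross-region options could carry a given replication stream. The event sizes are hypothetical, and the calculation ignores compression, protocol overhead, and repair traffic, so treat it as an order-of-magnitude check only.

```python
# Limits as listed on the slide, in megabits per second.
GATEWAY_LIMITS_MBPS = {
    "Standard Gateway": 100,
    "High-Performance Gateway": 200,
    "Express Route": 10_000,
}


def required_mbps(events_per_sec: int, bytes_per_event: int) -> float:
    """Raw bandwidth needed to replicate every event once to another region."""
    return events_per_sec * bytes_per_event * 8 / 1_000_000


def links_that_fit(events_per_sec: int, bytes_per_event: int) -> list:
    """Names of the options whose limit covers the required bandwidth."""
    need = required_mbps(events_per_sec, bytes_per_event)
    return [name for name, limit in GATEWAY_LIMITS_MBPS.items() if limit >= need]


if __name__ == "__main__":
    # Hypothetical event sizes; the slides give event rates but not sizes.
    print(required_mbps(200_000, 1_000), links_that_fit(200_000, 1_000))
    print(required_mbps(20_000, 500), links_that_fit(20_000, 500))
```

At 200k events/sec with 1 KB events (an assumed size), the stream needs about 1,600 Mbps, which only Express Route covers in this table.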


Slide comment (Mahesh Thiagarajan): Can be increased.
Page 13: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Azure Networking

Test performance of every dependency and see if it meets the expectations of your application.

Network Performance: Iperf (https://iperf.fr/) – Test bandwidth between two VMs within various DCs

[Diagram: two VMs in one VNet. VM 10.1.0.10 runs "iperf -s"; VM 10.1.0.11 runs "iperf -c 10.1.0.10".]

user@machine:~$ iperf -c 10.1.0.10
------------------------------------------------------------
Client connecting to 10.1.0.10, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[ 3] local 10.1.0.10 port 42892 connected with 10.1.0.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 45.7 GBytes 39.2 Gbits/sec
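If you run this test across many VM pairs, it helps to pull the bandwidth out of the output automatically. A minimal sketch, assuming the classic iperf text summary line shown above; the regular expression and function name are illustrative and may need adjusting for other iperf versions or output options.

```python
import re
from typing import Optional

# Matches a summary line such as: "[  3]  0.0-10.0 sec  45.7 GBytes  39.2 Gbits/sec"
_SUMMARY = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-\s*[\d.]+\s+sec\s+"
    r"[\d.]+\s+[KMG]?Bytes\s+"
    r"([\d.]+)\s+([KMG]?)bits/sec"
)
_TO_GBITS = {"": 1e-9, "K": 1e-6, "M": 1e-3, "G": 1.0}


def iperf_bandwidth_gbits(output: str) -> Optional[float]:
    """Return the bandwidth of the last summary line in Gbits/sec, or None if absent."""
    matches = _SUMMARY.findall(output)
    if not matches:
        return None
    value, unit = matches[-1]
    return float(value) * _TO_GBITS[unit]


if __name__ == "__main__":
    sample = (
        "[ ID] Interval Transfer Bandwidth\n"
        "[ 3] 0.0-10.0 sec 45.7 GBytes 39.2 Gbits/sec\n"
    )
    print(iperf_bandwidth_gbits(sample))
```

The result can then be compared against whatever threshold your application expects for intra-VNet or cross-DC traffic.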


Page 14: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Azure Storage

Test performance of every dependency and see if it meets the expectations of your application.

Disk: SysBench (https://wiki.gentoo.org/wiki/Sysbench) – Test write throughput and IOPS

user@machine:/mnt$ sysbench --test=fileio --file-total-size=1000G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.4.12: multi-threaded system evaluation benchmark

<….. Excess Logging Removed….>

Operations performed: 402240 Read, 268160 Write, 858065 Other = 1528465 Total
Read 6.1377Gb Written 4.0918Gb Total transferred 10.229Gb (34.917Mb/sec)
2234.67 Requests/sec executed

Test execution summary:
    total time: 300.0002s
    total number of events: 670400
    total time taken by event execution: 16.1526
    per-request statistics:
        min: 0.00ms
        avg: 0.02ms
        max: 2.20ms
        approx. 95 percentile: 0.05ms

Threads fairness:
    events (avg/stddev): 670400.0000/0.00
    execution time (avg/stddev): 16.1526/0.00
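The summary figures can be cross-checked from the raw operation counts. The sketch below assumes sysbench's default 16 KiB file block size, since the command above does not set one; that block size is an assumption, not something the slide states.

```python
def sysbench_fileio_summary(reads: int, writes: int, seconds: float,
                            block_bytes: int = 16 * 1024) -> dict:
    """Recompute request rate and throughput from sysbench fileio counters.

    block_bytes defaults to 16 KiB, assumed to be the block size in effect
    because the command line does not override it.
    """
    requests = reads + writes
    total_mib = requests * block_bytes / 2**20
    return {
        "requests": requests,
        "requests_per_sec": requests / seconds,
        "total_gib": total_mib / 1024,
        "mib_per_sec": total_mib / seconds,
    }


if __name__ == "__main__":
    # Counters taken from the log above.
    summary = sysbench_fileio_summary(reads=402_240, writes=268_160, seconds=300.0002)
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

Under that assumption the recomputed values line up with the logged 2234.67 requests/sec, 10.229 Gb transferred, and 34.917 Mb/sec, which suggests the "Gb" and "Mb" in the log are binary gigabytes and megabytes rather than bits.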

Page 15: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Cassandra

Metrics!

Need to tune? Al Tobey can help - https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

Page 16: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Cassandra

SSTable Count
• Too many SSTables can lead to OOM errors and nodes becoming unavailable.
• Watch count and balance compaction throughput with system limits.
• SSTable count may spike during repairs if data is inconsistent.

Dropped Mutations
• Dropped mutations mean more repairs need to be done.
• Impact of dropped mutations can be controlled by tuning write consistency.
• Check iostat to see if disk queue is building up or write latency is high.
    • iostat -x /dev/sdb 1 5
• Do drops only happen when Spark Jobs batch write? Tune Spark write throughput (https://github.com/datastax/spark-cassandra-connector/blob/v1.2.5/doc/FAQ.md)

See memtables & flushing in Al’s Tuning Guide.
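As background for "tuning write consistency", the sketch below shows how many replica acknowledgements each common consistency level waits for with RF = 3 in two datacenters. The quorum formula is standard Cassandra behaviour; the datacenter names are made up for illustration.

```python
from typing import Mapping


def quorum(replicas: int) -> int:
    """Majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


def acks_required(level: str, rf_per_dc: Mapping[str, int], local_dc: str) -> int:
    """Replica acknowledgements a write waits for at a given consistency level."""
    total = sum(rf_per_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "QUORUM":
        return quorum(total)
    if level == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    # RF = 3 in both DCs, as on the cluster slide; the DC names are hypothetical.
    rf = {"cassandra": 3, "analytics": 3}
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
        print(level, acks_required(level, rf, local_dc="cassandra"))
```

The fewer acknowledgements a write waits for, the less a single slow replica dropping a mutation affects the client, at the cost of relying more on repair to bring that replica back in sync.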


Page 17: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Cassandra

Pending Compactions
• If you aren’t keeping up with compactions, performance will suffer.
• Too many SSTables impact read speed, but also can lead to hitting OS limits. See:
    • /etc/sysctl.conf – vm.max_map_count
    • /etc/security/limits.d/cassandra.conf – nofile
    • /etc/init.d/dse – Certain DSE versions overwrite nofile with: FD_LIMIT=100000
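A quick way to see which limits are actually in effect on a node is to read them at runtime rather than trusting the config files. This is a minimal, Linux-only sketch using the standard library. It reports the limits of the process running the script, so run it as the Cassandra user (or inspect /proc/<pid>/limits for the Cassandra process itself) to see what applies to the database.

```python
import resource
from pathlib import Path
from typing import Optional, Tuple


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> Optional[int]:
    """Current vm.max_map_count, or None if it cannot be read (e.g. not on Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_nofile_limits() -> Tuple[int, int]:
    """Soft and hard open-file limits (nofile) for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = read_nofile_limits()
    print(f"vm.max_map_count: {read_max_map_count()}")
    print(f"nofile soft/hard: {soft}/{hard}")
```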

Heap Used
• Heap usage changes over time. What works in week 1 may not work in week 10.
• We used a 20 GB heap until nodes started hitting OOM when they needed 25 GB.
• Use G1 if at all possible to see GC times decrease, and use a large (25 – 30 GB) heap.
• Let G1 tune your young generation heap size.


Page 18: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

What to Pay Close Attention To – Spark

We are still learning!

Scheduler Output:

NOT CRON!

Spark UI:

Spark Job Logs:
If you don’t enable Spark UI for security reasons, ship your Spark logs off box for analysis.

You may also find that jobs fail to read data because partitions are missing or nodes are timing out. This can indicate you are overwhelming Cassandra.


Page 19: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Deployment

Use the Azure/DataStax Template
Azure will be investing in building more features into the Azure template, and you will get those more easily if you use the existing template.

https://www.youtube.com/watch?v=vacp267zLBA&noredirect=1
https://github.com/DSPN/azure-resource-manager-dse

We Didn’t Use the Template because it wasn’t ready yet. We had to write our own logic to deploy nodes and need to transition to the template so we can get all of these new features. We are scheduling time to do this because it will save us a lot of work!

Consider Security and Compliance: This will influence how you deploy (VNet vs Public IP), what Cassandra configuration you use (internode encryption, require_client_auth: true), and what OS configuration you use (CIS standards).

C* Hardening: http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
CIS Standards: https://benchmarks.cisecurity.org/downloads/show-single/?file=ubuntu1404.100


Page 20: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Power of Repeatability

Azure Templates can:
• Ensure Idempotency
• Simplify Orchestration
• Simplify Roll-back
• Provide Cross-Resource Configuration and Update Support

Azure Templates are:
• Source file, checked-in
• Specifies resources and dependencies (VMs, WebSites, DBs) and connections (config, LB sets)
• Parameterized input/output

[Diagram: "Instantiation of repeatable config." A Configuration is instantiated as a Resource Group containing SQL-A, a Website, and Virtual Machines (2x); the Website and the VMs are each marked DEPENDS ON SQL and reference the SQL CONFIG.]

Page 21: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store


Azure VM Extensions

IaaS extended
• Extending the power of your VM
• Enable easier management
• Support partner ecosystem
• Full control still with you!

[Diagram labels: Azure; Curated Extensions; Agent]

Page 22: Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store

Thank you

Sean Usher
Office 365
Email: [email protected]
@seanushermsft

Mahesh Thiagarajan
Microsoft Azure
Email: [email protected]
@_cloudguy

Ben Lackey
DataStax
Email: [email protected]