Tetration Analytics - Network Analytics & Machine Learning Enhancing Data Center Security and Operations
Mike Herbert, Principal Engineer, INSBU
BRKDCN-2040
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Okay, What Does Tetration Mean?
• Tetration (or hyper-4) is the next hyperoperation after exponentiation, and is defined as iterated exponentiation
• It’s bigger than a googol
• And yes, the developers are a bunch of mathematical geeks
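As a small illustration of "iterated exponentiation", the sketch below builds a power tower from the top down. The function name `tetrate` is made up for this example; it is not part of any Tetration product API.

```python
def tetrate(base: int, height: int) -> int:
    """Return `base` tetrated to `height`: a power tower of `height` copies of `base`."""
    result = 1
    for _ in range(height):
        # Each pass adds one more level to the tower: base ** (previous tower).
        result = base ** result
    return result


print(tetrate(2, 3))               # 2 ** (2 ** 2) = 16
print(tetrate(2, 4))               # 2 ** 16 = 65536
print(tetrate(2, 5) > 10 ** 100)   # True: 2 ** 65536 already exceeds a googol
```

Even a tower of five 2s passes a googol, which is the point the slide is making.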
Tetration Analytics Platform
Introduction
We Are at the Cusp of a Major Shift
[Figure: adoption curve from the traditional data centre to the cloud data centre across 2000, 2010, 2015, and the next 5+ years. Stages along the curve: consolidation, virtualisation, automation, hybrid clouds, IT as a Service (IaaS | PaaS | SaaS | XaaS), and flexible consumption models. Goals: digital experiences, efficiency, simplicity, speed. A "We are here" marker shows the current position.]
Modern data centers are getting increasingly complex
Hybrid cloud
• Zero trust model
• Multi cloud orchestration
• Application portability
Big and fast data
• Increase in east-west traffic
• Expanded attack surface
• Open source
Rapid app deployment
• Continuous development
• Application mobility
• Micro services
What if you could actually look at every data packet header that has ever traversed the network without sampling?
Tetration Analytics Platform: Every Packet, Every Flow, Every Speed
Cisco Tetration Analytics™
• Pervasive network visibility and forensics
• Application insight
• Policy compliance
Cisco Tetration Analytics
• Application insights
• Policy simulation and impact assessment
• Automated whitelist policy generation
• Forensics: every packet, every flow, every speed
• Policy compliance and auditability
Cisco Tetration Analytics: Pervasive Sensor Framework
• Provides correlation of data sources across the entire application infrastructure
• Enables identification of point events and provides insight into overall system behavior
• Monitors the end-to-end lifecycle of application connectivity
Datacenter-Wide Traffic Flow Visibility
• Information about consumer, provider, and type of traffic
• Detailed information about the flow
Application Discovery and Endpoint Grouping
[Figure: bare-metal (BM) and VM workloads in brownfield and Cisco Nexus® 9000 Series environments send telemetry to the Cisco Tetration Analytics™ Platform. Telemetry sources shown: bare-metal, VM, and switch telemetry; VM telemetry (AMI …); bare-metal and VM telemetry.]
• Network-only sensors, host-only sensors, or both (preferred)
• Bare metal and VM
• On-premises and cloud workloads (AWS)
• Unsupervised machine learning
• Behavior analysis
Whitelist Policy Recommendation
• Application discovery (figure shows WebTier, AppTier, DBTier, and Storage)
• Whitelist policy recommendation (available in JSON, XML, and YAML)
• Policy enforcement (future roadmap)
Real-Time and Historical Policy Simulation
• Validating policy impact assessment in real time
• Simulating policy changes over historic traffic
• View traffic “outliers” for quick intelligence
• Audit becomes a function of continuous machine learning
Policy Compliance
• Identify policy deviations in real time
• Review and update whitelist policy with one click
• Policy lifecycle management
Tetration Analytics
[Figure: data from servers (compute: process, user), the network (network flows, buffer stats), and ecosystem partners feeds the Tetration Analytics Engine, a PB-scale secure appliance that delivers application insights, policy, and forensics. Use-case labels: Application Dependency, Application Performance, Automation & Compliance, Enforcement, Infrastructure, Behavioral Anomalies.]
Tetration Analytics Platform
Architecture - Sensors
Tetration Analytics Architecture Overview
Analytics Engine
Cisco Tetration
Analytics™
Platform
Visualization and
Reporting
Web GUI
REST API
Push Events
Data Collection
Host Sensors
Network Sensors
3rd-Party
Metadata Sources
Tetration
Telemetry
Configuration
Data
Cisco Nexus®
92160YC-X
Cisco Nexus
93180YC-EX
VM
Pervasive Sensors
Host sensors
• Linux VM
• Windows Server VM
• Bare metal (Linux and Windows Server)
• Hypervisors
• Containers
Network sensors (next-generation 9K switches)
• Nexus 9200-X
• Nexus 9300-EX
3rd-party data sources
• Geo
• Whois
• IP watch lists
• Load balancers
• …
Availability labels on the slide: available at FCS; future releases
• Low CPU overhead (SLA enforced)
• Low network overhead (SLA enforced)
• Highly secure (code signed, authenticated)
• Every flow (no sampling), NO PAYLOAD
Traditional Monitoring Is Showing Its Age
Not suited for modern network and security operations
[Figure: where data is created (devices exposing SNMP, CLI, and Syslog) versus where data is useful (SNMP server, Syslog collector, and scripts feeding storage and analysis). The path between them is non-real-time.]
• Strong burden on the back end
• Normalize different encodings, transports, data models, timestamps
Streaming Telemetry Is a Game Changer
Monitoring becomes a big data problem
Where data is created to where data is useful, in real time:
• Streaming paradigm
• Dense sensor framework
• Increased data granularity
• Update on every event
• Multiple data sources
Removing limitations and complexity makes this a big data and machine learning problem:
• Volume: scale of data
• Velocity: analysis of streaming data
• Variety: different forms of data
Why Multiple Sensors?
Example: monitoring temperature in a room
[Figure: a room with a lamp sensor, a plug sensor, and a heater.]
Tetration Sensors: Locations
• Software sensor (on hypervisors and hosts): processes and sockets, packet and flow events
• Hardware sensor (Nexus 92160YC-X, Nexus 93180YC-EX, 9732C-EX line card): packet and flow events, buffer and switch state
• Both sensor types report to the Tetration cluster
Hardware Sensor
• Embedded module (flow cache)
  • Nexus 92160YC-X
  • Nexus 93180YC-EX and 9732C-EX line cards
• Extracts metadata from the forwarding pipeline
• No latency impact, no performance impact
[Figure: forwarding pipeline blocks PRX, LUA, LUB, and LUC, with the flow cache attached.]
Software Sensor
• Not in the data path
• Sits in user space
• Designed by kernel developers
• Secure
  • Code signed
• SLA enforcement
  • CPU and bandwidth throttling
• FCS availability
  • Windows: 2008 / 2008 R2 / 2012 / 2012 R2
  • Linux: Red Hat (5.3+, 6.x), CentOS (5.11+, 6.x), Ubuntu (12.04, 14.04, 14.10)
[Figure: host stack of NIC, driver, network stack, and application, with the Tetration sensor attached through libpcap.]
PKI within the Cluster/Sensor
• Tetration cluster runs an internal PKI
  • Root CA is per cluster, inserted at image creation
  • Not accessible outside the cluster
  • Cannot connect to an external PKI
• Certificate-based authentication is performed for the control channel
  • CN of the certificate is the IP address
  • Certificates are rotated every 60 days
• Sensors are code signed
  • Signature authority is Cisco’s code signing certificate
  • Code signature is validated at process start
How Does a Sensor Communicate with the Cluster the First Time?
Participants: Sensor, Rails, Config Server, Collector
• Register with web server via SSL
• Assign UUID
• Register with web server via SSL
• Download config
• Send metadata to collectors
Components & Communication: Hardware Sensor
• Cisco Nexus 9000: ASIC, NX-OS agent, Guest Shell
• Agent communication: Unix socket
• Control channel to the Tetration cluster: TCP/443
• Sensor data to the Tetration cluster: UDP/5640
Components & Communication: Software Sensor
• Software sensor on Linux/Windows/…
• Agent communication: Unix socket
• Control channel to the Tetration cluster: TCP-SSL 443
• Sensor data to the Tetration cluster: TCP-SSL 5640
Currently Supported Platforms
• Windows 2008: Datacenter, Enterprise, Essentials, Standard
• Windows 2008 R2: Datacenter, Enterprise, Essentials, Standard
• Windows 2012: Datacenter, Enterprise, Essentials, Standard
• Windows 2012 R2: Datacenter, Enterprise, Essentials, Standard
• Red Hat Enterprise Server: 5.3 and above, 6.x
• CentOS: 5.11 and above, 6.x
• Ubuntu: 12.04, 14.04, 14.10
This list will grow based on what you need and ask for
Methods to deploy the sensor
BRKDCN-2040 31
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Coming soon to a GitHub near you
github.com/datacenter
BRKDCN-2040 32
Tetration Analytics Platform
Architecture - Sensor Data
Looking Beyond Connectivity: Application Processes and Sockets
[Figure: consumer process (Chrome, socket > 1023) talking to provider/service process (NGINX, socket = 443).]
• Application developers implement business logic as code that runs as processes and threads
• TCP/IP, which forms a foundation of the Internet, was designed to allow these application processes to interact via sockets
• Application logic can be viewed on one level as the interaction between a group of processes and their associated sockets
• Understanding the inter-process communication and mapping it directly to the infrastructure provides a direct correlation between the application and the infrastructure
Looking Beyond Connectivity: Application Processes and Sockets
Provider/service process (NGINX in the figure), socket = 80:
    import socket

    # create an INET, STREAMing socket
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # bind the socket to a public host and a well-known port
    # (binding to a port below 1024 usually needs elevated privileges)
    serversocket.bind((socket.gethostname(), 80))
    # become a server socket, queueing up to 5 pending connections
    serversocket.listen(5)
Consumer process (Chrome in the figure), socket > 1023:
    import socket

    # create an INET, STREAMing socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80, the normal HTTP port;
    # the OS assigns the local (source) port
    s.connect(("www.mcmillan-inc.com", 80))
What Do We Mean by Application Visibility: Internet Stack
[Figure: two end hosts, each with the full stack (application, transport, network, data link, physical) and processes attached to sockets, connected through two intermediate devices that implement only the network, data link, and physical layers.]
What Does the Tetration Sensor Collect: Socket Connectivity, the Data Flows
[Figure: Internet stack diagram (end hosts with application through physical layers and processes attached to sockets; intermediate devices with network through physical layers), illustrating the data flows between sockets.]
What Does the Sensor Collect: Context
[Figure: Internet stack diagram (end hosts with application through physical layers and processes attached to sockets; intermediate devices with network through physical layers), annotated with the context below.]
• Process information: which process is it, who started it, etc.
• Device information: buffer/ACL drops, etc.
Sensor Data: Process Information
• Host sensor collects information about the consumer and provider processes
• /proc
  • Runtime system information (e.g. system memory, devices mounted, hardware configuration, etc.)
Additional Context: External Data Sources
[Figure: Internet stack diagram with pervasive sensors feeding the Tetration Analytics Engine, which also takes context from APIC and from external sources such as CMDB, DNS, whois, Talos (future), etc.]
What Does the Sensor Collect: Socket-Level Flow Information + Context Information
• Understanding of what happens TO and INSIDE a flow
• Accumulated flow information (volume…)
• Per-packet variations (the figure contrasts packet lengths of 66 and 9000)
  • Distributions (packet sizes, TCP windows…)
  • Burstiness
• Anomaly detection
• Latency (application and network)
• Events
• VXLAN information
Full vs. Sampled: What Happens When You Sample?
[Figure: a full packet stream carrying flows A, B, C, and D; one flow is shown as the TCP sequence SYN, SYNACK, ACK, FIN.]
Full vs. Sampled: Reasons and Use Cases for Both
• Sampling has its use cases, in SP environments for example
  • High volume, no behavioral analysis
• Sampling provides a good statistical model
  • For trends
  • For traffic visibility
  • For volume indication
• Depending on the number of flows and type of flows
  • Mice flows can go completely unseen
  • Connection-oriented flows may not be tracked properly (missed flags)
  • Accuracy of the flow increases with the packet count
• Type of sampling and quality of entropy
  • Entropy is very important
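The effect of sampling on mice flows can be sketched with a toy, deterministic 1-in-N sampler. The flow names, sizes, and round-robin interleaving below are invented for illustration; real samplers are usually randomized, and this is not how any particular switch implements sampling.

```python
from collections import Counter


def build_stream(flow_sizes):
    """Round-robin interleave packets; returns a list of (flow_id, packet_number_in_flow)."""
    remaining = dict(flow_sizes)
    sent = Counter()
    stream = []
    while remaining:
        for flow_id in list(remaining):
            stream.append((flow_id, sent[flow_id]))
            sent[flow_id] += 1
            remaining[flow_id] -= 1
            if remaining[flow_id] == 0:
                del remaining[flow_id]
    return stream


def sample_every(stream, n):
    """Systematic 1-in-n sampling: keep every n-th packet."""
    return stream[n - 1::n]


# Hypothetical mix: one elephant flow and two short-lived mice flows.
stream = build_stream({"elephant": 1000, "mouse-1": 3, "mouse-2": 2})
sampled = sample_every(stream, 100)

seen_flows = {flow_id for flow_id, _ in sampled}
print(sorted(seen_flows))                           # ['elephant']
print(any(packet == 0 for _, packet in sampled))    # False: the first packet (the SYN) was never sampled
```

The volume estimate for the elephant flow is still good (10 samples times 100), which matches the slide's point that sampling works for trends and volume but not for per-flow behaviour.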
Tetration Examines Every Packet
[Figure: full packet stream.]
• Variability within the flow
• Variability between the flows
• Changes within the flow
Collects the Metadata, Not the Packet
[Figure: example packets. A VXLAN-encapsulated packet: Ethernet header, IP header, UDP header, VXLAN header, inner Ethernet header, inner IP header, TCP header, payload. Native packets: Ethernet header, IP header, TCP or UDP header, payload. The headers are marked as metadata, including overlay VXLAN/GRE/IPinIP encapsulated headers; the payload is marked as a privacy risk.]
Sensor Data: Flow Data, Forwarding
• COS
• Overlay type (native, 802.1q / 802.1p, VXLAN, iVXLAN, NVGRE, NSH, other)
• Source TEP or port ID
• Destination TEP
• Disposition (RPF or port security failure, policy drop, redirect or span)
• Port type (spine to leaf or leaf to host)
Sensor Data: Accumulated Flow Information
• Bytes, packet count
• IP options present
• IP length error
• DF bit set
• Fragment seen
• Last TTL
• Accumulated TCP flags
• Last ACK / SEQ
• Sampled packet length
• Sampled packet ID
Sensor Data: Histogram Bins
• Flow cache has the notion of “bins” to build histograms
  • TCP options length (8 bits)
  • Payload length (12 bits)
  • Receive window (6 bits)
• This means more visibility into the activity of a flow
• Bin sizes are configurable
• Bins don’t need to be of equal size (but need to be contiguous)
• Last bin will capture the configured size and above
[Figure: two exports that together form the histogram of the flow. First export: pattern 1 0 1 0 1 0 0 0 with bins #1 = 82 bits, #2 = 82 bits, #3 = 0 bits, #4 = 165 bits. Second export: pattern 0 0 1 1 1 0 0 0 with bins #5 = 82 bits, #6 = 82 bits, #7 = 130 bits, #8 = 165 bits.]
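A minimal sketch of the binning rules above (configurable, contiguous, unequal bins, with the last bin catching everything at or above its lower edge). The bin edges are hypothetical, and the sample values are the packet lengths from the Mouse Packet Event slide; the real flow cache does this in hardware, so this only illustrates the idea.

```python
from bisect import bisect_right


def bin_index(value, lower_edges):
    """Return the bin for `value`; `lower_edges` is sorted ascending, one lower edge per bin."""
    # bisect_right counts how many lower edges are <= value; the last bin has no upper limit.
    return max(bisect_right(lower_edges, value) - 1, 0)


def histogram(values, lower_edges):
    counts = [0] * len(lower_edges)
    for value in values:
        counts[bin_index(value, lower_edges)] += 1
    return counts


# Hypothetical, unequal but contiguous bins: [0,64) [64,128) [128,512) ... [8192, infinity)
edges = [0, 64, 128, 512, 1024, 1500, 4096, 8192]
lengths = [78, 74, 66, 178, 66, 1299, 66, 66, 66, 66, 66]

print(histogram(lengths, edges))   # [0, 9, 1, 0, 1, 0, 0, 0]
print(bin_index(9000, edges))      # 7: a jumbo frame lands in the open-ended last bin
```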
Sensor Data: Burst
• Measure the “burstiness” of a flow
  • Current burst
  • Max burst
  • Burst index
  • Flowlets
• Bursts are measured in 32k intervals
• Each export period is divided into 128 intervals
• Flowlets are activity after a silence period (configurable)
[Figure: one export period split into intervals 0 to 128, showing flowlet #1, a silence, then flowlet #2. Readings along the timeline: at interval 0, current 128, max 128, burst index 0; at interval 3, current 256, max 256, burst index 3; at interval 30, current 32, max 256, burst index 3; at interval 80, current 1024, max 1024, burst index 80; at the end, current 0, max 1024, burst index 80.]
Max burst occurred at 62.5 ms with a value of 1024 and 2 flowlets
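The burst readings in the figure can be reproduced with a short sketch. Two things here are assumptions rather than facts from the slide: the export period is taken to be 100 ms (inferred from burst index 80 of 128 being reported as 62.5 ms), and the silence threshold of 8 intervals is arbitrary. The per-interval values between the labelled readings are also invented.

```python
INTERVALS_PER_EXPORT = 128
EXPORT_PERIOD_MS = 100.0   # assumption: 80 / 128 * 100 ms = 62.5 ms, matching the slide


def burst_stats(interval_values, silence_intervals=8):
    """Return (max_burst, burst_index, flowlets) for one export period."""
    max_burst, burst_index, flowlets = 0, 0, 0
    idle = silence_intervals          # treat the start of the period as following silence
    for index, current in enumerate(interval_values):
        if current > 0:
            if idle >= silence_intervals:
                flowlets += 1         # activity after a silence period starts a new flowlet
            idle = 0
            if current > max_burst:
                max_burst, burst_index = current, index
        else:
            idle += 1
    return max_burst, burst_index, flowlets


# Rebuild the slide's timeline: activity in intervals 0..30, silence, then a burst at 80.
values = [0] * INTERVALS_PER_EXPORT
values[0], values[1], values[2], values[3] = 128, 64, 64, 256
for i in range(4, 31):
    values[i] = 32
values[80] = 1024

max_burst, burst_index, flowlets = burst_stats(values)
offset_ms = burst_index / INTERVALS_PER_EXPORT * EXPORT_PERIOD_MS
print(max_burst, burst_index, flowlets, offset_ms)   # 1024 80 2 62.5
```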
Sensor Data: Anomaly List
• TTL changed
• IP reserved flags are not 0
• DF bit has changed
• Ping of death
• Fragment is too small to contain L4 header (TCP, UDP and SCTP)
• TCP SYN and FIN are set
• TCP SYN and RST are set
• TCP FIN, PSH and URG are set
• TCP flags are zeroed
• TCP SYN with data
• TCP FIN with no ACK
• TCP RST with no ACK
• TCP SYN, FIN, RST and ACK zeroed
• URG set but no URG pointer
• URG pointer with no URG flag
• TCP seq outside the expected range
• TCP seq is less than expected (rexmit)
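Several entries in the anomaly list are simple checks on the TCP flag byte. The sketch below covers only those flag-based checks, using the standard TCP flag bit values; the function name and the returned strings are made up for this example and do not reflect how the sensor encodes anomalies.

```python
# Standard TCP flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flag_anomalies(flags, payload_len=0, urg_pointer=0):
    """Return the flag-based anomalies from the list that apply to one TCP segment."""
    found = []
    if (flags & SYN) and (flags & FIN):
        found.append("SYN and FIN set")
    if (flags & SYN) and (flags & RST):
        found.append("SYN and RST set")
    if (flags & (FIN | PSH | URG)) == (FIN | PSH | URG):
        found.append("FIN, PSH and URG set")
    if (flags & 0x3F) == 0:
        found.append("flags zeroed")
    if (flags & SYN) and payload_len > 0:
        found.append("SYN with data")
    if (flags & FIN) and not (flags & ACK):
        found.append("FIN with no ACK")
    if (flags & RST) and not (flags & ACK):
        found.append("RST with no ACK")
    if (flags & URG) and urg_pointer == 0:
        found.append("URG set but no URG pointer")
    if not (flags & URG) and urg_pointer != 0:
        found.append("URG pointer with no URG flag")
    return found


print(tcp_flag_anomalies(SYN | ACK))   # []
print(tcp_flag_anomalies(SYN | FIN))   # ['SYN and FIN set', 'FIN with no ACK']
```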
Sensor Data: Events, RTT Sample
• Way of approximating the RTT based on specific packet characteristics
  • Preset ACK and SEQ
• An approximation, as this includes the OS network stack
• Uses sampling; a sample is taken every 8192 bytes (by default, configurable)
• Tracks the ACK for these specific SEQ values and creates an event for each
• Because this is a global configuration, the ACK is still tracked if the return path is via another switch
RTT Sample: Example
• TCP SEQN 100: event triggered, event time stamped
• TCP ACK 100: event time stamped
• TCP SEQN 8192: event triggered, event time stamped
• TCP ACK 8192: event time stamped
RTT = Event ACK TS – Event SEQ TS
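The RTT sampling idea can be sketched as a small tracker: roughly every 8192 bytes, remember when a data segment was seen, then time the ACK that covers it. The class and method names are hypothetical. One deliberate difference from the slide's simplified picture: the sketch matches on the ACK number a receiver would actually send (sequence number plus segment length) rather than on an equal SEQ/ACK number.

```python
class RttSampler:
    """Toy per-flow RTT sampler; timestamps are integers in microseconds."""

    def __init__(self, sample_bytes=8192):
        self.sample_bytes = sample_bytes
        self.bytes_seen = 0
        self.next_sample_at = 0
        self.pending = {}   # expected ACK number -> timestamp of the sampled segment

    def on_data(self, seq, length, timestamp_us):
        """Call for each data segment; samples one segment per `sample_bytes` of traffic."""
        if self.bytes_seen >= self.next_sample_at:
            self.pending[seq + length] = timestamp_us          # event: SEQ time stamped
            self.next_sample_at = self.bytes_seen + self.sample_bytes
        self.bytes_seen += length

    def on_ack(self, ack, timestamp_us):
        """Call for each ACK; returns the RTT sample in microseconds, or None."""
        sent_at = self.pending.pop(ack, None)
        if sent_at is None:
            return None
        return timestamp_us - sent_at                          # RTT = ACK TS - SEQ TS


sampler = RttSampler()
sampler.on_data(seq=100, length=1000, timestamp_us=0)
print(sampler.on_ack(ack=1100, timestamp_us=4000))    # 4000
sampler.on_data(seq=1100, length=1000, timestamp_us=5000)
print(sampler.on_ack(ack=2100, timestamp_us=6000))    # None: this segment was not sampled
```

As the slide notes, the result is an approximation because the measured interval includes the receiver's OS network stack.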
Sensor Data: Events
• Mouse packet
  • Export the first “n” packets of a flow (configurable)
• Analytics changed
  • A parameter of the flow has changed (bit mask comparison), 1 mask configurable
• Packet value match
  • A packet field contains a specific value, 1 field configurable (mask + value)
Sensor Data: Example
• Let’s take this web page request as an example
• Assumption is that it’s the first connection, so this is a new flow
• One flow is created per direction
Packet  76 77 78 79 82 83 84 85 86 88 89
Flow    A  A  A  A  A  A  B  B  B  B  B
Flow export
First Packet Event
Packet   76   77   78   79   82   83   84   85   86   88   89
Event triggered
Mouse Packet Event
• n = 2 (2nd packet of a flow, within an export interval)
Packet   76   77   78   79   82   83   84   85   86   88   89
Length   78   74   66  178   66 1299   66   66   66   66   66
Event triggered
Analytics Changed Event
• Bitmask = sampled packet length (in the flow analytics TCAM)
Packet   76   77   78   79   82   83   84   85   86   88   89
Length   78   74   66  178   66 1299   66   66   66   66   66
Sampled  78   78   66   66   66 1299   66   66   66   66   66
Event triggered
Packet Value Match Event
• TTL = 64
Packet   76   77   78   79   82   83   84   85   86   88   89
TTL      64   58   64   64   58   58   64   64   58   58   64
Event triggered
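The Packet Value Match event is a mask-and-compare on a single packet field. The sketch below applies it to the TTL row from the slide; the function name and the 0xFF mask are illustrative choices, not the switch's configuration syntax.

```python
def packet_value_match(field_value, mask, value):
    """True when the masked packet field equals the configured value."""
    return (field_value & mask) == value


packets = [76, 77, 78, 79, 82, 83, 84, 85, 86, 88, 89]
ttls = [64, 58, 64, 64, 58, 58, 64, 64, 58, 58, 64]

# Configured match: TTL == 64, comparing all 8 bits of the field.
matches = [p for p, ttl in zip(packets, ttls) if packet_value_match(ttl, 0xFF, 64)]
print(matches)   # [76, 78, 79, 84, 85, 89]
```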
Pervasive Visibility: Flow Search and Forensics
Tetration Analytics Platform
Architecture - Cluster
The Analytics Cluster: Components
• Hadoop-based platform
  • Self managed
  • One-touch deployment
• Tiered system
  • Heavy compute for machine learning
  • Caching for light-speed queries
• Extensibility (future)
  • Messaging bus
  • API access
[Figure: cluster tiers: front end; caching (search); compute (data cleaning and analytics); long-term storage (data lake).]
The Analytics Cluster: Appliance
• The analytics cluster operates as an appliance
  • Avoids the need for in-house big data and analytics expertise
  • Supported by Cisco TAC
• Self monitoring
  • The cluster leverages a sensor architecture to track its state and provides event based notifications for
• Software upgrades and full install are all automated
Cluster Monitoring and Maintenance
Collector Monitoring and Maintenance
Sensor Monitoring and Maintenance: Sensor Throttled
Hardware Sensor Monitoring
FCS Analytics Cluster Configurations
• 4 x 3-phase PDU, 22.5 kW peak power
• 4 x 1-phase PDU, 11.5 kW peak power
Options for Future Cluster Models
Analytics Engine: The Platform
Front End: GUI, RESTful API, Messaging Bus
• Servers hosting front-end processes
• GUI and operational interfaces
• RESTful API (post FCS)
• Messaging bus (post FCS)
Data Processing: Pipeline
• Data ingest and processing
• Multiple pipelines for different processing activities
• Scaled to millions of events per second
Caching Layer: Natural Language Search
Caching Layer: Search
• Caching layer provides a large in-memory and flash-based data store for real-time searches, e.g. 16 weeks of policy delta data accessible for real-time search
Data Lake: HDFS Storage
• Long-term storage for collected observations, for pipeline processing tasks, etc.
• Usage is based on
  • Time-based retention
  • Space-based retention
  • Greedy retention
• Maximum possible retention period will depend on cluster size and observation rate
14.10 K hours of available capacity at the current collection rates (587 days)
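The retention figure is simple arithmetic: available capacity divided by the collection rate, then hours converted to days. In the sketch below only the 14.10 K hours value comes from the slide; the byte counts are invented to show how cluster size and observation rate drive the result.

```python
def retention_hours(free_bytes: int, ingest_bytes_per_hour: int) -> float:
    """Hours of retention left at a constant collection rate."""
    return free_bytes / ingest_bytes_per_hour


def hours_to_days(hours: float) -> float:
    return hours / 24


print(hours_to_days(14_100))   # 587.5, shown on the slide as 587 days

# Hypothetical cluster: 141 TB free, collecting 10 GB per hour.
print(retention_hours(141_000_000_000_000, 10_000_000_000))   # 14100.0
```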
Standard Data Analytics Pipeline: Tetration Data Analysis
Pipeline stages: data prep and cleansing; data aggregation; statistical analysis and prediction tools; automated data discovery and evaluation; reporting, visualization, or alerts
How Tetration maps onto the pipeline:
• Sensors and collectors gather the data
• De-duplication, unification of uni-directional flows into bi-directional, annotation of flows with context information, etc.
• Various pipelines (e.g. ADM) process the data to derive appropriate insights
• GUI, REST API, Kafka, policy export, …
Data Collection: Sensor to Collector
Data Prep
• De-duplication, unification of uni-directional flows into bi-directional, annotation of flows with context information, etc.
[Figure: Internet stack diagram (end hosts with application through physical layers and processes attached to sockets; intermediate devices with network through physical layers) reporting to two collectors.]
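The de-duplication and flow-unification step can be sketched as follows. The record layout, sensor names, and the "keep the first report" rule are assumptions made for the example; the slides do not describe how the collectors actually do this.

```python
def unify_flows(records):
    """De-duplicate uni-directional flow reports and merge them into bi-directional flows."""
    # Step 1: de-duplicate. The same direction may be reported by several sensors
    # (for example a host sensor and a switch); keep the first report for each.
    directional = {}
    for rec in records:
        key = (rec["src"], rec["dst"], rec["proto"])
        directional.setdefault(key, rec["bytes"])

    # Step 2: unify. Order the two endpoints canonically so both directions share a key.
    bidirectional = {}
    for (src, dst, proto), nbytes in directional.items():
        a, b = sorted([src, dst])
        entry = bidirectional.setdefault((a, b, proto), {"a_to_b": 0, "b_to_a": 0})
        entry["a_to_b" if src == a else "b_to_a"] += nbytes
    return bidirectional


records = [
    {"sensor": "host-a", "src": ("10.0.0.1", 49152), "dst": ("10.0.0.2", 443), "proto": 6, "bytes": 1200},
    {"sensor": "leaf-1", "src": ("10.0.0.1", 49152), "dst": ("10.0.0.2", 443), "proto": 6, "bytes": 1200},
    {"sensor": "host-b", "src": ("10.0.0.2", 443), "dst": ("10.0.0.1", 49152), "proto": 6, "bytes": 5400},
]
flows = unify_flows(records)
print(flows)
```

Three reports collapse into one bi-directional flow between the two endpoints, with the byte counts kept per direction.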
Analyzing the Data
• Endpoints are iteratively compared with each other to find which “profiles” are most similar
• Sensor data: ports provided and consumed, addresses sent to and received from, properties of network flows, running processes, process originating the flow, hostname
• External context: load balancers / DNS / route tags
• Human-approved clusters from current or other workspaces and base cluster definition
• This is an example of where we use machine learning
Machine Learning
Cognitive Computing - Finding and remembering all the relationships between data, querying the matrix of relationships (Watson)
Machine Learning - Remember what has happened before, then look at new data in that context to try to find patterns, build up a body of knowledge, and use that knowledge to make a decision based on the new data. Can machines remember and apply what they remember to new data?
Deep Learning - Not trying to maintain data and relationships over time, but analyzing the data through better representations and creating models to learn these representations from large-scale unlabeled data. Succession analysis
Machine Learning
A "field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
The programmer’s construction of algorithms that can learn from and make predictions on data (as opposed to static programming instructions).
7:00 am = 65 degrees
8:00 am = 75 degrees
9:00 am = 85 degrees
How warm will it be at 8:30 am tomorrow?
80 degrees (a straight line through the three readings rises 10 degrees per hour)
Supervised learning: linear regression, logistic regression, SVMs
Unsupervised learning: K-means, PCA, anomaly detection
ADM Clustering: Machine Learning Example
K-means Algorithm: Finding the Clusters
Randomly initialize $K$ cluster centroids $\mu_1, \dots, \mu_K$
Repeat {
    for $i = 1$ to $m$:
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k = 1$ to $K$:
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
BRKDCN-2040 88
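The K-means loop above can be written out in a few lines of plain Python. This is a teaching sketch, not the clustering code Tetration runs: the sample points are invented, and the optional `initial` argument is added here only so the example is reproducible instead of depending on the random initialization.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0, initial=None):
    """Return (assignment, centroids) after running the assign/update loop."""
    rng = random.Random(seed)
    centroids = list(initial) if initial is not None else rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:   # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Two hypothetical groups of endpoints described by two numeric features.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignment, centroids = kmeans(points, k=2, initial=[(1.0, 1.0), (8.0, 8.0)])
print(assignment)   # [0, 0, 0, 1, 1, 1]
```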
Silhouetting: Validation of the Cluster
https://en.wikipedia.org/wiki/Silhouette_(clustering)
• The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)
• A high silhouette value gives more confidence that the clustering is representative
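For one point, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster (cohesion) and b is the lowest mean distance to the points of any other cluster (separation). The sketch below averages that value over all points; the sample points and labels are invented, and it assumes at least two clusters.

```python
import math


def mean(values):
    return sum(values) / len(values)


def silhouette_score(points, labels):
    """Mean silhouette value over all points (requires at least two clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:               # a single-point cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(
            mean([math.dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in clusters if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(good > 0.9, good > bad)   # True True
```

Values near 1 indicate tight, well-separated clusters; values near 0 or below suggest points sitting between clusters or assigned to the wrong one.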
Results of the Clustering: Machine Learning
Tuning Cluster Granularity: Tuning the Algorithms
Analyzing the Data: Fitting the Curve
• Every data set (e.g. flow) is examined to find the best function that describes its behaviour
• Comparison within and between flows can be used to find outlier or anomaly conditions
Visual Query with Flow Exploration
• Replay flow details like a DVR
• Information mapped across 25 different dimensions
• Thick lines indicate common flows
• Faint lines indicate uncommon flows
Outliers: What Does Not Look Like It ‘Fits’
• Switch on Outlier view to highlight uncommon flows
• Outlier dimension is highlighted with a purple circle
Tetration Application Insight
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Why This Approach Is Different
App Insight derived based on actual communication
Automated grouping of similar endpoints in a cluster
Flexibility of using hardware or software sensors
Keep your App Insight up-to-date based on application evolution
BRKDCN-2040 105
Dependencies
Why should I understand them?
What can I do with this information?
Why should I understand dependencies?
Identify a single point of failure that should be replicated
Find all the parts of a service that should be migrated together to the cloud
Replace infrastructure components of an undocumented application
ACI application profiles, end point groups, and contracts based on applications
BRKDCN-2040 108
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Load Balancer Database
App
Application Dependency Mapping
BRKDCN-2040 109
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Understand the communication
Load Balancer Database
App
BRKDCN-2040 110
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public
Initial recommendations
Load Balancer
App
Database
Cache
BRKDCN-2040 111
Optional and minimal human supervision
Load Balancer
App
Database
Cache
BRKDCN-2040 112
Approve the clustering
Load Balancer
App
Database
BRKDCN-2040 113
BRKDCN-2040 114
Enforcement Anywhere
Cisco Tetration Analytics™
Cisco ACI™ and Cisco Nexus® 9000 Series
Standalone
Linux and Microsoft Windows Servers and VM
Public Cloud
Data
Whitelist policy
{
  "src_name": "App",
  "dst_name": "Web",
  "whitelist": [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"}
  ]
}
• Cisco ACI EPG/Contract Integration via Cisco ACI Toolkit
• Traditional Network ACL
• Firewall Rules
• Host Firewall Rules
Amazon Web Services
Microsoft Azure
Cloud
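To show how one whitelist policy could feed several enforcement points, the following sketch turns the JSON from this slide into generic, human-readable rules. The output format is invented for illustration and is not the syntax of any particular ACL or firewall; the protocol numbers 1, 6 and 17 are the standard IANA values for ICMP, TCP and UDP.

```python
import json

# Standard IANA protocol numbers.
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}

POLICY_JSON = """
{
  "src_name": "App",
  "dst_name": "Web",
  "whitelist": [
    {"port": [0, 0], "proto": 1, "action": "ALLOW"},
    {"port": [80, 80], "proto": 6, "action": "ALLOW"},
    {"port": [443, 443], "proto": 6, "action": "ALLOW"}
  ]
}
"""


def render_rules(policy):
    """Render each whitelist entry as a generic rule string."""
    rules = []
    for entry in policy["whitelist"]:
        proto = PROTOCOL_NAMES.get(entry["proto"], str(entry["proto"]))
        low, high = entry["port"]
        if entry["proto"] == 1:
            ports = ""  # ICMP has no port numbers
        elif low == high:
            ports = f" port {low}"
        else:
            ports = f" ports {low}-{high}"
        action = entry["action"].lower()
        rules.append(
            f"{action} {proto} {policy['src_name']} -> {policy['dst_name']}{ports}"
        )
    return rules


rules = render_rules(json.loads(POLICY_JSON))
for rule in rules:
    print(rule)
```

A real translation would also need to resolve the names `App` and `Web` to the addresses of the cluster members before any device could enforce the rules.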
BRKDCN-2040 115
Application Centric, Okay but how do I get there?
Policy Creation Flow
Export Clusters and Policies in JSON/XML format
Import Policy using ACI Toolkit
Automatic creation of EPGs and Contracts
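The step from exported policies to EPGs and contracts can be pictured as a small transformation. The sketch below is not the ACI Toolkit's import logic; it assumes the policy shape shown on the Enforcement Anywhere slide, treats the destination as the contract provider and the source as the consumer, and uses a made-up naming scheme for contracts.

```python
def plan_epgs_and_contracts(policies):
    """Derive EPG names and contract descriptions from whitelist policies.

    Assumption: the destination of a policy provides the contract and
    the source consumes it. Contract names are illustrative only.
    """
    epgs = set()
    contracts = []
    for policy in policies:
        src, dst = policy["src_name"], policy["dst_name"]
        epgs.update([src, dst])
        contracts.append({
            "name": f"{src}-to-{dst}",
            "consumer": src,
            "provider": dst,
            "entries": [
                (entry["proto"], tuple(entry["port"]))
                for entry in policy["whitelist"]
                if entry["action"] == "ALLOW"
            ],
        })
    return sorted(epgs), contracts


exported = [{
    "src_name": "App",
    "dst_name": "Web",
    "whitelist": [
        {"port": [0, 0], "proto": 1, "action": "ALLOW"},
        {"port": [80, 80], "proto": 6, "action": "ALLOW"},
        {"port": [443, 443], "proto": 6, "action": "ALLOW"},
    ],
}]

epg_names, contract_plan = plan_epgs_and_contracts(exported)
print(epg_names)
print(contract_plan[0]["name"], contract_plan[0]["entries"])
```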
ACI Toolkit
Data
Network Policy
Application Policy
Nexus 9K
APIC
Cisco Tetration Analytics™
BRKDCN-2040 117
ACI Toolkit
• Simple toolkit built on top of APIC API
• Set of simple Python classes
• Python Library
• Used to generate REST API calls
• Runs locally
• Small number of classes
• ~30 currently
• “Intuitive” names
• Not full functionality; covers the most common tasks
• Focused primarily on configuration
• Preserves ACI basic concepts
• Tenants, EPGs, Contracts, etc.
APIC
ACI Toolkit
Linux Commands
NX-OS like CLI
Custom Python Scripts
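For a sense of what "generate REST API calls" means, the sketch below builds by hand the kind of JSON body the APIC REST API accepts for a tenant, an application profile and its EPGs. It uses the APIC object class names `fvTenant`, `fvAp` and `fvAEPg` directly rather than the toolkit's Python classes, and the tenant, profile and EPG names are hypothetical. It only constructs the payload and sends nothing.

```python
import json


def build_tenant_payload(tenant, app_profile, epg_names):
    """Build an APIC-style JSON body for a tenant, app profile and EPGs.

    Simplified: a usable EPG also needs a bridge domain relation and
    contracts, which are omitted here.
    """
    epgs = [{"fvAEPg": {"attributes": {"name": name}}} for name in epg_names]
    return {
        "fvTenant": {
            "attributes": {"name": tenant},
            "children": [
                {"fvAp": {"attributes": {"name": app_profile}, "children": epgs}}
            ],
        }
    }


# Hypothetical names; on a real APIC this body would be POSTed to
# /api/mo/uni.json after authenticating.
payload = build_tenant_payload(
    "demo-tenant", "demo-app", ["LoadBalancer", "App", "Database"]
)
print(json.dumps(payload, indent=2))
```

The toolkit hides this nesting behind classes named after the ACI concepts (tenants, EPGs, contracts), which is the convenience the slide describes.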
BRKDCN-2040 118
Configpush Application
• Runs as a command line tool or a REST service. The initial expected usage is as a command line tool.
• Application: https://github.com/datacenter/acitoolkit/tree/master/applications/configpush
• Command line tool: https://github.com/datacenter/acitoolkit/blob/master/applications/configpush/apic_tool.py
• Takes the JSON provided by Tetration and pushes it to the APIC. It requires the APIC credentials and the tenant/app profile in which to place the EPGs.
• Documentation: https://acitoolkit.readthedocs.io/en/latest/
BRKDCN-2040 119
Configpush Application Syntax
• python apic_tool.py -h
usage: apic_tool.py [-h] [--maxlogfiles MAXLOGFILES]
                    [--debug [{verbose,warnings,critical}]] [--config CONFIG]
                    [-u URL] [-l LOGIN] [-p PASSWORD] [--displayonly]
                    [--tenant TENANT] [--app APP]

optional arguments:
  -h, --help            show this help message and exit
  --maxlogfiles MAXLOGFILES
                        Maximum number of log files (default is 10)
  --debug [{verbose,warnings,critical}]
                        Enable debug messages.
  --config CONFIG       Configuration file
  -u URL, --url URL     APIC IP address
  -l LOGIN, --login LOGIN
                        APIC login ID.
  -p PASSWORD, --password PASSWORD
                        APIC login password.
  --displayonly         Only display the JSON configuration. Do not
                        actually push to the APIC.
  --tenant TENANT       Tenant name for the configuration
  --app APP             Application profile name for the configuration
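The options in the help output can be assembled from a script. The sketch below only builds the argument list and prints it; it does not run the tool. All values are placeholders, and it assumes that `--config` is where the JSON exported from Tetration is supplied, which the help text suggests but does not state. It defaults to `--displayonly` so that a first run would show the configuration without pushing it.

```python
import shlex


def build_apic_tool_command(url, login, password, tenant, app, config,
                            display_only=True):
    """Assemble an apic_tool.py invocation from the documented flags."""
    command = [
        "python", "apic_tool.py",
        "--config", config,
        "-u", url,
        "-l", login,
        "-p", password,
        "--tenant", tenant,
        "--app", app,
    ]
    if display_only:
        # Dry run: display the JSON configuration without pushing it.
        command.append("--displayonly")
    return command


# Placeholder values for illustration only.
command = build_apic_tool_command(
    url="https://apic.example.com",
    login="admin",
    password="example-password",
    tenant="demo-tenant",
    app="demo-app",
    config="tetration_export.json",
)
print(shlex.join(command))
```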
BRKDCN-2040 120
Policy Simulation and Compliance
We know the expected communication
Load Balancer Database
App
BRKDCN-2040 122
Publish, export, and enforce Policy
Load Balancer
App
Database
172.31.185.158
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
BRKDCN-2040 123
Publish, export, and enforce Policy
Load Balancer
App
Database
172.31.185.158
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
Load Balancer
Provides Port 3306
BRKDCN-2040 124
Publish, export, and enforce Policy
Load Balancer
App
Database
172.31.185.158
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
Database
Provides Port 3306
Load Balancer
Provides Port 3306
BRKDCN-2040 125
But how do we map this to real life?
Load Balancer
App
Database
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
BRKDCN-2040 126
But how do we map this to real life?
Load Balancer
App
Database
172.31.185.158
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
BRKDCN-2040 127
But how do we map this to real life?
Load Balancer
App
Database
172.31.185.158
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
Misdropped packets!
BRKDCN-2040 128
But how do we map this to real life?
Load Balancer
App
Database
172.31.185.158
172.31.185.152
172.31.185.154
172.31.185.156
172.31.185.149
172.31.185.150
172.31.185.151
Escaped out-of-policy flow!
Misdropped packets!
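Comparing the policy with what actually happened on the network can be sketched as a small classifier. The slides do not define the two labels precisely, so this example assumes that "misdropped" means traffic the policy allows but the network dropped, and "escaped" means traffic the policy does not allow but the network forwarded. The flow record fields, including `forwarded`, are hypothetical.

```python
def is_allowed(flow, policy):
    """Check a flow against one whitelist policy (names, protocol, port)."""
    if (flow["src"], flow["dst"]) != (policy["src_name"], policy["dst_name"]):
        return False
    return any(
        entry["action"] == "ALLOW"
        and entry["proto"] == flow["proto"]
        and entry["port"][0] <= flow["port"] <= entry["port"][1]
        for entry in policy["whitelist"]
    )


def classify_flow(flow, policy):
    """Label a flow by comparing policy intent with observed behavior."""
    allowed = is_allowed(flow, policy)
    if allowed and flow["forwarded"]:
        return "permitted"
    if allowed:
        return "misdropped"  # assumed meaning: allowed, yet dropped
    if flow["forwarded"]:
        return "escaped"     # assumed meaning: not allowed, yet forwarded
    return "rejected"


policy = {
    "src_name": "App",
    "dst_name": "Web",
    "whitelist": [
        {"port": [80, 80], "proto": 6, "action": "ALLOW"},
        {"port": [443, 443], "proto": 6, "action": "ALLOW"},
    ],
}

# Hypothetical observations.
observed_flows = [
    {"src": "App", "dst": "Web", "proto": 6, "port": 443, "forwarded": True},
    {"src": "App", "dst": "Web", "proto": 6, "port": 80, "forwarded": False},
    {"src": "App", "dst": "Web", "proto": 6, "port": 22, "forwarded": True},
    {"src": "App", "dst": "Web", "proto": 6, "port": 22, "forwarded": False},
]
labels = [classify_flow(flow, policy) for flow in observed_flows]
print(labels)
```

Running the same classification against a proposed policy instead of the enforced one is, in essence, the simulation step: it shows which observed flows would have been affected before anything is enforced.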
BRKDCN-2040 129
BRKDCN-2040 130
BRKDCN-2040 131
BRKDCN-2040 132
Policy Compliance Verification & Simulation
What was seen on the network that was out of policy
Permitted traffic seen on the network
Summary
ACI Architecture
Intent (May)
Assurance (Can)
Analytics (Did)
Configuration Analysis
“Very Large State-Space”
Traffic Analysis
“Lots of Data”
Guarantees
Compliance
Consistency
ACI
ADM
Security
Forensics
BRKDCN-2040 134
Summary
Pervasive flow telemetry that supports infrastructure for multiple data centers at scale
Ready-to-use solution to address critical data center operational use cases
Self-monitoring, eliminating the need for in-house big data expertise
Open platform and northbound APIs enable transparent integration
Accelerated adoption and comprehensive solution support with Services
BRKDCN-2040 135
Q & A
Complete Your Online Session Evaluation
Give us your feedback and receive a Cisco 2016 T-Shirt by completing the Overall Event Survey and 5 Session Evaluations.
– Directly from your mobile device on the Cisco Live Mobile App
– By visiting the Cisco Live Mobile Site http://showcase.genie-connect.com/ciscolivemelbourne2016/
– Visit any Cisco Live Internet Station located throughout the venue
T-Shirts can be collected Friday 11 March at Registration
Learn online with Cisco Live!
Visit us online after the conference for full access to session videos and presentations.
www.CiscoLiveAPAC.com
BRKDCN-2040 137
Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Lunch & Learn
• Meet the Engineer 1:1 meetings
• Related sessions
BRKDCN-2040 138
Thank you