When DevOps and Networking Intersect by Brent Salisbury of socketplane.io
when network and devops intersect
Brent Salisbury socketplane.io
socketplane.io - docker networking
John Willis Co-Founder & VP Business Development Formerly: CTO, Stateless Networks
Madhu Venugopal Co-Founder & President Formerly: Principal Engineer Office of the CTO, Red Hat
Brent Salisbury Co-Founder & VP Engineering Formerly: Senior Engineer Office of the CTO, Red Hat
Dave Tucker Co-Founder, VP Product Formerly: Senior Engineer Office of the CTO, Red Hat
lessons_learned struct
1. the evolving network
2. lessons learned from controller development
3. netops from an operational+dev view
4. looking ahead
the problem
[Figure: Cost vs. Number Widgets - Economies of Scale; curves for Network and for Compute - Storage; labels Vertical Integration and Horizontal Scale]
[Figure: Over Provisioned; Network Capacity Needs vs. Network Usage Growth over Time]
[Figure: Under Provisioned; Network Capacity Needs vs. Network Usage Growth over Time]
Efficient Provisioning
Where we were
• CLI for everything
• vendor management tools did everything and nothing.
• used to be Perl, TCL and later Python
• zero ip management
• turned into a contest of who can make the best obscure magic
Where we are
• CLI for everything
• vendor management tools did everything and nothing.
• used to be Perl, TCL and later Python
• zero ip management
• turned into a contest of who can make the best obscure magic
where we are(ish)
• exponential growth with flat operating budgets
• incessant pressure for uptime + capex/opex cost reduction
• the majority of networks still maintain proprietary hw, sw and api
• datapaths are still barely programmable
• netops manages very little beyond the ToR.
quick review of node distribution
• distributed
• centralized
• de-centralized
Centralized
[Figure: a Controller above a Forwarding Population; Match + Action]
the sdn approach
Decentralized
[Figure: Topology; Forwarding Population + Clustered Controller; Orchestration; Match + Action]
the sdn approach
similarly both hard problems
[Figure: on one side a Routing Engine with Line Card 1, Line Card 2, Line Card ... joined by a Bus; on the other a Controller with OVS, an OF Switch and a Random Agent joined by an Ethernet Fabric. Each element has ports P1, P2, P... and a Data Plane flow table with the match fields Ingress Port, MAC Source Address, MAC Destination, IP Source Address, IP Destination, Protocol, Source Port, Destination Port (all wildcarded, *), a Priority, and Instructions GOTO/Drop/Controller/Normal]
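The match fields, priority and instructions listed above boil down to one lookup: find the highest-priority entry whose non-wildcard fields all equal the packet's fields. A minimal Python sketch of that idea, assuming hypothetical names (`FlowEntry`, `lookup`) and plain dicts for packets; it is not the API of any real OpenFlow library:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowEntry:
    priority: int
    instruction: str  # e.g. "GOTO", "DROP", "CONTROLLER", "NORMAL"
    # Only the fields to match on are listed; a missing field is a wildcard (*).
    match: dict = field(default_factory=dict)

    def matches(self, packet: dict) -> bool:
        return all(packet.get(k) == v for k, v in self.match.items())


def lookup(table: list, packet: dict) -> Optional[FlowEntry]:
    """Return the highest-priority matching entry, or None on a table miss."""
    candidates = [entry for entry in table if entry.matches(packet)]
    return max(candidates, key=lambda entry: entry.priority, default=None)


# Hypothetical table: addresses and priorities are made up for illustration.
table = [
    FlowEntry(priority=0, instruction="CONTROLLER"),  # all wildcards
    FlowEntry(priority=100, instruction="DROP", match={"ip_src": "10.0.0.66"}),
    FlowEntry(priority=50, instruction="NORMAL", match={"protocol": "arp"}),
]

arp = {"in_port": 1, "protocol": "arp", "ip_src": "10.0.0.5"}
bad = {"in_port": 2, "protocol": "tcp", "ip_src": "10.0.0.66"}
other = {"in_port": 3, "protocol": "udp", "ip_src": "10.0.0.7"}
```

The all-wildcard entry at priority 0 plays the role of the `* * * * * *` row on the slide: anything not matched by a more specific rule falls through to it.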
Distributed
the internet scales
L2 Flooding and Learning
[Figure: Host 1 and Host 2 on VLAN x; both data planes flooding]
• Live workload migration cripples network ops
• subnets for policy groupings are the only reason to think in those terms anymore
the barrier to scale
shit that doesn't scale
• the next few slides are things i thought were possible at some point around the problem of L2
• lesson learned: prototype and fail faster
• ask your team why they really need L2
Proactive L2 Flooding and Learning with Legacy VLANs
[Figure: Host 1 and Host 2 on VLAN x, both data planes flooding, under an OpenFlow Controller]
• Maintaining Legacy Broadcast Domains: Controller Never Punts ARP
• Proactive Rule - Match: ARP Action: Normal
• Can Also Serve as a Fallback Failure Mode or Hybrid Migration Strategy
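The proactive rule can be pictured as something the controller pushes once, when the switch connects, so ARP never reaches the controller afterwards. A small Python simulation under that assumption; `Switch`, `install` and `on_switch_connect` are hypothetical names, and rules match on protocol only to keep the sketch short:

```python
class Switch:
    """Toy switch: an ordered rule list plus a counter of punted packets."""

    def __init__(self):
        self.rules = []   # list of (protocol, action), checked in order
        self.punted = 0   # packets handed to the controller

    def install(self, protocol, action):
        self.rules.append((protocol, action))

    def handle(self, packet):
        for protocol, action in self.rules:
            if packet["protocol"] == protocol:
                return action
        # Table miss: the packet is punted to the controller.
        self.punted += 1
        return "CONTROLLER"


def on_switch_connect(switch):
    # Proactive: installed before any traffic arrives.
    # Match: ARP, Action: Normal (legacy flooding and learning).
    switch.install("arp", "NORMAL")


sw = Switch()
on_switch_connect(sw)
```

With the rule in place, `sw.handle({"protocol": "arp"})` returns `"NORMAL"` and `sw.punted` stays at zero, which is also why the same rule works as a fallback if the controller goes away.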
Reactive OpenFlow Flow Policy
[Figure: an OpenFlow Switch (Data Plane ports P1, P2, P3; servers Svr 1, Svr 2, Svr 3) connected to an OpenFlow Controller; flow table with match fields, Priority and Instructions GOTO/Drop/Controller/Normal]
• 1st Packet in Flow: Packet-In
• A Flowmod Installs a Flow Rule for Subsequent Matching Packets
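The reactive sequence (first packet punted as a Packet-In, Flowmod installed, later packets handled in the data plane) can be simulated in a few lines of Python. This is a sketch, not a real controller: the class and method names are hypothetical, and learning output ports by source MAC is an assumption borrowed from the classic learning-switch example, not something the slide specifies.

```python
class Controller:
    def __init__(self):
        self.mac_to_port = {}  # learned MAC -> switch port
        self.packet_ins = 0    # how many packets were punted to us

    def packet_in(self, switch, in_port, src, dst):
        self.packet_ins += 1
        self.mac_to_port[src] = in_port
        out_port = self.mac_to_port.get(dst)
        if out_port is None:
            return "FLOOD"  # destination unknown, no rule installed yet
        # Flowmod: subsequent packets to dst are matched in the switch itself.
        switch.flow_mod(dst, out_port)
        return out_port


class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flows = {}  # dst MAC -> output port

    def flow_mod(self, dst, out_port):
        self.flows[dst] = out_port

    def receive(self, in_port, src, dst):
        if dst in self.flows:
            return self.flows[dst]  # fast path, controller not involved
        return self.controller.packet_in(self, in_port, src, dst)


ctl = Controller()
sw = Switch(ctl)
```

The cost of the reactive model is visible in `ctl.packet_ins`: every new destination costs at least one round trip to the controller before the data plane can forward on its own.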
Controller Intercepting ARP and Proxying the Reply
[Figure: Host 1 - Switch 1 - Switch 2 - Host 2, both switches under an OpenFlow Controller; each switch carries Match: ARP Action: Controller; ARP Request and Reply pass through the controller]
• VLAN ID Constraints Become Irrelevant: Tenancy Maintained in the Controller
• Controllers can Answer and/or Send ARP (proxy)
• Host (Key) ==> Location (Value): Host 2 IP, MAC, Tenant ==> Tunnel 200 Tep IP
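The host-to-location mapping is what lets the controller answer ARP itself instead of flooding. A minimal Python sketch, assuming a table keyed by (tenant, IP); the `Location` and `ArpProxy` names, the addresses and the tenant labels are hypothetical, and only the tunnel ID 200 comes from the slide:

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    mac: str
    tenant: str
    tunnel_id: int
    tep_ip: str  # tunnel endpoint IP of the switch the host sits behind


class ArpProxy:
    def __init__(self):
        self.hosts = {}  # (tenant, host IP) -> Location

    def register(self, ip: str, location: Location) -> None:
        self.hosts[(location.tenant, ip)] = location

    def handle_arp_request(self, tenant: str, target_ip: str) -> Optional[dict]:
        """Build an ARP reply from controller state; None if the host is unknown."""
        location = self.hosts.get((tenant, target_ip))
        if location is None:
            return None
        return {"op": "reply", "ip": target_ip, "mac": location.mac}


proxy = ArpProxy()
proxy.register("10.0.0.2", Location("00:00:00:00:00:02", "tenant-a", 200, "192.0.2.20"))
reply = proxy.handle_arp_request("tenant-a", "10.0.0.2")
```

Because the tenant is part of the key, two tenants can reuse the same IP without sharing a VLAN, which is the sense in which tenancy lives in the controller.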
Controller Connects Source and Destination Hosts via Packet-In and Flowmods
[Figure: Host 1 - Switch 1 - Switch 2 - Host 2, both switches under an OpenFlow Controller; each switch carries Match: ARP Action: Controller; an ARP Request goes up to the controller, and Flowmods to both switches build the Data Path (Tunnel or Flow Path)]
• Host (Key) ==> Location (Value): Host 2 IP, MAC, Tenant ==> Tunnel 200 Tep IP
• VLAN ID Constraints Become Irrelevant: Tenancy Maintained in the Controller
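Building the data path then amounts to turning one table lookup into a Flowmod per switch. A rough Python sketch under stated assumptions: Flowmods are plain dicts, the action names (`set_tunnel`, `tunnel_to`, `output`) are invented for illustration, and the host table layout is hypothetical apart from the tunnel ID 200 shown on the slide.

```python
def build_path(hosts, src_switch, dst_ip):
    """Return the flowmods needed to reach dst_ip from src_switch."""
    loc = hosts[dst_ip]
    if loc["switch"] == src_switch:
        # Same switch: a single local output rule is enough.
        return [{"switch": src_switch,
                 "match": {"ip_dst": dst_ip},
                 "actions": [("output", loc["port"])]}]
    return [
        # Ingress switch: encapsulate and send toward the remote tunnel endpoint.
        {"switch": src_switch,
         "match": {"ip_dst": dst_ip},
         "actions": [("set_tunnel", loc["tunnel_id"]), ("tunnel_to", loc["tep_ip"])]},
        # Egress switch: match the tunnel ID and deliver to the host port.
        {"switch": loc["switch"],
         "match": {"tunnel_id": loc["tunnel_id"], "ip_dst": dst_ip},
         "actions": [("output", loc["port"])]},
    ]


hosts = {
    "10.0.0.2": {"switch": "switch2", "port": 1, "tunnel_id": 200, "tep_ip": "192.0.2.20"},
}
flowmods = build_path(hosts, "switch1", "10.0.0.2")
```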
not if but when
• build infrastructure for the worst case scenario, because it will be worse.
• cascading failures suck
• focus on solving the problem not the implementation
• intelligence in the datapath HW is a good thing as long as it is open and programmatically manageable
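Control and Data Plane Split Brain
[Figure: a Control Plane and a Data Plane - DPID ::00:01 with ports P1, P2, P3; the connection between them is marked X and the other DPIDs are marked with question marks]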
Linux Bridging
[Figure: Frame In, Bridge, IPTables, HAProxy, Functions X, Y, Z, Frame Egress]
this movie has a shitty ending
What Works: Performance and Reliability First
OVS/DPDK Packet Forwarding Pipeline
[Figure: pipeline stages: Frame In, Classifier, Table 0, Function Foo, Table 2, Function Bar, ... Table n, Frame Out]
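The staged pipeline can be thought of as a chain of tables where each stage either finishes processing or names the next table. A minimal Python sketch of that control flow; it models neither OVS nor DPDK, the stage functions are hypothetical stand-ins for "Function Foo" and "Function Bar", and the classifier is reduced to always starting at table 0:

```python
def run_pipeline(tables, frame, start=0, max_stages=16):
    """Walk the frame through the tables until a stage returns None or drops it."""
    ctx = {"visited": [], "drop": False}
    table_id = start
    while table_id is not None and not ctx["drop"]:
        if len(ctx["visited"]) >= max_stages:
            raise RuntimeError("pipeline loop detected")
        ctx["visited"].append(table_id)
        table_id = tables[table_id](frame, ctx)
    return ctx


def table_0(frame, ctx):
    # Hypothetical "function foo": drop frames without a source MAC.
    if not frame.get("eth_src"):
        ctx["drop"] = True
        return None
    return 2  # goto table 2


def table_2(frame, ctx):
    # Hypothetical "function bar": tag the frame, then frame out.
    ctx["tag"] = "tenant-a"
    return None


tables = {0: table_0, 2: table_2}
result = run_pipeline(tables, {"eth_src": "00:00:00:00:00:01"})
```

Keeping each function in its own table is what makes the pipeline predictable: a stage can be read, tested and replaced without touching the others.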
Data Center Today
[Figure: two Data Center L3 Cores above three Physical Switch / vSwitch pairs; a Firewall applies the North/South Security Policy]
traffic alignment from the 90’s
Distributed Policy Application For Data Center
[Figure: two Data Center L3 Cores above three Physical Switch / vSwitch pairs; East West Security Policy]
new architectures for new workloads
trust what you know
• rely on your own operational experiences, if you don't have any go get some even if it's stalking customers
• don't fall in love with implementations, they are probably wrong
• ask questions but be open minded
• avoid slide jockeys
• avoid the vendor wars
• avoid cults
• complexity w/o abstraction fails
• almost all abstractions fail
serenity now, insanity later
• make time for research and planning
• whether it is a big infra project or a dev sprint, don't let the oppressive demand of execution compromise a practical design
• that said, if the plan sucks, change it.
nothing is easy, don't make it harder
• prototyping and early feedback should be your compass
• when users say, this seems a little too complex, LISTEN!
• odds are you aren't going to be able to get the right abstraction to hide your over-engineering
performance and reliability first
• network operators are measured in uptime first
• don't compromise reliability for cost savings without making it very clear to all leadership, not just the IT manager heroes.
• perform consistency checking
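One way to read "consistency checking" is a periodic diff between the state the controller intended and the state the switches actually report. A small Python sketch of that reading; the flow tuples and the `check_consistency` name are hypothetical, and how the actual state is collected from a device is left out:

```python
def check_consistency(intended, actual):
    """Compare intended flows with the flows a switch reports."""
    intended, actual = set(intended), set(actual)
    return {
        "missing": sorted(intended - actual),     # expected but not on the switch
        "unexpected": sorted(actual - intended),  # on the switch but never intended
    }


# Flows as (switch, match, action) tuples; values are made up for illustration.
intended = [("s1", "arp", "NORMAL"), ("s1", "ip_dst=10.0.0.2", "output:2")]
actual = [("s1", "arp", "NORMAL"), ("s1", "ip_dst=10.0.0.9", "output:3")]
report = check_consistency(intended, actual)
```

An empty report means the two views agree; anything else is a candidate for repair or an alert before it turns into an outage.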
/dev
• understand the problem first
• if you don't understand the problem stalk someone who does
• make readable code
• code for the worst case scenario
architecture
• if it isn't broke, don't break it
• architects need understandable components
• architects need predictable components
• predictive analysis is a big data problem
• predict problems with operational tools and data
• don't build a nuclear submarine when a bicycle will do
test and prototype
• verify before you hit enter
• automate all production changes
• set up rollback processes
• the result:
  • should be shorter change windows
  • faster rollbacks
  • better trained operators
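Verify, automate and roll back fit together as one pattern: snapshot the state, apply the change, verify it, and restore the snapshot if anything fails. A minimal Python sketch with the device configuration modeled as a dict; `apply_with_rollback` and the VLAN example are hypothetical, and a real device would need its own snapshot and restore mechanism:

```python
import copy


def apply_with_rollback(config, change, verify):
    """Apply change(config) in place; restore the snapshot if it raises or fails verify."""
    snapshot = copy.deepcopy(config)
    try:
        change(config)
        if not verify(config):
            raise ValueError("post-change verification failed")
        return True
    except Exception:
        # Roll back on any failure, including errors inside the change itself.
        config.clear()
        config.update(snapshot)
        return False


def add_vlan_20(cfg):
    cfg["vlans"].append(20)


def wipe_vlans(cfg):
    cfg["vlans"] = []


device = {"vlans": [10]}
added = apply_with_rollback(device, add_vlan_20, lambda cfg: 20 in cfg["vlans"])
wiped = apply_with_rollback(device, wipe_vlans, lambda cfg: 10 in cfg["vlans"])
```

Here the first change passes verification and sticks, while the second fails its check and the device returns to its previous state without an operator typing anything.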
everybody is smart
• "A great team doesn’t mean that they had the smartest people. What made those teams great is that everyone trusted one another. It can be a powerful thing when that magic dynamic exists." -Gene Kim
team culture
• not proving how much smarter you are than your co-workers.
• give credit to the team first, it's just weird otherwise
• don't hoard contacts
• find people's passion and maximize it
• protect your culture's morale like it is your bank account
where to start?
• starting out
  • no one can learn for you, find your passion
  • learn linux
  • explore vswitches, I recommend http://openvswitch.org
  • connect with peers in the community and share experiences
  • explore compute (containers, hypervisors and everything else beyond the top of rack)
• further along
  • code, i recommend Golang atm fwiw
  • learn CI tools and sw dev processes
  • contribute to upstream open source
  • build something that solves others' problems and open source it