Control Update: Focus on PlanetLab Integration and Booting
Fred Kuhns
Applied Research Laboratory
Washington University in St. Louis
Fred Kuhns - 04/21/23
Documents
• Control documentation: http://www.arl.wustl.edu/projects/techX/ppt/
  – This presentation: http://www.arl.wustl.edu/projects/techX/ppt/ControlUpdate.ppt
  – SRM interface: http://www.arl.wustl.edu/projects/techX/ppt/srm.ppt
  – RMP interface: http://www.arl.wustl.edu/projects/techX/ppt/rmp.ppt
  – SCD interface (ingress, egress and NPE): http://www.arl.wustl.edu/projects/techX/ppt/scd.ppt
• Datapath documentation: http://www.arl.wustl.edu/projects/techX/design/SPP/
  – NAT overview (Interface??): http://www.arl.wustl.edu/projects/techX/design/SPP/SPP_V1_NAT_design.ppt
  – FlowStats (Interface??): http://www.arl.wustl.edu/projects/techX/design/SPP/FlowStats_Control.ppt
Traditional View of a PlanetLab Node
[Figure: a general-purpose PC (CPU, NIC, DRAM, disk) running a Virtual Machine Monitor (VMM) that hosts the NodeManager ("root" VM), the system-service VMs and the slice VMs VM1 ... VMN; the node is reached from the Internet as host.domain, A.B.C.D.]
• PlanetLab node record: site, owner, model, ssh_host_key, groups; Host = XXX, Domain = YYY, IPAddress = A.B.C.D
• Linux OS, vserver
• System services
  – pl_netflow
  – sirius: brokerage service
  – stork: environmental service
  – CoMon: monitoring and discovery
• Resource model
  – focused on PCs with single device instances (CPU, NIC)
  – standard Linux/UNIX tools to measure utilization
  – homogeneous environment with a single VMM to manage all VM instances on a platform
  – local node manager interface through the loopback interface
• User requests a slice on a set of distributed nodes
  – assigned a VM instance on each node
  – Fedora Linux environment
  – per-slice flowstats
An SPP Node
[Figure: an SPP node appears to PlanetLab as a single node, spp_host.domain / A.B.C.D on the Internet, with the node record site, owner, model, ssh_host_key, groups; Host = XXX, Domain = YYY, IPAddress = A.B.C.D. Inside, the CP runs *NodeManager and *SystemServices; GPE1 and GPE2 are general-purpose PCs (CPU, DRAM, disk, data and control NICs) whose VMMs host *NodeManager, *SystemServices and slice VMs (VM1 ... VMX-1 on GPE1, VMX ... VMN on GPE2); the NPEs host slice fast paths (vm1:fast path1, vm1:fast path2, vmX:fast path1, ...; vmX-1:fast path1, vmY:fast path2, vmN:fast path1, ...); the Line Card holds the external interface, the forwarding DB/filters datapath and NAT; the HUB joins all boards with 1GbE control (base) and 10GbE data (fabric) switches.]
Challenges
• Provide the standard PlanetLab slice environment
  – configure and boot individual GPEs with standard PlanetLab software, supporting the standard operational environment
• Support standard interfaces
  – boot manager
  – node manager's internal and external interfaces
  – resource monitoring
• Create an interface for allocating and managing fast paths
  – allocate/free NPE resources
  – manage meta-interface mappings to an externally visible IP address and UDP port
  – slice control of allocated fast-path resources
[Figure: SPP node software architecture. The CP hosts the xmlrpc PLCAPI proxy, the System Resource Manager (SRM) and node manager (GNM) with the slice DB and flow DB, the SLM, sshd*, ntpd, FlowStats, and PXE/dhcpd/tftp/httpd serving the boot files (dhcpd.conf, ethers; tftpboot: bootcd.img, overlay_gpeX.img, pxelinux.0, pxelinux.cfg/C0A82031, C0A82041; /var/www/), user info and home directories, and the node DB (Resource DB, Slice DB, nodeconf.xml). Each overlay.img carries plnode.txt, plc_config, ethers, spp_conf.txt, spp_netinit.py, server files and certs. Each GPE runs vnet, the NMP, the RMP, pl_netflow and user slivers. NPEs (NP U-A, NP U-B, SPI, TC, AM, two XScales, PCI) run the NPE SCD. The Line Card (the same NP components plus a 10x1G/1x10G RTM with external interfaces IP1 ... IPN) runs the ingress SCD and NATD on one XScale and the egress SCD on the other. Boards connect through the Hub's base Ethernet switch (1Gbps, control) and fabric Ethernet switch (10Gbps, data path); the shelf manager is reached over I2C (IPMI); ntp is shown on the CP and the boards.]
Software Components
• Control Processor (CP):
  – Boot and Configuration Control (BCC): node configuration, management and local state management (DB)
    • httpd, dhcpd, tftp and PXE server for GPE and NPE boards; maintains config files
    • Boot CD and distribution file management (overlay images, RPM and tar files) for the GPEs and the CP
    • PLCAPI proxy (plc_api) and system-level BootManager (part of gnm)
  – System Resource Manager (SRM): centralized resource management
    • responsible for all resource allocation decisions and for maintaining dynamic system state
    • delegates local operations to individual board-level managers
  – System Node Manager (SNM, aka GNM): "top half" of the PlanetLab node manager
    – Slice Login Manager (SLM) and ssh forwarding (modified sshd) (Ritun)
    – Flow Statistics (FS): aggregates pl_netflow data and translates NAT records
    – Sets default (static) routes in the line card
    – What about dynamic route management (BGP/OSPF/RIP)? For now, assume a single next-hop router for all routes.
• General Purpose Processing Element (GPE)
  – Local Boot Manager (LBM): modified PlanetLab BootManager running on the GPEs
  – Resource Manager Proxy (RMP)
  – Node Manager Proxy (NMP): lower half of PlanetLab's node manager
• Network Processor Element (NPE)
  – Substrate Control Daemon (SCD): manages all NPE resources and provides mappings from slice to global name spaces
  – Kernel module to read/write memory locations (wumod)
  – Command interpreter for configuring NPU memory (wucmd)
• Line Card, ingress
  – Substrate Control Daemon (scd_ingress)
    • implements the interface to the SRM
    • manages TCAM access for ingress and egress
    • reads/writes scratch rings for NATD
  – Network Address Translation daemon (NATD), port only
• Line Card, egress
  – Substrate Control Daemon (scd_egress)
    • implements the interface to the SRM
    • reads/writes scratch rings and communicates with the FS and NATD
Boot and Configuration Control
• Read the node configuration DB (currently an XML file)
  – Allocate IP subnets and addresses for all boards
  – Assign external IP addresses to GPE fabric interfaces with the default VLAN ID
  – Create a per-GPE configuration DB (currently written to files)
• Create the dhcp configuration file and start dhcpd, httpd and the system sshd
  – assigns control IP subnets and addresses; assigns the internal substrate IP subnet on the fabric Ethernet
• Start the PLCAPI proxy (plc_api) server and the system node manager
  – read the node DB for initialization data: currently uses static configuration data and/or re-reads the XML file
  – Create GPE overlay images: currently done manually
  – Currently the SNM is split between the plc_api server and the SRM, because there is no DB yet and we did not want to implement a transaction-like interface for the SNM.
  – begin periodic slice updates and GPE assignments; maintain the DB
• Start the SRM and bring up boards as they "report in"
  – Initialize the Line Card to forward "default" traffic (i.e., ssh and ICMP) to the CP
  – Initialize the Hub (base and fabric switches); initialize any switches not within the chassis
• Start the SLM and the ssh daemon
  – Remove the SLM configuration file for slices, since it may contain old mappings
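The SPP1 and SPP3 example configurations suggest a simple rule for base (control) addresses: a board in slot s gets host number 16*s + 1 (hub in slot 1 → .17, GPEs in slots 3 and 4 → .49 and .65, NPE in slot 5 → .81, line card in slot 6 → .97), and a board's second base interface gets the next address. The sketch below applies that rule and formats an ISC dhcpd `host` declaration for each control interface. The slot rule is inferred from the example addresses rather than stated in this deck, and `control_addr` and `dhcpd_host` are hypothetical helpers, not part of the BCC. The diagrams label the control subnet /20 while spp_conf.txt uses mask 255.255.248.0 (/21); the sketch follows spp_conf.txt.

```python
import ipaddress


def control_addr(subnet: str, slot: int, iface_index: int = 0) -> ipaddress.IPv4Address:
    """Base-network address for a board, assuming host number = 16*slot + 1 + iface_index."""
    net = ipaddress.ip_network(subnet)
    addr = net.network_address + 16 * slot + 1 + iface_index
    if addr not in net or addr == net.broadcast_address:
        raise ValueError(f"slot {slot} does not fit in {subnet}")
    return addr


def dhcpd_host(name: str, mac: str, addr: ipaddress.IPv4Address) -> str:
    """ISC dhcpd host declaration pinning a control MAC address to a fixed IP."""
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {addr};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    base = "192.168.32.0/21"  # netmask 255.255.248.0, as in spp_conf.txt
    print(dhcpd_host("gpe1", "00:0e:0c:85:e4:3e", control_addr(base, slot=4)))
    print(dhcpd_host("gpe2", "00:0E:0C:85:E6:06", control_addr(base, slot=3)))
```

With the SPP1 values this gives fixed-address 192.168.32.65 for gpe1 and 192.168.32.49 for gpe2, matching the example configuration.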
Booting SPP1: Example Configuration
[Figure: the boards of SPP1 with their interfaces and addresses, and how the node attaches to the ARL network. Recoverable labels:]
• CP: cp_ctrl = 192.168.32.1/20 (base b1); cp_data = 172.16.1.1/26 (fabric f1/0); eth0.2 on VLAN 2 for drn05.arl.wustl.edu (noarp); runs srm, dhcpd, httpd, plc_api, gnm* and fs
  – /etc/dhcpd.conf, /etc/ethers
  – /tftpboot/: ramdisk.gz, zImage.ppm10, bootcd.img, overlay_gpe1.img, overlay_gpe2.img, pxelinux.0, pxelinux.cfg/ (C0A82031, C0A82041)
  – /var/www/html/boot/: index.html, bootmanager.sh, bootstrapfs-planetlab-i386.tar.bz2
• GPE1 (Slot 4): gpe1_ctrl = 192.168.32.65/20 (base b1); gpe1_data = 172.16.1.3/26 (fabric f1/0, eth0) plus eth0.2 on VLAN 2 for drn05.arl.wustl.edu (noarp); gpe1_int = 172.16.1.65/26 (fabric f1/1); runs rmp and nm
• GPE2 (Slot 3): gpe2_ctrl = 192.168.32.49/20 (base b1); gpe2_data = 172.16.1.4/26 (fabric f1/0, eth0) plus eth0.2 on VLAN 2 for drn05.arl.wustl.edu (noarp); gpe2_int = 172.16.1.66/26 (fabric f1/1); runs rmp and nm
• NPE (Slot 5): base b1a = 192.168.32.81/20 (XScale A) and b1b = 192.168.32.82/20 (XScale B), each running an scd; fabric f1/0 = 172.16.1.5/26
• Line Card (Slot 6): lc_b1a = 192.168.32.97/20 (ingress XScale) and lc_b1b = 192.168.32.98/20 (egress XScale), each running an scd; lc1_data = 172.16.1.6/26 (fabric f1/0); natd
• Hub: 192.168.32.17; GPE ports b2, f2/0 and f2/1 are also shown, with no addresses assigned
• Ebony: 128.252.153.31 on the ARL network (eth0); eth0:0 = 192.168.32.2; eth2.2 = 128.252.153.78 on VLAN 2; IP routing and proxy ARP for drn05
• drn05.arl.wustl.edu = 128.252.153.209; myPLC runs on drn06.arl.wustl.edu
Example Configuration, SPP3
[Figure: the same board layout as SPP1, with the control network on 192.168.0.0. Recoverable labels:]
• CP: cp_ctrl = 192.168.0.1/20 (base b1); cp_data = 172.16.1.1/26 (fabric f1/0); eth0.2 on VLAN 2 for spp3.arl.wustl.edu (noarp); runs srm, dhcpd, httpd, plc_api, gnm* and fs
  – /etc/dhcpd.conf, /etc/ethers
  – /tftpboot/: ramdisk.gz, zImage.ppm10, bootcd.img, overlay_gpe1.img, overlay_gpe2.img, pxelinux.0, pxelinux.cfg/ (C0A82031, C0A82041)
  – /var/www/html/boot/: index.html, bootmanager.sh, bootstrapfs-planetlab-i386.tar.bz2
• GPE1 (Slot 3): gpe1_ctrl = 192.168.0.49/20 (base b1); gpe1_data = 172.16.1.3/26 (fabric f1/0, eth0) plus eth0.2 on VLAN 2 for spp3.arl.wustl.edu (noarp); gpe1_int = 172.16.1.65/26 (fabric f1/1); runs rmp and nm
• GPE2 (Slot 4): gpe2_ctrl = 192.168.0.65/20 (base b1); gpe2_data = 172.16.1.4/26 (fabric f1/0, eth0) plus eth0.2 on VLAN 2 for spp3.arl.wustl.edu (noarp); gpe2_int = 172.16.1.66/26 (fabric f1/1); runs rmp and nm
• NPE (Slot 5): base b1a = 192.168.0.81/20 (XScale A) and b1b = 192.168.0.82/20 (XScale B), each running an scd; fabric f1/0 = 172.16.1.5/26
• Line Card (Slot 6): lc_b1a = 192.168.0.97/20 (ingress XScale) and lc_b1b = 192.168.0.98/20 (egress XScale), each running an scd; lc1_data = 172.16.1.6/26 (fabric f1/0); natd
• Hub: 192.168.0.17; GPE ports b2, f2/0 and f2/1 are also shown, with no addresses assigned
• cp5.arl.wustl.edu: 128.252.153.39 on the ARL network (eth0); eth1:0 = 192.168.0.2; eth2.2 = 128.252.153.34 on VLAN 2; IP routing and proxy ARP for drn05
• spp3.arl.wustl.edu = 128.252.153.3; myPLC runs on drn06.arl.wustl.edu
bootcd file system
/
    bin/  dev/  home/  lib/  ...
    etc/
        init.d/: pl_boot, pl_netinit, pl_validateconf, pl_sysinit, pl_hwinit
        ...
    ...  root/  selinux/  sys/  usr/
• pl_boot: modified not to use SSL or PGP when retrieving the BootManager script from the CP
• pl_netinit: sets boot_server to reference the CP
• pl_validateconf: added SPP-specific variables
overlay image
/
    etc/{issue, passwd}
    kargs.txt
    pl_version
    usr/
        isolinux
        boot/: spp_netinit.py, ethers, spp_conf.txt, boot_server, boot_server_port, boot_server_path, plnode.txt, cacert.pem, plc_config, pubring.gpg
            backup/: boot_server, boot_server_path, boot_server_port, cacert.pem, pubring.gpg
        bootme/: BOOTPORT, BOOTSERVER, BOOTSERVER_IP, ID, cacert/drn06.arl.wustl.edu/cacert.pem
• Changed to list the CP as the boot server, with port 81
• Added the SPP initialization script and config files
• Changed plnode.txt to list this GPE's MAC address for the control interface
GPE Configuration File: spp_conf.txt
    # Config name: spp1.txt
    [ nserv ]
    ctrl_ipaddr=192.168.32.1
    ctrl_hwaddr=00:1E:C9:FE:76:22
    data_ipaddr=172.16.1.1
    data_hwaddr=00:1E:C9:FE:76:23
    [ domain ]
    hostname=drn05
    domain=arl.wustl.edu
    dns1=128.252.133.45
    dns2=128.252.120.1
    gateway=128.252.153.31
    [ hosts ]
    nserv_f1.0=172.16.1.1
    nserv=192.168.32.1
    nserv_gbl=192.168.48.1
    shmgr=192.168.48.2
    hub=192.168.32.17
    hub1_f1.0=172.16.1.2
    hub1_m.0=192.168.48.17
    gpe1_f1.0=172.16.1.3
    gpe1_f1.1=172.16.1.65
    gpe1_b1.0=192.168.32.65
    gpe2_f1.0=172.16.1.4
    gpe2_f1.1=172.16.1.66
    gpe2_b1.0=192.168.32.49
    npe1_f1.0=172.16.1.5
    npe1_b1.0=192.168.32.81
    npe1_m.0=192.168.48.81
    npe1_b1.1=192.168.32.82
    lc_f1.0=172.16.1.6
    lc_b1.0=192.168.32.97
    lc_m.0=192.168.48.97
    lc_b1.1=192.168.32.98
    drn05.arl.wustl.edu=128.252.153.209
    [ iface ]
    __name__=eth0
    dev=eth0
    name=gpe1_f1.0
    hwaddr=00:0e:0c:85:e4:40
    type=data
    lanid=fabric1
    port=0
    vlan=0
    ipaddr=172.16.1.3
    ipnet=172.16.1.0
    ipbcast=172.16.1.63
    ipmask=255.255.255.192
    arp=no
    enable=yes
    [ iface ]
    __name__=eth0.2
    dev=eth0.2
    name=gpe1_f1.0
    hwaddr=00:0e:0c:85:e4:40
    vlan=2
    type=data
    lanid=fabric1
    port=0
    ipaddr=128.252.153.209
    ipnet=128.252.0.0
    ipbcast=128.252.255.255
    ipmask=255.255.0.0
    arp=no
    enable=yes
    [ iface ]
    __name__=eth1
    dev=eth1
    name=gpe1_f1.1
    hwaddr=00:0e:0c:85:e4:42
    type=data
    lanid=fabric1
    port=1
    vlan=0
    ipaddr=172.16.1.65
    ipnet=172.16.1.64
    ipbcast=172.16.1.127
    ipmask=255.255.255.192
    arp=no
    enable=yes
    [ iface ]
    __name__=eth2
    dev=eth2
    name=gpe1_b1.0
    hwaddr=00:0e:0c:85:e4:3e
    type=control
    lanid=base1
    port=0
    vlan=0
    ipaddr=192.168.32.65
    ipnet=192.168.32.0
    ipbcast=192.168.39.255
    ipmask=255.255.248.0
    arp=yes
    enable=yes
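The file is INI-like but repeats the "[ iface ]" section once per interface, so Python's configparser (whose default strict mode raises DuplicateSectionError) is a poor fit. A minimal hand-rolled reader, assuming one key=value pair per line, could look like this; the function names are hypothetical and not taken from spp_netinit.py.

```python
from typing import Dict, List, Optional, Tuple

Section = Tuple[str, Dict[str, str]]


def parse_spp_conf(text: str) -> List[Section]:
    """Parse spp_conf.txt-style text into an ordered list of (section, fields).

    "[ iface ]" repeats once per interface, so sections are kept as a list
    instead of a dict keyed by section name.
    """
    sections: List[Section] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1].strip(), current))
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def iface_sections(sections: List[Section]) -> Dict[str, Dict[str, str]]:
    """Index the [ iface ] sections by __name__ (the Linux device, e.g. eth0.2)."""
    return {fields["__name__"]: fields for name, fields in sections if name == "iface"}


SAMPLE = """# Config name: spp1.txt
[ domain ]
hostname=drn05
gateway=128.252.153.31
[ iface ]
__name__=eth0
ipaddr=172.16.1.3
vlan=0
[ iface ]
__name__=eth0.2
ipaddr=128.252.153.209
vlan=2
"""

if __name__ == "__main__":
    print(sorted(iface_sections(parse_spp_conf(SAMPLE))))
```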
ethers
    # ----------------------------------------------------------------------
    # Board Type cp, Name cp1, Slot 0
    # nserv_f1.0 fabric1/0
    00:1E:C9:FE:76:23 172.16.1.1
    # nserv base1/0
    00:1E:C9:FE:76:22 192.168.32.1
    # nserv_gbl maint/0
    00:10:18:32:00:76 192.168.48.1
    # ----------------------------------------------------------------------
    # Board Type shmgr, Name shmgr1, Slot 0
    # shmgr maint/0
    00:50:C2:3F:D2:74 192.168.48.2
    # ----------------------------------------------------------------------
    # Board Type hub, Name hub1, Slot 1
    # hub base1/0
    00:00:50:3D:10:6B 192.168.32.17
    # hub1_f1.0 fabric1/0
    00:00:50:3D:10:B0 172.16.1.2
    # hub1_m.0 maint/0
    00:00:50:3D:10:6C 192.168.48.17
    # ----------------------------------------------------------------------
    # Board Type gpe, Name gpe1, Slot 4
    # gpe1_f1.0 fabric1/0
    00:0e:0c:85:e4:40 172.16.1.3
    # gpe1_f1.1 fabric1/1
    00:0e:0c:85:e4:42 172.16.1.65
    # gpe1_b1.0 base1/0
    00:0e:0c:85:e4:3e 192.168.32.65
    # ----------------------------------------------------------------------
    # Board Type gpe, Name gpe2, Slot 3
    # gpe2_f1.0 fabric1/0
    00:0E:0C:85:E6:08 172.16.1.4
    # gpe2_f1.1 fabric1/1
    00:0E:0C:85:E6:0A 172.16.1.66
    # gpe2_b1.0 base1/0
    00:0E:0C:85:E6:06 192.168.32.49
    # ----------------------------------------------------------------------
    # Board Type npe, Name npe1, Slot 5
    # npe1_f1.0 fabric1/0
    00:00:00:00:00:00 172.16.1.5
    # npe1_b1.0 base1/0
    00:00:50:3d:07:3e 192.168.32.81
    # npe1_m.0 maint/0
    00:00:50:3D:07:3C 192.168.48.81
    # npe1_b1.1 base1/1
    00:00:50:3D:07:3D 192.168.32.82
    # ----------------------------------------------------------------------
    # Board Type lc, Name lc1, Slot 6
    # lc_f1.0 fabric1/0
    00:00:50:3d:0b:d4 172.16.1.6
    # lc_b1.0 base1/0
    00:00:50:3D:08:26 192.168.32.97
    # lc_m.0 maint/0
    00:00:50:3D:08:24 192.168.48.97
    # lc_b1.1 base1/1
    00:00:50:3D:08:25 192.168.32.98
    # ----------------------------------------------------------------------
    # Gateway for drn05 (128.252.153.209), VLAN 2
    00:00:50:3d:0b:d4 128.252.153.31
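Each uncommented ethers line pairs a MAC address with an IP address; with net-tools, `arp -f` reads /etc/ethers by default. A minimal sketch that turns the entries into explicit static ARP commands instead; skipping the all-zero MAC listed for npe1_f1.0 is an assumption (it looks like a placeholder), and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# One ethers entry: a colon-separated MAC followed by a host name or IP address.
ENTRY_RE = re.compile(r"^([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})\s+(\S+)$")
ZERO_MAC = "00:00:00:00:00:00"


def parse_ethers(text: str) -> List[Tuple[str, str]]:
    """Return (mac, address) pairs, ignoring comments and blank lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        match = ENTRY_RE.match(line)
        if match:
            entries.append((match.group(1).lower(), match.group(2)))
    return entries


def arp_commands(entries: List[Tuple[str, str]]) -> List[str]:
    """Build net-tools 'arp -s <address> <mac>' lines, skipping placeholder MACs."""
    return [f"arp -s {addr} {mac}" for mac, addr in entries if mac != ZERO_MAC]


SAMPLE = """# gpe1_f1.0 fabric1/0
00:0e:0c:85:e4:40 172.16.1.3
# npe1_f1.0 fabric1/0
00:00:00:00:00:00 172.16.1.5
# Gateway for drn05 (128.252.153.209), VLAN 2
00:00:50:3d:0b:d4 128.252.153.31
"""

if __name__ == "__main__":
    for cmd in arp_commands(parse_ethers(SAMPLE)):
        print(cmd)
```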
BootAPI Calls Made by the BootManager
• PLCAPI/BootAPI calls
  1. GetSession(node_id, auth, node_ip): returns a new session key for the node
  2. BootCheckAuthentication(Session): returns true if the Session id is valid
  3. GetNodes(Session, node_id, ['nodegroup_ids', 'nodenetwork_ids', 'model', 'site_id']): returns the indicated parameters for this node (i.e., node_id)
  4. GetNodeNetworks(Session, node_id, nodenetwork_ids): returns a list of interfaces [broadcast, network, ip, dns1, dns2, hostname, netmask, gateway, nodenetwork_id, method, mac, node_id, is_primary, type, bwlimit, nodenetwork_settings_ids]
  5. GetNodes(Session, node_id, 'nodegroup_ids'): returns the list of group ids associated with this node
  6. GetNodeGroups(Session, nodegroup_id, 'name'): returns the name string for each node group (in our case 'SPP')
  7. GetNodeNetworkSettings()
  8. BootUpdateNode(Session, boot_state): sets the node's boot state at PLC
  9. BootNotifyOwners(Session, "event", params): causes email to be sent to the list of node owners
  10. BootUpdateNode(Session, ssh_host_key): records the latest ssh public key for the node
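Calls 1 and 2 interact with the CP proxy: as the BootManager slide explains, each GetSession call invalidates existing keys, so the proxy on the CP hands every GPE one common session key. A minimal sketch of that interception as a plain Python object; the class name is hypothetical and the locally generated key is an assumption (the deck does not say where the real proxy's key comes from).

```python
import secrets
from typing import Optional


class SharedSessionProxy:
    """Toy stand-in for the CP's plc_api proxy (hypothetical).

    Every GPE that calls GetSession receives the same key, so one GPE
    booting cannot invalidate the session of another GPE.
    """

    def __init__(self) -> None:
        self._session: Optional[str] = None

    def GetSession(self, node_id, auth, node_ip) -> str:
        if self._session is None:
            # Created once; a real proxy might obtain it from PLC instead.
            self._session = secrets.token_hex(16)
        return self._session

    def BootCheckAuthentication(self, session) -> bool:
        return self._session is not None and session == self._session
```

Such an object could be published with `xmlrpc.server.SimpleXMLRPCServer(...).register_instance(...)`; a complete proxy would also forward the remaining calls to the real PLC.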
Other PLC/Server Interactions
• HTTP/HTTPS
  – Upload alpina boot logs: BOOT_SERVER_URL += /alpina-logs/upload.php
  – Compatibility step (we don't use): BOOT_SERVER_URL += /alpina-BootLVM.tar.gz, BOOT_SERVER_URL += /alpina-PartDisk.tar.gz
  – Download the file system tar file containing the basic PlanetLab node environment: BOOT_SERVER_URL += /boot/bootstrapfs-"group"-"arch".tar.bz2
  – If not in the config file, get the node id: BOOT_SERVER_URL += /boot/getnodeid.php
  – Get the yum update configuration file: BOOT_SERVER_URL += /PlanetLabConf/yum.conf.php
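On an SPP node BOOT_SERVER_URL points at the CP (the overlay image lists the CP as boot server, port 81), so these relative paths resolve there. A small sketch of the URL construction; the defaults planetlab and i386 match the bootstrapfs-planetlab-i386.tar.bz2 file kept on the CP, while the dictionary keys and the example URL http://192.168.32.1:81 (the SPP1 CP control address) are assumptions.

```python
from typing import Dict


def boot_server_urls(boot_server_url: str, group: str = "planetlab", arch: str = "i386") -> Dict[str, str]:
    """Join BOOT_SERVER_URL with the paths listed on this slide (keys are hypothetical)."""
    base = boot_server_url.rstrip("/")
    return {
        "log_upload": base + "/alpina-logs/upload.php",
        "bootstrapfs": f"{base}/boot/bootstrapfs-{group}-{arch}.tar.bz2",
        "getnodeid": base + "/boot/getnodeid.php",
        "yum_conf": base + "/PlanetLabConf/yum.conf.php",
    }


if __name__ == "__main__":
    for name, url in boot_server_urls("http://192.168.32.1:81/").items():
        print(name, url)
```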
System Initialization: Stage 1
• Use PXE boot to download pxelinux and its config file:
  – boot using a basic initial ramdisk, overlay and kernel
  – use the dhcp, tftp and PXE servers on the CP, with files stored in the tftpboot directory: pxelinux.0, pxelinux.cfg/<GPE_IPADDR>, bootcd.img, overlay_gpeX.img, kernel
  – The overlay image is modified for each GPE to include its configuration file, modified PlanetLab config files and an SPP node Python script.
    • Currently this is a manual step, but the long-term plan is for the gnm daemon to create the individual images.
    • The overlay image contains several files that identify the node and provide the names and addresses of the PLC and boot servers. I have modified these to point to the CP.
    • Just before booting the final kernel, I change these values to refer to the "real" PLC/API servers.
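The pxelinux.cfg/<GPE_IPADDR> files on the CP (C0A82031 and C0A82041 in the SPP1 example configuration) are the GPE control addresses 192.168.32.49 and 192.168.32.65 written as eight uppercase hex digits, one of the names PXELINUX searches for. A small sketch of that naming; it leaves out the UUID and 01-<mac> names that PXELINUX tries before the hex forms, and the helper names are hypothetical.

```python
import ipaddress
from typing import List


def pxelinux_cfg_name(ip: str) -> str:
    """The pxelinux.cfg/ file name for an IPv4 address: eight uppercase hex digits."""
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


def pxelinux_search_order(ip: str) -> List[str]:
    """Hex-IP candidates, dropping one trailing digit at a time, then 'default'."""
    name = pxelinux_cfg_name(ip)
    return [name[:n] for n in range(len(name), 0, -1)] + ["default"]


if __name__ == "__main__":
    for gpe_ctrl in ("192.168.32.49", "192.168.32.65"):
        print(gpe_ctrl, pxelinux_cfg_name(gpe_ctrl))
```

The shorter prefixes let one file serve a whole block of addresses; the per-GPE full-length names used here keep each GPE's boot configuration separate.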
System Initialization: Stage 2
• Boot into a basic, intermediate environment
• Initial configuration information is obtained from the overlay image
  – spp_conf.txt defines the GPE interfaces
  – the ethers file contains MAC addresses for static ARP entries
  – plnode.txt is updated with the GPE's control interface MAC address
  – modified bootserver files list the CP as the boot server
  – spp_netinit.py, a Python script, configures the interfaces and updates system configuration files
• Enables the "primary" interface and key network configuration files such as resolv.conf
• Downloads the BootManager source from the "boot_server"
  – In our case we download from the CP
  – I explicitly disable the use of SSL and certificates (the certificates on the overlay image are for the PLC server, not the CP)
  – Our assumption is that the control (base) network is "secure"; plus, within an SPP node we don't have to worry about authentication issues.
BootManager
• Opens a connection to the PLCAPI on the boot server
  – In our case, opens a connection to our proxy PLCAPI/BootAPI server running on the CP
• Gets a node session key: GetSession(node_id, auth, node_ip)
  – Since each call to create a session invalidates any existing keys, we intercept this call on the CP and use a common session key for all GPEs.
• Determines the node's configuration
  – reads plnode.txt for node_id, node_key and the primary interface settings
    • we use DHCP to configure the control interface, but I do not define a DNS server
  – if node_id is not found, reads URL=BootServer/boot/getnodeid.php
• Calls BootCheckAuthentication(Session) to verify the session key
• Calls GetNodes to get the boot_state, node_groups, model and site_id
• Calls GetNodeNetworks to get configuration information for all interfaces
  – in our case the call would return the externally visible network parameters, which differ from how each GPE is configured
  – long term, we can intercept this call and return GPE-specific interface configuration info
  – short term, we use a configuration file in the overlay image with similarly formatted information. I have replaced the BootManager code that reads the config info and configures the interfaces.
  – I had to add support for VLANs and our internal interfaces.
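A sketch of the node-id step, assuming plnode.txt uses shell-style KEY="value" lines with keys such as NODE_ID, NODE_KEY and IP_METHOD; those key names are an assumption about the stock PlanetLab file rather than something this deck states, and the helper names are hypothetical.

```python
import shlex
from typing import Dict, Optional, Tuple


def parse_plnode(text: str) -> Dict[str, str]:
    """Parse shell-style KEY="value" lines (assumed plnode.txt layout)."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        tokens = shlex.split(value)  # removes the surrounding quotes
        conf[key.strip()] = tokens[0] if tokens else ""
    return conf


def node_id_or_lookup_url(conf: Dict[str, str], boot_server: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (node_id, None) when plnode.txt has NODE_ID, else (None, getnodeid.php URL)."""
    node_id = conf.get("NODE_ID")
    if node_id:
        return node_id, None
    return None, boot_server.rstrip("/") + "/boot/getnodeid.php"


if __name__ == "__main__":
    conf = parse_plnode('NODE_ID="42"\nNODE_KEY="abc123"\nIP_METHOD="dhcp"\n')
    print(node_id_or_lookup_url(conf, "http://192.168.32.1:81"))
```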
BootManager Continued
• Downloads the node's final filesystem image from the boot_server
  – in our case this is the CP: http://CP/boot/bootstrapfs-planetlab-i386.tar.bz2
• Downloads the yum config file
  – I am not currently downloading it: http://CP/PlanetLabConf/yum.conf
• Calls BootUpdateNode with the new boot_state
  – we will need to intercept this call and both report and set the node state based on all GPEs
• Calls BootNotifyOwners with the new state
  – forwarded to PLC
• Updates the network configuration in the new "sysimg"
  – downloads the //BootServer/PlanetLabConf/plc_config file
    • In our case I have copied it onto the overlay image in the /usr/boot directory.
  – calls GetNodeNetworkSettings for a list of any additional interface attributes, then creates various configuration files: hosts, resolv.conf, network, ifcfg-eth*
    • I have replaced this step with our own script, spp_netinit.py, and configuration file, spp_conf.txt, which I use to create the same config files in both the current environment and the new sysimg.
  – updates devices and creates the initrd image used for the next stage
  – finally boots a new kernel using the bootstrap file system
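spp_netinit.py itself is not shown in the deck. As a rough idea of the step it replaces, the sketch below renders a Red Hat-style ifcfg file from one spp_conf.txt [ iface ] section and a resolv.conf from the [ domain ] section. The mapping of spp_conf.txt fields to ifcfg keys (arp=no to ARP=no, a nonzero vlan to VLAN=yes, enable to ONBOOT) is an assumption, and the function names are hypothetical.

```python
from typing import Dict


def ifcfg_text(iface: Dict[str, str]) -> str:
    """Render an ifcfg-<dev> file from one spp_conf.txt [ iface ] entry."""
    lines = [
        f"DEVICE={iface['dev']}",
        "BOOTPROTO=static",
        f"IPADDR={iface['ipaddr']}",
        f"NETMASK={iface['ipmask']}",
        f"ONBOOT={'yes' if iface.get('enable') == 'yes' else 'no'}",
    ]
    if iface.get("vlan", "0") != "0":
        lines.append("VLAN=yes")  # e.g. eth0.2 is VLAN 2 on eth0
    if iface.get("arp") == "no":
        lines.append("ARP=no")  # neighbors come from static entries in ethers
    return "\n".join(lines) + "\n"


def resolv_conf_text(domain: Dict[str, str]) -> str:
    """Render resolv.conf from the spp_conf.txt [ domain ] section."""
    out = [f"search {domain['domain']}"]
    for key in ("dns1", "dns2"):
        if domain.get(key):
            out.append(f"nameserver {domain[key]}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(ifcfg_text({"dev": "eth0.2", "ipaddr": "128.252.153.209", "ipmask": "255.255.0.0",
                      "vlan": "2", "arp": "no", "enable": "yes"}))
```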
Boot States
• The list of boot states is changing as I write this.
• The states in our version of PLC:

State    | Next state                                 | Description
new      | install verified -> rins; error -> dbg     | new install: verify install with user
inst     | install verified -> rins; error -> dbg     | install: same as new
rins     | success -> boot; error -> dbg              | reinstall: reformat disk and reinstall all software and files
boot     | boot; error -> dbg                         | boot: boot using existing partitions
dbg      | success: same as boot; fail: bootcd image  | debug: boot node
diag     | user controlled                            | diagnostics: bootcd image
disable  | user controlled                            | disable: bootcd image
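The table reads naturally as a small state machine. A sketch under stated assumptions: the event names (install_verified, success, error) are hypothetical labels, and the dbg row is interpreted as "on success continue as boot, on failure stay in the bootcd debug image".

```python
# Automatic transitions from the boot-state table; event names are hypothetical.
TRANSITIONS = {
    "new": {"install_verified": "rins", "error": "dbg"},
    "inst": {"install_verified": "rins", "error": "dbg"},
    "rins": {"success": "boot", "error": "dbg"},
    "boot": {"success": "boot", "error": "dbg"},
    # Interpretation: success continues as boot, failure stays in the bootcd debug image.
    "dbg": {"success": "boot", "error": "dbg"},
}
USER_CONTROLLED = {"diag", "disable"}  # only changed by a user at PLC


def next_state(state: str, event: str) -> str:
    """Return the boot state that follows `state` after `event`."""
    if state in USER_CONTROLLED:
        return state
    try:
        return TRANSITIONS[state][event]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None
```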
PLC Database
• The PlanetLab Central (PLC) database describes all nodes, slices and users/people.
• The slice database keeps track of all slices and their node bindings.
• The node database includes externally visible properties and the ability to associate general attributes with these properties:
  – the current (or next) node state (boot_state)
  – node identifier (node_id)
  – list of interface configuration parameters
    • IP address information, MAC address, generic list of attributes
  – node's owner
  – node's site identifier (site_id)
  – model: can be used to specify a set of attributes for the node, for example minhw, smp
  – current ssh host key (ssh_host_key)
  – node groups: I believe this is being deprecated in favor of associating a generic set of attributes with a node or its interfaces.
SPP-Specific Information
• On an SPP node the resource manager needs to know what kind of board is inserted in each slot and its I/O characteristics.
• It needs to associate interface MAC addresses with boards and interfaces, or with a standalone system connected to an RTM or the front panel (for example, the CP).
• It also needs to know which interfaces are connected to the base switch and which to the fabric switch when bringing up general-purpose systems.
• There is no convenient mechanism for determining this at run time, so I use a configuration file.
• It also needs to know what resources are available on each board and the allocation policies.
• It must also have a list of external links, their addresses and the addresses of any (Ethernet) peers.
• It needs to track the current node state (as kept by PLC) as well as the state of each individual board.
• Different daemons need to share state.
Node Configuration File
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<spp>
  <code_options>
    <IPv4 sram="fixed" queues="variable" id="0" fltrs="variable"> <sram> 1024 </sram> </IPv4>
    <I3 sram="fixed" queues="variable" id="1" fltrs="variable"> <sram> 1024 </sram> </I3>
  </code_options>
  <components>
    <cp name="cp1" slot="0" cat="host" alias="nserv">
      <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
    </cp>
    <shmgr name="shmgr1" slot="0" cat="atca" alias="shmgr1">
      <interface name="shmgr" dev="GigE" lanid="maint" assoc="" port="0"> ... </interface>
      ...
    </shmgr>
    <hub name="hub1" slot="1" cat="atca" alias="hub1">
      <switch lanid="base1"> </switch>
      <switch lanid="fabric1"> <bw> 10000000000 </bw> </switch>
      <interface name="hub" dev="GigE" lanid="base1" assoc="" port="0"> ... </interface>
      ...
    </hub>
    <gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
      <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
    </gpe>
    <npe name="npe1" slot="5" cat="atca" alias="npe1">
      <product> Radisys_7010 </product> <model> NPEv1 </model>
      <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
    </npe>
    <lc name="lc1" slot="6" cat="atca" alias="lc">
      <product> Radisys_7010 </product> <model> LCv1 </model>
      <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
      <interface name="drn05" dev="GigE" lanid="external" port="0">
        ...
        <link peering="true" primary="true" dev="GigE"> ... </link>
        ...
      </interface>
    </lc>
  </components>
</spp>
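Because there is no run-time way to learn which MAC belongs to which board and switch, the configuration file carries that mapping. A minimal sketch of building the lookup with Python's xml.etree.ElementTree, assuming interfaces carry a <hwaddr> child as in the CP and GPE records; the function name, the tuple layout and the trimmed sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# (board tag, board name, slot, interface name, lanid)
IfaceInfo = Tuple[str, str, int, str, str]


def mac_table(xml_text: str) -> Dict[str, IfaceInfo]:
    """Map each lower-case <hwaddr> to the board and interface that own it."""
    root = ET.fromstring(xml_text)
    table: Dict[str, IfaceInfo] = {}
    components = root.find("components")
    if components is None:
        return table
    for board in components:
        for iface in board.findall("interface"):
            hwaddr = (iface.findtext("hwaddr") or "").strip().lower()
            if hwaddr:
                table[hwaddr] = (board.tag, board.get("name"), int(board.get("slot")),
                                 iface.get("name"), iface.get("lanid"))
    return table


SAMPLE = """<spp><components>
  <cp name="cp1" slot="0" cat="host" alias="nserv">
    <interface name="nserv" dev="GigE" lanid="base1" assoc="" port="0">
      <device> eth1 </device> <hwaddr> 00:1E:C9:FE:76:22 </hwaddr>
    </interface>
  </cp>
  <gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
    <interface name="gpe1_b1.0" dev="GigE" lanid="base1" assoc="" port="0">
      <device> eth2 </device> <hwaddr> 00:0e:0c:85:e4:3e </hwaddr>
    </interface>
  </gpe>
</components></spp>"""

if __name__ == "__main__":
    for mac, info in mac_table(SAMPLE).items():
        print(mac, info)
```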
CP Record
<!-- Interface parameters defined by user in original “xml” file -->
<cp name="cp1" slot="0" cat="host" alias="nserv">
  <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">
    <!-- All internal IP addrs assigned by configuration software based on runtime parameters -->
    <ipaddr>172.16.1.1</ipaddr> <ipnet>172.16.1.0</ipnet> <ipmask>255.255.255.192</ipmask> <ipbcast>172.16.1.63</ipbcast>
    <!-- Device parameters and comment set by user in the original “xml” file -->
    <device> eth0 </device> <hwaddr> 00:1E:C9:FE:76:23 </hwaddr>
    <desc> Interface connected to HUB's fabric port </desc>
  </interface>
  <interface name="nserv" dev="GigE" lanid="base1" assoc="" port="0">
    <ipaddr>192.168.32.1</ipaddr> <ipnet>192.168.32.0</ipnet> <ipmask>255.255.248.0</ipmask> <ipbcast>192.168.39.255</ipbcast>
    <device> eth1 </device> <hwaddr> 00:1E:C9:FE:76:22 </hwaddr>
    <desc> System control processor's Base Ethernet connection </desc>
  </interface>
  <interface name="nserv_gbl" dev="GigE" lanid="maint" assoc="" port="0">
    <ipaddr>192.168.48.1</ipaddr> <ipnet>192.168.48.0</ipnet> <ipmask>255.255.248.0</ipmask> <ipbcast>192.168.55.255</ipbcast>
    <device> eth2 </device> <hwaddr> 00:10:18:32:00:76 </hwaddr>
    <desc> Connection to the maintenance ports </desc>
  </interface>
</cp>
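Whatever policy the configuration software uses to pick <ipaddr> and <ipmask>, the <ipnet> and <ipbcast> fields follow from those two values. A sketch with the standard ipaddress module (the helper name is hypothetical); applied to the three CP interfaces it reproduces 172.16.1.0/172.16.1.63, 192.168.32.0/192.168.39.255 and 192.168.48.0/192.168.55.255.

```python
import ipaddress
from typing import Dict


def address_fields(ipaddr: str, ipmask: str) -> Dict[str, str]:
    """Derive the <ipnet> and <ipbcast> values that accompany <ipaddr>/<ipmask>."""
    iface = ipaddress.IPv4Interface(f"{ipaddr}/{ipmask}")
    net = iface.network
    return {
        "ipaddr": str(iface.ip),
        "ipnet": str(net.network_address),
        "ipmask": str(net.netmask),
        "ipbcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    print(address_fields("192.168.32.1", "255.255.248.0"))
```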
GPE Record
<gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
  <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">
    -- IP Address Info --
    <device> eth0 </device> <hwaddr> 00:0e:0c:85:e4:40 </hwaddr>   (Device Data)
    <bw> 1000000000 </bw> <share> 2 </share>   (Resource Policy)
    <desc> MAC=N+2, Fabric 1/0 or AMC Port 0 </desc>
  </interface>
  <interface name="gpe1_f1.1" dev="GigE" lanid="fabric1" assoc="" port="1">
    -- IP Address Info --
    <device> eth1 </device> <hwaddr> 00:0e:0c:85:e4:42 </hwaddr>
    <desc> MAC=N+4, Fabric 1/1 or Maintenance Port 1 </desc>
  </interface>
  <interface name="gpe1_b1.0" dev="GigE" lanid="base1" assoc="" port="0">
    -- IP Address Info --
    <device> eth2 </device> <hwaddr> 00:0e:0c:85:e4:3e </hwaddr>
    <desc> MAC=N, Base connection to Primary HUB </desc>
  </interface>
  <interface name="gpe1_b2.0" dev="GigE" lanid="base2" assoc="" port="0">
    -- IP Address Info --
    <device> eth3 </device> <hwaddr> 00:0e:0c:85:e4:3f </hwaddr>
    <desc> MAC=N+1, Base connection to alternate HUB </desc>
  </interface>
  <interface name="gpe1_f2.0" dev="GigE" lanid="fabric2" assoc="" port="0">
    -- IP Address Info --
    <device> eth4 </device> <hwaddr> 00:0e:0c:85:e4:41 </hwaddr>
    <desc> MAC=N+3, Fabric 2/0 or AMC Port 1 </desc>
  </interface>
  <interface name="gpe1_f2.1" dev="GigE" lanid="fabric2" assoc="" port="1">
    -- IP Address Info --
    <device> eth5 </device> <hwaddr> 00:0e:0c:85:e4:43 </hwaddr>
    <desc> MAC=N+5, Fabric 2/1 or Maintenance Port 2 </desc>
  </interface>
</gpe>
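The <desc> fields show every GPE interface MAC as a fixed offset from the base MAC N. Assuming that pattern holds for every GPE of this type (it does for gpe2 in the ethers file: ...E6:06, ...E6:08, ...E6:0A), only N needs to be recorded. A sketch of the arithmetic; the helper names are hypothetical.

```python
from typing import Dict

# Offsets from the <desc> fields of the GPE record (MAC=N ... MAC=N+5).
MAC_OFFSETS = {"b1.0": 0, "b2.0": 1, "f1.0": 2, "f2.0": 3, "f1.1": 4, "f2.1": 5}


def mac_add(mac: str, offset: int) -> str:
    """Add an integer offset to a colon-separated MAC address."""
    value = int(mac.replace(":", ""), 16) + offset
    if not 0 <= value < (1 << 48):
        raise ValueError("MAC address out of range")
    digits = f"{value:012x}"
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def gpe_macs(board: str, base_mac: str) -> Dict[str, str]:
    """Predict all six interface MACs of a GPE from its base MAC N (the b1.0 interface)."""
    return {f"{board}_{suffix}": mac_add(base_mac, off) for suffix, off in MAC_OFFSETS.items()}


if __name__ == "__main__":
    for name, mac in gpe_macs("gpe1", "00:0e:0c:85:e4:3e").items():
        print(name, mac)
```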
NPE Record
<npe name="npe1" slot="5" cat="atca" alias="npe1">
  <product> Radisys_7010 </product> <model> NPEv1 </model>
  <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">
    -- IP Address Info -- -- Device Data -- -- Resource Policy --
    <desc> Fabric interface used for both NPUs </desc>
  </interface>
  <interface name="npe1_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
    <desc> Primary control interface associated with NPUA </desc>
  </interface>
  <interface name="npe1_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
    <desc> NPUA Front Maintenance Port </desc>
  </interface>
  <interface name="npe1_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">
    -- IP Address Info -- -- Device Data --
    <desc> NPUB Front Maintenance Port -- But it's been patched to the Base switch </desc>
  </interface>
</npe>
LC Record
<lc name="lc1" slot="6" cat="atca" alias="lc">
  <product> Radisys_7010 </product> <model> LCv1 </model>   (Model Data)
  <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">
    -- IP Address Info -- -- Device Data -- -- Resource Policy --
  </interface>
  <interface name="lc_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
  </interface>
  <interface name="lc_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
  </interface>
  <interface name="lc_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">
    -- IP Address Info -- -- Device Data --
  </interface>
  <interface name="drn05" dev="GigE" lanid="external" port="0">
    <hwaddr> 00:00:50:29:b1:46 </hwaddr>
    <link peering="true" primary="true" dev="GigE">
      -- Link IP Address Info -- -- Device Data -- -- Resource Policy --
      <domain> arl.wustl.edu </domain> <hostname> drn05 </hostname>
      <dns1> 128.252.133.45 </dns1> <dns2> 128.252.120.1 </dns2>
      <peerIP> 128.252.153.31 </peerIP> <peerMAC> 00:0F:B5:FB:D8:67 </peerMAC>
      <vlan> 2 </vlan>
      <port_pool> <!-- used for NAT -->
        <udp count="500" start="30000"> </udp>
        <tcp count="500" start="30000"> </tcp>
      </port_pool>
      <desc> p2p link from drn05 to drn06, the plc </desc>
    </link>
  </interface>
</lc>
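The <port_pool> element bounds the external ports NAT may hand out on this link: 500 UDP and 500 TCP ports, each starting at 30000 (30000 to 30499). Because the two ranges overlap numerically, the sketch keeps one independent pool per protocol. It is a minimal illustration, not the NATD implementation; the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


class PortPool:
    """Allocate NAT ports from one <port_pool> range (hypothetical helper)."""

    def __init__(self, start: int, count: int) -> None:
        # Stored in descending order so the first allocations come out lowest-first.
        self._free: List[int] = list(range(start + count - 1, start - 1, -1))
        self._used: Set[int] = set()

    def alloc(self) -> int:
        if not self._free:
            raise RuntimeError("port pool exhausted")
        port = self._free.pop()
        self._used.add(port)
        return port

    def free(self, port: int) -> None:
        if port not in self._used:
            raise ValueError(f"port {port} is not allocated")
        self._used.remove(port)
        self._free.append(port)  # reused by the next alloc()


def pools_from_xml(fragment: str) -> Dict[str, PortPool]:
    """Build one pool per protocol child (udp, tcp) of a <port_pool> element."""
    root = ET.fromstring(fragment)
    return {el.tag: PortPool(int(el.get("start")), int(el.get("count"))) for el in root}


FRAGMENT = """<port_pool>
  <udp count="500" start="30000"> </udp>
  <tcp count="500" start="30000"> </tcp>
</port_pool>"""
```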
SRM Interface
• NATD to SRM:
  [egress_map, ingress_map] get_sched_map(LinkIP, BoardMAC)
  Deprecated (the original natd interface!): {fid, port} alloc_epmap(map); status free_epmap(fid)
• FS to SRM: ?? (map vlan to slice id)
• RMP to SRM:
  – Interfaces (Line Card links):
    if_list get_interfaces(plabID)
    ifn get_ifn(plabID, ipaddr)
    if_entry get_ifattrs(plabID, ifn)
    ipaddr get_ifpeer(plabID, ifn)
    retcode resrv_fpath_ifbw(bw, ifn)
    retcode reles_fpath_ifbw(bw, ifn)
    To be implemented:
    retcode resrv_slice_ifbw(plabID, bw, ifn)
    retcode reles_slice_ifbw(plabID, bw, ifn)
  – EndPoints (local IP and port number; NATD changes may have broken these):
    ep alloc_endpoint(PlabID, ep)
    status free_endpoint(PlabID, ipaddr, port, proto)
  – Fast Path:
    fp_params alloc_fastpath(PlabID, copt, bwspec, rcnts, mem)
    status free_fastpath()
  – Fast-Path Meta-Interfaces:
    [mi, ep] alloc_udp_tunnel(bw, ipaddr, port)
    ep get_endpoint(mi)
    status free_udp_tunnel(ipaddr, port)
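resrv_fpath_ifbw and reles_fpath_ifbw imply that the SRM tracks how much of each line-card link is already committed to fast paths. A hypothetical in-memory model of that bookkeeping, keeping the slide's argument order (bw, ifn); the class name, the per-link capacities and the return codes 0 (ok) and -1 (refused) are assumptions.

```python
from typing import Dict


class LinkBandwidth:
    """Track fast-path bandwidth reserved on each line-card interface (ifn)."""

    def __init__(self, capacity_by_ifn: Dict[int, int]) -> None:
        self.capacity = dict(capacity_by_ifn)
        self.reserved = {ifn: 0 for ifn in self.capacity}

    def resrv_fpath_ifbw(self, bw: int, ifn: int) -> int:
        if ifn not in self.capacity or bw <= 0:
            return -1
        if self.reserved[ifn] + bw > self.capacity[ifn]:
            return -1  # would oversubscribe the link
        self.reserved[ifn] += bw
        return 0

    def reles_fpath_ifbw(self, bw: int, ifn: int) -> int:
        if ifn not in self.capacity or bw <= 0 or bw > self.reserved[ifn]:
            return -1
        self.reserved[ifn] -= bw
        return 0
```

Refusing rather than clipping an oversized request keeps admission control simple: the caller learns immediately that the fast path cannot get its requested rate on that link.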
RMP Interface
Prototype completed:
1. result noop()
2. version get_version()
3. result add_slice(plabID, len, name)
4. result rem_slice(plabID)
5. ret_t alloc_fastpath(copt, bw, rcnts, mem)
6. void free_fastpath()
7. if_list get_interfaces()
8. ifn get_ifn(ipaddr)
9. if_entry get_ifattrs(ifn)
10. ipaddr get_ifpeer(ifn)
11. retcode alloc_pl_ifbw(ifn, bw)
12. retcode reles_pl_ifbw(ifn, bw)
13. retcode alloc_fpath_ifbw(fpid, ifn, bw)
14. retcode reles_fpath_ifbw(fpid, ifn, bw)
15. retcode bind_queue(fpid, miid, list_type, qids)
16. actual_bw set_queue_params(fpid, qid, threshold, bw)
17. [threshold, bw] get_queue_params(fpid, qid)
18. [u32 Pkts, u32 Bytes] get_queue_len(fpid, qid)
To do:
19. ep alloc_endpoint(ep)
20. status free_endpoint(ipaddr, port, proto)
21. -- alloc_tunnel --
22. -- free_tunnel --
23. [mi, ep] alloc_udp_tunnel(fpid, bw, ip, port)
24. status free_udp_tunnel(ipaddr, port)
25. ep get_endpoint(fpid, mi)
26. retcode write_fltr(fpid, fid, fltr)
27. retcode update_result(fpid, fid, result)
28. fltr_t get_fltr_bykey(fpid, key)
29. fltr_t get_fltr_byfid(fpid, fid)
30. result lookup_fltr(fpid, key)
31. retcode rem_fltr_bykey(fpid, key)
32. retcode rem_fltr_byfid(fpid, fid)
33. stats_t read_stats(fpid, sindx, flags)
34. result clear_stats(sindx)
35. handle create_periodic(fp, indx, P, cnt, flags)
36. retcode delete_periodic(fpid, handle)
37. retcode set_callback(fpid, handle, xport)
38. stats_t get_periodic(fpid, handle)
39. retcode mem_write(fpid, offset[, len], data)
40. data mem_read(fpid, offset, len)
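The filter calls (`write_fltr`, `lookup_fltr`, `rem_fltr_byfid`) describe a key/mask/result table of the kind a TCAM implements. Below is a single-table software model to show the matching rule. It assumes 64-bit keys (real keys are wider), drops the `fpid`/`dbid` argument, and assumes that on overlapping matches the lowest `fid` wins and that a result of 0 means "no match". `fltr_write`, `fltr_lookup`, `fltr_remove` and `fltr_entry_t` are hypothetical names.

```c
#include <stdint.h>

#define NFLTR 64  /* hypothetical filter-table size */

typedef struct {
    int      valid;
    uint64_t key;     /* stored pre-masked */
    uint64_t mask;    /* 1 bits are compared, 0 bits are "don't care" */
    uint32_t result;  /* opaque result returned on a match */
} fltr_entry_t;

static fltr_entry_t fltr_db[NFLTR];

/* Models write_fltr: install or overwrite entry fid. */
int fltr_write(uint32_t fid, uint64_t key, uint64_t mask, uint32_t result)
{
    if (fid >= NFLTR)
        return -1;
    fltr_db[fid] = (fltr_entry_t){1, key & mask, mask, result};
    return 0;
}

/* Models lookup_fltr: like a TCAM, return the result of the first valid
   entry (lowest fid) whose masked bits equal the key's masked bits. */
uint32_t fltr_lookup(uint64_t key)
{
    for (uint32_t fid = 0; fid < NFLTR; ++fid)
        if (fltr_db[fid].valid && (key & fltr_db[fid].mask) == fltr_db[fid].key)
            return fltr_db[fid].result;
    return 0;
}

/* Models rem_fltr_byfid: invalidate entry fid. */
int fltr_remove(uint32_t fid)
{
    if (fid >= NFLTR || !fltr_db[fid].valid)
        return -1;
    fltr_db[fid].valid = 0;
    return 0;
}
```

With this ordering rule, a more specific filter must be given a lower `fid` than a broader one it overlaps, which is the usual constraint when laying out TCAM entries.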
NPE SCD Interface
• SRM to SCD
   status set_fastpath(fpid, copt, VLAN, params, mem)
   status enable_fastpath(fpid)
   status disable_fastpath(fpid)
   status rem_fastpath(fpid)
   status set_sched_params(sid, ifn, BWmax, BWmin)
   status set_encap_cb(sid, srcIP, dMAC)
   status set_fpmi_bw(fpid, sid, miid, bw)
   status start_mes()
   status stop_mes()
   status set_encap_gpe(fpid, gpeIP, npeIP)
   result write_mem(kpa, len, data)
   data read_mem(kpa, len)
• SRM & RMP to SCD
   ret_t write_fltr(dbid, fid, key, mask, result)
   ret_t update_result(dbid, fid, result)
   fltr get_fltr_bykey(dbid, key)
   fltr get_fltr_byfid(dbid, fid)
   result lookup_fltr(dbid, key)
   retcode rem_fltr_bykey(dbid, key)
   retcode rem_fltr_byfid(dbid, fid)
• RMP to SCD
   status set_gpe_info(exPort, ldPort, exQID, ldQID)
   u32 result bind_queue(u16 miid, u8 list_type, u16[] qid_list)
   u32 bw set_queue_params(u16 qid, u32 threshold, u32 bw)
   {u32 threshold, u32 bw} get_queue_params(u16 qid)
   {u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
   result write_sram(offset, len, data)
   data read_sram(offset, len)
   stats = read_stats(sindx, flags)
   result = clear_stats(sindx)
   handle create_periodic(sindx, P, cnt, flags)
   retcode del_periodic(handle)
   retcode set_callback(handle, udp_port)
   stats = get_periodic(handle)
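Several RMP-to-SCD queue calls return more than one value (`{u32 threshold, u32 bw}`, `{u32 pktCnt, u32 byteCnt}`). In C this maps naturally onto small structs returned by value. The in-memory stand-in below illustrates that convention. The queue count, the per-queue cap `BW_LIMIT`, the rule that the returned "actual" BW is the request clamped to that cap, and the `model_enqueue` test hook are all assumptions, not SCD behavior.

```c
#include <stdint.h>

#define NQUEUES 256  /* hypothetical number of queues on the NPE */

/* {u32 threshold, u32 bw} and {u32 pktCnt, u32 byteCnt} as C structs */
typedef struct { uint32_t threshold, bw; } qparams_t;
typedef struct { uint32_t pktCnt, byteCnt; } qlen_t;

static qparams_t qparams[NQUEUES];
static qlen_t    qlen[NQUEUES];
static const uint32_t BW_LIMIT = 1000000u;  /* hypothetical per-queue BW cap */

/* Stand-in for "u32 bw set_queue_params(u16 qid, u32 threshold, u32 bw)".
   Returns the BW actually granted (assumed: bw clamped to BW_LIMIT),
   or 0 for an out-of-range qid. */
uint32_t set_queue_params(uint16_t qid, uint32_t threshold, uint32_t bw)
{
    if (qid >= NQUEUES)
        return 0;
    qparams[qid].threshold = threshold;
    qparams[qid].bw = bw > BW_LIMIT ? BW_LIMIT : bw;
    return qparams[qid].bw;
}

/* Stand-in for "{u32 threshold, u32 bw} get_queue_params(u16 qid)". */
qparams_t get_queue_params(uint16_t qid)
{
    qparams_t none = {0, 0};
    return qid < NQUEUES ? qparams[qid] : none;
}

/* Stand-in for "{u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)". */
qlen_t get_queue_len(uint16_t qid)
{
    qlen_t none = {0, 0};
    return qid < NQUEUES ? qlen[qid] : none;
}

/* Hypothetical test hook: account for one packet of `bytes` in queue qid. */
void model_enqueue(uint16_t qid, uint32_t bytes)
{
    if (qid < NQUEUES) {
        qlen[qid].pktCnt += 1;
        qlen[qid].byteCnt += bytes;
    }
}
```

Returning the granted BW, rather than a bare status, lets the caller detect when a request was only partially honored.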
LC SCD Interface
• SRM to SCD
   status set_sched_params(sid, ifn, BWmax, BWmin)
   status set_sched_mac(sid, MACdst, MACsrc)
   u32 result set_queue_sched(u16 qid, u16 sid)
   result write_mem(kpa, len, data)
   data read_mem(kpa, len)
• SRM and RMP to SCD
   ret_t write_fltr(dbid, fid, key, mask, result)
   ret_t update_result(dbid, fid, result)
   fltr get_fltr_bykey(dbid, key)
   fltr get_fltr_byfid(dbid, fid)
   result lookup_fltr(dbid, key)
   retcode rem_fltr_bykey(dbid, key)
   retcode rem_fltr_byfid(dbid, fid)
• RMP to SCD
   u32 actual_bw set_queue_params(u16 qid, u32 threshold, u32 bw)
   {u32 threshold, u32 bw} get_queue_params(u16 qid)
   {u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
   stats = read_stats(sindx, flags)
   result = clear_stats(sindx)
   handle create_periodic(sindx, P, cnt, flags)
   retcode del_periodic(handle)
   retcode set_callback(handle, udp_port)
   stats = get_periodic(handle)
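On the line card, `set_sched_params` configures a scheduler (`sid`) for an interface with a BW range, and `set_queue_sched` attaches a queue to a scheduler. A small data-model sketch of that relationship follows. The table sizes, the requirement that a scheduler be configured before queues are attached, the `BWmin <= BWmax` check, the "non-zero result means failure" convention, and the helpers `lc_init`/`queues_on_sched` are assumptions made for illustration.

```c
#include <stdint.h>

#define NSCHED  16   /* hypothetical number of link schedulers */
#define NQUEUES 256  /* hypothetical number of queues */

typedef struct {
    int      valid;
    uint16_t ifn;    /* interface this scheduler drives */
    uint32_t bwMax;  /* upper bound on the scheduler's rate */
    uint32_t bwMin;  /* guaranteed rate */
} lc_sched_t;

static lc_sched_t lc_sched[NSCHED];
static int        queue_sid[NQUEUES];  /* scheduler each queue feeds, -1 = none */

/* Hypothetical reset: no schedulers configured, no queues attached. */
void lc_init(void)
{
    for (int q = 0; q < NQUEUES; ++q)
        queue_sid[q] = -1;
    for (int s = 0; s < NSCHED; ++s)
        lc_sched[s].valid = 0;
}

/* Models "status set_sched_params(sid, ifn, BWmax, BWmin)". */
int set_sched_params(uint16_t sid, uint16_t ifn, uint32_t bwMax, uint32_t bwMin)
{
    if (sid >= NSCHED || bwMin > bwMax)
        return -1;
    lc_sched[sid] = (lc_sched_t){1, ifn, bwMax, bwMin};
    return 0;
}

/* Models "u32 result set_queue_sched(u16 qid, u16 sid)": attach (or move)
   a queue to a configured scheduler; 0 on success. */
uint32_t set_queue_sched(uint16_t qid, uint16_t sid)
{
    if (qid >= NQUEUES || sid >= NSCHED || !lc_sched[sid].valid)
        return 1;
    queue_sid[qid] = sid;
    return 0;
}

/* Hypothetical helper: number of queues currently feeding scheduler sid. */
int queues_on_sched(uint16_t sid)
{
    int n = 0;
    for (int q = 0; q < NQUEUES; ++q)
        if (queue_sid[q] == sid)
            ++n;
    return n;
}
```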
Slice Example
• Get the list of interfaces, their IP addresses and available bandwidth:
   if_list = {if_entry, ...}
   if_entry = {u16 ifn,      // logical interface number
               u16 type,     // peering or multi-access
               u32 ipaddr,   // interface's IP address
               u32 linkBW,   // link's native BW
               u32 availBW}  // BW available for allocation
   struct epoint_t {u32 ipaddr;  // interface's IP address
                    u16 port;    // UDP port number for meta-interface
                    u32 bw;};    // total BW required for meta-interface
   iflist = get_interfaces();    // return list of all available interfaces
• Estimate the computational complexity and memory bandwidth requirements on the NPE:
   bwSpec = {BWmax=totalBW, BWmin=0};  // fast-path total BW requirement
• Set the maximum general NPE resource counts. For this example I just assume a maximum number, but in general a user may scale it by the number of meta-interfaces they will use:
   fpCnts = {FLTR_CNT, QID_CNT, BUFF_CNT, STATS_CNT};
• Request that the substrate allocate a fast-path instance for the IPv4 code option, assuming we will use the default SRAM buffer sizes. We will also need to listen on the returned sockets:
   [fpid, sockets] = alloc_fastpath(ipv4_copt, bwSpec, fpCnts, {IPV4_SRAM_SZ, 0});
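The slide leaves `totalBW` implicit. Because the continuation allocates the same `miBW` on every interface, one reading is that `totalBW` is `miBW` times the number of interfaces, and that the plan is only feasible if every interface has `miBW` available. The sketch below computes `bwSpec` under that assumption; `plan_bwspec` and `bwspec_t` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint16_t ifn, type; uint32_t ipaddr, linkBW, availBW; } if_entry;
typedef struct { uint32_t BWmax, BWmin; } bwspec_t;  /* {BWmax=totalBW, BWmin=0} */

/* Assumes one meta-interface of miBW per interface. Fills *spec and
   returns 0 if every interface can carry miBW, otherwise returns -1. */
int plan_bwspec(const if_entry *iflist, size_t n, uint32_t miBW, bwspec_t *spec)
{
    uint64_t total = 0;  /* 64-bit so the sum cannot silently wrap */
    for (size_t i = 0; i < n; ++i) {
        if (miBW > iflist[i].availBW)
            return -1;
        total += miBW;
    }
    if (total > UINT32_MAX)
        return -1;
    spec->BWmax = (uint32_t)total;
    spec->BWmin = 0;
    return 0;
}
```

Checking feasibility before calling `alloc_fastpath` avoids reserving a fast path that the per-interface allocations would later reject.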
Slice Example - Continued
• Allocate one meta-interface for each external interface and assign our default UDP port number and BW requirement:
   struct mi_t {uint_t mi; epoint_t rp;};
   mi_t milist[len(iflist)];
   for (indx = 0; indx < len(iflist); ++indx) {
       if (miBW > iflist[indx].availBW)
           throw Error;
       // allocate the total BW required on this interface
       if (alloc_fpath_ifbw(fpid, iflist[indx].ifn, miBW) == -1)
           throw Error;
       // allocate one meta-interface on this interface
       milist[indx] = alloc_udp_tunnel(fpid, miBW, iflist[indx].ipaddr, myPort);
       my_bind_queues(milist + indx);
       my_add_routes(milist + indx);
   }
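The loop above can be made compilable by stubbing the RMP calls it uses. In the sketch below, `alloc_fpath_ifbw`, `reles_fpath_ifbw` and `alloc_udp_tunnel` are hypothetical stand-ins backed by a local interface table, not the real RMP library, and `throw Error` is replaced by C-style error handling: the BW already reserved on earlier interfaces is released before returning -1. `setup_meta_interfaces` is an assumed name, and `my_bind_queues`/`my_add_routes` are left out.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint16_t ifn, type; uint32_t ipaddr, linkBW, availBW; } if_entry;
typedef struct { uint32_t ipaddr; uint16_t port; uint32_t bw; } epoint_t;
typedef struct { unsigned mi; epoint_t rp; } mi_t;

/* ---- Hypothetical stand-ins for the RMP calls, backed by one table ---- */
static if_entry *stub_ifs;  /* interface table the stubs account against */
static size_t    stub_n;
static unsigned  next_mi = 1;

static if_entry *stub_find(uint16_t ifn)
{
    for (size_t i = 0; i < stub_n; ++i)
        if (stub_ifs[i].ifn == ifn)
            return &stub_ifs[i];
    return NULL;
}

static int alloc_fpath_ifbw(int fpid, uint16_t ifn, uint32_t bw)
{
    (void)fpid;
    if_entry *e = stub_find(ifn);
    if (e == NULL || bw > e->availBW)
        return -1;
    e->availBW -= bw;
    return 0;
}

static int reles_fpath_ifbw(int fpid, uint16_t ifn, uint32_t bw)
{
    (void)fpid;
    if_entry *e = stub_find(ifn);
    if (e == NULL)
        return -1;
    e->availBW += bw;
    return 0;
}

static mi_t alloc_udp_tunnel(int fpid, uint32_t bw, uint32_t ipaddr, uint16_t port)
{
    (void)fpid;
    mi_t m = { next_mi++, { ipaddr, port, bw } };
    return m;
}

/* The slide's loop in C. On failure, release the interface BW reserved so
   far and return -1 (a complete version would also free the tunnels). */
int setup_meta_interfaces(int fpid, if_entry *iflist, size_t n,
                          uint32_t miBW, uint16_t myPort, mi_t *milist)
{
    for (size_t indx = 0; indx < n; ++indx) {
        if (miBW > iflist[indx].availBW ||
            alloc_fpath_ifbw(fpid, iflist[indx].ifn, miBW) == -1) {
            while (indx-- > 0)
                reles_fpath_ifbw(fpid, iflist[indx].ifn, miBW);
            return -1;
        }
        milist[indx] = alloc_udp_tunnel(fpid, miBW, iflist[indx].ipaddr, myPort);
    }
    return 0;
}
```

Rolling back on failure keeps the slice from leaking interface BW when only some of its meta-interfaces can be placed.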
Test SPP Node
[Figure: network diagram of the test SPP node. Recoverable labels:]
• CP: runs srm and dhcpd (/etc/dhcpd.conf, ethers, hosts); f1/0, b1, eth0, eth0.2 (VLAN 2), eth2; noarp
– cp_ctrl = 192.168.64.1/20, cp_data = 171.16.1.1/26
• GPE1 (Slot 2): f1/1*, f2/1*, b2, eth0, eth0.2 (VLAN 2), eth1, eth2; noarp
– gpe1_ctrl = 192.168.64.33/20, gbe1_data = 171.16.1.2/26, gpe1_int = 172.16.1.66/26
• GPE2 (Slot 3): f1/0, f1/1, b1, eth0, eth0.2 (keystone.arl.wustl.edu, VLAN 2), eth1, eth2; noarp
– gpe2_ctrl = 192.168.64.49/20, gbe2_data = 171.16.1.3/26, gpe2_int = 172.16.1.67/26
• GPE3 (Slot 4): f1/0, f1/1, b1, eth0, eth0.2 (keystone.arl.wustl.edu), eth1, eth2; noarp
– gpe2_ctrl = 192.168.64.65/20, gbe2_data = 171.16.1.4/26, gpe2_int = 172.16.1.68/26
• GPE4 (Slot 5): f1/0, f1/1, b1, eth0, eth0.2 (keystone.arl.wustl.edu), eth1, eth2; noarp
– gpe2_ctrl = 192.168.64.81/20, gbe2_data = 171.16.1.5/26, gpe2_int = 172.16.1.69/26
• Line Card (Slot 6): f1/0; scd on the Ingress XScale and on the Egress XScale
– b1a eth0: lc_b1a = 192.168.64.97/20, lc1_data = 171.16.1.6/26...
– b1b eth0: lc_b1b = 192.168.64.98/20
• “Router” host (128.252.153.XXX): eth0, eth0:0, eth1 192.168.64.2, eth2.2 128.252.153.YYY; IP routing, proxy ARP for keystone
• The ARL network 128.252.153.*: keystone.arl.wustl.edu = 128.252.153.81
• Other labels: natd, vlan 2
• TFTP boot images: /tftpboot/ramdisk.gz, zImage.ppm10
Issue: mounting /opt/crossbuild/* from ebony. We could export the directories from the “Router” host, or use ebony rather than “Router”. In that case we will need an external switch connecting the line cards of spp? to ebony’s eth2.2.
• Board configuration files (shown four times in the diagram): /etc/{ethers,hosts}, /etc/sysconfig/network-scripts/ifcfg-eth*
• Hub (192.168.64.17), VLAN 2; port labels: 0/30/4, 0/5, 0/6, RTM3/2, RTM3/1, FP1/6, FP1/9, 0/30/4, 0/5, 0/6, FP1/7, 2/1
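The diagram places the control addresses in a /20 (192.168.64.0 to 192.168.79.255) and the data and internal addresses in /26 blocks. A quick way to check which addresses share a subnet is to compare them under the prefix mask. The helper below is a sketch (`same_subnet` is a hypothetical name); it relies only on the POSIX `inet_pton` and `ntohl` calls.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Parse a dotted-quad string into a host-order u32; 0 on success. */
static int parse_ipv4(const char *s, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, s, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);
    return 0;
}

/* Netmask for a prefix length; len 0 is special-cased to avoid a 32-bit shift. */
static uint32_t prefix_mask(int len)
{
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* 1 if a and b fall in the same /len subnet, 0 if not, -1 on bad input. */
int same_subnet(const char *a, const char *b, int len)
{
    uint32_t x, y;
    if (len < 0 || len > 32 || parse_ipv4(a, &x) != 0 || parse_ipv4(b, &y) != 0)
        return -1;
    uint32_t m = prefix_mask(len);
    return (x & m) == (y & m);
}
```

For example, cp_ctrl (192.168.64.1/20) and lc_b1a (192.168.64.97) share the control subnet, while 172.16.1.66 to 172.16.1.69 share the /26 that spans 172.16.1.64 to 172.16.1.127.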
Test Bed Use
• Core platform issues:
– Can we use the second fabric port on the GPE boards?
– The hub no longer displays stats or MAC forwarding entries for the slots with GPEs; it used to work.
– The Radisys shelf manager:
   • does not reliably reset boards
   • Base1 interface disabled on slot 2
• NAT/Line Card testing
– Overall reliability
– Add support for aging
– Specific issues (jdd):
   • Restarting the line card (without a reboot) occasionally results in the data path thinking the scratch ring to the XScale is full.
   • A looping iperf test from the CP occasionally stalls, with no packets getting through the LC.
   • Lookup needs a fix so that it does not use the DONE bit to indicate that a TCAM lookup is complete.
• GPE/Intel board testing