Server Base Manageability Guide for SBSA Compliant Arm (AARCH64) Servers
Supreeth Venkatesh, Staff System Architect, Arm
Open System Firmware
Rationale
• Arm Ecosystem Partners value standardized, common server manageability capabilities, with scope for flexible customizations that add value for the end user.
• Standardization is key to ensuring that the Arm Ecosystem does not get fragmented by the point solutions that plague the industry today.
• Leverage industry-standard system management specifications including, but not limited to, Redfish, Platform Level Data Model (PLDM) and Management Component Transport Protocol (MCTP), as defined by the Distributed Management Task Force (DMTF).
• Leverage Hardware Management Specifications and designs as defined by the Open Compute Project (OCP).
History
• Server Base System Architecture (SBSA)
-Hardware system architecture for servers based on 64-bit ARM processors.
-Standardizes processor element features and key aspects of system architecture.
-http://infocenter.arm.com/help/topic/com.arm.doc.den0029b/Server_Base_System_Architecture_v5_0_ARM_DEN_0029B.pdf
• Server Base Boot Requirements (SBBR)
-Defines the Boot and Runtime Services expected by an enterprise platform Operating System or hypervisor, for an SBSA-compliant Arm AArch64 server.
-Based on UEFI, SMBIOS and ACPI specifications.
-http://infocenter.arm.com/help/topic/com.arm.doc.den0044c/Server_Base_Boot_Requirements_v1_1_Arm_DEN_0044C.pdf
History
• ARM Enterprise ACS
-Architecture Compliance Suite tests SBSA and SBBR specifications.
-https://github.com/ARM-software/arm-enterprise-acs
[Figure: ARM Enterprise ACS components: FVP, FWTS, Arm Partner OSS, SBBR and SBSA tests, SCT, PAL, TF-A, UEFI, LuvOS]
Introduction
• Server Base Manageability is a specification that is under development in conjunction with partners across the industry.
• Together with SBSA and SBBR, the SBMG provides a standards-based approach to building Arm servers, their firmware and their server management capabilities.
• SBMG is developed within the Arm ServerAC community.
• The engineering change request process is similar to that of other standards bodies.
• Anybody in the community can raise a change request, visible to the whole community, by sending a mail to [email protected] or by raising a public ticket on the Mantis DB: https://atg-mantis.arm.com/
Introduction – Compliance Levels
• Defines several levels of manageability compliance (e.g., M0, M1, M2).
• Level M0
-SoC-BMC Interfaces: IMPLEMENTATION DEFINED
-BMC-Platform Elements Interfaces: IMPLEMENTATION DEFINED
-BMC Management Services Interfaces: IMPLEMENTATION DEFINED
-Host Interface: IMPLEMENTATION DEFINED
-BMC-IO Device Interface: IMPLEMENTATION DEFINED
• Level M1
-SoC-BMC Interfaces: IMPLEMENTATION DEFINED
-BMC-Platform Elements Interfaces: IMPLEMENTATION DEFINED
-BMC Management Services Interfaces: Redfish and IPMI
-Host Interface: IPMI based Host Interface and Redfish Host Interface/MCTP Host Interface
-BMC-IO Device Interface: IMPLEMENTATION DEFINED
• Level M2
-SoC-BMC Interfaces: IMPLEMENTATION DEFINED
-BMC-Platform Elements Interfaces: Redfish/PLDM/MCTP
-BMC Management Services Interfaces: Redfish and IPMI
-Host Interface: IPMI based Host Interface and Redfish Host Interface/MCTP Host Interface
-BMC-IO Device Interface: NC-SI
Introduction – Sub teams
• To help guide the Arm server designers to provide common manageability functions to the end users.
• To accelerate development of Server Base Manageability Guide (SBMG), three different teams within Arm ServerAC community with participation from several different Arm Ecosystem Partners have been formed.
-Reliability, Availability, Serviceability (RAS) team.
-Platform Monitoring and Control team.
-Remote Debug team.
Introduction – RAS
• Define error record formats for RAS errors, leveraging the existing Common Platform Error Record (CPER) as specified in the Unified Extensible Firmware Interface (UEFI) specification.
• Define the SoC-BMC manageability interface requirements for RAS errors, including in-band and out-of-band interfaces.
• Define the fault notification signal.
• Define the interface and mechanism for injecting RAS hardware errors.
Introduction – RAS
Message sequence between the BMC, the Satellite Management Controller and the Application Processor (SoC):
• EventMessageSupported(Version, TID)
-return(EventClass)
-RAS Event Supported? Yes
• EventMessageBufferSize(CperSize)
-return(MaxBufferSize)
-MaxBufferSize < CperSize? Yes
• RAS Error Event
-Generate/Store CPER
-CPER/Async Event
• PlatformEventMessage(RasPollEvent)
-return(PLDM_BASE_CODE)
• PollForPlatformEventMessage(GetFirstPart, 0x0000)
-return(nextDataHandle, Start=0x01, RasPollEvent, EventDataSize, EventData)
• PollForPlatformEventMessage(GetNextPart, 0xFFFF)
-return(nextDataHandle, Middle=0x02, RasPollEvent, EventDataSize, EventData)
• PollForPlatformEventMessage(GetNextPart, 0xFFFF)
-return(nextDataHandle, End=0x04, RasPollEvent, EventDataSize, EventData, EventDataIntegrityChecksum)
• PollForPlatformEventMessage(AcknowledgementOnly, 0xFFFF)
-return(nextDataHandle, StartAndEnd=0x05, EventID=0x0000)
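The multi-part poll in the sequence above can be sketched in Python as follows. This is only an illustration under stated assumptions: `FakeTerminus` and `fetch_event` are hypothetical names, the reply is modelled as a plain dictionary rather than the PLDM wire format, and the integrity checksum is assumed to be a CRC-32 over the whole reassembled record. The transfer flag values and the 0x0000/0xFFFF arguments are taken from the slide.

```python
import zlib

# Transfer flags as shown in the sequence on the slide.
START, MIDDLE, END, START_AND_END = 0x01, 0x02, 0x04, 0x05


class FakeTerminus:
    """Hypothetical stand-in for the satellite MC holding one stored CPER."""

    def __init__(self, cper: bytes, max_buffer_size: int):
        # Split the record into parts no larger than the negotiated buffer.
        self.chunks = [cper[i:i + max_buffer_size]
                       for i in range(0, len(cper), max_buffer_size)] or [b""]
        self.checksum = zlib.crc32(cper)  # assumption: CRC-32 over the record
        self.index = 0

    def poll(self, operation: str, handle: int) -> dict:
        """Answer one PollForPlatformEventMessage-style request."""
        if operation == "AcknowledgementOnly":
            return {"flag": START_AND_END, "event_id": 0x0000}
        if operation == "GetFirstPart":
            self.index = 0
        chunk = self.chunks[self.index]
        first = self.index == 0
        last = self.index == len(self.chunks) - 1
        self.index += 1
        if first and last:
            flag = START_AND_END
        elif first:
            flag = START
        elif last:
            flag = END
        else:
            flag = MIDDLE
        reply = {"flag": flag, "data": chunk, "next_handle": self.index}
        if last:
            reply["checksum"] = self.checksum  # only sent with the final part
        return reply


def fetch_event(terminus) -> bytes:
    """BMC side: poll until the End flag, verify, then acknowledge."""
    reply = terminus.poll("GetFirstPart", 0x0000)
    data = bytearray(reply["data"])
    while reply["flag"] not in (END, START_AND_END):
        reply = terminus.poll("GetNextPart", 0xFFFF)
        data += reply["data"]
    if zlib.crc32(bytes(data)) != reply["checksum"]:
        raise ValueError("event data integrity check failed")
    terminus.poll("AcknowledgementOnly", 0xFFFF)
    return bytes(data)


if __name__ == "__main__":
    record = bytes(range(200))  # placeholder bytes, not a real CPER
    print(len(fetch_event(FakeTerminus(record, max_buffer_size=64))))
```

Splitting on the buffer size mirrors the "MaxBufferSize < CperSize" branch: when the record fits in one buffer, the same loop ends after the first reply.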
Introduction – Monitoring & Control
• Define the interfaces and protocols needed for BMC-SoC communication in the scope of Platform Monitoring, leveraging the Open Hardware Management Specification for Remote Machine Management defined by OCP.
• List of use cases analyzed for standardizing:
-BMC to multiple SoC communication.
-BMC assisted SoC power actions.
-BMC to monitor critical health of SoC.
-BMC to monitor SoC boot progress.
-BMC watchdog use cases.
Introduction – Monitor & Control
Message sequence between the BMC and the Satellite Management Controller:
• RunInitAgent(PLDMTerminusOnline)
• GetPDRRepositoryInfo()
-return(PDRInfo)
• GetPDR(0x0000,..)
-return(nextRecordHandle)
• GetPDR(recordHandle,..)
-return(0x0000)
• Create Central PDR Repo
• RunInitAgent(SystemHardReset)
• GetSensorReading(SensorID, ...)
-return(presentReading, ...)
• Calculate Reading
Reading Conversion formula: Y = (m * X + B)
Where:
Y = converted reading in Units
X = reading from sensor
m = resolution from PDR in Units
B = offset from PDR in Units
Units = sensor/effecter Units, based on the Units and auxUnits fields from the PDR for the numeric sensor
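A minimal Python sketch of the two steps above: walking the PDR repository until the next record handle is 0x0000, then applying Y = (m * X + B) to a raw reading. The `NumericSensorPDR` class, the `make_fake_get_pdr` helper and the sample sensor values are hypothetical and only stand in for a real terminus; no PLDM library is used.

```python
from dataclasses import dataclass


@dataclass
class NumericSensorPDR:
    """Hypothetical subset of a numeric sensor PDR."""
    record_handle: int
    sensor_id: int
    resolution: float  # m in the formula
    offset: float      # B in the formula
    units: str


def make_fake_get_pdr(records):
    """Build a GetPDR-like callable over an in-memory list (illustration only)."""
    order = [r.record_handle for r in records]
    by_handle = {r.record_handle: r for r in records}

    def get_pdr(handle):
        if handle == 0x0000:  # 0x0000 asks for the first record
            handle = order[0]
        pos = order.index(handle)
        next_handle = order[pos + 1] if pos + 1 < len(order) else 0x0000
        return by_handle[handle], next_handle

    return get_pdr


def walk_pdr_repository(get_pdr):
    """Collect every PDR by following nextRecordHandle until it is 0x0000."""
    repo = {}
    handle = 0x0000
    while True:
        pdr, next_handle = get_pdr(handle)
        repo[pdr.sensor_id] = pdr
        if next_handle == 0x0000:
            return repo
        handle = next_handle


def convert_reading(pdr, raw):
    """Apply Y = (m * X + B) using the resolution and offset from the PDR."""
    return pdr.resolution * raw + pdr.offset


if __name__ == "__main__":
    get_pdr = make_fake_get_pdr([
        NumericSensorPDR(1, 10, 0.5, -40.0, "degrees C"),
        NumericSensorPDR(2, 11, 0.01, 0.0, "volts"),
    ])
    repo = walk_pdr_repository(get_pdr)
    print(convert_reading(repo[10], 150), repo[10].units)
```

Keying the central repository by sensor ID is a design choice for this sketch; it lets a later GetSensorReading result be converted with a single lookup.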
Introduction – Remote Debug
• Server Remote Debug is the act of gaining visibility of the hardware and software behaviors of an SoC, using a debugger which is not directly connected to the Server SoC.
• Define protocols for communicating between the debugger and the BMC.
• Define physical interfaces between BMC and SoC.
• Define protocols for communicating between the BMC and the SoC.
• Define mechanisms for ensuring only suitable debuggers can access the SoC.
Proof of Concept
• OpenBMC is a Linux Foundation project. It is a highly extensible framework for BMC software, with an implementation for data-center computer systems.
• The OpenBMC implementation will be used as the medium to realize the Server Base Manageability Guide.
• Arm and Arm SiPs are participating in OpenBMC development so that there will be an open source implementation of the SBMG requirements.
• Arm has joined the OpenBMC Technical Steering Committee (Arm, IBM, Intel, Facebook, Google & Microsoft).
-https://github.com/openbmc/docs/blob/master/README.md#technical-steering-committee
Implementation
Proof of Concept – RAS
Call sequence between RAS App (BMC), libPLDM (BMC), libMCTP (BMC), libMCTP (Satellite MC), libPLDM (Satellite MC) and RAS Driver (Satellite MC):
• Daemon Starts (BMC) and Driver Starts (Satellite MC); each side initializes its stack:
-Init(Blocking/Non-Blocking)
-mctp_init, return (InitCode)
-mctp_binding_int(), return (BindInitCode)
-return (InitCode)
• encode_pldm_cmd(PlatformEventMessage, RASPollEvent), return (pldm_msg)
-send_msg(mctp_eids, pldm_msg)
-mctp_message_tx(msg, len)
-Binding Physical Layer
-PldmRequestNotification
-Generate PLDM Response
• Response path
-send_msg(mctp_eids, pldm_msg)
-mctp_message_tx(msg, len)
-Binding Physical Layer
-PldmResponseNotification
-decode_pldm_cmd(pldm_response, ), return (pldm_response), Success
• encode_pldm_cmd(PollforPlatformEventMessage, GetFirstPart), return (pldm_msg)
-send_msg(mctp_eids, pldm_msg)
-mctp_message_tx(msg, len)
-Binding Physical Layer
-PldmRequestNotification
-Generate PLDM Response
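The layering in this call sequence (encode, send, transmit over a binding, notify, decode) can be illustrated with a small single-process Python simulation. Everything here is an assumption made for illustration: the functions only borrow the names on the slide and are not the libpldm or libmctp C APIs, the JSON encoding is a toy format rather than the PLDM wire format, and `LoopbackBinding` replaces the physical binding with in-memory queues.

```python
import json
from collections import deque


class LoopbackBinding:
    """Stand-in for the binding physical layer: one queue per endpoint ID."""

    def __init__(self):
        self.queues = {}

    def attach(self, eid: int) -> None:
        self.queues[eid] = deque()

    def tx(self, dest_eid: int, msg: bytes) -> None:
        self.queues[dest_eid].append(msg)

    def rx(self, eid: int) -> bytes:
        return self.queues[eid].popleft()


def encode_pldm_cmd(command: str, argument: str) -> bytes:
    # Toy encoding only; a real stack would build a PLDM message here.
    return json.dumps({"cmd": command, "arg": argument}).encode()


def decode_pldm_cmd(msg: bytes) -> dict:
    return json.loads(msg.decode())


def send_msg(binding: LoopbackBinding, dest_eid: int, pldm_msg: bytes) -> None:
    # Plays the role of send_msg -> mctp_message_tx -> binding on the slide.
    binding.tx(dest_eid, pldm_msg)


def satellite_service(binding: LoopbackBinding, own_eid: int, bmc_eid: int) -> None:
    """Satellite MC side: receive one request and generate a response."""
    request = decode_pldm_cmd(binding.rx(own_eid))
    send_msg(binding, bmc_eid, encode_pldm_cmd(request["cmd"], "Success"))


def bmc_request(binding, bmc_eid, sat_eid, command, argument) -> dict:
    """BMC side: encode, send, let the satellite answer, decode the response."""
    send_msg(binding, sat_eid, encode_pldm_cmd(command, argument))
    satellite_service(binding, sat_eid, bmc_eid)  # inline, single-threaded
    return decode_pldm_cmd(binding.rx(bmc_eid))


if __name__ == "__main__":
    binding = LoopbackBinding()
    binding.attach(8)  # hypothetical BMC endpoint ID
    binding.attach(9)  # hypothetical satellite MC endpoint ID
    print(bmc_request(binding, 8, 9, "PlatformEventMessage", "RASPollEvent"))
```

Keeping the binding behind `tx` and `rx` is what would let the same request code run over a different physical layer, which is the point of the separation shown in the diagram.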
Proof of Concept – Remote Debug
• The proposed design is to integrate OpenOCD within the OpenBMC stack.
-https://lists.ozlabs.org/pipermail/openbmc/2019-July/017122.html
• OpenOCD is an open source on-chip debugging solution for JTAG connected processors. It enables source level debugging with GNU gdb. It can also integrate with any GDB-aware IDE, such as Eclipse.
• Completed the first phase of implementation, which includes a demonstration of a GDB connection on port 3333 to the OpenOCD debug server daemon running on OpenBMC.
• The next phase of implementation is to use a wrapper that adds JTAG master core infrastructure by defining a new JTAG class and providing a generic JTAG interface that hardware specific drivers can connect to.
-https://patchwork.ozlabs.org/cover/848652/
• This will enable all JTAG drivers to use the common interface part, with separate drivers for each hardware implementation.
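As a rough illustration of what a GDB connection on port 3333 carries, the Python sketch below frames a GDB Remote Serial Protocol packet (`$payload#checksum`, where the checksum is the payload bytes summed modulo 256) and sends one query to a debug server. The host name in the usage line is hypothetical, and the sketch assumes the OpenOCD daemon on the BMC is reachable over TCP; it is not part of the proof of concept itself.

```python
import socket


def frame_packet(payload: str) -> bytes:
    """Wrap a payload as $<payload>#<two hex digit modulo-256 checksum>."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}".encode("ascii")


def query_supported(host: str, port: int = 3333, timeout: float = 2.0) -> bytes:
    """Send a qSupported query and return whatever the server answers first."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_packet("qSupported"))
        return sock.recv(4096)


if __name__ == "__main__":
    print(frame_packet("qSupported"))
    # Hypothetical BMC address; uncomment when a debug server is reachable.
    # print(query_supported("bmc.example.net"))
```

In practice a debugger such as gdb performs this framing itself; the sketch only shows that the BMC-hosted server is addressed like any other remote target.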
Roadmap/Plan
• Host Firmware
-Management Component Transport Protocol Enablement.
-Reference Implementation to demo Control and Monitor on board and on SOC sensors of a chosen reference platform. Reference PLDM enablement to send sensor data when requested by MC.
-Reference Implementation to demo Remote Firmware upgrade Capability on a chosen reference platform. Leverage existing UEFI capsule update and arm-tf update methods.
-Reference Implementation to demo BMC RAS Errors on a chosen reference platform. Leverage existing CPER records.
-Reference Implementation to demo Remote Debug Capability on a chosen reference platform. JTAG Interface.
-Security requirements/Root of Trust.
• Management Controller
-Pick/Choose Reference Hardware Platform for OpenBMC Enablement. Enable OpenBMC. Demonstrate Reference IPMI/REDFISH Commands.
-Reference Implementation to demo Control and Monitor on board and on SOC sensors of a chosen reference platform. IPMI/REDFISH Command implementation to gather and control on board and on SOC sensor.
-Reference Implementation to demo Remote Firmware upgrade Capability on a chosen reference platform from BMC. Leverage PLDM for Firmware Update standard.
-Reference Implementation to demo transfer of Binary Encoded JSON (BEJ) RAS errors from SOC to BMC. IPMI/REDFISH implementation to transfer RAS errors.
-Reference Implementation to demo Remote Debug Capability from BMC. Integrate OpenOCD in BMC stack.
-Sync up with Security team.
• Standards
-PMCI WG Participation – MCTP, PLDM, Redfish.
-PMCI WG Participation – MCTP, PLDM, Redfish. Whitepaper/Reference Solution/ECR - Control and Monitor on board and on SOC sensors on Arm servers.
-Arm Specific Requirement Updates to MCTP, PLDM and Redfish. Whitepaper/Reference Solution/ECR - Remote Firmware upgrade Capability.
-Arm Specific Requirement Updates to MCTP, PLDM and Redfish. Whitepaper/Reference Solution/ECR - BMC RAS.
-https://www.dmtf.org/standards/pmci (Target PMCI documents.) Whitepaper/Reference Solution/ECR - Remote Debug from BMC.
-PMCI Security Updates. Interface Standardization.
Call to Action
• Participate in OpenBMC to enable reference implementation and open source delivery option.
• Participate in OCP to influence Hardware Management Specifications and designs.
• Participate in ServerAC to help define SBMG.
• Send an email to Arm ([email protected]).
• Participate in Redfish, PMCI and other DMTF Workgroups.