QNX Software Systems
Charles Eagan, Engineering Vice President
All content copyright QNX Software Systems
QNX: The choice of Networking Leaders
Our Markets
Military, Industrial Automation, Networking, Medical, Automotive, Consumer
Our Fastest Growing Market
• QNX is #1 in automotive
• 140 car models with QNX
Our Largest Potential Market
• 2nd fastest growing market
• Largest customer (Cisco)
• Largest potential revenue
Our Largest Market
• Most customers today
• Most revenue today
Stable and Healthy Markets
• Infotainment suitable for consumer
Networking
QNX: An excellent Networking partner
Networking Decision
How does a networking company decide on an operating system strategy?
► Many executives are not aware of the productivity implications of a commercial operating system and development tools
► Engineers who are aware of the implications are often unable to significantly influence decisions
► The time window in which an operating system can effectively be transitioned is narrow
► Strong leadership and determination are required to make an effective transition
Why Companies Choose QNX
• Synergistic Engineering Culture
• Flexible Operations Model
• Customized Business Terms
• Advanced Technology Suite
• Competitive Roadmap
• Tools and Development Aid
Development Dynamics
Cisco chose QNX as a strategic partner in the 1996/1997 timeframe
This is notable because at the time QNX was primarily an Intel-based technology and Cisco was mostly MIPS-based
The skills and abilities of the QNX team, combined with engineering chemistry and aligned roadmaps and vision, led to an innovative joint collaboration
The use of QNX technologies has spread to more than 10 different groups within Cisco
Initial Public Announcement
QNX Ecosystem
Figure: QNX Core Technology and Tools Suite, shown with the Cisco platforms it supports: CRS-1, 12000, Catalyst 6k Ethernet switch, Cisco family of service ports adapters, and many other hardware platforms
Important Technology Areas
Highly Available Fast Architecture
Flexible Scalable Architecture – Fully distributed or monolithic
Secure
Using/following technology and development standards
► IEEE
► POSIX/Unix
► Java/C++/C/gcc
Flexible
► Endian abstraction
► Processor neutral – MIPS/PPC/Intel
Productivity Tools
QNX Confidential. All content copyright QNX Software Systems.
Scalable Solutions for Cisco
QNX Neutrino used in applications from framer interface support to CRS-1
From 1 CPU to thousands networked and functioning as a single compute resource
Industry Leading Multi-Core
Asymmetric Multiprocessing
• Support existing software base, non-optimized uni-processor approach
• Heterogeneous OS approaches require AMP
• Sharing resources in AMP is non-trivial, scaling beyond dual core is even tougher
Bound Multiprocessing
• Migrate existing software base
• Mix existing applications with multi-core optimized applications
• Transparent scaling beyond dual core
• QNX Pioneer
Symmetric Multiprocessing
• Multi-core optimized applications
• Resource sharing handled by OS
• Transparent scaling beyond dual core
Design Needs
The QNX solution enables software transition to multi-core processors:
• Proven OS support for any multi-core processing model
• Full suite of development tools to characterize and optimize multi-core applications
• Expert professional services and support
• Wide range of multi-core board support packages
Highly Available Architecture
Figure: Ingress, Egress, File System, Process Manager, BGP, ISIS and Forwarding processes attached to the microkernel (µK) message bus
The best availability architecture in the world: processes communicate by sending messages
Using Messages:
• Cleanly decouples processes
• POSIX calls built on messages
Memory Protection
• The most important technology that is mandatory for true system availability
• 90% of all system failures are due to foreign memory scribblers
Shared memory
• Large data sets and hardware access
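The message-passing idea on this slide can be sketched in a few lines. This is not QNX code and does not use the Neutrino API; it is a minimal illustration with Python threads and queues, and every name in it (`Channel`, `file_server`, `posix_like_read`) is hypothetical. It shows how a POSIX-style call can be a thin wrapper around a send/receive/reply exchange, so that client and server share no state.

```python
import queue
import threading


class Channel:
    """Hypothetical stand-in for a message channel owned by a server."""

    def __init__(self):
        self._inbox = queue.Queue()

    def send(self, msg):
        """Client side: send a message and block until the server replies."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply_box))
        return reply_box.get()

    def receive(self):
        """Server side: block until a message arrives."""
        return self._inbox.get()


def file_server(channel, files):
    """Serve read requests until a 'shutdown' message arrives."""
    while True:
        msg, reply_box = channel.receive()
        if msg["type"] == "shutdown":
            reply_box.put({"status": "ok"})
            return
        if msg["type"] == "read":
            data = files.get(msg["path"])
            if data is None:
                reply_box.put({"status": "ENOENT"})
            else:
                reply_box.put({"status": "ok", "data": data})
        else:
            reply_box.put({"status": "ENOSYS"})


def posix_like_read(channel, path):
    """A read()-style call implemented purely as a message exchange."""
    reply = channel.send({"type": "read", "path": path})
    if reply["status"] != "ok":
        raise OSError(reply["status"])
    return reply["data"]


def demo():
    """Start a server thread, read one file through it, then shut it down."""
    channel = Channel()
    server = threading.Thread(
        target=file_server, args=(channel, {"/etc/motd": b"hello"}))
    server.start()
    data = posix_like_read(channel, "/etc/motd")
    channel.send({"type": "shutdown"})
    server.join()
    return data
```

The client only ever sees the channel, so the server could be replaced or moved without changing `posix_like_read`.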
Flexible Architecture – Fully Distributed or Monolithic
Figure: message buses on separate microkernels (µK) linked by a Message Bridge over the internet, hosting File System, Process Manager, OSPF, BGP, MPLS, Netflow, Networking, Traffic Engineering and Custom Application processes
Bridging the kernel allows messages to flow transparently from one message bus to another over a variety of transports (Ethernet, MOST, custom switching fabric, Internet, …)
Applications and servers become network distributed without any special code.
You gain unified access to all remote hardware and software resources with permission checking.
Adding new services on any CPU can transparently provide that service to all CPUs
System functions as one single router seamlessly across many individual loosely or tightly coupled CPUs
Distributed Computing Architecture
Figure: a distributed unified router. A route processor / routing controller (OSPF, BGP) and the line card / forwarding plane (MPLS, BGP) run on CPUs using SMP, BMP, AMP and DMP, joined by a transparent, secure distribution protocol
Applications can seamlessly move to any line card
Auto-discovery and Load Balancing
Figure: two nodes, each with a microkernel and a message-passing bus, joined by a Message Bridge (Ethernet, fabric, interconnect…). Components shown: Flash File System, Database, Application, Message Queues, Networking Stack, Internet
QNX Transparent Distributed Processing
► Distributed POSIX model
► Framework for dynamic interconnection of hardware and software among remote nodes
► Global Name Service for discovery of new hardware and applications
► Stop applications on one node and restart on a new node
• No reboot required
• All connections are maintained transparently
► In use extensively in CRS-1
Security Principles
Separation of privilege
► Different privilege levels available to different applications
► Lowest level of privilege required assigned to application
Complete mediation
► Check all accesses, no exceptions
Fail-safe defaults
► Lowest level of privileges/access assigned by default
Design
► “Object oriented” design principles: abstract, modularize, encapsulate, isolate
► Very helpful if the OS “enforces” these principles
Resource protection at the application level
► Memory, CPU cycles, hardware registers, peripherals, etc.
Operating system architecture can greatly affect how (or even if) these principles can be applied
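Two of these principles, complete mediation and fail-safe defaults, are easy to show in miniature. The sketch below is purely illustrative and is not how the QNX kernel implements permissions; `ResourceGuard`, `AccessDenied` and the application and resource names are all hypothetical.

```python
class AccessDenied(Exception):
    """Raised when an application lacks the right it is trying to use."""


class ResourceGuard:
    """Hypothetical mediator: every access goes through check()."""

    def __init__(self):
        # Fail-safe default: an application with no entry has no rights.
        self._grants = {}

    def grant(self, app, resource, rights):
        """Give one application an explicit set of rights on one resource."""
        self._grants.setdefault(app, {})[resource] = frozenset(rights)

    def check(self, app, resource, right):
        """Deny unless the right was explicitly granted."""
        allowed = self._grants.get(app, {}).get(resource, frozenset())
        if right not in allowed:
            raise AccessDenied(f"{app} may not {right} {resource}")

    def read(self, app, resource, store):
        self.check(app, resource, "read")   # complete mediation on reads
        return store[resource]

    def write(self, app, resource, store, value):
        self.check(app, resource, "write")  # complete mediation on writes
        store[resource] = value
```

Because the grant table starts empty, forgetting to configure an application results in denial rather than in accidental access.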
Fault Removal and Recovery: Availability
Availability = MTBF / (MTBF + MTTR)
Recovery capability can be characterized by a system's “availability”
► The probability that a system or subsystem will perform its intended function at a given instant of time.
► MTBF is mean time between failures and MTTR is mean time to repair
99.999% availability (five nines) = less than 5.26 minutes of annual downtime (scheduled or unscheduled)
Networking companies and industry watchers constantly monitor these statistics
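The arithmetic behind these figures is short enough to check directly. The sketch below assumes a 365-day year; the function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail):
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR
```

For example, `annual_downtime_minutes(0.99999)` gives roughly 5.26 minutes, and `availability(9999, 1)` shows that a one-hour repair every 9,999 hours of operation yields four nines.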
System Guarantees: Increase Availability
Increase MTBF
► Test and debug (repeat often!)
► Most OSs provide many tools to increase MTBF
Also reduce MTTR
► Detect, contain, recover from error
► Availability approaches 100% as MTTR approaches 0
Recovery Scenarios
► System reboot (real-time executive, monolithic kernel)
• Seconds to minutes to recover
► Restart service (microkernel, monolithic application)
• Milliseconds (<< 1 second) to recover
Combination of microkernel + recovery framework
► Much easier to attain “five nines” availability
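To see why recovery time matters so much, compare the two recovery scenarios with the same failure rate. All three numbers below are assumptions chosen for illustration (one fault every 1000 hours, a 2-minute reboot, a 100 ms service restart); they are not measurements from any QNX or Cisco system.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_downtime_seconds(mtbf_s, mttr_s):
    """Expected downtime per year: unavailability times seconds in a year."""
    unavailability = mttr_s / (mtbf_s + mttr_s)
    return unavailability * SECONDS_PER_YEAR


MTBF_S = 1000 * 3600      # assumed: one fault every 1000 hours
REBOOT_MTTR_S = 120.0     # assumed: full system reboot
RESTART_MTTR_S = 0.1      # assumed: restart only the failed service

# Five nines allows 0.001% of the year as downtime.
FIVE_NINES_BUDGET_S = 0.00001 * SECONDS_PER_YEAR

reboot = annual_downtime_seconds(MTBF_S, REBOOT_MTTR_S)
restart = annual_downtime_seconds(MTBF_S, RESTART_MTTR_S)
```

Under these assumed numbers the reboot case costs roughly 17.5 minutes a year and misses the five-nines budget of about 315 seconds, while the service-restart case stays under one second.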
High Availability Framework - CPM
Developed with Cisco as lead customer
High Availability Framework
► Construct custom failure recovery scenarios
► Design your system to reconnect instantly and transparently to minimize downtime
Highly Available - CPM
High Availability Recovery Framework (CPM: Critical Process Monitor) monitors components and handles recovery of component failures
Guardian process provides software failover to ensure that the high availability process doesn’t become a single point of failure
Client-side library allows components to reconnect instantly and transparently
► User can easily add state information and customize recovery procedure
Can also provide heartbeat services to detect component hangs — this allows the system to monitor itself
Figure: CPM, Guardian, App and CPM checkpointed state
Critical Process Monitoring
Figure: Critical Process Monitor (CPM), CPM Guardian, Application A, Application B and two drivers running on the microkernel, with state information held in shared memory
1. Driver faults due to illegal access to memory outside memory-protected space
2. Kernel notifies CPM of process fault
3. Debug information on faulting process is collected (standard core file)
4. Driver exits and returns all resources to system; IPC channel destroyed
5. CPM restarts new driver
6. Driver IPC channels are reestablished by CPM client library
7. Driver requests information on last state checkpoint from CPM and service is restored
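The recovery sequence above can be sketched as a small simulation. It models the flow only: there are no real processes, kernel notifications or core files here, and the class names (`Driver`, `CriticalProcessMonitor`, `DriverFault`) and the `packets_handled` state are hypothetical, not part of the QNX High Availability Framework API.

```python
class DriverFault(Exception):
    """Stands in for a memory-protection fault reported by the kernel."""


class Driver:
    def __init__(self, checkpoint):
        # Step 7: a new instance resumes from the last checkpointed state.
        self.packets_handled = checkpoint.get("packets_handled", 0)

    def handle(self, packet):
        if packet == "bad":
            raise DriverFault("illegal memory access")  # step 1
        self.packets_handled += 1


class CriticalProcessMonitor:
    def __init__(self):
        self.checkpoint = {}   # state kept outside the driver
        self.restarts = 0
        self.fault_log = []    # stands in for collected debug information
        self.driver = Driver(self.checkpoint)

    def deliver(self, packet):
        try:
            self.driver.handle(packet)
            # Checkpoint after every successfully handled packet.
            self.checkpoint["packets_handled"] = self.driver.packets_handled
        except DriverFault as fault:
            self.fault_log.append(str(fault))       # steps 2-3
            self.driver = Driver(self.checkpoint)   # steps 4-7
            self.restarts += 1
```

Delivering the packets `"a"`, `"b"`, `"bad"`, `"c"` in that order leaves the monitor with one restart and a driver that has handled three packets: the fault costs only the faulting packet, because the replacement driver starts from the checkpoint rather than from zero.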
Dynamic Upgradeability
Figure: File System, Process Manager, Protocol Stack, Application, Audio Driver, Graphics Driver, … attached to the message bus; the microkernel is the only trusted component
Applications, File Systems and Drivers
► Exist as processes on a message bus
► Reside in memory-protected address space
► Can be started, stopped, added, removed, relocated and upgraded without rebooting
► Cannot corrupt other software components
Momentics: Eclipse Leader and Founder
Scalable, Reliable and High Performance
Out-of-the-box support for: Multiple hosts, targets, languages and BSPs
Optimizing compilers
Compatible with all 3rd party Eclipse plug-ins
QNX: Introducing Adaptive Partitioning
A critical technology for networking applications
Introducing Adaptive Partitioning
What is Adaptive Partitioning?
► Adaptive partitioning is a new QNX product that extends the Neutrino RTOS
► Allows you to build secure compartments or “partitions” around a set of applications or threads
► Partitions enforce CPU guarantees for applications, controlled by easy-to-use budgets
Why is it Adaptive?
► Patent-pending design ensures all available CPU cycles are given to partitions that need processing time – no CPU cycles wasted
► Provides performance advantage by permitting full processor utilization to accommodate spikes in demand
Easy to get started
► No changes to how designers work today
• POSIX programming model for the same, familiar design, programming & debugging techniques
► No code changes are required to implement partitions
Microkernel Architecture for Security
Applications and Drivers
► Are processes which plug into a message bus
► Reside in their own memory-protected address space
► Cannot corrupt other software components or kernel
► Can be started, stopped and upgraded on the fly
► Failures in drivers do not require system restarts
Figure: applications and drivers (Disk, Graphics, Serial, Network, Audio) around the QNX Neutrino microkernel; supported processors: ARM, MIPS, SH4, PowerPC, XScale, x86
Introducing Adaptive Partitioning
QNX® Neutrino® RTOS provides the basic structure
► Application and OS service encapsulation with message passing
► Hardware memory protection for security and reliability
Adaptive partitioning extends the Neutrino microkernel to provide secure partitions and guaranteed CPU time
► A collection of processes and threads make up a partition
► A partition is assigned a percentage of CPU time, averaged over a time window
► Overlays on existing thread scheduling
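The budget idea can be illustrated with a toy allocator: each partition is guaranteed its budget under full load, and CPU time a partition does not need is offered to partitions that want more. This is a simplified model, not the QNX scheduler. In particular, sharing spare time in proportion to budgets is an assumption made here for illustration, budgets are assumed to be positive and to sum to 100, and the partition names and demand figures are made up.

```python
def allocate(budgets, demand, total=100.0):
    """Return the CPU percentage each partition receives.

    budgets: guaranteed percentage per partition (assumed positive, sum 100).
    demand:  percentage each partition would use if unconstrained.
    """
    # Guaranteed share: what the partition asks for, capped at its budget.
    alloc = {p: min(demand[p], budgets[p]) for p in budgets}
    spare = total - sum(alloc.values())
    hungry = {p for p in budgets if demand[p] > alloc[p]}

    # Hand spare time to partitions that still want more, in proportion
    # to their budgets (an assumption of this sketch), until demand is
    # met or no spare time remains.
    while spare > 1e-9 and hungry:
        weight = sum(budgets[p] for p in hungry)
        granted = 0.0
        for p in list(hungry):
            offer = spare * budgets[p] / weight
            take = min(offer, demand[p] - alloc[p])
            alloc[p] += take
            granted += take
            if demand[p] - alloc[p] <= 1e-9:
                hungry.discard(p)
        spare -= granted
    return alloc
```

With budgets of 10, 70 and 20 percent and every partition demanding 100 percent, each receives exactly its budget. If the first two partitions need only 5 and 55 percent, the third can rise above its 20 percent budget to 30 percent, leaving 10 percent idle.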
Figure: File System, Device Driver, Core Applications and Add-on Applications grouped into partitions on the QNX Neutrino microkernel, with CPU budgets of 10%, 70% and 20%
Maximum Performance
Figure: File System, Networking, Device Drivers, Core Applications and Add-Ons on the QNX Neutrino microkernel, grouped into a 10% I/O Partition, a 70% Application Partition and a 20% Untrusted Partition
CPU Utilization (0% to 100%)
• CPU guarantees for partitions at full system load: bars of 10%, 70% and 20%
• Dynamic allocation of CPU cycles when not fully loaded: bars of 5%, 55%, 30% and 10%, including idle CPU time
Understanding “Adaptive”
Figure: bar chart of processing load (0% to 100%) for four scenarios (System Restart, Steady State, Topology Change, Reconfiguration), divided among Routing & Forwarding, Management Interfaces (CLI, SNMP), Maintenance and Idle Time. Values shown on the chart: 5%, 10%, 20%, 70%, 80%, 90%, 95%
Partitioning to Contain Threats
Figure: File System, Control Plane Protocols, Core Applications, Add-Ons, Device Drivers and Network Management on the QNX Neutrino microkernel, with adaptive partitioning CPU time guarantees of 10%, 5%, 25%, 10% and 50%. Callouts: control plane attacked, denial of service attack contained, rogue add-on contained
Without Partitioning
► Rogue software can starve core applications of CPU time
► Distributed DoS attacks can busy your system with network processing
With Partitioning
► Create OS-enforced partitions to ensure critical system resources are protected
► Contain threats and protect core applications and services
CPU Guarantees: Increase Availability
Guaranteed CPU time for recovery actions
► Failed components isolated, contained and cannot impact fault recovery processes
Guaranteed CPU time for notification and user intervention
► Ensure that remote user interfaces remain operational and cannot be starved
Figure: Networking (DoS attack contained), Core Applications, File System, Device Drivers, Fault Recovery (automatic recovery, reduce MTTR), Remote Interface, User Interface and Alarm Notification on the QNX Neutrino microkernel
Software Complexity: Development View
Large teams, multi-site development
► Geographic and time zone separation
Division of responsibilities, functional areas and expertise
► Differing designer skill sets
License and integrate 3rd party technologies to reduce development costs
► Lack of developer control of 3rd party technology
Parallel development, followed by system integration & verification
Figure: Routing & Forwarding, Management Interfaces, Maintenance
Building Complex Systems: System Integration
System integration is a significant portion of the overall project schedule
► Always on the project’s critical path
Problems detected late in the design cycle are the most costly
► Initial verification cost to find bugs
► Typically hold up whole project
► Require system experts to troubleshoot and resolve
► Cost of re-implementation, re-test
Design changes introduced late add project risk
► Typically, band-aid solutions are used to limit churn and maintain schedule
► Net effect is to reduce product quality and performance
Problems that occur at integration time are typically related to performance, memory corruption and process starvation
Conclusion
QNX remains committed to the markets that made it successful and is aggressively expanding into new markets to fuel future growth.
Our technology roadmap will continue to show clear leadership and will address the needs of our markets.