
1

23 The FlexRay Protocol

Philip Koopman

Significant material drawn from FlexRay Specification Version 2.0, June 2004

30 Nov 2015

© Copyright 2005-2015, Philip Koopman

2

Preview

FlexRay – automotive choice for X-by-Wire applications

• Created by industry consortium founded in 2000

• Core members: BMW, DaimlerChrysler, General Motors, Motorola, Philips, Volkswagen, and Robert Bosch

First public FlexRay protocol specification June 30, 2004 [FlexRay04]

• Combination Time-Triggered & Event-Triggered Approach

• Intended for use in safety critical, fault-tolerant systems

Dec. 7, 2006: “FlexRay protocol has entered its production phase with devices from NXP (formerly Philips Semiconductors) and Freescale Semiconductor in BMW's newest X5 sport activity vehicle.”

High volume production reached in about 2010

• E.g., NXP had shipped 1 million FlexRay chips by 2009; 2 million by mid-2010

• But by October 2012: “FlexRay not dead, chip vendors claim”

– Due to possibility of time triggered Ethernet (using switches – no collisions)


3

FlexRay Prototype Hardware (Convergence 2000)

4

Topology – Active Star Plays Key Role

Active star simplifies some aspects of distributed coordination

• Maximum delay through star is 250 ns, so it does not buffer full messages

[FlexRay04]


5

Redundant Active Star

Intended to eliminate single-point failures for critical systems

• This seems the most likely configuration for FlexRay X-by-Wire

TTP was found to have some distributed bus guardian issues

• Problems related to nodes listening to faulty network startup messages

• Single fault affected multiple “independent” portions of chip!

• Latest proposal is to move to dual channel star configuration for TTP as well

[FlexRay04]

6

General FlexRay Node Block Diagram

• Host is application CPU

• Bus guardian controls enable line on bus driver

[FlexRay04]


7

Physical Layer

Differential NRZ encoding

10 Mbps operating speed

• Independent of network length because, unlike CAN, doesn’t use bit arbitration

[FlexRay04]

8

FlexRay Encoding Approach

Data sent as NRZ bytes

• TSS = Transmit Start Sequence (LOW for 5-15 bits)

• FSS = Frame Start Sequence (one HIGH bit)

• BSS = Byte Start Sequence (similar to start/stop bits in other NRZ)

• FES = Frame End Sequence (END symbol for frame – LOW + HIGH)

Dynamic segment frames are similar

• Adds a DTS = dynamic trailing sequence field; helps line up minislots

[FlexRay04]
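
The framing sequence for a static-segment frame can be sketched as a bit list. This is a minimal illustration, not the specification's encoder. The function name and the tss_bits parameter are hypothetical. The sketch also assumes that BSS is one HIGH bit followed by one LOW bit and that bytes go out most significant bit first; neither detail is stated on the slide.

```python
def encode_static_frame(frame_bytes, tss_bits=5):
    """Return line levels (0 = LOW, 1 = HIGH) for one static-segment frame."""
    if not 5 <= tss_bits <= 15:
        raise ValueError("TSS is 5 to 15 LOW bits")
    bits = [0] * tss_bits            # TSS: Transmit Start Sequence (LOW)
    bits.append(1)                   # FSS: Frame Start Sequence (one HIGH bit)
    for byte in frame_bytes:
        bits += [1, 0]               # BSS: assumed HIGH then LOW before each byte
        # Assumed bit order: most significant bit first
        bits += [(byte >> i) & 1 for i in range(7, -1, -1)]
    bits += [0, 1]                   # FES: Frame End Sequence (LOW then HIGH)
    return bits


if __name__ == "__main__":
    print(encode_static_frame(bytes([0xA5])))
```

Each payload byte costs 10 bits on the wire in this sketch (2 BSS bits plus 8 data bits), which is the same overhead as start/stop bits in other NRZ byte framing.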


9

FlexRay Frame Format

• This data is encoded into NRZ bytes per the encoding format

[FlexRay04]

10

CAN vs. FlexRay Length Field Corruptions

CAN does not protect length field

• Corrupted length field will point to wrong location for CRC!

• One bit error in length field circumvents HD=6 CRC

FlexRay solves this with a header CRC to protect Length

[Figure: CAN frame fields (ID, LEN, DATA, CRC) shown for the original message and for a message with a corrupted LEN field]

Source: FlexRay Standard, 2004


11

FlexRay Frame Fields

Frame ID

• Frame’s slot number (1 .. 2047); unique within channel in communication cycle

Payload Length

• # of 16-bit words in payload

• Must be same for all messages in static segment of communication cycle

Header CRC

• HD=6 error detection for header data (optimal polynomial for 20 bits)

Cycle Count

• Number of current cycle

• Even vs. odd cycle count values are used by protocol details

– Example: clock sync corrects offset on odd cycles and rate on even cycles

Data

• 0 .. 254 bytes (must be same for all static frames)

CRC in trailer segment

• HD=6 up to 248 payload bytes; HD=4 above that until 508 payload bytes
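
A header CRC over the 20 protected header bits can be sketched as below. The slide gives only the 20-bit coverage and HD=6; everything else here is an assumption to check against the specification. The 20 bits are assumed to be a sync indicator, a startup indicator, an 11-bit frame ID, and a 7-bit payload length. The polynomial 0x385 and initial value 0x01A are recalled values, not taken from the slide, and the function names are hypothetical.

```python
HEADER_CRC_POLY = 0x385  # assumed: x^11 + x^9 + x^8 + x^7 + x^2 + 1 (x^11 implicit)
HEADER_CRC_INIT = 0x01A  # assumed initialization value


def header_crc_bits(data20):
    """Bitwise 11-bit CRC over a 20-bit header value, most significant bit first."""
    crc = HEADER_CRC_INIT
    for i in range(19, -1, -1):
        data_bit = (data20 >> i) & 1
        top_bit = (crc >> 10) & 1
        crc = (crc << 1) & 0x7FF          # keep 11 bits
        if data_bit ^ top_bit:
            crc ^= HEADER_CRC_POLY
    return crc


def header_crc(sync, startup, frame_id, payload_words):
    """Pack the assumed header layout and return its CRC."""
    if not 1 <= frame_id <= 2047:
        raise ValueError("frame ID must be 1..2047")
    if not 0 <= payload_words <= 127:
        raise ValueError("payload length is 0..127 16-bit words")
    data20 = (sync << 19) | (startup << 18) | (frame_id << 7) | payload_words
    return header_crc_bits(data20)


if __name__ == "__main__":
    print(hex(header_crc(0, 0, 5, 8)))
```

Because the payload length is covered by this separate CRC, a receiver can reject a corrupted length before using it to locate the trailer CRC.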

12

FlexRay Message Cycle

Two main phases: static & dynamic

• “Temporal firewall” – partition between phases protects timing of each phase

[FlexRay04]


13

Microtick & Macrotick

Microtick level

• Node’s own internal time base

• Direct or scaled value from a local oscillator or counter/timer

• Not synchronized with rest of system – local free-running oscillator

Macrotick level

• Time interval derived from cluster-wide clock sync algorithm

• Always an integral number of microticks

– BUT, not necessarily the same number of microticks per node

– Number of microticks varies at run time to implement clock sync

Designated macrotick boundaries are “action points”

• Transmissions start here – static; dynamic; symbol window

• Transmissions end here – dynamic segment
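
One way to picture "an integral number of microticks that varies at run time" is to spread a per-cycle correction across the macroticks of a cycle. This is a minimal sketch under simplifying assumptions: the function and parameter names are hypothetical, and the adjustment is front-loaded onto the first macroticks rather than spread the way a real controller would.

```python
def macrotick_lengths(macroticks_per_cycle, nominal_microticks, correction):
    """Return microticks per macrotick for one cycle.

    correction is the signed number of microticks to add to (or remove from)
    the whole cycle so the local clock tracks cluster time.
    """
    total = macroticks_per_cycle * nominal_microticks + correction
    base, extra = divmod(total, macroticks_per_cycle)
    # The first `extra` macroticks are one microtick longer than the rest,
    # so every macrotick stays an integral number of microticks.
    return [base + 1 if i < extra else base for i in range(macroticks_per_cycle)]


if __name__ == "__main__":
    # A node whose oscillator runs slow drops 3 microticks from the cycle.
    print(macrotick_lengths(10, 40, -3))
```

With 10 macroticks of nominally 40 microticks and a correction of -3, seven macroticks keep 40 microticks and three shrink to 39, for 397 microticks in total.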

14

Static Segment

TDMA messages, most likely used for critical messages

• All static slots are the same length in macroticks

• All static slots are repeated in order every communication cycle

• All static slot times are expended in cycle whether used or not

– Number of static slots is configurable for system; up to 1023 slots

[FlexRay04]


15

Static Segment Details

Two-channel operation

• Sync frames on both channels; other frames optionally 1 or 2 channels

– Less critical/less expensive nodes might only connect to one channel

• Slots are lock-stepped in order on both channels

TDMA order is by ascending frame ID number

• Frame number used to determine slot # by software

– It is NOT a binary countdown arbitration mechanism – only one transmitter at a time

– Optionally, there is a Message ID in the payload area that can be unrelated to slot number

– Example use: each node uses its node # as frame # and multiplexes its messages onto a single time slot, distinguished by Message ID

• In contrast, TTP has a MEDL that can have sub-cycles

– Need neither a Frame ID nor a Message ID

– Extra information to be managed and coordinated
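
The static-segment rule (fixed-length slots, ascending frame ID, one transmitter per slot) can be sketched as a lookup from cycle time to slot number. This is an illustration only. The function names are hypothetical, time is assumed to be counted in macroticks from the start of the cycle, and the slot length is assumed to be a whole number of macroticks.

```python
def static_slot_at(macrotick_in_cycle, slot_length_mt, num_static_slots):
    """Return the 1-based static slot number, or None after the static segment."""
    slot = macrotick_in_cycle // slot_length_mt + 1
    return slot if slot <= num_static_slots else None


def may_transmit(node_frame_ids, macrotick_in_cycle, slot_length_mt, num_static_slots):
    """A node may transmit only in a slot whose number matches one of its frame IDs."""
    slot = static_slot_at(macrotick_in_cycle, slot_length_mt, num_static_slots)
    return slot is not None and slot in node_frame_ids


if __name__ == "__main__":
    # Node owning frame ID 2, with 4 static slots of 50 macroticks each.
    for t in (10, 60, 210):
        print(t, static_slot_at(t, 50, 4), may_transmit({2}, t, 50, 4))
```

No arbitration happens here: every node computes the same slot number from synchronized time, so at most one node finds its frame ID in the current slot.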

16

Dynamic Segment

High-level idea is event-based communication channel

• Want arbitration, but must be deterministic

• Binary countdown not used (among other things, restricts possible media)

“Minislot” approach

• Can be thought of as a time-compressed TDMA approach (details on next slide)

• Two channels can use independent message queues

[FlexRay04]


17

Dynamic Segment Details

High-level idea is each minislot is an opportunity to send a message

• If message is sent, minislot expands into a message transmission

• If message isn’t sent, minislot elapses unused as a short idle period

• All transmitters watch whether a message is sent so they can count minislots

[FlexRay04]

18

Minislot Performance

Frame ID # is used for slot numbering

• First dynamic Frame ID = last static Frame ID + 1

Dynamic segment has a fixed amount of time

• Fixed number of macroticks, divided up into minislots

• There might or might not be enough time for all dynamic messages to be sent

• When dynamic segment time is up, unsent messages wait for next cycle

Net effect: event-triggered messages

• Messages with the lowest Frame ID are sent first

• Each Frame ID # can only send ONE message per cycle

• As many messages as will fit in dynamic segment are sent

• This means that only highest priority messages queued are sent in each cycle

• Note that idle minislots consume dynamic segment bandwidth

– But minislots are a lot smaller than messages
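
The minislot behavior can be sketched as a small simulation of one dynamic segment. This is a toy model, not the specification's algorithm. All names are hypothetical, every transmitted frame is assumed to occupy the same number of minislots, and a frame that would not fit in the remaining time simply lets its minislot go idle.

```python
def run_dynamic_segment(pending, first_dynamic_id, total_minislots, frame_minislots):
    """Simulate one dynamic segment.

    pending: dict mapping frame ID -> list of queued messages (mutated in place).
    Returns the (frame_id, message) pairs sent this cycle, in order.
    """
    sent = []
    minislot = 0
    frame_id = first_dynamic_id
    while minislot < total_minislots:
        queue = pending.get(frame_id)
        if queue and minislot + frame_minislots <= total_minislots:
            # Minislot expands into a message transmission.
            sent.append((frame_id, queue.pop(0)))
            minislot += frame_minislots
        else:
            # Nothing to send (or no room left): one short idle minislot elapses.
            minislot += 1
        frame_id += 1  # each frame ID gets at most one opportunity per cycle
    return sent


if __name__ == "__main__":
    queues = {5: ["a", "a2"], 6: [], 7: ["b"], 9: ["c"]}
    print(run_dynamic_segment(queues, 5, 10, 4))
    print(queues)
```

In this run frame IDs 5 and 7 send one message each; the second message queued for ID 5 and the message for ID 9 wait for the next cycle because the segment runs out of minislots.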


19

Time Keeping

Macrotick is common unit of time across nodes

• Idea is that it is within one microtick of correct at each node

• Rate and offset correction performed every pair of cycles to keep in sync

Two main timekeeping tasks:

• MTG – Macrotick Generation Process

– Applies rate and offset correction values

• CSP – Clock Synchronization Process

– Initialization

– Calculation of rate and offset values

• Distributed time theory applies here (see lecture on that topic)

– Uses fault tolerant midpoint calculation (how many errors does this tolerate?)

[FlexRay04]
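
A fault tolerant midpoint can be sketched as: sort the measured clock deviations, discard the k largest and k smallest, and take the midpoint of the remaining extremes. The function name is hypothetical. The thresholds for k (0 for fewer than 3 values, 1 for 3 to 7, 2 for 8 or more) are recalled from the specification rather than taken from the slide, so treat them as an assumption.

```python
def fault_tolerant_midpoint(deviations):
    """Return the midpoint of the deviations after trimming k values from each end."""
    values = sorted(deviations)
    n = len(values)
    if n == 0:
        return 0.0
    if n < 3:
        k = 0          # too few measurements to discard any
    elif n < 8:
        k = 1          # assumed threshold
    else:
        k = 2          # assumed threshold
    trimmed = values[k:n - k]
    # Midpoint of the smallest and largest surviving values.
    return (trimmed[0] + trimmed[-1]) / 2


if __name__ == "__main__":
    # One wildly wrong measurement (100) is discarded before averaging.
    print(fault_tolerant_midpoint([-2, 1, 3, 100, 2]))
```

Discarding k values from each end means up to k arbitrarily wrong measurements cannot pull the result outside the range of the correct ones, which is one way to approach the question on the slide.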

20

Clock Sync Schedule

NIT = Network Idle Time at end of each cycle

[FlexRay04]


21

General Bus Guardian (BG) Operation

Idea is to have an independent time source

• If communication controller attempts to transmit at wrong time… bus guardian stops it because “enable” is removed outside correct time slice

• If BG is incorrect… communication controller won’t be attempting to transmit anyway

• Goal is “fail silent” operation

– Both BG & communication controller have to enable & transmit for message to be sent

Why is this required?

• What if a faulty node tries to send at the wrong time – takes down network!

– Especially “babbling idiot” failure, where node broadcasts continuously

• It is very difficult to get this right at low cost

– Ideally want separate chips for BGs to eliminate common mode failures

– As a practical matter, want to integrate on chip to save cost
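
The "both must agree" rule can be sketched as a logical AND between the controller's transmit request and the guardian's enable window. This is a toy model with hypothetical names. It assumes the guardian's independent clock is correct and that a node's slot is a half-open time interval.

```python
def guardian_enable(guardian_time, slot_start, slot_end):
    """Guardian opens the bus driver only inside the node's own time slice."""
    return slot_start <= guardian_time < slot_end


def bus_driver_active(controller_tx_request, guardian_enabled):
    """The bus driver transmits only if controller AND guardian both agree."""
    return controller_tx_request and guardian_enabled


if __name__ == "__main__":
    # A "babbling idiot" controller requests the bus at every time step,
    # but the guardian confines it to its own slot, times 3 and 4.
    on_bus = [t for t in range(10)
              if bus_driver_active(True, guardian_enable(t, 3, 5))]
    print(on_bus)
```

The sketch also shows why independence matters: if the guardian shared the controller's faulty time base, both inputs to the AND could be wrong together.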

22

FlexRay Tradeoffs

Advantages

• Probably has primitives necessary for critical x-by-wire applications

• Static segment provides timing guarantees and some fault tolerance

• Dynamic segment gives flexibility for event triggered messages

• Big industry consortium behind it

• It’s “flexible”

Disadvantages

• After 10 years it is getting mature

– Would be no surprise if protocol defects emerge – same as for any other protocol

• Does not provide as complete a set of primitives as TTP

– Group membership is an application problem, but will be needed for x-by-wire

– Any safety critical operation on host might complicate safety case

Other

• Does not encompass a complete system architecture

– Provides flexibility for architectures… but not a blueprint for fault tolerance


23

Relationship To Selected Other Topics

Distributed systems:

• Enables hybrid Time Triggered & Event Triggered designs

• Requires applications to do their own atomic broadcast & group membership

• Built-in distributed timekeeping results (synchronous system approach)

Embedded networks:

• Uses combination of TDMA and minislot (implicit token/compressed TDMA) approaches

Real time:

• Requires both static scheduling (static portion) and dynamic scheduling (dynamic portion)

Fault tolerance:

• Requires application support for Byzantine faults (e.g., group membership)

• Includes data integrity checks on header & payload

• Includes no security – that is an application responsibility

• Includes some support for system reset, but host must behave properly

Safety – FlexRay consortium is working on protocol analysis

• Requires a safety case, including fault analysis


1

24 Society & Ethics

Distributed Embedded Systems

Philip Koopman

November 30, 2015

© Copyright 2000-2015, Philip Koopman

2

How Much Risk Is OK?

Few or no products are entirely risk free

• Is it OK to simply inform people of the risks?

– What if they have no practical alternative?

• Is it OK to let people self-pay to be safe?

– Are poor people more expendable?

• How do we decide how safe is safe enough?

• How do we know we built it safe enough?

– Follow the recipe: IEC 61508, ISO 26262, UL 1998, etc.

Should the “many” be protected at the cost of the “few”?

• Passenger side air bags probably help save males who wear seat belts

– Can injure people not wearing seat belts

– Until recent changes, it looks like they increased child and female death rates

• If something is risky, should it be illegal?

– Is it OK to modify your car’s software for performance at expense of safety?


3

What Forces Product Safety?

Making laws

• Legislative action, especially consumer protection laws

• Litigation results (case law)

Industry making improvements for other reasons

• Safety to improve public image

• “Self-regulation” in hopes of avoiding legislative action

• Improvements made at request (or upon demand) by insurance companies

Sometimes good technical products can’t/shouldn’t be sold

• If you can’t figure out who is liable if it fails, it might not be viable.

• If you can’t figure out risks to set insurance rates, can you sell it?

• If society wants it anyway, maybe legislate that it is OK for it to fail

4

Putting A $$$ Amount On Human Life Is Risky

“GM and the Law,” The Economist, July 17, 1999

Remember this discussion?


5

Automotive Guide To Correct Wording

http://blogs.wsj.com/corporate-intelligence/2014/05/16/the-69-words-you-cant-use-at-gm/

6

Automotive Guide To Correct Wording

http://blogs.wsj.com/corporate-intelligence/2014/05/16/the-69-words-you-cant-use-at-gm/


[AN9025-3]

8

ALARP Principle (IEC 61508)

Nothing is completely safe!

ALARP = As Low As Reasonably Practicable – Guideline for IEC 61508

• Idea is to associate risk with cost (British safety principle)

• Set cutoff at “it’s not worth it to spend more on this risk”

• Means you need to know a tolerable risk; often based on existing risks

• Automotive application: “X-by-Wire should be at least as safe, overall, as current vehicles”

– Current, non-X-by-Wire fatality rate is about 7 * 10^-7/hr

– Total accident rate, including fender benders and drunk driving, is about 7 * 10^-4/hr

Alternate principle (not discussed in 61508, but possible to use):

• German MEM (Minimum Endogenous Mortality)

• General idea – technology-caused death rate should not be significantly affected

• CENELEC pre-standard prEN 50126: 2 * 10^-4 fatalities / year baseline

– Less than 10^-5 fatalities/year for a new system over entire population (5% of reference value)

– For vehicles, this works out to 4 * 10^-8 fatalities/hr

Good discussion at: http://www.ewics.org/uploads/attachments/_risk-analysis-subgroup-working-papers/A_discussion_of_risk_tolerance_principles.html
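
The MEM arithmetic above can be checked with a few lines. The slide does not state how the per-year limit is converted to a per-hour limit. The 250 hours of vehicle exposure per year used below is inferred by dividing 10^-5 by 4 * 10^-8, so treat it as an assumption; the function name is hypothetical.

```python
def mem_hourly_limit(baseline_per_year, fraction, exposure_hours_per_year):
    """Convert a MEM baseline into a per-hour fatality limit for one new system."""
    yearly_limit = baseline_per_year * fraction      # 5% of the reference value
    return yearly_limit / exposure_hours_per_year


if __name__ == "__main__":
    limit = mem_hourly_limit(2e-4, 0.05, 250)        # 250 h/year is an inferred value
    print(f"MEM limit: {limit:.1e} fatalities/hr")
    # Compare with the current non-X-by-Wire fatality rate quoted on the slide.
    print(f"Current rate is {7e-7 / limit:.1f} times the MEM limit")
```

Under that assumption the MEM limit of 4 * 10^-8 fatalities/hr is about 17.5 times stricter than the current fatality rate of 7 * 10^-7/hr.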


9

US Criminal Investigation of Toyota UA

“Toyota Is Fined $1.2 Billion for Concealing Safety Defects” – March 19, 2014

• Four-year investigation by US Attorney General

• Related to floor mats & sticky throttle pedals only

“TOYOTA misled U.S. consumers by concealing and making deceptive statements about two safety-related issues affecting its vehicles, each of which caused a type of unintended acceleration.” [DoJ Statement of Facts]

• Deferred prosecution for three years in exchange for fine and continuing independent review of its safety processes.

• Toyota said in a statement that it had made fundamental changes in its corporate structure and internal safety controls since the government started its investigation four years ago.

http://www.nytimes.com/2014/03/20/business/toyota-reaches-1-2-billion-settlement-in-criminal-inquiry.html

10

Professional Licensing for Engineers

• Bridge builders are licensed Professional Engineers

• They use genuine math to ensure structural safety

• Should software developers be licensed?


11

Ethics

Ethics is all about doing the right thing

• What is the “right” thing?

• Ethics is always interesting in the gray areas, usually not simple answers

• There are always the obvious rights and wrongs

Definition: Ethics are a personal code of behavior.

• They represent an ideal we strive toward because we presume that to achieve ethical behavior is appropriate, honorable, and desirable --- both on a personal level and within the groups we belong to. [Dakin]

12

Process Is Not Sufficient For Ethical Behavior

How to destroy a Space Shuttle (and lose 7 astronauts); January 1986

• Complex process and mechanisms in place to ensure safe shuttle launch

• O-rings had been known to fail at low temperatures

– Single O-ring seal failure at 53 degrees observed in a previous launch

– For Challenger launch, temperature was 29 to 36 degrees – double O-ring seal failure resulted

http://onlineethics.org/moral/boisjoly/rocket.html


13

Some Say Graphs Played A Role

Thiokol engineering graph 1/27/86

Rogers Commission Graph

http://www.footnote.tv/mwchallenger.html

http://www.firstscience.com/SITE/ARTICLES/challenger.asp

14

Challenger Cultural Failure

Morton-Thiokol initially said “don’t launch; it’s too cold”

• NASA responded “please reconsider”

• Engineers had to prove to management shuttle is safe to launch

• Management decided to tell NASA “OK to launch” over engineer protests

How did Challenger happen then?

• Role reversal – engineers proving to managers that shuttle shouldn’t launch!

Ethics is about personal responsibility

• Just because the customer says you’ll lose your job if you don’t do something doesn’t make it right


15

Morals vs. Ethics

By the dictionary, they are nearly identical, but:

• Morals: principles of right and wrong conduct. (religious connotations)

• Ethics: system/structure of morally correct conduct. (professional/social connotations)

16

Professional Codes of Ethics

IEEE code of ethics is short and to the point

• Mostly broad points; no in-depth discussion

• Sometimes IEEE has had trouble standing behind its members on these points

– But they will be happy to hang you out to dry if you violate them

ACM code of ethics is longer

Software Engineering code of ethics is fairly reasonable

• Has real recommendations

• Very practical

• Not contradictory

• Reads like a specification

Most codes have rules/recommendations that are “common sense”

• Emphasize responsibility to public good


17

IEEE Code of Ethics

We, the members of the IEEE, in recognition of the importance of our technologies in affecting the quality of life throughout the world, and in accepting a personal obligation to our profession, its members and the communities we serve, do hereby commit ourselves to the highest ethical and professional conduct and agree:

1 to accept responsibility in making engineering decisions consistent with the safety, health and welfare of the public, and to disclose promptly factors that might endanger the public or the environment;

2 to avoid real or perceived conflicts of interest whenever possible, and to disclose them to affected parties when they do exist;

3 to be honest and realistic in stating claims or estimates based on available data;

4 to reject bribery in all its forms;

5 to improve the understanding of technology, its appropriate application, and potential consequences;

6 to maintain and improve our technical competence and to undertake technological tasks for others only if qualified by training or experience, or after full disclosure of pertinent limitations;

7 to seek, accept, and offer honest criticism of technical work, to acknowledge and correct errors, and to credit properly the contributions of others;

8 to treat fairly all persons regardless of such factors as race, religion, gender, disability, age, or national origin;

9 to avoid injuring others, their property, reputation, or employment by false or malicious action;

10 to assist colleagues and co-workers in their professional development and to support them in following this code of ethics.

Approved by the IEEE Board of Directors, August 1990

18

Seven Ethics Guidance Points

If faced with an ethical dilemma, ask:

1. Is the action legal?

2. Is it wrong?

3. If the problem is gray: “How would this look in the newspaper? Will it appear insensitive or reckless, or be seen as taking more risk than we should have?”

• The Washington Post test – how does it look as an unsympathetic headline?

4. Does the action violate the company’s stated values (i.e., written policy)?

5. Will you feel bad if you take this action?

6. Ask someone you trust for guidance (a friend; the corporate ethics officer)

7. Keep asking until you have an answer

[Costlow, IEEE Spectrum, Dec. 2002, pg. 57]


19

Legal Questions To Ask

If it breaks, who gets sued? (Who goes to jail?)

What about things that are beyond your control?

• Unintended use

• Failure in “extreme” conditions (what is reasonable to anticipate?)

• Moron users

How much diligence is “enough”?

• If there is a standard for similar products, that helps a lot (e.g., UL)

• If there is a generic standard, that may help (e.g., IEEE)

• Warning labels even help some, no matter how silly they may seem

Important US Judicial system lessons:

• Financial liability is not linearly correlated with culpability

• Nobody wants bad news in writing

• Money talks (it doesn’t buy results, but without it you are at a disadvantage)

20

Ethical Issues Specific To Embedded Systems

How much is a human life worth?

• Is it OK to avoid documenting the number when making a decision?

If we don’t know how to get software perfect, when do we ship it?

• Is it enough that potential good outweighs potential bad?

• Is it enough that we’ll get fired (or our startup will fail) if we don’t ship?

• If bad process statistically predicts producing bad products, is it OK to work for a company with a “broken” software process?

• Is it OK to produce critical systems without being able to measure their safety directly (i.e., by arguing that best-known practices ensure sufficient safety)?

Is it OK to tell the consumer he/she is responsible for things we don’t know how to get right? (security, safety, exceptional situations)

• Is it OK to evade consumer protection laws by redesigning an embedded product to be a “computer” instead of “goods”?

Is it OK to create products that will inevitably compromise privacy?

• Location-aware mobile cell phone trackers/databases

• Personal authentication systems