on codes, machines, and environments: reflections and experiences

Vincenzo De Florio, University of Antwerp, [email protected]

Post on 22-Jan-2018


Page 1: On codes, machines, and environments: reflections and experiences

On codes, machines, and environments: reflections and experiences

Vincenzo De Florio
University of Antwerp
[email protected]

Page 2: On codes, machines, and environments: reflections and experiences

Agenda
• Short introduction
• Intro of main characters
• Quality and drift
• Drift containment strategies
  – Off-line / on-line
• Adaptation services
  – Context awareness / reactive behaviors
• Reactive behaviors:
  – Elasticity / resilience
• Conclusions

Account of a number of experiences

Page 3: On codes, machines, and environments: reflections and experiences

Career
• MOSAIC / Universiteit Antwerpen
  – adaptive and dependable software
  – resilience and antifragility
  – cyber-physical societies
• ACCA / ESAT / K.U.Leuven
  – parallel and distributed systems
  – advanced computer architectures
  – linguistic support to fault-tolerance
• SASIAM / Tecnopolis (I)
  – parallel and distributed systems
  – complex systems modeling
  – image processing operators.

https://goo.gl/wRlzkZ

Page 4: On codes, machines, and environments: reflections and experiences

http://goo.gl/PNCVJy

Page 5: On codes, machines, and environments: reflections and experiences

Code
• Code explicitly refers to a reference machine
  – A physical or virtual machine
  – In fact, a family of "interpreters"
• Code also refers, implicitly, to a set of conditions: what we expect from the machine and what we expect the environment will do
  – The system model and the fault model.

Page 6: On codes, machines, and environments: reflections and experiences

Codes, Machines, Environments

• First, code is deployed on a machine: C → M
• Secondly, machine is deployed into an environment: (C, M) → E
• (C, M, E) produces a set of behaviors: the "service"
• We observe those behaviors and give a measure of the service quality
  – Qualitatively or quantitatively.

Page 7: On codes, machines, and environments: reflections and experiences

Quality of the service

• What do we measure?
  – We tell whether the service, e.g., is trustworthy; reliable; available; safe; secure; efficient; etc.
• Important issue: all dynamic properties!
  – Dynamic systems! trustworthiness(t), safety(t), efficiency(t), ...
• A drift is possible
  – Service mutates its characteristics.

Page 8: On codes, machines, and environments: reflections and experiences

Quality in terms of M properties

• We can express QoS in terms of M properties
• For instance: "the service shall express an algorithmic parallelism (AP) that is very close to the physical parallelism (PP) expressed by M."
  – Efficiency(t) = inv.distance (AP, PP)
  – Drift(t) = how efficiency(t) varies with t

Page 9: On codes, machines, and environments: reflections and experiences

Quality in terms of E properties

• We can express QoS in terms of E properties too
• For instance: "the service must tolerate up to 2 physical or design faults"
• Resilience(t) = a majority of redundant modules can be found at t
• Drift(t) = how majority varies with t

Page 10: On codes, machines, and environments: reflections and experiences

Quality drift

• What if we observe a significant drift?
• Example 1:
  – (C → M1) ⊢ p
    C manifests property p on machine M1
  – (C → M2) ⊢ ~p
    On M2, C does not!
• Example 2:
  – (C → M1) ∧ M1(s1) ⊢ p
    When M1 is in state s1, then p
  – (C → M1) ∧ M1(s2) ⊢ ~p

Page 11: On codes, machines, and environments: reflections and experiences

Drift strategies
• Drift: due to failures; attacks; software aging...
• What can we do?
  1) Focus on M and, e.g., bring M1(s2) back to M1(s1) or to a new M1(s3)
     – Backward/forward (BW/FW) error recovery
  2) Focus on E: impose restrictions on E's behaviors
     – Regulations (e.g., safety regs)
  3) Or focus on C: "correct" / transform my code

Page 12: On codes, machines, and environments: reflections and experiences

Experience #1

• Over the years, a software house develops a large amount of code
  – for a proprietary target machine
  – using a proprietary programming language
  – and a proprietary OS
  – to be executed on proprietary terminals...
• Times changed. Machine/OS/... no longer supported. What to do?

Page 13: On codes, machines, and environments: reflections and experiences

Experience #1 (continued)

• A translator and a set of run-time libraries
• Program transformation:
  f: (proprietary code) → (standard C)
• Net result?

Page 14: On codes, machines, and environments: reflections and experiences

Experience #1 (continued)
• Lots of problems!
  – Phase 1: "Code: perfectly running"
  – Phase 2: "...yes but's" (many of them!)
  – Hidden relationships, undocumented features, idiosyncrasies: I want 'em all.
→ Porting C does not port the service!
• A large number of M- and E-specific behaviors had to be emulated
• Role: responsible for the design of several parts of the translators and for several run-time functions (overall system was conceived / designed by someone else.)

Page 15: On codes, machines, and environments: reflections and experiences

Experience #2

• f: (C + message passing) → (C + live data structures)
• In the Domain: scheduler distributes work units to workers and then collects intermediate results
• In the Range:
  1. Tuple space of work units
  2. Cloud of workers that autonomously feed themselves according to their own speed, and publish their results.
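The "workers feed themselves" idea can be illustrated with a small discrete-time simulation. This is only a sketch under stated assumptions, not the system described in the slides: the tuple space is reduced to a counter of pending work units, and `struct worker`, `ticks_per_unit`, and `run_bag_of_tasks` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker model: 'ticks_per_unit' stands for its speed
 * (lower means faster). */
struct worker {
    int ticks_per_unit;  /* simulated time needed to process one work unit */
    int busy_until;      /* simulated time at which the worker is idle again */
    int units_done;      /* work units this worker has taken so far */
};

/* Let idle workers pull work units from a shared bag until it is empty.
 * No scheduler assigns work: each worker takes a unit whenever it is idle.
 * Returns the simulated time at which the last unit completes. */
int run_bag_of_tasks(struct worker *w, size_t nworkers, int units_in_bag)
{
    int now = 0, finish = 0;

    assert(nworkers > 0);
    for (size_t i = 0; i < nworkers; i++) {
        w[i].busy_until = 0;
        w[i].units_done = 0;
    }
    while (units_in_bag > 0) {
        for (size_t i = 0; i < nworkers && units_in_bag > 0; i++) {
            if (w[i].busy_until <= now) {      /* idle: feed itself */
                units_in_bag--;
                w[i].busy_until = now + w[i].ticks_per_unit;
                w[i].units_done++;
                if (w[i].busy_until > finish)
                    finish = w[i].busy_until;
            }
        }
        now++;                                 /* advance simulated time */
    }
    return finish;
}
```

With one worker needing 1 tick per unit and another needing 3 ticks, the faster worker ends up taking most of the units without anybody balancing the load explicitly.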

Page 16: On codes, machines, and environments: reflections and experiences

Experience #2 (continued)

• Simple production system to match tuple patterns with tuple elements
• Emerging results: autonomic load balancing; graceful degradation; crash-failure tolerance
• In practice, efficiency and reliability
• Role: I conceived/designed the system; system developed by two M.Sc students that I promoted and supervised.

Page 17: On codes, machines, and environments: reflections and experiences

Experience #3
• Instead of translating C, add a C'
• A software architecture supporting two cooperating application layers
  – A service language to express functional concerns
  – A recovery language to express dependability strategies
• Design time: separation of concerns
• Run time: separable codes
• Actions: similar to production rules: nested IF/THEN/ELSE's.
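To make the "guards and actions" structure concrete, here is a minimal sketch in C of rules evaluated against a working memory. It does not reproduce the recovery language of the slides: `struct rwm`, its fields, the two rules, and `run_recovery` are all hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical recovery working memory: facts stored by error detection. */
struct rwm {
    int task_faulty;   /* nonzero if the monitored task was flagged */
    int restarts;      /* restarts attempted so far */
    int spare_in_use;  /* set once the spare has been activated */
};

/* A recovery rule: IF guard(memory) THEN action(memory). */
struct rule {
    int  (*guard)(const struct rwm *);
    void (*action)(struct rwm *);
};

static int faulty_first_time(const struct rwm *m)
{
    return m->task_faulty && m->restarts == 0;
}

static int faulty_again(const struct rwm *m)
{
    return m->task_faulty && m->restarts > 0 && !m->spare_in_use;
}

static void restart_task(struct rwm *m)
{
    m->restarts++;
    m->task_faulty = 0;
}

static void switch_to_spare(struct rwm *m)
{
    m->spare_in_use = 1;
    m->task_faulty = 0;
}

/* The recovery code lives in a table that is separate from the service code. */
static const struct rule recovery_rules[] = {
    { faulty_first_time, restart_task },
    { faulty_again,      switch_to_spare },
};

/* Re-evaluate every guard in order and fire the actions whose guard holds.
 * Returns the number of actions fired. */
int run_recovery(struct rwm *m)
{
    int fired = 0;
    size_t n = sizeof recovery_rules / sizeof recovery_rules[0];

    for (size_t i = 0; i < n; i++) {
        if (recovery_rules[i].guard(m)) {
            recovery_rules[i].action(m);
            fired++;
        }
    }
    return fired;
}
```

Because the rules sit in their own table, a different table could be swapped in without touching the functional code, which is one way to read "separable codes".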

Page 18: On codes, machines, and environments: reflections and experiences

Recovery languages

[Figure: interaction diagram. Components: Application, Error Detection, Recovery working memory, Recovery executive. Labeled steps: Store; Recovery starts; Query; Skip / fire actions; Result; Recovery ends; OK.]

Page 19: On codes, machines, and environments: reflections and experiences

Optimizations

[Figure labels: User application (C); Recovery code (C'); Recovery code 2; Recovery working mem; Broker]

• Currently, all guards are re-evaluated
  – Full re-evaluations could be avoided (maybe through Rete? RWM deltas...)
• Separable code = meta-adaptation

Page 20: On codes, machines, and environments: reflections and experiences

Experience #3 (continued)

• Role: system conceived, designed, implemented.
• More information: "A Fault-Tolerance Linguistic Structure for Distributed Applications", Ph.D. thesis, Oct. 2000, http://win.uantwerpen.be/~vincenz/theses/
• "Transformer: an adaptation framework with contextual adaptation behavior composition support," Gui, N. and De Florio, V. Software: Practice & Experience, Vol. 43, Issue 8, 2013.

Page 21: On codes, machines, and environments: reflections and experiences

Strategies

• Quality drifting can be managed in several ways
• In what follows, three such ways

A. Do nothing!
  – Service has complete faith in M and E
  – Formally: synchronous system model and empty fault model.

Page 22: On codes, machines, and environments: reflections and experiences

Pious software ;-)
• "M is immutable. Computation is dependable. Communication is dependable..."
• Facilitates development, though...
• ...anything breaks the code (Maximum fragility.)
• Ataraxic code (ἀταραξία "impassiveness")

Sitting ducks

Page 23: On codes, machines, and environments: reflections and experiences

Strategies B and C
B. Off-line adaptation (examples: Experiences #1 and #2)
C. On-line adaptation (Exp. #3)
• Two requirements:
  1) C must be context-aware
  2) C must be able to autonomously react and adapt after changes in both M and E
• Corresponds to the two blocks of production systems
  – Sensory preconditions: LHSs
  – Actions: RHSs
• In what follows, focus on two services: context awareness; reactivity.

Page 24: On codes, machines, and environments: reflections and experiences

S1: Context Awareness

• Goal: reify in the application layer changes pertaining to the context and in particular to M and E.
• How? Different ways
  – In what follows I briefly describe one answer
  – I chose it because it is related to one of my past experiences and to programming languages

Page 25: On codes, machines, and environments: reflections and experiences

Experience #4: reflective variables

• Main idea: memory accesses as a metaphor for detecting changes (and reacting to changes)
• Reflective variables (RR vars) = volatile variables associated to M or E probes (e.g. sensors, RFIDs, OS services...) that continuously update the variables
• Akin to signals in Elm: "values that change over time"

Page 26: On codes, machines, and environments: reflections and experiences

Example

RRvars also support callbacks. Example:

    int PrintCpu();
    rrparse("(cpu>0);", PrintCpu);

[Figure labels: M, E]

Page 27: On codes, machines, and environments: reflections and experiences

[Figure: plot over time t]

Page 28: On codes, machines, and environments: reflections and experiences

Tracking CPU and mplayer
• int mplayer returns the following values:

    void SystemIsSlow(void) {
        mplayer = HARDFRAMEDROP;
    }
    ...
    rrparse("(cpu>98)&&(mplayer==2);", SystemIsSlow);

By coupling an M fact with an E fact, I can deduce conditions
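The coupling of a guard over reflective variables with a callback can be imitated in a few lines of C. This is a sketch under loud assumptions: it is not the RRvars implementation, the guard is a C function rather than a parsed string such as the one passed to `rrparse`, and `rr_watch`, `rr_set`, `cpu_high_and_player_state_2`, and `on_system_slow` are hypothetical names.

```c
#include <stddef.h>

/* Stand-ins for the reflective variables of the slide. In the real system
 * probes refresh them; here rr_set() below plays the role of a probe. */
static volatile int cpu;      /* CPU usage in percent (an M fact) */
static volatile int mplayer;  /* player state code (an E fact) */

#define MAX_WATCHES 4

struct watch {
    int  (*guard)(void);     /* condition over the reflective variables */
    void (*callback)(void);  /* reaction to run when the guard holds */
};

static struct watch watches[MAX_WATCHES];
static size_t nwatches;

/* Register a guard/callback pair. Returns 0 on success, -1 if the table is full. */
int rr_watch(int (*guard)(void), void (*callback)(void))
{
    if (nwatches == MAX_WATCHES)
        return -1;
    watches[nwatches].guard = guard;
    watches[nwatches].callback = callback;
    nwatches++;
    return 0;
}

/* What a probe would do: update one variable, then re-check every guard. */
void rr_set(volatile int *var, int value)
{
    *var = value;
    for (size_t i = 0; i < nwatches; i++)
        if (watches[i].guard())
            watches[i].callback();
}

/* Example guard and callback, mirroring "(cpu>98)&&(mplayer==2)". */
static int slow_events;

int cpu_high_and_player_state_2(void)
{
    return cpu > 98 && mplayer == 2;
}

void on_system_slow(void)
{
    slow_events++;  /* a real callback would adapt the application here */
}
```

The callback fires only when both facts hold at once, which is the point of the slide: neither the M fact nor the E fact alone is enough to deduce the condition.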

Page 29: On codes, machines, and environments: reflections and experiences

[Figure: plot over time t]

Page 30: On codes, machines, and environments: reflections and experiences

Tracking users' behaviors too!

HCI interaction actions are logged... transcoded... analyzed... and reified...

int ui is now == X

int ui is now == Y

Page 31: On codes, machines, and environments: reflections and experiences

Janus system

[Figure labels: RR client, mplayer, ui]

Page 32: On codes, machines, and environments: reflections and experiences

Currently, simple analyses
• Typing frequency as simple user stereotype
• Too high a frequency ⇾ discomfort
• (cf. Therac-25 accidents...)

Page 33: On codes, machines, and environments: reflections and experiences

Another example
• int linkbeacons [«MAC address»]:
  – Number of beacons received by MANET peer during observation period
• int linkrates [«MAC address»]:
  – Estimated bandwidth

Page 34: On codes, machines, and environments: reflections and experiences

Experience #4 (continued)
• RRvars: conceived / designed / implemented by me
  – including instrumenting mplayer
  – including simple TCL/TK user interface
• More information:
  – "A framework for trustworthiness assessment based on fidelity in cyber and physical domains," https://arxiv.org/abs/1502.01899
  – "Safety enhancement through situation-aware user interfaces," https://arxiv.org/abs/1504.03731

Page 35: On codes, machines, and environments: reflections and experiences

S2: Reactive Behaviors

• How to react to context changes? In different ways.
• Two major methods: mask changes / tolerate changes:
  A. elasticity
  B. resilience
• Elasticity requires an estimation of a worst-case scenario.

Page 36: On codes, machines, and environments: reflections and experiences

S2A: Elastic strategy

• The worst case scenario is used to define a point of yielding
• Some algorithm is then used to implement the point of yielding
  – Cf. information theory; Shannon
  – Typical algorithm: modular redundancy + voting

Page 37: On codes, machines, and environments: reflections and experiences

Example

• Worst case scenario = "At most one disturbance per processing stage"
• Yielding point: single disturbance.
• Algorithm:
  – triplicate objects
  – write: multiplex to each replica
  – read: demultiplex via majority voting
• "Redundant data structures"
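The triplicate / multiplex / vote algorithm above can be sketched as follows. This is a minimal illustration, not the author's redundant data structures: `struct tmr_int`, `tmr_write`, and `tmr_read` are hypothetical names, and repairing the dissenting replica after a successful vote is an extra assumption of this sketch.

```c
#include <stdint.h>

/* A 4-byte variable stored as three replicas. */
struct tmr_int {
    int32_t replica[3];
};

/* write: multiplex the value to each replica. */
void tmr_write(struct tmr_int *v, int32_t value)
{
    for (int i = 0; i < 3; i++)
        v->replica[i] = value;
}

/* read: demultiplex via majority voting.
 * Returns 0 and stores the voted value in *out when at least two replicas
 * agree; the dissenting replica, if any, is overwritten with the voted value.
 * Returns -1 when all three replicas differ (the yielding point is exceeded). */
int tmr_read(struct tmr_int *v, int32_t *out)
{
    int32_t a = v->replica[0], b = v->replica[1], c = v->replica[2];
    int32_t voted;

    if (a == b || a == c)
        voted = a;
    else if (b == c)
        voted = b;
    else
        return -1;

    tmr_write(v, voted);  /* repair step: an assumption of this sketch */
    *out = voted;
    return 0;
}
```

A single corrupted replica is masked, which matches the "at most one disturbance" hypothesis; two simultaneous corruptions with distinct values make the read fail.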

Page 38: On codes, machines, and environments: reflections and experiences

Elasticity: intrinsic limitations

• Two "syndromes":
  – Undershooting (US): Worst case hypothesis is wrong.
  – Overshooting (OS): Worst case hypothesis is correct, though it wastes too many resources
• I will illustrate US and OS through an example

Page 39: On codes, machines, and environments: reflections and experiences

aRDS: redundant data structures
• Three threads: scrambler + aRDS + reader
  1. scrambler: fault injection interpreter
  2. aRDS: "protects" 20,000 4-byte variables
     – Fixed allocation stride = 20
  3. reader: round-robin read accesses
• Experiments record
  – number of scrambled cells
  – number of read failures

Page 40: On codes, machines, and environments: reflections and experiences

Scrambler's "little language"

Page 41: On codes, machines, and environments: reflections and experiences

Case #1: undershooting

Page 42: On codes, machines, and environments: reflections and experiences

Case #2: overshooting

Page 43: On codes, machines, and environments: reflections and experiences

S2B: Resilient strategy

• Point of yielding = dynamic system
• Employed redundancy = f (estimated risk of yielding)
• DTOF: distance to failure.

Page 44: On codes, machines, and environments: reflections and experiences

DTOF = Indirect deduction of risk

[Figure: four voting scenarios labeled OS = 6, OS = 4, OS = 2, and OS = 0 (US!)]

DTOF = OS / (n-1)
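The formula DTOF = OS / (n-1) can be turned into a small redundancy controller. Only the formula comes from the slide. Everything else in this sketch is an assumption: that OS ranges from 0 to n-1, the thresholds 0.5 and 1.0, the step of two replicas, and the names `dtof` and `next_redundancy`.

```c
#include <assert.h>

/* DTOF = OS / (n - 1), as on the slide.
 * 'os' is the overshooting observed in a voting round, 'n' the current
 * number of replicas. Assumption: 0 <= os <= n - 1, so DTOF lies in [0, 1]. */
double dtof(int os, int n)
{
    assert(n > 1 && os >= 0 && os <= n - 1);
    return (double)os / (double)(n - 1);
}

/* Hypothetical policy for "employed redundancy = f(estimated risk)":
 * add two replicas when failure is close, drop two when the distance to
 * failure is maximal, otherwise keep n. The result stays within
 * [n_min, n_max]. A real controller would likely wait for a sustained
 * period of full consensus before lowering redundancy. */
int next_redundancy(int os, int n, int n_min, int n_max)
{
    double d = dtof(os, n);

    if (d < 0.5 && n + 2 <= n_max)
        return n + 2;   /* risk is high: raise redundancy */
    if (d >= 1.0 && n - 2 >= n_min)
        return n - 2;   /* no dissent observed: reclaim resources */
    return n;
}
```

Stepping by two keeps the number of replicas odd, so a strict majority is always defined.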

Page 45: On codes, machines, and environments: reflections and experiences

Case #3: DTOF, n(0)=5

Page 46: On codes, machines, and environments: reflections and experiences

Redundancy evolution

[Figure: redundancy plotted against time t]

Page 47: On codes, machines, and environments: reflections and experiences

Hypothesis about E: a dynamic system

Page 48: On codes, machines, and environments: reflections and experiences

Conclusions

• Quality of "service": f (C, M, E)
• A complex problem of intertwined behaviors!
  – Application layer(s)
  – Metaprograms/protocols
  – Compilers
  – OS
  – HW
  – Environment
• Stigmergy complicates solutions

Page 49: On codes, machines, and environments: reflections and experiences

Conclusions
• How to deal with this complex problem?
• My hypothesis: Game Theory
  – M entities and E: GT players
  – Energy budgets shared by M entities
  – GT payoffs associated to behaviors
  – Nested compositional hierarchies of payoff matrices
  – Interconnected and mutually influencing payoff "spreadsheets" (cf. reactive prog.)
• Future research action: "Resilience as concurrent interplays of opponents", https://goo.gl/Mz8foA + antifragility

Page 50: On codes, machines, and environments: reflections and experiences

Further detail
• System/fault models: "Application-layer fault-tolerance protocols", https://bit.ly/1WNJj6V
• Drift: "Antifragility = Elasticity + Resilience + Machine Learning: Models and Algorithms for Open System Fidelity", http://goo.gl/rdwMQH; "A Framework for Trustworthiness Assessment based on Fidelity in Cyber and Physical Domains", http://goo.gl/fsYxqT
• Resilience: "On Resilient Behaviors in Computational Systems and Environments", http://goo.gl/3eU12a; "On environments as systemic exoskeletons: Crosscutting optimizers and antifragility enablers", http://goo.gl/82RsKw

Page 51: On codes, machines, and environments: reflections and experiences

Thanks for your attention!

Questions?