
  • Specification and Verification of Actor Protocols with Finite-State Machines

    Jonathan Schuster

    May 6, 2019

    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

    College of Computer and Information Science
    Northeastern University
    Boston, Massachusetts


    Specification and Verification of Actor Protocols with Finite-State Machines
    Copyright © 2019 Jonathan Schuster

    This dissertation was typeset with the standard book class for LaTeX, using the New Century Schoolbook typeface for normal text, the Fourier typeface for math, and the Bera Mono typeface for source code.

  • Abstract

    Many programmers use the actor model to build distributed systems. The communication aspects of such systems are notoriously hard to implement correctly, however, leading programmers to spend more time debugging protocol implementations and less time focusing on application logic. Furthermore, the common approach of specifying a protocol as a finite-state machine and verifying that the program implements this protocol is insufficient, because standard FSMs do not account for the dynamic, evolving communication topologies in actor programs.

    To address this problem, this dissertation defines a specification language that augments finite-state machines with the ability to describe address-passing aspects of actor protocols. Additionally, the dissertation develops a series of proof techniques for such specifications, as well as a model-checking algorithm that verifies whether a program conforms to its specification. When applied to realistic actor programs and specifications, the model checker can both detect protocol-violating bugs and prove conformance in a reasonable amount of time.


  • Acknowledgments

    A dissertation may be written by just one person, but no one earns a Ph.D. without lots of help from their colleagues and loved ones. As a result, I have many people to thank for helping me get this far, starting with my thesis committee:

    • My advisor Olin Shivers has overseen my growth as a researcher ever since my early days at Northeastern. Over the course of many whiteboard chats that included complicated diagrams, hand-waving explanations, and Olin telling me “I didn’t understand that: explain that to me again”, I learned how to pare my ideas down to their essence and clearly explain the main points. He also constantly encouraged me along the way, especially on the days where I doubted I was good enough to be a “real” researcher. Finally, Olin has always been quick to remind me to spend quality time with family and friends and not let work take over my life.

    • Matthias Felleisen introduced me to the formal study of programming languages when he taught the Ph.D.-level course during my first semester at Northeastern. Two years later when Olin went on sabbatical, Matthias acted as a substitute advisor for me and was instrumental in helping me find a research project when I had been spinning my wheels for the previous year. Along with Stephen Chang, he helped me turn that project into my first paper, which became the basis for this dissertation. I’ve also had several chats with Matthias over the years about grad school and the sort of career I’d like to have, and I am thankful for all of his advice.

    • Amal Ahmed and I didn’t have many chances to work together directly, but she was nevertheless a critical part of my grad-school experience. Whenever I needed someone in the department to discuss the difficulties I was facing, Amal was there to lend an empathetic ear, and I always came away from our meetings feeling better about my situation. Although I may not have worked with her on technical matters as much as I did with my other committee members, Amal’s support and advice were equally important to my success in grad school.

    • Jon Rossie was initially my manager at Cisco during a summer internship after my first year of grad school, where I was exposed to the kinds of fascinating ideas that can come from applying academic techniques to real-world problems. Jon helped guide my research in my first few years after that internship, helping me stay grounded in problems relevant to industry programmers. We also had a number of fun calls and emails trading interesting research papers, a practice I hope to carry on with him and others as I transition into an industry career.

    Mitch Wand, while not on my thesis committee, was extraordinarily generous with his time to help with my dissertation. Reading through my work line-by-line with him helped me become a much better mathematician and taught me how to present math in a rigorous way. Furthermore, while I may have thought I was a decent writer before, Mitch’s feedback improved my skills immensely.

    Cisco as a whole and Jon’s team in particular deserve my thanks for two reasons. First, they funded me for several years of my Ph.D. studies. Second, the research in this dissertation was inspired by some of the work the team is doing and problems they have run into, so none of this would exist without that project. Thank you to all of the project team members for creating a fun space to work in and for giving me a chance to contribute in my own small way.

    Past and present members of Northeastern’s Programming Research Lab have had a major impact on my time in grad school. Much of what I learned about research and programming languages came not from reading assigned papers or working on my research, but from conversations with my fellow grad students. Additionally, although getting a Ph.D. can be a stressful time for just about anyone who attempts it, I was privileged to work with a group of colleagues who support one another through all of the highs and lows. I should make special mention of Stephen Chang, who as previously mentioned helped me work out the initial ideas of this dissertation. I would also be remiss if I didn’t single out Ben Lerner. Ben has been there for the best and worst moments of my grad-school experience, but through it all he has constantly encouraged me to press on, and he has become a good friend in the process.

    Outside of the research world, my parents Greg and Diane worked hard throughout their lives to ensure that my sisters and I had access to a top-quality education. They also taught me to have the kind of do-it-yourself attitude that is so essential for independent research (although I’m sure they would be happy to tell you stories of when I was young and a little too eager to be helpful). Their encouraging words and advice have been a constant comfort over the last eight years. My sisters Kate and Lauren and my in-laws in the Wagner family have cheered me along in the process, as well.

    Finally, there is my wife Claire, the love of my life. Meeting her early in my grad-school experience has made the journey a thousand times better, and I am incredibly lucky that she agreed to spend her life with me. She has been an unwavering source of support, a wise counselor, a role model for work ethic, a smiling face at the end of a long day, and my number-one fan. Her love and support mean the world to me, and I can only hope to be as wonderful a partner to her over the many years we will share together as she has been to me.

    Thank you to one and all. I wouldn’t be here without you.

    — Jonathan Schuster

  • Contents

    Abstract

    Acknowledgments

    1 Introduction
        1.1 Background
        1.2 The Problem and My Thesis
        1.3 The Current Landscape
        1.4 Structure of the Dissertation

    2 CSA: Actors as Finite-State Machines
        2.1 Notation
        2.2 Syntax and Informal Semantics
        2.3 Example: Stream Processing
        2.4 Type System
        2.5 Formal Semantics
        2.6 Related Work

    3 APS: Actor Protocols as Finite-State Machines
        3.1 Syntax and Intuitive Semantics
        3.2 Extended Example: Stream-Processing
        3.3 APS Strengths and Weaknesses
        3.4 History Markers
        3.5 Marked Transition Semantics for CSA
        3.6 Marked Transition Semantics for APS
        3.7 Conformance
        3.8 Refinements to Conformance
        3.9 Related Work

    4 A Manual Conformance Proof

    5 Abstracting CSA
        5.1 Overview of the Abstract Interpretation
        5.2 Abstract Interpretation for Programs
        5.3 Abstract Transitions for PSMs
        5.4 Abstract Conformance
        5.5 Summary Conformance
        5.6 Related Work

    6 Transformation Conformance
        6.1 Overview
        6.2 Transformations
        6.3 Conformance as a Verification Game
        6.4 Split
        6.5 Unmark
        6.6 Assimilate
        6.7 Canonicalize

    7 A Model-Checking Algorithm for APS
        7.1 Finding a Simulation Relation
        7.2 ModelCheck
        7.3 Explore
        7.4 MatchingSpecSteps
        7.5 Prune
        7.6 FindFulfillingPairs
        7.7 Correctness and Termination
        7.8 Tunable Heuristics
        7.9 Related Work

    8 Optimizations
        8.1 Acceleration
        8.2 Eviction
        8.3 Dead-Marker Detection
        8.4 Order of Optimizations
        8.5 Memoization

    9 Evaluation
        9.1 Evaluated Programs and Specifications
        9.2 Experimental Setup
        9.3 Effectiveness Evaluation Results
        9.4 Performance Evaluation Results
        9.5 Lessons Learned
        9.6 Future Performance Work

    10 Conclusion
        10.1 Summing Up
        10.2 Future Work

    A Proof of the Maximal Instantiation Theorem

    B Externals-Only Conformance
        B.1 Definitions for Proofs
        B.2 EO Input-Pattern-Matching Lemma
        B.3 EO Output-Pattern-Matching Lemma
        B.4 Externals-Only PSM Input Lemma
        B.5 Externals-Only Specification Input Lemma
        B.6 Externals-Only PSM Output Lemma
        B.7 Externals-Only Specification Output Lemma
        B.8 Externals-Only Silent Step Lemma
        B.9 EO Specification Simulation Lemma
        B.10 Externals-Only Simulation Lemma
        B.11 Externals-Only Conformance Theorem

    C External-Representative Conformance
        C.1 Definitions
        C.2 External-Representative Message Lemma
        C.3 ER Receptionist Lemma
        C.4 External-Representative Simulation Lemma
        C.5 ER Specification Simulation Lemma
        C.6 External-Representative Conformance Theorem

    D Single-Handler Conformance
        D.1 Definitions for the Proof
        D.2 Handler Step Commutativity Lemma
        D.3 Program Fair Suffix Lemma
        D.4 Handler Termination Lemma
        D.5 PSM Output Commutativity Lemma
        D.6 Specification Output Commutativity Lemma
        D.7 Specification Step Commutativity Lemma
        D.8 Specification Fair Suffix Lemma
        D.9 Single-Handler Isomorphism Lemma
        D.10 Single-Handler Prefix Lemma
        D.11 Handler Continuation Lemma
        D.12 Rearrange Lemma
        D.13 Unrearrange Lemma
        D.14 Single-Handler Conformance Theorem

    E Deterministic-Handler Conformance

    F Event-Step Conformance

    G PSM Conformance
        G.1 Definitions
        G.2 Concrete Specification Well-Formed Preservation
        G.3 Concrete Distinct-Marker Preservation
        G.4 PSM Conformance Theorem

    H Type System for Abstract CSA
        H.1 Abstract Program Configurations
        H.2 Abstract Behaviors
        H.3 Abstract State Definitions
        H.4 Abstract Timeout Clauses
        H.5 Abstract Expressions
        H.6 Labels
        H.7 Type Depth

    I Proofs for Abstract Conformance
        I.1 Definitions
        I.2 Value Approximation Lemma
        I.3 Abstract Substitution Lemma
        I.4 Abstract Context Lemma
        I.5 Well-Formed Preservation Lemma
        I.6 Abstract Well-Formed Preservation Lemma
        I.7 Extra Markers Lemma
        I.8 Deterministic Marking Lemma
        I.9 Marker Soundness Lemma
        I.10 New Behavior Lemma
        I.11 Replaced Behavior Lemma
        I.12 Maximal Value Lemma
        I.13 Well-Typed Maximal Value Lemma
        I.14 Internal Address Types Lemma
        I.15 Functional-Step Soundness Lemma
        I.16 Merge Unreachability Lemma
        I.17 Mergeability Preservation Lemma
        I.18 Quasi-Commutativity Theorem
        I.19 Approximation Mergeability Lemma
        I.20 Merge Argument Lemma
        I.21 Merge Result Lemma
        I.22 Multiple Merge Result Lemma
        I.23 Fully Merged Preservation Lemma
        I.24 Message-Addition Soundness Lemma
        I.25 Soundness of Abstract CSA Lemma
        I.26 Soundness of Event Steps Lemma
        I.27 Event-Step Execution Soundness Lemma
        I.28 Soundness of Fair Executions Lemma
        I.29 Monitored Correspondents Lemma
        I.30 Input Pattern Lemma
        I.31 Output Pattern Lemma
        I.32 Label Erasure Lemma
        I.33 PSM Completeness Lemma
        I.34 Configuration Completeness Lemma
        I.35 Expression Reflexivity Lemma
        I.36 Message-Map Reflexivity Lemma
        I.37 Approximation Reflexivity Lemma
        I.38 Expression Transitivity Lemma
        I.39 Message-Map Transitivity Lemma
        I.40 Approximation Transitivity Lemma
        I.41 Externals-Only Preservation Lemma
        I.42 Abstract Externals-Only Preservation Lemma
        I.43 Single-Handler Preservation Lemma
        I.44 Abstract Conformance Theorem
        I.45 Summary Conformance Theorem

    J Type Preservation Proofs
        J.1 Substitution Lemma
        J.2 Instantiation Type Preservation Lemma
        J.3 Markings Type Preservation Lemma
        J.4 Type Inversion Lemma
        J.5 Context Lemma
        J.6 IntAddrTypes Correctness Lemma
        J.7 Functional Step Type Preservation Lemma
        J.8 Type Preservation Lemma
        J.9 Abstraction Type Preservation
        J.10 Subtype Inversion Lemma 1
        J.11 Subtype Inversion Lemma 2
        J.12 Abstract Canonical Forms Lemma
        J.13 Abstract Type Inversion Lemma
        J.14 Typed Value Depth Lemma
        J.15 Merge Type Preservation Lemma
        J.16 Message Addition Type Preservation Lemma
        J.17 Abstract Typed Substitution Lemma
        J.18 IntAddrTypes Depth Lemma
        J.19 Abstract Functional Step Type Preservation Lemma
        J.20 Abstract Type Preservation Lemma

    K Proofs for Transformation Conformance
        K.1 Miscellaneous Definitions
        K.2 Conformance Reflection
        K.3 Definition of SimExecs
        K.4 Fair Suffix Lemma
        K.5 Specification Well-Formed Preservation
        K.6 Empty Matchable Output Lemma
        K.7 Externals-Only Label Lemma
        K.8 Monitored Matchable Markers Lemma
        K.9 Used Marker Lemma
        K.10 Used/Monitored Marker Lemma
        K.11 Distinct-Marker Silent Preservation Lemma
        K.12 Distinct-Marker Preservation Lemma
        K.13 Silent-Step Partition Lemma
        K.14 Communication-Step Partition Lemma
        K.15 Weak-Silent-Step Partition Lemma
        K.16 Specification Weak-Step Partition Lemma
        K.17 Matchable Labels Lemma
        K.18 Fresh Matchables Lemma
        K.19 Distinct Matchables Lemma
        K.20 Specification Summary Partition Lemma
        K.21 Monitored Marker Permanence Lemma
        K.22 Post-Fulfillment Monitoring Lemma
        K.23 Fair Specification Suffix Lemma
        K.24 SimExecs Simulation Lemma
        K.25 SimExecs Non-Emptiness Lemma
        K.26 SimExecs Determinism Lemma
        K.27 SimExecs Common Prefix Lemma
        K.28 SimExecs Fulfillment Lemma
        K.29 Transformation Conformance Theorem

    L Conformance-Reflection Proofs
        L.1 Definitions
        L.2 No Monitored Markers Lemma
        L.3 No Silent Transitions Lemma
        L.4 Distinct Monitored Markers Lemma
        L.5 Summary Synchronization Lemma
        L.6 Split Silent Transitions Lemma
        L.7 Split Silent Weak-Step Transitions Lemma
        L.8 Split Transitions Lemma
        L.9 Split Weak-Step Transitions Lemma
        L.10 Split Event-Step Transitions Lemma
        L.11 Split Summary Transitions Lemma
        L.12 Split Conformance Reflection Theorem
        L.13 Remap Well-Formed Preservation Lemma
        L.14 Remap Externals-Only Preservation Lemma
        L.15 Abstract Single-Handler Preservation Lemma
        L.16 Abstract Extra Markers Lemma
        L.17 Abstract Marker Soundness Lemma
        L.18 Approximation Substitution Lemma
        L.19 Abstract Internal Address Types Lemma
        L.20 Abstract Functional-Step Soundness Lemma
        L.21 Approximation Soundness Lemma
        L.22 Event-Step Approximation Soundness Lemma
        L.23 Fair Execution Approximation Soundness Lemma
        L.24 Label Sequence Construction Lemma
        L.25 Abstract Configuration Completeness Lemma
        L.26 Approximating Transformation Lemma
        L.27 Fully Merged Expansion Lemma
        L.28 Remap Approximation Lemma
        L.29 Remap Single-Message Reflection Lemma
        L.30 Unmark Conformance Reflection Theorem
        L.31 Assimilate Conformance Reflection Theorem
        L.32 Canonicalize Conformance Reflection Theorem
        L.33 TryTrans Well-Formed Preservation Lemma
        L.34 Accelerate Conformance Reflection Theorem
        L.35 Definitions for Evict Proofs
        L.36 Eviction Skip Lemma
        L.37 Eviction Simulation Lemma
        L.38 Eviction Specification Skip Lemma
        L.39 Eviction Specification Simulation Lemma
        L.40 Evictability Preservation Lemma
        L.41 Eviction Synchronization Lemma
        L.42 Eviction Disabled Actor Lemma
        L.43 Eviction Running Actor Lemma
        L.44 Eviction Message Receive Lemma
        L.45 SimulateUnevicted Simulation Lemma
        L.46 Eviction Fulfillment Lemma
        L.47 SimulateUnevicted Synchronization Lemma
        L.48 Evict Conformance Reflection Theorem
        L.49 Detect Conformance Reflection Theorem
        L.50 Remap Approximation Composition Lemma
        L.51 Composition Conformance Reflection Theorem

    M Correctness Proof for ModelCheck
        M.1 PsmSimluateOutput Correctness Lemma
        M.2 SimulateOutput Correctness Lemma
        M.3 SimulateOutputs Correctness Lemma
        M.4 MatchingSpecSteps Correctness Lemma
        M.5 Explore Correctness Lemma
        M.6 Prune Correctness Lemma
        M.7 FindFulfillingPairs Correctness Lemma
        M.8 ModelCheck Transformation Conformance Theorem

    N Termination Proof for ModelCheck
        N.1 Definitions for Termination Proofs
        N.2 Remap Origin Preservation Lemma
        N.3 Remap Message Type Preservation Lemma
        N.4 Remap Type Preservation Lemma
        N.5 Core Transformation Termination Theorem
        N.6 Distinct Spawns Lemma
        N.7 TryTrans Termination Lemma
        N.8 Accelerate Termination Theorem
        N.9 Evict Termination Theorem
        N.10 Detect Termination Theorem
        N.11 Label Depth Lemma
        N.12 Label Names Lemma
        N.13 Bounded Merge Lemma
        N.14 Program Origin Preservation Lemma
        N.15 Specification Origin Preservation Lemma
        N.16 Bounded Types Lemma
        N.17 Bounded Expressions Lemma
        N.18 Bounded Timeout Clauses Lemma
        N.19 Bounded State Definitions Lemma
        N.20 Bounded Behaviors Lemma
        N.21 Bounded Program Configurations Lemma
        N.22 Bounded PSMs Lemma
        N.23 Bounded Pairs Lemma
        N.24 Unique Context Lemma
        N.25 Finite Maximal Values Lemma
        N.26 Finite Triggers Lemma
        N.27 Finite Loop Allocations Lemma
        N.28 Finite Allocations Lemma
        N.29 ProgSteps Termination Lemma
        N.30 Finite Output Matches Lemma
        N.31 PsmSimluateOutput Termination Lemma
        N.32 SimulateOutput Termination Lemma
        N.33 SimulateOutputs Termination Lemma
        N.34 MatchingSpecSteps Termination Lemma
        N.35 Explore Termination Lemma
        N.36 Finite Transformation Steps Lemma
        N.37 Prune Termination Lemma
        N.38 FindFulfillingPairs Termination Lemma
        N.39 ModelCheck Termination Theorem

  • Chapter 1

    Introduction

    1.1 Background

    Distributed systems are one of the most widely used means to structure software systems. Such a system harnesses the collective computing power of all of its constituent machines, thus allowing the system to scale even as the exponential improvements in hardware performance from Moore’s law begin to taper off.

    Although distributed systems can be built directly in terms of networking primitives such as sockets, programmers sometimes turn to languages or frameworks based on the actor model that provide more suitable abstractions. A program in the actor model consists of many different processes, called actors, that communicate by asynchronous message-passing. Upon receiving a message, an actor can spawn new actors, send messages to other actors, or change how it will handle future received messages. First developed by Hewitt et al. [59] and further investigated by Agha et al. [3, 4], the model is now the basis for Erlang [10] and Scala’s Akka framework [5].
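    The three capabilities just described (spawning actors, sending messages, and changing future behavior) can be illustrated with a minimal sketch. The single-threaded `ActorSystem` class and the `idle`, `busy_with`, and `logger` behaviors below are hypothetical names invented for this illustration; they are not part of Erlang, Akka, or the language defined later in this dissertation.

```python
from collections import deque


class ActorSystem:
    """Tiny single-threaded actor runtime (hypothetical, for illustration only)."""

    def __init__(self):
        self.behaviors = {}    # address -> current message handler
        self.mailboxes = {}    # address -> queue of pending messages
        self.next_address = 0

    def spawn(self, behavior):
        """Create a new actor and return its address."""
        address = self.next_address
        self.next_address += 1
        self.behaviors[address] = behavior
        self.mailboxes[address] = deque()
        return address

    def send(self, address, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self.mailboxes[address].append(message)

    def run(self):
        """Deliver messages one at a time until every mailbox is empty."""
        progress = True
        while progress:
            progress = False
            for address in list(self.mailboxes):
                if self.mailboxes[address]:
                    message = self.mailboxes[address].popleft()
                    # A handler may return a new behavior for future messages.
                    new_behavior = self.behaviors[address](self, address, message)
                    if new_behavior is not None:
                        self.behaviors[address] = new_behavior
                    progress = True


log = []


def logger(system, self_address, message):
    """Record every received message."""
    log.append(message)


def idle(system, self_address, message):
    """On the first message: spawn a helper, send to it, then change behavior."""
    helper = system.spawn(logger)
    system.send(helper, ("first", message))
    return busy_with(helper)


def busy_with(helper):
    """Behavior used after the first message: forward everything to the helper."""
    def busy(system, self_address, message):
        system.send(helper, ("later", message))
    return busy


system = ActorSystem()
actor = system.spawn(idle)
system.send(actor, "x")
system.send(actor, "y")
system.run()
```

    In this sketch the handler's return value plays the role of "change how it will handle future received messages"; real actor frameworks expose that capability in their own ways.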

    A distinguishing characteristic of actors is their address-passing capability: that is, their ability to include actor addresses in messages. This allows actors to learn about other actors as the program evolves, resulting in a dynamic communication topology. This capability increases the expressive power of the language, but it also makes reasoning about programs difficult.

    1.2 The Problem and My Thesis

    Because an actor program’s behavior is defined in terms of its communication with the outside world, its correctness is defined in terms of a protocol that describes its expected communication behavior. For example, the protocol for a program that computes a running average of an incoming stream of numbers might require that

    • upon receiving an element of the stream, the program sends back an acknowledgment,


    • every request for the current running average receives a response, and

    • after the program receives an end-of-stream message, it sends no further acknowledgments.

    When reasoning about such protocols, it can be useful to focus solely on the expected patterns of communication while ignoring the computational aspects. For example, the properties described above concern the messages the program is expected to send in response to inputs from the outside world, but do not specify how to compute the running average itself. However, even such lightweight specifications describe behaviors that are easy to implement incorrectly.

    To specify these kinds of communication patterns, programmers often use finite-state machines (FSMs).1 Each transition of the FSM describes the program’s expected reaction to a given input, and the different states allow for different behavior depending on the sequence of messages received so far. This model is used for network protocols such as TCP [90] and the Alternating Bit Protocol [15].

    Traditional FSMs are insufficient for describing protocols for actor programs (hereafter called actor protocols), however. The dynamic communication topology of an actor program means that a protocol must describe not only what messages a program can send and receive, but also how the addresses carried in messages should be used. For example, a request for the running average in the above example might contain an address, and the protocol would specify that the response must be sent to that address. Therefore, to describe actor protocols, FSMs would have to be augmented with the ability to describe address-passing patterns.

    Even if FSMs were extended to specify actor protocols in this way, implementing such protocols would still be difficult. It can be tedious to ensure that every message is handled correctly in every state, and therefore it is easy to make mistakes. Checking that every rule is followed correctly in every possible case is a job better left to a computer than to a human, so it would be useful to have a tool that could automatically verify whether a program implements a given protocol correctly. Such a tool would help programmers avoid the “obvious” sorts of errors and allow them to focus on the more complicated aspects of their program that are harder to reason about automatically.

    Verification is a hard problem in general, but there are three advantages in this situation. First, because these FSM-like specifications would describe only high-level patterns of communication rather than low-level computational details, it should be easier to prove that a program satisfies such a specification. Second, actor protocols are often implemented directly as FSMs, to the extent that Erlang and Akka provide built-in support for this purpose in the form of the gen_fsm behavior and the FSM trait, respectively. Therefore, when thinking about verifying an actor program against an FSM-based specification, it makes sense to consider a setting in which the program is written in a language geared towards building actors with FSMs. Third, this dissertation will show that many actor protocols can be written in a language that syntactically forces all event-handler expressions to be terminating (i.e., they may not contain unbounded loops or recursion), which lends further reasoning power.

    1 Technically, the finite-state machines used to describe protocols are not truly finite-state, because each explicitly named state can have variables that range over infinitely many values. Nevertheless, the “states” defined by partitioning that infinite set into finitely many classes still act as a useful reasoning tool for programmers. The rest of this dissertation uses the term “finite-state machine” (or “FSM”) to refer to this informal notion.

    This leads me to my thesis.

    Finite-state, address-passing specifications can be used to automatically verify non-trivial protocols in actor programs.

    As evidence for this thesis, this dissertation presents the following contributions:

    1. a specification language for describing actor protocols

    • The language uses address-passing, finite-state machines and it comes with a notion of conformance that formally defines what it means for a program in an FSM-based actor language to implement (conform to) a specification. The language can express both safety and liveness properties of actor protocols.

    2. a series of refinements to conformance to make proving conformance easier

    3. an abstract interpretation for the FSM-based programming language

    4. a state-space-reduction technique

    • The technique further abstracts the program at each step during model checking based on what is relevant to its specification.

    5. a model-checking algorithm for verifying conformance to a specification

    • The algorithm is sound, but not complete, meaning that a successful verification result implies that the program does indeed conform to its specification, but there are some conforming programs which the algorithm is unable to verify.

    6. a set of optimizations for the model-checking algorithm

    7. an empirical validation of the model-checking algorithm’s precision

    1.3 The Current Landscape

    Later chapters discuss the related work in more detail, but to provide the reader with a sense of perspective, this section summarizes the state of the art in ensuring that actor programs are correct.


    FSMs for Protocols The FSMs used to describe protocols are often informal diagrams, but there are also formal languages such as statecharts [55] and SDL [102] that endow these diagrams with a formal semantics. None of these languages have a simple means for expressing address-passing, however. I am unaware of any work that attempts to statically verify an actor program directly against an FSM-like specification. Chapter 3 discusses these works in more detail.

    Temporal Logics As an alternative to FSMs, some researchers have proposed the use of temporal logics to specify the expected behavior of programs. Dam et al. [36] use the first-order µ-calculus to specify properties of Erlang programs similar to the kinds of behavior described in my specification language, but their language is more expressive and therefore its associated theorem prover [48] requires more human interaction than my model-checking algorithm. Lamport’s Temporal Logic of Actions (TLA) [70], and especially its extension TLA+ [72], is a popular means for specifying the behavior of distributed systems in industry, but it does not yet appear to be used with actor-based technologies. Chapter 3 further compares my work to temporal logics.

    Testing When it comes to verifying the behavior of an actor program, testing is still the most common approach in industry. Unit testing is of course common, but more rigorous approaches are also used. Both Quviq QuickCheck [12] and PropEr [87] are property-based testers for Erlang that generate random test cases for checking user-defined properties. The P [40] and P# [37] projects (which also structure actor-like processes as FSMs) both have schedulers that allow the same test case to be run under many different schedulers, thereby detecting assertion violations that can occur as a result of different execution orders. I view all of these dynamic techniques as complementary to my static approach. Chapter 7 discusses these dynamic approaches in more detail.

    Model-Checking A variety of model checkers [11, 43, 49, 64] have been built for checking properties of Erlang programs. Most of these are focused on verifying individual properties of a protocol specified with temporal logic, such as checking that a certain bad state is never reached, rather than checking that an entire FSM-based protocol description is implemented correctly. Several of the existing model checkers explore only a bounded subset of the program’s state space, or require programmers to devise their own abstractions to explore the entire state space. Chapter 7 describes these in more detail.

    Type Systems Finally, various type systems have been designed for message-passing programs. A type system can be seen as both a specification and a verification tool, in that a type for a communication channel specifies the protocol that should be followed when sending or receiving on that channel, while type-checking verifies that the program uses the channel as specified. There are many such type systems, but the most well-known are session types [62]. A session type describes how a given communication channel (“session”) should be used by a fixed set of interacting processes. Type-checking ensures that the processes coordinate with each other correctly, usually to prevent conditions such as deadlock. Session types are difficult to apply to dynamic settings such as actor programs, in which different processes can join and leave conversations at any time, and where a single “conversation” can occur over multiple actor addresses. Chapter 3 discusses session types in more detail, as well as other relevant type systems.

    1.4 Structure of the Dissertation

    The first part of this dissertation (chapters 2–4) focuses on defining both the programming language and specification language, and on showing how to prove conformance by hand. The latter part of the dissertation (chapters 5–9) then develops techniques to build an automatic model checker for programs and specifications written in these two languages, and it evaluates the resulting tool on realistic examples. The chapter-by-chapter breakdown is as follows:

    • Chapter 2 defines CSA, a programming language for implementing actorsas communicating finite-state machines.

    • Chapter 3 defines APS, a specification language for describing actor protocols as restricted, address-passing finite-state machines. The formal semantics of APS is defined in terms of a conformance relation that defines what it means for a CSA program to implement the protocol described by an APS specification.

    • Chapter 4 presents an example conformance proof for a running example built up in chapters 2 and 3, to provide a sense of the “shape” of such proofs. The job of the model checker is to automate the construction of such proofs.

    • Chapter 5 develops an abstract interpretation for CSA programs that helps reduce their possible state-space.

    • Chapter 6 builds on top of chapter 5 to define a new notion of conformance that further abstracts the program as it evolves, depending on what aspects of the program are relevant to the specification.

    • Chapter 7 presents the model-checking algorithm itself.

    • Chapter 8 defines a set of optimizations for the model-checking algorithm.

    • Chapter 9 evaluates the model-checking algorithm on a set of realistic actor programs and specifications.

    • Chapter 10 describes ideas for future work and concludes.

    Chapters 2, 3, 5, and 7 also discuss related work for the topics introduced in those chapters.


  • Chapter 2

    CSA: Actors as Finite-State Machines

    This chapter introduces CSA (Communicating State Actors), a communication-focused programming language that incorporates the actor-as-FSM pattern as its core computational model.1 The language is used as a basis for the rest of the dissertation.

    CSA models the core constructs of an internal research language explored by developers at Cisco. That language and the problems faced by programmers working in it directly inspired this dissertation. Section 2.2.4 further describes the Cisco language’s influence on CSA.

    The next few sections describe the language via its syntax, intuitive semantics, and a small example. The subsequent sections then define CSA’s formal semantics and type rules, and the chapter concludes with related work.

    2.1 Notation

    This section introduces notation used in the rest of this dissertation. P(A) stands for the powerset of A, F(A) stands for the set of all finite subsets of A, and M(A) stands for the set of all finite multisets of A. Both the disjoint union of two sets A and B and the multiset sum of two multisets A and B are written A ⊎ B; the context will always make it obvious which meaning is intended. Tuples are denoted with angle brackets, as in the 3-tuple 〈a_1, a_2, a_3〉. A term a̅ (a with an overline) stands for a finite sequence of a’s, ε stands for an empty sequence, and A∗ stands for the set of all finite sequences of elements of A.

    The set of partial functions from A to B is written A ⇀ B. The empty partial function (i.e., the function whose domain is the empty set) is written ∅. The notation f[x ↦ y] stands for the partial function that maps x to y and maps all other inputs x′ to f(x′). When f = ∅, this can be shortened simply to [x ↦ y]. The disjoint union of two partial functions f ⊎ g is defined such that (f ⊎ g)(x) = x′ if and only if f(x) = x′ or g(x) = x′; the disjoint union is defined only if dom(f) ∩ dom(g) = ∅. The composition g ◦ f of two partial functions is defined such that (g ◦ f)(x) = g(f(x)) for all x; it is undefined if either f(x) or g(f(x)) is undefined. The restriction of a partial function f to a set A, written f|A, is defined such that f|A(x) = f(x) for all x ∈ dom(f) ∩ A, and undefined otherwise.

    1 This is an updated version of the language presented in a previous paper [97].
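    As an informal illustration of the partial-function notation (not part of the formal development), finite partial functions can be modelled as Python dictionaries. The helper names update, disjoint_union, compose, and restrict are hypothetical and exist only for this sketch.

```python
# Finite partial functions modelled as dicts. All helper names are
# illustrative assumptions, not definitions taken from the dissertation.

def update(f, x, y):
    """f[x -> y]: maps x to y and every other input x2 to f(x2)."""
    g = dict(f)
    g[x] = y
    return g

def disjoint_union(f, g):
    """Disjoint union of f and g; defined only when the domains do not overlap."""
    if f.keys() & g.keys():
        raise ValueError("domains overlap, so the disjoint union is undefined")
    return {**f, **g}

def compose(g, f):
    """g after f: defined at x only when f(x) and g(f(x)) are both defined."""
    return {x: g[y] for x, y in f.items() if y in g}

def restrict(f, a):
    """Restriction of f to the set a: agrees with f on dom(f) intersected with a."""
    return {x: y for x, y in f.items() if x in a}

empty = {}                        # the empty partial function
f = update(empty, "x", 1)         # the shorthand [x -> 1]
g = disjoint_union(f, {"y": 2})   # domains {"x"} and {"y"} are disjoint
print(g)
print(compose({1: "one"}, g))     # undefined at "y" because 2 is not in dom
print(restrict(g, {"y", "z"}))
```

    Representing "undefined" by the absence of a key keeps each operation a one-line dictionary comprehension, at the cost of requiring hashable inputs.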

    2.2 Syntax and Informal Semantics

    CSA programs consist of independent processes called actors that can communicate with other actors via message-passing. An actor is an event-handling process: whenever it receives a message, it can send messages, spawn new actors, and finally update its state before suspending execution until the next event occurs. Message-passing in CSA is asynchronous; each actor has a mailbox in which messages sent to it reside until the actor handles them. Messages are unordered, so the mailbox functions as a bag (multiset) of messages rather than a queue.
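    To make the bag-versus-queue distinction concrete, here is a minimal Python sketch of a mailbox as a multiset. The Mailbox class and its method names are assumptions made for illustration; CSA itself defines no such API.

```python
import random
from collections import Counter

class Mailbox:
    """A bag (multiset) of pending messages: duplicates are kept, order is not."""

    def __init__(self):
        self._bag = Counter()

    def deliver(self, message):
        """Add one copy of a (hashable) message to the bag."""
        self._bag[message] += 1

    def take_any(self, rng=random):
        """Remove and return an arbitrary pending message."""
        pending = list(self._bag.elements())
        if not pending:
            raise LookupError("mailbox is empty")
        message = rng.choice(pending)
        self._bag[message] -= 1
        if self._bag[message] == 0:
            del self._bag[message]
        return message

    def __len__(self):
        return sum(self._bag.values())

box = Mailbox()
for msg in ["Enable", "Disable", "Enable"]:
    box.deliver(msg)
print(len(box))        # three pending messages, two of them identical
print(box.take_any())  # any of the three may be handled first
```

    Because take_any may return any pending message, a handler written against this sketch cannot rely on arrival order, which mirrors the unordered assumption stated above.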

    A CSA program runs in the context of some environment made up of other unknown actors that communicate with the program. Those unknown actors are called external actors, while actors in the program are called internal actors. Thus, a program can serve as a single component of some larger system. Modeling programs in this way allows us to separate a program’s public interface (i.e., its communication with the environment) from its private implementation details.

    Figure 2.1 lists the S-expression-based syntax for CSA programs. A program P defines its interface with the environment and the initial set of actors.2 The receptionists clause lists the internal actors initially advertised to the environment, which allows actors in the environment to send them messages. Conversely, the externals clause names the external actors initially known to the program, enabling actors inside the program to communicate with the environment. The type associated with each receptionist (external) indicates the type of messages the environment (program, respectively) may send to that actor. Then the sub-clauses of the let-actors clause each spawn a new actor and bind it to the declared variable x. The final sequence of variables x provides the list of actor addresses to be bound as receptionists (some actors are exposed at more than one name, others are not exposed as receptionists at all).3

    Actors in CSA are structured as finite-state machines, with each state q parameterized over some arguments x_s. In a program’s initial spawn clauses, each actor’s goto expression provides its initial state, and its state definitions Q define its transitions. Each initial state argument is an open value ov. An open value is a value-like term that may have free variables. Throughout this dissertation, the term “value” on its own is assumed to mean a closed value v (defined in section 2.5.1).

    2 The language and semantic presentation is inspired by Agha et al. [3].

    3 The notion of receptionist here differs slightly from that given by Agha et al. to account for the addition of types. Whereas a receptionist in their work is a particular actor (or that actor’s address), the term “receptionist” here refers to an address paired with a type, because the program may provide an actor’s address to the environment at multiple different types.

    P  ::= (program (receptionists [x τ]) (externals [x τ])
             (let-actors ([x (spawn^ℓ τ (goto q ov) Q)]) x))              (Programs)
    Q  ::= (define-state (q [x_s τ]) x_m e tc)                            (State Definitions)
    tc ::= no-timeout | [(timeout ov) e]                                  (Timeout Clause)
    e  ::= (spawn^ℓ τ e Q) | (goto q e) | (send e e)                      (Expressions)
         | (begin e e) | (record [r e]) | (: e r) | (variant t e)
         | (case e [(t x) e]) | (fold τ e) | (unfold τ e) | (list ov)
         | (dict [ov ov]) | (for/fold [x e] [x e] e) | (o e) | n | str | x
    ov ::= (record [r ov]) | (variant t ov) | (fold τ ov)                 (Open Values)
         | (list ov) | (dict [ov ov]) | n | str | x
    τ  ::= Nat | String | (Variant [t τ]) | (Record [r τ]) | (Addr τ)     (Types)
         | (rec X (Addr τ)) | X | (List τ) | (Dict τ τ)

    q ∈ StateName           (State Names)
    x ∈ Var                 (Variables)
    X ∈ TypeVar             (Type Variables)
    n ∈ N                   (Natural Numbers)
    str ∈ String            (Strings)
    o ∈ PrimOp              (Primitive Operations)
    ℓ ∈ ProgLoc ∪ EnvLoc    (Syntactic Locations)
    r ∈ RecField            (Field Names)
    t ∈ Tag                 (Variant Tags)

    Figure 2.1: CSA syntax (keywords in bold, all parentheses and brackets are literals)

    2.2.1 Handling Events

    Two kinds of events trigger state transitions in CSA: received messages and timeouts. To handle a received message, an actor in state q retrieves the message from its mailbox and evaluates the handler expression e from the corresponding state definition (define-state (q [x_s τ]) x_m e tc). In that handler expression, each state argument is bound to the corresponding variable x_s, and the message is bound to the formal parameter x_m.

    A state may also declare a timeout clause [(timeout ov) e]. If ov evaluates to n and the actor does not receive a message before n milliseconds elapse, then the actor executes the timeout clause’s handler expression e instead of the message handler.4

    While handling an event, an actor can spawn other actors and send messages to actors (including itself). The expression (spawn^ℓ τ e Q) spawns a new actor that accepts messages of type τ and whose states are defined by Q. The expression e is an initial handler expression for the actor to execute when it starts.5

    The result of a spawn expression is the new actor’s address: a unique identifier used to send messages to that actor. The address is also bound to self in the context of the new actor. Addresses have type (Addr τ), where τ is the type of message the actor can receive. Address types are contravariant in the message type, allowing an address of type (Addr τ) to be used in a context that sends it only messages of some subtype τ′ of τ.

    The label ℓ on a spawn expression indicates the syntactic location of that expression and is used when creating addresses. In chapter 5, we will see how an abstract interpretation for addresses takes advantage of these labels.

    The expression (send e_a e_v) sends to the address produced by e_a the value produced by e_v. The message sits in the mailbox for e_a until the actor is ready to handle it.

    A goto expression ends evaluation of a handler expression by transitioning to the named state and passing the given arguments to that state. The actor then waits in that state for the next event. As with return statements in C-like languages, any remaining continuation is dropped when the goto is evaluated.

    4 In an actual implementation of CSA, a message that arrives shortly after n milliseconds have elapsed may be handled instead of the timeout, as a result of imprecision in the timing. This subtlety is irrelevant for this dissertation, however.

    5 In a practical CSA-like language, actors would be defined in terms of class-like definitions, and spawn would instead construct new instances of those classes. The class-less version here simplifies the presentation of the language.

    2.2.2 Other Expressions and Types

    The remaining expressions are standard values and control forms. Record fields are accessed using the : form. A variant with tag t is constructed with (variant t e), and the case expression deconstructs variants, binding the variant’s fields in the matching clause. These are similar to ML-style sum-of-products datatypes and have types marked with the keyword Variant. The primitive operations o include standard operations on natural numbers and strings as well as operations to add and retrieve values from lists and dictionaries. For polymorphic types such as (List τ), every possible type has its own set of operators for that type (e.g., consNat, consString, etc.).

    To enable an abstract interpretation of CSA in chapter 5, list and dictionary expressions may contain only open values. The more general form can easily be desugared into this one by using a case statement to bind variables, e.g., the expression (list e) can be desugared into the following:

    (case (variant Binding e)
      [(Binding x) (list x)])

    The expression (for/fold [x_acc e_acc] [x_i e_i] e_body) is like a for-loop, but it also accumulates a result as it iterates, much like a fold in functional languages.6 The initial value of the accumulated result is e_acc. The expression then evaluates e_body once per item in the list e_i, with x_i bound to the current list item, and x_acc bound to the current accumulated result. The result of e_body is used as the next value of x_acc, and the final such result is the result of the entire expression. CSA eschews unbounded while-loops in favor of for/fold to ensure all loops terminate.
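    The behavior of for/fold is close to a left fold over a finite list. As a rough analogy only, the following Python sketch expresses it with functools.reduce; the function name for_fold is an assumption made for this illustration.

```python
from functools import reduce

def for_fold(initial, items, body):
    """Evaluate body(acc, item) once per item, threading the accumulator through.

    Mirrors (for/fold [x_acc e_acc] [x_i e_i] e_body): `initial` plays the role
    of e_acc, `items` of the list e_i, and `body` of e_body.
    """
    return reduce(body, items, initial)

# Sum a list of readings: the accumulator starts at 0.
total = for_fold(0, [3, 4, 5], lambda acc, item: acc + item)
print(total)  # 12
```

    Because the loop runs exactly once per element of a finite list, it terminates whenever the body does, which is the property the text relies on.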

    CSA’s recursive types differ slightly from the standard presentation. A type (rec X (Addr τ)) indicates a greatest-fixed-point type with name X, but the recursive type is limited to address types. This design choice allows messages to contain addresses of the same type as the receiving actor, which is necessary for many actor protocols, but prevents programs from creating values of unbounded (syntactic) depth. We will see later that this restriction assists in verifying that a program conforms to its specification.

    Although least-fixed-point types are more common in standard functional languages, using a greatest fixed-point allows for infinite cyclic communication patterns, such as an address that can be sent to itself infinitely many times.

    2.2.3 Messaging Guarantees

    Message delivery in CSA is unduplicated, unordered, and reliable (i.e., messages are guaranteed to be delivered). The unduplicated guarantee matches that of Erlang and Akka, while the unordered guarantee is slightly weaker: those systems guarantee that when two messages have the same sender and receiver, their order is preserved. Because the rest of this dissertation assumes a weaker guarantee, the reasoning developed here also applies to systems with stronger ordering guarantees.

    Neither Akka nor Erlang guarantees reliable delivery. The Erlang documentation, however, says that most programmers find it simplest to assume reliable delivery while using Erlang’s monitoring facilities to detect failures.7 Thus, the CSA semantics corresponds to the mental model programmers typically use, and so the reasoning developed in this dissertation is sound up to the reliable delivery of messages.

    6 CSA borrows for/fold from Racket [46].

    2.2.4 Inspiration for CSA

    CSA is based on a Cisco internal research programming language that adapts the actor model for implementing network protocols. I was introduced to the language during an internship at Cisco in the summer of 2012 while a team of developers was exploring the language. Further work with that Cisco team helped me understand the problems faced by both actor programmers and implementers of network protocols, and it inspired the work developed in this dissertation.

    Aside from the actor-model foundation for the language, CSA borrows two of the Cisco language’s defining features. First, both languages use a finite-state machine to structure actors, while actors in other languages are arbitrary processes that either return a closure as their continuation when they finish handling an event or recursively execute a receive statement to process the next message. This state-machine orientation lends a notion of organization and predictability to actors. In this way, both CSA and the Cisco language are closer to functional programming languages in that the next state is computed and returned, rather than being a part of the current continuation as in a language such as Erlang.

    Second, both CSA and the Cisco language syntactically enforce all handler expressions to be terminating: there are no unbounded loops and no recursion. The forced termination allows for more precise analyses, because it makes reasoning about unbounded continuations unnecessary.

    Experience so far suggests that these differences from typical actor languages do not overly restrict actor programmers (see the discussions of example programs in chapter 9), but further evaluation is needed.

    2.3 Example: Stream Processing

    Bob runs a weather-tracking station just outside of Boston, and he wants to get a better understanding of the region’s infamously volatile weather. As a first step, he decides to track the average temperature at certain times of day/year, so he can answer questions such as “What’s the average temperature on a July afternoon?” or “How cold is a typical Christmas morning?”.

    Bob’s station supplies him with a stream of temperature data at a rate of one reading per second, but he needs a system to filter that data and compute mean temperatures from it. Rather than reprocessing years’ worth of raw data every time he wants an updated result, he decides to design a system of actors to process this incoming data stream and compute running averages for particular time periods. In particular, Bob can turn each actor’s recording capabilities on or off at appropriate times by sending it a message, so each such stream-processor actor is responsible only for computing a running average of readings received while it is recording.

    7 http://erlang.org/faq/academic.html#idp32851984

    [Diagram: states OFF, ON, and DONE; transition labels Enable, Disable, and Shutdown (two transitions)]

    Figure 2.2: Finite-state machine of the processor actor

    A separate manager actor will create new processor actors on demand and shut down the entire system when requested. To limit resource usage, the manager will create no more than 100 processors.

    The FSM in figure 2.2 summarizes each stream processor’s behavior. It starts in the OFF state, and the system can toggle between OFF and ON with Enable and Disable messages. A Shutdown message from either state shuts down the process entirely. This FSM leaves out many details of the described protocol, though; we will return to this issue in the next chapter.
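    The FSM just described can be written down as a small transition table. The Python sketch below is only an illustration: the names TRANSITIONS, step, and run are hypothetical, and treating every unlisted (state, message) pair as a self-loop is an assumption of the sketch rather than something the figure states.

```python
# Transition table for the processor FSM described in the text.
TRANSITIONS = {
    ("OFF", "Enable"): "ON",
    ("ON", "Disable"): "OFF",
    ("OFF", "Shutdown"): "DONE",
    ("ON", "Shutdown"): "DONE",
}

def step(state, message):
    """Return the next state; unlisted pairs leave the state unchanged (assumption)."""
    return TRANSITIONS.get((state, message), state)

def run(messages, state="OFF"):
    """Fold a sequence of messages over the FSM, starting in OFF by default."""
    for message in messages:
        state = step(state, message)
    return state

print(run(["Enable", "Disable", "Enable"]))   # ON
print(run(["Enable", "Shutdown", "Enable"]))  # DONE
```

    Note what the table cannot express: it says nothing about which replies are sent or to which address, which is the gap in plain FSMs that the text points out.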

    2.3.1 Implementation

    Listings 2.1 and 2.2 show the program Bob came up with. Single-line comments start with ;, and block comments are nested between #| and |#. The listings also assume some bits of syntactic sugar: begin expressions and no-timeout clauses are left implicit, types are given the aliases listed in the comments, and a let expression (let ([x e]) e′) is short for (case (variant Let e) [(Let x) e′]).

    The program starts with a single manager actor, which has just one state. When the manager receives a MakeProc request, if the processor limit has not been reached, it spawns a new processor and sends its address to the provided address resp (the contents of the processor-spawn expression are found in listing 2.2). The mdest address included in the request indicates where the processor should send the current mean when requested. The manager then adds the processor to its list of known processors and transitions back to the Managing state. If the limit has been reached, the manager silently ignores the request. Upon receiving a ShutdownAll request, the manager instead sends each processor a Shutdown message before clearing its list of processors.
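    The manager's behavior can be mimicked in a few lines of Python. This is a loose sketch rather than a translation of listing 2.1: the Manager class, its outbox list, and the string names standing in for spawned processor addresses are all assumptions made for illustration.

```python
MAX_PROCESSORS = 100  # the limit stated in the text

class Manager:
    """Single-state manager: tracks known processors and records sent messages."""

    def __init__(self):
        self.processors = []  # names standing in for processor addresses
        self.outbox = []      # (destination, message) pairs "sent" so far
        self._spawned = 0     # counter used only to make names unique

    def handle(self, message):
        kind = message[0]
        if kind == "MakeProc":
            _, resp, mdest = message  # mdest would be captured by the new processor
            if len(self.processors) < MAX_PROCESSORS:
                self._spawned += 1
                proc = f"processor-{self._spawned}"  # stands in for spawn
                self.outbox.append((resp, proc))
                self.processors.insert(0, proc)      # like (cons p processors)
            # At the limit, the request is silently ignored.
        elif kind == "ShutdownAll":
            for proc in self.processors:
                self.outbox.append((proc, "Shutdown"))
            self.processors = []

manager = Manager()
manager.handle(("MakeProc", "client", "sink"))
manager.handle(("ShutdownAll",))
print(manager.outbox)
```

    As in the listing, the limit is checked against the current list length, so clearing the list on ShutdownAll also makes room for new processors.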

    In listing 2.2, the processor actor’s type says it can accept five kinds of messages: AddRdg, a command to add a new temperature reading to the total; GetMean, a request for the current running mean value; Enable and Disable, commands to turn the actor’s recording capabilities on or off, respectively; and Shutdown, a command to kill the entire process.


    ;; ManagerMessage =
    ;;   (Variant [MakeProc (Addr (Addr ProcUserAPI)) (Addr Nat)]
    ;;            [ShutdownAll])
    ;;
    ;; ManagerUserAPI =
    ;;   (Variant [MakeProc (Addr (Addr ProcUserAPI)) (Addr Nat)])
    ;;
    ;; ManagerSysAPI = (Variant [ShutdownAll])
    ;;
    ;; ProcUserAPI =
    ;;   (Variant [AddRdg Nat (Addr (Variant [Ok] [NotOk]))]
    ;;            [GetMean]
    ;;            [Disable]
    ;;            [Enable])
    ;;
    ;; ProcAddr = (Addr (Variant [Shutdown]))

    (program (receptionists [user-api ManagerUserAPI]
                            [sys-api ManagerSysAPI])
             (externals)
      (let-actors
        ([manager (spawn ManagerMessage
                    (goto Managing (list))
                    (define-state (Managing [processors (List ProcAddr)]) m
                      (case m
                        [(MakeProc resp mdest)
                         (case (< (length processors) 100)
                           [(True)
                            (let ([p (spawn #|see listing 2.2|#)])
                              (send resp p)
                              (goto Managing (cons p processors)))]
                           [(False) (goto Managing processors)])]
                        [(ShutdownAll)
                         (for/fold ([dummy-result (variant Shutdown)])
                                   ([p processors])
                           (send p (variant Shutdown)))
                         (goto Managing (list))])))])
        (manager manager)))

    Listing 2.1: The manager actor’s implementation


    ;; Processor's declared type
    (Variant [AddRdg Nat (Addr (Variant [Ok] [NotOk]))]
             [GetMean]
             [Disable]
             [Enable]
             [Shutdown])

    ;; Initial state
    (goto Off 0 0)

    ;; State definitions
    (define-state (Off [sum Nat] [num-rdgs Nat]) m
      (case m
        [(AddRdg temp resp)
         (send resp (variant NotOk))
         (goto Off sum num-rdgs)]
        [(GetMean)
         (send mdest (/ sum num-rdgs))
         (goto Off sum num-rdgs)]
        [(Disable) (goto Off sum num-rdgs)]
        [(Enable) (goto On sum num-rdgs)]
        [(Shutdown) (goto Done)]))

    (define-state (On [sum Nat] [num-rdgs Nat]) m
      (case m
        [(AddRdg temp resp)
         (send resp (variant Ok))
         (goto On (+ sum temp) (+ num-rdgs 1))]
        [(GetMean)
         (send mdest (/ sum num-rdgs))
         (goto On sum num-rdgs)]
        [(Disable) (goto Off sum num-rdgs)]
        [(Enable) (goto On sum num-rdgs)]
        [(Shutdown) (goto Done)]))

    (define-state (Done) m (goto Done))

    Listing 2.2: The processor actor’s implementation

  • 16 CHAPTER 2. CSA: ACTORS AS FINITE-STATE MACHINES

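    \[
    \frac{
      \begin{array}{c}
        x_1, \ldots, x_n, x'_1, \ldots, x'_o \text{ are distinct} \qquad
        x'_1, \ldots, x'_m, x''_1, \ldots, x''_o \text{ are distinct} \\
        \Gamma_{ext} = [x'_1 \mapsto (\texttt{Addr}\ \tau'_1), \ldots, x'_m \mapsto (\texttt{Addr}\ \tau'_m)] \\
        \forall k \in 1 \ldots o.\ \Gamma_k = \Gamma_{ext}[x''_1 \mapsto (\texttt{Addr}\ \tau''_1), \ldots, x''_{k-1} \mapsto (\texttt{Addr}\ \tau''_{k-1})] \\
        \forall k \in 1 \ldots o.\ \Gamma_k, \emptyset \vdash e_k : (\texttt{Addr}\ \tau''_k) \qquad
        \Gamma' = [x''_1 \mapsto (\texttt{Addr}\ \tau''_1), \ldots, x''_o \mapsto (\texttt{Addr}\ \tau''_o)] \\
        \forall i \in 1 \ldots n.\ \Gamma', \emptyset \vdash x_i : (\texttt{Addr}\ \tau_i)
      \end{array}
    }{
      \begin{array}{l}
        \vdash_{prog} (\texttt{program}\ (\texttt{receptionists}\ [x_i\ \tau_i]^{i \in 1 \ldots n})\ (\texttt{externals}\ [x'_j\ \tau'_j]^{j \in 1 \ldots m}) \\
        \qquad\quad (\texttt{let-actors}\ ([x''_1\ e_1] \ldots [x''_o\ e_o])\ x'''_1 \ldots x'''_n))
      \end{array}
    }
    \]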

    Figure 2.3: Typing judgment for programs

    The actor starts in the Off state, with an initial sum of 0 for all temperature data and 0 readings recorded so far. In that state, the actor rejects all new temperature readings by sending a NotOk message back to the sender. To reply to a request for the mean, the actor simply calculates the average so far and sends the result to mdest. The Disable message does not affect the actor in this state, but the Enable message causes it to transition to the On state. Finally, a shut-down request causes the actor to transition to the Done state, dropping its current totals.

    The On state is similar to Off, except that the actor accepts new temperature readings with an Ok response and updates its counts. Finally, once it shuts down and enters the Done state, the processor ignores all messages it receives.

    We will return to this example in the next chapter.
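
    The three-state behavior described above can be approximated outside CSA. The following is a minimal Python sketch, not the dissertation's implementation: the class name `Processor`, the `handle` method, and the use of integer division for the mean are all assumptions made for illustration, and the reply is returned to the caller rather than sent to an address.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical Python rendering of the processor in listing 2.2."""
    state: str = "Off"   # one of "Off", "On", "Done"
    total: int = 0       # `sum` in the listing
    num_rdgs: int = 0

    def handle(self, tag, *args):
        """Handle one message; return the reply that would be sent, if any."""
        if self.state == "Done":
            return None                  # Done ignores every message
        if tag == "AddRdg":
            (temp,) = args
            if self.state == "Off":
                return "NotOk"           # readings are rejected while Off
            self.total += temp
            self.num_rdgs += 1
            return "Ok"
        if tag == "GetMean":
            # Assumption: integer division. With no readings this raises
            # ZeroDivisionError, loosely mirroring an actor getting stuck.
            return self.total // self.num_rdgs
        if tag == "Disable":
            self.state = "Off"
        elif tag == "Enable":
            self.state = "On"
        elif tag == "Shutdown":
            self.state = "Done"
        else:
            raise ValueError(f"unknown message tag: {tag}")
        return None
```

    Unlike the listing, this sketch keeps the totals after a shutdown; they are simply never consulted again, since the Done state ignores every message.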

    2.4 Type System

    CSA’s type system is relatively standard. This section formally defines the type system and explains any non-standard features.

    2.4.1 Type-Checking Programs

    The top-level judgment is a type-check of complete programs, written ⊢prog P and defined in figure 2.3. It requires the externals and initial actors in a program to have distinct names, as well as the externals and receptionists. Every spawn expression in the let-actors clause must type-check as some address type when given types for the externals and previous internal actors. Finally, every address used as a receptionist must also type-check as an address of the type expected for that receptionist.

    To type-check expressions, in addition to a standard term environment Γ, the type system uses a state environment Θ that maps state names to the sequence of types for their arguments (see figure 2.4).

    Figure 2.5 defines the type rules for effectful expressions. The rule to type-check a goto expression requires that the argument types match the arguments

  • 2.4. TYPE SYSTEM 17

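    \[
    \begin{array}{rcll}
      \Gamma & \in & \mathit{Var} \rightharpoonup \mathit{Type} & \text{(Term Environments)} \\
      \Theta & \in & \mathit{StateName} \rightharpoonup \mathit{Type}^{*} & \text{(State Environments)} \\
      \tau \in \mathit{Type} & ::= & \ldots \mid \bot & \text{(Types)}
    \end{array}
    \]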

    Figure 2.4: Extra environments and types for type system

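    \[
    \frac{\Theta(q) = \tau_1, \ldots, \tau_n \qquad \Gamma, \Theta \vdash e_i : \tau_i \text{ for all } i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (\texttt{goto}\ q\ e_1 \ldots e_n) : \bot}
    \]

    \[
    \frac{
      \begin{array}{c}
        Q_i = (\texttt{define-state}\ (q_i\ [x_{i,1}\ \tau'_{i,1}] \ldots [x_{i,m}\ \tau'_{i,m}])\ x'_i\ e'_i\ \mathit{tc}_i) \text{ for all } i \in 1 \ldots n \qquad
        q_1, \ldots, q_n \text{ are distinct} \\
        \Gamma' = \Gamma[\texttt{self} \mapsto (\texttt{Addr}\ \tau)] \qquad
        \Theta' = [q_1 \mapsto (\tau_{1,1} \ldots \tau_{1,m}), \ldots, q_n \mapsto (\tau_{n,1} \ldots \tau_{n,m})] \\
        \Gamma', \Theta' \vdash e : \bot \qquad
        \Gamma', \Theta' \vdash_{state} Q_i \text{ for all } i \in 1 \ldots n
      \end{array}
    }{\Gamma, \Theta \vdash (\texttt{spawn}^{\ell}\ \tau\ e\ Q_1 \ldots Q_n) : (\texttt{Addr}\ \tau)}
    \]

    \[
    \frac{\Gamma, \Theta \vdash e_1 : (\texttt{Addr}\ \tau) \qquad \Gamma, \Theta \vdash e_2 : \tau}
         {\Gamma, \Theta \vdash (\texttt{send}\ e_1\ e_2) : (\texttt{Variant}\ [\texttt{Unit}])}
    \]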

    Figure 2.5: Expression type rules for effectful expressions

    declared for that state in Θ. Because a goto expression is purely a control expression that does not evaluate to a value, it gets the special type ⊥.

    The rule for spawn requires that all states have distinct names and creates a new state environment Θ′ in which to type-check the state definitions and initial expression of the spawned actor. The type of the initial expression e must be ⊥, indicating that the expression eventually transitions to one of the actor’s states. Type-checking rules for state definitions and their timeout clauses are found in figure 2.6; similar to the spawn expression, they require that all handler expressions have type ⊥.

    Finally, the rule for send merely checks that the message being sent matches the address’s type.

    Figure 2.7 lists the remaining (largely standard) type rules for expressions, including a subsumption rule (see section 2.4.2 for a description of subtyping in CSA). The type for an address is determined by the function ActorType. For type-checking primitive operations (o e1 . . . en), the function PrimOpTypes gives the expected argument types and return type of each operation.

    The requirement that the clauses in a case expression must be in the same order as the tags in a Variant type is only for simplification, as the subtype rules allow the tag-clauses of a Variant type to be rearranged.


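    \[
    \frac{
      \begin{array}{c}
        \Gamma(\texttt{self}) = (\texttt{Addr}\ \tau') \qquad
        \Gamma' = \Gamma[x_1 \mapsto \tau_1, \ldots, x_n \mapsto \tau_n] \\
        \Gamma'[x' \mapsto \tau'], \Theta \vdash e : \bot \qquad
        \Gamma', \Theta \vdash_{tc} \mathit{tc}
      \end{array}
    }{\Gamma, \Theta \vdash_{state} (\texttt{define-state}\ (q\ [x_1\ \tau_1] \ldots [x_n\ \tau_n])\ x'\ e\ \mathit{tc})}
    \]

    \[
    \frac{}{\Gamma, \Theta \vdash_{tc} \texttt{no-timeout}}
    \qquad
    \frac{\Gamma, \Theta \vdash \mathit{ov} : \texttt{Nat} \qquad \Gamma, \Theta \vdash e : \bot}
         {\Gamma, \Theta \vdash_{tc} [(\texttt{timeout}\ \mathit{ov})\ e]}
    \]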

    Figure 2.6: Type rules for state definitions and timeout clauses

    2.4.2 Subtyping

    Variant types are a convenient means for defining the different messages an actor can accept. To allow an address to be presented to different parts of a program with a restricted set of possible messages, CSA has a width-subtyping rule on Variant types that allows subtypes to drop tag clauses. Address types are contravariant, so an address that accepts more possible messages is a subtype of one that accepts fewer, allowing the “capabilities” of an address to be narrowed via the type system.
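
    The interaction between width subtyping on Variant types and contravariance on Addr types can be illustrated with a small checker. This is a rough Python sketch covering only a fragment of the rules (no Record, List, Dict, recursive, or bottom types); the tuple-and-dict encoding of types and the names `is_subtype`, `USER_API`, and `FULL_API` are assumptions made for this example and do not come from CSA.

```python
def is_subtype(s, t):
    """Return True if type s is a subtype of type t (fragment only)."""
    if s == t:
        return True                       # reflexivity
    if s[0] == "Variant" and t[0] == "Variant":
        s_tags, t_tags = s[1], t[1]
        # Width: s may omit tags that t has. Permutation: dicts are
        # unordered. Depth: field types are compared pointwise.
        return all(
            tag in t_tags
            and len(fields) == len(t_tags[tag])
            and all(is_subtype(a, b) for a, b in zip(fields, t_tags[tag]))
            for tag, fields in s_tags.items()
        )
    if s[0] == "Addr" and t[0] == "Addr":
        return is_subtype(t[1], s[1])     # contravariance: flip the order
    return False


NAT = ("Nat",)
# Simplified stand-ins for ManagerUserAPI and ManagerMessage.
USER_API = ("Variant", {"MakeProc": [("Addr", NAT)]})
FULL_API = ("Variant", {"MakeProc": [("Addr", NAT)], "ShutdownAll": []})
```

    Under this encoding, `is_subtype(("Addr", FULL_API), ("Addr", USER_API))` holds: an address that accepts both kinds of message may be handed out at a type that only permits MakeProc, which is the capability narrowing described above.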

    Aside from that rule and a permutation rule for Variant types, the other subtype rules (defined in figure 2.8) are the standard reflexivity, transitivity, bottom, and depth rules.8 The rule for recursive types uses the standard Amber rule [23], where Υ is a set of subtype assumptions of the form X

  • 2.5. FORMAL SEMANTICS 19

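    \[
    \frac{\Gamma(x) = \tau}{\Gamma, \Theta \vdash x : \tau}
    \qquad
    \frac{}{\Gamma, \Theta \vdash n : \texttt{Nat}}
    \qquad
    \frac{}{\Gamma, \Theta \vdash \mathit{str} : \texttt{String}}
    \]

    \[
    \frac{\Gamma, \Theta \vdash e_i : \tau_i \text{ for all } i \in 1 \ldots n \qquad \Gamma, \Theta \vdash e' : \tau'}
         {\Gamma, \Theta \vdash (\texttt{begin}\ e_1 \ldots e_n\ e') : \tau'}
    \]

    \[
    \frac{\Gamma, \Theta \vdash e_i : \tau_i \text{ for all } i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (\texttt{record}\ [r_1\ e_1] \ldots [r_n\ e_n]) : (\texttt{Record}\ [r_1\ \tau_1] \ldots [r_n\ \tau_n])}
    \]

    \[
    \frac{\Gamma, \Theta \vdash e : (\texttt{Record}\ [r_1\ \tau_1] \ldots [r_n\ \tau_n]) \qquad i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (\texttt{:}\ e\ r_i) : \tau_i}
    \]

    \[
    \frac{\Gamma, \Theta \vdash e_i : \tau_i \text{ for all } i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (\texttt{variant}\ t\ e_1 \ldots e_n) : (\texttt{Variant}\ [t\ \tau_1 \ldots \tau_n])}
    \]

    \[
    \frac{
      \begin{array}{c}
        \Gamma, \Theta \vdash e : (\texttt{Variant}\ [t_1\ \tau_{1,1} \ldots \tau_{1,m}] \ldots [t_n\ \tau_{n,1} \ldots \tau_{n,m}]) \\
        \Gamma[x_{i,1} \mapsto \tau_{i,1}, \ldots, x_{i,m} \mapsto \tau_{i,m}], \Theta \vdash e'_i : \tau' \text{ for all } i \in 1 \ldots n
      \end{array}
    }{\Gamma, \Theta \vdash (\texttt{case}\ e\ [(t_1\ x_{1,1} \ldots x_{1,m})\ e'_1] \ldots [(t_n\ x_{n,1} \ldots x_{n,m})\ e'_n]) : \tau'}
    \]

    \[
    \frac{\tau = (\texttt{rec}\ X\ \tau') \qquad \Gamma, \Theta \vdash e : \tau'[X \leftarrow \tau]}
         {\Gamma, \Theta \vdash (\texttt{fold}\ \tau\ e) : \tau}
    \qquad
    \frac{\tau = (\texttt{rec}\ X\ \tau') \qquad \Gamma, \Theta \vdash e : \tau}
         {\Gamma, \Theta \vdash (\texttt{unfold}\ \tau\ e) : \tau'[X \leftarrow \tau]}
    \]

    \[
    \frac{\Gamma, \Theta \vdash \mathit{ov}_i : \tau \text{ for all } i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (\texttt{list}\ \mathit{ov}_1 \ldots \mathit{ov}_n) : (\texttt{List}\ \tau)}
    \]

    \[
    \frac{\Gamma, \Theta \vdash \mathit{ov}_i : \tau \text{ for all } i \in 1 \ldots n \qquad \Gamma, \Theta \vdash \mathit{ov}'_i : \tau' \text{ for all } i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (\texttt{dict}\ [\mathit{ov}_1\ \mathit{ov}'_1] \ldots [\mathit{ov}_n\ \mathit{ov}'_n]) : (\texttt{Dict}\ \tau\ \tau')}
    \]

    \[
    \frac{\Gamma, \Theta \vdash e : \tau \qquad \Gamma, \Theta \vdash e' : (\texttt{List}\ \tau') \qquad \Gamma[x \mapsto \tau, x' \mapsto \tau'], \Theta \vdash e'' : \tau}
         {\Gamma, \Theta \vdash (\texttt{for/fold}\ [x\ e]\ [x'\ e']\ e'') : \tau}
    \]

    \[
    \frac{\mathit{PrimOpTypes}(o) = \langle \tau_1 \ldots \tau_n, \tau' \rangle \qquad \Gamma, \Theta \vdash e_i : \tau_i \text{ for all } i \in 1 \ldots n}
         {\Gamma, \Theta \vdash (o\ e_1 \ldots e_n) : \tau'}
    \]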

    Γ,Θ` e : τ′ τ′


    Υ` τ


    Actors

    am : (Managing (list))

    ap1 : (On 100 2)

    ap2 : (Off 240 4)

    Messages

    Shutdown

    Shutdown

    Environment

    Knows about:

    • am with type ManagerUserAPI

    • am with type ManagerSysAPI

    • ap1 with type ProcUserAPI

    • ap2 with type ProcUserAPI

    Figure 2.9: Illustration of a configuration of the stream-processing example


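    \[
    \begin{array}{rcll}
      K & ::= & \langle\langle \beta \mid \mu \rangle\rangle_{\rho} & \text{(Program Configurations)} \\
      \beta & \in & \mathit{Addr} \rightharpoonup \mathit{Beh} & \text{(Actor-Behavior Maps)} \\
      \mu & \in & \mathcal{M}(\mathit{Addr} \times \mathit{Val}) & \text{(Message Multisets)} \\
      \rho & \in & \mathcal{F}(\mathit{Addr} \times \mathit{Type}) & \text{(Receptionist Sets)} \\
      b \in \mathit{Beh} & ::= & \langle Q, e \rangle \mid \langle Q, (\texttt{receive}\ x\ e\ \mathit{tc}) \rangle & \text{(Behaviors)} \\
      a \in \mathit{Addr} & ::= & (\texttt{addr}\ \ell\ n) & \text{(Addresses)} \\
      v \in \mathit{Val} & ::= & a \mid (\texttt{record}\ [r\ v]) \mid (\texttt{variant}\ t\ v) \mid (\texttt{fold}\ \tau\ v) & \text{(Values)} \\
        & \mid & (\texttt{list}\ v) \mid (\texttt{dict}\ [v\ v]) \mid n \mid \mathit{str} & \\
      e & ::= & \ldots \mid a & \text{(Expressions)} \\
      \mathit{ov} & ::= & \ldots \mid a & \text{(Open Values)}
    \end{array}
    \]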

    Figure 2.10: Program-configuration syntax

    Figure 2.9 illustrates a configuration of the stream-processing example with one manager actor and two processor actors, where there is a Shutdown message on its way to each of those two processor actors. The figure also illustrates the state each actor is in, along with its current state parameters.

    The environment in which this program executes knows about the addresses for each of those actors at various types. There are two receptionists for the actor at am: one representing the capability to request new processors (ManagerUserAPI), and another representing the capability to shut down the system (ManagerSysAPI). The processors are each known to the environment at the ProcUserAPI type.

    Figure 2.10 defines the formal syntax for such a configuration. A program configuration (represented by the metavariable K) consists of an actor-behavior map β, a message multiset µ, and a receptionist set ρ.

    The set of actors in a configuration (corresponding to the actors on the left-hand side of figure 2.9) is represented as a map β from an actor’s address to its behavior. An actor’s behavior b contains its state definitions Q (not shown in figure 2.9) and either its currently executing handler expression e or a special receive term that indicates the actor is suspended and waiting for the next event. The receive term specifies the message handler e and timeout clause tc to be used for the next event, with x binding the received message in e. The states of the actors in figure 2.9 correspond to this latter form of a behavior.

    A message is a pair of a destination address a and a communicated value v. The message multiset µ, corresponding to the messages on the right-hand side of figure 2.9, contains all sent messages not yet handled by the receiving actor.

    Finally, the receptionist set ρ records the addresses the program has provided

    8 Similar permutation and width rules for Record types are possible, but unnecessary for our purposes.


    to the environment and at what types the environment has access to them. In figure 2.9, this corresponds to the set of addresses that the environment knows about and the types at which it knows them. The type of a receptionist may be different from the base type of the identified actor (as is the case for all receptionists in figure 2.9) so that an actor’s public interface can differ from its private implementation. This set of receptionists initially matches the declarations from the program, but gains elements as the program sends more addresses to the environment.

    The environment is assumed to be made up of actors from other CSA programs, and therefore those actors are assumed to send only messages with appropriate types. A more robust implementation of CSA may wish to first type-check messages sent to receptionists and reject any with the wrong type.

    An address a is defined by the syntactic location ℓ of the spawn expression that created it, along with a unique identifier n to distinguish it from other actors created at that location. The rest of this dissertation will sometimes refer to an actor by its address, as in, “the actor at a has behavior b.”

    The locations are partitioned into two sets: ProgLoc is the set of all syntactic locations from the original program, and EnvLoc is the set of all locations for actors from the environment. An address (addr ℓ n) is internal if ℓ ∈ ProgLoc and external if ℓ ∈ EnvLoc.

    It is assumed there is a function ActorType from locations to types such that ActorType(ℓ) is the type of message that an actor spawned at ℓ can receive. Furthermore, it is assumed that for every possible type τ there are infinitely many external locations ℓ such that ActorType(ℓ) = τ. The function extends to addresses based on the spawn location for that address; i.e., if a = (addr ℓ n), then ActorType(a) = ActorType(ℓ).

    2.5.2 Type-Checking Configurations and Addresses

    The soundness of some of the proof techniques introduced in this dissertation relies on a type preservation lemma similar to the following (the actual lemma is phrased in terms of marked configurations, defined in chapter 3):

    Lemma. For all $K$, $K'$, and $l$, if $\vdash_{cfg} K$ and $K \xrightarrow{l} K'$, then $\vdash_{cfg} K'$.

    As a result, a type-checking judgment ⊢cfg for configurations is required. The rule in figure 2.11 defines the judgment. The rule checks that

    • every receptionist corresponds to an actor in the program,

    • every receptionist’s type is a subtype of the messages that address can receive,

    • the address for every actor in the program is internal,

    • the actor-behavior map has a mapping for every internal address in the configuration,


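    \[
    \frac{
      \begin{array}{c}
        \forall \langle a, \tau \rangle \in \rho.\ a \in \mathit{dom}(\beta) \text{ and } \emptyset, \emptyset \vdash a : (\texttt{Addr}\ \tau) \qquad
        \forall a \in \mathit{dom}(\beta).\ a \text{ is internal} \\
        \forall a \text{ appearing in } \beta, \mu, \text{ or } \rho.\ a \in \mathit{dom}(\beta) \text{ if } a \text{ is internal} \\
        \forall a \in \mathit{dom}(\beta).\ \exists \tau.\ \mathit{ActorType}(a) = \tau \text{ and } \tau, \emptyset \vdash_{beh} b \text{ where } b = \beta(a) \\
        \forall \langle a', v \rangle \in \mu.\ \exists \tau.\ \emptyset, \emptyset \vdash a' : (\texttt{Addr}\ \tau) \text{ and } \emptyset, \emptyset \vdash v : \tau
      \end{array}
    }{\vdash_{cfg} \langle\langle \beta \mid \mu \rangle\rangle_{\rho}}
    \]

    \[
    \frac{
      \begin{array}{c}
        Q_i = (\texttt{define-state}\ (q_i\ [x_{i,1}\ \tau'_{i,1}] \ldots [x_{i,m}\ \tau'_{i,m}])\ x'_i\ e'_i\ \mathit{tc}_i) \text{ for all } i \in 1 \ldots n \\
        q_1, \ldots, q_n \text{ are distinct} \qquad
        \Theta = [q_1 \mapsto (\tau'_{1,1} \ldots \tau'_{1,m}), \ldots, q_n \mapsto (\tau'_{n,1} \ldots \tau'_{n,m})] \\
        \Gamma, \Theta \vdash e : \bot \qquad
        \Gamma, \Theta \vdash_{state} Q_i \text{ for all } i \in 1 \ldots n
      \end{array}
    }{\tau, \Gamma \vdash_{beh} \langle Q_1 \ldots Q_n, e \rangle}
    \]

    \[
    \frac{
      \begin{array}{c}
        Q_i = (\texttt{define-state}\ (q_i\ [x_{i,1}\ \tau'_{i,1}] \ldots [x_{i,m}\ \tau'_{i,m}])\ x'_i\ e'_i\ \mathit{tc}_i) \text{ for all } i \in 1 \ldots n \\
        q_1, \ldots, q_n \text{ are distinct} \qquad
        \Theta = [q_1 \mapsto (\tau'_{1,1} \ldots \tau'_{1,m}), \ldots, q_n \mapsto (\tau'_{n,1} \ldots \tau'_{n,m})] \\
        \Gamma[x'' \mapsto \tau], \Theta \vdash e : \bot \qquad
        \Gamma, \Theta \vdash_{tc} \mathit{tc} \qquad
        \Gamma, \Theta \vdash_{state} Q_i \text{ for all } i \in 1 \ldots n
      \end{array}
    }{\tau, \Gamma \vdash_{beh} \langle Q_1 \ldots Q_n, (\texttt{receive}\ x''\ e\ \mathit{tc}) \rangle}
    \]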

    Figure 2.11: Program configuration and behavior type rules

    \[
    \frac{\mathit{ActorType}(a) = \tau}{\Gamma, \Theta \vdash a : (\texttt{Addr}\ \tau)}
    \]

    Figure 2.12: Address typing rule

    • every actor’s behavior type-checks, and

    • the type of every in-transit message matches its destination address’s type.

    The type-checking rules for behaviors are similar to the rule for spawn expressions, ensuring that both the state definitions and any handler expressions and timeout clauses type-check. The main difference is that instances of the self keyword in a behavior have already been replaced with the actor’s address, so the rule does not add self to the type environment. This judgment also takes as input an extra type τ, indicating the type of messages the actor can receive.

    Because configurations may contain addresses, addresses must also be type-checkable. The rule in figure 2.12 simply assigns an address a the type reported for it by the ActorType function.

    2.5.3 Instantiation

    The function Inst in figure 2.13 converts a program into its initial configuration. It takes a sequence of addresses to assign as the declared externals and creates


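    \[
    \begin{array}{l}
      \mathit{Inst} : \mathit{Prog} \times \mathit{Addr}^{*} \rightharpoonup \mathit{ProgConfig} \\
      \mathit{Inst}(P, a'_1 \ldots a'_m) = \langle\langle [a''_1 \mapsto b_1, \ldots, a''_o \mapsto b_o] \mid \emptyset \rangle\rangle_{\{\langle a_1, \tau_1 \rangle, \ldots, \langle a_n, \tau_n \rangle\}} \\
      \quad \text{where } P = (\texttt{program}\ (\texttt{receptionists}\ [x_i\ \tau_i]^{i \in 1 \ldots n})\ (\texttt{externals}\ [x'_j\ \tau'_j]^{j \in 1 \ldots m}) \\
      \qquad\qquad\qquad (\texttt{let-actors}\ ([x''_k\ (\texttt{spawn}^{\ell_k}\ \tau_k\ e_k\ Q_k)]^{k \in 1 \ldots o})\ x'''_1 \ldots x'''_n)) \\
      \quad \text{and } a''_k = (\texttt{addr}\ \ell_k\ 0) \\
      \quad \text{and } b_k = \mathit{InstAct}(Q_k, e_k, [\texttt{self} \leftarrow a''_k][x''_1 \leftarrow a''_1] \ldots [x''_{k-1} \leftarrow a''_{k-1}][x'_1 \leftarrow a'_1] \ldots [x'_m \leftarrow a'_m]) \\
      \quad \text{and } a_i = x'''_i[x''_1 \leftarrow a''_1] \ldots [x''_o \leftarrow a''_o]
    \end{array}
    \]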

    Figure 2.13: Program instantiation

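    \[
    \begin{array}{l}
      \mathit{InstAct} : \mathit{StateDef}^{*} \times \mathit{Exp} \times \mathit{AddrSubst} \rightharpoonup \mathit{Beh} \\
      \mathit{InstAct}(Q, (\texttt{goto}\ q\ \mathit{ov}_1 \ldots \mathit{ov}_n), [x_1 \leftarrow a_1] \ldots [x_m \leftarrow a_m]) = \\
      \qquad \langle Q', (\texttt{receive}\ x''\ e\ \mathit{tc})[x'_1 \leftarrow v'_1] \ldots [x'_n \leftarrow v'_n] \rangle \\
      \quad \text{where } Q' = Q[x_1 \leftarrow a_1] \ldots [x_m \leftarrow a_m] \\
      \quad \text{and } (\texttt{define-state}\ (q\ [x'_1\ \tau_1] \ldots [x'_n\ \tau_n])\ x''\ e\ \mathit{tc}) \text{ is in } Q' \\
      \quad \text{and } v'_i = \mathit{ov}_i[x_1 \leftarrow a_1] \ldots [x_m \leftarrow a_m]
    \end{array}
    \]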

    Figure 2.14: Actor instantiation

    the initial behavior for each actor, substituting in the actor’s own address, the external addresses, and all previously declared internal actors.

    The InstAct function in figure 2.14 instantiates an actor by applying the given substitutions to its state definitions and goto arguments, then transitioning it into its initial state where it waits for the next message or timeout.

    2.5.4 Transition Semantics

    The transition relation for CSA is a labeled transition relation of the form $K \xrightarrow{l} K'$, defined in figure 2.16. The transition-step label l (defined in figure 2.15)

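    \[
    \begin{array}{rcll}
      E & ::= & [\,] \mid (\texttt{goto}\ q\ v\ E\ e) \mid (\texttt{send}\ E\ e) \mid (\texttt{send}\ v\ E) & \text{(Evaluation Contexts)} \\
        & \mid & (\texttt{begin}\ E\ e) \mid (\texttt{record}\ [r\ v]\ [r\ E]\ [r\ e]) \mid (\texttt{:}\ E\ r) \mid (\texttt{variant}\ t\ v\ E\ e) & \\
        & \mid & (\texttt{case}\ E\ [(t\ x)\ e]) \mid (\texttt{fold}\ \tau\ E) \mid (\texttt{unfold}\ \tau\ E) & \\
        & \mid & (\texttt{for/fold}\ [x\ E]\ [x\ e]\ e) \mid (\texttt{for/fold}\ [x\ v]\ [x\ E]\ e) \mid (o\ v\ E\ e) & \\
      l & ::= & a : \texttt{rcv-ext}(v, \tau) \mid a : \texttt{rcv-int}(v) & \text{(Transition-Step Labels)} \\
        & \mid & a : \texttt{send-ext}(a, v) \mid a : \texttt{send-int}(a, v) \mid a : \texttt{timeout} \mid a : \texttt{func} \mid a : \texttt{goto} & \\
        & \mid & a : \texttt{spawn}(a) &
    \end{array}
    \]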

    Figure 2.15: Miscellaneous evaluation syntax


    distinguishes steps for the sake of a fairness condition for CSA (see section 2.5.6). The label also indicates any communication with the environment (send or rcv), and whether that communication was internal to the program (int) or external (ext). The type in the rcv-ext label indicates the type of the receptionist that allows the message to be received. The address in a spawn label identifies the spawned actor. The address used as the prefix a : of a transition step’s label indicates the active actor for that step. A func label indicates a purely functional step of a message handler, such as extracting a field from a record or evaluating a primitive operation.

    The handler-start labels are those labels of the form a : timeout, a : rcv-int(v), or a : rcv-ext(v,τ), because they each indicate the start of a new event handler. All others are handler-continuation labels, which represent a step taken while executing an event handler. Because handler expressions are deterministic (modulo choice of address for newly spawned actors), at most one step with a handler-continuation label is enabled in a given configuration for each actor.

    Figure 2.16 lists the transition rules of CSA’s operational semantics. The rule E-GOTO transitions an actor into its next state (represented by a receive term), in which it waits to receive a message. The message variable, handler expression, and timeout clause of the receive term come from the named state. This step ends evaluation of the handler expression by dropping the remaining context E.

    E-RECEIVEINTERNAL picks an arbitrary message sent to the actor at a and substitutes it into the handler expression.

    E-RECEIVEEXTERNAL is similar to E-RECEIVEINTERNAL, with the message coming from the environment. If ρ contains a receptionist 〈a,τ〉, then the environment can send any message of type τ to a.

    The rule requires that any received internal addresses are used in positions that respect their type as given in ρ. The function IntAddrTypes, defined in figure 2.17, returns a set $\{\langle a'_1, \tau'_1 \rangle, \ldots, \langle a'_n, \tau'_n \rangle\}$ indicating the types of all internal addresses in the message when v is viewed as the receptionist’s type τ. Subtyping is defined in section 2.4.2.

    E-TIMEOUT allows an actor to run its state’s timeout handler. The semantics does not model a clock for timeouts.

    E-SENDINTERNAL pairs the message v with its destination a′ and adds it to the set of messages. The expression itself evaluates to a Unit value.
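
    The effect of E-SENDINTERNAL and E-RECEIVEINTERNAL on the message multiset µ can be mimicked with Python's `collections.Counter`. This is only a sketch of the multiset bookkeeping, assuming hashable stand-ins for addresses and values; the function names `send_internal` and `receive_internal` are hypothetical, and the actor-behavior map and receptionist set are left out.

```python
from collections import Counter


def send_internal(mu, dest, value):
    """Sketch of E-SendInternal: add one copy of <dest, value> to mu."""
    new_mu = mu.copy()
    new_mu[(dest, value)] += 1
    return new_mu


def receive_internal(mu, dest, value):
    """Sketch of E-ReceiveInternal: remove one copy of <dest, value>."""
    if mu[(dest, value)] == 0:            # Counter yields 0 for absent keys
        raise ValueError("no such message in transit")
    new_mu = mu.copy()
    new_mu[(dest, value)] -= 1
    if new_mu[(dest, value)] == 0:
        del new_mu[(dest, value)]         # keep only messages in transit
    return new_mu
```

    A multiset rather than a set matters here: two identical Shutdown messages sent to the same processor are two distinct deliveries, and receiving one leaves the other in transit.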

    E-SENDEXTERNAL outputs to the environment a message directed to one of the external addresses. The sent message may contain addresses the environment does not know about yet, or present previously known addresses with different types (because of the subtyping rules, see section 2.4.2), necessitating an update to the receptionist set ρ to reflect the newly exposed receptionists. For instance, ρ may have an entry 〈a, (Variant [GetItem Nat])〉 (meaning the actor at a can accept messages of the form (variant GetItem n)), but the program may send a message that uses a in a context where it has type (Addr (Variant [SetName String])) (indicating the actor at a can also accept messages of the form (variant SetName str)). Therefore the environment learns that a can receive an additional kind of message, so the program adds an entry 〈a, (Variant [SetName String])〉 to ρ.


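    E-GOTO
    \[
    \langle\langle \beta[a \mapsto \langle Q, E[(\texttt{goto}\ q\ \mathit{vs})] \rangle] \mid \mu \rangle\rangle_{\rho}
    \xrightarrow{a\,:\,\texttt{goto}}
    \langle\langle \beta[a \mapsto \langle Q, (\texttt{receive}\ x\ e\ \mathit{tc})[\mathit{xs} \leftarrow \mathit{vs}] \rangle] \mid \mu \rangle\rangle_{\rho}
    \]
    if $(\texttt{define-state}\ (q\ [\mathit{xs}\ \tau])\ x\ e\ \mathit{tc})$ is in $Q$

    E-RECEIVEINTERNAL
    \[
    \langle\langle \beta[a \mapsto \langle Q, (\texttt{receive}\ x\ e\ \mathit{tc}) \rangle] \mid \mu \uplus \{\langle a, v \rangle\} \rangle\rangle_{\rho}
    \xrightarrow{a\,:\,\texttt{rcv-int}(v)}
    \langle\langle \beta[a \mapsto \langle Q, e[x \leftarrow v] \rangle] \mid \mu \rangle\rangle_{\rho}
    \]

    E-RECEIVEEXTERNAL
    \[
    \langle\langle \beta[a \mapsto \langle Q, (\texttt{receive}\ x\ e\ \mathit{tc}) \rangle] \mid \mu \rangle\rangle_{\rho}
    \xrightarrow{a\,:\,\texttt{rcv-ext}(v, \tau)}
    \langle\langle \beta[a \mapsto \langle Q, e[x \leftarrow v] \rangle] \mid \mu \rangle\rangle_{\rho}
    \]
    if $\langle a, \tau \rangle \in \rho$ and $\emptyset, \emptyset \vdash v : \tau$ and $\forall \langle a', \tau' \rangle \in \mathit{IntAddrTypes}(v, \tau).\ \exists \tau''.\ \langle a', \tau'' \rangle \in \rho$ and $\tau''$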


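    \[
    \begin{array}{l}
      \mathit{IntAddrTypes}(v, \tau) = \\[2pt]
      \quad \text{Case } v = a,\ \tau = (\texttt{Addr}\ \tau') : \\
      \qquad \emptyset \text{ if } a \text{ is external, else } \{\langle a, \tau' \rangle\} \\
      \quad \text{Case } v = n,\ \tau = \texttt{Nat} : \\
      \qquad \emptyset \\
      \quad \text{Case } v = \mathit{str},\ \tau = \texttt{String} : \\
      \qquad \emptyset \\
      \quad \text{Case } v = (\texttt{variant}\ t\ v_1 \ldots v_n),\ \tau = (\texttt{Variant}\ [t'\ \tau'_1 \ldots \tau'_m]\ [t\ \tau_1 \ldots \tau_n]\ [t''\ \tau''_1 \ldots \tau''_l]) : \\
      \qquad \bigcup_{i \in 1 \ldots n} \mathit{IntAddrTypes}(v_i, \tau_i) \\
      \quad \text{Case } v = (\texttt{record}\ [r_1\ v_1] \ldots [r_n\ v_n]),\ \tau = (\texttt{Record}\ [r_1\ \tau_1] \ldots [r_n\ \tau_n]) : \\
      \qquad \bigcup_{i \in 1 \ldots n} \mathit{IntAddrTypes}(v_i, \tau_i) \\
      \quad \text{Case } v = (\texttt{fold}\ \tau\ v'),\ \tau = (\texttt{rec}\ X\ \tau') : \\
      \qquad \mathit{IntAddrTypes}(v', \tau'[X \leftarrow \tau]) \\
      \quad \text{Case } v = (\texttt{list}\ v_1 \ldots v_n),\ \tau = (\texttt{List}\ \tau') : \\
      \qquad \bigcup_{i \in 1 \ldots n} \mathit{IntAddrTypes}(v_i, \tau') \\
      \quad \text{Case } v = (\texttt{dict}\ [v_1\ v'_1] \ldots [v_n\ v'_n]),\ \tau = (\texttt{Dict}\ \tau'\ \tau'') : \\
      \qquad \bigcup_{i \in 1 \ldots n} (\mathit{IntAddrTypes}(v_i, \tau') \cup \mathit{IntAddrTypes}(v'_i, \tau'')) \\
      \quad \text{Otherwise, undefined}
    \end{array}
    \]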

    Figure 2.17: Internal address type extraction
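
    The type-directed traversal in figure 2.17 can be sketched in Python for a fragment of the value language. The tuple encodings of values and types, the `is_internal` callback, and the name `int_addr_types` are assumptions made for this illustration; the Record, Dict, and fold cases are omitted to keep the sketch short, and the "undefined" case is modeled by raising `ValueError`.

```python
def int_addr_types(v, t, is_internal):
    """Return {(address, type)} for the internal addresses in v at type t."""
    kind = t[0]
    if kind == "Addr" and isinstance(v, tuple) and v[0] == "addr":
        # An internal address is reported at the type it is viewed at.
        return {(v, t[1])} if is_internal(v) else set()
    if kind == "Nat" and isinstance(v, int):
        return set()
    if kind == "String" and isinstance(v, str):
        return set()
    if kind == "Variant" and isinstance(v, tuple) and v[0] == "variant":
        _, tag, fields = v
        for t_tag, field_types in t[1]:
            if t_tag == tag and len(field_types) == len(fields):
                return set().union(*(
                    int_addr_types(f, ft, is_internal)
                    for f, ft in zip(fields, field_types)
                ))
        raise ValueError("undefined: tag not in Variant type")
    if kind == "List" and isinstance(v, tuple) and v[0] == "list":
        return set().union(*(
            int_addr_types(x, t[1], is_internal) for x in v[1]
        ))
    raise ValueError("undefined: value does not match type")


# Hypothetical setup: locations in PROG_LOCS count as internal.
PROG_LOCS = {"mgr"}


def is_internal(addr):
    return addr[1] in PROG_LOCS
```

    Because the traversal follows the type rather than the value alone, the same address can be reported at different types depending on the position in which it appears, which is what the receptionist check in E-RECEIVEEXTERNAL relies on.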


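    \[
    \begin{array}{rcl}
      (\texttt{begin}\ v) & \rightsquigarrow & v \\
      (\texttt{begin}\ v\ e\ e') & \rightsquigarrow & (\texttt{begin}\ e\ e') \\
      (\texttt{:}\ (\texttt{record}\ [r'\ v']\ [r\ v]\ [r''\ v''])\ r) & \rightsquigarrow & v \\
      (\texttt{case}\ (\texttt{variant}\ t\ v)\ \_\ [(t\ x)\ e]\ \_) & \rightsquigarrow & e[x \leftarrow v] \\
      (\texttt{unfold}\ \tau\ (\texttt{fold}\ \tau'\ v)) & \rightsquigarrow & v \\
      (o\ v) & \rightsquigarrow & \mathit{EvalPrimop}(o, v) \\
      (\texttt{for/fold}\ [x\ v]\ [x'\ (\texttt{list})]\ e) & \rightsquigarrow & v \\
      (\texttt{for/fold}\ [x\ v]\ [x'\ (\texttt{list}\ v'\ v'')]\ e) & \rightsquigarrow & (\texttt{for/fold}\ [x\ e']\ [x'\ (\texttt{list}\ v'')]\ e) \\
        & & \text{where } e' = e[x \leftarrow v][x' \leftarrow v']
    \end{array}
    \]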

    Figure 2.18: Functional reduction steps

    E-SPAWN creates an actor with a globally unique address.9 The initialization expression becomes the actor’s current handler expression. The result of the spawn is the new address a′, which is substituted for self in the new actor.

    E-FUNC allows the actor to step if its current handler expression can take a functional reduction step, defined in figure 2.18. The functional rules are standard. Exceptional behavior such as dividing by zero causes the evaluating actor to get stuck.
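
    The two for/fold rules in figure 2.18 amount to a left fold that consumes one list element per step. The following Python sketch makes that stepping explicit; the names `for_fold_step` and `run_for_fold` and the tagged-tuple representation of the intermediate term are assumptions made for illustration, and the body is evaluated eagerly rather than left as an expression to be reduced.

```python
from functools import reduce


def for_fold_step(acc, items, body):
    """Take one reduction step of a for/fold term.

    An empty list yields the accumulator as the final value; otherwise
    the head is consumed and the body's result becomes the accumulator.
    """
    if not items:
        return ("value", acc)
    head, rest = items[0], items[1:]
    return ("for/fold", body(acc, head), rest)


def run_for_fold(acc, items, body):
    """Iterate single steps until the term has reduced to a value."""
    state = ("for/fold", acc, list(items))
    while state[0] == "for/fold":
        state = for_fold_step(state[1], state[2], body)
    return state[1]
```

    For any body and list, `run_for_fold(init, xs, body)` agrees with `functools.reduce(body, xs, init)`, which is one way to see that the rules describe an ordinary left fold.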

    CSA does not model crashed actors or dropped messages, which sometimes occur in distributed systems. Reasoning about systems without these failures is still useful, though, because an error in such a failure-proof system would also be an error in a failure-susceptible one.

    2.5.5 Related Semantic Notions

    The transition semantics gives rise to a variety of related notions defined here and used later in this dissertation. A transition step labeled with l is enabled from a configuration K, written $K \xrightarrow{l}$, if there exists some K′ such that $K \xrightarrow{l} K'$.

    Intuitively, actors alternate between waiting for messages and handling some event. To formalize this, we say that a behavior b is handling an event if b = 〈Q, e〉 for some Q and e; otherwise, we say the behavior is a