
Computer Systems: A Programmer’s Perspective

Randal E. Bryant, Carnegie Mellon University

David R. O’Hallaron, Carnegie Mellon University and Intel Labs

Prentice Hall

Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Editorial Director: Marcia Horton
Editor-in-Chief: Michael Hirsch
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Chelsea Bell
Director of Marketing: Margaret Waples
Marketing Coordinator: Kathryn Ferranti
Managing Editor: Jeff Holcomb
Senior Manufacturing Buyer: Carol Melville
Art Director: Linda Knowles
Cover Designer: Elena Sidorova
Image Interior Permission Coordinator: Richard Rodrigues
Cover Art: © Randal E. Bryant and David R. O’Hallaron
Media Producer: Katelyn Boller
Project Management and Interior Design: Paul C. Anagnostopoulos, Windfall Software
Composition: Joe Snowden, Coventry Composition
Printer/Binder: Edwards Brothers
Cover Printer: Lehigh-Phoenix Color/Hagerstown

Copyright © 2011, 2003 by Randal E. Bryant and David R. O’Hallaron. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 501 Boylston Street, Suite 900, Boston, Massachusetts 02116.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

    Library of Congress Cataloging-in-Publication Data

Bryant, Randal.
  Computer systems : a programmer’s perspective / Randal E. Bryant, David R. O’Hallaron.—2nd ed.
    p. cm.
  Includes bibliographical references and index.
  ISBN-13: 978-0-13-610804-7 (alk. paper)
  ISBN-10: 0-13-610804-0 (alk. paper)
  1. Computer systems. 2. Computers. 3. Telecommunication. 4. User interfaces (Computer systems) I. O’Hallaron, David Richard. II. Title.
  QA76.5.B795 2010
  004—dc22
                                        2009053083

10 9 8 7 6 5 4 3 2 1—EB—14 13 12 11 10

ISBN 10: 0-13-610804-0
ISBN 13: 978-0-13-610804-7

To the students and instructors of the 15-213

    course at Carnegie Mellon University, for inspiring

    us to develop and refine the material for this book.


Contents

    Preface xix

    About the Authors xxxiii

1 A Tour of Computer Systems 1
1.1 Information Is Bits + Context 3
1.2 Programs Are Translated by Other Programs into Different Forms 4
1.3 It Pays to Understand How Compilation Systems Work 6
1.4 Processors Read and Interpret Instructions Stored in Memory 7
    1.4.1 Hardware Organization of a System 7
    1.4.2 Running the hello Program 10
1.5 Caches Matter 12
1.6 Storage Devices Form a Hierarchy 13
1.7 The Operating System Manages the Hardware 14
    1.7.1 Processes 16
    1.7.2 Threads 17
    1.7.3 Virtual Memory 17
    1.7.4 Files 19
1.8 Systems Communicate with Other Systems Using Networks 20
1.9 Important Themes 21
    1.9.1 Concurrency and Parallelism 21
    1.9.2 The Importance of Abstractions in Computer Systems 24
1.10 Summary 25
Bibliographic Notes 26

Part I Program Structure and Execution

2 Representing and Manipulating Information 29
2.1 Information Storage 33
    2.1.1 Hexadecimal Notation 34
    2.1.2 Words 38
    2.1.3 Data Sizes 38
    2.1.4 Addressing and Byte Ordering 39
    2.1.5 Representing Strings 46
    2.1.6 Representing Code 47
    2.1.7 Introduction to Boolean Algebra 48
    2.1.8 Bit-Level Operations in C 51
    2.1.9 Logical Operations in C 54
    2.1.10 Shift Operations in C 54
2.2 Integer Representations 56
    2.2.1 Integral Data Types 57
    2.2.2 Unsigned Encodings 58
    2.2.3 Two’s-Complement Encodings 60
    2.2.4 Conversions Between Signed and Unsigned 65
    2.2.5 Signed vs. Unsigned in C 69
    2.2.6 Expanding the Bit Representation of a Number 71
    2.2.7 Truncating Numbers 75
    2.2.8 Advice on Signed vs. Unsigned 76
2.3 Integer Arithmetic 79
    2.3.1 Unsigned Addition 79
    2.3.2 Two’s-Complement Addition 83
    2.3.3 Two’s-Complement Negation 87
    2.3.4 Unsigned Multiplication 88
    2.3.5 Two’s-Complement Multiplication 89
    2.3.6 Multiplying by Constants 92
    2.3.7 Dividing by Powers of Two 95
    2.3.8 Final Thoughts on Integer Arithmetic 98
2.4 Floating Point 99
    2.4.1 Fractional Binary Numbers 100
    2.4.2 IEEE Floating-Point Representation 103
    2.4.3 Example Numbers 105
    2.4.4 Rounding 110
    2.4.5 Floating-Point Operations 113
    2.4.6 Floating Point in C 114
2.5 Summary 118
Bibliographic Notes 119
Homework Problems 119
Solutions to Practice Problems 134

3 Machine-Level Representation of Programs 153
3.1 A Historical Perspective 156
3.2 Program Encodings 159
    3.2.1 Machine-Level Code 160
    3.2.2 Code Examples 162
    3.2.3 Notes on Formatting 165
3.3 Data Formats 167
3.4 Accessing Information 168
    3.4.1 Operand Specifiers 169
    3.4.2 Data Movement Instructions 171
    3.4.3 Data Movement Example 174
3.5 Arithmetic and Logical Operations 177
    3.5.1 Load Effective Address 177
    3.5.2 Unary and Binary Operations 178
    3.5.3 Shift Operations 179
    3.5.4 Discussion 180
    3.5.5 Special Arithmetic Operations 182
3.6 Control 185
    3.6.1 Condition Codes 185
    3.6.2 Accessing the Condition Codes 187
    3.6.3 Jump Instructions and Their Encodings 189
    3.6.4 Translating Conditional Branches 193
    3.6.5 Loops 197
    3.6.6 Conditional Move Instructions 206
    3.6.7 Switch Statements 213
3.7 Procedures 219
    3.7.1 Stack Frame Structure 219
    3.7.2 Transferring Control 221
    3.7.3 Register Usage Conventions 223
    3.7.4 Procedure Example 224
    3.7.5 Recursive Procedures 229
3.8 Array Allocation and Access 232
    3.8.1 Basic Principles 232
    3.8.2 Pointer Arithmetic 233
    3.8.3 Nested Arrays 235
    3.8.4 Fixed-Size Arrays 237
    3.8.5 Variable-Size Arrays 238
3.9 Heterogeneous Data Structures 241
    3.9.1 Structures 241
    3.9.2 Unions 244
    3.9.3 Data Alignment 248
3.10 Putting It Together: Understanding Pointers 252
3.11 Life in the Real World: Using the gdb Debugger 254
3.12 Out-of-Bounds Memory References and Buffer Overflow 256
    3.12.1 Thwarting Buffer Overflow Attacks 261
3.13 x86-64: Extending IA32 to 64 Bits 267
    3.13.1 History and Motivation for x86-64 268
    3.13.2 An Overview of x86-64 270
    3.13.3 Accessing Information 273
    3.13.4 Control 279
    3.13.5 Data Structures 290
    3.13.6 Concluding Observations about x86-64 291
3.14 Machine-Level Representations of Floating-Point Programs 292
3.15 Summary 293
Bibliographic Notes 294
Homework Problems 294
Solutions to Practice Problems 308

4 Processor Architecture 333
4.1 The Y86 Instruction Set Architecture 336
    4.1.1 Programmer-Visible State 336
    4.1.2 Y86 Instructions 337
    4.1.3 Instruction Encoding 339
    4.1.4 Y86 Exceptions 344
    4.1.5 Y86 Programs 345
    4.1.6 Some Y86 Instruction Details 350
4.2 Logic Design and the Hardware Control Language HCL 352
    4.2.1 Logic Gates 353
    4.2.2 Combinational Circuits and HCL Boolean Expressions 354
    4.2.3 Word-Level Combinational Circuits and HCL Integer Expressions 355
    4.2.4 Set Membership 360
    4.2.5 Memory and Clocking 361
4.3 Sequential Y86 Implementations 364
    4.3.1 Organizing Processing into Stages 364
    4.3.2 SEQ Hardware Structure 375
    4.3.3 SEQ Timing 379
    4.3.4 SEQ Stage Implementations 383
4.4 General Principles of Pipelining 391
    4.4.1 Computational Pipelines 392
    4.4.2 A Detailed Look at Pipeline Operation 393
    4.4.3 Limitations of Pipelining 394
    4.4.4 Pipelining a System with Feedback 398
4.5 Pipelined Y86 Implementations 400
    4.5.1 SEQ+: Rearranging the Computation Stages 400
    4.5.2 Inserting Pipeline Registers 401
    4.5.3 Rearranging and Relabeling Signals 405
    4.5.4 Next PC Prediction 406
    4.5.5 Pipeline Hazards 408
    4.5.6 Avoiding Data Hazards by Stalling 413
    4.5.7 Avoiding Data Hazards by Forwarding 415
    4.5.8 Load/Use Data Hazards 418
    4.5.9 Exception Handling 420
    4.5.10 PIPE Stage Implementations 423
    4.5.11 Pipeline Control Logic 431
    4.5.12 Performance Analysis 444
    4.5.13 Unfinished Business 446
4.6 Summary 449
    4.6.1 Y86 Simulators 450
Bibliographic Notes 451
Homework Problems 451
Solutions to Practice Problems 457

5 Optimizing Program Performance 473
5.1 Capabilities and Limitations of Optimizing Compilers 476
5.2 Expressing Program Performance 480
5.3 Program Example 482
5.4 Eliminating Loop Inefficiencies 486
5.5 Reducing Procedure Calls 490
5.6 Eliminating Unneeded Memory References 491
5.7 Understanding Modern Processors 496
    5.7.1 Overall Operation 497
    5.7.2 Functional Unit Performance 500
    5.7.3 An Abstract Model of Processor Operation 502
5.8 Loop Unrolling 509
5.9 Enhancing Parallelism 513
    5.9.1 Multiple Accumulators 514
    5.9.2 Reassociation Transformation 518
5.10 Summary of Results for Optimizing Combining Code 524
5.11 Some Limiting Factors 525
    5.11.1 Register Spilling 525
    5.11.2 Branch Prediction and Misprediction Penalties 526
5.12 Understanding Memory Performance 531
    5.12.1 Load Performance 531
    5.12.2 Store Performance 532
5.13 Life in the Real World: Performance Improvement Techniques 539
5.14 Identifying and Eliminating Performance Bottlenecks 540
    5.14.1 Program Profiling 540
    5.14.2 Using a Profiler to Guide Optimization 542
    5.14.3 Amdahl’s Law 545
5.15 Summary 547
Bibliographic Notes 548
Homework Problems 549
Solutions to Practice Problems 552

6 The Memory Hierarchy 559
6.1 Storage Technologies 561
    6.1.1 Random-Access Memory 561
    6.1.2 Disk Storage 570
    6.1.3 Solid State Disks 581
    6.1.4 Storage Technology Trends 583
6.2 Locality 586
    6.2.1 Locality of References to Program Data 587
    6.2.2 Locality of Instruction Fetches 588
    6.2.3 Summary of Locality 589
6.3 The Memory Hierarchy 591
    6.3.1 Caching in the Memory Hierarchy 592
    6.3.2 Summary of Memory Hierarchy Concepts 595
6.4 Cache Memories 596
    6.4.1 Generic Cache Memory Organization 597
    6.4.2 Direct-Mapped Caches 599
    6.4.3 Set Associative Caches 606
    6.4.4 Fully Associative Caches 608
    6.4.5 Issues with Writes 611
    6.4.6 Anatomy of a Real Cache Hierarchy 612
    6.4.7 Performance Impact of Cache Parameters 614
6.5 Writing Cache-friendly Code 615
6.6 Putting It Together: The Impact of Caches on Program Performance 620
    6.6.1 The Memory Mountain 621
    6.6.2 Rearranging Loops to Increase Spatial Locality 625
    6.6.3 Exploiting Locality in Your Programs 629
6.7 Summary 629
Bibliographic Notes 630
Homework Problems 631
Solutions to Practice Problems 642

Part II Running Programs on a System

7 Linking 653
7.1 Compiler Drivers 655
7.2 Static Linking 657
7.3 Object Files 657
7.4 Relocatable Object Files 658
7.5 Symbols and Symbol Tables 660
7.6 Symbol Resolution 663
    7.6.1 How Linkers Resolve Multiply Defined Global Symbols 664
    7.6.2 Linking with Static Libraries 667
    7.6.3 How Linkers Use Static Libraries to Resolve References 670
7.7 Relocation 672
    7.7.1 Relocation Entries 672
    7.7.2 Relocating Symbol References 673
7.8 Executable Object Files 678
7.9 Loading Executable Object Files 679
7.10 Dynamic Linking with Shared Libraries 681
7.11 Loading and Linking Shared Libraries from Applications 683
7.12 Position-Independent Code (PIC) 687
7.13 Tools for Manipulating Object Files 690
7.14 Summary 691
Bibliographic Notes 691
Homework Problems 692
Solutions to Practice Problems 698

8 Exceptional Control Flow 701
8.1 Exceptions 703
    8.1.1 Exception Handling 704
    8.1.2 Classes of Exceptions 706
    8.1.3 Exceptions in Linux/IA32 Systems 708
8.2 Processes 712
    8.2.1 Logical Control Flow 712
    8.2.2 Concurrent Flows 713
    8.2.3 Private Address Space 714
    8.2.4 User and Kernel Modes 714
    8.2.5 Context Switches 716
8.3 System Call Error Handling 717
8.4 Process Control 718
    8.4.1 Obtaining Process IDs 719
    8.4.2 Creating and Terminating Processes 719
    8.4.3 Reaping Child Processes 723
    8.4.4 Putting Processes to Sleep 729
    8.4.5 Loading and Running Programs 730
    8.4.6 Using fork and execve to Run Programs 733
8.5 Signals 736
    8.5.1 Signal Terminology 738
    8.5.2 Sending Signals 739
    8.5.3 Receiving Signals 742
    8.5.4 Signal Handling Issues 745
    8.5.5 Portable Signal Handling 752
    8.5.6 Explicitly Blocking and Unblocking Signals 753
    8.5.7 Synchronizing Flows to Avoid Nasty Concurrency Bugs 755
8.6 Nonlocal Jumps 759
8.7 Tools for Manipulating Processes 762
8.8 Summary 763
Bibliographic Notes 763
Homework Problems 764
Solutions to Practice Problems 771

9 Virtual Memory 775
9.1 Physical and Virtual Addressing 777
9.2 Address Spaces 778
9.3 VM as a Tool for Caching 779
    9.3.1 DRAM Cache Organization 780
    9.3.2 Page Tables 780
    9.3.3 Page Hits 782
    9.3.4 Page Faults 782
    9.3.5 Allocating Pages 783
    9.3.6 Locality to the Rescue Again 784
9.4 VM as a Tool for Memory Management 785
9.5 VM as a Tool for Memory Protection 786
9.6 Address Translation 787
    9.6.1 Integrating Caches and VM 791
    9.6.2 Speeding up Address Translation with a TLB 791
    9.6.3 Multi-Level Page Tables 792
    9.6.4 Putting It Together: End-to-end Address Translation 794
9.7 Case Study: The Intel Core i7/Linux Memory System 799
    9.7.1 Core i7 Address Translation 800
    9.7.2 Linux Virtual Memory System 803
9.8 Memory Mapping 807
    9.8.1 Shared Objects Revisited 807
    9.8.2 The fork Function Revisited 809
    9.8.3 The execve Function Revisited 810
    9.8.4 User-level Memory Mapping with the mmap Function 810
9.9 Dynamic Memory Allocation 812
    9.9.1 The malloc and free Functions 814
    9.9.2 Why Dynamic Memory Allocation? 816
    9.9.3 Allocator Requirements and Goals 817
    9.9.4 Fragmentation 819
    9.9.5 Implementation Issues 820
    9.9.6 Implicit Free Lists 820
    9.9.7 Placing Allocated Blocks 822
    9.9.8 Splitting Free Blocks 823
    9.9.9 Getting Additional Heap Memory 823
    9.9.10 Coalescing Free Blocks 824
    9.9.11 Coalescing with Boundary Tags 824
    9.9.12 Putting It Together: Implementing a Simple Allocator 827
    9.9.13 Explicit Free Lists 835
    9.9.14 Segregated Free Lists 836
9.10 Garbage Collection 838
    9.10.1 Garbage Collector Basics 839
    9.10.2 Mark&Sweep Garbage Collectors 840
    9.10.3 Conservative Mark&Sweep for C Programs 842
9.11 Common Memory-Related Bugs in C Programs 843
    9.11.1 Dereferencing Bad Pointers 843
    9.11.2 Reading Uninitialized Memory 843
    9.11.3 Allowing Stack Buffer Overflows 844
    9.11.4 Assuming that Pointers and the Objects They Point to Are the Same Size 844
    9.11.5 Making Off-by-One Errors 845
    9.11.6 Referencing a Pointer Instead of the Object It Points to 845
    9.11.7 Misunderstanding Pointer Arithmetic 846
    9.11.8 Referencing Nonexistent Variables 846
    9.11.9 Referencing Data in Free Heap Blocks 847
    9.11.10 Introducing Memory Leaks 847
9.12 Summary 848
Bibliographic Notes 848
Homework Problems 849
Solutions to Practice Problems 853

Part III Interaction and Communication Between Programs

10 System-Level I/O 861
10.1 Unix I/O 862
10.2 Opening and Closing Files 863
10.3 Reading and Writing Files 865
10.4 Robust Reading and Writing with the Rio Package 867
    10.4.1 Rio Unbuffered Input and Output Functions 867
    10.4.2 Rio Buffered Input Functions 868
10.5 Reading File Metadata 873
10.6 Sharing Files 875
10.7 I/O Redirection 877
10.8 Standard I/O 879
10.9 Putting It Together: Which I/O Functions Should I Use? 880
10.10 Summary 881
Bibliographic Notes 882
Homework Problems 882
Solutions to Practice Problems 883

11 Network Programming 885
11.1 The Client-Server Programming Model 886
11.2 Networks 887
11.3 The Global IP Internet 891
    11.3.1 IP Addresses 893
    11.3.2 Internet Domain Names 895
    11.3.3 Internet Connections 899
11.4 The Sockets Interface 900
    11.4.1 Socket Address Structures 901
    11.4.2 The socket Function 902
    11.4.3 The connect Function 903
    11.4.4 The open_clientfd Function 903
    11.4.5 The bind Function 904
    11.4.6 The listen Function 905
    11.4.7 The open_listenfd Function 905
    11.4.8 The accept Function 907
    11.4.9 Example Echo Client and Server 908
11.5 Web Servers 911
    11.5.1 Web Basics 911
    11.5.2 Web Content 912
    11.5.3 HTTP Transactions 914
    11.5.4 Serving Dynamic Content 916
11.6 Putting It Together: The Tiny Web Server 919
11.7 Summary 927
Bibliographic Notes 928
Homework Problems 928
Solutions to Practice Problems 929

12 Concurrent Programming 933
12.1 Concurrent Programming with Processes 935
    12.1.1 A Concurrent Server Based on Processes 936
    12.1.2 Pros and Cons of Processes 937
12.2 Concurrent Programming with I/O Multiplexing 939
    12.2.1 A Concurrent Event-Driven Server Based on I/O Multiplexing 942
    12.2.2 Pros and Cons of I/O Multiplexing 946
12.3 Concurrent Programming with Threads 947
    12.3.1 Thread Execution Model 948
    12.3.2 Posix Threads 948
    12.3.3 Creating Threads 950
    12.3.4 Terminating Threads 950
    12.3.5 Reaping Terminated Threads 951
    12.3.6 Detaching Threads 951
    12.3.7 Initializing Threads 952
    12.3.8 A Concurrent Server Based on Threads 952
12.4 Shared Variables in Threaded Programs 954
    12.4.1 Threads Memory Model 955
    12.4.2 Mapping Variables to Memory 956
    12.4.3 Shared Variables 956
12.5 Synchronizing Threads with Semaphores 957
    12.5.1 Progress Graphs 960
    12.5.2 Semaphores 963
    12.5.3 Using Semaphores for Mutual Exclusion 964
    12.5.4 Using Semaphores to Schedule Shared Resources 966
    12.5.5 Putting It Together: A Concurrent Server Based on Prethreading 970
12.6 Using Threads for Parallelism 974
12.7 Other Concurrency Issues 979
    12.7.1 Thread Safety 979
    12.7.2 Reentrancy 980
    12.7.3 Using Existing Library Functions in Threaded Programs 982
    12.7.4 Races 983
    12.7.5 Deadlocks 985
12.8 Summary 988
Bibliographic Notes 989
Homework Problems 989
Solutions to Practice Problems 994

A Error Handling 999
A.1 Error Handling in Unix Systems 1000
A.2 Error-Handling Wrappers 1001

References 1005

Index 1011

Preface

This book (CS:APP) is for computer scientists, computer engineers, and others who want to be able to write better programs by learning what is going on “under the hood” of a computer system.

Our aim is to explain the enduring concepts underlying all computer systems, and to show you the concrete ways that these ideas affect the correctness, performance, and utility of your application programs. Other systems books are written from a builder’s perspective, describing how to implement the hardware or the systems software, including the operating system, compiler, and network interface. This book is written from a programmer’s perspective, describing how application programmers can use their knowledge of a system to write better programs. Of course, learning what a system is supposed to do provides a good first step in learning how to build one, and so this book also serves as a valuable introduction to those who go on to implement systems hardware and software.

If you study and learn the concepts in this book, you will be on your way to becoming the rare “power programmer” who knows how things work and how to fix them when they break. Our aim is to present the fundamental concepts in ways that you will find useful right away. You will also be prepared to delve deeper, studying such topics as compilers, computer architecture, operating systems, embedded systems, and networking.

    Assumptions about the Reader’s Background

The presentation of machine code in the book is based on two related formats supported by Intel and its competitors, colloquially known as “x86.” IA32 is the machine code that has become the de facto standard for a wide range of systems. x86-64 is an extension of IA32 to enable programs to operate on larger data and to reference a wider range of memory addresses. Since x86-64 systems are able to run IA32 code, both of these forms of machine code will see widespread use for the foreseeable future. We consider how these machines execute C programs on Unix or Unix-like (such as Linux) operating systems. (To simplify our presentation, we will use the term “Unix” as an umbrella term for systems having Unix as their heritage, including Solaris, Mac OS, and Linux.) The text contains numerous programming examples that have been compiled and run on Linux systems. We assume that you have access to such a machine and are able to log in and do simple things such as changing directories.

If your computer runs Microsoft Windows, you have two choices. First, you can get a copy of Linux (www.ubuntu.com) and install it as a “dual boot” option, so that your machine can run either operating system. Alternatively, by installing a copy of the Cygwin tools (www.cygwin.com), you can run a Unix-like shell under Windows and have an environment very close to that provided by Linux. Not all features of Linux are available under Cygwin, however.

We also assume that you have some familiarity with C or C++. If your only prior experience is with Java, the transition will require more effort on your part, but we will help you. Java and C share similar syntax and control statements. However, there are aspects of C, particularly pointers, explicit dynamic memory allocation, and formatted I/O, that do not exist in Java. Fortunately, C is a small language, and it is clearly and beautifully described in the classic “K&R” text by Brian Kernighan and Dennis Ritchie [58]. Regardless of your programming background, consider K&R an essential part of your personal systems library.
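As a minimal sketch of those three features in action (our illustration, not one of the book’s formatted examples):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int n = 4;
        int *a = malloc(n * sizeof(int));    /* Explicit allocation: no Java counterpart */
        if (a == NULL)
            return 1;
        for (int i = 0; i < n; i++)
            a[i] = i * i;
        for (int i = 0; i < n; i++)
            printf("a[%d] = %d\n", i, *(a + i));  /* *(a + i) is the same as a[i] */
        free(a);    /* The programmer, not a garbage collector, releases memory */
        return 0;
    }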

Several of the early chapters in the book explore the interactions between C programs and their machine-language counterparts. The machine-language examples were all generated by the GNU gcc compiler running on IA32 and x86-64 processors. We do not assume any prior experience with hardware, machine language, or assembly-language programming.

    New to C? Advice on the C programming language

To help readers whose background in C programming is weak (or nonexistent), we have also included these special notes to highlight features that are especially important in C. We assume you are familiar with C++ or Java.

    How to Read the Book

Learning how computer systems work from a programmer’s perspective is great fun, mainly because you can do it actively. Whenever you learn something new, you can try it out right away and see the result first hand. In fact, we believe that the only way to learn systems is to do systems, either working concrete problems or writing and running programs on real systems.

This theme pervades the entire book. When a new concept is introduced, it is followed in the text by one or more practice problems that you should work immediately to test your understanding. Solutions to the practice problems are at the end of each chapter. As you read, try to solve each problem on your own, and then check the solution to make sure you are on the right track. Each chapter is followed by a set of homework problems of varying difficulty. Your instructor has the solutions to the homework problems in an Instructor’s Manual. For each homework problem, we show a rating of the amount of effort we feel it will require:

    ◆ Should require just a few minutes. Little or no programming required.

◆◆ Might require up to 20 minutes. Often involves writing and testing some code. Many of these are derived from problems we have given on exams.

◆◆◆ Requires a significant effort, perhaps 1–2 hours. Generally involves writing and testing a significant amount of code.

    ◆◆◆◆ A lab assignment, requiring up to 10 hours of effort.


code/intro/hello.c

#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}

code/intro/hello.c

    Figure 1 A typical code example.

Each code example in the text was formatted directly, without any manual intervention, from a C program compiled with gcc and tested on a Linux system. Of course, your system may have a different version of gcc, or a different compiler altogether, and so your compiler might generate different machine code, but the overall behavior should be the same. All of the source code is available from the CS:APP Web page at csapp.cs.cmu.edu. In the text, the file names of the source programs are documented in horizontal bars that surround the formatted code. For example, the program in Figure 1 can be found in the file hello.c in directory code/intro/. We encourage you to try running the example programs on your system as you encounter them.

To avoid having a book that is overwhelming, both in bulk and in content, we have created a number of Web asides containing material that supplements the main presentation of the book. These asides are referenced within the book with a notation of the form CHAP:TOP, where CHAP is a short encoding of the chapter subject, and TOP is short code for the topic that is covered. For example, Web Aside data:bool contains supplementary material on Boolean algebra for the presentation on data representations in Chapter 2, while Web Aside arch:vlog contains material describing processor designs using the Verilog hardware description language, supplementing the presentation of processor design in Chapter 4. All of these Web asides are available from the CS:APP Web page.

Aside What is an aside?

You will encounter asides of this form throughout the text. Asides are parenthetical remarks that give you some additional insight into the current topic. Asides serve a number of purposes. Some are little history lessons. For example, where did C, Linux, and the Internet come from? Other asides are meant to clarify ideas that students often find confusing. For example, what is the difference between a cache line, set, and block? Other asides give real-world examples. For example, how a floating-point error crashed a French rocket, or what the geometry of an actual Seagate disk drive looks like. Finally, some asides are just fun stuff. For example, what is a “hoinky”?


    Book Overview

The CS:APP book consists of 12 chapters designed to capture the core ideas in computer systems:

. Chapter 1: A Tour of Computer Systems. This chapter introduces the major ideas and themes in computer systems by tracing the life cycle of a simple “hello, world” program.

. Chapter 2: Representing and Manipulating Information. We cover computer arithmetic, emphasizing the properties of unsigned and two’s-complement number representations that affect programmers. We consider how numbers are represented and therefore what range of values can be encoded for a given word size. We consider the effect of casting between signed and unsigned numbers. We cover the mathematical properties of arithmetic operations. Novice programmers are often surprised to learn that the (two’s-complement) sum or product of two positive numbers can be negative. On the other hand, two’s-complement arithmetic satisfies the algebraic properties of a ring, and hence a compiler can safely transform multiplication by a constant into a sequence of shifts and adds. We use the bit-level operations of C to demonstrate the principles and applications of Boolean algebra. We cover the IEEE floating-point format in terms of how it represents values and the mathematical properties of floating-point operations.

Having a solid understanding of computer arithmetic is critical to writing reliable programs. For example, programmers and compilers cannot replace the expression (x < y) with (x - y < 0), due to the possibility of overflow.
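To see the surprise concretely, here is a small test program (a sketch of ours; strictly speaking, signed overflow is undefined in C, but on typical two’s-complement machines it wraps around):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        int x = INT_MAX;                    /* 2147483647 for a 32-bit int */
        printf("%d + 1 = %d\n", x, x + 1);  /* Sum of two positives comes out negative */

        /* Why (x < y) cannot be rewritten as (x - y < 0): */
        int big = INT_MAX, small = INT_MIN;
        printf("big < small:     %d\n", big < small);        /* 0: false */
        printf("big - small < 0: %d\n", (big - small) < 0);  /* 1: the subtraction overflowed */
        return 0;
    }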

. Chapter 4: Processor Architecture. This chapter covers basic combinational and sequential logic elements, and then shows how these elements can be combined in a datapath that executes a simplified subset of the IA32 instruction set called “Y86.” We begin with the design of a single-cycle datapath. This design is conceptually very simple, but it would not be very fast. We then introduce pipelining, where the different steps required to process an instruction are implemented as separate stages. At any given time, each stage can work on a different instruction. Our five-stage processor pipeline is much more realistic. The control logic for the processor designs is described using a simple hardware description language called HCL. Hardware designs written in HCL can be compiled and linked into simulators provided with the textbook, and they can be used to generate Verilog descriptions suitable for synthesis into working hardware.

. Chapter 5: Optimizing Program Performance. This chapter introduces a number of techniques for improving code performance, with the idea being that programmers learn to write their C code in such a way that a compiler can then generate efficient machine code. We start with transformations that reduce the work to be done by a program and hence should be standard practice when writing any program for any machine. We then progress to transformations that enhance the degree of instruction-level parallelism in the generated machine code, thereby improving their performance on modern “superscalar” processors. To motivate these transformations, we introduce a simple operational model of how modern out-of-order processors work, and show how to measure the potential performance of a program in terms of the critical paths through a graphical representation of a program. You will be surprised how much you can speed up a program by simple transformations of the C code.
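The flavor of the first kind of transformation can be seen in a short sketch (Chapter 5 works through a version of this example): calling strlen on every iteration makes lowercase conversion quadratic in the string length, while hoisting the loop-invariant call makes it linear:

    #include <string.h>
    #include <ctype.h>

    /* Quadratic: strlen is re-evaluated on every iteration */
    void lower_slow(char *s)
    {
        for (size_t i = 0; i < strlen(s); i++)
            s[i] = tolower((unsigned char) s[i]);
    }

    /* Linear: the invariant length is computed once */
    void lower_fast(char *s)
    {
        size_t len = strlen(s);
        for (size_t i = 0; i < len; i++)
            s[i] = tolower((unsigned char) s[i]);
    }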

. Chapter 6: The Memory Hierarchy. The memory system is one of the most visible parts of a computer system to application programmers. To this point, you have relied on a conceptual model of the memory system as a linear array with uniform access times. In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. We cover the different types of RAM and ROM memories and the geometry and organization of magnetic-disk and solid-state drives. We describe how these storage devices are arranged in a hierarchy. We show how this hierarchy is made possible by locality of reference. We make these ideas concrete by introducing a unique view of a memory system as a “memory mountain” with ridges of temporal locality and slopes of spatial locality. Finally, we show you how to improve the performance of application programs by improving their temporal and spatial locality.
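To get a feel for why locality matters, consider this sketch: both functions compute the same sum, but the first scans the array in the row-major order in which C stores it, touching memory with stride 1, while the second jumps across rows. On arrays much larger than the caches, the first typically runs several times faster:

    #define N 2048

    long sum_rowwise(int a[N][N])   /* Good spatial locality: stride-1 accesses */
    {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    long sum_colwise(int a[N][N])   /* Poor spatial locality: stride-N accesses */
    {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        return sum;
    }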

. Chapter 7: Linking. This chapter covers both static and dynamic linking, including the ideas of relocatable and executable object files, symbol resolution, relocation, static libraries, shared object libraries, and position-independent code. Linking is not covered in most systems texts, but we cover it for several reasons. First, some of the most confusing errors that programmers can encounter are related to glitches during linking, especially for large software packages. Second, the object files produced by linkers are tied to concepts such as loading, virtual memory, and memory mapping.


. Chapter 8: Exceptional Control Flow. In this part of the presentation, we step beyond the single-program model by introducing the general concept of exceptional control flow (i.e., changes in control flow that are outside the normal branches and procedure calls). We cover examples of exceptional control flow that exist at all levels of the system, from low-level hardware exceptions and interrupts, to context switches between concurrent processes, to abrupt changes in control flow caused by the delivery of Unix signals, to the nonlocal jumps in C that break the stack discipline.

This is the part of the book where we introduce the fundamental idea of a process, an abstraction of an executing program. You will learn how processes work and how they can be created and manipulated from application programs. We show how application programmers can make use of multiple processes via Unix system calls. When you finish this chapter, you will be able to write a Unix shell with job control. It is also your first introduction to the nondeterministic behavior that arises with concurrent program execution.
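A minimal sketch of the process-control idiom the chapter builds on (error checks trimmed for brevity): fork creates a child, the child replaces its image with a new program via execve, and the parent waits to reap it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();                   /* Create a child process */
        if (pid == 0) {                       /* Child: run a new program */
            char *argv[] = { "ls", "-l", NULL };
            char *envp[] = { NULL };
            execve("/bin/ls", argv, envp);
            exit(1);                          /* Reached only if execve fails */
        }
        int status;
        waitpid(pid, &status, 0);             /* Parent: reap the child */
        printf("child %d finished\n", (int) pid);
        return 0;
    }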

. Chapter 9: Virtual Memory. Our presentation of the virtual memory system seeks to give some understanding of how it works and its characteristics. We want you to know how it is that the different simultaneous processes can each use an identical range of addresses, sharing some pages but having individual copies of others. We also cover issues involved in managing and manipulating virtual memory. In particular, we cover the operation of storage allocators such as the Unix malloc and free operations. Covering this material serves several purposes. It reinforces the concept that the virtual memory space is just an array of bytes that the program can subdivide into different storage units. It helps you understand the effects of programs containing memory referencing errors such as storage leaks and invalid pointer references. Finally, many application programmers write their own storage allocators optimized toward the needs and characteristics of the application. This chapter, more than any other, demonstrates the benefit of covering both the hardware and the software aspects of computer systems in a unified way. Traditional computer architecture and operating systems texts present only part of the virtual memory story.
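For instance, the following fragment (a sketch) commits one of the bugs cataloged at the end of the chapter, a memory leak: the only pointer to the first block is overwritten, so that block can never be freed:

    #include <stdlib.h>

    void leak(void)
    {
        int *p = malloc(100 * sizeof(int));
        p = malloc(200 * sizeof(int));   /* First block is now unreachable: a leak */
        free(p);                         /* Releases only the second block */
    }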

. Chapter 10: System-Level I/O. We cover the basic concepts of Unix I/O such as files and descriptors. We describe how files are shared, how I/O redirection works, and how to access file metadata. We also develop a robust buffered I/O package that deals correctly with a curious behavior known as short counts, where the library function reads only part of the input data. We cover the C standard I/O library and its relationship to Unix I/O, focusing on limitations of standard I/O that make it unsuitable for network programming. In general, the topics covered in this chapter are building blocks for the next two chapters on network and concurrent programming.
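The essence of the short-count problem fits in a few lines (a sketch; the book’s Rio package wraps this logic more carefully): a single call to read may transfer fewer bytes than requested, so robust code loops until the request is satisfied, end-of-file is reached, or an error occurs:

    #include <unistd.h>
    #include <errno.h>

    /* Read up to n bytes, retrying on short counts.
       Returns bytes actually read (short only at EOF), or -1 on error. */
    ssize_t read_fully(int fd, void *buf, size_t n)
    {
        size_t left = n;
        char *p = buf;
        while (left > 0) {
            ssize_t r = read(fd, p, left);
            if (r < 0) {
                if (errno == EINTR)   /* Interrupted by a signal handler: retry */
                    continue;
                return -1;
            }
            if (r == 0)               /* EOF */
                break;
            left -= r;
            p += r;
        }
        return n - left;
    }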

. Chapter 11: Network Programming. Networks are interesting I/O devices to program, tying together many of the ideas that we have studied earlier in the text, such as processes, signals, byte ordering, memory mapping, and dynamic storage allocation. Network programs also provide a compelling context for concurrency, which is the topic of the next chapter. This chapter is a thin slice through network programming that gets you to the point where you can write a Web server. We cover the client-server model that underlies all network applications. We present a programmer’s view of the Internet, and show how to write Internet clients and servers using the sockets interface. Finally, we introduce HTTP and develop a simple iterative Web server.
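As a small taste of the sockets interface (a sketch using the standard getaddrinfo call rather than the book’s open_clientfd wrapper; the host and port here are placeholders), here is a client that connects to a Web server and sends one request:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int main(void)
    {
        struct addrinfo hints, *ai;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;    /* TCP connection */
        if (getaddrinfo("example.com", "80", &hints, &ai) != 0)
            return 1;
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0 || connect(fd, ai->ai_addr, ai->ai_addrlen) < 0)
            return 1;
        freeaddrinfo(ai);
        const char *req = "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";
        write(fd, req, strlen(req));        /* Send an HTTP request */
        char buf[256];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, n, stdout);      /* Echo the reply */
        close(fd);
        return 0;
    }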

. Chapter 12: Concurrent Programming. This chapter introduces concurrent programming using Internet server design as the running motivational example. We compare and contrast the three basic mechanisms for writing concurrent programs—processes, I/O multiplexing, and threads—and show how to use them to build concurrent Internet servers. We cover basic principles of synchronization using P and V semaphore operations, thread safety and reentrancy, race conditions, and deadlocks. Writing concurrent code is essential for most server applications. We also describe the use of thread-level programming to express parallelism in an application program, enabling faster execution on multi-core processors. Getting all of the cores working on a single computational problem requires a careful coordination of the concurrent threads, both for correctness and to achieve high performance.
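In the Posix API, P and V correspond to sem_wait and sem_post. A minimal sketch of using a semaphore as a mutex so that two threads can safely increment a shared counter (Linux; compile with -pthread):

    #include <stdio.h>
    #include <pthread.h>
    #include <semaphore.h>

    #define NITERS 1000000

    static volatile long cnt = 0;   /* Shared counter */
    static sem_t mutex;             /* Binary semaphore protecting cnt */

    static void *incr(void *arg)
    {
        for (long i = 0; i < NITERS; i++) {
            sem_wait(&mutex);       /* P: lock */
            cnt++;
            sem_post(&mutex);       /* V: unlock */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        sem_init(&mutex, 0, 1);     /* Initial value 1: at most one holder */
        pthread_create(&t1, NULL, incr, NULL);
        pthread_create(&t2, NULL, incr, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("cnt = %ld (expected %d)\n", cnt, 2 * NITERS);
        return 0;
    }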

    New to this Edition

The first edition of this book was published with a copyright of 2003. Considering the rapid evolution of computer technology, the book content has held up surprisingly well. Intel x86 machines running Unix-like operating systems and programmed in C proved to be a combination that continues to encompass many systems today. Changes in hardware technology and compilers and the experience of many instructors teaching the material have prompted a substantial revision.

    Here are some of the more significant changes:

. Chapter 2: Representing and Manipulating Information. We have tried to make this material more accessible, with more careful explanations of concepts and with many more practice and homework problems. We moved some of the more theoretical aspects to Web asides. We also describe some of the security vulnerabilities that arise due to the overflow properties of computer arithmetic.

. Chapter 3: Machine-Level Representation of Programs. We have extended our coverage to include x86-64, the extension of x86 processors to a 64-bit word size. We also use the code generated by a more recent version of gcc. We have enhanced our coverage of buffer overflow vulnerabilities. We have created Web asides on two different classes of instructions for floating point, and also a view of the more exotic transformations made when compilers attempt higher degrees of optimization. Another Web aside describes how to embed x86 assembly code within a C program.


. Chapter 4: Processor Architecture. We include a more careful exposition of exception detection and handling in our processor design. We have also created a Web aside showing a mapping of our processor designs into Verilog, enabling synthesis into working hardware.

. Chapter 5: Optimizing Program Performance. We have greatly changed our description of how an out-of-order processor operates, and we have created a simple technique for analyzing program performance based on the paths in a data-flow graph representation of a program. A Web aside describes how C programmers can write programs that make use of the SIMD (single-instruction, multiple-data) instructions found in more recent versions of x86 processors.

. Chapter 6: The Memory Hierarchy. We have added material on solid-state disks, and we have updated our presentation to be based on the memory hierarchy of an Intel Core i7 processor.

    . Chapter 7: Linking. This chapter has changed only slightly.

. Chapter 8: Exceptional Control Flow. We have enhanced our discussion of how the process model introduces some fundamental concepts of concurrency, such as nondeterminism.

. Chapter 9: Virtual Memory. We have updated our memory system case study to describe the 64-bit Intel Core i7 processor. We have also updated our sample implementation of malloc to work for both 32-bit and 64-bit execution.

    . Chapter 10: System-Level I/O. This chapter has changed only slightly.

    . Chapter 11: Network Programming. This chapter has changed only slightly.

. Chapter 12: Concurrent Programming. We have increased our coverage of the general principles of concurrency, and we also describe how programmers can use thread-level parallelism to make programs run faster on multi-core machines.

In addition, we have added and revised a number of practice and homework problems.

    Origins of the Book

The book stems from an introductory course that we developed at Carnegie Mellon University in the Fall of 1998, called 15-213: Introduction to Computer Systems (ICS) [14]. The ICS course has been taught every semester since then, each time to about 150–250 students, ranging from sophomores to masters degree students and with a wide variety of majors. It is a required course for all undergraduates in the CS and ECE departments at Carnegie Mellon, and it has become a prerequisite for most upper-level systems courses.

The idea with ICS was to introduce students to computers in a different way. Few of our students would have the opportunity to build a computer system. On the other hand, most students, including all computer scientists and computer engineers, will be required to use and program computers on a daily basis. So we decided to teach about systems from the point of view of the programmer, using the following filter: we would cover a topic only if it affected the performance, correctness, or utility of user-level C programs.

For example, topics such as hardware adder and bus designs were out. Topics such as machine language were in, but instead of focusing on how to write assembly language by hand, we would look at how a C compiler translates C constructs into machine code, including pointers, loops, procedure calls, and switch statements. Further, we would take a broader and more holistic view of the system as both hardware and systems software, covering such topics as linking, loading, processes, signals, performance optimization, virtual memory, I/O, and network and concurrent programming.

This approach allowed us to teach the ICS course in a way that is practical, concrete, hands-on, and exciting for the students. The response from our students and faculty colleagues was immediate and overwhelmingly positive, and we realized that others outside of CMU might benefit from using our approach. Hence this book, which we developed from the ICS lecture notes, and which we have now revised to reflect changes in technology and how computer systems are implemented.

    For Instructors: Courses Based on the Book

Instructors can use the CS:APP book to teach five different kinds of systems courses (Figure 2). The particular course depends on curriculum requirements, personal taste, and the backgrounds and abilities of the students. From left to right in the figure, the courses are characterized by an increasing emphasis on the programmer’s perspective of a system. Here is a brief description:

. ORG: A computer organization course with traditional topics covered in an untraditional style. Traditional topics such as logic design, processor architecture, assembly language, and memory systems are covered. However, there is more emphasis on the impact for the programmer. For example, data representations are related back to the data types and operations of C programs, and the presentation on assembly code is based on machine code generated by a C compiler rather than hand-written assembly code.

. ORG+: The ORG course with additional emphasis on the impact of hardware on the performance of application programs. Compared to ORG, students learn more about code optimization and about improving the memory performance of their C programs.

. ICS: The baseline ICS course, designed to produce enlightened programmers who understand the impact of the hardware, operating system, and compilation system on the performance and correctness of their application programs. A significant difference from ORG+ is that low-level processor architecture is not covered. Instead, programmers work with a higher-level model of a modern out-of-order processor. The ICS course fits nicely into a 10-week quarter, and can also be stretched to a 15-week semester if covered at a more leisurely pace.

Chapter   Topic                       ORG     ORG+    ICS     ICS+    SP
   1      Tour of systems             •       •       •       •       •
   2      Data representation         •       •       •       •       ◦ (d)
   3      Machine language            •       •       •       •       •
   4      Processor architecture      •       •
   5      Code optimization                   •       •       •
   6      Memory hierarchy            ◦ (a)   •       •       •       ◦ (a)
   7      Linking                                     ◦ (c)   ◦ (c)   •
   8      Exceptional control flow                    •       •       •
   9      Virtual memory              ◦ (b)   •       •       •       •
  10      System-level I/O                                    •       •
  11      Network programming                                 •       •
  12      Concurrent programming                              •       •

Figure 2 Five systems courses based on the CS:APP book. A • denotes full coverage of a chapter; a ◦ denotes partial coverage: (a) hardware only, (b) no dynamic storage allocation, (c) no dynamic linking, (d) no floating point. ICS+ is the 15-213 course from Carnegie Mellon.

. ICS+: The baseline ICS course with additional coverage of systems programming topics such as system-level I/O, network programming, and concurrent programming. This is the semester-long Carnegie Mellon course, which covers every chapter in CS:APP except low-level processor architecture.

. SP: A systems programming course. Similar to the ICS+ course, but drops floating point and performance optimization, and places more emphasis on systems programming, including process control, dynamic linking, system-level I/O, network programming, and concurrent programming. Instructors might want to supplement from other sources for advanced topics such as daemons, terminal control, and Unix IPC.

The main message of Figure 2 is that the CS:APP book gives a lot of options to students and instructors. If you want your students to be exposed to lower-level processor architecture, then that option is available via the ORG and ORG+ courses. On the other hand, if you want to switch from your current computer organization course to an ICS or ICS+ course, but are wary of making such a drastic change all at once, then you can move toward ICS incrementally. You can start with ORG, which teaches the traditional topics in a nontraditional way. Once you are comfortable with that material, then you can move to ORG+, and eventually to ICS. If students have no experience in C (for example, they have only programmed in Java), you could spend several weeks on C and then cover the material of ORG or ICS.


Finally, we note that the ORG+ and SP courses would make a nice two-term (either quarters or semesters) sequence. Or you might consider offering ICS+ as one term of ICS and one term of SP.

    Classroom-Tested Laboratory Exercises

The ICS+ course at Carnegie Mellon receives very high evaluations from students. Median scores of 5.0/5.0 and means of 4.6/5.0 are typical for the student course evaluations. Students cite the fun, exciting, and relevant laboratory exercises as the primary reason. The labs are available from the CS:APP Web page. Here are examples of the labs that are provided with the book:

. Data Lab. This lab requires students to implement simple logical and arithmetic functions, but using a highly restricted subset of C. For example, they must compute the absolute value of a number using only bit-level operations. This lab helps students understand the bit-level representations of C data types and the bit-level behavior of the operations on data.
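One well-known branch-free solution in that style (a sketch of the kind of answer, not an official lab solution; it assumes a 32-bit two’s-complement int with an arithmetic right shift, and, like abs, it cannot produce a positive result for the most negative int):

    int bitabs(int x)
    {
        int mask = x >> 31;        /* 0 if x >= 0, all ones (-1) if x < 0 */
        return (x + mask) ^ mask;  /* Identity for x >= 0, negation for x < 0 */
    }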

. Binary Bomb Lab. A binary bomb is a program provided to students as an object-code file. When run, it prompts the user to type in six different strings. If any of these is incorrect, the bomb “explodes,” printing an error message and logging the event on a grading server. Students must “defuse” their own unique bombs by disassembling and reverse engineering the programs to determine what the six strings should be. The lab teaches students to understand assembly language, and also forces them to learn how to use a debugger.

. Buffer Overflow Lab. Students are required to modify the run-time behavior of a binary executable by exploiting a buffer overflow vulnerability. This lab teaches the students about the stack discipline, and teaches them about the danger of writing code that is vulnerable to buffer overflow attacks.

. Architecture Lab. Several of the homework problems of Chapter 4 can be combined into a lab assignment, where students modify the HCL description of a processor to add new instructions, change the branch prediction policy, or add or remove bypassing paths and register ports. The resulting processors can be simulated and run through automated tests that will detect most of the possible bugs. This lab lets students experience the exciting parts of processor design without requiring a complete background in logic design and hardware description languages.

. Performance Lab. Students must optimize the performance of an application kernel function such as convolution or matrix transposition. This lab provides a very clear demonstration of the properties of cache memories, and gives students experience with low-level program optimization.

. Shell Lab. Students implement their own Unix shell program with job control, including the ctrl-c and ctrl-z keystrokes, fg, bg, and jobs commands. This is the student’s first introduction to concurrency, and gives them a clear idea of Unix process control, signals, and signal handling.


. Malloc Lab. Students implement their own versions of malloc, free, and (optionally) realloc. This lab gives students a clear understanding of data layout and organization, and requires them to evaluate different trade-offs between space and time efficiency.

. Proxy Lab. Students implement a concurrent Web proxy that sits between their browsers and the rest of the World Wide Web. This lab exposes the students to such topics as Web clients and servers, and ties together many of the concepts from the course, such as byte ordering, file I/O, process control, signals, signal handling, memory mapping, sockets, and concurrency. Students like being able to see their programs in action with real Web browsers and Web servers.

The CS:APP Instructor’s Manual has a detailed discussion of the labs, as well as directions for downloading the support software.

    Acknowledgments for the Second Edition

We are deeply grateful to the many people who have helped us produce this second edition of the CS:APP text.

First and foremost, we would like to recognize our colleagues who have taught the ICS course at Carnegie Mellon for their insightful feedback and encouragement: Guy Blelloch, Roger Dannenberg, David Eckhardt, Greg Ganger, Seth Goldstein, Greg Kesden, Bruce Maggs, Todd Mowry, Andreas Nowatzyk, Frank Pfenning, and Markus Pueschel.

Thanks also to our sharp-eyed readers who contributed reports to the errata page for the first edition: Daniel Amelang, Rui Baptista, Quarup Barreirinhas, Michael Bombyk, Jörg Brauer, Jordan Brough, Yixin Cao, James Caroll, Rui Carvalho, Hyoung-Kee Choi, Al Davis, Grant Davis, Christian Dufour, Mao Fan, Tim Freeman, Inge Frick, Max Gebhardt, Jeff Goldblat, Thomas Gross, Anita Gupta, John Hampton, Hiep Hong, Greg Israelsen, Ronald Jones, Haudy Kazemi, Brian Kell, Constantine Kousoulis, Sacha Krakowiak, Arun Krishnaswamy, Martin Kulas, Michael Li, Zeyang Li, Ricky Liu, Mario Lo Conte, Dirk Maas, Devon Macey, Carl Marcinik, Will Marrero, Simone Martins, Tao Men, Mark Morrissey, Venkata Naidu, Bhas Nalabothula, Thomas Niemann, Eric Peskin, David Po, Anne Rogers, John Ross, Michael Scott, Seiki, Ray Shih, Darren Shultz, Erik Silkensen, Suryanto, Emil Tarazi, Nawanan Theera-Ampornpunt, Joe Trdinich, Michael Trigoboff, James Troup, Martin Vopatek, Alan West, Betsy Wolff, Tim Wong, James Woodruff, Scott Wright, Jackie Xiao, Guanpeng Xu, Qing Xu, Caren Yang, Yin Yongsheng, Wang Yuanxuan, Steven Zhang, and Day Zhong. Special thanks to Inge Frick, who identified a subtle deep copy bug in our lock-and-copy example, and to Ricky Liu, for his amazing proofreading skills.

Our Intel Labs colleagues Andrew Chien and Limor Fix were exceptionally supportive throughout the writing of the text. Steve Schlosser graciously provided some disk drive characterizations. Casey Helfrich and Michael Ryan installed and maintained our new Core i7 box. Michael Kozuch, Babu Pillai, and Jason Campbell provided valuable insight on memory system performance, multi-core systems, and the power wall. Phil Gibbons and Shimin Chen shared their considerable expertise on solid-state disk designs.

We have been able to call on the talents of many, including Wen-Mei Hwu, Markus Pueschel, and Jiri Simsa, to provide both detailed comments and high-level advice. James Hoe helped us create a Verilog version of the Y86 processor and did all of the work needed to synthesize working hardware.

Many thanks to our colleagues who provided reviews of the draft manuscript: James Archibald (Brigham Young University), Richard Carver (George Mason University), Mirela Damian (Villanova University), Peter Dinda (Northwestern University), John Fiore (Temple University), Jason Fritts (St. Louis University), John Greiner (Rice University), Brian Harvey (University of California, Berkeley), Don Heller (Penn State University), Wei Chung Hsu (University of Minnesota), Michelle Hugue (University of Maryland), Jeremy Johnson (Drexel University), Geoff Kuenning (Harvey Mudd College), Ricky Liu, Sam Madden (MIT), Fred Martin (University of Massachusetts, Lowell), Abraham Matta (Boston University), Markus Pueschel (Carnegie Mellon University), Norman Ramsey (Tufts University), Glenn Reinmann (UCLA), Michela Taufer (University of Delaware), and Craig Zilles (UIUC).

Paul Anagnostopoulos of Windfall Software did an outstanding job of typesetting the book and leading the production team. Many thanks to Paul and his superb team: Rick Camp (copyeditor), Joe Snowden (compositor), MaryEllen N. Oliver (proofreader), Laurel Muller (artist), and Ted Laux (indexer).

Finally, we would like to thank our friends at Prentice Hall. Marcia Horton has always been there for us. Our editor Matt Goldstein provided stellar leadership from beginning to end. We are profoundly grateful for their help, encouragement, and insights.

    Acknowledgments from the First Edition

We are deeply indebted to many friends and colleagues for their thoughtful criticisms and encouragement. A special thanks to our 15-213 students, whose infectious energy and enthusiasm spurred us on. Nick Carter and Vinny Furia generously provided their malloc package.

Guy Blelloch, Greg Kesden, Bruce Maggs, and Todd Mowry taught the course over multiple semesters, gave us encouragement, and helped improve the course material. Herb Derby provided early spiritual guidance and encouragement. Allan Fisher, Garth Gibson, Thomas Gross, Satya, Peter Steenkiste, and Hui Zhang encouraged us to develop the course from the start. A suggestion from Garth early on got the whole ball rolling, and this was picked up and refined with the help of a group led by Allan Fisher. Mark Stehlik and Peter Lee have been very supportive about building this material into the undergraduate curriculum. Greg Kesden provided helpful feedback on the impact of ICS on the OS course. Greg Ganger and Jiri Schindler graciously provided some disk drive characterizations and answered our questions on modern disks. Tom Stricker showed us the memory mountain. James Hoe provided useful ideas and feedback on how to present processor architecture.


A special group of students—Khalil Amiri, Angela Demke Brown, Chris Colohan, Jason Crawford, Peter Dinda, Julio Lopez, Bruce Lowekamp, Jeff Pierce, Sanjay Rao, Balaji Sarpeshkar, Blake Scholl, Sanjit Seshia, Greg Steffan, Tiankai Tu, Kip Walker, and Yinglian Xie—were instrumental in helping us develop the content of the course. In particular, Chris Colohan established a fun (and funny) tone that persists to this day, and invented the legendary “binary bomb” that has proven to be a great tool for teaching machine code and debugging concepts.

Chris Bauer, Alan Cox, Peter Dinda, Sandhya Dwarkadas, John Greiner, Bruce Jacob, Barry Johnson, Don Heller, Bruce Lowekamp, Greg Morrisett, Brian Noble, Bobbie Othmer, Bill Pugh, Michael Scott, Mark Smotherman, Greg Steffan, and Bob Wier took time that they did not have to read and advise us on early drafts of the book. A very special thanks to Al Davis (University of Utah), Peter Dinda (Northwestern University), John Greiner (Rice University), Wei Hsu (University of Minnesota), Bruce Lowekamp (College of William & Mary), Bobbie Othmer (University of Minnesota), Michael Scott (University of Rochester), and Bob Wier (Rocky Mountain College) for class testing the Beta version. A special thanks to their students as well!

We would also like to thank our colleagues at Prentice Hall. Marcia Horton, Eric Frank, and Harold Stone have been unflagging in their support and vision. Harold also helped us present an accurate historical perspective on RISC and CISC processor architectures. Jerry Ralya provided sharp insights and taught us a lot about good writing.

Finally, we would like to acknowledge the great technical writers Brian Kernighan and the late W. Richard Stevens, for showing us that technical books can be beautiful.

    Thank you all.

Randy Bryant
Dave O’Hallaron
Pittsburgh, Pennsylvania

About the Authors

Randal E. Bryant received his Bachelor’s degree from the University of Michigan in 1973 and then attended graduate school at the Massachusetts Institute of Technology, receiving a Ph.D. degree in computer science in 1981. He spent three years as an Assistant Professor at the California Institute of Technology, and has been on the faculty at Carnegie Mellon since 1984. He is currently a University Professor of Computer Science and Dean of the School of Computer Science. He also holds a courtesy appointment with the Department of Electrical and Computer Engineering.

He has taught courses in computer systems at both the undergraduate and graduate level for over 30 years. Over many years of teaching computer architecture courses, he began shifting the focus from how computers are designed to one of how programmers can write more efficient and reliable programs if they understand the system better. Together with Professor O’Hallaron, he developed the course 15-213 “Introduction to Computer Systems” at Carnegie Mellon that is the basis for this book. He has also taught courses in algorithms, programming, computer networking, and VLSI design.

Most of Professor Bryant’s research concerns the design of software tools to help software and hardware designers verify the correctness of their systems. These include several types of simulators, as well as formal verification tools that prove the correctness of a design using mathematical methods. He has published over 150 technical papers. His research results are used by major computer manufacturers, including Intel, Freescale, IBM, and Fujitsu. He has won several major awards for his research. These include two inventor recognition awards and a technical achievement award from the Semiconductor Research Corporation, the Kanellakis Theory and Practice Award from the Association for Computing Machinery (ACM), and the W. R. G. Baker Award, the Emmanuel Piore Award, and the Phil Kaufman Award from the Institute of Electrical and Electronics Engineers (IEEE). He is a Fellow of both the ACM and the IEEE and a member of the U.S. National Academy of Engineering.



David R. O’Hallaron is the Director of Intel Labs Pittsburgh and an Associate Professor in Computer Science and Electrical and Computer Engineering at Carnegie Mellon University. He received his Ph.D. from the University of Virginia.

He has taught computer systems courses at the undergraduate and graduate levels on such topics as computer architecture, introductory computer systems, parallel processor design, and Internet services. Together with Professor Bryant, he developed the course at Carnegie Mellon that led to this book. In 2004, he was awarded the Herbert Simon Award for Teaching Excellence by the CMU School of Computer Science, an award for which the winner is chosen based on a poll of the students.

Professor O’Hallaron works in the area of computer systems, with specific interests in software systems for scientific computing, data-intensive computing, and virtualization. The best known example of his work is the Quake project, a group of computer scientists, civil engineers, and seismologists who have developed the ability to predict the motion of the ground during strong earthquakes. In 2003, Professor O’Hallaron and the other members of the Quake team won the Gordon Bell Prize, the top international prize in high-performance computing.

Chapter 1
A Tour of Computer Systems

1.1 Information Is Bits + Context
1.2 Programs Are Translated by Other Programs into Different Forms
1.3 It Pays to Understand How Compilation Systems Work
1.4 Processors Read and Interpret Instructions Stored in Memory
1.5 Caches Matter
1.6 Storage Devices Form a Hierarchy
1.7 The Operating System Manages the Hardware
1.8 Systems Communicate with Other Systems Using Networks
1.9 Important Themes
1.10 Summary
Bibliographic Notes


A computer system consists of hardware and systems software that work together to run application programs. Specific implementations of systems change over time, but the underlying concepts do not. All computer systems have similar hardware and software components that perform similar functions. This book is written for programmers who want to get better at their craft by understanding how these components work and how they affect the correctness and performance of their programs.

You are poised for an exciting journey. If you dedicate yourself to learning the concepts in this book, then you will be on your way to becoming a rare “power programmer,” enlightened by an understanding of the underlying computer system and its impact on your application programs.

You are going to learn practical skills such as how to avoid strange numerical errors caused by the way that computers represent numbers. You will learn how to optimize your C code by using clever tricks that exploit the designs of modern processors and memory systems. You will learn how the compiler implements procedure calls and how to use this knowledge to avoid the security holes from buffer overflow vulnerabilities that plague network and Internet software. You will learn how to recognize and avoid the nasty errors during linking that confound the average programmer. You will learn how to write your own Unix shell, your own dynamic storage allocation package, and even your own Web server. You will learn the promises and pitfalls of concurrency, a topic of increasing importance as multiple processor cores are integrated onto single chips.

In their classic text on the C programming language [58], Kernighan and Ritchie introduce readers to C using the hello program shown in Figure 1.1. Although hello is a very simple program, every major part of the system must work in concert in order for it to run to completion. In a sense, the goal of this book is to help you understand what happens and why, when you run hello on your system.

We begin our study of systems by tracing the lifetime of the hello program, from the time it is created by a programmer, until it runs on a system, prints its simple message, and terminates. As we follow the lifetime of the program, we will briefly introduce the key concepts, terminology, and components that come into play. Later chapters will expand on these ideas.

code/intro/hello.c

1    #include <stdio.h>
2
3    int main()
4    {
5        printf("hello, world\n");
6    }

code/intro/hello.c

Figure 1.1 The hello program.


    1.1 Information Is Bits + Context

Our hello program begins life as a source program (or source file) that the programmer creates with an editor and saves in a text file called hello.c. The source program is a sequence of bits, each with a value of 0 or 1, organized in 8-bit chunks called bytes. Each byte represents some text character in the program.

Most modern systems represent text characters using the ASCII standard that represents each character with a unique byte-sized integer value. For example, Figure 1.2 shows the ASCII representation of the hello.c program.

The hello.c program is stored in a file as a sequence of bytes. Each byte has an integer value that corresponds to some character. For example, the first byte has the integer value 35, which corresponds to the character ‘#’. The second byte has the integer value 105, which corresponds to the character ‘i’, and so on. Notice that each text line is terminated by the invisible newline character ‘\n’, which is represented by the integer value 10. Files such as hello.c that consist exclusively of ASCII characters are known as text files. All other files are known as binary files.
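You can see these integer values for yourself with a few lines of C. The following sketch is our illustration, not code from the book; it assumes hello.c is in the current directory and prints each byte of the file as the integer the file actually stores, reproducing the numbers in Figure 1.2:

#include <stdio.h>

int main(void)
{
    /* Print each byte of hello.c as its integer value. */
    FILE *fp = fopen("hello.c", "r");
    if (fp == NULL)
        return 1;
    int c;
    while ((c = fgetc(fp)) != EOF)
        printf("%3d ", c);    /* '#' prints as 35, 'i' as 105, '\n' as 10 */
    fclose(fp);
    printf("\n");
    return 0;
}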

The representation of hello.c illustrates a fundamental idea: All information in a system—including disk files, programs stored in memory, user data stored in memory, and data transferred across a network—is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them. For example, in different contexts, the same sequence of bytes might represent an integer, floating-point number, character string, or machine instruction.

As programmers, we need to understand machine representations of numbers because they are not the same as integers and real numbers. They are finite approximations that can behave in unexpected ways. This fundamental idea is explored in detail in Chapter 2.
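Both points are easy to demonstrate. The sketch below is our illustration (not from the book); it assumes the common case where float and unsigned int are both 4 bytes. It views the same four bytes first as a floating-point number and then as raw bits, and then shows that 0.1 has no exact binary representation:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Context: the same four bytes viewed as a float and as raw bits. */
    float f = 3.14f;
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bytes, no conversion */
    printf("as float: %f   as bits: 0x%08x\n", f, bits);

    /* Finite approximation: 0.1 cannot be represented exactly in binary. */
    printf("0.1 to 20 places: %.20f\n", 0.1);
    return 0;
}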

Each character is shown above its integer (ASCII) value; a space has the value 32:

#    i    n    c    l    u    d    e         <    s    t    d    i    o    .
35   105  110  99   108  117  100  101  32   60   115  116  100  105  111  46

h    >    \n   \n   i    n    t         m    a    i    n    (    )    \n   {
104  62   10   10   105  110  116  32   109  97   105  110  40   41   10   123

\n                       p    r    i    n    t    f    (    "    h    e    l
10   32   32   32   32   112  114  105  110  116  102  40   34   104  101  108

l    o    ,         w    o    r    l    d    \    n    "    )    ;    \n   }
108  111  44   32   119  111  114  108  100  92   110  34   41   59   10   125

Figure 1.2 The ASCII text representation of hello.c.


Aside: Origins of the C programming language

C was developed from 1969 to 1973 by Dennis Ritchie of Bell Laboratories. The American National Standards Institute (ANSI) ratified the ANSI C standard in 1989, and this standardization later became the responsibility of the International Standards Organization (ISO). The standards define the C language and a set of library functions known as the C standard library. Kernighan and Ritchie describe ANSI C in their classic book, which is known affectionately as “K&R” [58]. In Ritchie’s words [88], C is “quirky, flawed, and an enormous success.” So why the success?

. C was closely tied with the Unix operating system. C was developed from the beginning as the system programming language for Unix. Most of the Unix kernel, and all of its supporting tools and libraries, were written in C. As Unix became popular in universities in the late 1970s and early 1980s, many people were exposed to C and found that they liked it. Since Unix was written almost entirely in C, it could be easily ported to new machines, which created an even wider audience for both C and Unix.

. C is a small, simple language. The design was controlled by a single person, rather than a committee, and the result was a clean, consistent design with little baggage. The K&R book describes the complete language and standard library, with numerous examples and exercises, in only 261 pages. The simplicity of C made it relatively easy to learn and to port to different computers.

. C was designed for a practical purpose. C was designed to implement the Unix operating system. Later, other people found that they could write the programs they wanted, without the language getting in the way.

C is the language of choice for system-level programming, and there is a huge installed base of application-level programs as well. However, it is not perfect for all programmers and all situations. C pointers are a common source of confusion and programming errors. C also lacks explicit support for useful abstractions such as classes, objects, and exceptions. Newer languages such as C++ and Java address these issues for application-level programs.

1.2 Programs Are Translated by Other Programs into Different Forms

The hello program begins life as a high-level C program because it can be read and understood by human beings in that form. However, in order to run hello.c on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a form called an executable object program and stored as a binary disk file. Object programs are also referred to as executable object files.

On a Unix system, the translation from source file to object file is performed by a compiler driver:

    unix> gcc -o hello hello.c


[Figure 1.3: The compilation system. The source program hello.c (text) passes through the preprocessor (cpp) to produce the modified source program hello.i (text), through the compiler (cc1) to produce the assembly program hello.s (text), through the assembler (as) to produce the relocatable object program hello.o (binary), and finally through the linker (ld), together with printf.o, to produce the executable object program hello (binary).]

Here, the gcc compiler driver reads the source file hello.c and translates it into an executable object file hello. The translation is performed in the sequence of four phases shown in Figure 1.3. The programs that perform the four phases (preprocessor, compiler, assembler, and linker) are known collectively as the compilation system. As sketched after the list below, each phase can also be invoked by hand.

. Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the # character. For example, the #include <stdio.h> command in line 1 of hello.c tells the preprocessor to read the contents of the system header file stdio.h and insert it directly into the program text. The result is another C program, typically with the .i suffix.

. Compilation phase. The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program. Each statement in an assembly-language program exactly describes one low-level machine-language instruction in a standard text form. Assembly language is useful because it provides a common output language for different compilers for different high-level languages. For example, C compilers and Fortran compilers both generate output files in the same assembly language.

. Assembly phase. Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o. The hello.o file is a binary file whose bytes encode machine-language instructions rather than characters. If we were to view hello.o with a text editor, it would appear to be gibberish.

. Linking phase. Notice that our hello program calls the printf function, which is part of the standard C library provided by every C compiler. The printf function resides in a separate precompiled object file called printf.o, which must somehow be merged with our hello.o program. The linker (ld) handles this merging. The result is the hello file, which is an executable object file (or simply executable) that is ready to be loaded into memory and executed by the system.
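Although gcc normally runs all four phases and discards the intermediate files, you can ask it to stop after each one. The following commands are a sketch using standard gcc flags: -E stops after preprocessing, -S stops after compilation (writing hello.s), and -c stops after assembly (writing hello.o); the last command runs the linker.

unix> gcc -E hello.c -o hello.i
unix> gcc -S hello.i
unix> gcc -c hello.s
unix> gcc hello.o -o hello

Running the four commands in sequence produces the same hello executable as the single gcc command shown earlier.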


Aside: The GNU project

GCC is one of many useful tools developed by the GNU (short for GNU’s Not Unix) project. The GNU project is a tax-exempt charity started by Richard Stallman in 1984, with the ambitious goal of developing a complete Unix-like system whose source code is unencumbered by restrictions on how it can be modified or distributed. The GNU project has developed an environment with all the major components of a Unix operating system, except for the kernel, which was developed separately by the Linux project. The GNU environment includes the emacs editor, gcc compiler, gdb debugger, assembler, linker, utilities for manipulating binaries, and other components. The gcc compiler has grown to support many different languages, with the ability to generate code for many different machines. Supported languages include C, C++, Fortran, Java, Pascal, Objective-C, and Ada.

The GNU project is a remarkable achievement, and yet it is often overlooked. The modern open-source movement (commonly associated with Linux) owes its intellectual origins to the GNU project’s notion of free software (“free” as in “free speech,” not “free beer”). Further, Linux owes much of its popularity to the GNU tools, which provide the environment for the Linux kernel.

    1.3 It Pays to Understand How Compilation Systems Work

For simple programs such as hello.c, we can rely on the compilation system to produce correct and efficient machine code. However, there are some important reasons why programmers need to understand how compilation systems work:

. Optimizing program performance. Modern compilers are sophisticated tools that usually produce good code. As programmers, we do not need to know the inner workings of the compiler in order to write efficient code. However, in order to make good coding decisions in our C programs, we do need a basic understanding of machine-level code and how the compiler translates different C statements into machine code. For example, is a switch statement always more efficient than a sequence of if-else statements? How much overhead is incurred by a function call? Is a while loop more efficient than a for loop? Are pointer references more efficient than array indexes? Why does our loop run so much faster if we sum into a local variable instead of an argument that is passed by reference? (A small sketch of this question appears after this list.) How can a function run faster when we simply rearrange the parentheses in an arithmetic expression?

In Chapter 3, we will introduce two related machine languages: IA32, the 32-bit code that has become ubiquitous on machines running Linux, Windows, and more recently the Macintosh operating systems, and x86-64, a 64-bit extension found in more recent microprocessors. We describe how compilers translate different C constructs into these languages. In Chapter 5, you will learn how to tune the performance of your C programs by making simple transformations to the C code that help the compiler do its job better. In Chapter 6, you will learn about the hierarchical nature of the memory system, how C compilers store data arrays in memory, and how your C programs can exploit this knowledge to run more efficiently.

  • Section 1.4 Processors Read and Interpret Instructions Stored in Memory 7

. Understanding link-time errors. In our experience, some of the most perplexing programming errors are related to the operation of the linker, especially when you are trying to build large software systems. For example, what does it mean when the linker reports that it cannot resolve a reference? What is the difference between a static variable and a global variable? What happens if you define two global variables in different C files with the same name? What is the difference between a static library and a dynamic library? Why does it matter what order we list libraries on the command line? And scariest of all, why do some linker-related errors not appear until run time? You will learn the answers to these kinds of questions in Chapter 7.

. Avoiding security holes. For many years, buffer overflow vulnerabilities have accounted for the majority of security holes in network and Internet servers. These vulnerabilities exist because too few programmers understand the need to carefully restrict the quantity and forms of data they accept from untrusted sources. A first step in learning secure programming is to understand the consequences of the way data and control information are stored on the program stack. We cover the stack discipline and buffer overflow vulnerabilities in Chapter 3 as part of our study of assembly language. We will also learn about methods that can be used by the programmer, compiler, and operating system to reduce the threat of attack.
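As promised above, here is a sketch of the local-variable question; this is our illustration, not code from the book. In sum_arg, every iteration must read and write *dest through memory, since dest might alias the array; in sum_local, the compiler can keep the accumulator in a register and perform a single store at the end:

#include <stddef.h>

/* Accumulate through a pointer argument: each iteration loads
   and stores *dest, because dest might point into a. */
void sum_arg(long *dest, const long *a, size_t n)
{
    *dest = 0;
    for (size_t i = 0; i < n; i++)
        *dest += a[i];
}

/* Accumulate in a local variable: acc can live in a register,
   and memory is written only once. */
void sum_local(long *dest, const long *a, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    *dest = acc;
}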

1.4 Processors Read and Interpret Instructions Stored in Memory

At this point, our hello.c source program has been translated by the compilation system into an executable object file called hello that is stored on disk. To run the executable file on a Unix system, we type its name to an application program known as a shell:

    unix> ./hello

    hello, world

    unix>

The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command. If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run. So in this case, the shell loads and runs the hello program and then waits for it to terminate. The hello program prints its message to the screen and then terminates. The shell then prints a prompt and waits for the next input command line.
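This load-run-wait cycle can be sketched in a few lines of C using the standard Unix process-control calls. The sketch below is only an illustration of the idea, not a real shell; it assumes the hello executable sits in the current directory:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char *argv[] = { "./hello", NULL };

    pid_t pid = fork();          /* create a child process            */
    if (pid == 0) {
        execv(argv[0], argv);    /* child: replace itself with hello  */
        exit(1);                 /* reached only if execv fails       */
    }
    waitpid(pid, NULL, 0);       /* parent (the "shell") waits        */
    printf("hello terminated\n");
    return 0;
}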

    1.4.1 Hardware Organization of a System

To understand what happens to our hello program when we run it, we need to understand the hardware organization of a typical system, which is shown in Figure 1.4. This particular picture is modeled after the family of Intel Pentium systems, but all systems have a similar look and feel. Don’t worry about the complexity of this figure just now. We will get to its various details in stages throughout the course of the book.

[Figure 1.4: Hardware organization of a typical system. CPU: central processing unit, ALU: arithmetic/logic unit, PC: program counter, USB: Universal Serial Bus. The CPU (register file, PC, ALU, and bus interface) connects via the system bus to an I/O bridge, which connects via the memory bus to main memory and via the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (display), the disk controller (disk, where the hello executable is stored), and expansion slots for other devices such as network adapters.]

    Buses

Running throughout the system is a collection of electrical conduits called buses that carry bytes of information back and forth between the components. Buses are typically designed to transfer fixed-sized chunks of bytes known as words. The number of bytes in a word (the word size) is a fundamental system parameter that varies across systems. Most machines today have word sizes of either 4 bytes (32 bits) or 8 bytes (64 bits). For the sake of our discussion here, we will assume a word size of 4 bytes, and we will assume that buses transfer only one word at a time.
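One rough way to observe your own machine’s word size is to look at the size of a pointer, since an address occupies one word. A quick sketch (our illustration, not from the book):

#include <stdio.h>

int main(void)
{
    /* Pointers hold addresses, and an address occupies one word. */
    printf("pointer size: %zu bytes\n", sizeof(void *));
    return 0;
}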

    I/O Devices

Input/output (I/O) devices are the system’s connection to the external world. Our example system has four I/O devices: a keyboard and mouse for user input, a display for user output, and a disk drive (or simply disk) for long-term storage of data and programs. Initially, the executable hello program resides on the disk.

Each I/O device is connected to the I/O bus by either a controller or an adapter. The distinction between the two is mainly one of packaging. Controllers are chip sets in the device itself or on the system’s main printed circuit board (often called the motherboard). An adapter is a card that plugs into a slot on the motherboard. Regardless, the purpose of each is to transfer information back and forth between the I/O bus and an I/O device.


Chapter 6 has more to say about how I/O devices such as disks work. In Chapter 10, you will learn how to use the Unix I/O interface to access devices from your application programs. We focus on the especially interesting class of devices known as networks, but the techniques generalize to other kinds of devices as well.

    Main Memory

The main memory is a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program. Physically, main memory consists of a collection of dynamic random access memory (DRAM) chips. Logically, memory is organized as a linear array of bytes, each with its own unique address (array index) starting at zero. In general, each of the machine instructions that constitute a program can consist of a variable number of bytes. The sizes of data items that correspond to C program variables vary according to type. For example, on an IA32 machine running Linux, data of type short requires two bytes, types int, float, and long four bytes, and type double eight bytes.
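You can check these sizes on any machine with C’s sizeof operator. A quick sketch (our illustration; the printed values depend on your machine and compiler, and will match the IA32 numbers above only on such a system):

#include <stdio.h>

int main(void)
{
    /* Sizes vary by platform; the text's numbers are for IA32/Linux. */
    printf("short:  %zu bytes\n", sizeof(short));
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("float:  %zu bytes\n", sizeof(float));
    printf("double: %zu bytes\n", sizeof(double));
    return 0;
}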

Chapter 6 has more to say about how memory technologies such as DRAM chips work, and how they are combined to form main memory.

    Processor

The central processing unit (CPU), or simply processor, is the engine that interprets (or executes) instructions stored in main memory. At its core is a word-sized storage device (or register) called the program counter (PC). At any point in time, the PC points at (contains the address of) some machine-language instruction in main memory.1

From the time that power is applied to the system, until the time that the power is shut off, a processor repeatedly executes the instruction pointed at by the program counter and updates the program counter to point to the next instruction. A processor appears to operate according to a very simple instruction execution model, defined by its instruction set architecture. In this model, instructions execute in strict sequence, and executing a single instruction involves performing a series of steps. The processor reads the instruction from memory pointed at by the program counter (PC), interprets the bits in the instruction, performs some simple operation dictated by the instruction, and then updates the PC to point to the next instruction, which may or may not be contiguous in memory to the instruction that was just executed.

There are only a few of these simple operations, and they revolve around main memory, the register file, and the arithmetic/logic unit (ALU). The register file is a small storage device that consists of a collection of word-sized registers, each with its own unique name. The ALU computes new data and address values. Here are some examples of the simple operations that the CPU might carry out at the request of an instruction (a toy sketch of the execution model follows the list):

1. PC is also a commonly used acronym for “personal computer.” However, the distinction between the two should be clear from the context.

  • 10 Chapter 1 A Tour of Computer Systems

. Load: Copy a byte or a word from main memory into a register, overwriting the previous contents of the register.

. Store: Copy a byte or a word from a register to a location in main memory, overwriting the previous contents of that location.

. Operate: Copy the contents of two registers to the ALU, perform an arithmetic operation on the two words, and store the result in a register, overwriting the previous contents of that register.

. Jump: Extract a word from the instruction itself and copy that word into the program counter (PC), overwriting the previous value of the PC.
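To make the fetch-execute model concrete, here is a toy interpreter; it is entirely our invention and implements no real instruction set. It repeatedly fetches the instruction the PC points at, performs one of the operations above, and updates the PC:

#include <stdio.h>

/* A toy machine: each instruction is three ints {opcode, a, b}. */
enum { HALT, LOAD, STORE, ADD, JUMP };

int main(void)
{
    int reg[2]  = { 0, 0 };          /* tiny register file             */
    int data[4] = { 5, 7, 0, 0 };    /* "main memory" for data         */
    int prog[]  = {                  /* program: data[2] = data[0] + data[1] */
        LOAD,  0, 0,                 /* reg[0] = data[0]               */
        LOAD,  1, 1,                 /* reg[1] = data[1]               */
        ADD,   0, 1,                 /* reg[0] += reg[1] (operate)     */
        STORE, 0, 2,                 /* data[2] = reg[0]               */
        HALT,  0, 0,
    };

    int pc = 0;                      /* program counter                */
    for (;;) {
        /* Fetch the instruction the PC points at. */
        int op = prog[pc], a = prog[pc + 1], b = prog[pc + 2];
        int next = pc + 3;           /* default: next instruction in sequence */
        switch (op) {
        case LOAD:  reg[a] = data[b];  break;
        case STORE: data[b] = reg[a];  break;
        case ADD:   reg[a] += reg[b];  break;
        case JUMP:  next = a;          break;   /* overwrite the PC    */
        case HALT:  printf("data[2] = %d\n", data[2]); return 0;
        }
        pc = next;                   /* update the PC                  */
    }
}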

We say that a processor appears to be a simple implementation of its instruction set architecture, but in fact modern processors use far more complex mechanisms to speed up program execution. Thus, we can distinguish the processor’s instruction set architecture, describing the effect of each machine-code instruction, from its microarchitecture, describing how the processor is actually implemented. When we study machine code in Chapter 3, we will consider the abstraction provided by the machine’s instruction set architecture. Chapter 4 has more to say about how processors are actually implemented.

    1.4.2 Running the hello Program

Given this simple view of a system’s hardware organization and operation, we can begin to understand what happens when we run our example program. We must omit a lot of details here that will be filled in later, but for now we will be content with the big picture.

Initially, the shell program is executing its instructions, waiting for us to type a command. As we type the characters “./hello” at the keyboard, the shell program reads each one into a register, and then stores it in memory, as shown in Figure 1.5.

When we hit the enter key on the keyboard, the shell knows that we have finished typing the command. The shell then loads the executable hello file by executing a sequence of instructions that copies the code and data in the hello object file from disk to main memory. The data include the string of characters “hello, world\n” that will eventually be printed out.

Using a technique known as direct memory access (DMA, discussed in Chapter 6), the data travels directly from disk to main memory, without passing through the processor. This step is shown in Figure 1.6.

Once the code and data in the hello object file are loaded into memory, the processor begins executing the machine-language instructions in the hello program’s main routine. These instructions copy the bytes in the “hello, world\n” string from memory to the register file, and from there to the display device, where they are displayed on the screen. This step is shown in Figure 1.7.


[Figure 1.5: Reading the hello command from the keyboard. The user types “hello”; the characters flow from the keyboard through the USB controller, across the I/O bus and I/O bridge, into a CPU register, and from there into main memory.]

[Figure 1.6: Loading the executable from disk into main memory. The hello code and the data string “hello, world\n” travel from the disk, through the disk controller and the I/O bridge, directly into main memory.]


[Figure 1.7: Writing the output string from memory to the display. The “hello, world\n” string is copied from main memory to the register file, and from there through the I/O bridge and graphics adapter to the display.]

    1.5 Caches Matter

An important lesson from this simple example is that a system spends a lot of time moving information from one place to another. The machine instructions in the hello program are originally stored on disk. When the program is loaded, they are copied to main memory. As the processor runs the program, instructions are copied from main memory into the processor. Similarly, the data string “hello, world\n”, originally on disk, is copied to main memory, and then copied from main memory to the display device. From a programmer’s perspective, much of this copying is overhead that slows down the “real work” of the program. Thus, a major goal for system designers is to make these copy operations run as fast as possible.

Because of physical laws, larger storage devices are slower than smaller storage devices. And faster devices are more expensive to build than their slower counterparts. For example, the disk drive on a typical system might be 1000 times larger than the main memory, but it might take the processor 10,000,000 times longer to read a word from disk than from memory.

Similarly, a typical register file stores only a few hundred bytes of information, as opposed to billions of bytes in the main memory. However, the processor can read data from the register file almost 100 times faster than from memory. Even more troublesome, as semiconductor technology progresses over the years, this processor-memory gap continues to increase. It is easier and cheaper to make processors run faster than it is to make main memory run faster.

To deal with the processor-memory gap, system designers include smaller, faster storage devices called cache memories (or simply caches) that serve as temporary staging areas for information that the processor is likely to need in the near future. Figure 1.8 shows the cache memories in a typical system.

[Figure 1.8: Cache memories. The CPU chip contains the register file, the ALU, the cache memories, and the bus interface, which connects via the system bus to the I/O bridge and via the memory bus to main memory.]

An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file. A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus. It might take 5 times longer for the processor to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing the main memory. The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM).
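Why caches matter shows up directly in code. The sketch below is our illustration, not from the book; it sums the same matrix twice. The row-major loop walks consecutive addresses and makes good use of the caches, while the column-major loop strides across rows and typically runs several times slower on real hardware:

#include <stdio.h>

#define N 1024
static long grid[N][N];

/* Row-major traversal: consecutive addresses, cache-friendly. */
long sum_rowmajor(void)
{
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}

/* Column-major traversal: a stride of N longs, poor locality. */
long sum_colmajor(void)
{
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += grid[i][j];
    return s;
}

int main(void)
{
    /* Same answer both ways; only the access pattern differs. */
    printf("%ld %ld\n", sum_rowmajor(), sum_colmajor());
    return 0;
}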