Real World Multicore Embedded Systems

A Practical Approach

Expert Guide

Bryon Moyer

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Newnes is an imprint of Elsevier

Contents

About the Editor xvii
About the Authors xix

Chapter 1: Introduction and Roadmap 1
Multicore is here 1
Scope 3
Who should read this book? 3
Organization and roadmap 4
Concurrency 4
Architecture 4
Infrastructure 5
Virtualization 6
Application software 6
Hardware assistance 7
System-level considerations 8
A roadmap of this book 9

Chapter 2: The Promise and Challenges of Concurrency 11
Concurrency fundamentals 12
Two kinds of concurrency 14
Data parallelism 15
Functional parallelism 16
Dependencies 18
Producers and consumers of data 19
Loops and dependencies 23
Shared resources 30
Summary 31

Chapter 3: Multicore Architectures 33
The need for multicore architectures 34

Multicore architecture drivers 36
Traditional sequential software paradigms break 38
Scope of multicore hardware architectures 41
Basic multicore hardware architecture overview 43
Specific multicore architecture characteristics 45
Processing architectures 45
ALU processing architectures 46
Lightweight processing architectures 46
Mediumweight processing architectures 48
Heavyweight processing architectures 48
Communication Architectures 49
Memory architectures 54
Application specificity 56
Application-specific platform topologies 59
Integration of multicore systems, MPSoCs and sub-systems 63
Programming challenges 66
Application characteristics 67
MPSoC analysis, debug and verification 68
Shortcomings and solutions 69
MPSoC parallel programming 69
Parallel software and MPSoCs 70
Summary 71
References 73

Chapter 4: Memory Models for Embedded Multicore Architecture 75
Introduction 76
Memory types 77
Memory architecture 79
Cache 79
Cache customization 86
Virtual memory 87
Scratchpad 88
Software overlays 89
DMA 90
DRAM 92
Special-purpose memory 93
Memory structure of multicore architecture 94
Shared memory architecture 94
Distributed memory architecture 95
Cache memory in multicore chips 96

Cache coherency 97
Directory-based cache coherence protocol 99
Snoopy cache coherence protocol 102
MESI cache coherence protocol 103
Cache-related performance issues 107
Transactional memory 109
Software transactional memory 112
Hardware transactional memory 115
Hybrid transactional memory 115
Summary 115
References 116

Chapter 5: Design Considerations for Multicore SoC Interconnections 117
Introduction 119
Importance of interconnections in an SoC 120
Terminology 121
Organization of the chapter 121
Communication activity in multicore SoCs 124
Transaction-Based communication 124
Storage-Oriented transactions 124
Concurrency of communication and segregation of traffic 126
Recent trends in SoCs 127
Functional requirements and topologies of SoC traffic 129
Memory organization 131
Implications of inter-device communication paradigms 134
Performance considerations 142
Transaction latency 142
Queuing delays 145
Bandwidth 149
Interconnection networks: representation and terminology 150
Representation of interconnection networks 150
Direct versus indirect networks 153
Circuit-Switched versus packet-switched communication and blocking versus non-blocking networks 155
Base-Form vs. encoded signaling 156
Transaction routing 157
Bus as an SoC interconnection 158
Limitations of the bus architecture 160
Fabric-oriented interconnects for larger SoCs 161

Transaction formats 163
Transaction routing 166
Building blocks of scalable interconnections 167
Links 168
Clocking considerations 169
Switches 169
Evaluating and comparing interconnection topologies for future SoCs 185
Metrics for comparing topologies 186
A Survey of interconnection networks suitable for future SoCs 188
A Pipelined bus 188
Multiple buses 189
A ring 189
A crossbar 190
Mesh topology 191
Some practical considerations in designing interconnections 192
Hierarchies of interconnections 192
Scalability in implementations 192
Summary 193
References 196
Further reading 197

Chapter 6: Operating Systems in Multicore Platforms 199
Introduction 199
Symmetric multiprocessing systems and scheduling 202
Asymmetric multiprocessor systems 207
OS-per-core 207
Multiple SMP 211
SMP + RTOS 212
SMP + bare-metal 212
Virtualization 214
Controlling OS behavior 214
Controlling the assignment of threads in an SMP system 214
Controlling where interrupt handlers run 215
Partitions, containers, and zones 216
Priority 217
Kernel modifications, drivers, and thread safety 218
System start-up 220
Debugging a multicore system 221
The information gathered 222

Uploading the information 223
Painting the picture 224
Summary 225
Reference 226

Chapter 7: System Virtualization in Multicore Systems 227
What is virtualization? 228
A brief retrospective 230
Applications of system virtualization 231
Environment sandboxing 231
Virtual appliances 232
Workload consolidation 232
Operating system portability 233
Mixed-criticality systems 233
Maximizing multicore processing resources 233
Improved user experience 233
Hypervisor architectures 234
Type 2 234
Type 1 235
Paravirtualization 236
Monolithic hypervisor 237
Console guest hypervisor 237
Microkernel-based hypervisor 238
Core partitioning architectures 239
Leveraging hardware assists for virtualization 241
Mode hierarchy 241
Intel VT 242
Power Architecture ISA 2.06 embedded hypervisor extensions 243
ARM TrustZone 244
ARM Virtualization Extensions 246
Hypervisor robustness 246
SubVirt 247
Blue Pill 247
Ormandy 247
Xen owning trilogy 247
VMware's security certification 248
I/O Virtualization 249
Peripheral virtualization architectures 249
Peripheral sharing architectures 253
Combinations of I/O virtualization approaches 255

I/O virtualization within microkernels 255
Case study: Power Architecture virtualization and the Freescale P4080 257
Power Architecture hardware hypervisor features 257
Power Architecture virtualized memory management 260
Freescale P4080 IOMMU 261
Hardware support for I/O sharing in the P4080 262
Virtual machine configuration in Power Architecture 263
Example use cases for system virtualization 263
Telecom blade consolidation 264
Electronic flight bag 264
Intelligent Munitions System 264
Automotive infotainment 265
Medical imaging system 266
Conclusion 266
References 267

Chapter 8: Communication and Synchronization Libraries 269
Introduction 269
Library overview and basics 270
Thread APIs 270
Message-passing APIs 270
Explicit threading libraries 270
Windows Threads 271
POSIX Threads 272
C11 and C++11 Threads 275
OpenMP 277
Threading Building Blocks 282
Boost Threads 285
MCAPI 286
Conclusion 288
References 288

Chapter 9: Programming Languages 289
Programming languages for multicore embedded systems 289
C 290
Multi-threading support in C 295
Assembly language 295
Multi-threading and assembly 296

C++ 297
Features of C++ that work well for embedded systems 297
Features of C++ that do not work well for embedded systems 300
Multi-threading support in C++ 303
Java 304
Multi-threading support in Java 305
Python 307
Multi-threading support in Python 309
Ada 310
Concurrency support in Ada 311
Summary 311
References 312

Chapter 10: Tools 313
Introduction 314
Real-Time operating systems (RTOS) 315
DEOS by DDC-I 315
Enea OSE 316
Express Logic ThreadX 316
Green Hills Integrity 317
LynuxWorks 318
Mentor Graphics Nucleus 318
MontaVista 319
QNX 319
Wind River VxWorks 320
Communication tools 321
PolyCore Software 321
Enea Linx 322
Parallelizing serial software tools 323
CriticalBlue Prism 323
Vector Fabrics 323
Open Multi-Processing (OpenMP) 324
Clean C 325
Software development and debug tools 325
Intel Parallel Studio 325
Benchmarking tools 326
Embedded Microprocessor Benchmark Consortium (EEMBC) 326
Standard Performance Evaluation Corporation (SPEC) CPU2006 328
Conclusion 328
Acknowledgments 329

Chapter 11: Partitioning Programs for Multicore Systems 331
Introduction 332
What level of parallelism? 334
Threads of control 334
Solutions, algorithms, and implementations 336
The basic cost of partitioning 338
A high-level partitioning algorithm 340
The central role of dependencies 341
Breaking dependencies 341
Types of dependencies 343
Locating dependencies 348
Handling broken dependencies 351
Critical sections 360
Synchronizing data 361
Using counting semaphores 361
Using FIFOs 362
Implementing a partitioning strategy 367
Using tools to simplify partitioning 368
Vector Fabrics's Pareon 369
CriticalBlue's Prism 375
Summary 384
References 384

Chapter 12: Software Synchronization 385
Introduction 387
Why is synchronization required? 388
Data integrity 388
Atomicity 390
Sequence of processing 393
Access to limited resources 395
Critical timing for real-time 395
Problems with not synchronizing (or synchronizing badly) 395
Slower throughput 396
Errors in synchronization logic 397
Consumes more power 397
Testing for proper synchronization 397
How is synchronization achieved? 398
Exclusion 399

Test and set; compare and swap (CAS) 405
Barrier 406
Architectural design 407
Specific conditions requiring synchronization 412
Data races 413
Deadlocks 416
Livelocks 416
Non-atomic operations 417
Data caching 418
Conversion for endianness 419
How to implement synchronization 419
Language support for implementation 424
Intro 424
Language features and extensions 425
Libraries 426
Patterns 426
Finding concurrency design patterns 427
Algorithm structure design patterns 428
Supporting structures design patterns 429
Implementation mechanisms 429
Side-effects of synchronization 430
Incorrect synchronization 430
Program execution 431
Priority inversion 432
Performance 432
Code complexity 433
Software tools 436
Hardware and OS effects on synchronization 437
Number of cores 438
Memory, caches, etc. 438
Thread scheduling 439
Garbage collection (and other system-level globally synchronized operations) 439
Problems when trying to implement synchronization 440
Inconsistent synchronization: not synchronizing all access methods 440
Data escapes 441
Using a mutable shared object with two different access steps (i.e., init() and parse()) 441

Cached "Scratch-Pad" data 441
Multiple lock objects created 442
Trying to lock on a null-pointer 442
Double-check locking errors 442
Simple statements not atomic (i.e., increments, 64-bit assignments) 443
Check/act logic not synchronized 443
Synchronization object used for many unrelated things 444
Summary: synchronization problems 444
References 445

Chapter 13: Hardware Accelerators 447
Introduction 447
Architectural considerations 449
Blocking vs. non-blocking 449
Shared or dedicated 451
SMP vs. AMP 452
Copying data, or not 452
Signaling completion 453
The interface: registers, drivers, APIs, and ISRs 454
Hardware interface 454
Drivers 456
Software API 456
ISRs 457
Initialization 461
Operating system considerations 462
Coherency 462
Making the architectural decisions 466
Video example 467
The interface 470
The application 471
The driver 474
Real-world refinements 477
Summary 480

Chapter 14: Multicore Synchronization Hardware 481
Chapter overview 481
Instruction set support for synchronization 483
Test-and-set 484
Compare-and-swap 487

Load-reserved/store-conditional 489
Creating new primitives 491
Compiler intrinsics 493
Hardware support for synchronization 494
Bus locking 494
Load-reserved and store-conditional 495
Hardware support for lock-free programming 496
Lock-free synchronization with hardware queues 497
Decorated storage operations 499
Messaging 500
Hardware transactional memory 507
Memory subsystem considerations 509
Memory ordering rules 510
Using memory barriers and synchronizations 512
Conclusions 514
References 514

Chapter 15: Bare-Metal Systems 517
Introduction 517
What is a bare-metal setup? 519
Who should use bare metal? 521
Architectural arrangements 522
Data parallelism: SIMD 523
Functional parallelism: pipelines 526
Software architecture 535
Building the executable image(s) 536
Example: IPv4 forwarding 541
Packet forwarding 542
Next-hop lookup: longest prefix match 545
The DIR-24-8-BASIC algorithm 547
Example target architecture: Cavium OCTEON CN3020 552
Select code examples 555
Conclusion 560
Reference 560

Chapter 16: Multicore Debug 561
Introduction: why debug instrumentation 561
How does multicore differ from single-core debug? 566
Background: silicon debug and capabilities 568
Trace methods for multicore debug analysis 569

Types of instrumentation logic blocks 571
JTAG interfaces for multicore debug 580
External interfaces for on-chip instrumentation 581
Debug flows and subsystems 582
Commercial approaches 586
The OCP debug interface 586
Nexus/IEEE 5001 587
ARM CoreSight 593
Example: MIPS PDTrace and RRT analysis 596
The future of multicore debug 600
References 602
Further reading 602

Index 603