HKG15 - The Machine: A New Kind of Computer - Keynote by Dejan Milojicic
TRANSCRIPT
© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
The Machine
Dejan Milojicic, HP Labs Palo Alto
IEEE Computer Society, President 2014
Linaro Connect, Hong Kong, February 2015
Disclaimer, Acknowledgements
The views in this talk are my opinions and not necessarily those of HP
Thanks to
Greg Astfalk, Alvin AuYoung, Cullen Bash, Dhruva Chakrabarti, Al Davis,
Paolo Faraboschi, Gary Gostin, Richard Lewington, Terence Kelly, Kim Keeton,
Pat Knebel, Hideaki Kimura, Mike Krause, Naveen Muralimanohar, Indrajit Roy,
Rob Schreiber, Mike Tan, Haris Volos… and probably many more
HP Labs: history of innovation
1975 Standard for Interface Bus
1966 Light Emitting Diode (LED)
1968 Programmable Desktop Calculator
1989 Digital Data Storage Drive
1980 64-channel Ultrasound
1986 Commercialized RISC chips
2005 Virus Throttle
1999 Molecular Logic Gate
2003 Smart Cooling
1986 3D graphics workstations
1972 Pocket Scientific Calculator
1984 Inkjet Printer
1980 Office Laser Printer
1967 Cesium-beam atomic clock
1994 64-bit architecture
2001 Utility Data Center
2002 Rewritable DVD for standard players
2008 Memristor discovered
2011 MagCloud
2012 StoreAll
OpenFlow switches
2013 Moonshot
2010 ePrint
StoreOnce
3D Photon Engine
Threat Central
2014 Location Aware
SureStart
Innovation horizons
[Figure: time-horizon axis in years (1, 2, 3, 4, 5, 6, 10, 20). Labels: HP Labs, Business Units; Revolutionary, Evolutionary]
Advanced Development: up to 2 years
Applied Research: 2 – 5 years
Exploratory Research: 5 – 20+ years
The future
Next wave: cyber physical age
Pervasive Connectivity
Explosion of Information
Smart Device Expansion
By 2020
Internet of Things
200 Billion (1) IoT “Things”
30 Billion (2) Connected Devices
1 Billion (3) Smart Meters
Internet of People
… for 8 Billion (4)
44 ZB
(1) IDC “Worldwide Internet of Things (IoT) 2013-2020 forecast” October 2013.
(2) IDC "The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things" April 2014
(3) Global Smart Meter Forecasts, 2012-2020. Smart Grid Insights (Zypryme), November 2013
(4) http://en.wikipedia.org
Architecture has not changed for 60 years
Traditional computer architecture
Performance Wall:
• Multi-core introduced due to single-thread performance “wall.”
Storage Hierarchy:
• HDD/SSD layer is significant performance bottleneck.
• Prevents data getting closer to compute.
Data Movement:
• Too slow for real-time access to shared memory.
Memory Wall:
• DRAM reaching a technology scaling wall.
Architecture of the future: The Machine
Special purpose SoCs
Photonics
Massive memory pool
Photons, Ions, Electrons
Application-focused silicon
• Less general-purpose, more workload focused
• Dramatic reduction in power, cost, and space
• SoC vendors bring their own differentiated
features and opportunities to disrupt markets
[Figure: Traditional Server Motherboard. Labels: two Processors, each with ECC Memory; Southbridge; Storage Ctrlr; HDDs; NIC(s); Production Network; Management Logic; Mgmt Network; Video; VGA Console]
[Figure: System on a Chip (SoC)-based Server. Labels: SoC; Processor; ECC Memory; Storage Ctrlr; Storage; NIC(s); Production Network; Mgmt; Mgmt Interface; Custom Accelerators]
Special purpose SoCs
Communication fire hose for memristor stores
Photonics technology
Designed for multiple use cases
• Low cost
• Compact form factor
• Easy to integrate on circuit boards
• No calibration required
• Extensible to higher bandwidth
Orders of magnitude lower energy per bit
• 1-2 pJ per bit
Short term: short range, low cost VCSEL
Long term: micro-ring resonator (low cost, long distance, integrated on silicon)
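To put the 1-2 pJ per bit figure in perspective, link power is energy per bit times bit rate. The 1 Tb/s bandwidth used here is an assumed round number for illustration, not a figure from the talk.

```latex
P = E_{\text{bit}} \cdot B, \qquad
P_{\text{low}} = 1\,\text{pJ/bit} \times 10^{12}\,\text{bit/s} = 1\,\text{W}, \qquad
P_{\text{high}} = 2\,\text{pJ/bit} \times 10^{12}\,\text{bit/s} = 2\,\text{W}
```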
Photonics
Memristors: ions to store information
Ions (charged atoms) are much better behaved than electrons
Heavier, and can be pushed by an electrical field
Stay where you park them, even in a very small box (4F²)
A bit can be represented by the location of an ion in a box
[Figure labels: MΩ, 1 nm, ++, 10 kΩ]
Non-volatile memory (NVM)
Persistently stores data
Access latencies comparable to DRAM
Byte addressable (load/store) rather than block addressable (read/write)
[Figure: NVM technologies on a latency axis from ns to μs: Flash-backed DRAM; 2D and 3D Flash; Phase-Change Memory; Spin-Transfer Torque MRAM; Resistive RAM (e.g., Memristor)]
Haris Volos, et al. "Aerie: Flexible File-System Interfaces to Storage-Class Memory," Proc. EuroSys 2014.
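The difference between byte addressable (load/store) and block addressable (read/write) access can be sketched in a few lines of C. This is an illustration only: the device is emulated by a plain array, and the 4096-byte block size and all names (block_dev, update_byte_block, update_byte_store) are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u /* assumed block size, for illustration only */

/* A block device emulated by a plain array; bytes_moved counts traffic. */
typedef struct {
    uint8_t *media;
    size_t bytes_moved;
} block_dev;

/* Block interface: whole blocks are copied between the device and a buffer. */
static void block_read(block_dev *dev, size_t blk, uint8_t *buf) {
    memcpy(buf, dev->media + blk * BLOCK_SIZE, BLOCK_SIZE);
    dev->bytes_moved += BLOCK_SIZE;
}

static void block_write(block_dev *dev, size_t blk, const uint8_t *buf) {
    memcpy(dev->media + blk * BLOCK_SIZE, buf, BLOCK_SIZE);
    dev->bytes_moved += BLOCK_SIZE;
}

/* Block-addressable update: read-modify-write of the enclosing block.
 * Returns the number of bytes moved for this one-byte change. */
size_t update_byte_block(block_dev *dev, size_t offset, uint8_t value) {
    uint8_t buf[BLOCK_SIZE];
    size_t before = dev->bytes_moved;
    block_read(dev, offset / BLOCK_SIZE, buf);
    buf[offset % BLOCK_SIZE] = value;
    block_write(dev, offset / BLOCK_SIZE, buf);
    return dev->bytes_moved - before;
}

/* Byte-addressable update: a single store to the target location. */
size_t update_byte_store(uint8_t *nvm, size_t offset, uint8_t value) {
    nvm[offset] = value;
    return 1;
}
```

Changing one byte through the block interface moves two full blocks (one read, one write), while the load/store path touches only the byte itself.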
Massive memory pool
Memory hierarchy today
[Figure: hierarchy pyramid with axes Speed, Cost per bit, Capacity]
On-chip caches: SRAM
Main memory: DRAM
Disk: Hard Disk
Also shown: Disk cache, Flash SSD
Massive memory pool
Collapsing the hierarchy
CPUs
DIMM DDR
HDD DISK
High Capacity DDR Tier
Cold Storage HDD tier
Intelligent Flash SSD Tier
CPUs
High Bandwidth Tier
2.5D
Performance + Capacity NVM Tier
CPUs
3D DRAM or NVM
Extreme Bandwidth Tier
Massive memory pool
[Figure labels: NVRAM, global]
Traditional file systems
Examples
Ext2/3/4, XFS, BTRFS, ZFS, LFS
Separate storage address space
Data is copied between storage and DRAM
Block-level abstraction leads to inefficiencies
Use of page cache leads to extra copies
True even for memory-mapped I/O
Software layers add overhead
[Figure labels: Applications; mmap; file IO; VFS; Traditional FS; Page Cache; Block Device; Storage: disks, SSDs]
Subramanya R Dulloor, et al. "System Software for Persistent Memory," Proc. EuroSys 2014.
Non-volatile memory aware file systems
Examples
Microsoft BPFS
Intel PMFS
Low overhead access to persistent memory
No page cache
Direct access with mmap
Leverage hardware support for consistency
[Figure labels: Applications; mmap; file IO; VFS; Traditional FS; Page Cache; Block Device; NVM FS; mmu mappings; PM]
Subramanya R Dulloor, et al. "System Software for Persistent Memory," Proc. EuroSys 2014.
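From the application's side, direct access with mmap looks like ordinary pointer use. The following is a minimal sketch using only standard POSIX calls; the function names and the one-counter file layout are assumptions for illustration. On a traditional file system the mapping is still served from the page cache, whereas an NVM-aware file system could back the same mapping with persistent memory directly.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Add delta to a long stored at the start of the file, updating it in
 * place through a shared mapping. Returns 0 on success, -1 on error. */
int increment_counter(const char *path, long delta) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    /* Make sure the file holds one long; a new file is zero-filled. */
    if (ftruncate(fd, sizeof(long)) != 0) { close(fd); return -1; }
    long *counter = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (counter == MAP_FAILED) return -1;
    *counter += delta;                               /* plain CPU store */
    int rc = msync(counter, sizeof(long), MS_SYNC);  /* push to the file */
    munmap(counter, sizeof(long));
    return rc;
}

/* Read the counter back through the file IO path, which copies the
 * data into a caller-supplied buffer. Returns -1 on error. */
long read_counter(const char *path) {
    long value = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, &value, sizeof value);
    close(fd);
    return n == (ssize_t)sizeof value ? value : -1;
}
```

The update path never calls read() or write(); the only explicit persistence step is msync().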
Linux synergies for non-volatile memory
kernel.org
HP Proof Of Concept
• x86 cache-coherent
• Sliding windows
• Parametric
• Legacy filesystem semantics
Intel PV Investigation
• Start with Execute In Place (XIP)
• Evaluate for general use
Intel Evolution
• DAX: Direct Access
• Extend FS API
HP Evolution
• Split PoC driver
• Replace proprietary section with DAX
Pure DAX
HW Enablement extending DAX
Do we need a separate durable data representation?
• Conventional durability techniques
– Separate object and persistent formats
– Translation code
– Programmability and performance issues
• In-memory durability
– Enabled by NVRAM (memristors, PCM, etc.)
– In-memory objects are durable throughout
– Byte-addressability simplifies programmability
– Low load/store latencies offer high performance
[Figure labels: In-memory objects; Serialize; Deserialize; File or Database; CPU; CACHES; DRAM; NVRAM]
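The two approaches can be contrasted in a short C sketch. Everything here is hypothetical and chosen for illustration: the account type, the packed byte layout, and the static array that stands in for an NVRAM region (on real hardware it would be a mapped persistent region).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t id;
    double balance;
} account;

/* Size of the separate persistent format: fields packed back to back. */
#define ACCOUNT_WIRE_SIZE (sizeof(uint32_t) + sizeof(double))

/* Conventional durability: translation code between the in-memory
 * object and its persistent byte format. */
void account_serialize(const account *a, uint8_t *out) {
    memcpy(out, &a->id, sizeof a->id);
    memcpy(out + sizeof a->id, &a->balance, sizeof a->balance);
}

void account_deserialize(const uint8_t *in, account *a) {
    memcpy(&a->id, in, sizeof a->id);
    memcpy(&a->balance, in + sizeof a->id, sizeof a->balance);
}

/* In-memory durability: the object itself lives in the (simulated)
 * NVRAM region, so there is no second format and no translation. */
static account nvram_region[4];

account *account_in_nvram(size_t slot) {
    return &nvram_region[slot];
}
```

With the second approach, an update is an ordinary store such as account_in_nvram(0)->balance = 3.0; no serialize or deserialize step is involved.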
Why can’t I just write my program, and have all my data be persistent?
The NVM programming problem
• Consider a simple banking program (just two accounts):
double accounts[2];
• Between which I want to transfer money. Naïve implementation:
void transfer(int from, int to, double amount) {
    accounts[from] -= amount;   /* What if I crash here? */
    accounts[to] += amount;     /* What if I crash here? */
}
Crashes cause corruption, which prevents us from merely restarting the computation
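The corruption can be made visible with a small simulation. In this sketch the crash is faked with a flag (crash_after_debit) that makes the function return between the two stores; the flag, the starting balances, and the total() helper are assumptions added for illustration.

```c
#include <stdbool.h>

/* Stand-in for data that would live in persistent memory. */
double accounts[2] = {100.0, 50.0};

/* Naive transfer. crash_after_debit simulates a crash between the two
 * stores; the function returns false when the "crash" happened. */
bool transfer_naive(int from, int to, double amount, bool crash_after_debit) {
    accounts[from] -= amount;
    if (crash_after_debit) {
        return false; /* the credit below never happens */
    }
    accounts[to] += amount;
    return true;
}

/* Invariant: a transfer should never change the total. */
double total(void) {
    return accounts[0] + accounts[1];
}
```

After a simulated crash the total is short by the transferred amount, and nothing in the surviving state says whether the transfer should be redone or rolled back.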
Manual solution
• Need code that plays back undo log on restart
• Getting this to work with threads and locks is very hard
• Really want to optimize it
• Very unlikely application programmers will get it right!
persistent double accounts[2];
transfer(int from, int to, double amount) {
    <save old value of accounts[from] in undo log>;
    <flush log entry to NVRAM>
    accounts[from] -= amount;
    <save old value of accounts[to] in undo log>;
    <flush log entry to NVRAM>
    accounts[to] += amount;
    <flush all other persistent stores to NVRAM>
    <clear and flush log>
}
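A single-threaded version of the undo-log scheme, including the playback code needed on restart, might look like the sketch below. It is a simulation under stated assumptions: ordinary globals stand in for NVRAM, flush_to_nvram is a hypothetical placeholder for real cache-line flushes and fences, and crash_point fakes a crash at a chosen step. It does not attempt the threads-and-locks case.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAPACITY 8

typedef struct {
    double *addr;      /* location that is about to be overwritten */
    double old_value;  /* value to restore on recovery */
} undo_entry;

/* Ordinary globals stand in for data that would live in NVRAM. */
static double accounts[2] = {100.0, 50.0};
static undo_entry undo_log[LOG_CAPACITY];
static size_t undo_len = 0;

/* Hypothetical placeholder: real code would flush cache lines and fence. */
static void flush_to_nvram(const void *addr, size_t len) {
    (void)addr;
    (void)len;
}

/* Save the old value and make the log entry durable before the store. */
static void log_old_value(double *addr) {
    undo_log[undo_len].addr = addr;
    undo_log[undo_len].old_value = *addr;
    flush_to_nvram(&undo_log[undo_len], sizeof undo_log[undo_len]);
    undo_len++;
    flush_to_nvram(&undo_len, sizeof undo_len);
}

double balance(int i) { return accounts[i]; }

/* crash_point: 0 = no crash, 1 = after the debit, 2 = after the credit
 * but before the log is cleared. Returns false if the transfer did not
 * complete. */
bool transfer_logged(int from, int to, double amount, int crash_point) {
    if (undo_len + 2 > LOG_CAPACITY) return false;
    log_old_value(&accounts[from]);
    accounts[from] -= amount;
    if (crash_point == 1) return false;
    log_old_value(&accounts[to]);
    accounts[to] += amount;
    if (crash_point == 2) return false;
    flush_to_nvram(accounts, sizeof accounts);
    undo_len = 0; /* clearing the log commits the transfer */
    flush_to_nvram(&undo_len, sizeof undo_len);
    return true;
}

/* Run on restart: play the undo log back, newest entry first. */
void recover(void) {
    while (undo_len > 0) {
        undo_len--;
        *undo_log[undo_len].addr = undo_log[undo_len].old_value;
    }
    flush_to_nvram(accounts, sizeof accounts);
    flush_to_nvram(&undo_len, sizeof undo_len);
}
```

Calling recover() after either simulated crash restores both balances to their values before the interrupted transfer.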
Provide a construct that atomically updates NVRAM
Our solution: consistent sections
• Ensures that updates in __atomic block are either completely visible after crash or not at all
• If updates in __atomic block are visible, then so are prior updates to persistent memory
persistent double accounts[2];
transfer(int from, int to, double amount) {
    __atomic {
        accounts[from] -= amount;
        accounts[to] += amount;
    }
}
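The all-or-nothing guarantee can be emulated in plain C to see what it buys the programmer. The sketch below uses a shadow copy with a single-word commit, which is a stand-in technique chosen for brevity and not a description of how the __atomic construct is implemented; the names and the crash_before_commit flag are assumptions.

```c
#include <stdbool.h>

/* Two versions of the persistent data plus an index naming the
 * committed one. Ordinary globals stand in for NVRAM. */
static double versions[2][2] = {{100.0, 50.0}, {0.0, 0.0}};
static int committed = 0;

/* Readers (and recovery) only ever look at the committed version. */
double balance(int i) { return versions[committed][i]; }

/* All-or-nothing transfer. crash_before_commit simulates a crash after
 * both updates were written but before the commit store. */
bool transfer_atomic(int from, int to, double amount, bool crash_before_commit) {
    int next = 1 - committed;
    /* Start from the committed state so prior updates carry over. */
    versions[next][0] = versions[committed][0];
    versions[next][1] = versions[committed][1];
    versions[next][from] -= amount;
    versions[next][to] += amount;
    /* Real NVRAM: flush versions[next] here, before the commit store. */
    if (crash_before_commit) return false;
    committed = next; /* single-word store publishes both updates at once */
    /* Real NVRAM: flush the committed index here. */
    return true;
}
```

A crash before the commit store leaves the old balances fully intact; after it, both updates are visible together.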
Atlas programming model
• Programmer distinguishes persistent and transient data
• Persistent data lives in a “persistent region”
• Directly mappable into process address space (without DRAM buffers)
• Accessed via CPU loads and stores
• Programmer writes ordinary multithreaded code
• Automatic durability support at a fine granularity, complete with recovery code
• Supports consistency of durable data derived from concurrency constructs
• Protection against failures
• Process crash: works with existing architecture
• Tolerating kernel panics and power failures requires NVRAM and CPU cache flushes
D. Chakrabarti, H. Boehm and K. Bhandari. Atlas: Leveraging Locks for Non-volatile Memory Consistency. Proc. OOPSLA, 2014.
Tools: performance emulator for NVM, interconnect
• The Machine components are not yet available in all configurations and form factors
– Future NVRAM and interconnect technologies will offer a variety of performance characteristics
– It is extremely difficult to predict and optimize the performance of complex applications on future hardware
– Which ranges of latency and bandwidth are critical for good performance and scalability of different applications?
• Solution: a performance emulator for NVRAM and interconnect, built on commodity hardware, to enable
– Performance evaluation of design choices for The Machine
– Application sensitivity analysis across ranges of hardware performance characteristics
• Challenge: intercepting memory and interconnect requests to change their perceived latency and bandwidth at current hardware speeds is very difficult
• The performance emulator has two main components:
– DRAM-based NVM emulator
– InfiniBand-based interconnect emulator
• Two knobs for the performance characteristics of NVM and interconnect: bandwidth and latency
• We are assembling a suite of memory- and communication-intensive applications to analyze on systems with Machine-like configurations and a variety of NVM and interconnect performance characteristics
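How the two knobs could combine is sketched below with a simple first-order model: access time is latency plus size divided by bandwidth, and a DRAM-based emulator would add the difference between the target time and the native DRAM time. This model, the type and function names, and any parameter values are assumptions for illustration, not a description of the actual emulator.

```c
/* One point in the parameter space: the two knobs, latency and bandwidth. */
typedef struct {
    double latency_ns;      /* time until the first byte arrives */
    double bandwidth_gb_s;  /* sustained rate; 1 GB/s equals 1 byte/ns */
} mem_profile;

/* First-order model: time = latency + size / bandwidth, in nanoseconds. */
double modeled_access_ns(mem_profile p, double bytes) {
    return p.latency_ns + bytes / p.bandwidth_gb_s;
}

/* Extra delay an emulator running on DRAM would need to add so that an
 * access appears to take as long as on the target memory. Never negative:
 * DRAM cannot be made to look faster than it is. */
double injected_delay_ns(mem_profile dram, mem_profile target, double bytes) {
    double extra = modeled_access_ns(target, bytes) - modeled_access_ns(dram, bytes);
    return extra > 0.0 ? extra : 0.0;
}
```

For example, with made-up profiles of 80 ns and 16 GB/s for DRAM and 100 ns and 8 GB/s for the target, a 4096-byte transfer is modeled at 336 ns and 612 ns respectively, so the emulator would inject 276 ns.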
Summary
Everything changes…
Hardware
• Memory controller
Architecture
• Coherence/sharing model
• Consistency model
• Error handling, RAS
Software
• OS, memory management
• Compilers and runtime
• Algorithms and data structures
• Storage hierarchy
• Applications
• Security and Protection
[Figure labels: CPU; Registers; Cache; Universal Memory; Magnetic; Load/Store: Direct Access; Block: Indirect Access]
Non-volatile, Pooled, Storage class, Memory speed
Universal memory is coming
Computing shifts to a persistent
world
Summary, Continued
The Machine provides new computing architecture
Specialized SoCs + massive shared NVM pool + photonic interconnects
Many opportunities for OS and software innovation
Where to look for more information
http://www.hpl.hp.com/research/systems-research/themachine/
HP Discover 2014 talks on The Machine
• HP Labs Director Martin Fink's announcement: https://www.youtube.com/watch?v=Gxn5ru7klUQ
• Kim Keeton’s talk on technologies: https://www.youtube.com/watch?v=J6_xg3mHnng
Paolo Faraboschi’s keynote at HPCA/PPoPP/CGO
Asks from the community
1. NUMA support?
2. Hot add memory support?
3. Current state of
• OFED stack on ARMv8, does RDMA work?
• NEON autovectorization in gcc
• state of UEFI boot for ARM64
• containers and hypervisors
• Java
• Tools (equivalents to: Parallel Studio from Intel; VTune for detailed root cause performance analysis; Intel’s TBB parallelization inspector tools)
4. gcc is not optimal for __uint128_t. The generated binary does not exploit ARMv8's "pair" instructions; it seems to just do two 64-bit operations. This affects std::sort on __uint128_t entries.
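A small C test case for item 4 could look like the sketch below, which is useful mainly as input for inspecting the generated assembly. It relies on the __uint128_t extension available in gcc and clang on 64-bit targets, uses qsort in place of std::sort, and spells out the comparison as two 64-bit steps; the helper names are assumptions, and the sketch does not by itself reproduce or measure the code generation issue.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Build a 128-bit value from two 64-bit halves. */
__uint128_t make_u128(uint64_t hi, uint64_t lo) {
    return ((__uint128_t)hi << 64) | lo;
}

/* Compare two 128-bit keys as two 64-bit operations: high halves
 * first, low halves only when the high halves are equal. */
static int cmp_u128(const void *a, const void *b) {
    __uint128_t x = *(const __uint128_t *)a;
    __uint128_t y = *(const __uint128_t *)b;
    uint64_t xh = (uint64_t)(x >> 64), yh = (uint64_t)(y >> 64);
    if (xh != yh) return xh < yh ? -1 : 1;
    uint64_t xl = (uint64_t)x, yl = (uint64_t)y;
    return (xl > yl) - (xl < yl);
}

/* Sort an array of 128-bit keys in ascending order. */
void sort_u128(__uint128_t *v, size_t n) {
    qsort(v, n, sizeof *v, cmp_u128);
}
```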
Q&A
We are hiring aggressively!
Who: regular hires, contractors, postdocs, interns (undergraduate & graduate), ….
Experience: experienced, recent graduates, and anywhere in between
Areas:
• Systems software for peta-scale NVM systems: OSes; data management; programming models; runtimes and compiler/language support; security; manageability; RAS; QoS; system modelling and workload characterization
• Analytics at peta-scale: frameworks for scalable big data analytics; machine learning; graph analytics; visualization
• Networking and mobility: enterprise, data-center and cloud networks; software defined networking; mobile cloud architectures, systems, platforms and services; mobile sensing and context awareness
Locations: Palo Alto, Ft Collins, Bristol, Haifa, ….
For OS research: http://www.hpl.hp.com/careers/research-careers or
For OS development: http://www8.hp.com/us/en/jobs or [email protected]