Post on 19-Oct-2020
1
Building the future of 64-bit computing
with ARMv8-A
Ian Smythe
June 2014
2
Agenda:
Introduction to ARMv8-A Architecture and Instruction Set Enhancements
Designing efficient systems with ARMv8-A
Driving success by building on the ARM® software ecosystem
3
At The Heart Of Modern Computing
ARM’s business model has fostered a wave of innovation in mobile devices
Advanced personal computers are becoming affordable to all
Datacentre and network operators are turning to ARM solutions to drive efficiency
[Chart: Smart Mobile Device Shipments (Smartphones and Tablets), volume in millions, for 2013, 2015 and 2018. Segments: Entry-level (<$150), Mid-range, Premium (>$400). Source: ARM and Gartner estimates, CAGR figures based on 2013.]
[Chart: Global Data Creation (Zettabytes), 2012 to 2020. Source: Computer Science Group 2013.]
4
Architecture Evolution
[Timeline, 1995 / 2005 / 2015: ARM7TDMI, ARM926EJ, ARM1176, Cortex®-A9, Cortex-A50 series. Labels: ARMv4, Virtualization.]
Increasing SoC complexity
Increasing OS complexity
Increasing choice of HW and SW
5
ARMv8-A Design Requirements
Extend OS capabilities to sub-$100 devices
Performance apps
Enhanced multimedia processing
64-bit memory addressing
Virtualisation
High bandwidth
Enable innovation for hyperscale operators
Entry-level Computing
‘Desktop Class’ Computing
High-end Enterprise
6
ARMv8-A Instruction Set Enhancements
ARMv8-A is one of the most significant architecture changes in ARM’s history
[Diagram: ARMv7-A and ARMv8-A instruction sets. AArch32 (A32+T32, ARMv7-A compatible) and AArch64 (A64), with CRYPTO, Scalar FP and Advanced SIMD, beneath applications and software.]
ARMv8 extends ARM’s 32-bit code, whilst
being fully compatible with ARMv7
Addressing emerging software trends
AArch32: Evolution of 32-bit
• Ideal for concurrent programming
• C11, C++11, Java 5
• More efficient, high-performance thread-safe software
• Enhanced security and encryption
AArch64: Efficient 64-bit execution
• Clean instruction set
• Modern compiler friendly
• Reduced complexity for operating systems, hypervisors
• Designed to maximize reuse of existing hardware
7
ARMv8-A Architecture Designed for Efficiency
Enhancement | Why it Matters
64-bit architecture; increased number and size of general purpose registers | Efficient access to large datasets; gains in performance and code efficiency
Large Virtual Address Space | Applications not limited to 4GB memory; large memory mapped files handled efficiently
Efficient 32-bit/64-bit architecture | Common software architecture (phone, tablet, clamshell); a single software model across the entire portfolio
Double the number and size of NEON™ registers | Enhanced capacity of SIMD multimedia engine
Cryptography support | Over 10x software encryption performance; new security models for consumer and enterprise
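The address-space and register claims on this slide can be checked with some quick arithmetic. The sketch below is illustrative only. The 48-bit virtual address width and the register counts are architectural figures assumed here, not numbers taken from the slide, and the helper names are hypothetical.

```python
# Illustrative arithmetic. Widths and register counts are assumptions of
# this sketch (architectural figures), not data from the presentation.

GIB = 2**30
TIB = 2**40


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def register_file_bytes(count: int, width_bits: int) -> int:
    """Total capacity of a register bank in bytes."""
    return count * width_bits // 8


va32 = address_space_bytes(32)  # 32-bit virtual addresses
va48 = address_space_bytes(48)  # assumed 48-bit AArch64 virtual addresses

# General purpose registers: r0-r15 in AArch32 (including SP, LR and PC),
# x0-x30 in AArch64.
gpr_a32 = register_file_bytes(16, 32)
gpr_a64 = register_file_bytes(31, 64)

# SIMD registers viewed at 128-bit width: Q0-Q15 in AArch32, V0-V31 in AArch64.
simd_a32 = register_file_bytes(16, 128)
simd_a64 = register_file_bytes(32, 128)

print(va32 // GIB, "GiB")   # 4 GiB
print(va48 // TIB, "TiB")   # 256 TiB
print(gpr_a32, gpr_a64)     # 64 248
print(simd_a32, simd_a64)   # 256 512
```

Under these assumptions the 4GB ceiling is simply 2^32 bytes, and the SIMD register bank doubles in capacity when counted as 128-bit registers.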
8
Rapidly Evolving Consumer Requirements
Secure Digital World
Business & Productivity
Natural Language
Sleeker and Slimmer form factors
9
Tablets Have Changed Computing
[Chart: Mobile Computing Volume (Mu), 0 to 600, 2010 to 2014. Series: ARM Tablets RoW, ARM Tablets China SoC, x86 Laptops.]
10
ARMv8-A Networking Opportunity
[Diagram: Mobile Subscriber, Office / Home User and Industrial / Internet of Things traffic passing through the Access Network, Aggregation Network and Core Networking & Server.]
Wide Range of Design Points
n x Cortex-A57/A53, Cortex-R/M + Partner IP
n x Cortex-A57 + Partner IP
n x Cortex-A57, n x Cortex-A53 + Partner IP
“C”-Programmable, Software Defined, Heterogeneous Compute
11
Enhanced Privacy, Security And Personalization
ARM security framework with TrustZone® is available in all ARMv7-A and ARMv8-A processors.
ARM security and virtualization framework is available in ARMv8-A and ARMv7-A processors launched since 2010
[Timeline: ARM7TDMI, ARM926EJ, ARM1176, Cortex-A9, Cortex-A50 series. Labels: ARMv4, Virtualization.]
12
ARM big.LITTLE™ processing solutions portfolio
Market Drivers
Premium 64b/32b performance: Industry leading premium performance cores aimed at top range 32b/64b SoCs
Mid range 32-bit mobile: Premier performance for midrange cost and power in smartphones and consumer devices
High Efficiency CPUs: Delivering industry leading power efficiency, enabling low cost, low power, and maximum efficiency for standalone and big.LITTLE combinations
[Diagram, 2014 to 2015: big.LITTLE pairings Cortex-A15 / Cortex-A7, Cortex-A57 / Cortex-A53, Cortex-A17 / Cortex-A7]
Core characteristics shown in the diagram:
ARMv8, 13+ stage out-of-order, Triple Decode, Wide Multi-issue
ARMv8, 8 stage in-order, Dual issue (Cortex-A53)
ARMv7, Out-of-order
ARMv7, 8 stage in-order, Partial Dual issue (Cortex-A7*)
13
ARM Cortex-A57: Highest Performance ARMv8-A core
Highest single-threaded performance
• Out-of-order, multi-issue pipeline tuned for modern workloads
• Streamlined 64b architecture w/ improved 32b performance
• 10x speedup in encryption with new cryptography extensions
Premium Mobile feature set
• Maximum performance in smartphone power envelope
• big.LITTLE compatible for extended dynamic range of operation
• Multicore scalability via AMBA® 4 ACE
Enterprise class feature set
• 64-bit, RAS/ECC, Virtualization support
• High throughput floating point unit
• Multicore scalability via AMBA 5 CHI
14
ARM Cortex-A53: Delivering More For Less
Highest Efficiency Core with ARMv8 Architecture
• In-order pipeline, balanced design
• 10x performance improvement for encryption and SIMD
• Hardware controlled power retention
Versatile Efficient Core
• Most optimized dual-issue, in-order architecture for power/performance balance
• Numerous configuration options adapting to different application needs
Enhanced for Enterprise Applications
• Area scales from 6.4 mm² down to 0.7 mm²
• Enhanced high bandwidth memory subsystem
• RAS support for networking workloads
[Block diagram: ARM® Cortex®-A53 MPCore]
Cores 1 to 4, each an ARMv8 32b/64b core with NEON SIMD engine, Floating Point Unit, 8-64K I-Cache (optional parity) and 8-64K D-Cache (optional ECC)
SCU
L2 Cache w/ Optional ECC (128KB to 2MB)
128-bit AMBA® 4 ACE or AMBA 5 CHI Coherent Bus Interface
CoreSight™ Multicore Debug and Trace
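The configuration ranges named in the Cortex-A53 MPCore diagram (1 to 4 cores, 8-64K L1 caches, 128KB to 2MB L2, ACE or CHI bus) can be captured as a small validation record. This is a minimal sketch: `A53ClusterConfig` and its field names are hypothetical, and it checks only the ranges printed on the slide, not which intermediate sizes are actually selectable.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class A53ClusterConfig:
    """Hypothetical record of the cluster options named on the slide."""
    cores: int
    icache_kib: int
    dcache_kib: int
    l2_kib: int
    bus: str  # "ACE" (AMBA 4) or "CHI" (AMBA 5)

    def problems(self) -> List[str]:
        """Return a description of every option outside the slide's ranges."""
        found = []
        if not 1 <= self.cores <= 4:
            found.append("cores must be between 1 and 4")
        if not 8 <= self.icache_kib <= 64:
            found.append("I-cache must be between 8K and 64K")
        if not 8 <= self.dcache_kib <= 64:
            found.append("D-cache must be between 8K and 64K")
        if not 128 <= self.l2_kib <= 2048:
            found.append("L2 must be between 128KB and 2MB")
        if self.bus not in ("ACE", "CHI"):
            found.append("bus must be ACE or CHI")
        return found


ok = A53ClusterConfig(cores=4, icache_kib=32, dcache_kib=32, l2_kib=512, bus="ACE")
bad = A53ClusterConfig(cores=8, icache_kib=4, dcache_kib=32, l2_kib=4096, bus="AXI")

print(ok.problems())
for line in bad.problems():
    print(line)
```

Returning a list of problems rather than raising on the first one makes it easier to report every out-of-range option at once.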
[Chart: Cortex-A53 Performance Improvement over Cortex-A7 @ same frequency. Benchmarks: SPECint2000, Octane, Bbench.]
15
ARMv8-A AArch64 Performance Improvements
ARMv8-A processors are fully compatible with ARMv7-A software
Existing ARMv7-A 32-bit software runs faster on today’s ARMv8-A processors
[Charts: relative performance (0.0x to 2.0x) on browsing-related workloads. Panel 1: Cortex-A7, Cortex-A53 (same process technology node), Cortex-A53 (target process technology node). Panel 2: Cortex-A15, Cortex-A57 (same process technology node), Cortex-A57 (target process technology node).]
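Because ARMv8-A processors run both ARMv7-A 32-bit software and 64-bit code, it can be useful to tell which kind a given binary is. A rough sketch follows, reading only the first 20 bytes of an ELF header. The field offsets and machine constants come from the ELF format rather than from the slides, and `classify_elf` and `fake_header` are hypothetical helper names.

```python
import struct

EM_ARM = 40        # ELF e_machine value for 32-bit ARM
EM_AARCH64 = 183   # ELF e_machine value for 64-bit ARM


def classify_elf(header: bytes) -> str:
    """Classify a binary from the first 20 bytes of its ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not ELF"
    ei_class = header[4]  # 1 = 32-bit objects, 2 = 64-bit objects
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine == EM_ARM and ei_class == 1:
        return "AArch32"
    if e_machine == EM_AARCH64 and ei_class == 2:
        return "AArch64"
    return "other"


def fake_header(ei_class: int, machine: int) -> bytes:
    """Build a synthetic little-endian header for demonstration only."""
    ident = b"\x7fELF" + bytes([ei_class, 1, 1]) + bytes(9)
    return ident + struct.pack("<HH", 2, machine)


print(classify_elf(fake_header(1, EM_ARM)))
print(classify_elf(fake_header(2, EM_AARCH64)))
```

On a real system the same function could be fed the first bytes of an executable or shared library read in binary mode.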
16
Building the next generation mobile computing devices
[Diagram, Superphone: Cortex-A57, Cortex-A53, L2, Mali-T72x GPU with L2, CCI-400 Cache Coherent Interconnect, DMC+DDR]
Premium Computing Systems
• Notebook performance for any screen size
• 2.5K tablets, 4K monitors
• Virtualization and TrustZone support enable a secure, multi-profile device
• Always-on user experience with improved battery life
Mass Market Computing Systems
• big.LITTLE 2.4 and 4.4 configurations
• 1080p driving 4K, compatibility with 4K camera
• Standalone quad and octa Cortex-A53 platforms possible based on segment
• Process technology: 28nm down to 14nm
[Diagram, Tablet/Clamshell: Cortex-A57, Cortex-A53, L2, Mali™-T76x GPU with L2, CCI-400 Cache Coherent Interconnect, DMC+DDR]
17
Premium Mobile System Design
[Block diagram components: Cortex-A57, Cortex-A53, Mali T760 GPU, GIC-500, CoreLink CCI-400, I/O Coherent Masters, Media Subsystem (VP-500, DP-500), MMU-500 (x2), NIC-400 (x2), TZC-400, Memory System (LPDDR3 DRAM), Peripherals]
18
Mid-range Mobile System Design
[Block diagram components: Cortex-A17, Cortex-A7, Mali T720 GPU, GIC-400, CCI-400, I/O Coherent Masters, Media Subsystem (Mali-V500, Mali-DP500), MMU-500 (x2), NIC-400 (x2), TZC-400, Memory System (LPDDR3 DRAM), Peripherals]
19
Entry-Level Mobile 2015 Example System
Cortex-A53 based heterogeneous system
2x Cortex-A53 Clusters
• 1x tuned for maximum performance point
• 1x tuned for highest efficiency
Mid-range Area Optimized GPU
• 4 Shader cores
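The idea of pairing a performance-tuned cluster with an efficiency-tuned one can be illustrated with a toy placement rule: run work on the efficiency cluster whenever it can keep up, otherwise move it to the performance cluster. This is only a sketch under stated assumptions. The cluster names, the 0.6 relative capacity and the `place` function are all hypothetical, and real systems rely on the operating system scheduler rather than a fixed threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    relative_capacity: float  # 1.0 = the performance-tuned cluster


# Both values are illustrative assumptions, not figures from the slide.
PERFORMANCE = Cluster("A53-performance", 1.0)
EFFICIENCY = Cluster("A53-efficiency", 0.6)


def place(task_demand: float) -> Cluster:
    """Pick the efficiency cluster whenever it can satisfy the demand.

    task_demand is the fraction of the performance cluster's capacity
    that the task needs, from 0.0 to 1.0.
    """
    if not 0.0 <= task_demand <= 1.0:
        raise ValueError("task_demand must be between 0.0 and 1.0")
    if task_demand <= EFFICIENCY.relative_capacity:
        return EFFICIENCY
    return PERFORMANCE


for demand in (0.2, 0.6, 0.9):
    print(demand, place(demand).name)
```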
[Block diagram components: Cortex-A53*, Cortex-A53, Mali-T720 GPU, GIC-400, CCI-400, I/O Coherent Masters, Media Subsystem (Mali-V500, Mali-DP500), NIC-400 (x2), TZC-400, Memory System (LPDDR3 DRAM), Peripherals]
20
CPU Sub-Systems in Set Top Box and OTT devices
Dual Cortex-A9
40nm, 600MHz-1.2GHz
Quad Cortex-A7
• Single cluster design
• Low cost (die, power)
big.LITTLE Octa Cortex-A15/Cortex-A7
• Dual cluster design with CCI-400
• Well suited for mixed workloads
• System coherency
Quad Cortex-A12/17
• Performance leadership in the mid-range
• Most area and cost-efficient solution
• 60% performance uplift over Cortex-A9
ARMv8 (Cortex-A53/57 & ISA)
• Single or multi-cluster design
• 64-bit, 32-bit
• 30% increase in performance
[Roadmap, 2012 to 2015]
Quad Cortex-A9: 28nm, 1.4GHz-1.2GHz
Octa core big.LITTLE: Cortex-A15/Cortex-A7
ARMv7 big.LITTLE: Cortex-A17/Cortex-A7, Cortex-A15/Cortex-A7
Quad Cortex-A17: 28nm, 1.4GHz-1.2GHz
Cost effective mass production
Dual/Quad Cortex-A7: ideal for entry-level smart TV
ARMv8 big.LITTLE: Cortex-A57/Cortex-A53, advanced interconnect
Quad Cortex-A15: 1.2-1.5GHz
21
Infrastructure System Design
[Block diagram components: Cortex-A57 (x4), Cortex-A53 (x4), DSP (x3), CoreLink™ CCN-508 Cache Coherent Network, CoreLink DMC-520 (x4), Memory System (DDR4/3 DRAM), CoreLink GIC-500 Interrupt Control, CoreLink MMU-500 I/O Virtualisation, NIC-400 (x3), ACE, AHB, PCIe (x2), 10-40 GbE, DPI (x2), Crypto, SATA, USB, Flash, GPIO]
22
ARMv8-A for Software and System Developers
ARM Compiler 6 for ARMv8-A, DS-5 Ultimate Edition
• Full suite of professional software development tools
• Full ARMv8-A support
• Validated by lead partners for > 2 years
ARM Fast Models
• Custom virtual platforms
• Platform for early software development
• Reduce time-to-market
• Bring-up silicon in days
Open Source Tools: Linux Kernel and tools
• Open source tools and compilers
• Linux kernel support
SW Evolution
• Software partners driving more 64-bit optimizations
• Test silicon available
• Server Base System Architecture
Today
23
AOSP on 64-bit platforms History
[Timeline: 2012, 2013, 2014]
First 64-bit Kernel patches
Start of upstreaming to AOSP
Collaboration with Google on ART begins
AArch64 Google V8 merged
Code published
Achievements
32-bit AOSP on an AArch64 Kernel
64-bit AOSP - VM Prototype
AArch64 WebKit
64-bit AOSP (Jellybean)
64-bit AOSP on HW (Jellybean)
64-bit ART booting (KitKat)
24
Enabling the developer ecosystem to build on ARM
Making software as efficient as the hardware it’s running on
System-wide performance analysis and optimization
• Userspace app optimization
• Kernel profiling
• CPU & MCU
• Mali GPU (OpenCL & OpenGL ES)
• Real-world energy consumption
Comprehensive device debug support
• Off-the-shelf dev boards
• Custom devices
• big.LITTLE & multicore
• Debug over USB & TCP/IP
Linux & Android OS awareness
• Process & thread activity
• Native app debug
• Runs on ARM FVPs
• Reversible debugging
25
Building on the ARM ecosystem for Android
[Chart: 0% to 100% by month, 7/1/2013 to 5/1/2014. Categories: ARM Native Only, ARM & x86 Native, Bytecode Only, Unknown.]
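The chart's categories can be approximated by looking at which native library directories an Android package ships. The sketch below assumes the usual APK layout of `lib/<abi>/<name>.so` and works on a plain list of entry names, such as the one returned by `zipfile.ZipFile.namelist()`. The function names are hypothetical, and treating everything else as "Unknown" is a choice made for this sketch, since the slide does not define that category.

```python
ARM_ABIS = {"armeabi", "armeabi-v7a", "arm64-v8a"}
X86_ABIS = {"x86", "x86_64"}


def native_abis(entry_names):
    """Collect ABI directory names from entries shaped like lib/<abi>/<name>.so."""
    abis = set()
    for name in entry_names:
        parts = name.split("/")
        if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
            abis.add(parts[1])
    return abis


def classify(entry_names):
    """Map a package's entries onto the chart's four categories."""
    abis = native_abis(entry_names)
    if not abis:
        return "Bytecode Only"
    has_arm = bool(abis & ARM_ABIS)
    has_x86 = bool(abis & X86_ABIS)
    if has_arm and has_x86:
        return "ARM & x86 Native"
    if has_arm:
        return "ARM Native Only"
    return "Unknown"  # catch-all in this sketch


print(classify(["classes.dex", "lib/armeabi-v7a/libfoo.so"]))
print(classify(["classes.dex", "lib/arm64-v8a/libfoo.so", "lib/x86/libfoo.so"]))
print(classify(["classes.dex", "res/layout/main.xml"]))
```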
26
x86 Android App Compatibility - China
The Chinese Android app ecosystem is predominantly optimized for ARM.
Source: http://www.igao7.com/x86.html
27
200+ China Ecosystem Partners
Enabling the ARM software ecosystem
28
Enabling the next generation of mobile experience
Tools for Professionals
Software for professionals with compute-intensive tasks, e.g. modelling tools (AutoCAD) and RAW image editing
Productivity and Compute Tools
Enterprise productivity suite for word processing, data crunching, database management and editing on the go
Realistic, Immersive gaming
Graphics capability becoming a key factor in consumer purchasing decisions
Driving 4K video / Digital life experience
Combine ARM Cortex and Mali processors into an efficient unified computing subsystem
29
A Premium Experience For All Devices
The ARMv8-A architecture is a major step forward for mobile computing
ARMv8-A processors and systems are being deployed today
• In production and in development
• Bringing high end capabilities to mobile systems and beyond
The ARMv8-A software ecosystem provides the broadest access to devices
• With uncompromising performance
30
ARMv8-A Everywhere
From entry-level smartphones to high-end servers
31
Thank you
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU
and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners