Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge
Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack
or how to adapt to a Cambrian explosion in AI / SW / HW …
ARM Research Summit, Cambridge, September 2017
Grigori Fursin, CTO and co-founder, dividiti, UK
Chief Scientist, cTuning foundation
… with cKnowledge.org and open co-design competitions
cKnowledge.org : helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions
A race to develop innovative AI products and systems (SW & HW) …
Various form factors: IoT, mobile, data centers, supercomputers
Various constraints: speed, energy, accuracy, size, resiliency, costs
… leads to a Cambrian AI/SW/HW explosion and technological chaos
Which AI/SW/HW solutions will survive?
AI users
We at dividiti.com perform competitive analysis
and optimization of the whole AI/SW/HW stack for various realistic scenarios
(object detection, image classification, etc)
Scenario: image classification on mobile devices
800+ distinct mobile devices
Mobile CPUs and GPUs
Caffe, TensorFlow
OpenBLAS, CLBlast, ViennaCL, Eigen
AlexNet, GoogleNet, SqueezeNet
ImageNet and user images
Requirement: speed vs cost (vs energy vs accuracy vs model size vs memory usage vs reliability…)
[Plot: execution time (sec) vs. price (euros)]
Just a few winning "AI+SW+HW species"
must be optimized further or may go "extinct"
Obtained using our CK-based Android app to crowdsource experiments across devices provided by volunteers (later in the talk)
cKnowledge.org/repo cKnowledge.org/ai
Optimization is ad-hoc, tedious, expensive and time consuming
Mobile device
Server
Data centers
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Existing frameworks / algorithms
Various models
User front-end (cloud, GRID, supercomputer, etc.)
Algorithm / source code
Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK
CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
C, C++, Fortran, Java, Python, byte code, assembler …
LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, OpenBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
Diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, NEON)
Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …
Too many design and optimization choices at each level of continuously changing SW/HW stack!
Time to reinvent computer engineering
and enable open, collaborative and reproducible AI/SW/HW co-design!
cKnowledge.org: plugin-based workflow framework to co-design AI/SW/HW stack
Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Various models
Algorithm / source code
AI framework
Common JSON API
Initial funding (2015)
Common experimental framework for computer engineering and AI research
https://github.com/ctuning/ck
Repositories with reusable and customizable artifacts (JSON API and meta info)
Unified models (CK JSON API), each with its own CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
AI frameworks (CK JSON API), each with its own CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
Repositories with reusable and customizable workflows (JSON API)
Workflows exposed through the CK API: image classification, object detection, emotion analysis, …
Built on unified models (CK JSON API) and AI frameworks (CK JSON API)
Crowdsource AI experiments
across diverse platforms provided by volunteers
Continuous competition of various AI/SW/HW combinations (species)
cKnowledge.org/repo
Everyone is on the same page: fair and reproducible competitions
CK concepts: convert your artifacts into reusable and customizable components
Ad-hoc scripts to perform some actions on some artifacts
Actions: setup, find, extract features, compile, run, add, replay, autotune
Artifact types: soft, dataset, experiment, program
Artifacts (each with some description): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs
Each artifact is described by a JSON file
1st level directory: CK modules (Python module, JSON API)
2nd level directory: CK entries (holder for original artifact)
CK meta info (CK meta)
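The two-level layout above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the helper names `add_entry`, `load_meta` and `list_entries` are hypothetical and are not CK functions, and the `.cm/meta.json` location follows the path that appears later in this talk.

```python
import json
import tempfile
from pathlib import Path


def add_entry(repo: Path, module: str, entry: str, meta: dict) -> Path:
    """Create <repo>/<module>/<entry>/.cm/meta.json (hypothetical helper)."""
    entry_dir = repo / module / entry
    (entry_dir / ".cm").mkdir(parents=True, exist_ok=True)
    (entry_dir / ".cm" / "meta.json").write_text(json.dumps(meta, indent=2))
    return entry_dir


def load_meta(repo: Path, module: str, entry: str) -> dict:
    """Read the meta description of one entry."""
    return json.loads((repo / module / entry / ".cm" / "meta.json").read_text())


def list_entries(repo: Path, module: str) -> list:
    """List entries of a module, i.e. sub-directories that carry a meta.json."""
    module_dir = repo / module
    return sorted(
        p.name for p in module_dir.iterdir()
        if (p / ".cm" / "meta.json").is_file()
    )


def demo() -> dict:
    """Build a tiny throw-away repository and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        # 1st level: module name; 2nd level: entry name; then the meta file.
        add_entry(repo, "dataset", "my-new-dataset", {"tags": ["dataset", "image"]})
        add_entry(repo, "program", "image-classification", {"tags": ["program"]})
        return {
            "datasets": list_entries(repo, "dataset"),
            "meta": load_meta(repo, "dataset", "my-new-dataset"),
        }
```

The original artifact (sources, images, logs) would sit next to `.cm` inside the entry directory, so the meta description travels with the files it describes.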
Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:
$ sudo pip install ck
$ ck pull repo:ck-autotuning
$ ck add dataset:my-new-dataset (UID will be automatically generated)
$ ck compile program:cbench-automotive-susan
$ ck run program:cbench-automotive-susan
https://github.com/ctuning/ck/wiki/Shared-modules
We already converted multiple AI frameworks, artifacts and workflows to the CK
Ad-hoc experimental workflows:
ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
Databases, local repositories
Ad-hoc init scripts
Ad-hoc scripts to process CSV, XLS, TXT, etc.
Program
CK program pipeline:
CK compiler
CK AI framework (Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA)
CK math library (MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast)
CK experiment (stat. analysis, predictive analytics, visualization)
• github.com/dividiti/ck-caffe
• github.com/ctuning/ck-caffe2
• github.com/ctuning/ck-tensorflow
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck compile program:caffe-classification
$ ck run program:caffe-classification
https://github.com/ctuning/ck/wiki/Shared-repos
CK program pipeline with a unified JSON API:
Unified API (input)
Read program meta
Detect all software dependencies; ask the user if multiple versions exist
Prepare environment
Compile program
Run program
Unified API (output)
CK program module can automatically adapt
to underlying environment via dependencies
Source files and auxiliary scripts
CK program entry (native directory)
.cm/meta.json: describes soft dependencies, data sets, and how to compile and run this program
CK entries associated with a given module describe a given object
using meta.json while storing all necessary files and sub-directories
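The pipeline steps above (read meta, resolve dependencies, prepare environment, compile, run) can be sketched as one dict-in, dict-out function. Everything here is an illustrative assumption rather than CK code: the meta keys (`deps`, `compile_cmd`, `run_cmd`), the `bin` field and the `return`/`error` output convention are hypothetical, and the commands are only assembled as strings, not executed.

```python
def resolve_deps(required, installed, choose=None):
    """Match each required tag list against installed software entries."""
    resolved = {}
    for name, tags in required.items():
        candidates = [s for s in installed if set(tags) <= set(s["tags"])]
        if not candidates:
            return {"return": 1, "error": "no software found for dependency '%s'" % name}
        if len(candidates) > 1 and choose is not None:
            # Several versions match: delegate the choice (e.g. ask the user).
            resolved[name] = choose(name, candidates)
        else:
            resolved[name] = candidates[0]
    return {"return": 0, "deps": resolved}


def pipeline(inp):
    """Unified input dict -> unified output dict."""
    meta = inp["meta"]                                   # read program meta
    r = resolve_deps(meta["deps"], inp["installed"], inp.get("choose"))
    if r["return"] > 0:
        return r
    env = {name: soft["bin"] for name, soft in r["deps"].items()}  # prepare environment
    compile_cmd = meta["compile_cmd"].format(**env)      # command that would compile
    return {"return": 0, "env": env, "compile_cmd": compile_cmd, "run_cmd": meta["run_cmd"]}


# Hypothetical inputs: two GCC versions are installed side by side.
INSTALLED = [
    {"tags": ["compiler", "gcc", "v5.4"], "bin": "gcc-5"},
    {"tags": ["compiler", "gcc", "v7.0"], "bin": "gcc-7"},
]
META = {
    "deps": {"compiler": ["compiler", "gcc"]},
    "compile_cmd": "{compiler} -O3 main.c -o a.out",
    "run_cmd": "./a.out",
}
```

Calling `pipeline({"meta": META, "installed": INSTALLED, "choose": lambda name, c: c[-1]})` picks the last matching compiler; without `choose` the first match is used.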
Automatically adapting workflow to any underlying software and hardware
local / env / 03ca0be16962f471 / env.sh
Tags: compiler,cuda,v8.0
local / env / 0a5ba198d48e3af3 / env.bat
Tags: lib,blas,cublas,v8.0
Soft entries in CK describe how to detect if a given software is
already installed, how to set up all its environment including
all paths (to binaries, libraries, include, aux tools, etc),
and how to detect its version
$ ck detect soft --tags=compiler,cuda
$ ck detect soft:compiler.gcc
$ ck detect soft:compiler.llvm
$ ck list soft:compiler*
$ ck detect soft:lib.cublas
Env entries are created in CK local repo for all found software
instances together with their meta and an auto-generated environment
script env.sh (on Linux) or env.bat (on Windows)
Package entries describe how to install a given software if it is not already installed (using CK Python
plugin together with install.sh script on Linux host or install.bat
on Windows host)
$ ck install package:caffemodel-bvlc-googlenet
$ ck install package:imagenet-2012-val
$ ck install package:lib-tensorflow-cuda
$ ck list package:*caffemodel*
Local CK repo
$ ck search soft --tags=blas
$ ck show env
$ ck show env --tags=cublas
$ ck rm env:* --tags=cublas
$ ck search package --tags=caffe
$ ck list package:*tensorflow*
$ ck install package:lib-caffe-bvlc-master-cuda-universal
https://github.com/ctuning/ck/wiki/Portable-workflows
Multiple versions of tools can easily co-exist and be plugged into CK workflows!
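To make the idea of an auto-generated environment script concrete, here is a minimal sketch that turns one detected software entry into `env.sh` or `env.bat` text. The variable naming scheme (`CK_ENV_...`), the `path` field and the `bin`/`lib` sub-directories are assumptions chosen for illustration; the scripts that CK actually generates may look different.

```python
def make_env_script(entry: dict, windows: bool = False) -> str:
    """Render a shell (or batch) script exporting paths for one software entry."""
    # Hypothetical naming: derive a prefix from the first two tags.
    prefix = "CK_ENV_" + "_".join(entry["tags"][:2]).upper()
    variables = {
        prefix: entry["path"],
        prefix + "_BIN": entry["path"] + "/bin",
        prefix + "_LIB": entry["path"] + "/lib",
    }
    if windows:
        lines = ["@echo off"] + ["set {}={}".format(k, v) for k, v in variables.items()]
    else:
        lines = ["#!/bin/bash"] + ['export {}="{}"'.format(k, v) for k, v in variables.items()]
    return "\n".join(lines) + "\n"


# Hypothetical entry mirroring the tags shown on the slide.
CUDA_ENTRY = {"tags": ["compiler", "cuda", "v8.0"], "path": "/usr/local/cuda-8.0"}
```

Because each detected version gets its own entry and its own script, a workflow can source exactly the script it needs, which is what lets several versions co-exist.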
Applying methodology from natural sciences to optimize computer systems
https://github.com/ctuning/ck/wiki/Autotuning
CK Python modules (wrappers) with a unified JSON API
CK input (JSON/dict) → Action → CK output (JSON/dict)
Unified input: Behavior, Choices, Features, State
Unified output: Behavior, Choices, Features, State
b = B(c, f, s)
Formalized function B of the behavior b of any CK object, given its choices c, features f and state s
Flattened CK JSON vectors (dict converted to vector)
to simplify statistical analysis, machine learning and data mining
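A minimal sketch of such flattening is shown below. The key syntax (`#` between dictionary keys, `@` before list indices) is an assumption made for this example and is not claimed to match CK's exact format; the experiment values are invented.

```python
def flatten(obj, prefix="#"):
    """Convert a nested dict/list into a flat {key_path: scalar} dict."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, prefix + "#" + str(key)))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, prefix + "@" + str(index)))
    else:
        flat[prefix] = obj  # leaf: a number, string, bool or None
    return flat


def to_vector(flat, keys):
    """Pick selected flat keys in a fixed order; missing keys become None."""
    return [flat.get(k) for k in keys]


# Hypothetical experiment record with choices and measured behavior.
EXPERIMENT = {
    "choices": {"batch_size": 2, "threads": 4},
    "characteristics": {"time": [0.5, 0.7]},
}
```

A fixed key order turns every experiment into a row of the same width, which is the form most statistical and machine-learning tools expect.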
Some actions
Tools (compilers, profilers, etc.)
Generated files
Chain CK modules to implement research workflows such as multi-objective autotuning and co-design:
Choose exploration strategy
Perform SW/HW DSE (math transforms, skeleton params, compiler flags, transformations …)
Perform stat. analysis
Detect (Pareto) frontier
Model behavior, predict optimizations
Reduce complexity
CK program module with pipeline function:
Set environment for a given tool version
Compile program
Run code
First expose coarse-grained high-level choices, features, system state and behavior characteristics.
Crowdsource benchmarking and random exploration across diverse inputs and devices.
Keep the best species (AI/SW/HW choices); model behavior; predict better optimizations and designs.
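"Keeping the best species" on a multi-objective frontier can be illustrated with a small Pareto filter. This is a generic sketch, not the CK implementation; both objectives are assumed to be minimized, and the (time, cost) points are made up for the example.

```python
def dominates(a, b):
    """True if point a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_frontier(points):
    """Keep only points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (seconds per image, device price in euros) pairs.
POINTS = [(0.5, 300), (1.0, 100), (0.8, 400), (2.0, 90), (1.5, 150)]
FRONTIER = pareto_frontier(POINTS)
```

Here `(0.8, 400)` is dropped because `(0.5, 300)` is both faster and cheaper, and `(1.5, 150)` is dropped because `(1.0, 100)` beats it on both axes.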
Prepare first proof-of-concept community experiments
Algorithms: object classification, object detection
AI frameworks: Caffe CPU, Caffe OpenCL, TensorFlow CPU
Math libraries: OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN, Eigen, gemmlowp
Compilers: GCC 5+
Models: AlexNet, GoogleNet, VGG, ResNet, SqueezeNet, SqueezeDet, SSD
Datasets: KITTI, COCO, VOC, ImageNet
Optimization choices: batch size, number of CPU threads
Characteristics: total execution time (including OpenCL overheads), top1/top5 model accuracy, static model size (MB), device cost, max power consumption (if available)
System state: CPU/GPU frequency, memory
cKnowledge.org/repo
Crowdsource benchmarking across Android devices provided by volunteers
Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
The number of distinct participating platforms: 800+
The number of distinct CPUs: 260+
The number of distinct GPUs: 110+
The number of distinct OS: 280+
Power range: 1-10W
No need for a dedicated and expensive cloud –
volunteers help us validate research ideas
similar to SETI@home
Also collecting real images from users for misclassifications to build an open
and continuously updated training set!
Winning solutions on various frontiers
[Plot: time per image (seconds) vs. cost (euros)]
Firefly-RK3399
Let's dig further – (crowdsource) BLAS autotuning in Caffe on Firefly-RK3399
Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom), Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti)
Name    Description                                         Ranges
KWG     2D tiling at workgroup level                        {32, 64}
KWI     KWG kernel-loop can be unrolled by a factor KWI     {1}
MDIMA   Local Memory Re-shape                               {4, 8}
MDIMC   Local Memory Re-shape                               {8, 16, 32}
MWG     2D tiling at workgroup level                        {32, 64, 128}
NDIMB   Local Memory Re-shape                               {8, 16, 32}
NDIMC   Local Memory Re-shape                               {8, 16, 32}
NWG     2D tiling at workgroup level                        {16, 32}
SA      Manual caching using the local memory               {0, 1}
SB      Manual caching using the local memory               {0, 1}
STRM    Striding within single thread for matrix A and C    {0, 1}
STRN    Striding within single thread for matrix B          {0, 1}
VWM     Vector width for loading A and C                    {8, 16}
VWN     Vector width for loading B                          {0, 1}
Tunable parameters of OpenCL-based BLAS ( github.com/CNugteren/CLBlast )
For now only two data sets (small & large)
Some extra constraints to avoid illegal combinations
Use different autotuners under CK to speed up
design space exploration based on probabilistic
focused search, genetic algorithms,
deep learning, SVM, KNN, MARS, decision trees …
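The parameter table above defines 41,472 combinations before any constraints are applied, which is why a search strategy is needed. The sketch below encodes the ranges from the table and runs a plain random search. The constraint in `is_legal` and the `synthetic_cost` function are placeholders introduced for this example (assumptions, not taken from the slides); a real autotuner would time the generated GEMM kernel on the device instead.

```python
import math
import random

# Ranges copied from the table of tunable parameters.
SPACE = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}


def space_size(space):
    """Number of combinations before constraints are applied."""
    return math.prod(len(values) for values in space.values())


def is_legal(cfg):
    """Illustrative constraint only: the tile must split evenly across threads and vector width."""
    return cfg["MWG"] % (cfg["MDIMC"] * cfg["VWM"]) == 0


def synthetic_cost(cfg):
    """Stand-in for a measured kernel time (lower is better); purely made up."""
    return abs(cfg["MWG"] - 64) + abs(cfg["NWG"] - 32) + 10 * (1 - cfg["SA"])


def random_search(space, cost, trials=500, seed=0):
    """Sample random configurations, skip illegal ones, keep the cheapest."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        if not is_legal(cfg):
            continue
        c = cost(cfg)
        if c < best_cost:
            best, best_cost = cfg, c
    return best, best_cost
```

Swapping `random_search` for a model-guided strategy only changes how the next `cfg` is proposed; the space description, the legality check and the cost measurement stay the same.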
Let's dig further – autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399
• Caffe with autotuned OpenBLAS (threads and batches) is the fastest
• Caffe with autotuned CLBlast is 6..7x faster than the default version and competitive with the OpenBLAS-based version; now worth making adaptive selection at run-time.
Sharing results in a reproducible way with the community for validation and improvement: https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
• Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using CK framework
• Share more artifacts, workflows and results in a reusable and customizable CK format (common JSON API and meta description)
• Collaboratively improve models and find missing features
• Gradually expose more design and optimization knobs at all AI/SW/HW levels
• Enable distributed on-line learning for self-optimizing and self-learning systems
http://cKnowledge.org/partners http://cKnowledge.org/publications
Join the growing Collective Knowledge community!