efficient(concurrency(supportroblkw.com/likamwa2015starfish-slides.pdf · facedetect(img) vision*...

36
Efficient Concurrency Support for Computer Vision Applications Robert LiKamWa Lin Zhong Rice University Starfish 1

Upload: others

Post on 10-Aug-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Efficient  Concurrency  Supportfor  

Computer  Vision  Applications

Robert  LiKamWaLin  Zhong

Rice  University

Starfish

1

Page 2: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

In  the  year  2020...

2

Page 3: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Where did I place my keys?

Oh, I haven't talked to Fred

recently...

3

Remind me to tell Lin to let me graduate!

Continuous  mobile  vision

?????

?????

Page 4: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

4

Continuous  mobile  vision

Insufficient  concurrency  support  for  vision!

Energy  consumption  overwhelms  wearable  battery!

Page 5: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

5

Application Capture  Service

Vision  Headers Vision  

Library

Page 6: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

6

Application Capture  Service

Vision  Headers Scale

Face

Page 7: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

VisionLibrary

7

VisionLibrary

VisionLibrary

We  need  efficient  concurrency  support!

Camera  is  overworkedComputation  is  overworked

200+  ms

200+  ms

200+  ms

Page 8: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

8

VisionLibrary

VisionLibrary

VisionLibrary

Observations:• Vision  apps  utilize  common  vision  library

Page 9: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

9

Face

SURF

Scale

Scale

Scale

Face

Observations:• Vision  apps  utilize  common  vision  library• Capture/Computation  is  redundant across  apps

common  primitives

Page 10: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

10

Face

SURF

Scale

Scale

Scale

Face

Observations:• Vision  apps  utilize  common  vision  library• Capture/Computation  is  redundant across  apps

common  primitives

Page 11: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

11

Share  computations,Reduce  redundancyKey  Idea:

Proposal:  Split-­‐process  design  for  centralized  control  

Scale

SURF

Face

Page 12: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  CoreService

12

Vision  Library

Starfish  LibraryUses  vision  headers  

to  intercept  library  calls

Starfish  Core  ServiceExecutes,  tracks,  and  shareslibrary  call  computations

Vision  LibraryStarfishLibrary

StarfishLibrary

StarfishLibrary

Page 13: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  CoreService

13

Vision  Library

Vision  LibraryStarfishLibrary

StarfishLibrary

StarfishLibrary

StarfishSplit-­‐process  library  solution  for  

efficient,  transparent  multi-­‐app  service

Page 14: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

14

First  CallExecute  call  (20-­‐200  ms)

Store  results

Subsequent  Call(s)Skip  execution (0  ms)

Retrieve  results

Starfish  Core

Vision  LibraryExec.

Function  CacheStorage/Retrieval

Function  Cache  Arg  Search

faceDetect(img)

faceDetect(img)

Vision  LibraryStarfishLibrary

StarfishLibrary

StarfishLibrary

Page 15: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  Core

FunctionCache

15

StarfishShare  computations,Reduce  redundancy

Cache  Search

Vision  Library

Vision  LibraryStarfishLibrary

StarfishLibrary

StarfishLibrary

Page 16: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

16

Timing  optimizations:+  Reuse  "Fresh  Frames”  to  promote  cache  hits+  Delay  call  return  to  encourage  device  sleep

Starfish  Core

FunctionCache

Cache  Search

Vision  Library

Vision  LibraryStarfishLibrary

StarfishLibrary

StarfishLibrary

Page 17: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  Core

FunctionCache

17

StarfishLibrary

StarfishLibrary

StarfishLibrary

Cache  Search

Vision  Library

+  Promote  sharing  by  reusing  "Fresh  Frames”+  Maintain  code  privacy  by  delaying  call  return+  Manage  concurrent  requests  through  fine-­‐grained  locks

Mitigating  Split-­‐Process  Overhead

Page 18: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  Core

18

StarfishLibrary

StarfishLibrary

StarfishLibrary

FunctionCache

Cache  Search

Vision  Library

Reduce  expense  of  argument  passing

Avoid  library  object  modification

Page 19: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Split-­‐Process  in  the  Literature

19

Zero-­‐copyRequire  library  redesign  

Object  trackingOptimized  for  code  offload

• Object  Redefinition• Lightweight  RPC• SUN  RPC

• Cloud  Transfer  • COMET• MAUI• CloneCloud

Page 20: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Split-­‐Process  Argument  Passing

20

faceDetect( )

Goal:  Minimize  deep  copy  overhead

3.5  ms

Shared  MemoryStarfish  Library Starfish  Core

deep  copyis  expensive  

Vision  Library

Execution

Page 21: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

1)  Protected  shallow  copy

21

Issue  copy-­‐on-­‐write  (mprotect)  on  received  objects

faceDetect( )

Shared  MemoryStarfish  Library Starfish  Core

Vision  Library

Execution

CoW

CoW

Page 22: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

2)  Direct  output  marshalling

22

Write  new  data  directly into  shared  memory

faceDetect( )

Shared  MemoryStarfish  Library Starfish  Core

Vision  Library

Execution

CoW

CoW malloc()

Page 23: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

3)  Reuse  arguments  from  prior  calls

23

Track,  reuse  previous  inputs/outputs

faceDetect( )

Shared  MemoryStarfish  Library Starfish  Core

Vision  Library

Execution

CoW

CoW malloc()

Argument  Table (img,  shm_ptr)

Page 24: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

(Usually)  Zero-­‐Copy  Argument  Passing

24

+  Reduce  deep  copies  through  argument  reuse+  Reduce  allocation  through  buffer  reuse  

faceDetect( )

Shared  MemoryStarfish  Library Starfish  Core

Vision  Library

Execution

CoW

CoW malloc()

Argument  Table (img,  shm_ptr)

Page 25: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  Optimizations• Share  Computations• Maintain  expectations    of  developers  &  users

• Increase  sharing  efficiency  by  relaxing  timing

• Decrease  redundancy• Reduce  argument  copy• Reuse  memory  buffers

25

Page 26: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

26

Experimental  Platform

Benchmarks:1)  Per-­‐call  micro-­‐benchmarks2)  Multi-­‐app  benchmarks

OMAP4430  dual-­‐core  Cortex-­‐A9  pinned  to  600  MHz

OpenCV  +  Android  +  Google  Glass  Monsoon  Power  Monitor

Page 27: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

27

First  CallNative : 21  ms

Unopt.  Starfish : 42  ms  Starfish : 24ms

Memory  optimizations  cut  Starfish  overhead  in  half

0" 5" 10" 15" 20" 25"

Receive&Outputs&

Prepare&Outputs&

Allocate&Outputs&

Exec.&Function&

Search&Cache&

Send&Inputs&

Prepare&Inputs&

Execution&time&(ms)&

Library'call'execution:'resize()Native&

Starfish&Unopt.&

Starfish&

Page 28: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

28

First  CallNative : 21  ms

Unopt.  Starfish : 42  ms  Starfish : 24ms

Second  CallNative : 21  ms

Unopt.  Starfish : 10  ms  Starfish : 6  ms  

Starfish  works  well  whennative  function  execution  >>  5  ms

0" 5" 10" 15" 20" 25"

Receive&Outputs&

Prepare&Outputs&

Allocate&Outputs&

Exec.&Function&

Search&Cache&

Send&Inputs&

Prepare&Inputs&

Execution&time&(ms)&

Library'call'execution:'resize()Native&

Starfish&Unopt.&

Starfish&

Memory  optimizations  cut  Starfish  overhead  in  half

Page 29: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Places  where  Starfish  fails• 5-­‐6  ms  performance  overhead  per-­‐call:– Bad  for  fast  computations– Bad  with  large,  deep  arguments

• Non-­‐cacheable  functions– Random,  temporal,  external  dependencies– Functions  with  specific  parameters

29

But  great  for  many  high-­‐level  functions:Face  detect,  Corner  detect,  Image  resize

Page 30: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  vs.  Multi-­‐App  Workload

30

Scale FaceDetect

FaceRecog

If  Lin  then  ThatSocial  LoggerFacebookGoogle+TwitterMySpaceWhatsapp

...

0.3  FPS

Page 31: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

0"0.5"1"

1.5"2"

2.5"3"

3.5"4"

4.5"5"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

Frame&Rate&(FPS)&

Number&of&App&Instances&

Na/ve" Starfish"

When  running  multiple  apps,  Starfish  achieves  higher  performance

31

Higher  is  

better

Page 32: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

When  running  multiple  apps,Starfish  draws  single-­‐app  power

32

0"

500"

1000"

1500"

2000"

2500"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

Power&Consump-on&

(mW)&

Number&of&App&Instances&

Na.ve" Starfish"

Loweris  

better

Page 33: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

When  running  multiple  apps,Starfish  draws  single-­‐app  power

33

0"

500"

1000"

1500"

2000"

2500"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

Power&Consump-on&

(mW)&

Number&of&App&Instances&

Na.ve" Starfish"

Starfish  achieves  efficient  concurrency  support!  

Page 34: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish  Core

FunctionCache

34

StarfishShare  computations,Reduce  redundancy

Cache  Search

Vision  Library

Vision  LibraryStarfishLibrary

StarfishLibrary

StarfishLibrary

Page 35: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

35

Continuingmobile  vision

MitigatingAnalog

Signal  ChainBandwidth

Preserving

User/Subject

Privacy

DesigningEfficientSystems

Page 36: Efficient(Concurrency(Supportroblkw.com/likamwa2015starfish-slides.pdf · faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core Function Cache

Starfish:  Share  computations,  Reduce  redundancy

36

Transparent,  efficient  library  call  caching• Timing  optimizations  for  

frame  freshness  and  performance  preservation• Memory  optimizations for  

minimal  deep  copy  and  small  cache  footprint

Google  Glass  Experiments:• Low  overhead  

Memory  optimizations  slash  overhead  in  half• Reduced  power  draw

Multi-­‐app  workloads  draw  Single  app    power