efficient(concurrency( facedetect(img) vision* library starfish starfish library starfish library....

Download Efficient(Concurrency( faceDetect(img) Vision* Library Starfish Starfish Library Starfish Library. Starfish*Core

Post on 10-Aug-2020

0 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Efficient  Concurrency  Support for  

    Computer  Vision  Applications

    Robert  LiKamWa Lin  Zhong

    Rice  University

    Starfish

    1

  • In  the  year  2020...

    2

  • Where did I place my keys?

    Oh, I haven't talked to Fred

    recently...

    3

    Remind me to tell Lin to let me graduate!

    Continuous  mobile  vision

    ??? ??

    ??? ??

  • 4

    Continuous  mobile  vision

    Insufficient  concurrency  support  for  vision!

    Energy  consumption  overwhelms  wearable  battery!

  • 5

    Application Capture  Service

    Vision   Headers Vision  

    Library

  • 6

    Application Capture  Service

    Vision   Headers Scale

    Face

  • Vision Library

    7

    Vision Library

    Vision Library

    We  need  efficient  concurrency  support!

    Camera  is  overworked Computation  is  overworked

    200+  ms

    200+  ms

    200+  ms

  • 8

    Vision Library

    Vision Library

    Vision Library

    Observations: • Vision  apps  utilize  common  vision  library

  • 9

    Face

    SURF

    Scale

    Scale

    Scale

    Face

    Observations: • Vision  apps  utilize  common  vision  library • Capture/Computation  is  redundant across  apps

    common   primitives

  • 10

    Face

    SURF

    Scale

    Scale

    Scale

    Face

    Observations: • Vision  apps  utilize  common  vision  library • Capture/Computation  is  redundant across  apps

    common   primitives

  • 11

    Share  computations, Reduce  redundancyKey  Idea:

    Proposal:  Split-­‐process  design  for  centralized  control  

    Scale

    SURF

    Face

  • Starfish  Core Service

    12

    Vision   Library

    Starfish  Library Uses  vision  headers  

    to  intercept  library  calls

    Starfish  Core  Service Executes,  tracks,  and  shares library  call  computations

    Vision   Library Starfish Library

    Starfish Library

    Starfish Library

  • Starfish  Core Service

    13

    Vision   Library

    Vision   Library Starfish Library

    Starfish Library

    Starfish Library

    Starfish Split-­‐process  library  solution  for  

    efficient,  transparent  multi-­‐app  service

  • 14

    First  Call Execute  call  (20-­‐200  ms)

    Store  results

    Subsequent  Call(s) Skip  execution (0  ms)

    Retrieve  results

    Starfish  Core

    Vision   Library Exec.

    Function  Cache Storage/Retrieval

    Function  Cache   Arg  Search

    faceDetect(img)

    faceDetect(img)

    Vision   Library Starfish Library

    Starfish Library

    Starfish Library

  • Starfish  Core

    Function Cache

    15

    Starfish Share  computations, Reduce  redundancy

    Cache   Search

    Vision   Library

    Vision   Library Starfish Library

    Starfish Library

    Starfish Library

  • 16

    Timing  optimizations: +  Reuse  "Fresh  Frames”  to  promote  cache  hits +  Delay  call  return  to  encourage  device  sleep

    Starfish  Core

    Function Cache

    Cache   Search

    Vision   Library

    Vision   Library Starfish Library

    Starfish Library

    Starfish Library

  • Starfish  Core

    Function Cache

    17

    Starfish Library

    Starfish Library

    Starfish Library

    Cache   Search

    Vision   Library

    +  Promote  sharing  by  reusing  "Fresh  Frames” +  Maintain  code  privacy  by  delaying  call  return +  Manage  concurrent  requests  through  fine-­‐grained  locks

    Mitigating   Split-­‐Process  Overhead

  • Starfish  Core

    18

    Starfish Library

    Starfish Library

    Starfish Library

    Function Cache

    Cache   Search

    Vision   Library

    Reduce  expense  of  argument  passing

    Avoid  library  object  modification

  • Split-­‐Process  in  the  Literature

    19

    Zero-­‐copy Require  library  redesign  

    Object  tracking Optimized  for  code  offload

    • Object  Redefinition • Lightweight  RPC • SUN  RPC

    • Cloud  Transfer   • COMET • MAUI • CloneCloud

  • Split-­‐Process  Argument  Passing

    20

    faceDetect( )

    Goal:  Minimize  deep  copy  overhead

    3.5  ms

    Shared  MemoryStarfish  Library Starfish  Core

    deep  copy is  expensive  

    Vision   Library

    Execution

  • 1)  Protected  shallow  copy

    21

    Issue  copy-­‐on-­‐write  (mprotect)  on  received  objects

    faceDetect( )

    Shared  MemoryStarfish  Library Starfish  Core

    Vision   Library

    Execution

    CoW

    CoW

  • 2)  Direct  output  marshalling

    22

    Write  new  data  directly into  shared  memory

    faceDetect( )

    Shared  MemoryStarfish  Library Starfish  Core

    Vision   Library

    Execution

    CoW

    CoW malloc()

  • 3)  Reuse  arguments  from  prior  calls

    23

    Track,  reuse  previous  inputs/outputs

    faceDetect( )

    Shared  MemoryStarfish  Library Starfish  Core

    Vision   Library

    Execution

    CoW

    CoW malloc()

    Argument   Table (img,  shm_ptr)

  • (Usually)  Zero-­‐Copy  Argument  Passing

    24

    +  Reduce  deep  copies  through  argument  reuse +  Reduce  allocation  through  buffer  reuse  

    faceDetect( )

    Shared  MemoryStarfish  Library Starfish  Core

    Vision   Library

    Execution

    CoW

    CoW malloc()

    Argument   Table (img,  shm_ptr)

  • Starfish  Optimizations • Share  Computations • Maintain  expectations    of   developers  &  users

    • Increase  sharing  efficiency   by  relaxing  timing

    • Decrease  redundancy • Reduce  argument  copy • Reuse  memory  buffers

    25

  • 26

    Experimental  Platform

    Benchmarks: 1)  Per-­‐call  micro-­‐benchmarks 2)  Multi-­‐app  benchmarks

    OMAP4430  dual-­‐core  Cortex-­‐A9   pinned  to  600  MHz

    OpenCV  +  Android  +  Google  Glass   Monsoon  Power  Monitor

  • 27

    First  Call Native : 21  ms

    Unopt.  Starfish : 42  ms   Starfish : 24ms

    Memory  optimizations   cut  Starfish  overhead  in  half

    0" 5" 10" 15" 20" 25"

    Receive&Outputs&

    Prepare&Outputs&

    Allocate&Outputs&

    Exec.&Function&

    Search&Cache&

    Send&Inputs&

    Prepare&Inputs&

    Execution&time&(ms)&

    Library'call'execution:'resize() Native&

    Starfish&Unopt.&

    Starfish&

  • 28

    First  Call Native : 21  ms

    Unopt.  Starfish : 42  ms   Starfish : 24ms

    Second  Call Native : 21  ms

    Unopt.  Starfish : 10  ms   Starfish : 6  ms  

    Starfish  works  well  when native  function  execution  >>  5  ms

    0" 5" 10" 15" 20" 25"

    Receive&Outputs&

    Prepare&Outputs&

    Allocate&Outputs&

    Exec.&Function&

    Search&Cache&

    Send&Inputs&

    Prepare&Inputs&

    Execution&time&(ms)&

    Library'call'execution:'resize() Native&

    Starfish&Unopt.&

    Starfish&

    Memory  optimizations   cut  Starfish  overhead  in  half

  • Places  where  Starfish  fails • 5-­‐6  ms  performance  overhead  per-­‐call: – Bad  for  fast  computations – Bad  with  large,  deep  arguments

    • Non-­‐cacheable  functions – Random,  temporal,  external  dependencies – Functions  with  specific  parameters

    29

    But  great  for  many  high-­‐level  functions: Face  detect,  Corner  detect,  Image  resize

  • Starfish  vs.  Multi-­‐App  Workload

    30

    Scale FaceDetect Face Recog

    If  Lin  then  That Social  Logger Facebook Google+ Twitter MySpace Whatsapp

    ...

    0.3  FPS

  • 0" 0.5" 1"

    1.5" 2"

    2.5" 3"

    3.5" 4"

    4.5" 5"

    1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

    Frame&Rate& (FPS)&

    Number&of&App&Instances&

    Na/ve" Starfish"

    When  running  multiple  apps,   Starfish  achieves  higher  performance

    31

    Higher   is  

    better

  • When  running  multiple  apps, Starfish  draws  single-­‐app  power

    32

    0"

    500"

    1000"

    1500"

    2000"

    2500"

    1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

    Powe