advanced techniques: size | pebble developer retreat 2014
DESCRIPTION
You can find the video recording here: https://www.youtube.com/watch?v=8tOhdUXcSkw
Heiko Behrens and Matthew Hungerford talk about advanced programming techniques for Pebble. This talk focuses on size: how to optimize Pebble apps for code size and heap space, with advice on the use of floating point. The Mandelbrot demo and XKCD app were featured. Day 1 - Video 3A
TRANSCRIPT
SIZE
MATT & HEIKO – DEVELOPER EXPERIENCE ENGINEERS
ON TWITTER AS @MTHUNGERFORD & @HBEHRENS
AVAILABLE MEMORY WHEN DEVELOPING FOR PEBBLE
app memory usage:
=============
Total footprint in RAM: 9596 bytes / ~24kb
Free RAM available (heap): 14980 bytes
> pebble build
[Diagram: the ~24k of app RAM split into Heap and Code + Static Variables]
MEMORY – WHERE GOES WHAT?
[Diagram: RAM regions Stack, Heap, and Code + Static Variables; markers A, B and C tie parts of the code below to these regions]
static TextLayer *text_layer;

static void window_load(Window *window) {
  Layer *window_layer = window_get_root_layer(window);
  GRect bounds = layer_get_bounds(window_layer);

  text_layer = text_layer_create(…);
  text_layer_set_text(text_layer, "Press a button");
  text_layer_set_text_alignment(text_layer, GTextAlignmentCenter);
  layer_add_child(window_layer, text_layer_get_layer(text_layer));
}
AVAILABLE MEMORY WHEN DEVELOPING FOR PEBBLE
Stack (2k): function calls, local variables, arrays of dynamic size
Heap, Code, Static Variables (24k)
processor: 64MHz ARM Cortex-M3 core, 512 kilobytes flash, 128 kilobytes RAM
serial flash: 4 megabytes
bluetooth radio (hidden on back): Bluetooth 2 + BLE
accelerometer
ambient light sensor
magnetometer
display connector
battery
vibrating motor
128K RAM – WHY ONLY 24K FOR MY APP?
[Diagram: the 128k of RAM divided into Kernel (Data, Stacks, Heap), Worker (Stack, Code + Heap) and App (App State, Stack, Code + Heap), labeled with the sizes 84k, 12k, 32k, 2k and 24k]
                                        Your App   Background Worker
RAM    Stack                            2k         1.3k
RAM    Heap, Code, Static Variables     24k        10k
Flash  Resources                        96k        N/A
Flash  Persistent Storage               4k
Phone  JavaScript or Native Code        ∞          N/A
Heap: malloc/free and *_create/*_destroy
Resources: gbitmap_create_with_resource, resource_load (e.g. strings on CalTrain)
Persistent Storage: persist_read_data, persist_write_data
Phone (JavaScript or Native Code): app_message_register_inbox_received
CODE SIZE
GUESS THE SIZE
uses_int_mul():
  a4: 2303       movs r3, #3
  a6: 4358       muls r0, r3
  a8: f7ff bff6  b.w 98 <use_value>
static void uses_int_mul(uint8_t i) {
  use_value(i * 3);
}
app memory usage:
=============
Total footprint in RAM: 266 bytes / ~24kb
8 BYTES
> arm-none-eabi-objdump -dslw build/pebble-app.elf
INTRODUCING DOUBLES
i * 1.5
GUESS THE SIZE
static void uses_doubles(uint8_t i) { use_value(i * 1.5); }
app memory usage:
=============
Total footprint in RAM: 2294 bytes / ~24kb
32 BYTES
uses_doubles():
  878: b508       push {r3, lr}
  87a: f7ff fd5f  bl 33c <__aeabi_i2d>
  87e: 2200       movs r2, #0
  880: 4b04       ldr r3, [pc, #16] ; (894 <uses_doubles2+0x1c>)
  882: f7ff fdc1  bl 408 <__aeabi_dmul>
  886: f7ff ffd1  bl 82c <__aeabi_d2uiz>
  88a: e8bd 4008  ldmia.w sp!, {r3, lr}
  88e: f7ff bfed  b.w 86c <use_value>
  892: bf00       nop
  894: 3ff80000   .word 0x3ff80000
34 BYTES
__aeabi_i2d():
  33c: f090 0f00  teq r0, #0
  340: bf04       itt eq
  342: 2100       moveq r1, #0
  344: 4770       bxeq lr
  346: b530       push {r4, r5, lr}
  348: f44f 6480  mov.w r4, #1024 ; 0x400
  34c: f104 0432  add.w r4, r4, #50 ; 0x32
  350: f010 4500  ands.w r5, r0, #2147483648 ; 0x80000000
  354: bf48       it mi
  356: 4240       negmi r0, r0
  358: f04f 0100  mov.w r1, #0
  35c: e73e       b.n 1dc <__adddf3+0x138>
  35e: bf00       nop
a4: b530 push {r4, r5, lr} a6: ea4f 0441 mov.w r4, r1, lsl #1 aa: ea4f 0543 mov.w r5, r3, lsl #1 ae: ea94 0f05 teq r4, r5 b2: bf08 it eq b4: ea90 0f02 teqeq r0, r2 b8: bf1f itttt ne ba: ea54 0c00 orrsne.w ip, r4, r0 be: ea55 0c02 orrsne.w ip, r5, r2 c2: ea7f 5c64 mvnsne.w ip, r4, asr #21 c6: ea7f 5c65 mvnsne.w ip, r5, asr #21 ca: f000 80e2 beq.w 292 <__adddf3+0x1ee> ce: ea4f 5454 mov.w r4, r4, lsr #21 d2: ebd4 5555 rsbs r5, r4, r5, lsr #21 d6: bfb8 it lt d8: 426d neglt r5, r5 da: dd0c ble.n f6 <__adddf3+0x52> dc: 442c add r4, r5 de: ea80 0202 eor.w r2, r0, r2 e2: ea81 0303 eor.w r3, r1, r3 e6: ea82 0000 eor.w r0, r2, r0 ea: ea83 0101 eor.w r1, r3, r1 ee: ea80 0202 eor.w r2, r0, r2 f2: ea81 0303 eor.w r3, r1, r3 f6: 2d36 cmp r5, #54 ; 0x36 f8: bf88 it hi fa: bd30 pophi {r4, r5, pc} fc: f011 4f00 tst.w r1, #2147483648 ; 0x80000000 100: ea4f 3101 mov.w r1, r1, lsl #12 104: f44f 1c80 mov.w ip, #1048576 ; 0x100000 108: ea4c 3111 orr.w r1, ip, r1, lsr #12 10c: d002 beq.n 114 <__adddf3+0x70> 10e: 4240 negs r0, r0 110: eb61 0141 sbc.w r1, r1, r1, lsl #1 114: f013 4f00 tst.w r3, #2147483648 ; 0x80000000 118: ea4f 3303 mov.w r3, r3, lsl #12 11c: ea4c 3313 orr.w r3, ip, r3, lsr #12 120: d002 beq.n 128 <__adddf3+0x84> 122: 4252 negs r2, r2 124: eb63 0343 sbc.w r3, r3, r3, lsl #1 128: ea94 0f05 teq r4, r5 12c: f000 80a7 beq.w 27e <__adddf3+0x1da> 130: f1a4 0401 sub.w r4, r4, #1 134: f1d5 0e20 rsbs lr, r5, #32 138: db0d blt.n 156 <__adddf3+0xb2> 13a: fa02 fc0e lsl.w ip, r2, lr 13e: fa22 f205 lsr.w r2, r2, r5 142: 1880 adds r0, r0, r2 144: f141 0100 adc.w r1, r1, #0 148: fa03 f20e lsl.w r2, r3, lr 14c: 1880 adds r0, r0, r2 14e: fa43 f305 asr.w r3, r3, r5 152: 4159 adcs r1, r3 154: e00e b.n 174 <__adddf3+0xd0> 156: f1a5 0520 sub.w r5, r5, #32 15a: f10e 0e20 add.w lr, lr, #32 15e: 2a01 cmp r2, #1 160: fa03 fc0e lsl.w ip, r3, lr 164: bf28 it cs 166: f04c 0c02 orrcs.w ip, ip, #2 16a: fa43 f305 asr.w r3, r3, r5 16e: 18c0 adds r0, r0, r3 170: eb51 71e3 adcs.w r1, r1, r3, asr #31 174: 
f001 4500 and.w r5, r1, #2147483648 ; 0x80000000 178: d507 bpl.n 18a <__adddf3+0xe6> 17a: f04f 0e00 mov.w lr, #0 17e: f1dc 0c00 rsbs ip, ip, #0 182: eb7e 0000 sbcs.w r0, lr, r0 186: eb6e 0101 sbc.w r1, lr, r1 18a: f5b1 1f80 cmp.w r1, #1048576 ; 0x100000 18e: d31b bcc.n 1c8 <__adddf3+0x124> 190: f5b1 1f00 cmp.w r1, #2097152 ; 0x200000 194: d30c bcc.n 1b0 <__adddf3+0x10c> 196: 0849 lsrs r1, r1, #1 198: ea5f 0030 movs.w r0, r0, rrx 19c: ea4f 0c3c mov.w ip, ip, rrx 1a0: f104 0401 add.w r4, r4, #1 1a4: ea4f 5244 mov.w r2, r4, lsl #21 1a8: f512 0f80 cmn.w r2, #4194304 ; 0x400000 1ac: f080 809a bcs.w 2e4 <__adddf3+0x240> 1b0: f1bc 4f00 cmp.w ip, #2147483648 ; 0x80000000 1b4: bf08 it eq 1b6: ea5f 0c50 movseq.w ip, r0, lsr #1 1ba: f150 0000 adcs.w r0, r0, #0 1be: eb41 5104 adc.w r1, r1, r4, lsl #20 1c2: ea41 0105 orr.w r1, r1, r5 1c6: bd30 pop {r4, r5, pc} 1c8: ea5f 0c4c movs.w ip, ip, lsl #1 1cc: 4140 adcs r0, r0 1ce: eb41 0101 adc.w r1, r1, r1 1d2: f411 1f80 tst.w r1, #1048576 ; 0x100000 1d6: f1a4 0401 sub.w r4, r4, #1 1da: d1e9 bne.n 1b0 <__adddf3+0x10c> 1dc: f091 0f00 teq r1, #0 1e0: bf04 itt eq 1e2: 4601 moveq r1, r0 1e4: 2000 moveq r0, #0 1e6: fab1 f381 clz r3, r1 1ea: bf08 it eq 1ec: 3320 addeq r3, #32 1ee: f1a3 030b sub.w r3, r3, #11 1f2: f1b3 0220 subs.w r2, r3, #32 1f6: da0c bge.n 212 <__adddf3+0x16e> 1f8: 320c adds r2, #12 1fa: dd08 ble.n 20e <__adddf3+0x16a> 1fc: f102 0c14 add.w ip, r2, #20 200: f1c2 020c rsb r2, r2, #12 204: fa01 f00c lsl.w r0, r1, ip 208: fa21 f102 lsr.w r1, r1, r2 20c: e00c b.n 228 <__adddf3+0x184> 20e: f102 0214 add.w r2, r2, #20 212: bfd8 it le 214: f1c2 0c20 rsble ip, r2, #32 218: fa01 f102 lsl.w r1, r1, r2 21c: fa20 fc0c lsr.w ip, r0, ip 220: bfdc itt le 222: ea41 010c orrle.w r1, r1, ip 226: 4090 lslle r0, r2 228: 1ae4 subs r4, r4, r3 22a: bfa2 ittt ge 22c: eb01 5104 addge.w r1, r1, r4, lsl #20 230: 4329 orrge r1, r5 232: bd30 popge {r4, r5, pc} 234: ea6f 0404 mvn.w r4, r4 238: 3c1f subs r4, #31 23a: da1c bge.n 276 <__adddf3+0x1d2> 23c: 
340c adds r4, #12 23e: dc0e bgt.n 25e <__adddf3+0x1ba> 240: f104 0414 add.w r4, r4, #20 244: f1c4 0220 rsb r2, r4, #32 248: fa20 f004 lsr.w r0, r0, r4 24c: fa01 f302 lsl.w r3, r1, r2 250: ea40 0003 orr.w r0, r0, r3 254: fa21 f304 lsr.w r3, r1, r4 258: ea45 0103 orr.w r1, r5, r3 25c: bd30 pop {r4, r5, pc} 25e: f1c4 040c rsb r4, r4, #12 262: f1c4 0220 rsb r2, r4, #32 266: fa20 f002 lsr.w r0, r0, r2 26a: fa01 f304 lsl.w r3, r1, r4 26e: ea40 0003 orr.w r0, r0, r3 272: 4629 mov r1, r5 274: bd30 pop {r4, r5, pc} 276: fa21 f004 lsr.w r0, r1, r4 27a: 4629 mov r1, r5 27c: bd30 pop {r4, r5, pc} 27e: f094 0f00 teq r4, #0 282: f483 1380 eor.w r3, r3, #1048576 ; 0x100000 286: bf06 itte eq 288: f481 1180 eoreq.w r1, r1, #1048576 ; 0x100000 28c: 3401 addeq r4, #1 28e: 3d01 subne r5, #1 290: e74e b.n 130 <__adddf3+0x8c> 292: ea7f 5c64 mvns.w ip, r4, asr #21 296: bf18 it ne 298: ea7f 5c65 mvnsne.w ip, r5, asr #21 29c: d029 beq.n 2f2 <__adddf3+0x24e> 29e: ea94 0f05 teq r4, r5 2a2: bf08 it eq 2a4: ea90 0f02 teqeq r0, r2 2a8: d005 beq.n 2b6 <__adddf3+0x212> 2aa: ea54 0c00 orrs.w ip, r4, r0 2ae: bf04 itt eq 2b0: 4619 moveq r1, r3 2b2: 4610 moveq r0, r2 2b4: bd30 pop {r4, r5, pc} 2b6: ea91 0f03 teq r1, r3 2ba: bf1e ittt ne 2bc: 2100 movne r1, #0 2be: 2000 movne r0, #0 2c0: bd30 popne {r4, r5, pc} 2c2: ea5f 5c54 movs.w ip, r4, lsr #21 2c6: d105 bne.n 2d4 <__adddf3+0x230> 2c8: 0040 lsls r0, r0, #1 2ca: 4149 adcs r1, r1 2cc: bf28 it cs 2ce: f041 4100 orrcs.w r1, r1, #2147483648 ; 0x80000000 2d2: bd30 pop {r4, r5, pc} 2d4: f514 0480 adds.w r4, r4, #4194304 ; 0x400000 2d8: bf3c itt cc 2da: f501 1180 addcc.w r1, r1, #1048576 ; 0x100000 2de: bd30 popcc {r4, r5, pc} 2e0: f001 4500 and.w r5, r1, #2147483648 ; 0x80000000 2e4: f045 41fe orr.w r1, r5, #2130706432 ; 0x7f000000 2e8: f441 0170 orr.w r1, r1, #15728640 ; 0xf00000 2ec: f04f 0000 mov.w r0, #0 2f0: bd30 pop {r4, r5, pc} 2f2: ea7f 5c64 mvns.w ip, r4, asr #21 2f6: bf1a itte ne 2f8: 4619 movne r1, r3 2fa: 4610 movne r0, r2 2fc: ea7f 5c65 
mvnseq.w ip, r5, asr #21 300: bf1c itt ne 302: 460b movne r3, r1 304: 4602 movne r2, r0 306: ea50 3401 orrs.w r4, r0, r1, lsl #12 30a: bf06 itte eq 30c: ea52 3503 orrseq.w r5, r2, r3, lsl #12 310: ea91 0f03 teqeq r1, r3 314: f441 2100 orrne.w r1, r1, #524288 ; 0x80000 318: bd30 pop {r4, r5, pc} __adddf3(): 31a: bf00 nop
360 BYTES
__adddf3():
AVOID FLOATS WHERE POSSIBLE
Replace i * 1.5 with (i * 3) / 2
GUESS THE SIZE
static void uses_int_formula(uint8_t i) { use_value((i * 3) / 2); }
app memory usage:
=============
Total footprint in RAM: 268 bytes / ~24kb
10 BYTES
uses_int_formula():
  a4: 2303       movs r3, #3
  a6: 4358       muls r0, r3
  a8: 0840       lsrs r0, r0, #1
  aa: f7ff bff5  b.w 98 <use_value>
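The integer formula is a safe substitute here because, for an unsigned 8-bit input, truncating i * 1.5 and computing (i * 3) / 2 in integer arithmetic give the same value. A minimal host-side sketch that checks this for every possible input; the helper names are hypothetical and not taken from the talk's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the double-based version (needs softfloat on a Cortex-M3). */
static unsigned scale_with_double(uint8_t i) {
  return (unsigned)(i * 1.5);
}

/* Hypothetical helper: the integer-only formula from the slide. */
static unsigned scale_with_int(uint8_t i) {
  return (unsigned)((i * 3) / 2);
}

/* Returns true if both versions agree for all 256 possible inputs. */
static bool formulas_agree(void) {
  for (int i = 0; i <= UINT8_MAX; i++) {
    if (scale_with_double((uint8_t)i) != scale_with_int((uint8_t)i)) {
      return false;
    }
  }
  return true;
}
```

The check only covers uint8_t inputs; for wider or signed types the rounding and overflow behavior would need to be looked at again.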
CASE STUDY
CASE STUDY: MANDELBROT FRACTALS
Implements the recursive fractal algorithm provided at rosettacode.org/wiki/Mandelbrot_set
https://github.com/mhungerford/pebble-mandelbrot-generator
FLOATING POINT MATH
Not supported in hardware on some ARM processors, including the Cortex-M3 used by Pebble
All floating point operations (*, /, +, -, sin, cos, tan) must be emulated in software
source: http://en.wikipedia.org/wiki/IEEE_754-1985
FIXED POINT MATH
In fixed point, we place the decimal point at a fixed position: everything above it is the integer part, and everything below it is the fractional part.
With a 50/50 split, half of the bits go to the fraction, so we lose half of the bits that would otherwise extend our range.
source: http://wiki.nycresistor.com/wiki/GB101:Fixed_point_math
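As a rough illustration of a 50/50 split, the sketch below uses a 32-bit integer with 16 integer bits and 16 fraction bits. The type and function names are assumptions made for this example; this is not the sll-math API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16.16 layout: upper 16 bits integer part, lower 16 bits fraction. */
typedef int32_t fix16;
#define FIX16_FRACTION_BITS 16
#define FIX16_ONE ((fix16)1 << FIX16_FRACTION_BITS)

/* Valid for values in the range -32768..32767. */
static fix16 fix16_from_int(int value) {
  return (fix16)(value * FIX16_ONE);
}

/* Builds numerator/denominator (e.g. 3/2 for 1.5) without any floating point. */
static fix16 fix16_from_ratio(int numerator, int denominator) {
  assert(denominator != 0);
  return (fix16)(((int64_t)numerator * FIX16_ONE) / denominator);
}

/* The raw product carries 32 fraction bits, so divide once to get back to 16. */
static fix16 fix16_mul(fix16 a, fix16 b) {
  return (fix16)(((int64_t)a * b) / FIX16_ONE);
}

/* Drops the fraction (truncates toward zero). */
static int fix16_to_int(fix16 value) {
  return value / FIX16_ONE;
}
```

With this layout the integer part only spans about ±32768, while the smallest representable step is 1/65536.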
SLL-MATH (FIXED-POINT)
Fixed-point sll-math stores values in a 64-bit signed long long, so it retains a large range, and it uses some ARM assembly to provide high performance.
https://code.google.com/p/al-lib3d/source/browse/trunk/math-sll.h
SLL-MATH
SOFTFLOAT
FLOATING POINT VS FIXED POINT
http://github.com/mhungerford/pebble-mandelbrot-generator
                              Floating Point (softfloat)   Fixed Point (sll-math)
Code Size                     10k                          2.7k
Seconds Per Frame (not FPS)   20 s                         2 s
MATH WRAPPER
Typically when using fixed-point (in this case SLL-Math), operators can't be used, as C doesn't support operator overloading.
Floating Point:
float rotation = 15.5f;
val = rotation * 0.75f;
Fixed Point:
sll rotation = float2sll(15.5f);
val = sllmul(rotation, float2sll(0.75f));
MATH WRAPPER
For softfloat to work, the compiler emits calls to an external softfloat library, with names such as __aeabi_fmul(float, float).
By using double as a container type and providing our own versions of functions such as __aeabi_dmul, we can essentially treat double as the sll type and keep using the normal operators. In effect, we are “overloading” the operators by replacing the library functions behind them.
Math wrapper:
double rotation = dbl(15.5f);
val = rotation * dbl(0.75f);
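The "double as a container" idea can be sketched on a desktop machine. On the watch the compiler turns * on doubles into a call to __aeabi_dmul, which is what lets the operator syntax keep working; on a host with hardware floating point that hook does not exist, so the sketch below calls a stand-in function explicitly. The 48.16 layout and all function names are assumptions for illustration, not the code from pebble-tinymath.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: a 48.16 fixed-point value stored in the 8 bytes of a double. */
#define FRACTION_BITS 16

_Static_assert(sizeof(double) == sizeof(int64_t), "double must be 64 bits wide");

/* Copies raw fixed-point bits into a double without converting the value. */
static double pack(int64_t raw) {
  double container;
  memcpy(&container, &raw, sizeof container);
  return container;
}

/* Recovers the raw fixed-point bits from the container. */
static int64_t unpack(double container) {
  int64_t raw;
  memcpy(&raw, &container, sizeof raw);
  return raw;
}

/* Hypothetical stand-in for the talk's dbl(): integer in, container out. */
static double dbl_from_int(int value) {
  return pack((int64_t)value * ((int64_t)1 << FRACTION_BITS));
}

/* Hypothetical stand-in for a replacement __aeabi_dmul: a fixed-point
 * multiply on the payloads instead of an IEEE 754 multiply. */
static double wrapped_dmul(double a, double b) {
  return pack((unpack(a) * unpack(b)) / ((int64_t)1 << FRACTION_BITS));
}

/* Reads the integer part back out of the container. */
static int container_to_int(double container) {
  return (int)(unpack(container) / ((int64_t)1 << FRACTION_BITS));
}
```

The container must never reach real floating-point arithmetic or printf("%f"), since its bits are not a meaningful IEEE 754 value.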
MATH WRAPPER
Fractal Tree demo uses *, /, +, -, sin, cos to randomly generate different trees every frame.
            Floating Point (softfloat)   Fixed Point (sll-wrap)
Code Size   9k                           2.7k
FPS         11                           18
https://github.com/mhungerford/pebble-tinymath
CASE STUDY
https://github.com/mhungerford/pebble-mandelbrot-generator
https://github.com/mhungerford/pebble-tinymath
app memory usage:
=============
Total footprint in RAM: 9596 bytes / ~24kb
Free RAM available (heap): 14980 bytes
> pebble build
.bss: count 2 size 8
  src/compass.c: size 4
  src/data_provider.c: size 4
.data: count 3 size 50
  Unknown: size 38
  src/data_provider.c: size 12
.text: count 92 size 11420
  Unknown: size 4882
  src/compass_calibration_window.c: size 2030
  src/compass_window.c: size 1746
  src/data_provider.c: size 1210
  src/ticks_layer.c: size 1022
> pebble analyze-size
THINGS TO AVOID… AFTER MEASURING
long parameter lists
app_log and string literals
inlining
prefer functions over macros / macros over functions
static void __attribute__((noinline)) some_func(void) {}
> pebble analyze-size
> arm-none-eabi-objdump -dslw build/pebble-app.elf
HEAP
ALLOCATING HEAP MEMORY…
Measure available heap space:
heap_bytes_free(void) and heap_bytes_used(void)
SDK functions may require additional heap space:
gpath_draw_filled(GContext* ctx, GPath *path)
Always check the results of:
malloc(size_t size) and *_create(…)
ALLOCATING HEAP MEMORY… AND KNOW THE DETAILS
Unload resources (fonts, bitmaps) where possible
Know the life cycle
LEARN EVEN MORE ABOUT IT
Fri 15:30 to 16:00
Tonight at Meetup
Ron and Grégoire
Best-practices & Dynamically load code from flash
ALWAYS CHECK RESULTS
malloc(), *_create()
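A minimal sketch of what checking results can look like, using only standard C so it runs off the watch. The FrameBuffer type and its functions are hypothetical; the same pattern would apply to the SDK's *_create() calls, which are not shown here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer type used only for this illustration. */
typedef struct {
  size_t size;
  uint8_t *pixels;
} FrameBuffer;

/* Returns NULL when either allocation fails, releasing anything already allocated. */
static FrameBuffer *frame_buffer_create(size_t size) {
  FrameBuffer *buffer = malloc(sizeof *buffer);
  if (buffer == NULL) {
    return NULL;
  }
  buffer->pixels = malloc(size);
  if (buffer->pixels == NULL) {
    free(buffer); /* do not leak the half-built object */
    return NULL;
  }
  buffer->size = size;
  return buffer;
}

/* Safe to call with NULL, so callers can clean up unconditionally. */
static void frame_buffer_destroy(FrameBuffer *buffer) {
  if (buffer != NULL) {
    free(buffer->pixels);
    free(buffer);
  }
}
```

Callers then test the returned pointer before use and fall back (for example by skipping an optional image) when it is NULL.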
STACK
DYNAMIC STACK ALLOCATION
Dynamic stack allocation is a fast and convenient approach to access up to 2k of additional memory.
typedef struct {
  GPoint inner;
  GPoint mid;
  GPoint outer;
} CompassCalibrationWindowHelperPoint;
CompassCalibrationWindowHelperPoint points[num_segments];
for (int s = 0; s < num_segments; s++) {
  ...
compass source: https://github.com/pebble-hacks/pebble-compass
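To get a feel for how much of the 2k stack such an array takes, here is a small host-side sketch of the same pattern. Point is a stand-in for the SDK's GPoint and is assumed to hold two int16_t fields; the function name and the values written into the array are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the SDK's GPoint so the sketch compiles off the watch. */
typedef struct {
  int16_t x;
  int16_t y;
} Point;

typedef struct {
  Point inner;
  Point mid;
  Point outer;
} HelperPoint;

/* Allocates num_segments helper points on the stack, fills them, and returns
 * how many bytes of stack the array itself occupied. */
static size_t fill_helper_points(int num_segments, int radius) {
  if (num_segments <= 0) {
    return 0; /* a variable-length array must have a positive size */
  }
  HelperPoint points[num_segments]; /* released automatically on return */
  for (int s = 0; s < num_segments; s++) {
    points[s].inner = (Point){ (int16_t)s, (int16_t)(radius / 2) };
    points[s].mid   = (Point){ (int16_t)s, (int16_t)radius };
    points[s].outer = (Point){ (int16_t)s, (int16_t)(radius * 2) };
  }
  return sizeof points; /* evaluated at run time for a variable-length array */
}
```

With these assumed types each element takes 12 bytes, so 16 segments use 192 bytes of the stack, and the array size must stay well below the 2k limit because nothing reports a failed stack allocation.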
MEASURE STACK SPACE AT COMPILE TIME
project/wscript
def build(ctx):
    ctx.load('pebble_sdk')
    ctx.env.CFLAGS += ["-fstack-usage"]
    ctx.pbl_program(source=ctx.path.ant_glob('src/**/*.c'), target='pebble-app.elf')
    ctx.pbl_bundle(elf='pebble-app.elf', js=ctx.path.ant_glob('src/js/**/*.js'))
project/build/src/stack.c.7.su
stack.c:9:13:select_click_handler 0 static
stack.c:13:13:up_click_handler 0 static
stack.c:17:13:down_click_handler 0 static
stack.c:21:13:click_config_provider 8 static
stack.c:27:13:window_update_proc 16 static
stack.c:32:13:layer_update_proc 16 static
stack.c:37:13:window_load 32 static
stack.c:51:13:window_unload 0 static
stack.c:72:5:main 40 static
MEASURE USED STACK SPACE AT RUNTIME
static uint32_t stack_initial;

static void window_update_proc(Layer* layer, GContext *ctx) {
  register uint32_t sp __asm__ ("sp");
  APP_LOG(APP_LOG_LEVEL_DEBUG, "stack window_update_proc: %p (%d)", (void*)sp, (int)(stack_initial-sp));
}

static void layer_update_proc(Layer* layer, GContext *ctx) {
  register uint32_t sp __asm__ ("sp");
  APP_LOG(APP_LOG_LEVEL_DEBUG, "stack layer_update_proc: %p (%d)", (void*)sp, (int)(stack_initial-sp));
}

int main(void) {
  register uint32_t sp __asm__ ("sp");
  stack_initial = sp;
  APP_LOG(APP_LOG_LEVEL_DEBUG, "stack 1: %p", (void*)sp);
  init();
  app_event_loop();
  deinit();
}
> pebble logs
[INFO ] D stack.c:76 stack 1: 0x200187c8
[INFO ] D stack.c:29 stack window_update_proc: 0x20018720 (168)
[INFO ] D stack.c:34 stack layer_update_proc: 0x200186e0 (232)
RESOURCES
CASE STUDY
CASE STUDY: XKCD COMIC VIEWER
http://github.com/mhungerford/xkcd_comic
The Pebble SDK converts images in the resources directory to PBI format internally.
PBI is Pebble Bitmap Image: 1 bit per pixel, word-aligned
((160x168) / 8) / 1024 = 3.3k
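The size calculation can be written out as a small sketch. It assumes that "word-aligned" means each row is padded to a multiple of 32 bits, which would explain why a 144-pixel-wide screen is counted as 160 pixels; the function names are hypothetical and any file header is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per row when 1-bit pixels are padded up to a whole 32-bit word. */
static size_t bitmap_row_bytes(uint16_t width_px) {
  return ((size_t)(width_px + 31) / 32) * 4;
}

/* Pixel data size only; any file header is ignored in this sketch. */
static size_t bitmap_data_bytes(uint16_t width_px, uint16_t height_px) {
  return bitmap_row_bytes(width_px) * height_px;
}
```

Under that assumption both a 144x168 and a 160x168 image need 20 bytes per row and 3360 bytes in total, which is the roughly 3.3k quoted on the slide.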
CASE STUDY: XKCD COMIC VIEWER
http://github.com/mhungerford/xkcd_comic
Resources Storage: 96k
PBI (uncompressed bitmap) xkcd.pbi
Resource size: 3.3k (max 29 images)
3.3k in memory
PNG (compressed bitmap) xkcd.PNG
Resource size: 0.9k (max 106 images)
3.3k in memory
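The image counts follow from dividing the 96k of resource storage by the size of one image and rounding down. A tiny sketch of that arithmetic, kept in integers by expressing sizes in tenths of a kilobyte; the function name is made up for the example:

```c
#include <stddef.h>

/* Sizes are passed in tenths of a kilobyte so the division stays in integers. */
static size_t images_that_fit(size_t storage_tenths_k, size_t image_tenths_k) {
  if (image_tenths_k == 0) {
    return 0; /* avoid dividing by zero for an empty image */
  }
  return storage_tenths_k / image_tenths_k;
}
```

For 96k of storage this gives 29 images at 3.3k each and 106 images at 0.9k each, matching the slide.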
PNG SUPPORT
http://github.com/mhungerford/png_demo
uPNG port (single API call):
GBitmap* gbitmap_create_with_png_resource(uint32_t resource_id);
matches the standard SDK call for GBitmap:
GBitmap* gbitmap_create_with_resource(uint32_t resource_id);
CASE STUDY
http://github.com/mhungerford/xkcd_comic
http://github.com/mhungerford/png_demo
EXTERNAL
EXTERNALIZING YOUR DATA
Persistent Storage
App Messages
Pebble Mars
Canvas for Pebble
good source for encoding/loading images: https://github.com/pebble-hacks/pebble-faces
alternative to heap for (dynamic) short-term data
measure and optimize
always check return values
can be abused
can be compressed
“backend” to the watch
Q&A
A: “yes, the slides will be published”