LAS16-307: Benchmarking Schedutil in Android
Steve Muckle
ENGINEERS AND DEVICES
WORKING TOGETHER
Overview
● Creating an Android Perf/Power Suite
● Schedutil Comparison Results
Why is Android Important to Test?
● Maintainers want real-world impact
○ Unit tests/synthetic workloads do not always guarantee this
● By far the dominant mobile OS
○ 66% in the US, ~75% in the top five EU markets, 77% in China
http://venturebeat.com/2016/05/11/android-is-eating-apples-ios-market-share-everywhere/
○ 1.4 billion active devices as of September 2015
http://www.androidcentral.com/google-says-there-are-now-14-billion-active-android-devices-worldwide
● The perf/power balance is especially important in this space
How is Android Usually Profiled?
● Custom internal-only platforms
● Expensive test equipment
● Commercial-quality software stack
○ Mature drivers, userspace
○ Tuned for product-level power and performance
● Carefully chosen benchmarks
○ Relevant workloads
○ Industry acceptance
Advantages of Commercial Entities
● Manpower
● $$$
● Access to internal-only platforms
● Connections
○ to bloggers
○ to authors of benchmarks
■ Note: access to source, ability to influence benchmark (!)
○ to chip vendors/handset vendors
○ to Google
HiKey vs. Internal Platform
● Access to platform
○ HiKey: full community access to hardware, firmware, schematics and sources
○ Typical: no access
● Software stack
○ HiKey: untuned, community-supported
○ Typical: product quality
● Power domain accessibility
○ HiKey: total SoC power
○ Typical: CPU power (individual rails)
● Measurement tools
○ HiKey: do-it-yourself with a sense resistor
○ Typical: professionally calibrated and supported test equipment, thermal chambers
Creating a Test Suite
● Three areas:
○ Performance
○ UX
○ Power
Performance
● Composite system benchmark
○ harder to analyze
○ someone’s perception of the right mixture of tests
○ Licensing :(
○ AnTuTu to start
■ 3D, UX, CPU, RAM
● TODO: I/O, gaming benchmarks
UX: What Is Jank?
● UI ideally runs at 60 fps, i.e. one frame every ~16.7 ms
Why 60 fps: https://www.youtube.com/watch?v=CaMTIgxCSqU
● Dropped or delayed frames = jank
● Cause: the work for a frame took longer than the frame budget
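To make the jank definition concrete, here is a minimal Python sketch. It assumes per-frame render times in milliseconds have already been collected; the collection step is not shown. A frame counts as janky when it exceeds the 60 fps budget of about 16.7 ms. This is a simplification, because Android's own jank counters apply their own criteria, and the function name is illustrative.

```python
# Simplified jank counter: a frame is "janky" if it took longer than the
# 60 fps frame budget. Collecting the frame times is assumed and not shown.

FRAME_BUDGET_MS = 1000.0 / 60  # about 16.67 ms per frame at 60 fps


def janky_percent(frame_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Return the percentage of frames whose duration exceeded budget_ms."""
    if not frame_times_ms:
        return 0.0
    janky = sum(1 for t in frame_times_ms if t > budget_ms)
    return 100.0 * janky / len(frame_times_ms)


if __name__ == "__main__":
    # Hypothetical frame times (ms) from a short fling.
    frames = [8.1, 12.4, 15.9, 17.2, 33.5, 10.0, 16.0, 40.2]
    print(f"{janky_percent(frames):.1f}% janky")  # 3 of 8 frames over budget: 37.5% janky
```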
UX
● recentfling test
○ Part of the Android tree
○ Measures flinging back and forth across the recent apps list
● New tests based on recentfling
○ Email fling
○ Browser fling
○ Gallery fling
Power
● Idle home screen
● MP3 playback
● MPEG-4 video playback
● TODO:
○ Email browsing
○ Web browsing
Automation
● Reduce tedium
● Help with repeatability
○ Fixed-duration power measurements (energy)
○ Capture the same background activity (hopefully)
○ Precise touches and flings
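One way to get precise, repeatable flings is to inject them over adb with Android's `input swipe <x1> <y1> <x2> <y2> [duration_ms]` shell command instead of performing them by hand. The Python sketch below only builds the command lines. The screen size, swipe duration and helper names are hypothetical, and WA's own workloads may drive input differently.

```python
import subprocess


def swipe_argv(x1, y1, x2, y2, duration_ms=100, serial=None):
    """Build the adb command line for one swipe via Android's `input swipe`."""
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]  # target a specific device
    argv += ["shell", "input", "swipe",
             str(x1), str(y1), str(x2), str(y2), str(duration_ms)]
    return argv


def fling_sequence(width, height, count, duration_ms=100, serial=None):
    """Alternate upward and downward flings through the screen's center column."""
    x = width // 2
    top, bottom = height // 4, 3 * height // 4
    seq = []
    for i in range(count):
        if i % 2 == 0:
            seq.append(swipe_argv(x, bottom, x, top, duration_ms, serial))  # fling up
        else:
            seq.append(swipe_argv(x, top, x, bottom, duration_ms, serial))  # fling down
    return seq


def run_flings(seq):
    """Execute each swipe on the connected device (requires adb on PATH)."""
    for argv in seq:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical 1080x1920 portrait screen; print the commands instead of running them.
    for argv in fling_sequence(1080, 1920, 2):
        print(" ".join(argv))
```

Because every fling uses the same coordinates and duration, a sequence like this should replay identically on each trial, which is the repeatability goal listed above.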
Workload Automation Suite
● Apache v2.0 licensed
● Created by ARM
● Written in Python
● Supports many popular workloads out of the box*
● Very modular
● http://workload-automation.readthedocs.io/en/latest/
● https://git.linaro.org/people/steve.muckle/wa.git
Workload Support
● WA support often lags app updates
● New app version = ?
● A few changes were needed to support a non-GApps Android environment
Challenges in Testing Power
● Be careful reworking your dev boards
● Random background activity
● Temperature
● Tolerance in shunt resistor
● Aliasing
● Great presentation on all this by Andy Green at
http://www.slideshare.net/linaroorg/how-to-measuresocpower
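With a do-it-yourself sense resistor, energy is derived from sampled voltages. Current is the shunt voltage divided by the shunt resistance, power is the bus voltage times that current, and energy over a fixed window is the time integral of power. The Python sketch below assumes the timestamped samples are already available as lists. The acquisition hardware, the sample values and the function names are all hypothetical. The sketch also treats the bus voltage as the load voltage and shows how resistor tolerance bounds the result. In real measurements, sampling must be fast relative to the load's activity, or aliasing will distort the integral.

```python
def energy_joules(times_s, shunt_mv, bus_v, r_shunt_ohm):
    """Integrate P = V_bus * (V_shunt / R) over the samples (trapezoidal rule)."""
    if not (len(times_s) == len(shunt_mv) == len(bus_v)):
        raise ValueError("sample lists must have equal length")
    # Treats the bus voltage as the load voltage (ignores the small shunt drop).
    power_w = [vb * (vs / 1000.0) / r_shunt_ohm for vs, vb in zip(shunt_mv, bus_v)]
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def tolerance_range(energy_nominal, tolerance):
    """Bounds on energy if the true resistance is R * (1 +/- tolerance).

    Computed current scales as 1/R, so a resistor that is really larger than
    nominal means the nominal calculation overestimated the energy.
    """
    return energy_nominal / (1 + tolerance), energy_nominal / (1 - tolerance)


if __name__ == "__main__":
    # Toy 10 s window sampled at 1 Hz: 25 mV across a 0.05 ohm shunt on a
    # 5 V rail, i.e. a constant 0.5 A and 2.5 W.
    t = [float(i) for i in range(11)]
    e = energy_joules(t, [25.0] * 11, [5.0] * 11, 0.05)
    lo, hi = tolerance_range(e, 0.01)  # 1% resistor
    print(f"{e:.2f} J (between {lo:.2f} and {hi:.2f} J)")  # 25.00 J (between 24.75 and 25.25 J)
```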
Issues
● Target stability
○ MAILBOX_QUEUE_LEN on HiKey
● Framework stability
○ Intermittent failures ("adb returned early" and others)
● Test stability
○ AnTuTu failing to start
○ Inconsistent flings
Overview
● Creating an Android Perf/Power Suite
● Schedutil Comparison Results
Two Test Builds
● EAS not upstream yet
● Comparisons with and without EAS
Test Build 1
● Android 4.4 HiKey kernel, pre-EAS
● Cpufreq tip from Rafael Wysocki's bleeding-edge branch
● Schedutil patch to use rt-avg for RT tasks instead of always selecting fmax
● OPP dependencies
● Fixes to the interactive governor for tip cpufreq
● AOSP master as of August 22
Tuning
● Interactive
above_hispeed_delay: 20000
boostpulse_duration: 1000000
go_hispeed_load: 99
hispeed_freq: 1200000
io_is_busy: 1
min_sample_time: 80000
target_loads: 65 729000:75 960000:85
timer_rate: 20000
timer_slack: 20000
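The `target_loads` value packs several settings into one string. It starts with a default target load, followed by frequency:load pairs, where each load applies at or above its frequency (frequencies in kHz, as elsewhere in cpufreq). The Python sketch below decodes the Test Build 1 value to show which target load the interactive governor would aim for at a given frequency. The function names and sample frequencies are illustrative.

```python
def parse_target_loads(spec):
    """Parse '65 729000:75 960000:85' into [(0, 65), (729000, 75), (960000, 85)].

    The first number is the default load; each later pair is
    (frequency in kHz, target load). Colons and spaces both separate tokens.
    """
    values = [int(tok) for tok in spec.replace(":", " ").split()]
    if len(values) % 2 != 1:
        raise ValueError("expected a default load followed by freq/load pairs")
    table = [(0, values[0])]
    for i in range(1, len(values), 2):
        table.append((values[i], values[i + 1]))
    return table


def target_load_for(table, freq_khz):
    """Return the target load for freq_khz; entries must be in ascending frequency order."""
    load = table[0][1]
    for threshold_khz, value in table[1:]:
        if freq_khz >= threshold_khz:
            load = value
        else:
            break
    return load


if __name__ == "__main__":
    table = parse_target_loads("65 729000:75 960000:85")  # Test Build 1 value
    for f in (432000, 729000, 960000, 1200000):
        print(f, target_load_for(table, f))
```

A lower target load makes the governor raise frequency sooner. This Test Build 1 setting is therefore more aggressive at low frequencies (65%) than near the top (85%).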
Tuning
● Ondemand
ignore_nice_load: 0
io_is_busy: 0
min_sampling_rate: 10000
powersave_bias: 0
sampling_down_factor: 1
sampling_rate: 20000
up_threshold: 95
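To illustrate how `up_threshold` shapes ondemand's choices, here is a simplified Python model of the 4.x-era decision, with `powersave_bias` at 0 and `sampling_down_factor` at 1 as in the tuning above. When load exceeds the threshold, the governor jumps to the maximum frequency. Otherwise it picks a frequency proportional to load. The OPP list and the round-to-nearest step are assumptions for illustration, since the kernel's exact rounding rule varies by version.

```python
def ondemand_target_khz(load_pct, opps_khz, up_threshold=95):
    """Simplified ondemand decision for one sampling period.

    load_pct: CPU busy percentage over the last sampling_rate window.
    opps_khz: available frequencies in kHz (hypothetical list here).
    """
    min_f, max_f = min(opps_khz), max(opps_khz)
    if load_pct > up_threshold:
        return max_f  # busy: go straight to the top frequency
    # Otherwise scale linearly between min and max with load.
    freq_next = min_f + load_pct * (max_f - min_f) // 100
    # Assumption: round to the nearest available OPP.
    return min(opps_khz, key=lambda f: abs(f - freq_next))


if __name__ == "__main__":
    opps = [208000, 432000, 729000, 960000, 1200000]  # illustrative OPP table
    for load in (0, 30, 50, 95, 96):
        print(load, ondemand_target_khz(load, opps))
```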
Tuning
● schedutil
rate_limit_us: 20000
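schedutil has a single tunable here, `rate_limit_us`, which sets the minimum interval between frequency changes. The Python sketch below is a simplified model of the 4.7/4.8-era behavior, assuming frequency-invariant utilization:

- The requested frequency is 1.25 × max_freq × util / max_capacity.
- Any runnable RT task forces the maximum frequency. This is the behavior that the rt-avg patch in Test Build 1 changes.
- Requests arriving before `rate_limit_us` has elapsed are ignored.

The class name, the OPP list and the round-up-to-next-OPP step are assumptions, not kernel code.

```python
class SchedutilModel:
    """Simplified model of schedutil's frequency selection (not kernel code)."""

    def __init__(self, opps_khz, rate_limit_us=20000, max_capacity=1024):
        self.opps = sorted(opps_khz)
        self.max_freq = self.opps[-1]
        self.delay_ns = rate_limit_us * 1000  # rate_limit_us converted to ns
        self.max_capacity = max_capacity
        self.last_update_ns = None
        self.cur_freq = self.opps[0]

    def update(self, now_ns, util, rt_runnable=False):
        """Handle a utilization update; return the resulting frequency in kHz."""
        if (self.last_update_ns is not None
                and now_ns - self.last_update_ns < self.delay_ns):
            return self.cur_freq  # rate limited: keep the current frequency
        self.last_update_ns = now_ns
        if rt_runnable:
            target = self.max_freq  # RT work pins the CPU at fmax
        else:
            f = self.max_freq
            target = (f + (f >> 2)) * util // self.max_capacity  # 1.25 * fmax * util / max
        # Assumption: pick the lowest OPP at or above the target.
        self.cur_freq = next((o for o in self.opps if o >= target), self.max_freq)
        return self.cur_freq


if __name__ == "__main__":
    gov = SchedutilModel([208000, 432000, 729000, 960000, 1200000])
    print(gov.update(0, 512))                            # 750000 requested -> 960000
    print(gov.update(10_000_000, 100))                   # within 20 ms: stays 960000
    print(gov.update(25_000_000, 100))                   # 146484 requested -> 208000
    print(gov.update(50_000_000, 0, rt_runnable=True))   # RT runnable -> 1200000
```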
AnTuTu Total (higher is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      31328     31478     31026     31277   230
interactive   31796     30724     30405     30975   728
schedutil     30874     30512     30526     30637   205
Respectable result for schedutil.
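The avg and stddev columns in these tables appear to be the arithmetic mean and the sample (n-1) standard deviation of the three trials. Below is a minimal Python check against the Test Build 1 AnTuTu numbers, using only the standard library's `statistics` module.

```python
from statistics import mean, stdev  # stdev is the sample (n-1) standard deviation

# AnTuTu totals from the Test Build 1 table above.
antutu_build1 = {
    "ondemand": [31328, 31478, 31026],
    "interactive": [31796, 30724, 30405],
    "schedutil": [30874, 30512, 30526],
}

for governor, trials in antutu_build1.items():
    print(f"{governor:<12} avg={mean(trials):8.1f} stddev={stdev(trials):6.1f}")
```

The printed values agree with the table to within rounding; the slides appear to truncate some figures rather than round them. With only three trials per governor, the stddev itself is a rough estimate.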
recentfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      18%       19%       18%       18.3%   0.57
interactive   11%       17%       14%       14%     3
schedutil     33%       45%       43%       40.3%   6.4
Nasty regression for schedutil. Needs further investigation. PELT? WALT?
emailfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      1%        0%        0%        0.33%   0.57
interactive   4%        3%        2%        3%      1
schedutil     0%        0%        1%        0.33%   0.57
Results in the noise.
browserfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      0%        0%        0%        0%      0
interactive   8%        8%        7%        7.66%   0.57
schedutil     0%        0%        0%        0%      0
Good result for schedutil here...
galleryfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      0%        0%        0%        0%      0
interactive   1%        1%        1%        1%      0
schedutil     1%        1%        1%        1%      0
Results in the noise.
Idle home screen energy, J (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      12.6      12.935    12.955    12.83   0.20
interactive   13.734    13.717    13.71     13.72   0.01
schedutil     14.017    13.875    13.893    13.93   0.07
Schedutil is competitive.
MP3 playback energy, J (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      22.888    23.127    23.004    23.01   0.12
interactive   23.871    23.928    23.467    23.76   0.25
schedutil     23.953    23.866    23.746    23.86   0.10
Schedutil is competitive.
720p MPEG-4 playback energy, J (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      20.375    20.485    20.739    20.53   0.18
interactive   19.505    20.095    19.331    19.64   0.40
schedutil     21.103    20.879    20.766    20.92   0.17
Schedutil is competitive.
Test Build 2
● Android 4.4 HiKey kernel + EAS 5.2, minus schedfreq/schedtune, plus schedutil/cpufreq from Test Build 1
○ Thanks to Juri Lelli for providing this branch
● AOSP master as of August 22
● Results below also include schedutil-w (schedutil with WALT) and perf (performance governor)
Tuning
● Interactive
above_hispeed_delay: 20000
boostpulse_duration: 80000
go_hispeed_load: 99
hispeed_freq: 1200000
io_is_busy: 0
min_sample_time: 80000
target_loads: 90
timer_rate: 20000
timer_slack: 80000
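Tunables like these are normally applied as root by writing each value to the governor's sysfs directory, after the governor has been selected through `scaling_governor`. The Python sketch below writes a dict of tunables into a directory and lists which interactive values differ from Test Build 1. The path `/sys/devices/system/cpu/cpufreq/interactive` is an assumption, since it depends on whether the kernel exposes governor tunables globally or per policy. For that reason the demo writes into a temporary directory instead. The helper names are illustrative.

```python
import os
import tempfile

# Interactive tunables from the two test builds (values as listed above).
INTERACTIVE_BUILD1 = {
    "above_hispeed_delay": 20000, "boostpulse_duration": 1000000,
    "go_hispeed_load": 99, "hispeed_freq": 1200000, "io_is_busy": 1,
    "min_sample_time": 80000, "target_loads": "65 729000:75 960000:85",
    "timer_rate": 20000, "timer_slack": 20000,
}
INTERACTIVE_BUILD2 = {
    "above_hispeed_delay": 20000, "boostpulse_duration": 80000,
    "go_hispeed_load": 99, "hispeed_freq": 1200000, "io_is_busy": 0,
    "min_sample_time": 80000, "target_loads": "90",
    "timer_rate": 20000, "timer_slack": 80000,
}

# Assumed location on the target; varies with kernel configuration.
SYSFS_INTERACTIVE = "/sys/devices/system/cpu/cpufreq/interactive"


def apply_tunables(governor_dir, tunables):
    """Write each tunable to <governor_dir>/<name>; return the paths written."""
    written = []
    for name, value in tunables.items():
        path = os.path.join(governor_dir, name)
        with open(path, "w") as f:
            f.write(str(value))
        written.append(path)
    return written


def changed_tunables(old, new):
    """Return {name: (old_value, new_value)} for tunables whose values differ."""
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


if __name__ == "__main__":
    diff = changed_tunables(INTERACTIVE_BUILD1, INTERACTIVE_BUILD2)
    for name, (old, new) in sorted(diff.items()):
        print(f"{name}: {old} -> {new}")
    with tempfile.TemporaryDirectory() as d:  # stand-in for SYSFS_INTERACTIVE
        apply_tunables(d, INTERACTIVE_BUILD2)
        with open(os.path.join(d, "target_loads")) as f:
            print("target_loads =", f.read())
```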
Tuning
● Ondemand
ignore_nice_load: 0
io_is_busy: 0
min_sampling_rate: 10000
powersave_bias: 0
sampling_down_factor: 1
sampling_rate: 20000
up_threshold: 95
Tuning
● schedutil
rate_limit_us: 20000
AnTuTu Total (higher is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      29959     30066     30108     30044   77
interactive   31050     31083     30863     30998   118
schedutil     29606     29279     28942     29275   332
schedutil-w   30364     30607     30447     30472   123
perf          31688     31413     31548     31549   137
WALT clearly helps schedutil on AnTuTu.
recentfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      30%       30%       31%       30.3%   0.57
interactive   18%       17%       21%       18.6%   2.08
schedutil     39%       44%       42%       41.6%   2.5
schedutil-w   22%       32%       36%       30%     7.2
perf          23%       16%       14%       17.6%   4.7
WALT also helps with the recentfling regression, though it does not eliminate it.
emailfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      0%        0%        0%        0%      0
interactive   0%        1%        1%        0.66%   0.57
schedutil     5%        4%        5%        4.67%   0.57
schedutil-w   6%        6%        4%        5.33%   1.15
perf          0%        0%        0%        0%      0
Respectable result for schedutil.
browserfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      0%        0%        0%        0%      0
interactive   2%        1%        1%        1.3%    0.57
schedutil     9%        7%        5%        7%      2
schedutil-w   3%        4%        8%        5%      2.64
perf          0%        0%        0%        0%      0
Possible small regression in browserfling.
galleryfling, % janky frames (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      0%        0%        1%        0.33%   0.57
interactive   1%        1%        2%        1.33%   0.57
schedutil     20%       21%       27%       22.66%  3.79
schedutil-w   5%        6%        6%        5.66%   0.57
perf          4%        4%        4%        4%      0
Regression with schedutil.
Idle home screen energy, J (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      11.704    11.761    11.730    11.73   0.03
interactive   10.685    10.753    10.844    10.76   0.08
schedutil     10.47     11.554    10.613    10.87   0.59
schedutil-w   12.146    12.144    12.127    12.14   0.01
perf          12.999    13.187    12.822    13.00   0.18
Energy regression with WALT.
MP3 playback energy, J (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      20.044    20.992    21.178    20.74   0.61
interactive   21.699    21.933    21.667    21.76   0.15
schedutil     18.458    18.446    18.471    18.46   0.01
schedutil-w   22.87     22.914    22.995    22.93   0.06
perf          22.539    23.445    22.869    22.951  0.46
Energy regression with WALT.
720p MPEG-4 playback energy, J (lower is better)
governor      trial 1   trial 2   trial 3   avg     stddev
ondemand      19.932    19.443    19.578    19.651  0.25
interactive   19.34     19.408    19.295    19.347  0.06
schedutil     17.48     17.508    17.485    17.491  0.01
schedutil-w   21.313    21.028    21.218    21.186  0.15
perf          21.574    20.979    21.263    21.272  0.30
Energy regression with WALT.
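To express the WALT energy regression in relative terms, the Python sketch below computes the percentage change in average energy from schedutil to schedutil-w. It uses the avg columns of the three Test Build 2 energy tables. With only three trials per configuration, these figures are indicative rather than statistically rigorous.

```python
# Average energy (J) from the Test Build 2 energy tables.
BUILD2_AVG_ENERGY_J = {
    "idle home screen": {"schedutil": 10.87, "schedutil-w": 12.14},
    "mp3 playback": {"schedutil": 18.46, "schedutil-w": 22.93},
    "720p mpeg4 playback": {"schedutil": 17.491, "schedutil-w": 21.186},
}


def pct_change(base, new):
    """Percentage change from base to new (positive means more energy used)."""
    return 100.0 * (new - base) / base


for test, avg in BUILD2_AVG_ENERGY_J.items():
    delta = pct_change(avg["schedutil"], avg["schedutil-w"])
    print(f"{test:<20} {delta:+.1f}%")
```

With these averages, it prints +11.7%, +24.2% and +21.1% for idle, MP3 and 720p playback respectively.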
Next Steps
● Test framework
○ Repeatability
○ WA stability
● Deeper result analysis
○ recentfling regression
○ Possible galleryfling regression
○ Energy regression with WALT
● Incorporate schedtune
Thank You
#LAS16
For further information: www.linaro.org
LAS16 keynotes and videos on: connect.linaro.org