optcarrot: a pure-ruby nes emulator


Upload: mametter

Posted on 16-Apr-2017


TRANSCRIPT

Page 1: Optcarrot: A Pure-Ruby NES Emulator
Page 2

• A NES Emulator written in Ruby

Demo

Page 3

• To drive “Ruby3x3”

– Matz said “Ruby 3 will be 3 times faster than Ruby 2.0”

– Optcarrot is a CPU-intensive, real-life benchmark

• Currently runs at 20 fps on Ruby 2.0, which would mean 60 fps on Ruby 3.0!

• A carrot to let horses (Ruby committers) optimize Ruby

• To challenge Ruby’s limits

– NES video resolution: 256 x 240 pixels / 60 fps

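– Even a loop that does nothing but one array store per pixel, for one second of video, takes 0.2 sec.:

```ruby
ary = [0]
(256 * 240 * 60).times do |i|
  ary[0] = 0
end
```

– So all other tasks must be done in the remaining 0.8 sec. Impossible?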

Page 4

• Famicom programming with Ruby (takkaw, 2007)

– An NES ROM presentation made with Ruby

• MRI's incremental GC (authornari, 2008)

– A Mario-like game, "Nario", was used to demonstrate the real-time GC

• Burn (remore, 2014)

– A framework for creating NES ROMs in Ruby

Page 5

• NES architecture in three minutes

• How I achieved 20 fps

• Ruby interpreters’ benchmark

• Towards 60 fps

• Speaker's award & Conclusion

Page 6

• The details of NES architecture

– In short: “See http://wiki.nesdev.com/ !”

• How to find the bottleneck

– In short: “Use stackprof!”

• I’ll talk about these topics at “Kawasaki Ruby Kaigi 01” (川崎Ruby会議01, 2016/08/20)

Page 7

• NES Architecture in three minutes

• How I achieved 20 fps

• Ruby interpreters’ benchmark

• Towards 60 fps

• Speaker's award & Conclusion

Page 8

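[Diagram: inside the NES, the CPU reads and writes its 2 kB RAM and the GPU reads and writes its 2 kB VRAM. The cartridge holds the Program ROM, read by the CPU, and the Bitmap ROM, read by the GPU. The CPU controls the GPU, the GPU sends interrupts to the CPU, and the GPU renders the picture.]

To be precise: in the NES, the GPU is called the “PPU” (Picture Processing Unit).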
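To make “control” concrete: the CPU controls the GPU by reading and writing GPU registers that appear in the CPU’s address space. Below is a minimal, hypothetical model of that routing; `CpuBus` and `StubPPU` are invented names, not Optcarrot’s code. It assumes the standard NES memory map (2 kB RAM mirrored up to $1FFF, eight GPU registers mirrored across $2000 to $3FFF, program ROM at $8000 to $FFFF) and a cartridge without a mapper.

```ruby
# Hypothetical sketch: how CPU reads/writes are routed on the NES.
# Simplified: no mapper, no APU/controllers, no register side effects.

# Stand-in for the GPU (PPU): it only stores its 8 register values.
class StubPPU
  def initialize
    @regs = Array.new(8, 0)
  end

  def read_reg(index)
    @regs[index]
  end

  def write_reg(index, value)
    @regs[index] = value
  end
end

class CpuBus
  def initialize(prg_rom, ppu)
    @ram = Array.new(0x800, 0) # 2 kB of CPU RAM
    @prg_rom = prg_rom         # program ROM bytes from the cartridge
    @ppu = ppu
  end

  def read(addr)
    case addr
    when 0x0000..0x1fff then @ram[addr & 0x07ff]          # RAM, mirrored every 2 kB
    when 0x2000..0x3fff then @ppu.read_reg(addr & 0x0007) # 8 GPU registers, mirrored
    when 0x8000..0xffff then @prg_rom[(addr - 0x8000) % @prg_rom.size]
    else 0 # other devices omitted in this sketch
    end
  end

  def write(addr, value)
    case addr
    when 0x0000..0x1fff then @ram[addr & 0x07ff] = value
    when 0x2000..0x3fff then @ppu.write_reg(addr & 0x0007, value) # CPU "controls" the GPU
    end
    # writes to ROM or unmapped areas are ignored here
  end
end

ppu = StubPPU.new
bus = CpuBus.new(Array.new(0x4000) { |i| i & 0xff }, ppu) # 16 kB dummy ROM
bus.write(0x0001, 42)
bus.read(0x0801)        # $0801 mirrors $0001
bus.write(0x2008, 0x80) # $2008 mirrors GPU register $2000
bus.read(0xc005)        # a 16 kB ROM appears twice, so this is byte 5
```

Real GPU registers have read/write side effects (e.g. address latches), which this stub deliberately ignores.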

Page 9

Execution time ratio: GPU 80%, CPU 10%, others 10%

• Why does GPU emulation take so much time?

– The GPU runs at a higher clock speed than the CPU

• GPU: 5.3 MHz

• CPU: 1.8 MHz

– The GPU does many complex tasks

• Background rendering

• Sprite rendering

• Scrolling

• Conflict detection

• Interrupts
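Where the “three GPU clocks per CPU clock” ratio comes from: on an NTSC NES both chips are driven by a 21.477272 MHz master clock (236.25 MHz / 11); the CPU divides it by 12 and the GPU by 4, so the ratio is exactly 3. The arithmetic, as a small sketch (the constant names are our own):

```ruby
# NTSC NES clocks derived from the master clock (236.25 MHz / 11).
MASTER_HZ = Rational(236_250_000, 11)
CPU_HZ = MASTER_HZ / 12 # CPU: master / 12
GPU_HZ = MASTER_HZ / 4  # GPU (PPU): master / 4

puts "CPU: #{(CPU_HZ / 1_000_000).to_f.round(2)} MHz"
puts "GPU: #{(GPU_HZ / 1_000_000).to_f.round(2)} MHz"
puts "GPU clocks per CPU clock: #{(GPU_HZ / CPU_HZ).to_i}"
```

The slide’s 5.3 MHz and 1.8 MHz are approximations of these 5.37 MHz and 1.79 MHz values.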

Page 10

• Per-pixel tasks (i.e. 256 x 240 x 60 = 3.7M times per second)

1. Identify what bitmap is shown here

2. Read attribute data (color, flip flag, z-index)

3. Read bitmap data from the ROM

4. Assemble them into a video signal

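[Diagram: the GPU reads the background map and the attribute map from VRAM (steps 1 and 2) and the bitmap data from the Bitmap ROM on the cartridge (step 3), then assembles the target pixel (step 4).]

To be precise: these tasks are actually done per eight pixels.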
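A minimal, hypothetical sketch of steps 1 to 4 for a single background pixel. It assumes one nametable and no scrolling, the standard NES layout (32 × 30 tile indices followed by a 64-byte attribute table; 16 bytes per 8 × 8 tile in the bitmap ROM, as two bit planes), and it ignores sprites, flip flags and priority. `background_pixel` is an invented name, not Optcarrot’s code; calling something like it 3.7M times per second is the cost the following slides try to avoid.

```ruby
# Hypothetical per-pixel background lookup (one nametable, no scrolling).
# vram: 1024 bytes = 960 tile indices (32x30) + 64 attribute bytes
# chr:  bitmap ROM, 16 bytes per 8x8 tile (low plane, then high plane)
def background_pixel(vram, chr, x, y)
  tx, ty = x / 8, y / 8

  # 1. Identify which bitmap (tile) is shown here
  tile = vram[ty * 32 + tx]

  # 2. Read attribute data: 2 palette bits per 16x16-pixel area
  attr_byte = vram[0x3c0 + (ty / 4) * 8 + (tx / 4)]
  shift = ((ty & 2) << 1) | (tx & 2)
  palette = (attr_byte >> shift) & 3

  # 3. Read both bit planes of this tile row from the bitmap ROM
  row = y % 8
  lo = chr[tile * 16 + row]
  hi = chr[tile * 16 + row + 8]
  bit = 7 - (x % 8)
  color = ((lo >> bit) & 1) | (((hi >> bit) & 1) << 1)

  # 4. Combine into a palette index (0 = backdrop/transparent)
  color == 0 ? 0 : (palette << 2) | color
end

vram = Array.new(1024, 0)
vram[0] = 1          # tile 1 at the top-left corner
vram[0x3c0] = 0b01   # palette 1 for the top-left 16x16 area
chr = Array.new(8192, 0)
chr[16] = 0x80       # tile 1, row 0, low plane: leftmost pixel set
chr[24] = 0x80       # tile 1, row 0, high plane: leftmost pixel set
background_pixel(vram, chr, 0, 0) # palette 1, color 3
background_pixel(vram, chr, 1, 0) # transparent
```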

Page 11

• Terribly complex

http://wiki.nesdev.com/w/index.php/File:Ntsc_timing.png

Page 12

• NES Architecture in three minutes

• How I achieved 20 fps

– How to emulate CPU-GPU parallelism

– How to optimize GPU emulation

• Ruby interpreters’ benchmark

• Towards 60 fps

• Speaker's award & Conclusion

Page 13

• Naïve approach: emulate CPU & GPU per clock

1. Run the CPU for one clock

2. Run the GPU for three clocks

3. Repeat 1 and 2

– Simple and accurate

– Very slow (~ 3 fps) because of too many method calls

[Timeline: along the clock axis, the CPU and the GPU alternate, each emulated by a separate “step” call per clock.]
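The naïve scheme can be sketched as follows. `StubCPU`, `StubGPU` and `run_naive` are hypothetical placeholders that only count steps; what matters is the shape of the loop: four method calls per CPU clock before any real work, i.e. about 119,000 calls for the roughly 29,781 CPU clocks in one NTSC frame.

```ruby
# Hypothetical sketch of per-clock emulation: CPU and GPU take turns every clock.
class StubCPU
  attr_reader :steps
  def initialize; @steps = 0; end
  def step; @steps += 1; end # would execute one CPU clock
end

class StubGPU
  attr_reader :steps
  def initialize; @steps = 0; end
  def step; @steps += 1; end # would produce one dot
end

def run_naive(cpu, gpu, cpu_clocks)
  cpu_clocks.times do
    cpu.step             # 1. one CPU clock
    3.times { gpu.step } # 2. three GPU clocks
  end                    # 3. repeat
end

cpu, gpu = StubCPU.new, StubGPU.new
run_naive(cpu, gpu, 29_781) # roughly one frame of CPU clocks
```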

Page 14

• “Catch-up” method: emulate the CPU & GPU per control access

1. Run the CPU until it tries to control the GPU

2. Run the GPU until it catches up with the CPU

3. Repeat 1 and 2

– Accurate and fast (~ 10 fps)

[Timeline: the CPU runs ahead; whenever it attempts to control the GPU, the GPU catches up to the CPU’s clock, and then the CPU runs again.]
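A hypothetical sketch of the catch-up method, under the simplifying assumption that the CPU program is just a list of `[cycles, gpu_write]` pairs (a non-nil `gpu_write` stands for an instruction that writes a GPU register). `CatchUpCPU`, `CatchUpGPU` and `catch_up` are invented names. The GPU is brought up to date only right before the CPU touches it, and then advances many clocks inside one tight loop instead of being called once per clock.

```ruby
# Hypothetical sketch of catch-up emulation.
class CatchUpGPU
  attr_reader :clock, :runs, :last_write

  def initialize
    @clock = 0 # GPU clocks emulated so far
    @runs = 0  # how many times catch_up was entered
    @last_write = nil
  end

  # Advance the GPU in one tight loop until it reaches the CPU's time.
  def catch_up(cpu_clock)
    @runs += 1
    target = cpu_clock * 3 # three GPU clocks per CPU clock
    while @clock < target
      @clock += 1 # a real emulator would render one dot here
    end
  end

  def write_register(value)
    @last_write = value # applied at the right time: the GPU has just caught up
  end
end

class CatchUpCPU
  attr_reader :clock

  def initialize(gpu)
    @gpu = gpu
    @clock = 0
  end

  # program: [cycles, gpu_write] pairs; gpu_write is nil for ordinary instructions
  def run(program)
    program.each do |cycles, gpu_write|
      @clock += cycles      # 1. run the CPU ...
      next unless gpu_write #    ... until it tries to control the GPU
      @gpu.catch_up(@clock) # 2. the GPU catches up with the CPU
      @gpu.write_register(gpu_write)
    end                     # 3. repeat
    @gpu.catch_up(@clock)   # final sync, e.g. at the end of a frame
  end
end

gpu = CatchUpGPU.new
cpu = CatchUpCPU.new(gpu)
cpu.run([[2, nil], [3, nil], [4, 0x80], [2, nil], [5, 0x1e]])
```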

Page 15

• Naïve approach: per-pixel emulation

– Just like the actual hardware

[Diagram: the same per-pixel data flow as before: the GPU reads the background map and attribute map from VRAM and bitmap data from the Bitmap ROM on the cartridge (steps 1 to 4).]

This calculation is repeated for every pixel: slow!

Page 16

• Pre-render the screen and update it on demand

[Diagram: the GPU keeps a pre-rendered screen buffer built from the background map and attribute map in VRAM and the Bitmap ROM on the cartridge. When the CPU modifies VRAM, only the invalidated pixels are updated; the screen buffer is sent to the TV once per frame.]

This explanation is exaggerated! Actually, the GPU emulation loop is not removed completely.
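A rough sketch of the pre-render idea, at tile granularity rather than per pixel, with hypothetical names (`TileCache`, `vram_written`, the rendering block). As the slide warns, the real emulator is not this simple; this only illustrates “invalidate on VRAM write, re-render what is invalid once per frame”.

```ruby
# Hypothetical sketch: a pre-rendered screen where only invalidated tiles are redrawn.
class TileCache
  TILES = 32 * 30

  attr_reader :rendered_count

  def initialize(&render_tile)
    @render_tile = render_tile       # placeholder for real tile rendering
    @screen = Array.new(TILES)       # pre-rendered tiles
    @dirty = Array.new(TILES, true)  # everything starts invalid
    @rendered_count = 0
  end

  # Called when the CPU writes a tile index into VRAM.
  def vram_written(tile_index)
    @dirty[tile_index] = true
  end

  # Called once per frame: re-render only invalidated tiles.
  def frame
    @dirty.each_with_index do |dirty, i|
      next unless dirty
      @screen[i] = @render_tile.call(i)
      @dirty[i] = false
      @rendered_count += 1
    end
    @screen
  end
end

cache = TileCache.new { |i| "tile#{i}" }
cache.frame           # first frame renders all 960 tiles
cache.vram_written(5)
cache.frame           # second frame re-renders only tile 5
```

A fuller version would also need to invalidate tiles on attribute-table writes, scrolling and bitmap changes.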

Page 17

• CPU: Intel® Core™ i7-4500U @ 2.40 GHz

• OS: Ubuntu 16.04

Page 18

• NES Architecture in three minutes

• How I achieved 20 fps

• Ruby interpreters’ benchmark

• Towards 60 fps

• Speaker's award & Conclusion

Page 19

• Optcarrot is not so big: <5000 lines of code

– cf. Redmine: >30000 LOC

• Requires no libraries (in no-GUI mode)

– It works on miniruby

– ruby-ffi is used for the GUI (SDL2)

• Uses only basic Ruby features

– It works on Ruby 1.8 / mruby / topaz / opal (with shims and/or systematic modification of the source code)

Page 20

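[Chart: Optcarrot frame rate by Ruby implementation (fps, higher is better)]

| Implementation | fps |
|---|---|
| trunk | 28.7 |
| ruby23 | 28.1 |
| ruby22 | 25.5 |
| ruby21 | 26.6 |
| ruby20 | 25.0 |
| ruby193 | 21.4 |
| ruby187 | 5.83 |
| omrpreview | 21.9 |
| jruby9k | 39.2 |
| jruby17 | 25.0 |
| rubinius | 4.10 |
| mruby | 7.48 |
| topaz | 27.0 |
| opal | 0.0287 |

• MRI has been improved (1.8 → 1.9 → 2.0 → 2.3)

• OMR preview isn’t fast? (MRI 2.2 w/ JIT)

• JRuby9k is the fastest

• Ruby 2.0 achieves >20 fps (important for Ruby3x3)

• Optcarrot works on implementations that support only a subset of Ruby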

Page 21

• JRuby 9k is the fastest: “Deoptimization” looks like a promising approach

– First, optimized bytecode is generated, ignoring rare/pathological cases

– When needed, it is discarded and naïve bytecode is regenerated

– BTW: JRuby’s boot time is very slow

• OMR is not so fast?

– Does the JIT bring no advantage?

• Method calls and built-in methods may still be the bottleneck

– OMR does not seem to support opt_case_dispatch yet

• i.e., a case statement is not optimized well?
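On MRI you can inspect what a case statement compiles to with `RubyVM::InstructionSequence` (MRI-only, not available on JRuby and others). When every `when` clause is a literal such as an Integer, MRI normally emits `opt_case_dispatch`, which jumps through a hash table instead of testing each `when` in turn. The snippet is illustrative; the exact disassembly differs between Ruby versions.

```ruby
# Disassemble a case statement whose `when` clauses are all Integer literals.
code = <<~RUBY
  clock = 2
  case clock
  when 1 then :do_a
  when 2 then :do_b
  when 3 then :do_c
  end
RUBY

iseq = RubyVM::InstructionSequence.compile(code)
disasm = iseq.disasm
puts disasm
puts disasm.include?("opt_case_dispatch") # was the hash-based dispatch emitted?
```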

Page 22

• NES Architecture in three minutes

• How I achieved 20 fps

• Ruby interpreters’ benchmark

• Towards 60 fps

• Speaker's award & Conclusion

Page 23

• We have kept the code reasonably clean so far

• Now, we use any means necessary to achieve speed

• CAUTION: Casual Ruby programmers MUST NOT use the following ProTips™

– This is an experiment to study how to improve Ruby implementations

Page 24

• Method calls are slow

– Replace a call with the body of the called method (inlining)

```ruby
# Before: 28 fps
while catchup?
  inc_addr
end

# After: 40 fps
while catchup?
  @addr += 1
end
```
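A self-contained way to try ProTip™ 1: a toy class (hypothetical; `catchup?` and `inc_addr` stand in for the methods on the slide) that runs the same loop with and without the extra call, timed with the standard `benchmark` library. Both versions compute the same result; the speedup you see depends on your Ruby version and machine.

```ruby
require "benchmark"

# Hypothetical toy for method inlining.
class Counter
  attr_reader :addr

  def initialize(limit)
    @addr = 0
    @limit = limit
  end

  def catchup?
    @addr < @limit
  end

  def inc_addr
    @addr += 1
  end

  # Before: one extra method call per iteration
  def run_with_call
    while catchup?
      inc_addr
    end
  end

  # After: the body of inc_addr is written in place
  def run_inlined
    while catchup?
      @addr += 1
    end
  end
end

n = 1_000_000
a = Counter.new(n)
b = Counter.new(n)
t_call = Benchmark.realtime { a.run_with_call }
t_inline = Benchmark.realtime { b.run_inlined }
puts format("with call: %.3f s, inlined: %.3f s", t_call, t_inline)
```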

Page 25

• Instance variable access is slow

– Replace it with a local variable

– Note: the instance variable must not be used outside this method while the loop runs

```ruby
# Before: 40 fps
while catchup?
  @addr += 1
end

# After: 47 fps
begin
  addr = @addr
  while catchup?
    addr += 1
  end
ensure
  @addr = addr
end
```
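A similar toy for ProTip™ 2 (the `Scanner` class is hypothetical). Here the loop condition is a plain comparison rather than a `catchup?` call: a method that read `@addr` would see a stale value while the loop works on the local copy, which is presumably why the slide says the variable must not be used outside this method. The `ensure` clause writes the local back even if the loop raises.

```ruby
require "benchmark"

# Hypothetical toy for instance variable localization.
class Scanner
  attr_reader :addr

  def initialize(limit)
    @addr = 0
    @limit = limit
  end

  # Before: @addr is read and written on every iteration
  def run_ivar
    while @addr < @limit
      @addr += 1
    end
  end

  # After: copy to locals, loop on them, write back once
  def run_local
    addr = @addr
    limit = @limit
    begin
      while addr < limit
        addr += 1
      end
    ensure
      @addr = addr # written back even if the loop raises
    end
  end
end

n = 1_000_000
s1 = Scanner.new(n)
s2 = Scanner.new(n)
t_ivar = Benchmark.realtime { s1.run_ivar }
t_local = Benchmark.realtime { s2.run_local }
puts format("ivar: %.3f s, local: %.3f s", t_ivar, t_local)
```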

Page 26

• Batch frequent actions that span several clocks into a single fast path

```ruby
# Before: 47 fps
while catchup?
  case @clock
  when 1 then do_A
  when 2 then do_B
  when 3 then do_C
  # ...
  end
  @clock += 1
end

# After: 63 fps
while catchup?
  if can_be_fast?
    # fast path
    do_A
    do_B
    do_C
    @clock += 3
  else
    case @clock
    when 1 then do_A
    when 2 then do_B
    when 3 then do_C
    # ...
    end
    @clock += 1
  end
end
```
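A runnable toy version of ProTip™ 3, under assumptions of our own: each clock performs one of three actions chosen by `@clock % 3`, and `can_be_fast?` holds when a full A, B, C group starts at the current clock and fits before the target. `Stepper` and its methods are hypothetical. The fast path must produce exactly the same sequence of actions as the per-clock loop; it only skips the dispatch and the loop check for two of every three clocks.

```ruby
# Hypothetical toy for the fast-path technique.
class Stepper
  attr_reader :log, :clock

  def initialize(target)
    @clock = 0
    @target = target
    @log = []
  end

  def do_a; @log << :a; end
  def do_b; @log << :b; end
  def do_c; @log << :c; end

  def catchup?
    @clock < @target
  end

  # A whole A-B-C group starts here and fits before the target
  def can_be_fast?
    @clock % 3 == 0 && @clock + 3 <= @target
  end

  # Per-clock dispatch only
  def run_slow
    while catchup?
      case @clock % 3
      when 0 then do_a
      when 1 then do_b
      when 2 then do_c
      end
      @clock += 1
    end
  end

  # Fast path when possible, per-clock dispatch otherwise
  def run_fast
    while catchup?
      if can_be_fast?
        do_a
        do_b
        do_c
        @clock += 3
      else
        case @clock % 3
        when 0 then do_a
        when 1 then do_b
        when 2 then do_c
        end
        @clock += 1
      end
    end
  end
end

slow = Stepper.new(10)
slow.run_slow
fast = Stepper.new(10)
fast.run_fast
puts slow.log == fast.log
```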

Page 27

[Chart: frame rate (fps) as the optimizations are applied one after another]

| Optimization | fps |
|---|---|
| base | 29.4 |
| method inlining (ProTip™ 1) | 40.3 |
| ivar localization (ProTip™ 2) | 46.6 |
| fastpath (ProTip™ 3) | 62.7 |
| misc | 68.8 |
| CPU misc | 83.2 |

Page 28

• Used Regexp to systematically rewrite the code

– instead of hand-rewriting

• Used Welch’s t-test to confirm each optimization

```ruby
src = File.read(__FILE__)
src.gsub!(/.../) { ... } # method inlining
src.gsub!(/.../) { ... } # ivar localization
eval(src)
```
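A small, hypothetical illustration of rewriting code with a Regexp and then `eval`-ing it. Instead of `File.read(__FILE__)`, it rewrites a source string embedded in the script, and it applies a single made-up “inlining” rule that replaces calls to `inc_addr` with the method body. The rules actually used for Optcarrot are not shown on the slide; real rewrites of this kind need much more careful patterns.

```ruby
# Source to be rewritten (a toy class, not Optcarrot's code).
SOURCE = <<~'RUBY'
  class Walker
    attr_reader :addr
    def initialize(limit); @addr = 0; @limit = limit; end
    def catchup?; @addr < @limit; end
    def inc_addr; @addr += 1; end
    def run
      while catchup?
        inc_addr
      end
    end
  end
RUBY

src = SOURCE.dup
# "Method inlining": replace call sites of inc_addr with its body.
# The (?<!def ) lookbehind leaves the method definition itself untouched.
src.gsub!(/(?<!def )\binc_addr\b/) { "@addr += 1" }

eval(src) # defines the rewritten Walker class
w = Walker.new(5)
w.run
puts w.addr
```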

Page 29

Page 30

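[Chart: Optcarrot frame rate by Ruby implementation, default mode vs. optimized mode (fps, higher is better)]

| Implementation | default mode | optimized mode |
|---|---|---|
| trunk | 28.6 | 84.0 |
| ruby23 | 28.0 | 82.9 |
| ruby22 | 25.2 | 78.2 |
| ruby21 | 26.9 | 79.6 |
| ruby20 | 26.1 | 68.1 |
| ruby193 | 21.4 | 64.0 |
| ruby187 | 5.87 | 1.46 |
| omrpreview | 22.8 | 69.0 |
| jruby9k | 39.3 | |
| jruby17 | 25.3 | |
| rubinius | 3.97 | |
| mruby | 7.02 | |
| topaz | 29.3 | |
| opal | 0.0285 | |

Optimized mode, jruby9k to opal: the chart shows only five values for these six implementations, in chart order 2.12, 6.13, 2.43, 0.754 and 0.0501.

JRuby: the generated program is too large to fit the JVM 64k bytecode limit.

Page 31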

• NES Architecture in three minutes

• How I achieved 20 fps

• Ruby interpreters’ benchmark

• Towards 60 fps

• Speaker's award & Conclusion

Page 32

• The first person who improved MRI performance by using Optcarrot

– Instance variable access has been improved by about 10% [Bug #12274]

• Optcarrot has already started to improve Ruby!

Page 33

• Optcarrot, a pure-Ruby NES emulator

– A non-trivial benchmark for Ruby implementations

• A benchmark across a wide range of Ruby implementations

– AFAIK, this is the first real-life benchmark to compare MRI / JRuby / Rubinius / mruby / topaz / opal

• ProTips™ for boosting a Ruby program

– Should we improve method calls and instance variable access instead of adding a JIT?

• More details? Kawasaki Ruby Kaigi 01 (川崎Ruby会議01, 2016/08/20)

Page 34