Plugin-based software design with Ruby and RubyGems

Plugin-based software design with Ruby and RubyGems Sadayuki Furuhashi Founder & Software Architect RubyKaigi 2015

Upload: sadayuki-furuhashi

Post on 09-Jan-2017

TRANSCRIPT

Page 1: Plugin-based software design with Ruby and RubyGems

Plugin-based software design with Ruby and RubyGems

Sadayuki Furuhashi Founder & Software Architect

RubyKaigi 2015

Page 2: Plugin-based software design with Ruby and RubyGems

A little about me…

Sadayuki Furuhashi
github: @frsyuki

Fluentd - Unified log collection infrastructure

Embulk - Plugin-based parallel ETL

Founder & Software Architect

Page 3: Plugin-based software design with Ruby and RubyGems

It's like JSON. but fast and small.

A little about me…

Page 4: Plugin-based software design with Ruby and RubyGems

What’s Plugin Architecture?

Page 5: Plugin-based software design with Ruby and RubyGems
Page 6: Plugin-based software design with Ruby and RubyGems
Page 7: Plugin-based software design with Ruby and RubyGems
Page 8: Plugin-based software design with Ruby and RubyGems
Page 9: Plugin-based software design with Ruby and RubyGems
Page 10: Plugin-based software design with Ruby and RubyGems
Page 11: Plugin-based software design with Ruby and RubyGems

Benefits of Plugin Architecture
> Plugins bring many features
> Plugins keep core software simple
> Plugins are easy to test
> Plugins build an active developer community

Page 12: Plugin-based software design with Ruby and RubyGems

Benefits of Plugin Architecture
> Plugins bring many features
> Plugins keep core software simple
> Plugins are easy to test
> Plugins build an active developer community

> “…if it’s designed well”.

Page 13: Plugin-based software design with Ruby and RubyGems

How to design plugin architecture?

Page 14: Plugin-based software design with Ruby and RubyGems

How did I design plugin architecture? (instead of "How to design")

Page 15: Plugin-based software design with Ruby and RubyGems

Today’s topic
> Plugin Architecture Design Patterns
> Plugin Architecture of Fluentd
> Plugin Architecture of Embulk
> Pitfalls & Challenges

Page 16: Plugin-based software design with Ruby and RubyGems

Plugin Architecture Design Patterns

Page 17: Plugin-based software design with Ruby and RubyGems

Plugin Architecture Design Patterns
a) Traditional Extensible Software Architecture
b) Plugin-based Software Architecture

Page 18: Plugin-based software design with Ruby and RubyGems

Traditional Extensible Software Architecture

Host Application

Plugin

Plugin

Register plugins to extension points

To add more extensibility, add more extension points.
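The traditional approach on this slide can be sketched roughly as follows. This is only an illustration: `HostApplication`, its extension point names, and the `save` method are hypothetical and do not come from the talk. The host owns the logic and calls registered plugins only at predefined extension points.

```ruby
# Hypothetical host application with a fixed set of extension points.
class HostApplication
  EXTENSION_POINTS = %i[before_save after_save].freeze

  def initialize
    @hooks = Hash.new { |hash, point| hash[point] = [] }
  end

  # Plugins register a block on one of the predefined extension points.
  def register(point, &block)
    unless EXTENSION_POINTS.include?(point)
      raise ArgumentError, "unknown extension point: #{point}"
    end
    @hooks[point] << block
    self
  end

  # The host keeps its own logic and only calls out at the extension points.
  def save(record)
    run(:before_save, record)
    saved = record.merge(saved: true)
    run(:after_save, saved)
    saved
  end

  private

  def run(point, payload)
    @hooks[point].each { |hook| hook.call(payload) }
  end
end

app = HostApplication.new
log = []
app.register(:before_save) { |record| log << "validating #{record[:id]}" }
app.register(:after_save)  { |record| log << "saved #{record[:id]}" }
app.save({ id: 1 })
```

In this sketch, adding a new kind of extensibility means editing `EXTENSION_POINTS` and the host code itself, which is the limitation the slide points at.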

Page 19: Plugin-based software design with Ruby and RubyGems

Plugin-based software architecture

[Diagram: an Application made of a Core surrounded by many Plugins]

Page 20: Plugin-based software design with Ruby and RubyGems

Plugin-based software architecture
• Application as a network of plugins.
  > Plugins: provide features.
  > Core: framework to implement plugins.
• More flexibility != More complexity.
• The application must be designed to be modular.
  > It’s hard to design :(
  > Optimizing performance is difficult :(
• A loosely-coupled API often makes performance worse.

Page 21: Plugin-based software design with Ruby and RubyGems

Design Pattern 1: Dependency Injection

[Diagram: a Core built from components, each a class or an interface]

Each component publishes API:
A component is an interface or a class.

Page 22: Plugin-based software design with Ruby and RubyGems

Design Pattern 1: Dependency Injection

[Diagram: Core in which class components are replaced by Plugins]

When the application runs:
A DI container replaces objects with plugins.

Page 23: Plugin-based software design with Ruby and RubyGems

Design Pattern 1: Dependency Injection

[Diagram: Core with one Plugin; every other component is replaced by a dummy]

Testing the application:
Replace classes with mocks for unit tests
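As a rough illustration of replacing a class with a dummy in a unit test, the sketch below injects a recording stand-in through the constructor by hand, without any container. `Archiver` and `RecordingStore` are hypothetical names that are not part of the talk.

```ruby
# Hypothetical component that receives its collaborator through the constructor.
class Archiver
  def initialize(store:)
    @store = store
  end

  # Strips each line, hands it to the store, and returns the number of lines.
  def archive(lines)
    lines.each { |line| @store.store(line.strip) }
    lines.size
  end
end

# Dummy used only in tests: records calls instead of touching a real backend.
class RecordingStore
  attr_reader :stored

  def initialize
    @stored = []
  end

  def store(data)
    @stored << data
  end
end

dummy = RecordingStore.new
count = Archiver.new(store: dummy).archive(["a\n", "b\n"])
```

Because `Archiver` only depends on an object that responds to `store`, the test needs no database or network.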

Page 24: Plugin-based software design with Ruby and RubyGems

Dependency Injection (Java)

// Assuming Google Guice (com.google.inject) for @Inject and Binder.
import com.google.inject.Binder;
import com.google.inject.Inject;

public interface Store {
    void store(String data);
}

public class Module {
    @Inject
    Module(Store store) {
        store.store("data");
    }
}

public class DummyStore implements Store {
    public void store(String data) { }
}

// interface → implementation mapping
public class MainModule implements com.google.inject.Module {
    public void configure(Binder binder) {
        binder.bind(Store.class).to(DummyStore.class);
    }
}

From the source code, the implementation is a black box. It’s replaced at runtime.

Page 25: Plugin-based software design with Ruby and RubyGems

Dependency Injection (Ruby)

Ruby? (What’s a good way to use DI in Ruby?) (Please tell me if you know)

Page 26: Plugin-based software design with Ruby and RubyGems

Dependency Injection (Ruby)

I want to do this:

class Module
  def initialize(store: DummyStore.new)
    store.store("data")
  end
end

class DummyStore
  def store(data)
  end
end

injector = Injector.new.
  bind(store: DBStore)
object = injector.get(Module)

class DBStore
  def initialize(db: DBM.new)
    @db = db
  end

  def store(data)
    @db.insert(data)
  end
end

injector = Injector.new.
  bind(store: DBStore).
  bind(db: SqliteDBImpl)
object = injector.get(Module)

Keyword arguments

{:keyword => class} mapping at runtime
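One possible way to get the behavior wished for on this slide is sketched below. It is not an existing library: `Injector` here is a hypothetical class written for illustration, and it relies only on Ruby's reflection over keyword arguments (`UnboundMethod#parameters`). The slide's `Module` is renamed to `Processor` because `Module` is a Ruby built-in, and `MemoryDB` stands in for the slide's `SqliteDBImpl`.

```ruby
# Hypothetical Injector: maps constructor keyword names to classes at runtime.
class Injector
  def initialize
    @bindings = {}
  end

  # bind(store: DBStore) records {:store => DBStore}; returns self for chaining.
  def bind(mapping)
    @bindings.merge!(mapping)
    self
  end

  # Build klass, resolving each bound keyword argument recursively.
  # Unbound optional keywords keep the default declared by the class.
  def get(klass)
    kwargs = {}
    klass.instance_method(:initialize).parameters.each do |kind, name|
      next unless %i[key keyreq].include?(kind)
      if @bindings.key?(name)
        kwargs[name] = get(@bindings[name])
      elsif kind == :keyreq
        raise ArgumentError, "no binding for #{name.inspect} required by #{klass}"
      end
    end
    klass.new(**kwargs)
  end
end

# Stand-in for a real database class (assumption, not from the talk).
class MemoryDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(data)
    @rows << data
  end
end

class DummyStore
  def store(data); end
end

class DBStore
  attr_reader :db

  def initialize(db: MemoryDB.new)
    @db = db
  end

  def store(data)
    @db.insert(data)
  end
end

class Processor
  attr_reader :store

  def initialize(store: DummyStore.new)
    @store = store
    store.store("data")
  end
end

default_object = Injector.new.get(Processor)
injector = Injector.new.bind(store: DBStore).bind(db: MemoryDB)
object = injector.get(Processor)
```

The sketch creates a new instance for every binding on each `get`; a real container would probably also need scopes such as singletons, which are left out here.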

Page 27: Plugin-based software design with Ruby and RubyGems

Design Pattern 2: Dynamic Plugin Loader

Core

Plugin Plugin

Calls Plugin loader to load plugins

Plugin Loader

Page 28: Plugin-based software design with Ruby and RubyGems

Design Pattern 2: Dynamic Plugin Loader

Core

Plugin Plugin

Plugins also call Plugin Loader. Plugins create an ecosystem.

Plugin Loader

Plugin Plugin
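The idea that plugins themselves call the plugin loader can be sketched as below. All names (`PluginLoader`, `MemoryOutput`, `CopyOutput`) are hypothetical and chosen for illustration; the registry is simply a hash keyed by plugin kind and type.

```ruby
# Hypothetical plugin loader: a registry keyed by [kind, type].
module PluginLoader
  @registry = {}

  def self.register(kind, type, klass)
    @registry[[kind, type]] = klass
  end

  def self.new_plugin(kind, type, **config)
    klass = @registry.fetch([kind, type]) do
      raise KeyError, "unknown #{kind} plugin: #{type}"
    end
    klass.new(**config)
  end
end

# A leaf plugin that keeps events in memory.
class MemoryOutput
  attr_reader :events

  def initialize(**_config)
    @events = []
  end

  def emit(event)
    @events << event
  end
end
PluginLoader.register(:output, "memory", MemoryOutput)

# A plugin that asks the loader for other plugins ("plugins call plugins").
class CopyOutput
  attr_reader :stores

  def initialize(stores:)
    @stores = stores.map { |type| PluginLoader.new_plugin(:output, type) }
  end

  def emit(event)
    @stores.each { |store| store.emit(event) }
  end
end
PluginLoader.register(:output, "copy", CopyOutput)

copy = PluginLoader.new_plugin(:output, "copy", stores: ["memory", "memory"])
copy.emit({ "message" => "hello" })
```

The core never needs to know that `CopyOutput` exists or that it fans out to other plugins; it only knows the loader.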

Page 29: Plugin-based software design with Ruby and RubyGems

Design Pattern 3: Combination

Core

class

Plugin

class Plugin Plugin

class class

Plugin Loader Plugin

Plugin Plugin

Plugin Plugin

Dependency Injection + Plugin Loader

Page 30: Plugin-based software design with Ruby and RubyGems

Plugin Architecture Design Patterns
a) Traditional Extensible Software Architecture
b) Plugin-based Software Architecture
  > Dependency Injection (DI)
  > Dynamic Plugin Loader
  > Combination of those

There are trade-offs
  > Choose the best solution for each project

Page 31: Plugin-based software design with Ruby and RubyGems

Plugin Architecture of Fluentd

Page 32: Plugin-based software design with Ruby and RubyGems

What’s Fluentd?
> Data collector for unified logging layer
  > Streaming data transfer based on JSON
  > Written in C & Ruby
> Plugin Marketplace on RubyGems
  > http://www.fluentd.org/plugins
> Working in production
  > http://www.fluentd.org/testimonials

Page 33: Plugin-based software design with Ruby and RubyGems

Deployment of Fluentd

Page 34: Plugin-based software design with Ruby and RubyGems

Deployment of Fluentd

Page 35: Plugin-based software design with Ruby and RubyGems

The problems around log collection…

Page 36: Plugin-based software design with Ruby and RubyGems

Solution: N × M → N + M plugins

Page 37: Plugin-based software design with Ruby and RubyGems

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and S3
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type s3
    bucket s3-event-archive
  </store>
</match>

<match metrics.*>
  type nagios
  host watch-server
</match>

Page 38: Plugin-based software design with Ruby and RubyGems

Example: Simple forwarding

Page 39: Plugin-based software design with Ruby and RubyGems

Example: HA & High performance

- HA (fail over)
- Load-balancing
- Choice of at-most-once or at-least-once

Page 40: Plugin-based software design with Ruby and RubyGems

Example: Realtime search + Batch Analytics combo

All data

Hot data

Page 41: Plugin-based software design with Ruby and RubyGems

Fluentd Core

EventRouter

Input Plugin

Output Plugin

Filter Plugin

Buffer Plugin

Output Plugin

Input Plugin

Plugin Architecture of Fluentd

Plugin Loader

Page 42: Plugin-based software design with Ruby and RubyGems

Fluentd Core

EventRouter

Input Plugin

Output Plugin

Filter Plugin

Buffer Plugin

Output Plugin

Input Plugin

Plugin Marketplace using RubyGems.org

$ gem install fluent-plugin-s3

Plugin Loader

/gems/

RubyGems.org

Page 43: Plugin-based software design with Ruby and RubyGems
Page 44: Plugin-based software design with Ruby and RubyGems

Fluentd’s Plugin Architecture
• Fluentd is a plugin-based event collector.
  > Fluentd core: takes care of message routing between plugins.
  > Plugins: do all other things!
• 300+ plugins released on RubyGems.org
• Fluentd loads plugins using Gem API.
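A minimal sketch of loading plugins through RubyGems is shown below. It is not Fluentd's actual loader: the `myapp/plugin/out_<type>.rb` naming convention, `REGISTRY`, and both helper methods are assumptions made for illustration. It relies on two RubyGems behaviors: `Gem.find_files` lists matching files from installed gems and `$LOAD_PATH`, and `require` can activate an installed gem that ships the requested file. A temporary directory on `$LOAD_PATH` stands in for an installed plugin gem.

```ruby
require "rubygems"
require "tmpdir"
require "fileutils"

# Hypothetical convention: an output plugin of type "x" lives in a file
# "myapp/plugin/out_x.rb" shipped by any installed gem (or found on $LOAD_PATH).
PLUGIN_DIR = "myapp/plugin"
REGISTRY = {}

# List available output types by asking RubyGems for matching files.
def available_output_types
  Gem.find_files("#{PLUGIN_DIR}/out_*.rb")
     .map { |path| File.basename(path, ".rb").sub(/\Aout_/, "") }
     .uniq
end

# Load a plugin on demand; the required file is expected to register itself.
def new_output(type)
  require "#{PLUGIN_DIR}/out_#{type}" unless REGISTRY.key?(type)
  REGISTRY.fetch(type).new
end

# Simulate an installed plugin gem with a temporary directory on $LOAD_PATH.
plugin_root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(plugin_root, PLUGIN_DIR))
File.write(File.join(plugin_root, PLUGIN_DIR, "out_null.rb"), <<~RUBY)
  class NullOutput
    def emit(_event)
      :dropped
    end
  end
  REGISTRY["null"] = NullOutput
RUBY
$LOAD_PATH.unshift(plugin_root)

types = available_output_types
output = new_output("null")
```

With this convention, publishing a gem that contains `lib/myapp/plugin/out_s3.rb` would be enough to make an `s3` output type available after `gem install`, without touching the core.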

Page 45: Plugin-based software design with Ruby and RubyGems

Plugin Architecture of Embulk

Page 46: Plugin-based software design with Ruby and RubyGems

Embulk: Open-source Bulk Data Loader written in Java & JRuby

Page 47: Plugin-based software design with Ruby and RubyGems

Amazon S3

MySQL

FTP

CSV Files

Access Logs

Salesforce.com

Elasticsearch

Cassandra

Hive

Redis

Reliable framework :-)

Parallel execution, transactions, auto guess, … and many more features provided by plugins.

Page 48: Plugin-based software design with Ruby and RubyGems

Demo

Page 49: Plugin-based software design with Ruby and RubyGems

Use case 1: Sync MySQL to Elasticsearch

embulk-input-mysql

embulk-filter-kuromoji

embulk-output-elasticsearch

MySQL

kuromoji

Elasticsearch

Page 50: Plugin-based software design with Ruby and RubyGems

Use case 2: Load from S3 to Analytics

csv.gz on S3

embulk-input-s3 + embulk-decoder-gzip + embulk-parser-csv

embulk-output-td → Treasure Data
embulk-output-bigquery → BigQuery
embulk-output-redshift → Redshift

embulk-executor-mapreduce

Page 51: Plugin-based software design with Ruby and RubyGems

Use case 3: Embulk as a Service at Treasure Data

Page 52: Plugin-based software design with Ruby and RubyGems

Use case 3: Embulk as a Service at Treasure Data

REST API to load/export data to/from Treasure Data

Page 53: Plugin-based software design with Ruby and RubyGems

Input Output

Embulk’s Plugin Architecture

Embulk Core

Executor Plugin

Filter Filter

Guess

Page 54: Plugin-based software design with Ruby and RubyGems

Output

Embulk’s Plugin Architecture

Embulk Core

Executor Plugin

Filter Filter

Guess

FileInput

Parser

Decoder

Page 55: Plugin-based software design with Ruby and RubyGems

Guess

Embulk’s Plugin Architecture

Embulk Core

FileInput

Executor Plugin

Parser

Decoder

FileOutput

Formatter

Encoder

Filter Filter

Page 56: Plugin-based software design with Ruby and RubyGems

Embulk’s Plugin Architecture

Embulk Core

PluginManager

Executor Plugin

InjectedPluginSource

ParserPlugin

JRubyPluginLoader

FormatterPlugin

JRuby Plugin Loader Plugin

FilterPlugin

OutputPlugin

InputPlugin

JRuby Runtime

Java Runtime

Page 57: Plugin-based software design with Ruby and RubyGems

Plugin Marketplace using RubyGems.org

Embulk Core

PluginManager

Executor Plugin

InjectedPluginSource

ParserPlugin

FormatterPlugin

FilterPlugin

OutputPlugin

InputPlugin

JRuby Runtime

Java Runtime

$ embulk gem install embulk-input-oracle

/gems/

RubyGems.org

JRubyPluginLoader

JRuby Plugin Loader Plugin

Page 58: Plugin-based software design with Ruby and RubyGems

Plugin Package Structure

embulk-input-s3.gem
+- build.gradle
|
+- src/main/java/org/embulk/input/s3      Java source files
|   \- S3FileInputPlugin.java
|      AwsCredentials.java
|
+- classpath/
|   \- embulk-input-s3-0.2.6.jar          Compiled jar file
|      aws-java-sdk-s3-1.10.33.jar        All dependent jar files
|      httpclient-4.3.6.jar
|
+- lib/embulk/input/
    \- s3.rb                              Ruby script to load the jar files

Page 59: Plugin-based software design with Ruby and RubyGems

Embulk Plugin Load Sequence

Java:
  org.embulk.cli.Main.main(String[] args) {
    org.jruby.Main.main(
      "embulk.jar!/embulk/command/embulk_bundle.rb", args);
  }

JRuby:
  Bundler.setup_environment
  Embulk::Runner = Embulk::Runner.new(
    org.embulk.EmbulkEmbed::Bootstrap.new.initialize)
  Embulk::Runner.run(ARGV)

Java:
  org.embulk.exec.BulkLoader.run(…)
  org.embulk.plugin.PluginManager.newPlugin(…)

Page 60: Plugin-based software design with Ruby and RubyGems

Embulk Plugin Load Sequence

Java:
  org.embulk.plugin.PluginManager.newPlugin(…) {
    jruby = org.jruby.embed.ScriptingContainer()
    rubyObj = jruby.runScriptlet("Embulk::Plugin")
    jruby.callMethod(rubyObj, "new_java_input", "s3")
  }

JRuby:
  def new_java_input(type)
    rubyPluginClass = lookup(:input, type)
    return rubyPluginClass.new_java
  end

Page 61: Plugin-based software design with Ruby and RubyGems

Embulk Plugin Load Sequence

JRuby:
  def new_java
    jars = Dir["classpath/**/*.jar"]
    factory = org.embulk.plugin.PluginClassLoaderFactory.new
    classloader = factory.create(jars)
    return classloader.loadClass("org.embulk.input.s3.S3FileInputPlugin")
  end

Java:
  PluginClassLoaderFactory.create(URL[] jarPaths) {
    return new PluginClassLoader(jarPaths);
  }

Page 62: Plugin-based software design with Ruby and RubyGems

Embulk
• Embulk is a plugin-based parallel bulk data loader.
• Guess plugins suggest which plugins are necessary and how to configure them.
• Executor plugins run plugins in parallel.
• Embulk core takes care of message passing between plugins.
• Embulk loads plugins using JRuby and Gem API.

Page 63: Plugin-based software design with Ruby and RubyGems

./embulk.jar

$ ./embulk.jar guess example.yml

executable jar!

Page 64: Plugin-based software design with Ruby and RubyGems

Header of embulk.jar

: <<BAT
@echo off
setlocal
set this=%~f0
set java_args=

rem ...

java %java_args% -jar %this% %args%
exit /b %ERRORLEVEL%
BAT

# ...

exec java $java_args -jar "$0" "$@"
exit 127

PK...

Page 65: Plugin-based software design with Ruby and RubyGems

embulk.jar is a shell script

argument of “:” command (heredoc). “:” is a command that does nothing.

#!/bin/sh is optional. Empty first line means a shell script.

java -jar $0

shell script exits here (following data is ignored)

Page 66: Plugin-based software design with Ruby and RubyGems

embulk.jar is a bat file

.bat exits here (following lines are ignored)

“:” means a comment-line

Page 67: Plugin-based software design with Ruby and RubyGems

embulk.jar is a jar file

jar (zip) format ignores headers (file entries are in footer)
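Why a zip reader tolerates leading script text can be illustrated with a small sketch. Zip readers locate the archive's end-of-central-directory (EOCD) record near the end of the file, so bytes placed before the archive data do not stop them from finding it. The helper name and the sample header below are hypothetical, and the example uses an empty archive (a lone 22-byte EOCD record) to keep it short; it is not how embulk.jar is actually built.

```ruby
# Signature that starts the end-of-central-directory (EOCD) record of a zip file.
EOCD_SIGNATURE = "PK\x05\x06".b

# Return the byte offset of the last EOCD signature, or nil if none is found.
def find_end_of_central_directory(bytes)
  bytes.b.rindex(EOCD_SIGNATURE)
end

# Hypothetical polyglot file: a shell header followed by an empty zip archive.
# An empty zip archive consists of just the 22-byte EOCD record.
shell_header = "exec java -jar \"$0\" \"$@\"\nexit 127\n".b
empty_zip    = EOCD_SIGNATURE + ("\x00".b * 18)
polyglot     = shell_header + empty_zip

offset = find_end_of_central_directory(polyglot)
```

Here `offset` equals the length of the shell header: the reader finds the archive footer regardless of what precedes it, while the shell never reads past `exit 127`.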

Page 68: Plugin-based software design with Ruby and RubyGems

Pitfalls & Challenges

Page 69: Plugin-based software design with Ruby and RubyGems

Pitfalls & Challenges
• Plugin version conflicts
• Performance impact due to loosely-coupled API

Page 70: Plugin-based software design with Ruby and RubyGems

Plugin Version Conflicts

Embulk Core

Java Runtime

aws-sdk.jar v1.9

embulk-input-s3.jar

Version conflicts!

aws-sdk.jar v1.10

embulk-output-redshift.jar

Page 71: Plugin-based software design with Ruby and RubyGems

Multiple Classloaders in JVM

Embulk Core

Java Runtime

aws-sdk.jar v1.9

embulk-input-s3.jar

Isolated environments

aws-sdk.jar v1.10

embulk-output-redshift.jar

Class Loader 1

Class Loader 2

Page 72: Plugin-based software design with Ruby and RubyGems

Version conflicts in a JRuby Runtime

Embulk Core

Java Runtime

httpclient v2.5.0

embulk-input-sfdc.gem

Version conflicts!

httpclient v2.6.0

embulk-input-marketo.gem

JRuby Runtime

Page 73: Plugin-based software design with Ruby and RubyGems

Java Runtime

Multiple JRuby Runtime?

Fluentd Core

activerecord ~> 3.4

fluent-plugin-sql.gem

Isolated environments?

activerecord ~> 4.2

fluent-plugin-presto.gem ?

Sub VM 1?

Sub VM 2?

Page 74: Plugin-based software design with Ruby and RubyGems

Version conflicts in Fluentd

Fluentd Core

CRuby Runtime

activerecord ~> 3.4

fluent-plugin-sql.gem

Version conflicts!

activerecord ~> 4.2

fluent-plugin-presto.gem ?
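The conflict in this diagram can be checked with RubyGems' own version classes. The sketch below uses `Gem::Requirement` and `Gem::Version`; the list of released versions is made up for illustration, and `versions_satisfying_all` is a hypothetical helper. Since a single Ruby VM can activate only one version of a gem, no version can serve both plugins when their requirements do not overlap.

```ruby
require "rubygems"

# Return the candidate versions that satisfy every requirement at once.
def versions_satisfying_all(candidates, requirements)
  reqs = requirements.map { |r| Gem::Requirement.new(r) }
  candidates.map { |v| Gem::Version.new(v) }
            .select { |v| reqs.all? { |req| req.satisfied_by?(v) } }
end

# Hypothetical list of released versions of a shared dependency.
released = %w[3.4.0 3.4.2 4.1.0 4.2.0 4.2.5]

for_sql    = versions_satisfying_all(released, ["~> 3.4"])
for_presto = versions_satisfying_all(released, ["~> 4.2"])
for_both   = versions_satisfying_all(released, ["~> 3.4", "~> 4.2"])
```

`~> 3.4` accepts 3.4 and later but nothing from 4.0 on, while `~> 4.2` starts at 4.2, so `for_both` comes back empty.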

Page 75: Plugin-based software design with Ruby and RubyGems

Challenges
• Version conflict is not completely solved.
  • Java can use multiple ClassLoaders
  • I haven’t figured out how to do the same thing in Ruby
• I don’t have clear ideas to solve the performance impact
  • Write more code to learn?

Page 76: Plugin-based software design with Ruby and RubyGems

Wrapping Up

Page 77: Plugin-based software design with Ruby and RubyGems

“How did I build Plugin Architecture?”
• I built Fluentd using dynamic plugin loader.
  • “Plugin calls Plugins”
  • Most of features are provided by the ecosystem of plugins.
• I built Embulk using combination of:
  • Dependency Injection,
  • JRuby to implement a Dynamic Plugin Loader,
  • Java VM and nested ClassLoaders to load multiple versions of plugins.
• But some problems are not solved yet:
  • Version conflicts in a Ruby VM.
  • Design patterns of plugins AND high performance.

Page 78: Plugin-based software design with Ruby and RubyGems

What’s Next?
• You build plugin-based software architecture!
• And you’ll tell me how you did it :-)
• I’m working on another project: a distributed workflow engine
  • Java VM + Python

Thank You!

Sadayuki Furuhashi

Founder & Software Architect