plugin-based software design with ruby and rubygems

Post on 09-Jan-2017

Plugin-based software design with Ruby and RubyGems

Sadayuki Furuhashi Founder & Software Architect

RubyKaigi 2015

A little about me…

Sadayuki Furuhashi
github: @frsyuki

Fluentd - Unified log collection infrastructure

Embulk - Plugin-based parallel ETL

MessagePack - It's like JSON. but fast and small.

Founder & Software Architect

What’s Plugin Architecture?

Benefits of Plugin Architecture

> Plugins bring many features
> Plugins keep core software simple
> Plugins are easy to test
> Plugins build an active developer community

> “…if it’s designed well”.

How to design plugin architecture?

How did I design plugin architecture?

Today’s topic

> Plugin Architecture Design Patterns
> Plugin Architecture of Fluentd
> Plugin Architecture of Embulk
> Pitfalls & Challenges

Plugin Architecture Design Patterns

a) Traditional Extensible Software Architecture

b) Plugin-based Software Architecture

Traditional Extensible Software Architecture

Host Application

Plugin

Plugin

Register plugins to extension points

To add more extensibility, add more extension points.
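A minimal Ruby sketch of the traditional model, assuming a host application with two fixed extension points. `HostApplication`, `:before_save`, and `:after_save` are hypothetical names chosen for illustration, not taken from any real library. Plugins can only hook in where the host has declared a point, which is why more extensibility means more extension points.

```ruby
# Hypothetical host application with a fixed set of extension points.
class HostApplication
  EXTENSION_POINTS = [:before_save, :after_save].freeze

  attr_reader :saved

  def initialize
    @hooks = Hash.new { |hash, point| hash[point] = [] }
    @saved = []
  end

  # Plugins register a block at one of the known extension points.
  def register(point, &plugin)
    unless EXTENSION_POINTS.include?(point)
      raise ArgumentError, "unknown extension point: #{point}"
    end
    @hooks[point] << plugin
    self
  end

  # The host owns the control flow and calls plugins at the declared points.
  def save(record)
    record = @hooks[:before_save].reduce(record) { |rec, plugin| plugin.call(rec) }
    @saved << record
    @hooks[:after_save].each { |plugin| plugin.call(record) }
    record
  end
end

app = HostApplication.new
log = []
app.register(:before_save) { |rec| rec.merge(normalized: true) }
app.register(:after_save)  { |rec| log << "saved #{rec[:id]}" }
result = app.save({ id: 1 })
```

Anything a plugin wants to do outside `before_save` and `after_save` requires a change to the host itself.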

Plugin-based software architecture

[Diagram: an Application made of a Core surrounded by many interconnected Plugins]

• Application as a network of plugins.
  > Plugins: provide features.
  > Core: framework to implement plugins.
• More flexibility != More complexity.
• The application must be designed to be modular.
  > It’s hard to design :(
  > Optimizing performance is difficult :(
• Loosely-coupled API often makes performance worse.

Design Pattern 1: Dependency Injection

[Diagram: Core composed of classes and interfaces]

A component is an interface or a class. Each component publishes an API.

When application runs:

[Diagram: Core classes with Plugins attached]

A DI container replaces objects with plugins when the application runs.

Testing the application:

[Diagram: Core with Plugins replaced by dummies]

Replace classes with mocks for unit tests.

Dependency Injection (Java)

public interface Store {
    void store(String data);
}

public class Module {
    @Inject
    Module(Store store) {
        store.store("data");
    }
}

public class DummyStore implements Store {
    @Override
    public void store(String data) { }
}

// Binder and configure() match Guice's com.google.inject.Module (assumed here);
// it is fully qualified to avoid clashing with the Module class above.
public class MainModule implements com.google.inject.Module {
    @Override
    public void configure(Binder binder) {
        binder.bind(Store.class).to(DummyStore.class);
    }
}

interface → implementation mapping

From source code, implementation is black box. It’s replaced at runtime.

Dependency Injection (Ruby)

Ruby? (What’s a good way to use DI in Ruby?) (Please tell me if you know)

I want to do this:

# Keyword arguments name the dependencies.
# (Module mirrors the Java example; real Ruby code needs another name because Module is a built-in class.)
class Module
  def initialize(store: DummyStore.new)
    store.store("data")
  end
end

class DummyStore
  def store(data)
  end
end

# {:keyword => class} mapping at runtime
# (Injector is the wished-for API, not an existing library)
injector = Injector.new.
  bind(store: DBStore)
object = injector.get(Module)

class DBStore
  def initialize(db: DBM.new)
    @db = db
  end

  def store(data)
    @db.insert(data)
  end
end

injector = Injector.new.
  bind(store: DBStore).
  bind(db: SqliteDBImpl)
object = injector.get(Module)
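One possible way to get the {:keyword => class} mapping described above is to read the keyword parameters of `initialize` by reflection. The sketch below is only an illustration under that assumption: `Injector`, `Service`, and `MemoryDB` are hypothetical names, and `Service` stands in for the slide's `Module` class because `Module` is already a built-in Ruby class.

```ruby
# Hypothetical injector: maps keyword names to classes and resolves them recursively.
class Injector
  def initialize
    @bindings = {}
  end

  # bind(store: DBStore) records a {:keyword => class} mapping; returns self for chaining.
  def bind(**mapping)
    @bindings.merge!(mapping)
    self
  end

  # Instantiate klass, filling every bound keyword argument with an injected instance.
  # Unbound keywords fall back to the default written in the method signature.
  def get(klass)
    kwargs = {}
    klass.instance_method(:initialize).parameters.each do |kind, name|
      next unless [:key, :keyreq].include?(kind)
      kwargs[name] = get(@bindings[name]) if @bindings.key?(name)
    end
    klass.new(**kwargs)
  end
end

class DummyStore
  def store(data); end
end

class MemoryDB
  attr_reader :rows
  def initialize; @rows = []; end
  def insert(data); @rows << data; end
end

class DBStore
  attr_reader :db
  def initialize(db: MemoryDB.new)
    @db = db
  end

  def store(data)
    @db.insert(data)
  end
end

class Service
  attr_reader :store
  def initialize(store: DummyStore.new)
    @store = store
    store.store("data")
  end
end

injector = Injector.new.bind(store: DBStore).bind(db: MemoryDB)
service = injector.get(Service)   # Service -> DBStore -> MemoryDB
default_service = Injector.new.get(Service)   # no bindings: DummyStore default
```

The default values in the signatures double as the test doubles, so a unit test can construct `Service` without any injector at all.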

Design Pattern 2: Dynamic Plugin Loader

[Diagram: Core and Plugins connected through a Plugin Loader]

Core calls the Plugin Loader to load plugins.

Plugins also call the Plugin Loader. Plugins create an ecosystem.
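The loader pattern can be sketched in Ruby as a registry keyed by plugin kind and type, with a `require` fallback based on a file naming convention. Everything here is hypothetical (`PluginLoader`, `MemoryOutput`, `CopyOutput`, and the `hypothetical_app/plugin/...` path); it is not Fluentd's or Embulk's actual loader. `CopyOutput` illustrates a plugin that calls the loader itself.

```ruby
# Hypothetical plugin loader: a registry keyed by [kind, type].
module PluginLoader
  @registry = {}

  class << self
    # Plugins call this when their file is loaded.
    def register(kind, type, klass)
      @registry[[kind.to_sym, type.to_s]] = klass
    end

    # Look up a plugin class. If it is not registered yet, try to require a
    # file that follows a naming convention (the path is an assumption).
    def new_plugin(kind, type, **options)
      key = [kind.to_sym, type.to_s]
      unless @registry.key?(key)
        begin
          require "hypothetical_app/plugin/#{kind}_#{type}"
        rescue LoadError
          raise ArgumentError, "unknown #{kind} plugin: #{type}"
        end
      end
      @registry.fetch(key).new(**options)
    end
  end
end

# A leaf plugin that keeps events in memory.
class MemoryOutput
  attr_reader :events
  def initialize; @events = []; end
  def emit(event); @events << event; end
end
PluginLoader.register(:output, "memory", MemoryOutput)

# A plugin that loads other plugins through the same loader.
class CopyOutput
  attr_reader :stores
  def initialize(stores: [])
    @stores = stores.map { |type| PluginLoader.new_plugin(:output, type) }
  end

  def emit(event)
    @stores.each { |store| store.emit(event) }
  end
end
PluginLoader.register(:output, "copy", CopyOutput)

copy = PluginLoader.new_plugin(:output, "copy", stores: ["memory", "memory"])
copy.emit({ "path" => "/index.html" })
```

Because the core and the plugins share one entry point, a plugin such as `CopyOutput` can compose other plugins without the core knowing about it.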

Design Pattern 3: Combination

[Diagram: Core classes with injected Plugins, plus a Plugin Loader that loads further Plugins]

Dependency Injection + Plugin Loader

Plugin Architecture Design Patterns

a) Traditional Extensible Software Architecture
b) Plugin-based Software Architecture
  > Dependency Injection (DI)
  > Dynamic Plugin Loader
  > Combination of those

There are trade-offs
  > Choose the best solution for each project

Plugin Architecture of Fluentd

What’s Fluentd?

> Data collector for unified logging layer
  > Streaming data transfer based on JSON
  > Written in C & Ruby
> Plugin Marketplace on RubyGems
  > http://www.fluentd.org/plugins
> Working in production
  > http://www.fluentd.org/testimonials

Deployment of Fluentd

The problems around log collection…

Solution: N × M → N + M plugins

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and S3
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type s3
    bucket s3-event-archive
  </store>
</match>

<match metrics.*>
  type nagios
  host watch-server
</match>

Example: Simple forwarding

Example: HA & High performance

- HA (fail over)
- Load-balancing
- Choice of at-most-once or at-least-once

Example: Realtime search + Batch Analytics combo

All data

Hot data

Plugin Architecture of Fluentd

[Diagram: Fluentd Core (EventRouter) connected to Input, Filter, Buffer, and Output plugins, which are loaded through the Plugin Loader]

Plugin Marketplace using RubyGems.org

$ gem install fluent-plugin-s3

[Diagram: RubyGems.org → /gems/ → Plugin Loader]

Fluentd’s Plugin Architecture

• Fluentd is a plugin-based event collector.
  > Fluentd core: takes care of message routing between plugins.
  > Plugins: do all other things!
• 300+ plugins released on RubyGems.org
• Fluentd loads plugins using Gem API.
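As a rough illustration of loading plugins through the Gem API, the sketch below uses `Gem.find_files`, which searches installed gems and `$LOAD_PATH` for files matching a glob. The `hypothetical_app/plugin/out_*.rb` naming convention and the `discover_output_plugins` helper are assumptions made for this example; this is not Fluentd's actual loader code.

```ruby
require "rubygems"
require "tmpdir"
require "fileutils"

# Return {type => path} for every file matching the (assumed) naming convention.
def discover_output_plugins(prefix = "hypothetical_app/plugin/out_")
  Gem.find_files("#{prefix}*.rb").each_with_object({}) do |path, found|
    type = File.basename(path, ".rb").sub(/\Aout_/, "")
    found[type] ||= path   # keep the first match for each type
  end
end

# Simulate an installed plugin by putting a directory on $LOAD_PATH.
dir = Dir.mktmpdir
FileUtils.mkdir_p(File.join(dir, "hypothetical_app/plugin"))
File.write(File.join(dir, "hypothetical_app/plugin/out_s3.rb"),
           "OUT_S3_LOADED = true\n")
$LOAD_PATH.unshift(dir)

plugins = discover_output_plugins
require plugins.fetch("s3")   # loading the file is what activates the plugin
```

With a convention like this, installing a gem that ships a matching file is enough to make a new plugin type discoverable; the core never has to list plugins explicitly.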

Plugin Architecture of Embulk

Embulk: Open-source Bulk Data Loader written in Java & JRuby

Amazon S3

MySQL

FTP

CSV Files

Access Logs

Salesforce.com

Elasticsearch

Cassandra

Hive

Redis

Reliable framework :-)

Parallel execution, transaction, auto guess, …and many by plugins.

Demo

Use case 1: Sync MySQL to Elasticsearch

embulk-input-mysql

embulk-filter-kuromoji

embulk-output-elasticsearch

MySQL

kuromoji

Elasticsearch

Use case 2: Load from S3 to Analytics

[Diagram: csv.gz on S3 → embulk-input-s3 + embulk-decoder-gzip + embulk-parser-csv → embulk-output-td / embulk-output-bigquery / embulk-output-redshift → Treasure Data / BigQuery / Redshift]

embulk-executor-mapreduce

Use case 3: Embulk as a Service at Treasure Data

REST API to load/export data to/from Treasure Data

Input Output

Embulk’s Plugin Architecture

[Diagram, built up over three slides: Embulk Core with an Executor Plugin; an Input side made of FileInput, Decoder, and Parser; Filters; an Output side made of FileOutput, Encoder, and Formatter; and Guess plugins]

[Diagram: in the Java Runtime, Embulk Core's PluginManager finds plugins through InjectedPluginSource and JRubyPluginLoader; the JRuby Runtime hosts the JRuby plugin loader. Plugin types shown: InputPlugin, ParserPlugin, FilterPlugin, FormatterPlugin, OutputPlugin, Executor Plugin]

Plugin Marketplace using RubyGems.org

$ embulk gem install embulk-input-oracle

[Diagram: RubyGems.org → /gems/ → JRubyPluginLoader (JRuby Runtime) → PluginManager in Embulk Core (Java Runtime)]

Plugin Package Structure

embulk-input-s3.gem
+- build.gradle
|
+- src/main/java/org/embulk/input/s3
|   \- S3FileInputPlugin.java         (Java source files)
|      AwsCredentials.java
|
+- classpath/
|   \- embulk-input-s3-0.2.6.jar      (Compiled jar file)
|      aws-java-sdk-s3-1.10.33.jar    (All dependent jar files)
|      httpclient-4.3.6.jar
|
+- lib/embulk/input/
    \- s3.rb                          (Ruby script to load the jar files)

Embulk Plugin Load Sequence

Java:

org.embulk.cli.Main.main(String[] args) {
    org.jruby.Main.main(
        "embulk.jar!/embulk/command/embulk_bundle.rb", args);
}

JRuby:

Bundler.setup_environment
Embulk::Runner = Embulk::Runner.new(
    .embulk.EmbulkEmbed::Bootstrap.new.initialize)
Embulk::Runner.run(ARGV)

Java:

org.embulk.exec.BulkLoader.run(…)

org.embulk.plugin.PluginManager.newPlugin(…) {
    jruby = org.jruby.embed.ScriptingContainer()
    rubyObj = jruby.runScriptlet("Embulk::Plugin")
    jruby.callMethod(rubyObj, "new_java_input", "s3")
}

JRuby:

def new_java_input(type)
  rubyPluginClass = lookup(:input, type)
  return rubyPluginClass.new_java
end

def new_java
  jars = Dir["classpath/**/*.jar"]
  factory = org.embulk.embulk.plugin.PluginClassLoaderFactory.new
  classloader = factory.create(jars)
  return classloader.loadClass("org.embulk.input.s3.S3InputPlugin")
end

Java:

PluginClassLoaderFactory.create(URL[] jarPaths) {
    return new PluginClassLoader(jarPaths);
}

Embulk

• Embulk is a plugin-based parallel bulk data loader.
• Guess plugins suggest which plugins are necessary and how to configure them.
• Executor plugins run plugins in parallel.
• Embulk core takes care of message passing between plugins.
• Embulk loads plugins using JRuby and Gem API.

./embulk.jar

$ ./embulk.jar guess example.yml

executable jar!

Header of embulk.jar

: <<BAT
@echo off
setlocal
set this=%~f0
set java_args=

rem ...

java %java_args% -jar %this% %args%
exit /b %ERRORLEVEL%
BAT

# ...

exec java $java_args -jar "$0" "$@"
exit 127

PK...

embulk.jar is a shell script

> Everything from “: <<BAT” to “BAT” is the argument of the “:” command (heredoc). “:” is a command that does nothing.
> #!/bin/sh is optional. Empty first line means a shell script.
> The “exec java …” line runs java -jar $0.
> The shell script exits at the “exec java …” / “exit 127” lines (following data is ignored).

embulk.jar is a bat file

> “:” means a comment-line, so the first line is skipped.
> .bat exits at “exit /b %ERRORLEVEL%” (following lines are ignored).

embulk.jar is a jar file

> jar (zip) format ignores headers (file entries are in footer), so the script text before “PK...” does not matter to the JVM.
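Why a zip reader tolerates a script header can be shown with a small Ruby sketch: the reader locates the archive from the "end of central directory" (EOCD) record near the end of the file, not from the first byte. The example builds the smallest valid archive (an empty zip is just a 22-byte EOCD record) and prepends arbitrary text. The `find_eocd` helper and the header string are illustrative assumptions, not code from Embulk.

```ruby
# Signature of the zip "end of central directory" (EOCD) record.
EOCD_SIGNATURE = "PK\x05\x06".b

# Smallest valid zip archive: an empty archive is a lone 22-byte EOCD record
# (4-byte signature followed by 18 zero bytes).
empty_zip = EOCD_SIGNATURE + ("\x00".b * 18)

# Arbitrary bytes may precede the archive, for example a script header.
header = "exec java -jar \"$0\" \"$@\"\nexit 127\n".b
polyglot = header + empty_zip

# A zip reader searches backwards from the end of the file for the EOCD record.
def find_eocd(bytes)
  bytes.rindex(EOCD_SIGNATURE)
end

offset = find_eocd(polyglot)

# The "total number of entries" field sits 10 bytes into the record
# (2 bytes, little endian).
entry_count = polyglot.byteslice(offset + 10, 2).unpack1("v")
```

The header length never enters the search, which is why the same file can start with shell and batch commands and still open as a jar.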

Pitfalls & Challenges

• Plugin version conflicts
• Performance impact due to loosely-coupled API

Plugin Version Conflicts

[Diagram: Embulk Core on the Java Runtime. embulk-input-s3.jar needs aws-sdk.jar v1.9 and embulk-output-redshift.jar needs aws-sdk.jar v1.10: version conflicts!]

Multiple Classloaders in JVM

[Diagram: Embulk Core on the Java Runtime. Class Loader 1 holds embulk-input-s3.jar with aws-sdk.jar v1.9; Class Loader 2 holds embulk-output-redshift.jar with aws-sdk.jar v1.10: isolated environments]

Version conflicts in a JRuby Runtime

[Diagram: Embulk Core on the Java Runtime, with one JRuby Runtime inside. embulk-input-sfdc.gem needs httpclient 2.5.0 and embulk-input-marketo.gem needs httpclient v2.6.0: version conflicts!]

Multiple JRuby Runtime?

[Diagram: Fluentd Core with Sub VM 1? holding fluent-plugin-sql.gem (activerecord ~> 3.4) and Sub VM 2? holding fluent-plugin-presto.gem ? (activerecord ~> 4.2): isolated environments?]

Version conflicts in Fluentd

[Diagram: Fluentd Core on the CRuby Runtime. fluent-plugin-sql.gem needs activerecord ~> 3.4 and fluent-plugin-presto.gem ? needs activerecord ~> 4.2: version conflicts!]
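The conflict can be checked directly with RubyGems' own version classes. The sketch below takes the two requirements shown on the slide and tests a handful of candidate versions (the candidate list is made up for illustration); since a single Ruby VM activates only one version of a gem, an empty intersection means the two plugins cannot be loaded together.

```ruby
require "rubygems"

# Requirements as shown on the slide.
sql_needs    = Gem::Requirement.new("~> 3.4")   # >= 3.4 and < 4.0
presto_needs = Gem::Requirement.new("~> 4.2")   # >= 4.2 and < 5.0

# Illustrative candidate versions (not a real release list).
candidates = %w[3.4.0 3.9.9 4.0.0 4.2.0 4.2.5 5.0.0].map { |v| Gem::Version.new(v) }

ok_for_sql    = candidates.select { |v| sql_needs.satisfied_by?(v) }
ok_for_presto = candidates.select { |v| presto_needs.satisfied_by?(v) }

# Versions that would satisfy both plugins at once.
ok_for_both = ok_for_sql & ok_for_presto
```

The two ranges are disjoint, so no candidate list would change the result; only isolation between plugins could resolve it.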

Challenges

• Version conflict is not completely solved.
  • Java can use multiple ClassLoaders
  • I haven’t figured out how to do the same thing in Ruby
• I don’t have clear ideas to solve performance impact
  • Write more code to learn?

Wrapping Up

“How did I build Plugin Architecture?”

• I built Fluentd using a dynamic plugin loader.
  • “Plugin calls Plugins”
  • Most features are provided by the ecosystem of plugins.
• I built Embulk using a combination of:
  • Dependency Injection,
  • JRuby to implement a Dynamic Plugin Loader,
  • Java VM and nested ClassLoaders to load multiple versions of plugins.
• But some problems are not solved yet:
  • Version conflicts in a Ruby VM.
  • Design patterns of plugins AND high performance.

What’s Next?

• You build plugin-based software architecture!
  • And you’ll tell me how you did it :-)
• I’m working on another project: a distributed workflow engine
  • Java VM + Python

Thank You!

Sadayuki Furuhashi

Founder & Software Architect
