Pentaho and NoSQL

Pentaho and NoSQL, Java Meet Up (JaMU), Jakarta, 7th March 2014. Feris Thia, [email protected], 08176-474-525

DESCRIPTION

This is the PowerPoint slide presentation given at the Jakarta Java Meet Up on 7th March 2014 at BliBli.com.

TRANSCRIPT

Page 1: Pentaho and NoSQL

JaMU – Jakarta, 7 March 2014

Pentaho and NoSQL

Java Meet Up (JaMU), Jakarta

7th March, 2014

Feris Thia

[email protected]

Page 2: Pentaho and NoSQL

ABOUT ME

Founder

2007 2013

Feris Thia

PHI-Integration

Page 3: Pentaho and NoSQL

ABOUT ME

Book Author

Feris Thia

November 2013

Page 4: Pentaho and NoSQL

ABOUT ME

Community Manager

Feris Thia

Excel Indonesia User Group (EIUG)

Pentaho User Group Indonesia (Pentaho-ID)

2008 (~1000 members)

2013 (~5000 members)

Page 5: Pentaho and NoSQL

ABOUT ME

PHI-Integration Clients

Community Manager

Feris Thia

Page 6: Pentaho and NoSQL

AGENDA

DATA PREPARATION: What is it and why is it important?

PENTAHO DATA INTEGRATION: Popular open source ETL

NOSQL: An emerging non-relational database technology

Page 7: Pentaho and NoSQL

PROBLEMS?

Page 8: Pentaho and NoSQL

image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/

What caused the sales increase in this area? Did something unusual happen?

We need some time to prepare additional data to answer that.

WHAT?? So we cannot make any decisions until the data is ready?

Yes, sir…

Page 9: Pentaho and NoSQL

Image Source: http://wrapbootstrap.com/preview/WB0KDM51J/

TYPICAL SOLUTION

SOPHISTICATED REPORTING OR DASHBOARD APPLICATION!

Page 10: Pentaho and NoSQL

Image Source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg

PROBLEMS REMAIN…

Page 11: Pentaho and NoSQL

Time Spent on Data Preparation: 80%

Data Quality: 50%

Extract, Transformation & Load: 30%

Page 12: Pentaho and NoSQL

Page 13: Pentaho and NoSQL

DATA PREPARATION IS THE KEY

1. Entry Systems

2. Data Preparation

3. Reporting (Basic Data Presentation)

4. Performance Dashboard (Visualization)

Note: Data preparation is often underestimated.

Page 14: Pentaho and NoSQL

DATA WAREHOUSE

1. Entry Systems

2. Data Warehouse

3. Business Intelligence

Page 15: Pentaho and NoSQL

DATA WAREHOUSE

Page 16: Pentaho and NoSQL

CHALLENGES

Page 17: Pentaho and NoSQL

EXTRACT

INTEGRATION: many data sources

INCREMENTAL: extract only changes

DATA SIZE: big data

INFRASTRUCTURE: network failure, high latency, slow I/O, etc.

DATA QUALITY: missing data, conversion, etc.

PROTOCOL: driver availability, reliability, etc.
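The INCREMENTAL item (extract only changes) is commonly implemented with a high-water mark: remember the newest change timestamp already loaded and extract only rows changed after it. A minimal in-memory sketch, assuming a hypothetical row shape with an `updatedAt` field and a hypothetical function name `extractIncrement`; a real PDI job would typically push the same filter into the source query instead.

```javascript
// Minimal sketch of incremental extraction with a high-water mark.
// Rows are plain objects with a hypothetical `updatedAt` field (epoch ms).

function extractIncrement(sourceRows, lastWatermark) {
  // Keep only rows changed strictly after the previous run's watermark.
  const changed = sourceRows.filter((row) => row.updatedAt > lastWatermark);
  // The new watermark is the newest change seen; unchanged if nothing is new.
  const newWatermark = changed.reduce(
    (max, row) => Math.max(max, row.updatedAt),
    lastWatermark
  );
  return { changed, newWatermark };
}

// Hypothetical source table.
const source = [
  { id: 1, amount: 100, updatedAt: 1000 },
  { id: 2, amount: 250, updatedAt: 2000 },
  { id: 3, amount: 75, updatedAt: 3000 },
];

// First run: nothing loaded yet, so every row is extracted.
const run1 = extractIncrement(source, 0);
// Simulate a later update to row 2 in the source system.
source[1] = { id: 2, amount: 300, updatedAt: 4000 };
// Second run: only the modified row is extracted.
const run2 = extractIncrement(source, run1.newWatermark);

console.log(run1.changed.length, run1.newWatermark); // 3 3000
console.log(run2.changed.map((r) => r.id), run2.newWatermark); // [ 2 ] 4000
```

The strict `>` comparison assumes change timestamps are unique and monotonic; a row committed later with an equal timestamp would be skipped, so real jobs often re-read a small overlap window or use a change sequence number.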

Page 18: Pentaho and NoSQL

TRANSFORM

NORMALIZE

DENORMALIZE

SPLIT / MERGE

DATA REDUCTION (aggregate, etc.)

TRANSPOSE

TEXT PARSING
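Several of these transformations are usually chained. The sketch below denormalizes hypothetical order lines by copying in each product's category, reduces the data by aggregating amounts per category and month, and transposes (pivots) months into columns. All table and field names are hypothetical, and this is plain JavaScript rather than PDI; in PDI, comparable work would typically be done by steps such as Stream lookup, Group by and Row denormaliser.

```javascript
// Sketch of three TRANSFORM operations on in-memory rows (hypothetical data).

const products = [
  { productId: "P1", category: "Food" },
  { productId: "P2", category: "Drink" },
];

const orderLines = [
  { productId: "P1", month: "Jan", amount: 10 },
  { productId: "P2", month: "Jan", amount: 5 },
  { productId: "P1", month: "Feb", amount: 7 },
  { productId: "P1", month: "Jan", amount: 3 },
];

// 1. DENORMALIZE: copy the product's category onto each order line.
const categoryById = new Map(products.map((p) => [p.productId, p.category]));
const denormalized = orderLines.map((line) => ({
  ...line,
  category: categoryById.get(line.productId),
}));

// 2. DATA REDUCTION: aggregate amount per (category, month).
const totals = new Map();
for (const row of denormalized) {
  const key = `${row.category}|${row.month}`;
  totals.set(key, (totals.get(key) || 0) + row.amount);
}

// 3. TRANSPOSE: one output row per category, one column per month.
const pivot = {};
for (const [key, total] of totals) {
  const [category, month] = key.split("|");
  pivot[category] = pivot[category] || { category };
  pivot[category][month] = total;
}

console.log(Object.values(pivot));
// [ { category: 'Food', Jan: 13, Feb: 7 }, { category: 'Drink', Jan: 5 } ]
```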

Page 19: Pentaho and NoSQL

LOAD

PERFORMANCE: loading many data sources

CHANGES: structure, data type, column size, etc.

DATA SIZE: big data

INFRASTRUCTURE: network failure, high latency, slow I/O, etc.

DATA MAPPING: sync with correlated data

OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.
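The DATA MAPPING item (keeping loaded rows in sync with correlated data) is often handled by an upsert keyed on the source system's natural key, with the warehouse assigning its own surrogate key. A minimal sketch under that assumption; the class name, `customerCode` and `customerKey` are hypothetical, and a real load would do this against the target database (for example with PDI's Insert / Update step or a SQL MERGE) rather than a JavaScript Map.

```javascript
// Sketch: upsert into a dimension keyed by natural key, assigning surrogate keys.

class CustomerDimension {
  constructor() {
    this.byNaturalKey = new Map(); // hypothetical natural key -> stored row
    this.nextSurrogateKey = 1;
  }

  // Insert a new row or update the existing one; returns its surrogate key.
  upsert(sourceRow) {
    const existing = this.byNaturalKey.get(sourceRow.customerCode);
    if (existing) {
      Object.assign(existing, sourceRow); // overwrite attributes in place
      return existing.customerKey;
    }
    const row = { customerKey: this.nextSurrogateKey++, ...sourceRow };
    this.byNaturalKey.set(sourceRow.customerCode, row);
    return row.customerKey;
  }
}

const dim = new CustomerDimension();
const k1 = dim.upsert({ customerCode: "C-001", city: "Jakarta" });
const k2 = dim.upsert({ customerCode: "C-002", city: "Bandung" });
// Same natural key again: the row is updated and the surrogate key is reused.
const k3 = dim.upsert({ customerCode: "C-001", city: "Tangerang" });

console.log(k1, k2, k3); // 1 2 1
console.log(dim.byNaturalKey.get("C-001").city); // Tangerang
```

Overwriting attributes keeps only the latest value (slowly changing dimension type 1 behaviour); keeping history would instead insert a new row version with a new surrogate key.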

Page 20: Pentaho and NoSQL

DEMO: Data structure changes to increase SQL query performance.

Page 21: Pentaho and NoSQL

Pentaho Data Integration: Open Source ETL

Page 22: Pentaho and NoSQL

FEATURES AND BENEFITS

• Open Source

• Cost Efficient

• More than 200 modules

• Multi OS Platform

• Working with emerging Big Data platforms

• Low Learning Curve

Page 23: Pentaho and NoSQL

DEMO

1. Basic Extract and Transformation

2. More I/O

3. Helper Table (Closure)
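A closure table, the "Helper Table (Closure)" of this demo, stores one row per ancestor/descendant pair, including each node paired with itself, plus the depth between them. That turns "all descendants of X" into a single filter instead of a recursive query. A minimal sketch that derives the closure rows from a parent-child table; node and column names are hypothetical, not taken from the demo.

```javascript
// Build a closure table (ancestor, descendant, depth) from hypothetical parent-child pairs.

const parentChild = [
  { parent: "Root", child: "A" },
  { parent: "Root", child: "B" },
  { parent: "A", child: "A1" },
  { parent: "A1", child: "A1x" },
];

function buildClosure(edges) {
  const children = new Map();
  const nodes = new Set();
  for (const { parent, child } of edges) {
    nodes.add(parent);
    nodes.add(child);
    if (!children.has(parent)) children.set(parent, []);
    children.get(parent).push(child);
  }
  const closure = [];
  // Depth-first walk from every node; assumes the hierarchy has no cycles.
  for (const ancestor of nodes) {
    const stack = [{ node: ancestor, depth: 0 }];
    while (stack.length > 0) {
      const { node, depth } = stack.pop();
      closure.push({ ancestor, descendant: node, depth });
      for (const c of children.get(node) || []) {
        stack.push({ node: c, depth: depth + 1 });
      }
    }
  }
  return closure;
}

const closure = buildClosure(parentChild);

// "All descendants of A" is now a simple filter (a WHERE clause in SQL).
const descendantsOfA = closure
  .filter((r) => r.ancestor === "A" && r.depth > 0)
  .map((r) => r.descendant);

console.log(closure.length, descendantsOfA); // 12 [ 'A1', 'A1x' ]
```

The trade-off is storage and maintenance: the helper table holds one row per ancestor/descendant pair and must be updated whenever the hierarchy changes.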

Page 24: Pentaho and NoSQL

NoSQL: Not only SQL

Page 25: Pentaho and NoSQL

TIMELINE: Emergence of open source NoSQL

1998: NoSQL coined
2004: Google's MapReduce paper published
2006: Hadoop started
2007: MongoDB started, Neo4j initial release
2008: Apache HBase, Apache Cassandra
2009: Redis initial release
2012: Google Spanner paper published

Page 26: Pentaho and NoSQL

NOSQL GROUPS

DOCUMENT: MongoDB, CouchDB, Riak

WIDE COLUMN: Cassandra, HBase, Hypertable

GRAPH: Neo4j, OrientDB

KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB
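To make the four groups concrete, the sketch below shapes one hypothetical customer record the way each model tends to favour: an opaque value under a key, a self-contained nested document, a row of column families, and nodes joined by a typed edge. These are plain JavaScript structures chosen for illustration, not the storage format of any product listed above.

```javascript
// One hypothetical record ("customer Budi in Jakarta bought product P1"),
// shaped for each NoSQL group. Illustration only.

// KEY-VALUE <K, V>: the store only knows the key; the value is opaque.
const keyValue = new Map([
  ["customer:42", JSON.stringify({ name: "Budi", city: "Jakarta" })],
]);

// DOCUMENT: a self-contained, nested record; fields may vary per document.
const documentStore = [
  { _id: 42, name: "Budi", address: { city: "Jakarta" }, orders: [{ product: "P1" }] },
];

// WIDE COLUMN: row key -> column family -> column -> value.
const wideColumn = {
  "42": {
    profile: { name: "Budi", city: "Jakarta" },
    orders: { "order:P1": 1 },
  },
};

// GRAPH: entities are nodes; the purchase is a typed relationship.
const graph = {
  nodes: [
    { id: "c42", label: "Customer", props: { name: "Budi", city: "Jakarta" } },
    { id: "p1", label: "Product", props: { code: "P1" } },
  ],
  edges: [{ from: "c42", type: "BOUGHT", to: "p1" }],
};

// Reading back shows how each model is accessed.
console.log(JSON.parse(keyValue.get("customer:42")).city); // Jakarta
console.log(documentStore[0].address.city); // Jakarta
console.log(wideColumn["42"].profile.city); // Jakarta
console.log(graph.edges.filter((e) => e.from === "c42").length); // 1
```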

Page 27: Pentaho and NoSQL

NOSQL VS SQL

http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/

Key-Value
Use cases: in-memory cache, website analytics, log file analysis
Advantages: simple; replication, versioning, locking, transactions and sorting; web-accessible, schema-less, distributed
Disadvantages: simple, small set of data types; limited transaction support
Key products: Redis, Scalaris, Tokyo Cabinet

Tabular or Columnar
Use cases: data mining, analytics
Advantages: rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed
Disadvantages: limited transaction support
Key products: Google BigTable, HBase or HyperTable, Cassandra

Document Store
Use cases: document management, CRM, business continuity
Advantages: stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed
Disadvantages: limited transaction support
Key products: CouchDB, MongoDB, Riak

Traditional Relational
Use cases: transaction processing, typical corporate workloads
Advantages: well documented and supported, mature code, widely implemented in production
Disadvantages: cost, vertical scaling, increased complexity
Key products: Oracle, Microsoft SQL Server, MySQL Cluster

Page 28: Pentaho and NoSQL

NOSQL VS SQL

• Schemas are much more flexible

• Non-relational (no joins)

• Horizontal scalability

– Master–slave

– Peer-to-peer

• Data pipeline

– Expressions

– Functional programming

• ACID (Atomicity, Consistency, Isolation, Durability)

• BASE (Basically Available, Soft state, Eventual consistency)

• CAP (Consistency, Availability, Partition tolerance)
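BASE's soft state and eventual consistency can be illustrated with a toy peer-to-peer setup: two replicas accept writes independently, a read may return stale data until they synchronize, and after an anti-entropy exchange both converge. The sketch assumes a last-write-wins rule on hypothetical logical timestamps; real systems use their own timestamping, tie-breaking and repair mechanisms, which are not modelled here.

```javascript
// Toy model of eventual consistency with last-write-wins (LWW) replicas.

class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // key -> { value, ts } (ts is a hypothetical logical clock)
  }

  write(key, value, ts) {
    this.apply(key, { value, ts });
  }

  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }

  // Keep whichever version carries the larger timestamp.
  apply(key, incoming) {
    const current = this.data.get(key);
    if (!current || incoming.ts > current.ts) this.data.set(key, incoming);
  }

  // Anti-entropy: push every local version to the other replica.
  syncTo(other) {
    for (const [key, entry] of this.data) other.apply(key, entry);
  }
}

const r1 = new Replica("r1");
const r2 = new Replica("r2");

r1.write("stock:P1", 10, 1); // a client writes to r1 only
const staleRead = r2.read("stock:P1"); // r2 has not seen it yet: undefined

r2.write("stock:P1", 8, 2); // a later write lands on r2
r1.syncTo(r2);
r2.syncTo(r1);

console.log(staleRead, r1.read("stock:P1"), r2.read("stock:P1")); // undefined 8 8
```

With equal timestamps neither version replaces the other, so the replicas could stay divergent; practical LWW schemes add a deterministic tie-breaker.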

Page 29: Pentaho and NoSQL

DB-ENGINES.COM DB RANKING AS OF 7 MARCH 2014

Rank  Last month  DBMS                  Database model     Score    Change
1     1           Oracle                Relational DBMS    1491.8   -8.43
2     2           MySQL                 Relational DBMS    1290.21  1.83
3     3           Microsoft SQL Server  Relational DBMS    1205.28  -8.99
4     4           PostgreSQL            Relational DBMS    235.06   4.61
5     5           MongoDB               Document store     199.99   4.81
6     6           DB2                   Relational DBMS    187.32   -1.14
7     7           Microsoft Access      Relational DBMS    146.48   -6.4
8     8           SQLite                Relational DBMS    92.98    -0.03
9     9           Sybase ASE            Relational DBMS    81.55    -6.33
10    10          Cassandra             Wide column store  78.09    -2.23

Page 30: Pentaho and NoSQL

MongoDB: Document-Oriented Database

• Schemaless

• Distributed

• Auto Sharding

• Map Reduce Capabilities

• Multi Platform

• Structures

– Database

– Collections

– Documents

• Document

– A record is a document

– Similar to JSON Objects
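Auto sharding splits a collection into chunks by ranges of a shard key and spreads the chunks over shards; a router then sends each operation to the shard that owns the matching range. A simplified sketch of range routing only, with hypothetical chunk boundaries, shard names and shard key field; MongoDB's balancer, automatic chunk splitting and hashed shard keys are not modelled.

```javascript
// Simplified range-based shard routing on a numeric shard key.

// Each chunk owns [min, max); boundaries and shard names are hypothetical.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

function routeByShardKey(value) {
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return chunk.shard;
}

// Route documents by a hypothetical shard key field `customerId`.
const docs = [{ customerId: 42 }, { customerId: 1000 }, { customerId: 73000 }];
const placement = docs.map((d) => routeByShardKey(d.customerId));

console.log(placement); // [ 'shardA', 'shardB', 'shardC' ]
```

Because chunks are contiguous ranges, a query on a narrow range of the shard key only needs to reach the shards owning those chunks, while queries without the shard key must be sent to every shard.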

Page 31: Pentaho and NoSQL

MongoDB

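• MongoDB Shell

// koleksi ("collection") and nama ("name") are Indonesian identifiers used as collection and field names

• Insert

db.koleksi.insert({nama: "PHI-Integration", type: "Company"})

• Insert / Update (upsert)

// $set changes only the nama field; {upsert: true} inserts a new document if nothing matches
db.koleksi.update({nama: "PHI-Integration"}, {$set: {nama: "Lightora"}}, {upsert: true})

• Delete

db.koleksi.remove({nama: "PHI-Integration", type: "Company"})

• Read / Query

db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})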

Basic Commands & Expressions
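To show what the Read / Query filter selects, the sketch below evaluates the same kind of filter against in-memory documents: top-level equality, `$gt`, `$lt` and `$and`. It is a small hypothetical matcher written for illustration, not MongoDB's query engine, and it supports only those operators. The collection and field names (`koleksi`, `nama`, `posting`) follow the slide; the documents are made up.

```javascript
// Tiny matcher for a subset of MongoDB-style filters: equality, $gt, $lt, $and.

function matchesCondition(fieldValue, condition) {
  if (condition !== null && typeof condition === "object") {
    // Operator object such as { $gt: 100 } or { $gt: 100, $lt: 200 }.
    return Object.entries(condition).every(([op, operand]) => {
      if (op === "$gt") return fieldValue > operand;
      if (op === "$lt") return fieldValue < operand;
      throw new Error(`Unsupported operator: ${op}`);
    });
  }
  return fieldValue === condition; // plain equality
}

function matches(doc, filter) {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    return matchesCondition(doc[key], condition);
  });
}

// Hypothetical documents in the slide's collection.
const koleksi = [
  { nama: "PHI-Integration", posting: 150 },
  { nama: "PHI-Integration", posting: 250 },
  { nama: "Lightora", posting: 120 },
];

const filter = {
  nama: "PHI-Integration",
  $and: [{ posting: { $gt: 100 } }, { posting: { $lt: 200 } }],
};

console.log(koleksi.filter((doc) => matches(doc, filter)));
// [ { nama: 'PHI-Integration', posting: 150 } ]
```

The `$and` here is equivalent to the shorter `{posting: {$gt: 100, $lt: 200}}`, which this matcher also accepts.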

Page 32: Pentaho and NoSQL

MONGODB DEMO

1. Basic Commands

2. PDI Extract and Load

3. Aggregation Framework
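The Aggregation Framework passes documents through a pipeline of stages such as `$match`, `$group` and `$sort`. The sketch below shows the shell form of such a pipeline as a comment, then reproduces the same three stages with plain array operations so the data flow can be followed without a server. The `sales` collection and its fields are hypothetical.

```javascript
// Shell form (hypothetical `sales` collection), for reference only:
//   db.sales.aggregate([
//     { $match: { year: 2014 } },
//     { $group: { _id: "$city", total: { $sum: "$amount" } } },
//     { $sort: { total: -1 } }
//   ])
// Below, the same three stages are emulated on an in-memory array.

const sales = [
  { city: "Jakarta", year: 2014, amount: 300 },
  { city: "Bandung", year: 2014, amount: 500 },
  { city: "Jakarta", year: 2013, amount: 900 },
  { city: "Jakarta", year: 2014, amount: 400 },
];

// $match: keep only documents from 2014.
const matched = sales.filter((doc) => doc.year === 2014);

// $group: one output document per city with the summed amount.
const groups = new Map();
for (const doc of matched) {
  groups.set(doc.city, (groups.get(doc.city) || 0) + doc.amount);
}
const grouped = [...groups].map(([city, total]) => ({ _id: city, total }));

// $sort: descending by total.
const result = grouped.sort((a, b) => b.total - a.total);

console.log(result);
// [ { _id: 'Jakarta', total: 700 }, { _id: 'Bandung', total: 500 } ]
```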

Page 33: Pentaho and NoSQL

Neo4j: Graph Database

(Diagram: Node, Relationship and Properties; query language: Cypher)

Page 34: Pentaho and NoSQL

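Neo4j

• Neo4j Web Admin

• Create Node

CREATE (n {property_name: "property_value"})

• Create Relationship

CREATE (n)-[:RELATION]->(m)

• Where:

– n, m are identifiers (they must already be bound, e.g. by a preceding MATCH; otherwise CREATE makes new nodes)

– :RELATION is the relationship type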

Basic Utility, Commands & Expressions

Page 35: Pentaho and NoSQL

Neo4j

• Matching and Returning Objects

START emil=node:people(name='Emil')

MATCH (emil)-[:MARRIED_TO]-(madde)

RETURN madde

Basic Commands & Expressions
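The CREATE and MATCH examples work on a property graph: nodes carrying key-value properties, connected by typed, directed relationships. A pattern written without an arrow, as in `-[:MARRIED_TO]-`, matches the relationship in either direction. A hypothetical in-memory sketch of that model and of an undirected one-hop match; it is not Neo4j's storage or query engine, and the names Emil and Madde simply follow the slide.

```javascript
// In-memory property graph sketch: nodes with properties, typed directed relationships.

const nodes = new Map();
const relationships = [];

function createNode(id, properties) {
  nodes.set(id, { id, properties });
  return id;
}

function createRelationship(fromId, type, toId) {
  relationships.push({ from: fromId, type, to: toId });
}

// Roughly: MATCH (start)-[:TYPE]-(other) RETURN other  (direction ignored).
function matchUndirected(startId, type) {
  return relationships
    .filter((r) => r.type === type && (r.from === startId || r.to === startId))
    .map((r) => nodes.get(r.from === startId ? r.to : r.from));
}

const emil = createNode("n1", { name: "Emil" });
const madde = createNode("n2", { name: "Madde" });
// Stored as (madde)-[:MARRIED_TO]->(emil); the undirected match still finds it.
createRelationship(madde, "MARRIED_TO", emil);

console.log(matchUndirected(emil, "MARRIED_TO").map((n) => n.properties.name));
// [ 'Madde' ]
```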

Page 36: Pentaho and NoSQL

HIERARCHICAL MODEL: Neo4j Case Demo

(Diagram: a Root node with children Child 1 to Child 5)
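In a graph database each child is linked to its parent by a relationship, so "everything under Root" becomes a variable-length traversal computed at query time, with no helper table to maintain; in Cypher it can be written as `MATCH (root {name: 'Root'})-[:PARENT_OF*]->(d) RETURN d`. The relationship type `PARENT_OF` and the extra grandchild below are hypothetical additions for illustration; the sketch performs that traversal breadth-first over an in-memory adjacency list.

```javascript
// Breadth-first descendant traversal, like (root)-[:PARENT_OF*]->(d) in Cypher.

// Hypothetical adjacency list: parent -> children.
const parentOf = new Map([
  ["Root", ["Child 1", "Child 2", "Child 3", "Child 4", "Child 5"]],
  ["Child 2", ["Child 2.1"]], // hypothetical grandchild
]);

// Assumes a tree (no cycles), so no visited set is needed.
function descendants(start) {
  const result = [];
  const queue = [...(parentOf.get(start) || [])];
  while (queue.length > 0) {
    const node = queue.shift();
    result.push(node);
    queue.push(...(parentOf.get(node) || []));
  }
  return result;
}

console.log(descendants("Root").length); // 6
console.log(descendants("Child 2")); // [ 'Child 2.1' ]
```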

Page 37: Pentaho and NoSQL

Q&A

Page 38: Pentaho and NoSQL

CONTACT ME

Universitas Multimedia Nusantara, New Media Tower, Lv. 12, Scientia Boulevard St., Tangerang, Banten 15811

+6221-7038-7738 (phone)

+628176-474-525 (mobile)

https://www.facebook.com/feris.thia

@FerisThia

[email protected]

Page 39: Pentaho and NoSQL

BIG THANK YOU!