Posted on 15-Jul-2015


Building a Hybrid Data Cluster with MongoDB and Postgres

A solution based on PostgreSQL’s Foreign Data Wrapper

27 April 2015

Context and Customer scenario

Customer Requirements for Hybrid Cluster

- More and more unstructured data is being generated

- Increasing use of, and demand for, NoSQL databases because of:
  - their suitability for specific usage scenarios
  - their ability to scale horizontally

- Challenges:
  - Many admins and developers still prefer SQL as an easy and intuitive tool for querying the available data
  - Not many NoSQL databases support complex queries the way SQL does, e.g. JOINs, sub-queries etc.

Real-Life Use Cases

- NoSQL as an archive store for an RDBMS
  - The RDBMS stores the operational and transactional data
  - while NoSQL acts as an archive store for historical data

- NoSQL for receiving a write stream
  - NoSQL databases accumulate data from various sources with high write throughput across multiple shards
  - while the RDBMS stores the filtered data after it has been transformed into proper structures
  - The RDBMS makes it easier for users to query the data using SQL and JOINs

Hybrid Data Cluster is the ‘need of the hour’

- Most advanced open source database

- Supports the relational model for storing data

- Supports ACID transactions
  - Multi-Version Concurrency Control (MVCC)
  - Write-Ahead Logging (WAL)

- Scalability with tablespaces and partitions/child tables

- Supports unstructured data types (JSON, JSONB, HSTORE) and full-text search

PostgreSQL

- Most popular NoSQL database for a wide range of workloads

- Best suited for storing unstructured data

- Horizontal scalability through sharding

- Support for secondary indexes

- Aggregation and MapReduce features

MongoDB

- Get the best of both worlds

- Based on SQL/MED (Management of External Data)

- Allows you to create FOREIGN TABLEs that map to external entities

- These entities could be:
  - a table in an RDBMS
  - a collection in MongoDB
  - the corresponding entities in HDFS or a file system

- More about FDWs in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers

Foreign Data Wrappers of PostgreSQL

FDW for MongoDB

- Started by CitusDB and then forked by EnterpriseDB

- More details: https://github.com/EnterpriseDB/mongo_fdw

- The example we will discuss here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw

- Let’s go through the demo

MongoDB FDW

Preparing MongoDB

- Platform: Windows 7

- Create the directories that you will need:
  cd d:\mongodb
  mkdir a0
  mkdir b0
  mkdir c0
  mkdir c1
  mkdir c2
  mkdir d0
  mkdir d1
  mkdir d2
  mkdir cfg0
  mkdir cfg1
  mkdir cfg2

Prepare for a MongoDB Cluster

mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0

net start new_mongod_cfg0

mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1

net start new_mongod_cfg1

mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2

net start new_mongod_cfg2

Create the services for MongoDB Cluster: Config Servers
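The three config server commands differ only in the index N: directory cfgN, port 26050 + N, log file cfgN.log and service name new_mongod_cfgN. As a minimal sketch, assuming you would rather generate these lines than type them, a hypothetical Python helper (config_server_commands is not part of MongoDB) could reproduce the pattern:

```python
def config_server_commands(base_dir: str, first_port: int, count: int) -> list:
    """Build the 'mongod --configsvr ... --install' and 'net start' lines.

    Hypothetical helper: it only mirrors the naming scheme of this demo
    (cfgN data directory, cfgN.log, new_mongod_cfgN service, consecutive ports).
    """
    commands = []
    for i in range(count):
        name = f"cfg{i}"
        service = f"new_mongod_{name}"
        commands.append(
            f"mongod --configsvr --dbpath {base_dir}\\{name} --port {first_port + i} "
            f"--install --logpath {base_dir}\\{name}.log "
            f"--serviceName {service} --serviceDisplayName {service}"
        )
        commands.append(f"net start {service}")
    return commands


if __name__ == "__main__":
    # Print the six lines for the three config servers used in this demo.
    for line in config_server_commands(r"d:\mongodb", 26050, 3):
        print(line)
```

Run on the Windows host, the printed lines can be pasted into a command prompt; changing count or first_port is the only edit needed for a different layout.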

mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0

net start new_mongod_shrd_a0

mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0

net start new_mongod_shrd_b0

mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0

net start new_mongod_shrd_c0

Create the services for MongoDB Cluster: Create Shards

- For simplicity, we have skipped adding further members to each replica set here, but you can add them, e.g.:

  mkdir a1

  mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1

  net start new_mongod_shrd_a1

- Once the replica set has been initiated, add the new member from the shard's primary, e.g. rs.add("sameer:27001")

Create the services for MongoDB Cluster: Optionally Create the Replicas
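Shard members follow one convention throughout: member N of replica set "a" lives in directory aN, writes its own log file aN.log and listens on the set's base port plus N (a0 on 27000, a1 on 27001). A minimal sketch, assuming that convention, of a hypothetical Python helper that emits the mkdir, install and start lines for any member:

```python
def shard_member_commands(base_dir: str, replset: str, member: int, base_port: int) -> list:
    """Return mkdir, 'mongod --shardsvr ... --install' and 'net start' lines for one member.

    Hypothetical helper following this demo's conventions: each member gets its
    own data directory, its own log file and port base_port + member.
    """
    name = f"{replset}{member}"
    service = f"new_mongod_shrd_{name}"
    install = (
        f"mongod --shardsvr --replSet {replset} --dbpath {base_dir}\\{name} "
        f"--logpath {base_dir}\\{name}.log --port {base_port + member} "
        f"--smallfiles --oplogSize 50 --install "
        f"--serviceName {service} --serviceDisplayName {service}"
    )
    return [f"mkdir {base_dir}\\{name}", install, f"net start {service}"]


if __name__ == "__main__":
    # Second member of shard "a", whose first member listens on 27000.
    for line in shard_member_commands(r"d:\mongodb", "a", 1, 27000):
        print(line)
```

Giving every member its own log file avoids two Windows services writing to the same file.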

- mongos --configdb sameer:26050,sameer:26051,sameer:26052 --install --serviceName new_mongos_svc0 --serviceDisplayName new_mongos_svc0 --logpath d:\mongodb\mongos0.log --port 26060

- net start new_mongos_svc0

Initiate the Mongos
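In this setup the --configdb value is simply the comma-separated list of host:port pairs of the three config servers started earlier. A small sketch, assuming the host name sameer and the ports used above; mongos_command is a hypothetical helper, not a MongoDB tool:

```python
def mongos_command(host: str, config_ports: list, port: int, log_path: str, service: str) -> str:
    """Assemble the 'mongos --install' line for a Windows service (hypothetical helper)."""
    # One host:port pair per config server, joined with commas.
    config_db = ",".join(f"{host}:{p}" for p in config_ports)
    return (
        f"mongos --configdb {config_db} --install "
        f"--serviceName {service} --serviceDisplayName {service} "
        f"--logpath {log_path} --port {port}"
    )


if __name__ == "__main__":
    print(mongos_command("sameer", [26050, 26051, 26052], 26060,
                         r"d:\mongodb\mongos0.log", "new_mongos_svc0"))
```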

- I am going to initiate a one-member replica set for each of my shards

Initiate the Replica Set

- Shard A:
  mongo --port 27000
  > rs.initiate()
  a:OTHER> rs.conf()
  a:PRIMARY> exit

- Shard B:
  mongo --port 27100
  > rs.initiate()
  b:OTHER> rs.conf()
  b:PRIMARY> exit

- Shard C:
  mongo --port 27200
  > rs.initiate()
  c:OTHER> rs.conf()
  c:PRIMARY> exit
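Called without arguments, rs.initiate() lets the server build a default one-member configuration for the host it runs on. rs.initiate() also accepts an explicit configuration document, which is useful if you want the member registered under exactly the host name (sameer) that the sh.addShard calls use. A minimal sketch, assuming those host names and ports; one_member_replset_config is a hypothetical helper that prints such documents:

```python
import json


def one_member_replset_config(replset: str, host: str, port: int) -> dict:
    """Build a replica set configuration document with a single member (_id 0)."""
    return {"_id": replset, "members": [{"_id": 0, "host": f"{host}:{port}"}]}


if __name__ == "__main__":
    # One document per shard, using the replica set names and ports of this demo.
    for name, port in (("a", 27000), ("b", 27100), ("c", 27200)):
        print(f"rs.initiate({json.dumps(one_member_replset_config(name, 'sameer', port))})")
```

Each printed line is valid mongo shell input, since a JSON object is also a JavaScript object literal.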

mongo --port 26060 test

mongos> sh.addShard("sameer:27100")

mongos> sh.addShard("sameer:27200")

mongos> sh.addShard("sameer:27000")

mongos> sh.enableSharding("db")

mongos> sh.shardCollection("db.warehouse",{warehouse_created:1},true)

Setup Sharding
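Because db.warehouse is sharded on { warehouse_created: 1 }, MongoDB uses ranged sharding: each chunk covers a contiguous range of warehouse_created values, and mongos routes a document to the shard that owns the matching chunk. The following is a simplified Python model of that lookup, not MongoDB code; the chunk boundaries and the shard names shard_a, shard_b and shard_c are invented for illustration (in a real cluster the balancer decides the boundaries):

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical chunk layout: (inclusive lower bound, owning shard).
# A chunk ends where the next one begins; None stands in for MongoDB's MinKey.
CHUNKS = [
    (None, "shard_a"),
    (datetime(2014, 1, 1, tzinfo=timezone.utc), "shard_b"),
    (datetime(2014, 7, 1, tzinfo=timezone.utc), "shard_c"),
]
_BOUNDS = [lower for lower, _ in CHUNKS[1:]]


def shard_for(warehouse_created: datetime) -> str:
    """Return the shard whose chunk range contains this shard-key value."""
    return CHUNKS[bisect_right(_BOUNDS, warehouse_created)][1]


if __name__ == "__main__":
    for ts in ("2013-11-12T07:12:10+00:00", "2013-12-12T07:12:10+00:00", "2014-12-12T07:12:10+00:00"):
        print(ts, "->", shard_for(datetime.fromisoformat(ts)))
```

The model also shows a consequence of using a creation timestamp as the shard key: every newly created document falls into the last chunk. MongoDB's documentation cautions that monotonically increasing shard keys concentrate inserts on a single shard, which is worth keeping in mind for write-heavy collections.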

mongos> use db
mongos> db.createUser(
... {
... user: "superuser",
... pwd: "password",
... roles: [ { role: "root", db: "admin" } ]
... }
... )

Setup Users and Security

Creating FDW Extension in Postgres

- Download MongoDB FDW from GitHub

- Installation is quite easy when you use autogen.sh:
  cd $PATH_WHERE_FDW_IS_EXTRACTED
  ./autogen.sh

- It will automatically install all the required components:
  - libbson
  - libmongoc

- Once that is done, you can build and install the FDW:
  make -f Makefile.meta && make -f Makefile.meta install

Build MongoDB FDW

- Allows you to build with either MongoDB's legacy driver or its master branch driver

- Supports both reading from and writing to the foreign table

- Connection pooling: queries within the same session reuse the same MongoDB connection

- Build with MongoDB's legacy branch driver:
  ./autogen.sh --with-legacy

- Build with MongoDB's master branch driver:
  ./autogen.sh --with-master

Features of mongo_fdw
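The connection pooling described above can be pictured as a per-session cache keyed by server and user. The Python sketch below is only a model of that idea: mongo_fdw itself is written in C, and the classes MongoConnection and SessionConnectionCache are hypothetical.

```python
class MongoConnection:
    """Placeholder for a real MongoDB client connection (hypothetical)."""

    def __init__(self, address: str, port: int, user: str):
        self.address, self.port, self.user = address, port, user


class SessionConnectionCache:
    """Hand out one connection per (address, port, user) for the lifetime of a session."""

    def __init__(self):
        self._connections = {}

    def get(self, address: str, port: int, user: str) -> MongoConnection:
        key = (address, port, user)
        if key not in self._connections:
            # First query in this session for this server/user: open a connection.
            self._connections[key] = MongoConnection(address, port, user)
        # Later queries reuse the cached connection instead of reconnecting.
        return self._connections[key]

    def close_all(self) -> None:
        """Drop all cached connections, as would happen when the session ends."""
        self._connections.clear()


if __name__ == "__main__":
    cache = SessionConnectionCache()
    first = cache.get("192.168.160.1", 26060, "superuser")
    second = cache.get("192.168.160.1", 26060, "superuser")
    print(first is second)
```

Keying on the user as well as the server keeps two user mappings from sharing credentials on one connection.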

- Create the mongo_fdw extension in your PostgreSQL database

- You may create the extension in a template database (e.g. template1), so that databases created from it have it automatically

- Create a foreign data server

- Create a user mapping that maps a Postgres user to a MongoDB user

- Create a foreign table that maps to a MongoDB collection

Using mongo_fdw

- psql=# CREATE EXTENSION mongo_fdw;

- psql=# CREATE SERVER mongo_server
  FOREIGN DATA WRAPPER mongo_fdw
  OPTIONS (address '192.168.160.1', port '26060');

- psql=# CREATE USER MAPPING FOR postgres
  SERVER mongo_server
  OPTIONS (username 'superuser', password 'password');

Create Foreign Server: Example

- psql=# CREATE FOREIGN TABLE warehouse(
  _id NAME,
  warehouse_id int,
  warehouse_name text,
  warehouse_created timestamptz)
  SERVER mongo_server
  OPTIONS (database 'db', collection 'warehouse');

Create Foreign Table: Example
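In this example each column name matches a field name in the documents of db.warehouse. As a conceptual model only, with plain Python dictionaries standing in for BSON documents, the mapping in both directions looks roughly like this. The names row_to_document and document_to_row are hypothetical, not mongo_fdw functions, and treating a missing field as NULL (None) is an assumption of the model.

```python
from datetime import datetime, timezone

# Columns of the foreign table, in declaration order.
COLUMNS = ("_id", "warehouse_id", "warehouse_name", "warehouse_created")


def row_to_document(values: tuple) -> dict:
    """Model an INSERT: map column values to document fields of the same name."""
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    document = dict(zip(COLUMNS, values))
    # The value given for _id is dropped so that MongoDB assigns its own ObjectId.
    del document["_id"]
    return document


def document_to_row(document: dict) -> tuple:
    """Model a SELECT: read each column from the field of the same name (None if absent)."""
    return tuple(document.get(column) for column in COLUMNS)


if __name__ == "__main__":
    created = datetime(2014, 12, 12, 7, 12, 10, tzinfo=timezone.utc)
    print(row_to_document((0, 1, "UPS", created)))
```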

- It stores a unique object ID for each document

- If you omit this field, MongoDB inserts a 12-byte BSON ObjectId by default

- When inserting data directly into MongoDB, you may choose the value of this field yourself

- In mongo_fdw, you have to define the _id column with the data type NAME

- mongo_fdw will ignore the value inserted into the _id column and let MongoDB generate it

‘_id’ column of MongoDB
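The 12-byte ObjectId is laid out as a 4-byte big-endian timestamp (seconds since the Unix epoch), a 5-byte random value and a 3-byte incrementing counter, which is why ObjectIds sort roughly by creation time. Older drivers filled the middle five bytes with a machine identifier and process id instead. The Python sketch below builds values with that layout for illustration only; new_object_id and object_id_seconds are hypothetical helpers, not a replacement for a driver's implementation such as bson.ObjectId in PyMongo.

```python
import os
import struct
import time
from itertools import count

# 5 random bytes chosen once per process, and a 3-byte counter seeded randomly.
_PROCESS_RANDOM = os.urandom(5)
_COUNTER = count(int.from_bytes(os.urandom(3), "big"))


def new_object_id(seconds=None) -> bytes:
    """Return a 12-byte ObjectId-like value: timestamp | random | counter."""
    ts = int(time.time()) if seconds is None else int(seconds)
    counter = next(_COUNTER) & 0xFFFFFF  # wrap around after 2**24 values
    return struct.pack(">I", ts) + _PROCESS_RANDOM + counter.to_bytes(3, "big")


def object_id_seconds(oid: bytes) -> int:
    """Extract the creation time (seconds since the Unix epoch) from the first 4 bytes."""
    return struct.unpack(">I", oid[:4])[0]


if __name__ == "__main__":
    oid = new_object_id()
    print(oid.hex(), object_id_seconds(oid))
```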

- INSERT INTO warehouse values (0, 1, 'UPS', '2014-12-12T07:12:10Z');

- INSERT INTO warehouse values (0, 2, 'EMS', '2013-12-12T07:12:10Z');

- INSERT INTO warehouse values (0, 3, 'ASX', '2013-11-12T07:12:10Z');

- UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;

DML on Foreign Tables
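In MongoDB terms, the UPDATE above sets warehouse_name on every document whose warehouse_id is 1, i.e. it has the effect of {$set: {warehouse_name: 'UPS_NEW'}} applied with the filter {warehouse_id: 1}. The following is a small Python model of that effect on in-memory documents; apply_set is a hypothetical helper, and it does not show how mongo_fdw actually issues the update to MongoDB.

```python
def apply_set(documents: list, equality_filter: dict, changes: dict) -> int:
    """Apply {"$set": changes} to every document matching all equality conditions.

    Returns how many documents matched the filter.
    """
    matched = 0
    for document in documents:
        if all(document.get(field) == value for field, value in equality_filter.items()):
            document.update(changes)  # like $set: overwrite or add only the listed fields
            matched += 1
    return matched


if __name__ == "__main__":
    warehouse = [
        {"warehouse_id": 1, "warehouse_name": "UPS"},
        {"warehouse_id": 2, "warehouse_name": "EMS"},
        {"warehouse_id": 3, "warehouse_name": "ASX"},
    ]
    print(apply_set(warehouse, {"warehouse_id": 1}, {"warehouse_name": "UPS_NEW"}))
    print(warehouse[0])
```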

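- Connect to MongoDB (the user was created in the db database, so authenticate against it):
  mongo --port 26060 --username superuser --password password --authenticationDatabase db

- Switch to the database and check the data in the collection:
  use db
  db.warehouse.find()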

Operations on MongoDB

- You can run ANALYZE on the foreign table to collect statistics

- You can run queries with WHERE clauses

- You can run JOIN queries against other foreign tables or native PostgreSQL tables

Operations in Postgres on Foreign Data
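When you query the foreign table with a WHERE clause, an FDW can push simple conditions down to MongoDB as a query document instead of fetching the whole collection; which conditions mongo_fdw actually pushes down depends on its version. The sketch below is a hypothetical Python translator for simple comparisons, assuming the conditions are joined with AND, to show what such a query document looks like:

```python
# SQL comparison operators and the MongoDB query operators with the same meaning.
SQL_TO_MONGO = {"=": "$eq", "<>": "$ne", "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte"}


def to_mongo_filter(predicates: list) -> dict:
    """Translate AND-ed (column, operator, value) conditions into a MongoDB query document.

    Assumes each (column, operator) pair appears at most once. Raises KeyError for
    operators outside SQL_TO_MONGO; an FDW would typically leave such conditions
    for PostgreSQL to evaluate after fetching the rows.
    """
    query = {}
    for column, operator, value in predicates:
        query.setdefault(column, {})[SQL_TO_MONGO[operator]] = value
    return query


if __name__ == "__main__":
    # WHERE warehouse_created >= '2014-01-01' AND warehouse_id = 1
    print(to_mongo_filter([("warehouse_created", ">=", "2014-01-01"), ("warehouse_id", "=", 1)]))
```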

Live walkthrough of the Hybrid Cluster

Leverage complex SQL with sharded MongoDB

Benefits of this Setup

- Build a sharded MongoDB cluster with an SQL interface

- Query MongoDB data using SQL

- Join MongoDB collections with each other or with tables in Postgres

- Combine and process MongoDB data with data from other sources (e.g. Hadoop, Oracle, MySQL) with the help of their respective FDWs

- Add more shards on the go

- Add replicas for MongoDB on the go

- Use Postgres as a front end to insert, update and delete data in MongoDB using SQL
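When a foreign table is joined with a native table, PostgreSQL can receive the rows from the FDW and perform the join itself, using a hash, merge or nested-loop join chosen by the planner. As a simplified model of the hash-join case, the Python sketch below joins warehouse documents with rows from a hypothetical native table shipments(warehouse_id, parcels); both the table and its data are invented for illustration.

```python
from collections import defaultdict


def hash_join(outer: list, inner: list, key: str) -> list:
    """Inner-join two lists of dicts on `key` using a hash table built on `inner`."""
    buckets = defaultdict(list)
    for row in inner:  # build phase: index the inner input by join key
        buckets[row[key]].append(row)
    return [  # probe phase: look up each outer row's key
        {**outer_row, **inner_row}
        for outer_row in outer
        for inner_row in buckets.get(outer_row[key], [])
    ]


if __name__ == "__main__":
    warehouses = [  # as returned through the foreign table (MongoDB)
        {"warehouse_id": 1, "warehouse_name": "UPS_NEW"},
        {"warehouse_id": 2, "warehouse_name": "EMS"},
    ]
    shipments = [  # hypothetical native PostgreSQL table
        {"warehouse_id": 1, "parcels": 10},
        {"warehouse_id": 1, "parcels": 4},
    ]
    for row in hash_join(warehouses, shipments, "warehouse_id"):
        print(row)
```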

Send us your suggestions and questions

success@ashnik.com

Stay Tuned!

Website: www.ashnik.com
