Data Serialization Using Google Protocol Buffers

Post on 17-Dec-2014

DESCRIPTION

An easier and more flexible way to structure data for platform- and language-independent transport and storage

TRANSCRIPT

DATA SERIALIZATION WITH GOOGLE PROTOCOL BUFFERS

By: William Kibira

What is Data Serialization

● The process of translating a data structure and its object state into a format that can be stored in a memory buffer, file or transported on a network.

● The end goal is that the data can be reconstructed in another computer environment.
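A minimal sketch of that round trip, using JSON from the Python standard library as the format. The `Account` class and its fields are hypothetical names made up for this illustration; they do not come from the slides.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Account:
    # Hypothetical record type, used only for this illustration.
    owner: str
    balance: int


def serialize(account):
    """Translate the object's state into bytes that can be stored or sent."""
    return json.dumps(asdict(account)).encode("utf-8")


def deserialize(data):
    """Reconstruct an equal object from the bytes, possibly on another machine."""
    return Account(**json.loads(data.decode("utf-8")))


original = Account(owner="alice", balance=12345678)
wire_bytes = serialize(original)
restored = deserialize(wire_bytes)
print(wire_bytes)
print(restored == original)
```

The bytes in `wire_bytes` are all that has to travel; any environment that understands the format can rebuild the object from them.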

Reasons Why We Do This

● Persist objects [store and later retrieve them]

● Perform remote procedure calls

● Create distributed objects [CORBA, Java RMI, ICE]

Key Words

● Computer Environment

- Programming Languages

- Operating Systems

- Architectures and processors

● Platform Independent Solutions

Popular Platform Independent Solutions

● JSON and XML

● BSON and Binary XML

● Google Protocol Buffers, Thrift, Avro

Ref

http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

JSON AND XML

● Most popular

● Human-readable, to some extent

● Most web-based APIs use them by default

● Lots of generators exist for them

How It Works

● You write a schema in an IDL [Interface Description Language]. Kinda like CORBA IDLs, but much cleaner and more flexible.

● Pass it through a C++ based code generator

● Get your boilerplate code in the language you specified

GOOGLE PROTOCOL BUFFERS

● A platform-independent, language-independent data serialization solution, similar to XML in structure but much smaller in size and simpler to define.

● In use at Google since 2001, open-sourced in 2008

JSON VS BINARY FORMATS

● JSON is darn easy to read. If you can read binary, you definitely need to see a doctor.

● JSON gets fat even on little data; binary is really compact. Example: {"deposit_money": "12345678"}

JSON:   '0x6d', '0x6f', '0x6e', '0x65', '0x79', '0x31', '0x32', '0x33', '0x34', '0x35', '0x36', '0x37', '0x38'

BINARY: '0x01', '0xBC614E'
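The comparison above can be checked with a short sketch. The binary side here follows the published Protocol Buffers wire format for an unsigned integer field (a key byte followed by a base-128 varint), so its bytes differ from the simplified '0x01', '0xBC614E' pair shown on the slide. Field number 1 is an assumption chosen for illustration.

```python
import json


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, continuation bit clear
            return bytes(out)


def encode_uint_field(field_number, value):
    """Encode one integer field: key = (field_number << 3) | wire type 0."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


json_bytes = json.dumps({"deposit_money": "12345678"}).encode("utf-8")
binary_bytes = encode_uint_field(1, 12345678)  # field number 1 is assumed

print(len(json_bytes), json_bytes)
print(len(binary_bytes), binary_bytes.hex())
```

On this one value the JSON text takes 29 bytes and the binary encoding takes 5, because the field name is replaced by a number agreed on in the schema and the digits are stored as an integer.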

SPEED AT PARSING

● JSON parses fairly fast, but a binary format is faster still: it is readily parseable, with no text to tokenize or numbers to convert from strings.

FLOW

Schema / IDL → C++ code generator → generated code [C++, Java, Python, JavaScript] → server / client application bases

What Does a Schema Look Like?
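The schema shown on this slide did not survive in the transcript. As a stand-in, here is a hypothetical proto2-style schema for a chat message, held in a Python string so its field declarations can be listed. The message name, field names and field numbers are all assumptions made for illustration, not the author's schema.

```python
import re

# Hypothetical schema for illustration only; not the schema from the slide.
CHAT_PROTO = """
syntax = "proto2";

message ChatMessage {
  required string sender = 1;
  required string text = 2;
  optional int64 sent_at = 3;
}
"""

# Each field line has the shape: <label> <type> <name> = <number>;
FIELD_RE = re.compile(
    r"^\s*(required|optional|repeated)\s+(\w+)\s+(\w+)\s*=\s*(\d+)\s*;",
    re.MULTILINE,
)


def list_fields(proto_text):
    """Return (number, name, type, label) tuples for each declared field."""
    return [
        (int(number), name, field_type, label)
        for label, field_type, name, number in FIELD_RE.findall(proto_text)
    ]


fields = list_fields(CHAT_PROTO)
for number, name, field_type, label in fields:
    print(number, name, field_type, label)
```

The point to notice is that every field carries a number. That number, not the field name, is what identifies the field in the serialized bytes.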

How to Generate the Code

● Use the Protocol Buffers compiler [protoc], specifying the output language you want and the file.proto

● protoc -I=/DIR_to_Schema/ --<language>_out=FOLDER_TO_Buffer/ /DIR_to_Schema/file.proto [for example --java_out, --cpp_out or --python_out]
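The command above can also be assembled from a script. This is a small sketch, assuming `protoc` is on the PATH when it is actually run; the directory and file names are placeholders, and `build_protoc_command` and `run_protoc` are hypothetical helper names.

```python
import shutil
import subprocess


def build_protoc_command(language, schema_dir, out_dir, proto_file):
    """Build the protoc argument list for one .proto file and one language."""
    return [
        "protoc",
        f"-I={schema_dir}",             # where protoc looks for the schema
        f"--{language}_out={out_dir}",  # e.g. --java_out, --cpp_out, --python_out
        proto_file,
    ]


def run_protoc(command):
    """Run the compiler if it is installed; return its exit code, or None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode


# Placeholder paths, for illustration only.
command = build_protoc_command("java", "schemas", "generated", "schemas/chat.proto")
print(" ".join(command))
```

Calling `run_protoc(command)` would then invoke the compiler; it is left uncalled here so the sketch runs even where protoc is not installed.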

A Look in my Terminal

What is Inside My XX.java

SIZE COMPARISON

Format    Size

RMI       905

GPB       250

JSON      559

XML       836

Runtime Performance

Format      Server CPU AVG    Client CPU AVG    Time

Protobuf    30.0%             37.75%            01:19:48

JSON        20.0%             75.00%            04:44:83

XML         12.00%            80.75%            05:27:45

Versioning

● Versioning is about compatibility between older and newer versions of a Protocol Buffers schema

● An old server can talk to a new client, and vice versa

● Even if a field has been added or removed, the data will still be parsed; fields the reader does not know are skipped
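A simplified sketch of why this works: each field on the wire is tagged with its number, so a reader can skip numbers it does not recognize. The code below handles only integer (varint) fields, and the field numbers and values are made up for illustration; real Protocol Buffers libraries handle more wire types.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fields(fields):
    """Encode {field_number: int_value} pairs, all as wire type 0 (varint)."""
    out = bytearray()
    for number, value in fields.items():
        out += encode_varint((number << 3) | 0)
        out += encode_varint(value)
    return bytes(out)


def decode_known(data, known_numbers):
    """Decode a message, keeping only the fields this reader knows about."""
    pos = 0
    known = {}
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        if number in known_numbers:
            known[number] = value
        # Unknown field numbers are read past and ignored.
    return known


# New client sends fields 1 and 2; old server only knows field 1.
new_message = encode_fields({1: 12345678, 2: 7})
old_reader_view = decode_known(new_message, {1})

# Old client sends field 1 only; new server knows fields 1 and 2.
old_message = encode_fields({1: 5})
new_reader_view = decode_known(old_message, {1, 2})

print(old_reader_view)
print(new_reader_view)
```

In both directions the message parses. The old reader simply never sees field 2, and the new reader finds field 2 absent and can fall back to a default.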

Alternatives to Protocol Buffers

● MessagePack [.Net]

● Thrift [Facebook]

● Avro

Reasons To use Protocol Buffers

● Messages are smaller to push around over networks

● Easier [if not easiest] to structure

● Give a sense of object-oriented structuring

Reasons Not To use it

● Well, you will have to maintain both the server and the clients.

● They are often not easy to learn

● They are not an industry standard.

● I am just trying to be fair here :)

SIMPLE DEMO CHAT APPS

● Simple chat application that works on desktops and laptops, and across different operating systems

● Partial Inspiration from the Fifth Estate

THE END

Links to Check Out

● Google Protocol Buffers main page

https://developers.google.com/protocol-buffers/

● Apache Thrift

https://thrift.apache.org/
