Data Serialization Using Google Protocol Buffers

DATA SERIALIZATION WITH GOOGLE PROTOCOL BUFFERS By: William Kibira


Posted on 17-Dec-2014


DESCRIPTION

An easier and more flexible way to structure data for platform- and language-independent transport and storage.

TRANSCRIPT

Page 1: Data Serialization Using Google Protocol Buffers

What Is Data Serialization?

● The process of translating a data structure or object state into a format that can be stored (in a memory buffer or a file) or transmitted over a network.

● The end goal is that it can be reconstructed later, possibly in a different computer environment.
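As a minimal sketch of the idea, here is a serialize/reconstruct round trip in Python using the standard-library JSON module. The `Deposit` class and its fields are hypothetical; any format, protocol buffers included, could take JSON's place.

```python
import json
from dataclasses import dataclass, asdict

# A hypothetical in-memory object whose state we want to store or send.
@dataclass
class Deposit:
    account: str
    amount: int

def serialize(d: Deposit) -> bytes:
    """Translate the object's state into bytes (here: UTF-8 encoded JSON)."""
    return json.dumps(asdict(d)).encode("utf-8")

def deserialize(data: bytes) -> Deposit:
    """Reconstruct an equivalent object from those bytes."""
    return Deposit(**json.loads(data.decode("utf-8")))

original = Deposit(account="ACC-001", amount=12345678)
wire = serialize(original)        # could be written to a file or sent over a socket
restored = deserialize(wire)

print(wire)                       # b'{"account": "ACC-001", "amount": 12345678}'
print(restored == original)       # True
```

The receiving side only needs to agree on the format, not on the sender's language or machine.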

Reasons Why We Do This

● Persist objects [store them and retrieve them later]

● Perform Remote Procedure Calls [RPC]

● Create distributed objects [CORBA, Java RMI, ICE]

Key Words

● Computer Environment

- Programming Languages

- Operating Systems

- Architectures and processors

● Platform-Independent Solutions

Popular Platform-Independent Solutions

● JSON and XML

● BSON and Binary XML

● Google Protocol Buffers, Thrift, Avro

Ref

http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

JSON AND XML

● Most popular

● Human-readable, to some extent

● Used by default in most web-based APIs

● Lots of generators and tooling available

How It Works

● You write a schema in an IDL [Interface Description Language]. Kind of like CORBA IDLs, but much cleaner and more flexible.

● Pass it through the C++-based code generator (protoc)

● Get boilerplate code in the language you specified
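To make the schema-in, boilerplate-out workflow concrete, here is a toy stand-in for the code generator, written in Python. It is not protoc and does not mimic protoc's output: it reads a tiny .proto-like schema with regular expressions and emits a Python dataclass. The `Deposit` message and the type mapping are hypothetical, chosen only for illustration.

```python
import re

# A toy schema in .proto-like syntax (hypothetical; real protoc accepts far more).
SCHEMA = """
message Deposit {
  string account = 1;
  int64 amount = 2;
}
"""

# A few scalar .proto types mapped to (Python type, default value source).
TYPES = {
    "string": ("str", '""'),
    "int32": ("int", "0"),
    "int64": ("int", "0"),
    "bool": ("bool", "False"),
    "double": ("float", "0.0"),
}

MESSAGE_RE = re.compile(r"message\s+(\w+)\s*\{(.*?)\}", re.S)  # no nested messages
FIELD_RE = re.compile(r"(\w+)\s+(\w+)\s*=\s*(\d+)\s*;")

def generate(schema: str) -> str:
    """Turn each message in the schema into Python dataclass source code."""
    lines = ["from dataclasses import dataclass", ""]
    for name, body in MESSAGE_RE.findall(schema):
        lines += ["@dataclass", f"class {name}:"]
        for type_name, field_name, number in FIELD_RE.findall(body):
            py_type, default = TYPES[type_name]
            lines.append(f"    {field_name}: {py_type} = {default}  # field {number}")
        lines.append("")
    return "\n".join(lines)

print(generate(SCHEMA))
```

Real generated code is far richer (serializers, parsers, and in Java, builders), but the principle is the same: the schema is the single source of truth and the boilerplate is derived from it.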

GOOGLE PROTOCOL BUFFERS

● A platform-independent, language-independent data serialization solution, similar in spirit to XML but much smaller in size and easier to structure.

● Used inside Google since 2001; open-sourced in 2008

JSON VS. BINARY FORMATS

● JSON is darn easy to read. If you can read binary, you definitely need to see a doctor.

● JSON gets fat even on little data; binary is really compact. Example: {"deposit_money": "12345678"}

JSON                                        BINARY
'0x6d', '0x6f', '0x6e', '0x65', '0x79',     '0x01', '0xBC614E'
'0x31', '0x32', '0x33', '0x34',
'0x35', '0x36', '0x37', '0x38'

(The JSON bytes are the ASCII text "money12345678"; 0xBC614E is 12345678 in hexadecimal.)
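The binary column above is simplified. In the actual protocol buffer wire format, an integer field is written as a tag (field number shifted left by three bits, combined with a wire-type code) followed by the value as a base-128 varint. The sketch below hand-encodes the value in plain Python, without the protobuf library, assuming a hypothetical schema field `int32 deposit_money = 1;`. Note that the JSON example quotes the number as a string; a protobuf `string` field holding "12345678" would take 10 bytes instead.

```python
import json

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first,
    high bit set on every byte except the last (non-negative values only)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    """Tag ((field_number << 3) | wire type 0) followed by the varint value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

json_bytes = json.dumps({"deposit_money": "12345678"}).encode("utf-8")
pb_bytes = encode_varint_field(1, 12345678)   # assumes: int32 deposit_money = 1;

print(len(json_bytes), json_bytes)        # 29 b'{"deposit_money": "12345678"}'
print(len(pb_bytes), pb_bytes.hex(" "))   # 5 08 ce c2 f1 05
```

The field name never goes on the wire; only its number does, which is a large part of the size difference.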

SPEED AT PARSING

● JSON is fairly fast, but binary is close to machine speed since it is readily parseable: there is no text to tokenize and no decimal strings to convert into numbers.
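To illustrate why binary parsing is cheap, here is a minimal Python decoder for the varint encoding protocol buffers use for integers. Each byte needs only a mask, a shift and a test of the high bit, which compiled code does very quickly. This is a hand-written sketch of the wire format, not the protobuf library's API; the sample bytes are assumed to encode field 1 with the value 12345678.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry data
        if not (byte & 0x80):              # high bit clear: this was the last byte
            return result, pos
        shift += 7

# Field 1 = 12345678 with wire type 0 (varint), as a protobuf encoder would emit it.
data = bytes([0x08, 0xCE, 0xC2, 0xF1, 0x05])

tag, pos = read_varint(data, 0)
field_number, wire_type = tag >> 3, tag & 0x07
value, pos = read_varint(data, pos)
print(field_number, wire_type, value)   # 1 0 12345678
```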

FLOW

Schema / IDL → C++ code generator → C++, Java, Python, JavaScript code → server/client applications

What Does a Schema Look Like?

How to Generate the Code

● Run the protocol buffer compiler (protoc), specifying the output language and the .proto file

● protoc -I=DIR_TO_Schema/ --java_out=FOLDER_TO_Buffer/ DIR_TO_Schema/file.proto (use --cpp_out, --python_out, etc. for other languages)
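If you drive code generation from a build script, a small Python helper can assemble the same command line. This is a sketch: `protoc_command` and `run_protoc` are hypothetical helper names, only three of protoc's output flags are listed, and the directory placeholders are the ones from the slide.

```python
import shlex
import subprocess

# Output flags for a few of the languages protoc supports.
OUTPUT_FLAGS = {"cpp": "--cpp_out", "java": "--java_out", "python": "--python_out"}

def protoc_command(language: str, schema_dir: str, out_dir: str, proto_file: str) -> list[str]:
    """Build: protoc -I=SCHEMA_DIR --<lang>_out=OUT_DIR SCHEMA_DIR/FILE.proto"""
    flag = OUTPUT_FLAGS[language]   # KeyError for languages this sketch doesn't list
    return ["protoc", f"-I={schema_dir}", f"{flag}={out_dir}", f"{schema_dir}/{proto_file}"]

def run_protoc(cmd: list[str]) -> None:
    """Invoke protoc; raises FileNotFoundError if it is not on PATH
    and CalledProcessError if compilation fails."""
    subprocess.run(cmd, check=True)

cmd = protoc_command("java", "DIR_TO_Schema", "FOLDER_TO_Buffer", "file.proto")
print(shlex.join(cmd))
# protoc -I=DIR_TO_Schema --java_out=FOLDER_TO_Buffer DIR_TO_Schema/file.proto
```

Call `run_protoc(cmd)` once the placeholders point at real directories. Note that the `-I` directory must be a prefix of the .proto path, which is why both use the same placeholder.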

A Look in my Terminal

What is Inside My XX.java

SIZE COMPARISON

Format   Size
RMI      905
GPB      250
JSON     559
XML      836

Runtime Performance

Format     Server CPU (avg)   Client CPU (avg)   Time
Protobuf   30.0%              37.75%             01:19:48
JSON       20.0%              75.00%             04:44:83
XML        12.00%             80.75%             05:27:45

Versioning

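● Versioning is about backward and forward compatibility between older and newer versions of the same protocol buffer definitions

● Old server with a new client, and vice versa

Even if fields have been added or removed, the data will still be parsed: a reader skips fields it does not recognize, and fields missing from the data take their default values.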
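A rough Python sketch of why this works: every field on the wire carries its field number and wire type, so a reader built from an older schema can step over the bytes of fields it does not know. The message layout (a hypothetical integer field 1, plus a string field 2 added in a newer schema) is assumed for illustration; real protobuf runtimes handle more wire types, and many preserve unknown fields rather than discarding them as this sketch does.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def parse_known_fields(buf: bytes, known: set[int]) -> dict[int, object]:
    """Keep fields whose numbers are in `known`; skip everything else.
    Only wire types 0 (varint) and 2 (length-delimited) are handled here."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in known:
            fields[number] = value
        # Unknown field numbers: their bytes were already consumed, so parsing continues.
    return fields

# Newer writer (hypothetical schema v2): field 1 = 12345678 plus a new field 2 = "hi".
new_message = bytes([0x08, 0xCE, 0xC2, 0xF1, 0x05, 0x12, 0x02]) + b"hi"
# Older writer (schema v1): field 1 only.
old_message = bytes([0x08, 0xCE, 0xC2, 0xF1, 0x05])

print(parse_known_fields(new_message, known={1}))      # old reader: {1: 12345678}
print(parse_known_fields(new_message, known={1, 2}))   # new reader: {1: 12345678, 2: b'hi'}
print(parse_known_fields(old_message, known={1, 2}).get(2, b""))  # missing field -> default b''
```

This is also why field numbers, not names, are what must stay stable between versions.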

Alternatives to Protocol Buffers

● MessagePack

● Thrift [originally Facebook, now Apache]

● Avro [Apache]

Reasons to Use Protocol Buffers

● They are smaller to push around over networks

● Easier (if not the easiest) to structure

● Give a sense of object-oriented structuring

Reasons Not to Use Them

● Well, you will have to maintain both the server and the clients, since both depend on the generated code.

● In most cases they may not be easy to learn

● They are not an industry standard

● I am just trying to be fair here :)

SIMPLE DEMO CHAT APPS

● A simple chat application that runs on desktops and laptops, and across different operating systems

● Partly inspired by The Fifth Estate

THE END

● Links to check out

  Google Protocol Buffers main page: https://developers.google.com/protocol-buffers/

  Apache Thrift: https://thrift.apache.org/