Data Serialization Using Google Protocol Buffers
An easier and more flexible way to structure data for platform- and language-independent transport and storage
DATA SERIALIZATION WITH GOOGLE PROTOCOL BUFFERS
By: William Kibira
What is Data Serialization?
● The process of translating a data structure and its object state into a format that can be stored in a memory buffer, file or transported on a network.
● End goal being that it can be reconstructed in another computer environment.
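A minimal sketch of that round trip, using JSON from the Python standard library as the stored/transported format. The Deposit class and its fields are made up for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Deposit:
    # Hypothetical object whose state we want to move between environments.
    account: str
    amount: int


def serialize(obj: Deposit) -> bytes:
    """Translate the object's state into bytes that can be stored or sent."""
    return json.dumps(asdict(obj)).encode("utf-8")


def deserialize(data: bytes) -> Deposit:
    """Reconstruct an equivalent object from the bytes."""
    return Deposit(**json.loads(data.decode("utf-8")))


original = Deposit(account="alice", amount=12345678)
wire = serialize(original)    # could be written to a file or a socket
restored = deserialize(wire)  # could happen on another machine
print(restored == original)
```

The receiving side only needs the bytes and an agreement on the format, not the original object.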
Reasons Why We Do This
● Persist objects [store and later retrieve them]
● Perform Remote Procedure Calls
● Create distributed objects [CORBA, Java RMI, ICE]
Key Words
● Computer Environment
- Programming Languages
- Operating Systems
- Architectures and processors
● Platform Independent Solutions
Popular Platform Independent Solutions
● JSON and XML
● BSON and Binary XML
● Google Protocol Buffers, Thrift, Avro
Ref
http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
JSON AND XML
● Most popular
● Human readable, to some extent
● Most web-based APIs use them by default
● Lots of generators for this stuff
How It Works
● You write an IDL [Interface Description Language] file. Kind of like CORBA IDLs, but much cleaner and more flexible.
● Pass it through a C++-based code generator
● Get your boilerplate code in the language you specified
GOOGLE PROTOCOL BUFFERS
● This is a platform-independent, language-independent data serialization solution, similar to XML in structure but much smaller in size and easier to structure.
● Around since 2001, made open source in 2008
JSON VS BINARY FORMATS
● JSON is darn easy to read. If you can read binary, you definitely need to see a doctor.
● JSON [gets fat even on little data], binary is really compact
{"deposit_money": "12345678"}
JSON: '0x6d', '0x6f', '0x6e', '0x65', '0x79', '0x31', '0x32', '0x33', '0x34', '0x35', '0x36', '0x37', '0x38'
BINARY: '0x01', '0xBC614E'
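The size gap can be checked with a few lines of Python. This sketch assumes the binary form is exactly what the slide shows: a one-byte field tag (0x01) followed by the value as a 3-byte big-endian integer (0xBC614E is 12345678). That layout is for illustration only and is not the actual Protocol Buffers wire format.

```python
import json

record = {"deposit_money": "12345678"}

# Text form: every character of the key and the value costs one byte.
as_json = json.dumps(record).encode("utf-8")

# Illustrative binary form (assumption): 1-byte field tag + 3-byte integer.
FIELD_TAG = 0x01
as_binary = bytes([FIELD_TAG]) + int(record["deposit_money"]).to_bytes(3, "big")

print(len(as_json), as_json)
print(len(as_binary), as_binary.hex())
```

The field name never travels in the binary form. Both sides already know that tag 0x01 means deposit_money, which is where most of the saving comes from.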
SPEED AT PARSING
● JSON is fairly fast, but binary is close to machine speed since it is readily parseable.
FLOW
Schema / IDL
-> C++ Code Generator
-> C++, Java, Python, JavaScript
-> Server / Client application bases
What Does a Schema Look Like?
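As a rough idea of what a schema contains, here is a hypothetical proto2 message held in a Python string. The ChatMessage name and its fields are assumptions for illustration, not the schema from the talk. The small regex is only there to pull out the parts of each field (rule, type, name, number); it is not how protoc parses a schema.

```python
import re

# Hypothetical schema text, as it might appear in a file such as chat.proto.
SCHEMA = '''\
syntax = "proto2";

message ChatMessage {
  required string sender = 1;
  required string text = 2;
  optional int64 timestamp = 3;
}
'''

# Each field line reads: <rule> <type> <name> = <field number>;
FIELD = re.compile(
    r"^\s*(required|optional|repeated)\s+(\w+)\s+(\w+)\s*=\s*(\d+);", re.M
)

fields = [
    (int(number), name, type_, rule)
    for rule, type_, name, number in FIELD.findall(SCHEMA)
]
for field in fields:
    print(field)
```

The field numbers are what identify a field on the wire, so they matter more than the names once data has been written.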
How to Generate the Code
● Use the Protocol Buffers compiler, specifying the output language you want and the file.proto
● protoc -I=DIR_TO_SCHEMA/ --java_out=FOLDER_TO_BUFFER/ DIR_TO_SCHEMA/file.proto [the output flag is per language: --cpp_out, --java_out, --python_out]
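If the compiler is driven from a build script, the command line can be assembled like this. It is a sketch: the directory names are the same placeholders used above, protoc_command is a hypothetical helper, and only three of protoc's per-language output flags are listed.

```python
# protoc selects the target language through a per-language output flag.
OUT_FLAGS = {
    "cpp": "--cpp_out",
    "java": "--java_out",
    "python": "--python_out",
}


def protoc_command(schema_dir: str, out_dir: str, proto_file: str, language: str) -> list:
    """Build the protoc argument list for one schema file and one language."""
    return [
        "protoc",
        f"-I={schema_dir}",
        f"{OUT_FLAGS[language]}={out_dir}",
        f"{schema_dir}/{proto_file}",
    ]


cmd = protoc_command("DIR_TO_SCHEMA", "FOLDER_TO_BUFFER", "file.proto", "java")
print(" ".join(cmd))
# With protoc installed and real paths, this list can be handed to subprocess.run(cmd).
```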
A Look in my Terminal
What is Inside My XX.java
SIZE COMPARISON
Format   Size
RMI      905
GPB      250
JSON     559
XML      836
Runtime Performance
            Server CPU AVG   Client CPU AVG   Time
Protobuf    30.0%            37.75%           01:19:48
JSON        20.0%            75.00%           04:44:83
XML         12.00%           80.75%           05:27:45
Versioning
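● This is to do with backward compatibility between old and new versions of a Protocol Buffers schema
● Old server, new client, and vice versa
● Even if fields have been added or removed, the data will still be parsed: unknown fields are skipped and missing optional fields take their default values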
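Why old and new versions can still talk to each other is easier to see with a toy encoder and decoder. The sketch below hand-rolls a simplified subset of the Protocol Buffers wire format (varint and length-delimited fields only, non-negative integers only). It is not the official library, and the field names and numbers are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 data bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | 0x80 if n else low7)
        if not n:
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(fields: dict) -> bytes:
    """fields maps field number -> int (wire type 0) or str (wire type 2)."""
    out = bytearray()
    for number, value in fields.items():
        if isinstance(value, int):
            out += encode_varint(number << 3 | 0) + encode_varint(value)
        else:
            data = value.encode("utf-8")
            out += encode_varint(number << 3 | 2) + encode_varint(len(data)) + data
    return bytes(out)


def decode_message(buf: bytes, schema: dict) -> dict:
    """schema maps field number -> name; unknown numbers are read and dropped."""
    result, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not covered by this sketch")
        if number in schema:  # an old reader simply ignores newer fields
            result[schema[number]] = value
    return result


OLD_SCHEMA = {1: "sender", 2: "text"}
NEW_SCHEMA = {1: "sender", 2: "text", 3: "timestamp"}

# New client -> old server: field 3 is skipped, the rest parses normally.
new_msg = encode_message({1: "alice", 2: "hello", 3: 1700000000})
old_view = decode_message(new_msg, OLD_SCHEMA)

# Old client -> new server: field 3 is absent, so the reader uses a default.
old_msg = encode_message({1: "bob", 2: "hi"})
new_view = decode_message(old_msg, NEW_SCHEMA)
print(old_view, new_view, new_view.get("timestamp", 0))
```

Because every field carries its number and wire type, a reader can step over data it does not understand. This only holds as long as existing field numbers are not reused for something else.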
Alternatives to Protocol Buffers
● MessagePack [.Net]
● Thrift [Facebook]
● Avro
Reasons To use Protocol Buffers
● They are smaller to push around over networks
● Easier [if not easiest] to structure
● Give a sense of object-oriented structuring
Reasons Not To use it
● Well, you will have to maintain both the server and clients.
● They may in most cases not be easy to learn
● They are not an industry standard.
● I am just trying to be fair here :)
SIMPLE DEMO CHAT APPS
● Simple chat application working on desktops and laptops, and across different operating systems
● Partial Inspiration from the Fifth Estate
THE END
Links to check out
● Google Protocol Buffers main page
https://developers.google.com/protocol-buffers/
● Apache Thrift
https://thrift.apache.org/