protocol buffers

Protocol Buffers Overview Fabrício Epaminondas - @fabricioepa Senior Software Engineer, Signove

Upload: fabricio-epaminondas

Post on 16-Jul-2015


TRANSCRIPT

Page 1: Protocol buffers

Protocol Buffers Overview

Fabrício Epaminondas - @fabricioepa

Senior Software Engineer, Signove

Page 2: Protocol buffers

About me

BSc in Computer Science at Federal University of Campina Grande, UFCG.

Recent activities

• Implementation of IEEE Data Exchange Protocol 11073 part 20601

• Data modeling for Bluetooth services

• Data synchronization using REST services

Page 3: Protocol buffers

Agenda

Background

What are Protocol Buffers?

How do they work?

Why use Protocol Buffers?

Techniques

Questions

Quick Links

Page 4: Protocol buffers

Background

Data Formats in Information Technology

• Typing/interpretation, transmission, storage

Popular data formats...

Page 5: Protocol buffers

CSV

• Simple for applications to read and write

• Tabular data structure

• Flat

• No validation

Name, Age, Phone

Fabricio, 26, +558388000000

Kaka, 28, +558388000001

Cafu, 40, +558388000002

Pele, 70, +558388000003

Page 6: Protocol buffers

XML

• Markup language for Documents

• Hierarchical structure

• Data validation

• A common standard with great acceptance

<person>
  <name>Fabricio</name>
  <age>26</age>
  <contacts>
    <email>[email protected]</email>
    <phone>999</phone>
  </contacts>
</person>

Page 7: Protocol buffers

JSON

• Lightweight data-interchange format

• Browser support

• Alternative to XML

{
  "person": {
    "name": "Fabricio",
    "age": 26,
    "contacts": {
      "email": "[email protected]",
      "phone": "999"
    }
  }
}

Page 8: Protocol buffers

Comparison

Formats: CSV, XML, JSON

Criteria: Parsing efficiency, Reusable, Model Update, Hierarchical, Small Size

Page 9: Protocol buffers

Google's Data Interchange Requirements

We use literally thousands of different data formats to represent:

• networked messages between servers

• index records in repositories

• geospatial datasets

Most of these formats are structured, not flat. This raises an important question…

Page 10: Protocol buffers

How do we encode it all?

Requirements:

Hierarchical data structure

Small data size

Parsing performance

Model update: add/ignore fields, modify parser code...

Backwards compatible

Page 11: Protocol buffers

What are Protocol Buffers?

A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

They were initially developed at Google to handle an index server request/response protocol.

Page 12: Protocol buffers

How do they work?

You define your structured data format once, in a descriptor (.proto) file.

The protocol buffer compiler then generates source code that easily writes and reads your structured data to and from a variety of data streams, in a variety of languages.

You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

Page 13: Protocol buffers

Writing some code…

.proto

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

C++

// Write
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("[email protected]");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

// Read (in a separate program or scope)
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Page 14: Protocol buffers

Generated code

Messages

• Immutable

Builders

Enums and Nested Classes

• C++: Person::MOBILE

• Java: Person.PhoneType.MOBILE

Parsing and Serialization

Page 15: Protocol buffers

Why use Protocol Buffers?

Protocol Buffers’ major design goal is simplicity

Protocol Buffers are flexible and efficient

PB messages are 3 to 10 times smaller than XML

PB parsing is 20 to 100 times faster than XML
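Part of the size difference comes from the binary wire format. The sketch below is not the protobuf library: it hand-encodes a single integer field following the documented wire format, where a key holds the field number and wire type and the value follows as a base-128 varint. The function names AppendVarint and EncodeVarintField are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append v as a base-128 varint: 7 payload bits per byte, least significant
// group first, high bit set on every byte except the last.
void AppendVarint(std::uint64_t v, std::vector<std::uint8_t>& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// Encode one integer field: key = (field_number << 3) | wire type 0 (varint),
// followed by the value itself as a varint.
std::vector<std::uint8_t> EncodeVarintField(std::uint32_t field_number,
                                            std::uint64_t value) {
  assert(field_number >= 1);  // field numbers start at 1
  std::vector<std::uint8_t> out;
  AppendVarint(static_cast<std::uint64_t>(field_number) << 3, out);
  AppendVarint(value, out);
  return out;
}
```

For id = 1234 in field 2, EncodeVarintField(2, 1234) yields 3 bytes (0x10 0xD2 0x09), while the text <id>1234</id> takes 13 characters.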

Page 16: Protocol buffers

Comparison

Formats: CSV, XML, JSON, PB

Criteria: Parsing efficiency, Reusable, Model Update, Hierarchical, Small Size

Page 17: Protocol buffers

Why use Protocol Buffers?

Using a language's native object serialization (as in Java) causes interoperability problems.

In C/C++ the raw in-memory data structures can be sent/saved in binary form, but this is hard to extend.

Page 18: Protocol buffers

Alternatives

Thrift

ASN.1

Java Externalizable

Other IDLs...

• WSDL, XSD, XML

• CORBA, Java-IDL, etc…

Page 19: Protocol buffers

Techniques

Backward/Forward compatibility

Updating Message Types

O-O Design

Page 20: Protocol buffers

Backward/Forward compatibility

You must not change the tag numbers of any existing fields.

You must not add or delete any required fields.

Consider writing application-specific custom validation routines instead of required fields

You may delete optional or repeated fields.

You may add new optional or repeated fields, but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
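These rules are mechanical enough to check in code. The following is a minimal sketch, assuming a schema is described as a map from tag number to field; the types Field and Schema, the function CheckUpdate, and the retired_tags set are all hypothetical helpers for this illustration and are not part of the protobuf API. It does not cover every case (for example, a label change on an existing tag).

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Label { kRequired, kOptional, kRepeated };

struct Field {
  std::string name;
  Label label;
};

using Schema = std::map<int, Field>;  // tag number -> field

// Returns one message per rule violated when moving from old_schema to
// new_schema. retired_tags holds tag numbers used by fields deleted earlier.
std::vector<std::string> CheckUpdate(const Schema& old_schema,
                                     const Schema& new_schema,
                                     const std::set<int>& retired_tags) {
  std::vector<std::string> problems;
  // Rule: required fields must not be deleted.
  for (const auto& entry : old_schema) {
    const Field& field = entry.second;
    if (new_schema.count(entry.first) == 0 && field.label == Label::kRequired) {
      problems.push_back("required field deleted: " + field.name);
    }
  }
  for (const auto& entry : new_schema) {
    const int tag = entry.first;
    const Field& field = entry.second;
    if (old_schema.count(tag) != 0) continue;  // tag already existed
    // A known field name under a new tag means its tag number was changed.
    bool moved = false;
    for (const auto& old_entry : old_schema) {
      if (old_entry.second.name == field.name) moved = true;
    }
    if (moved) {
      problems.push_back("tag number changed: " + field.name);
    } else if (field.label == Label::kRequired) {
      problems.push_back("required field added: " + field.name);
    }
    // Rule: new fields need tags that were never used, even by deleted fields.
    if (retired_tags.count(tag) != 0) {
      problems.push_back("retired tag reused: " + field.name);
    }
  }
  return problems;
}
```

Deleting an optional field and adding a new optional field under a never-used tag returns an empty list; reusing a retired tag, moving a field to a different tag, or adding a required field each returns one problem.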

Page 21: Protocol buffers

Backward/Forward compatibility

Old code will simply ignore new fields; for deleted fields it will read default values

Unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it

Changing a default value is generally OK, but remember that default values are never sent over the wire

The receiver will NOT see the default value defined in the sender's code; it uses the default from its own version of the .proto

New code will also transparently read old messages
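To make the handling of unknown fields concrete, here is a hand-written sketch of an "old" reader that knows only fields 1 (name) and 2 (id) of Person. It is an illustration under stated assumptions, not the generated code: OldPerson, Parse, Serialize, and the varint helpers are hypothetical names, and only the varint (0) and length-delimited (2) wire types are handled. Fields the reader does not recognize are kept as raw bytes and written back on serialization.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Read a base-128 varint starting at pos; returns false on truncated input.
bool ReadVarint(const std::string& data, std::size_t& pos, std::uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 64 && pos < data.size(); shift += 7) {
    const std::uint8_t byte = static_cast<std::uint8_t>(data[pos++]);
    value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

void AppendVarint(std::uint64_t v, std::string& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

struct OldPerson {
  std::string name;      // field 1, length-delimited
  std::uint64_t id = 0;  // field 2, varint
  std::string unknown;   // raw bytes of fields this version does not know
};

bool Parse(const std::string& data, OldPerson& p) {
  std::size_t pos = 0;
  while (pos < data.size()) {
    const std::size_t start = pos;
    std::uint64_t key = 0, value = 0;
    if (!ReadVarint(data, pos, key)) return false;
    const std::uint64_t field = key >> 3;
    const int wire_type = static_cast<int>(key & 7);
    if (wire_type == 0) {
      if (!ReadVarint(data, pos, value)) return false;
      if (field == 2) { p.id = value; continue; }
    } else if (wire_type == 2) {
      if (!ReadVarint(data, pos, value) || value > data.size() - pos) return false;
      const std::size_t len = static_cast<std::size_t>(value);
      if (field == 1) { p.name = data.substr(pos, len); pos += len; continue; }
      pos += len;
    } else {
      return false;  // other wire types are omitted from this sketch
    }
    p.unknown.append(data, start, pos - start);  // keep the unknown field as-is
  }
  return true;
}

std::string Serialize(const OldPerson& p) {
  std::string out;
  AppendVarint((1 << 3) | 2, out);
  AppendVarint(p.name.size(), out);
  out += p.name;
  AppendVarint((2 << 3) | 0, out);
  AppendVarint(p.id, out);
  out += p.unknown;  // unknown fields are serialized along with the known ones
  return out;
}
```

If a newer sender includes an email in field 3, this reader still parses name and id, and Serialize reproduces the email bytes untouched, so an intermediate service built against the old format does not strip data it cannot interpret.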

Page 22: Protocol buffers

Updating Message Types

Don't change the numeric tags for any existing fields.

Although non-required fields can be removed, it’s better to rename the field instead to something like “DEPRECATED_...”

int32, uint32, int64, uint64, and bool are all compatible: changing a field from one of these types to another does not break forwards- or backwards-compatibility.

string and bytes are compatible as long as the bytes are valid UTF-8.

More update rules are covered in the protobuf manual
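The integer compatibility rule follows from these types sharing the varint wire encoding: the reader decodes a 64-bit value and then narrows it to the declared type, with the same effect as a C++ cast. A small sketch, assuming the varint has already been decoded into a 64-bit value; VarintViews and Interpret are illustrative names only.

```cpp
#include <cstdint>

// One decoded 64-bit varint payload, viewed as each compatible field type.
struct VarintViews {
  std::int32_t as_int32;
  std::uint32_t as_uint32;
  std::int64_t as_int64;
  std::uint64_t as_uint64;
  bool as_bool;
};

VarintViews Interpret(std::uint64_t raw) {
  VarintViews v;
  v.as_uint64 = raw;
  // Signed conversions assume two's complement, as on mainstream compilers.
  v.as_int64 = static_cast<std::int64_t>(raw);
  v.as_uint32 = static_cast<std::uint32_t>(raw);  // keeps the low 32 bits
  v.as_int32 = static_cast<std::int32_t>(v.as_uint32);
  v.as_bool = raw != 0;  // any nonzero value reads as true
  return v;
}
```

Small values such as 1234 read identically under every type. Values that do not fit are truncated rather than rejected, so switching types is wire-compatible but can still change what the application sees.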

Page 23: Protocol buffers

O-O Design

Generated source code of message objects should not be modified

Use wrappers to encapsulate messages

Do not inherit from message objects
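A wrapper in this sense might look like the sketch below. The Person class here is a small stand-in written by hand so that the example compiles on its own; in a real project it would be the protoc-generated class. Contact, IsValid, and DisplayName are hypothetical names chosen for the illustration.

```cpp
#include <string>
#include <utility>

// Stand-in for the generated Person message (assumption: the real class
// comes from protoc and exposes the same style of accessors).
class Person {
 public:
  const std::string& name() const { return name_; }
  void set_name(const std::string& value) { name_ = value; }
  int id() const { return id_; }
  void set_id(int value) { id_ = value; }

 private:
  std::string name_;
  int id_ = 0;
};

// Wrapper: owns the message by composition instead of inheriting from it,
// and adds application-specific behavior without touching generated code.
class Contact {
 public:
  explicit Contact(Person message) : message_(std::move(message)) {}

  // Application-specific validation, kept out of the .proto definition.
  bool IsValid() const { return !message_.name().empty() && message_.id() > 0; }

  std::string DisplayName() const {
    return message_.name() + " (#" + std::to_string(message_.id()) + ")";
  }

  // Read-only access for code that needs to serialize the message.
  const Person& message() const { return message_; }

 private:
  Person message_;
};
```

Because the generated class is only held as a member, regenerating it after a .proto change leaves the wrapper's own logic intact, and validation rules can evolve without resorting to required fields.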

Page 24: Protocol buffers

Questions…

Page 25: Protocol buffers
Page 26: Protocol buffers

Quick Links

• API

▫ http://code.google.com/apis/protocolbuffers/

• Post by Kenton Varda, Protocol Buffers Team

▫ http://google-opensource.blogspot.com/2008/07/protocol-buffers-googles-data.html

• Kevin Weil, Analytics Lead, Twitter

▫ http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter

• Benchmarks

▫ http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

• Computer World Article

▫ http://www.computerworld.com/s/article/9191098/Twitter_solves_its_data_formatting_challenge