Using Protobuf, Part 1: Introducing Protobuf

If you’re interested in network protocols and data serialization, you may have heard of Google’s protocol buffers, a.k.a. “protobuf”.

A little while back, I worked on a system that communicates using protobuf. My mission: to make this system more robust. In the process, I learned a lot about what protobuf is, what it isn’t, and what we can make it do.

An Introduction to protobuf

Protobuf lives in the same space as JSON and common applications of XML. It’s a method to serialize data into a series of bytes and deserialize that series of bytes back to data your program can work with.

If you’re writing network protocols, or you’re saving or loading data to disk, you’re relying on serialization. Serialization only works when the code doing the serializing and the code doing the deserializing agree on the format.

Like JSON and XML, protobuf can serialize arbitrary data. It sets itself apart with small, tightly-packed message sizes; speed; and its opinionated attitude toward compatibility with messages that are older or newer than the ones your application expects.

That opinionated attitude, in particular, is important to understand. Without that understanding, you may find your application is quietly consuming data that isn’t what you expect.

But let’s start with the basics for today.

Note: we worked with the newest specification, proto3. proto2 is a more complex specification and, while it can do more, it wasn’t supported by some of the systems we were working with, so it wasn’t an option.

Defining the Message

Applications that produce and consume protobuf generally do so by way of a .proto file. This file defines messages, which are protobuf’s analogue of objects:

syntax = "proto3";

package cafe;

message Coffee {
  bool cream = 1;
}

There are a number of things going on here. Let’s break it down:

  • syntax = "proto3"; tells the protobuf compiler that we’re using the proto3 syntax, version 3 of the protobuf language.
  • package cafe; sets a namespace for our messages. This translates into, for example, Java packages.
  • message Coffee { … } defines a message called Coffee.
  • And finally, bool cream = 1; defines a field called cream, which is boolean, and is assigned field number 1.
Field Number

If you’re used to JSON and XML, the concept of field numbers is likely new to you, since both use names to identify data elements.

Field names in protobuf don’t exist inside the serialized messages themselves. Instead, each value is packed in alongside its field number, and the .proto file is what maps those numbers back to names and types.

The protoc tool that ships with the protobuf distribution can illustrate this. In addition to generating protobuf-handling code for many languages, it can decode a message, either with or without a .proto file. Without one, it shows you exactly what’s in the serialized file, and no more.

For this example, I’ve used a short Python script to serialize a Coffee object with cream set to True, and saved the output to coffee.protobuf.
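The script itself isn’t reproduced here. Normally you would let protoc generate Python code and use that; the sketch below instead builds the bytes by hand, so it runs with the standard library alone and shows what ends up in the file. The helper names (encode_varint, encode_coffee, write_coffee) are made up for illustration, and the encoder only understands this one bool field.

```python
# Hand-rolled sketch: serializes cafe.Coffee without generated code.
# It handles only `bool cream = 1`; it is not a general protobuf encoder.

VARINT = 0  # wire type used for bool and the plain integer types


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint, 7 bits per byte."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_coffee(cream: bool) -> bytes:
    """Serialize a Coffee message whose only field is `bool cream = 1`."""
    if not cream:
        # A proto3 field declared like this one is left out of the output
        # when it holds its default value (false).
        return b""
    tag = (1 << 3) | VARINT  # field number 1 combined with the wire type
    return encode_varint(tag) + encode_varint(1)


def write_coffee(path: str, cream: bool) -> None:
    """Save the serialized message to disk."""
    with open(path, "wb") as f:
        f.write(encode_coffee(cream))


print(encode_coffee(True).hex(" "))  # 08 01

if __name__ == "__main__":
    write_coffee("coffee.protobuf", True)
```

Note that the name cream never appears in the output; only the field number does.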

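First, let’s decode it raw, without a .proto file:

% protoc --decode_raw <coffee.protobuf
1: 1

The protobuf file itself is just two bytes long, and that’s all that’s inside it: field number 1 has a value of 1. The first byte (0x08) carries the field number along with a wire type that says how the value is encoded; the second byte (0x01) is the value itself.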
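To make the raw decoding less of a black box, here is a minimal sketch of the same idea in Python. It is not how protoc is implemented, the function names are invented for this example, and it only handles varint-encoded fields (wire type 0), which is all our Coffee message contains.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def decode_raw(buf: bytes) -> List[Tuple[int, int]]:
    """Return (field number, value) pairs for a message of varint fields."""
    pairs = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        value, pos = read_varint(buf, pos)
        pairs.append((field_number, value))
    return pairs


# The two bytes from coffee.protobuf, written out literally.
for number, value in decode_raw(bytes([0x08, 0x01])):
    print(f"{number}: {value}")  # 1: 1
```

Without a .proto file there is nothing to say that the 1 is a boolean, so a plain integer is the best a raw decoder can report.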

It looks a bit more like we expect when we ask protoc to use the .proto file to decode the message:

% protoc --decode=cafe.Coffee \
         --proto_path=. \
         cafe.proto <coffee.protobuf
cream: true

Now that protoc has all the information it needs to give fields names and types, it can tell that field 1 is named cream and holds the boolean value true.
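Conceptually, the .proto file acts as a lookup table from field numbers to names and types. A rough sketch of that step, assuming the raw (field number, value) pairs have already been pulled out of the bytes; COFFEE_FIELDS and apply_schema are hypothetical names, not part of any protobuf library:

```python
# Hypothetical stand-in for what cafe.proto tells a decoder:
# field number -> (field name, converter for the raw integer value).
COFFEE_FIELDS = {1: ("cream", bool)}


def apply_schema(raw_pairs, fields):
    """Attach names and types to raw (field number, value) pairs."""
    named = {}
    for number, value in raw_pairs:
        if number in fields:
            name, convert = fields[number]
            named[name] = convert(value)
        else:
            # Not described by the schema: all we have is the number.
            named[number] = value
    return named


decoded = apply_schema([(1, 1)], COFFEE_FIELDS)
print(decoded)  # {'cream': True}
```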

Next Time: Forward and Backward Compatibility

We’ve gone over the very basics of protobuf here, and hopefully, this peek under the hood has piqued your interest.

In the next part, we’re going to talk about how protobuf builds on this to handle decoding older versions of serialized messages with newer versions of .proto files that define them, and vice versa. We’ll also talk about how those strategies can impact your use of protobuf, and how to use it smartly.