Protocol Buffers: Beyond JSON for Modern Data Exchange

Published: Dec 23, 2024
By: Paolo Galeotti

Origins and Evolution

Protocol Buffers (protobuf) emerged from Google's internal development needs in the early 2000s. Faced with the challenge of efficiently serializing structured data across multiple services and languages, Google engineers developed this language-agnostic, platform-neutral mechanism. Released as open-source in 2008, protobuf has since become a cornerstone of efficient data interchange in distributed systems.

What are Protocol Buffers?

Protocol Buffers are a method of serializing structured data for use in communications protocols and data storage. Unlike text-based formats like JSON or XML, protobuf uses a binary format, resulting in smaller data sizes and faster parsing. At its core, protobuf relies on schema definitions (.proto files) that specify the structure of your data:

syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
  repeated string hobbies = 3;
}

This schema-first approach provides several advantages:

  1. Type Safety: The schema keeps producers and consumers consistent, catching mismatches before they become runtime errors

  2. Code Generation: Automatic generation of types and (de)serialization code in many languages at build time (official and third-party plugins cover nearly all mainstream languages)

  3. Versioning Support: Built-in mechanisms for evolving data structures

When working with protobuf, the workflow is: define .proto files → generate code → use the type-safe generated serializers and deserializers in your language.

Note that when defining messages, you must assign a number to each field. That number, not the field name, identifies the field in the binary encoding, so it must never change once a message is in use. There are several other rules to keep in mind; you can read more in the official guide.

Use Cases

Protobuf excels in scenarios requiring:

  • High-performance data transfers

  • Strict schema validation

  • Cross-language compatibility

  • Optimized size for data storage

Google also uses Protocol Buffers as the main data transfer format for its own RPC framework, gRPC. gRPC is a high-performance RPC framework mainly used for server-to-server communication (though it can also be used elsewhere, even in browsers), ensuring:

  • Strongly typed service contracts

  • Efficient binary communication

  • Automatic client/server code generation

The nice thing about gRPC is that it is tightly integrated with .proto files. You can define services, requests, and responses directly in your contracts, and the protobuf compiler will generate type-safe clients and servers for any language:

syntax = "proto3";

service TodoService {
  rpc CreateTodo (CreateTodoRequest) returns (TodoResponse);
  rpc GetTodo (GetTodoRequest) returns (TodoResponse);
  rpc ListTodos (Empty) returns (TodoListResponse);
  rpc DeleteTodo (DeleteTodoRequest) returns (Empty);
}

message Todo {
  string id = 1;
  string title = 2;
  string description = 3;
  bool completed = 4;
}

message CreateTodoRequest {
  string title = 1;
  string description = 2;
}

message GetTodoRequest {
  string id = 1;
}

message DeleteTodoRequest {
  string id = 1;
}

message TodoResponse {
  Todo todo = 1;
}

message TodoListResponse {
  repeated Todo todos = 1;
}

message Empty {}

This example shows a classic CRUD application defined entirely in a single .proto file. From it, the compiler generates the functions and methods you need to implement clients and servers in any supported language, all at build time.

Our use case: High-Frequency WebSocket Communication

At Quinck, we leveraged protobuf for a less common use case: enabling high-frequency, type-safe binary WebSocket communication between browsers (TypeScript) and servers (Go). Since we exchanged data at a fairly high rate (around 100 messages/second), we couldn't afford JSON's overhead in both payload size and parsing time.

It is straightforward to implement: use a standard WebSocket server and client, set both up for binary communication, and generate the correct types for each side at build time from a single .proto file.

We also needed to scale WebSockets horizontally. For this, we used Redis pub/sub to route messages between instances. Since Redis supports binary payloads, it transfers our encoded protobuf messages unchanged.

Key Benefits We Achieved:

  1. Reduced Bandwidth: The binary format cut data transfer substantially, reducing payloads by roughly 70% compared to the equivalent JSON messages

  2. Type Safety: End-to-end type checking prevented runtime errors

  3. Performance: Fast serialization/deserialization improved real-time capabilities and frontend performance

Challenges and Considerations

While Protocol Buffers offered significant advantages, we also encountered some challenges. The binary format makes manually debugging and inspecting data harder than with JSON. For our use case, we also needed extra tooling and dependencies (the protoc compiler and its plugins) to generate models and deserializers for TypeScript on the frontend and Go on the backend, whereas JSON support is built into both languages.

Conclusion

Protocol Buffers and gRPC have proven to be invaluable tools in our development stack. While JSON remains excellent for many use cases, protobuf's efficiency and type safety make it the better choice for high-frequency, performance-critical applications. Our implementation of browser-server communication demonstrates its versatility beyond traditional backend services.
