Revision as of 00:22, 7 May 2024

Internal

Protocol Buffers

Overview

The main use case for Protocol Buffers is sharing data across programming languages. Data can be written and serialized in one language, sent over the network, and then deserialized and interpreted in a different programming language, without compatibility problems.

Protocol Buffers comes with the following advantages:

It allows defining types, and the data is fully typed when exchanged. We know the type of data in transit.
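Data is encoded in a compact binary format, so payloads are typically smaller than the equivalent JSON or XML. No general-purpose compression is applied.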
Serialization/deserialization is efficient.
Comes with a schema (the .proto file), which is used to generate code that writes and reads the data.
Schema supports embedded documentation.
Schema can evolve over time in a safe manner. The implementation can be maintained to be backward and forward compatible.

One of the disadvantages is that the data is encoded in a binary format, so it can't be visualized with a text editor.

The typical workflow consists of defining the data types, called messages, then automatically generating, in the programming language of choice, the data structures that support and validate those types. In Go, messages are represented as structs. With the help of a framework like gRPC, which uses Protocol Buffers as its default serialization format and native mechanism for exchanging data, client and server code can also be generated automatically. This makes Protocol Buffers a convenient serialization format for microservices.

The current version of the protocol is 3, released mid 2016.

Message

A message is a data type in Protocol Buffers. One or more messages are declared in a .proto text file with the following format:

syntax = "proto3";

/* Person is used to identify
   a user in the system.
*/
message Person {
  // the age as of person's creation
  int32 age = 1; 
  string first_name = 2;
  string last_name = 3;
  bytes small_picture = 4; // a small JPEG file
  bool is_profile_verified = 5;
  float height = 6;
  repeated string phone_numbers = 7;
}

A message has fields.

Fields

Each field has a field type, a field name and a field tag:

 <field_type> <field_name> = <field_tag>;

Unless explicitly set by the program, each field takes a default value.

Field Names

Field names are not important when the message is serialized/deserialized. Only the tags matter, because the tag, not the name, is what gets written to the wire.
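
The point about tags can be illustrated with a minimal sketch that hand-encodes one field. It assumes the standard Protocol Buffers wire format, where each field is written as a key, (tag << 3) | wire_type, followed by the value, and where int32 uses wire type 0 (varint). The helper names encode_varint and encode_int32_field are hypothetical and are not part of any Protocol Buffers library; negative values are not handled.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int32_field(field_tag: int, value: int) -> bytes:
    """Encode a non-negative int32 field as key + value (wire type 0)."""
    key = (field_tag << 3) | 0
    return encode_varint(key) + encode_varint(value)


# `int32 age = 1;` from the Person message, set to 30.
encoded = encode_int32_field(1, 30)
print(encoded.hex())  # 081e
```

The two output bytes carry the tag and the value only. The name "age" appears nowhere, which is why a field can be renamed without affecting data that was already serialized.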

Field Types

Protocol Buffer Types

string	bool
int32	uint32	sint32	fixed32	sfixed32
int64	uint64	sint64	fixed64	sfixed64
float	double
bytes

Bytes

 bytes some_bytes = 1;

The default value is the empty byte array.

List (Array)

repeated <some_other_type> <field_name> = <tag>;

Example:

repeated int32 sizes = 1;
repeated string names = 2;

The default value is the empty list.

Enum

The default value is the first value defined in the enum, which in proto3 must have the numeric value 0.

Field Tags

A tag is an integer between 1 and 2²⁹-1 (536,870,911). The numbers between 19,000 and 19,999 are reserved for the Protocol Buffers implementation and cannot be used.
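
The tag rules can be written down as a small check. This is only a sketch: the function name is_valid_field_tag and the constant names are hypothetical, and the protoc compiler performs its own validation.

```python
MAX_FIELD_TAG = 2**29 - 1                 # 536,870,911
RESERVED_TAGS = range(19000, 20000)       # 19,000 through 19,999 inclusive


def is_valid_field_tag(tag: int) -> bool:
    """Return True if `tag` is in the allowed range and not reserved."""
    return 1 <= tag <= MAX_FIELD_TAG and tag not in RESERVED_TAGS


print(is_valid_field_tag(7))       # True
print(is_valid_field_tag(19500))   # False
```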

Tags from 1 to 15 use one byte of space, so use them for frequently populated fields. The tags from 16 to 2047 use 2 bytes.
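
The byte counts follow from how the field key is encoded. The sketch below assumes the standard wire format, in which the key is (tag << 3) | wire_type stored as a varint with 7 payload bits per byte. The function name key_size is hypothetical.

```python
def key_size(field_tag: int, wire_type: int = 0) -> int:
    """Number of bytes the varint-encoded field key occupies."""
    key = (field_tag << 3) | wire_type
    size = 1
    while key >= 0x80:  # each varint byte carries 7 bits of the key
        key >>= 7
        size += 1
    return size


for tag in (1, 15, 16, 2047, 2048):
    print(tag, key_size(tag))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

Three of the key's bits are taken by the wire type, so only 4 bits of a one-byte key remain for the tag, which gives the limit of 15.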

Default Value

Every field has a default value, which is determined by the field's type and applies unless the program sets the field explicitly. There is no concept of a "required" field in proto3. The optional keyword, available again in recent proto3 releases, does not change the default value; it only records whether the field was explicitly set.
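
As a rough illustration of how a reader sees unset fields, the sketch below fills in type-based defaults for fields that are absent from the decoded data. It does not use a Protocol Buffers library: PERSON_FIELDS is a hypothetical, hand-written description of part of the Person message, and fill_defaults is a hypothetical helper.

```python
# Default values for a few scalar types.
SCALAR_DEFAULTS = {
    "string": "",
    "bytes": b"",
    "bool": False,
    "int32": 0,
    "float": 0.0,
}

# Hypothetical description of some Person fields: name -> declared type.
PERSON_FIELDS = {
    "age": "int32",
    "first_name": "string",
    "small_picture": "bytes",
    "is_profile_verified": "bool",
    "height": "float",
    "phone_numbers": "repeated string",
}


def fill_defaults(fields: dict, decoded: dict) -> dict:
    """Return `decoded` completed with the default for every missing field."""
    result = {}
    for name, type_name in fields.items():
        if name in decoded:
            result[name] = decoded[name]
        elif type_name.startswith("repeated "):
            result[name] = []  # repeated fields default to the empty list
        else:
            result[name] = SCALAR_DEFAULTS[type_name]
    return result


person = fill_defaults(PERSON_FIELDS, {"first_name": "Ada"})
print(person["age"], person["phone_numbers"])  # 0 []
```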

Comments

// this is a single-line comment

/* This is
   a multi-line
   comment.
*/

Data Type Support Code Generation

The code that supports the data types defined in the .proto files is generated as follows:

Data Evolution with Protocol Buffers

A message is actually a type. The name "message" is probably used because instances of the types defined this way are mainly intended to be sent over the wire.

Protocol Buffer Concepts: Difference between revisions