Protocol Buffer Concepts
Revision as of 23:29, 7 May 2024
External
- https://protobuf.dev/programming-guides/proto3/
Internal
- Protocol Buffers
Overview
Protocol Buffers is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. The schema is defined once and is used to generate code that serializes and deserializes schema-conforming data. The main use case for Protocol Buffers is sharing data across programming languages: data can be written and serialized in one language, sent over the network, and then deserialized and interpreted in a different programming language.
Protocol Buffers offers the following advantages:
- It allows defining types, so the data is fully typed when exchanged: the type of the data in transit is always known.
- Data is encoded compactly. The binary format omits field names and uses variable-length integers, so messages are usually smaller than their text equivalents.
- Serialization/deserialization is efficient.
- Comes with a schema, in the form of .proto files, which is used to generate code that writes and reads the data.
- Schema supports embedded documentation.
- Schema can evolve over time in a safe manner. Implementations that rely on the schema can stay backward and forward compatible.
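As an illustration of safe schema evolution, here is a sketch of a later version of the Person message. It assumes a hypothetical earlier version that declared a string nickname = 8 field, since removed; the names nickname and email are illustrative only. The removed tag and name are reserved so they can never be reused with a different meaning, and the new field gets a new tag.

```protobuf
syntax = "proto3";

// Version 2 of Person (sketch). Assumes version 1 declared
// "string nickname = 8;", which has since been removed.
message Person {
  // Never reuse a removed field's tag or name: old data may still carry tag 8.
  reserved 8;
  reserved "nickname";

  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
  // ... fields 4 to 7 unchanged

  // New field with a new tag. Old readers skip it as an unknown field;
  // new readers see the default "" when reading old data.
  string email = 9;
}
```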
One of the disadvantages is that the data is encoded in a binary format, so it can't be visualized with a text editor.
The typical workflow consists of defining the data types, called messages, in Protocol Buffer .proto files, then automatically generating the data structures that support and validate those types in the programming language of choice. In Go, the messages are represented as structs. With the help of a framework like gRPC, which uses Protocol Buffers as its default serialization format and native mechanism to exchange data, client and server code can also be generated automatically. This makes Protocol Buffers convenient as a serialization format for microservices.
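With gRPC, the remote interface itself is declared in the .proto file, next to the messages it exchanges. The sketch below uses a hypothetical PersonService and GetPersonRequest; from a service definition like this one, the gRPC code generators produce a client stub and a server-side interface.

```protobuf
syntax = "proto3";

// Hypothetical request message.
message GetPersonRequest {
  string first_name = 1;
  string last_name = 2;
}

message Person {
  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
}

// Hypothetical service: one remote call that takes a GetPersonRequest
// and returns a Person.
service PersonService {
  rpc GetPerson(GetPersonRequest) returns (Person);
}
```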
The current version of the protocol is 3 (proto3), released in mid-2016.
Message
Protocol Buffers represent arbitrary data types as messages. A message has fields. One or more messages are declared in a .proto text file with the following format:
syntax = "proto3";
/* Person is used to identify
a user in the system.
*/
message Person {
  // the age as of the person's creation
  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
  bytes small_picture = 4; // a small JPEG file
  bool is_profile_verified = 5;
  float height = 6;
  repeated string phone_numbers = 7;
}
A message type can use another message type as the type of a field:
message Something {
  // ... fields
}
message SomethingElse {
  Something something = 1;
}
Messages can be nested:
message Something {
  // ... fields
  message SomethingElse {
    // ... fields
  }
}
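A nested type can be used as a field type both inside its parent and elsewhere. Outside the parent it is referenced by its qualified name, Parent.Child. In this sketch the Other message and all field names are hypothetical.

```protobuf
syntax = "proto3";

message Something {
  message SomethingElse {
    string value = 1;
  }
  // Inside the parent, the nested type is used by its short name.
  SomethingElse inner = 1;
}

// Hypothetical message declared outside Something.
message Other {
  // Outside the parent, the nested type needs its qualified name.
  Something.SomethingElse item = 1;
}
```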
Fields
Each field has a field type, a field name and a field tag, declared with the following syntax:
<field_type> <field_name> = <field_tag>;
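Unless explicitly set by the program, every field has the default value for its type. In proto3, a plain scalar field that holds its default value is not written to the serialized message; the reader supplies the default when the field is absent.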
Field Names
Field names are not important when the message is serialized to or deserialized from the binary format: only the tags are written to the wire.
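A worked example, assuming the Person message where age has tag 1 and only age is set, to 30. In the binary format each field is written as a key followed by its value; the key combines the tag with a wire type (0 for varint-encoded integers such as int32). The name "age" never appears, so renaming the field in the .proto file would produce the same two bytes.

```latex
\begin{aligned}
\text{key} &= 8 \cdot \text{tag} + \text{wire\_type} = 8 \cdot 1 + 0 = 8 = \texttt{0x08} \\
\text{value} &= 30 = \texttt{0x1E} \quad (30 < 128,\ \text{so a single varint byte}) \\
\text{serialized bytes} &= \texttt{08 1E}
\end{aligned}
```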
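Field Types
- string
- bool
- bytes
- int32, uint32, sint32, fixed32, sfixed32
- int64, uint64, sint64, fixed64, sfixed64
- float, double
- repeated
- enum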
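A sketch of one message combining several field types; the message and field names are hypothetical. The comments note why one might pick sint32 or fixed64 over the plain integer types.

```protobuf
syntax = "proto3";

// Hypothetical message illustrating several field types.
message Measurement {
  string sensor_name = 1;
  bool calibrated = 2;
  bytes raw_payload = 3;
  sint32 temperature_delta = 4;  // ZigZag encoding: negative values stay small on the wire
  fixed64 sensor_id = 5;         // always 8 bytes; suits values that are usually large
  double value = 6;
  repeated string labels = 7;    // zero or more strings
}
```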
Field Tags
A tag is an integer between 1 and 2^29 - 1 (536,870,911). The numbers from 19,000 to 19,999 are reserved by the Protocol Buffers implementation and cannot be used. The first value of an enum must be 0.
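An enum declaration that satisfies the first-value-is-0 rule, sketched with hypothetical names. Because the value numbered 0 is also the enum's default, a common convention is to give it an UNSPECIFIED name.

```protobuf
syntax = "proto3";

enum PhoneType {
  PHONE_TYPE_UNSPECIFIED = 0;  // first value: must be 0, also the default
  PHONE_TYPE_MOBILE = 1;
  PHONE_TYPE_HOME = 2;
}

message PhoneNumber {
  string number = 1;
  PhoneType type = 2;  // reads as PHONE_TYPE_UNSPECIFIED when not set
}
```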
Tags from 1 to 15 take one byte to encode, together with the field's wire type, so use them for frequently populated fields. Tags from 16 to 2047 take two bytes.
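The byte counts follow from how tags are encoded. Assuming the standard binary wire format, a field's key is the varint 8·tag + wire_type (the wire type occupies the low 3 bits), and a varint stores 7 bits per byte. Bounding the key by the largest 3-bit wire type, with t the tag, w the wire type and k the key:

```latex
\begin{aligned}
&k = 8t + w, \qquad 0 \le w \le 7 \\
&\text{1 byte: } 8t + 7 \le 2^{7} - 1 = 127 \iff t \le 15 \\
&\text{2 bytes: } 8t + 7 \le 2^{14} - 1 = 16383 \iff t \le 2047
\end{aligned}
```

Tag 16, for instance, already gives k = 128, which needs a second varint byte.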
Default Value
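Every field has a default value, which is defined by the field's type: for example, 0 for numeric types, the empty string for strings and false for bools. If the program does not explicitly set a field, the field takes its default value. proto3 has no concept of a "required" field, and a plain field (declared without a label) cannot tell "not set" apart from "set to the default value". Newer protobuf releases also accept the optional label in proto3, which adds explicit tracking of whether a field was set.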
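A sketch listing the proto3 default of each kind of field in a comment; the Defaults message, its nested types and its field names are hypothetical.

```protobuf
syntax = "proto3";

// Hypothetical message; each comment gives the field's proto3 default.
message Defaults {
  enum Status {
    STATUS_UNSPECIFIED = 0;  // the default enum value is the one numbered 0
    STATUS_ACTIVE = 1;
  }
  message Inner {
    string note = 1;
  }

  int32 count = 1;         // 0
  float ratio = 2;         // 0.0
  string name = 3;         // "" (empty string)
  bytes data = 4;          // empty bytes
  bool enabled = 5;        // false
  Status status = 6;       // STATUS_UNSPECIFIED
  repeated int32 ids = 7;  // empty list
  Inner inner = 8;         // not set; its representation is language-dependent (nil in Go)
}
```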
Comments
// this is a single-line comment
/* This is
a multi-line
comment.
*/
.proto Files
Multiple messages can be defined in the same .proto file.
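A sketch of a single .proto file declaring two related messages. The file name person.proto and the package name example.people are hypothetical; the package statement namespaces the messages so they do not clash with same-named messages elsewhere.

```protobuf
// person.proto (hypothetical file name)
syntax = "proto3";

// Hypothetical package name.
package example.people;

message Address {
  string street = 1;
  string city = 2;
}

message Person {
  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
  Address address = 4;  // refers to the message declared above in the same file
}
```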
Imports
Different messages can live in different .proto files and can be imported. This feature encourages modularization and sharing.
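A sketch of a file that imports message definitions from other files. It assumes a hypothetical person.proto, reachable on the import path, that declares Person in the same package example.people; google/protobuf/timestamp.proto is one of the well-known types distributed with Protocol Buffers.

```protobuf
// address_book.proto (hypothetical file name)
syntax = "proto3";

package example.people;

// Assumed to declare example.people.Person.
import "person.proto";
// Well-known type shipped with Protocol Buffers.
import "google/protobuf/timestamp.proto";

message AddressBook {
  repeated Person people = 1;                  // same package: short name works
  google.protobuf.Timestamp last_updated = 2;  // other package: fully qualified name
}
```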
Go Code Generation
Data Evolution with Protocol Buffers
A message is actually a type. The name "message" was probably chosen because instances of these types are mainly intended to be sent over the wire.