Description
Is your feature request related to a problem? Please describe.
An important property of Go is the ability to compose structs in order to reduce the number of allocations and improve access locality. This is at odds with languages like Java, which (until Project Valhalla) required that each instance live independently on the heap. Java's runtime specialization and JIT compilation help it mitigate that overhead.
Protobuf fields (as of proto3, at least) can be omitted from serializations, a very important property for forward compatibility. To support the optional nature of fields, protoc-gen-go generates message-typed fields as pointers. This (combined with storing unknown fields) allows a message to round-trip through deserialization and back to serialization without losing information.
The language guide specifies zero values for types other than messages, which permits the compiler to use non-pointer types for fields of those types. For message fields, the guide says only: "For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details."
Consider the following contrived example:
message Person {
  message Name {
    string first = 1;
    string last = 2;
  }
  Name name = 1;
  repeated Person friends = 2;
}
This will generate the below struct:
type Person struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	Name    *Person_Name `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"`
	Friends []*Person    `protobuf:"bytes,2,rep,name=friends,proto3" json:"friends,omitempty"`
}
This issue considers two cases in which protoc-gen-go generates struct pointer types rather than struct value types: message fields, and collection fields that contain messages (repeated, map). The generated struct above is unfortunate for two reasons.
The first is that the Name field is a pointer. As discussed above, this is important today so that, upon marshaling, the library can know whether to serialize a zero-value message for the field or to omit the message entirely. In the solutions section we'll discuss how to store the information about whether this field is populated elsewhere in the Person struct.
The second is that the Friends field is a []*Person rather than a []Person. The motivation for this decision is not clear to me. It is illegal for an entry in that slice to be nil for the purposes of serialization. Calling proto.Marshal with a nil entry will result in the error "proto: repeated field Friends has nil element".
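To make the difference between the two slice shapes concrete, here is a minimal sketch using a hand-written stand-in struct rather than real protoc-gen-go output. The names `firstNilIndex` and `toValues` are hypothetical helpers introduced only for illustration. With a pointer slice, every consumer has to guard against nil elements; with a value slice, that state cannot be represented at all and the elements share one backing array.

```go
package main

// Person is a hand-written stand-in for a generated message;
// it is NOT actual protoc-gen-go output.
type Person struct {
	First string
	Last  string
}

// firstNilIndex returns the index of the first nil element in a pointer
// slice, or -1 if there is none. This is the kind of check a marshaler
// must perform on []*Person before it can encode the field.
func firstNilIndex(friends []*Person) int {
	for i, f := range friends {
		if f == nil {
			return i
		}
	}
	return -1
}

// toValues copies a pointer slice into a value slice. It reports false
// if any element is nil. The result lives in a single backing array, so
// a []Person needs no per-element allocation and no nil check.
func toValues(friends []*Person) ([]Person, bool) {
	out := make([]Person, 0, len(friends))
	for _, f := range friends {
		if f == nil {
			return nil, false
		}
		out = append(out, *f)
	}
	return out, true
}
```

Once the data is a []Person, the nil-element error case simply disappears, because the type system rules it out.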
Describe the solution you'd like
The proposal here is for messages and fields to have an option, which for now we'll call (go.nullable), that can be set to false to indicate that pointers should not be generated. For repeated and map fields this should be trivial to implement and does not seem to break any semantics. For regular message fields, we'll need some mechanism to (a) determine at serialization time whether a zero-valued field exists or should not be serialized at all, and (b) mark a zero-valued field as not existing when creating a message in memory.
For (a) I propose that we augment the unexported state of messages to include a bitmask indicating whether a field is missing. At serialization time, if the library detects that a message field has a zero value, it will consult this bitmask. The previous sentence is intentionally imprecise. In terms of implementation details, in the new google.golang.org/protobuf world, we have some flexibility to control the implementation of protoreflect.Message. What is critical is that when a "root" message exposes its fields through the Range method, it provides Value implementations which return the proper value for IsValid. This can be done by telling the relevant Message values whether they are valid when constructing them.
For (b) I propose generating, for each message field that is emitted as a non-pointer, a method that sets its validity. Note that the validity bit only applies if the value of the message is zero. This means that in this case IsValid will be more expensive, as it will need to determine whether all fields of a given message are zero before consulting its validity bit.
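The two mechanisms in (a) and (b) can be sketched in plain Go as follows. This is an illustration under stated assumptions, not a proposed implementation. The struct is hand-written rather than generated, and the names `missing`, `personNameBit`, `SetNameMissing`, and `HasName` are hypothetical. It also assumes a cleared bit means "present", following the "missing" wording above, so a zero-valued Name is serialized unless it is explicitly marked missing.

```go
package main

// Name mirrors the nested message from the example. It contains only
// strings, so it is comparable and a zero check is a single ==.
type Name struct {
	First string
	Last  string
}

// Person is a hand-written stand-in for what generated code might look
// like with (go.nullable) = false. It is NOT real protoc-gen-go output.
type Person struct {
	// missing is the hypothetical bitmask from (a): a set bit means
	// "this value field is absent, provided it is also zero".
	missing uint64

	Name Name // value type instead of *Person_Name
}

// personNameBit is the (assumed) bit assigned to the Name field.
const personNameBit uint64 = 1 << 0

// SetNameMissing is the hypothetical generated method from (b). It lets
// a caller mark a zero-valued Name as not existing.
func (p *Person) SetNameMissing(missing bool) {
	if missing {
		p.missing |= personNameBit
	} else {
		p.missing &^= personNameBit
	}
}

// HasName reports whether Name should be serialized. A non-zero value is
// always present; the bitmask is consulted only when the value is zero.
func (p *Person) HasName() bool {
	if p.Name != (Name{}) {
		return true
	}
	return p.missing&personNameBit == 0
}
```

The zero check is cheap here only because Name is comparable. A message containing slices, maps, or nested messages would need a field-by-field walk, which is the extra cost for IsValid described above.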
Describe alternatives you've considered
One related discussion proposes allowing the protobuf library to pool allocations rather than trying to eliminate them: #1192
Additional context
This topic has been discussed at least twice before. In #1142 the discussion hinged on internals of the implementation rather than the more abstract problem, and it went nowhere. I re-raised this issue in the context of #52 (comment), which was the wrong place to do so.
Another important thing to note is that the demand for this functionality is not theoretical. Engineers on https://github.com/cockroachdb/cockroach consider it the most critical improvement offered by the now-defunct https://github.com/gogo/protobuf/ library. Cockroach does not seem to be the only project struggling to move off that deprecated and unsupported fork (firecracker-microvm/firecracker-containerd#452).