
protoc-gen-go: support generating structs with non-pointer message fields #1225

Open
@ajwerner


Is your feature request related to a problem? Please describe.

An important property of Go is the ability to compose structs by value, which reduces the number of allocations and improves memory locality. This contrasts with languages like Java, which (until Project Valhalla) require every object to live independently on the heap; Java's runtime specialization and JIT compilation help it mitigate that overhead. Protobuf fields (as of proto3, at least) can be omitted from serialized messages, a very important property for forward compatibility. To support the optional nature of message fields, protoc-gen-go generates them as pointers. This (combined with storing unknown fields) allows a message to round-trip through deserialization and back to serialization without losing information.

The language guide specifies zero (default) values for every field type other than messages, which permits the generator to use non-pointer Go types for those fields. For message fields, it says:

> For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details.

Consider the following contrived example:

```protobuf
message Person {
  message Name {
    string first = 1;
    string last = 2;
  }

  Name name = 1;

  repeated Person friends = 2;
}
```

protoc-gen-go generates the following struct:

```go
type Person struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	Name    *Person_Name `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"`
	Friends []*Person    `protobuf:"bytes,2,rep,name=friends,proto3" json:"friends,omitempty"`
}
```

This issue considers two cases in which protoc-gen-go generates pointer-to-struct types rather than struct value types: singular message fields, and collection fields (repeated and map) whose elements are messages. The struct above is unfortunate for two reasons.

The first is that the Name field is a pointer. As discussed above, this is currently important because, when marshaling, the library must know whether to serialize a zero-value message for the field or to omit the field entirely. In the solution section we discuss how to record whether this field is populated elsewhere in the Person struct.
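To make the presence question concrete, here is a minimal hand-written sketch, not protoc-gen-go output. The types `Person` and `PersonName` are hypothetical mirrors of the generated shape (without the internal state fields), and `fieldsToEncode` is a hypothetical stand-in for the marshaler's decision. With a pointer field, nil means "absent", while a non-nil pointer to a zero-value struct means "present but empty".

```go
package main

import "fmt"

// PersonName mirrors Person_Name; a hypothetical hand-written type.
type PersonName struct {
	First string
	Last  string
}

// Person mirrors the pointer-based shape that protoc-gen-go generates today.
type Person struct {
	Name    *PersonName
	Friends []*Person
}

// fieldsToEncode is a hypothetical stand-in for the marshaler: it reports
// which singular message fields would be written to the wire.
func fieldsToEncode(p *Person) []string {
	var out []string
	if p.Name != nil {
		// Even &PersonName{} is written: field 1 with an empty payload.
		out = append(out, "name")
	}
	return out
}

func main() {
	absent := &Person{}
	empty := &Person{Name: &PersonName{}}
	fmt.Println(fieldsToEncode(absent)) // []
	fmt.Println(fieldsToEncode(empty))  // [name]
}
```

The nil check is the only signal available, which is why a value-typed `Name` needs some other place to store presence.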

The second is that the Friends field is a []*Person rather than a []Person. The motivation for this decision is not clear to me: a nil entry in that slice cannot be serialized, and calling proto.Marshal on a message with a nil entry fails with the error `proto: repeated field Friends has nil element`.
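The allocation cost of the two slice shapes can be compared with `testing.AllocsPerRun`. This is a minimal sketch that uses a hypothetical plain `Person` struct instead of a generated message, so it measures only Go's memory layout, not the protobuf runtime. Package-level sinks keep escape analysis from eliding the allocations.

```go
package main

import (
	"fmt"
	"testing"
)

// Person is a hypothetical stand-in for a generated message.
type Person struct {
	First, Last string
}

// Package-level sinks force the slices and their elements onto the heap.
var (
	sinkPtrs []*Person
	sinkVals []Person
)

// buildPtrs mirrors today's []*Person: one allocation for the backing
// array plus one allocation per element.
func buildPtrs(n int) {
	s := make([]*Person, n)
	for i := range s {
		s[i] = &Person{}
	}
	sinkPtrs = s
}

// buildVals mirrors a []Person: all elements live inline in one array.
func buildVals(n int) {
	sinkVals = make([]Person, n)
}

func main() {
	const n = 100
	ptrAllocs := testing.AllocsPerRun(100, func() { buildPtrs(n) })
	valAllocs := testing.AllocsPerRun(100, func() { buildVals(n) })
	fmt.Printf("[]*Person: %.0f allocs, []Person: %.0f allocs\n", ptrAllocs, valAllocs)
}
```

With value elements there is also no nil entry to reject at marshal time.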

Describe the solution you'd like

The proposal is to add an option on messages and fields, provisionally called (go.nullable), which can be set to false to indicate that pointers should not be generated. For repeated and map fields this should be trivial to implement and does not appear to break any semantics. For singular message fields, we need mechanisms to (a) determine at serialization time whether a zero-valued field is present or should not be serialized at all, and (b) mark a zero-valued field as not present when constructing a message in memory.

For (a) I propose augmenting the unexported state of messages with a bitmask that indicates whether a field is missing. At serialization time, if the library detects that a message field has a zero value, it consults this bitmask. The previous sentence is intentionally imprecise. In terms of implementation details, in the new google.golang.org/protobuf world we have some flexibility in how protoreflect.Message is implemented. What is critical is that when a "root" message exposes its fields through its Range method, it provides Value implementations that return the correct result from IsValid. This can be done by constructing the relevant Message values with knowledge of whether they are valid.
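As a rough illustration of (a), the sketch below hand-writes what a non-nullable `Person` might look like. Everything in it is hypothetical: the `absent` bitmask, the `nameBit` constant, and the `shouldEncodeName` helper are illustrative names, not google.golang.org/protobuf APIs, and a real implementation would sit behind `protoreflect.Message`. Following the paragraph above, a set bit marks the field as missing, and the bit is consulted only when the value is zero.

```go
package main

import "fmt"

// PersonName is a hypothetical hand-written stand-in for Person_Name.
type PersonName struct {
	First string
	Last  string
}

// Person sketches a non-nullable layout: Name is stored inline and an
// unexported bitmask records which zero-valued message fields are absent.
type Person struct {
	absent  uint32 // hypothetical: bit i set => message field i is absent
	Name    PersonName
	Friends []Person
}

const nameBit = 1 << 0 // hypothetical bit assignment for field 1 (name)

// isZero reports whether every field of n holds its zero value.
func (n *PersonName) isZero() bool {
	return n.First == "" && n.Last == ""
}

// shouldEncodeName decides whether field 1 is written when marshaling.
// A non-zero value is always present; only a zero value consults the bit.
func (p *Person) shouldEncodeName() bool {
	if !p.Name.isZero() {
		return true
	}
	return p.absent&nameBit == 0
}

func main() {
	present := Person{Name: PersonName{First: "Ada"}}
	emptyButPresent := Person{}
	missing := Person{absent: nameBit}
	fmt.Println(present.shouldEncodeName(), emptyButPresent.shouldEncodeName(), missing.shouldEncodeName()) // true true false
}
```

Under this polarity a zero-value `Person{}` reports its zero `Name` as present; inverting the bit's meaning would make fresh structs behave like today's nil pointers. The sketch simply follows the wording above.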

For (b) I propose generating, for each message field that is generated as a non-pointer, a method that sets the field's validity. The validity bit only applies when the field's value is zero. As a result, IsValid becomes more expensive in this case, because it must first determine whether all fields of the message are zero before consulting the validity bit.
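A sketch of (b), under the same assumptions as a hand-written layout: `SetNameValid` and `nameIsValid` are hypothetical names, not generated or library APIs, and a set bit in `absent` marks a zero-valued field as missing. `PersonName` deliberately includes a repeated field, which makes it non-comparable with `==`, so the zero check has to walk every field; that walk is the extra cost described above.

```go
package main

import "fmt"

// PersonName is a hypothetical message with a repeated field, which makes
// the struct non-comparable: it cannot be checked with == PersonName{}.
type PersonName struct {
	First   string
	Last    string
	Aliases []string
}

// Person sketches a non-nullable Name plus a presence bitmask
// (hypothetical layout; a set bit marks a zero-valued field as absent).
type Person struct {
	absent uint32
	Name   PersonName
}

const nameBit = 1 << 0 // hypothetical bit for field 1 (name)

// SetNameValid is a hypothetical generated setter for (b). The bit only
// matters while Name holds its zero value.
func (p *Person) SetNameValid(valid bool) {
	if valid {
		p.absent &^= nameBit
	} else {
		p.absent |= nameBit
	}
}

// nameIsValid shows the extra cost: every field of Name is inspected
// before the bit can be consulted.
func (p *Person) nameIsValid() bool {
	n := &p.Name
	if n.First != "" || n.Last != "" || len(n.Aliases) != 0 {
		return true // a non-zero value is always valid
	}
	return p.absent&nameBit == 0
}

func main() {
	var p Person
	p.SetNameValid(false)
	fmt.Println(p.nameIsValid()) // false: zero Name explicitly marked invalid
	p.Name.First = "Ada"
	fmt.Println(p.nameIsValid()) // true: non-zero Name is valid regardless of the bit
}
```

For a message with many fields, or with nested non-nullable messages, this zero check grows with the size of the message, whereas a nil-pointer check is constant time.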

Describe alternatives you've considered

A related discussion proposes letting the protobuf library pool allocations rather than trying to eliminate them: #1192

Additional context

This topic has been discussed at least twice before. In #1142 the discussion hinged on implementation internals rather than the more abstract problem, and it went nowhere. I re-raised the issue in #52 (comment), which was the wrong place to do so.

Note also that the demand for this functionality is not theoretical. Engineers on https://github.com/cockroachdb/cockroach consider it the most critical improvement offered by the now-defunct https://github.com/gogo/protobuf/ library. Cockroach does not appear to be the only project struggling to move off that deprecated and unsupported fork (firecracker-microvm/firecracker-containerd#452).

Metadata

Assignees: No one assigned

Labels: generator-proto-option (involves generators respecting a proto option to control generated source output), performance
