
feat(grpc): Add protobuf codegen #2320


Merged: 28 commits, Jul 30, 2025


@arjan-bal (Collaborator) commented Jul 2, 2025

This PR includes the following:

  1. A tonic codec that uses protobuf-rust.
  2. A protoc plugin that generates tonic-compatible client code.
  3. A grpc-build crate that helps generate code during cargo builds.
  4. Interop tests that run the client produced by the new codegen against a Go server. These serve as an integration test for the new codegen.

To keep CI fast, the protoc plugin binary for each OS is cached. The plugin binaries are rebuilt only when the plugin's code or build files change.

Notes

  • The codegen currently generates code for tonic. It will be changed once the gRPC API design is finalized and the channel implementation is complete.

@arjan-bal arjan-bal changed the title gRPC client codegen [WIP] gRPC client codegen Jul 2, 2025
@arjan-bal arjan-bal changed the title [WIP] gRPC client codegen gRPC client codegen Jul 2, 2025
@arjan-bal arjan-bal marked this pull request as draft July 2, 2025 18:33
@LucioFranco LucioFranco changed the title gRPC client codegen feat(grpc): Add protobuf codegen Jul 2, 2025
@arjan-bal arjan-bal force-pushed the grpc-codegen branch 4 times, most recently from d39e0f5 to de384b4 Compare July 4, 2025 08:54
@arjan-bal arjan-bal force-pushed the grpc-codegen branch 17 times, most recently from 9e45ab4 to dfc5213 Compare July 4, 2025 22:43
@LucioFranco (Member) left a comment

Overall LGTM. Let's get these last few things addressed and CI passing, then let's merge.

```yaml
# Share repository cache between workflows.
repository-cache: true
module-root: ./protoc-gen-rust-grpc
- name: Build protoc plugin
```
Member

Does this step take a long time? Generally speaking, I would rather we build the plugin in each test job to ensure we are building the latest code and there are no caching issues. Could you explain the reasoning for breaking it up like this?

Collaborator Author

Added a comment in the yaml. Building the protoc plugin from scratch takes 6–14 minutes, depending on the OS. This delays the execution of workflows that use the plugin in build.rs files.

Example workflow execution: https://github.com/hyperium/tonic/actions/runs/16217230891/job/45789317103

Member

Okay, yeah, if this works it's fine for now. We probably want to improve this in the future.
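The thread does not show how the per-OS cache key is computed. A common approach, sketched here under that assumption, is to key the cache on the OS name plus a hash of the plugin's source and build files, so that any edit produces a new key and forces a rebuild (in GitHub Actions this role is typically played by the cache action's `key` with `hashFiles(...)`). The function `plugin_cache_key` and the file names are hypothetical; FNV-1a is used only because it is deterministic across toolchains.

```rust
/// 64-bit FNV-1a: simple and deterministic across runs and toolchains,
/// unlike std's `DefaultHasher`, whose algorithm is unspecified.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical cache key: OS name plus a hash over (path, contents) pairs.
/// Sorting first makes the key independent of the order files are listed in.
fn plugin_cache_key(os: &str, files: &[(&str, &str)]) -> String {
    let mut sorted = files.to_vec();
    sorted.sort();
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for (path, contents) in sorted {
        hash = fnv1a(path.as_bytes(), hash);
        hash = fnv1a(&[0], hash); // separator so "ab"+"c" differs from "a"+"bc"
        hash = fnv1a(contents.as_bytes(), hash);
        hash = fnv1a(&[0], hash);
    }
    format!("protoc-gen-rust-grpc-{os}-{hash:016x}")
}

fn main() {
    let files = [("BUILD", "cc_binary(...)"), ("src/generator.cc", "// plugin source")];
    let key = plugin_cache_key("linux", &files);
    // A cache hit on this key means the previously built binary can be reused;
    // any change to a listed file yields a new key and forces a rebuild.
    println!("{key}");
}
```

The trade-off raised in the review still applies: a content-hash key avoids stale binaries only if every file that affects the build is included in the hashed set.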

```rust
.unwrap();
}
let crate_mapping_path = if self.generate_message_code {
    self.output_dir.join("crate_mapping.txt")
```
Member

Is it possible to have multiple codegens overwrite the same file? Do we need to add a warning in case, say, you do that by accident and it's surprising?

Collaborator Author

> Is it possible to have multiple codegens overwrite the same file?

Yes, it's possible. A bigger problem with multiple protoc invocations using the same output directory is that the generated.rs file gets overwritten. The generated.rs file exports the generated message symbols for all the proto files in the input protos list.

> Do we need to add a warning in case, say, you do that by accident and it's surprising?

@acozzette wanted to get your views on this. Have you considered this issue?
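One possible shape for such a warning, sketched under assumptions: if a build script knows all of its planned protoc invocations up front, it can group them by output directory and warn before `generated.rs` is silently overwritten. The `Invocation` type and `shared_output_dirs` function are hypothetical and not part of grpc-build; only the `cargo:warning=` build-script directive is a real Cargo feature.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical description of one protoc run in a build script.
struct Invocation<'a> {
    name: &'a str,
    output_dir: &'a Path,
}

/// Returns each output directory used by more than one invocation, together
/// with the invocations that would each write (and overwrite) its generated.rs.
/// Paths are compared literally; a real check would likely canonicalize them.
fn shared_output_dirs<'a>(invocations: &[Invocation<'a>]) -> Vec<(PathBuf, Vec<&'a str>)> {
    let mut by_dir: BTreeMap<PathBuf, Vec<&'a str>> = BTreeMap::new();
    for inv in invocations {
        by_dir.entry(inv.output_dir.to_path_buf()).or_default().push(inv.name);
    }
    by_dir.into_iter().filter(|(_, names)| names.len() > 1).collect()
}

fn main() {
    let shared = Path::new("target/protos");
    let separate = Path::new("target/other_protos");
    let invocations = [
        Invocation { name: "echo", output_dir: shared },
        Invocation { name: "routeguide", output_dir: shared },
        Invocation { name: "health", output_dir: separate },
    ];
    for (dir, names) in shared_output_dirs(&invocations) {
        // `cargo:warning=` lines printed by a build script show up in cargo's output.
        println!("cargo:warning={} is shared by {:?}; generated.rs will be overwritten", dir.display(), names);
    }
}
```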

```rust
};

// Generate the service code.
let mut cmd = std::process::Command::new("protoc");
```
Member

We probably want to allow users to set the path to protoc via an env var as well. We do this in the prost code, so we should follow its approach.

Collaborator Author

Here are the relevant prost docs: https://docs.rs/prost-build/latest/prost_build/index.html#sourcing-protoc

The message codegen also doesn't support setting the path to protoc using an env var. I think protobuf-codegen should support this before gRPC. Do you want me to file a feature request for protobuf?

Member

Yeah, that would be good; I think it shouldn't be too contentious.
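For reference, the prost-build docs linked above describe using the `PROTOC` environment variable when it is set and otherwise searching the current path for `protoc`. A minimal sketch of that lookup applied to the service codegen's `Command::new("protoc")` call follows. `protoc_path` is a hypothetical helper that takes the variable's value as a parameter so it can be tested without mutating the process environment, and treating an empty value as unset is a choice made here, not documented prost-build behavior.

```rust
use std::ffi::OsString;
use std::path::PathBuf;
use std::process::Command;

/// Resolves the protoc binary: a non-empty `PROTOC` value wins; otherwise fall
/// back to the bare name `protoc`, which `Command` looks up on `PATH`.
fn protoc_path(protoc_env: Option<OsString>) -> PathBuf {
    match protoc_env {
        Some(path) if !path.is_empty() => PathBuf::from(path),
        _ => PathBuf::from("protoc"),
    }
}

fn main() {
    // In a build script, also ask cargo to re-run when the variable changes.
    println!("cargo:rerun-if-env-changed=PROTOC");
    let protoc = protoc_path(std::env::var_os("PROTOC"));
    let mut cmd = Command::new(&protoc);
    cmd.arg("--version");
    match cmd.output() {
        Ok(out) => println!("{}", String::from_utf8_lossy(&out.stdout).trim()),
        Err(err) => eprintln!("could not run {}: {err}", protoc.display()),
    }
}
```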


```rust
type Error = Status;

fn encode(&mut self, item: Self::Item, buf: &mut EncodeBuf<'_>) -> Result<(), Self::Error> {
    let serialized = item.serialize().map_err(from_decode_error)?;
```
Member

We should add a TODO here noting that we want to figure out how to remove this extra copy.

Collaborator Author

Added a comment. Do you want me to file an issue for protobuf-rust?

Member

That might be good. We discussed this with @acozzette and co. last week, but it's probably good to track it somewhere as well. I would also be okay with this being just a tonic issue.

Collaborator Author

Opened a tonic issue with context about why protobuf doesn't provide the required API: #2345
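To make the extra copy concrete: the encoder serializes the message into a freshly allocated buffer and then copies those bytes into tonic's `EncodeBuf`. The sketch below models both paths with plain `Vec<u8>` buffers. The `Message` trait and its `serialize_into` method are hypothetical (they are not protobuf-rust's API); `serialize_into` stands in for the kind of write-into-the-caller's-buffer API that, per the issue above, protobuf does not currently provide.

```rust
/// Hypothetical stand-in for a generated message type.
trait Message {
    /// Allocates and returns a fresh buffer (the shape the encoder uses today).
    fn serialize(&self) -> Vec<u8>;
    /// Appends the encoding directly to a caller-provided buffer.
    fn serialize_into(&self, out: &mut Vec<u8>);
}

struct Greeting {
    name: String,
}

impl Message for Greeting {
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        self.serialize_into(&mut out);
        out
    }

    fn serialize_into(&self, out: &mut Vec<u8>) {
        // Field 1, wire type 2 (length-delimited): tag byte 0x0a, length, bytes.
        // Kept simple by requiring the length to fit in a one-byte varint.
        debug_assert!(self.name.len() < 128);
        out.push(0x0a);
        out.push(self.name.len() as u8);
        out.extend_from_slice(self.name.as_bytes());
    }
}

/// Today's shape: an intermediate allocation plus a copy into the output.
fn encode_with_copy<M: Message>(msg: &M, buf: &mut Vec<u8>) {
    let serialized = msg.serialize();
    buf.extend_from_slice(&serialized);
}

/// Desired shape: write straight into the output buffer.
fn encode_direct<M: Message>(msg: &M, buf: &mut Vec<u8>) {
    msg.serialize_into(buf);
}

fn main() {
    let msg = Greeting { name: "tonic".to_string() };
    let (mut copied, mut direct) = (Vec::new(), Vec::new());
    encode_with_copy(&msg, &mut copied);
    encode_direct(&msg, &mut direct);
    // Same bytes on the wire; only the number of copies differs.
    assert_eq!(copied, direct);
    println!("{:02x?}", direct);
}
```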

Comment on lines +109 to +110
```rust
let item = U::parse(slice).map_err(from_decode_error)?;
buf.advance(slice.len());
```
Member

I think it's possible that the incoming buf slice is larger than the actual message; we should see if we can take the length from the number of bytes actually decoded.

Collaborator Author

From my understanding, every field in an encoded proto message is self-delimiting: it has a length prefix, a fixed width, or (for varints) a continuation bit that marks its last byte. The parser should keep reading fields until it reaches the end of the buffer. Fields not present in the proto descriptor should be ignored to ensure forward compatibility, for example when a new field is added to the proto but the receiver is using an older version. A parsing error should be returned if the bytes cannot be parsed. Since the message as a whole has no length prefix, the parser must consume all the bytes it is given.

> I think it's possible that the incoming buf slice is larger than the actual message

Based on the above, I believe it may be incorrect to pass a slice to a parser but not consume the entire buffer. Can you clarify when this situation arises?
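The argument above rests on the protobuf wire format being self-delimiting at the field level but not at the message level. The sketch below walks a buffer field by field the way a parser must: it reads each tag, skips the payload according to the wire type (varint, 64-bit, length-delimited, 32-bit), and succeeds only if it lands exactly on the end of the buffer. It is an illustration written for this discussion, not protobuf-rust's parser, and it does not handle the deprecated group wire types.

```rust
/// Reads a base-128 varint starting at `*pos`, advancing `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Result<u64, &'static str> {
    let mut value: u64 = 0;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(*pos).ok_or("truncated varint")?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok(value);
        }
    }
    Err("varint too long")
}

/// Walks every field and returns the field numbers seen. Unknown fields are
/// skipped rather than rejected, which keeps old readers compatible with newer
/// writers. Missing bytes are an error.
fn field_numbers(buf: &[u8]) -> Result<Vec<u64>, &'static str> {
    let mut pos = 0;
    let mut fields = Vec::new();
    while pos < buf.len() {
        let tag = read_varint(buf, &mut pos)?;
        let skip = match tag & 0x7 {
            0 => { read_varint(buf, &mut pos)?; 0 } // VARINT: ends at its last byte
            1 => 8,                                   // I64: fixed width
            2 => read_varint(buf, &mut pos)? as usize, // LEN: explicit length prefix
            5 => 4,                                   // I32: fixed width
            _ => return Err("unsupported wire type"),
        };
        let end = pos.checked_add(skip).ok_or("length overflow")?;
        if end > buf.len() {
            return Err("truncated field");
        }
        pos = end;
        fields.push(tag >> 3);
    }
    Ok(fields)
}

fn main() {
    // Field 1 = varint 150 (08 96 01), field 2 = "hi" (12 02 68 69).
    let msg: [u8; 7] = [0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69];
    assert_eq!(field_numbers(&msg), Ok(vec![1, 2]));
    // Cutting a field short is detected...
    assert!(field_numbers(&msg[..6]).is_err());
    // ...but a prefix that ends on a field boundary is itself a valid message,
    // so the parser cannot tell where a message "should" end on its own.
    assert_eq!(field_numbers(&msg[..3]), Ok(vec![1]));
}
```

The last assertion is the crux of the question above: trailing bytes that happen to form valid fields would be accepted (merged) silently, so correctness depends on the caller handing the parser exactly one message's bytes.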

@LucioFranco (Member), Jul 21, 2025

Hmm, that is a good question; I will look into it. Maybe we also want to provide a type-safe wrapper that enforces this invariant. I wouldn't worry about this for now; I think your assumption about what we actually do is probably right.
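Where an exact message boundary can come from: the gRPC over HTTP/2 protocol spec frames each message as a 1-byte compressed flag (0 or 1), a 4-byte big-endian message length, and then exactly that many message bytes. The sketch below (std only, and not tonic's actual decoder) splits such frames so that each slice handed to a parser holds one complete message and nothing more, which is the invariant that `parse` followed by `advance(slice.len())` relies on. `Frame` and `split_frames` are hypothetical names.

```rust
/// One length-prefixed gRPC message, borrowed from the receive buffer.
#[derive(Debug, PartialEq)]
struct Frame<'a> {
    compressed: bool,
    message: &'a [u8],
}

/// Splits `buf` into complete frames (1-byte flag, 4-byte big-endian length,
/// then the message). Returns the frames and the number of bytes consumed;
/// a trailing partial frame is left in place for the next read.
fn split_frames(buf: &[u8]) -> Result<(Vec<Frame<'_>>, usize), &'static str> {
    let mut frames = Vec::new();
    let mut pos = 0;
    while buf.len() - pos >= 5 {
        let compressed = match buf[pos] {
            0 => false,
            1 => true,
            _ => return Err("invalid compressed flag"),
        };
        let len = u32::from_be_bytes([buf[pos + 1], buf[pos + 2], buf[pos + 3], buf[pos + 4]]) as usize;
        let start = pos + 5;
        if buf.len() - start < len {
            break; // message body has not fully arrived yet
        }
        // The parser gets exactly `len` bytes: no more, no less.
        frames.push(Frame { compressed, message: &buf[start..start + len] });
        pos = start + len;
    }
    Ok((frames, pos))
}

fn main() {
    // Two complete frames followed by the first 3 bytes of a third header.
    let mut wire = vec![0u8, 0, 0, 0, 2, 0x08, 0x01];
    wire.extend_from_slice(&[0, 0, 0, 0, 3, 0x12, 0x01, 0x61]);
    wire.extend_from_slice(&[0, 0, 0]);
    let (frames, consumed) = split_frames(&wire).unwrap();
    for frame in &frames {
        println!("compressed={} message={:02x?}", frame.compressed, frame.message);
    }
    println!("consumed {consumed} of {} bytes", wire.len());
}
```

A type-safe wrapper like the one suggested above could carry this guarantee in its type, for example by only being constructible from a slice produced by the frame splitter.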

@arjan-bal arjan-bal requested a review from LucioFranco July 17, 2025 18:00
@LucioFranco (Member) left a comment

LGTM. I tested this locally, and the absolute paths are what solved it for me. We can merge this, but I want to merge #2321 first and then rebase this on top of it before we merge.

copybara-service bot pushed a commit to protocolbuffers/protobuf that referenced this pull request Jul 24, 2025
This check was meant to enforce that the version of the protobuf crate exactly
matches the version of the codegen crate, but this is not strictly necessary.
Tonic would also like to re-export our crate from the `tonic-protobuf` crate,
and the `DEP_UPB_VERSION` check is causing problems with that since the
environment variable is set only for crates that directly depend on the
`protobuf` crate (see
[discussion](hyperium/tonic#2320 (comment))).

While I was at it I also removed the code in the protobuf build script that
sets the `DEP_UPB_INCLUDE` variable, which is unnecessary now that we no longer
generate any C code which would need to know where to find upb's headers.

PiperOrigin-RevId: 786745312
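Background on why the check got in the way of re-exporting: Cargo passes metadata emitted by the build script of a package that declares `links` to the build scripts of its immediate dependents only, as `DEP_<LINKS>_<KEY>` environment variables. A crate that reaches `protobuf` only through a re-export in `tonic-protobuf` therefore never sees `DEP_UPB_VERSION`, and a check that requires it cannot pass. The sketch below models that failure mode with a hypothetical `check_upb_version` function that takes the variable's value as a parameter; it is not the removed protobuf code.

```rust
/// Outcome of a hypothetical codegen-side version check.
#[derive(Debug, PartialEq)]
enum VersionCheck {
    Match,
    Mismatch { runtime: String, codegen: String },
    /// `DEP_UPB_VERSION` is visible only to direct dependents of the crate
    /// that declares `links = "upb"`, so transitive users land here.
    Unavailable,
}

fn check_upb_version(dep_upb_version: Option<&str>, codegen_version: &str) -> VersionCheck {
    match dep_upb_version {
        None => VersionCheck::Unavailable,
        Some(v) if v == codegen_version => VersionCheck::Match,
        Some(v) => VersionCheck::Mismatch {
            runtime: v.to_string(),
            codegen: codegen_version.to_string(),
        },
    }
}

fn main() {
    // In a build script, the value would come from the environment.
    let dep = std::env::var("DEP_UPB_VERSION").ok();
    let codegen = option_env!("CARGO_PKG_VERSION").unwrap_or("unknown");
    match check_upb_version(dep.as_deref(), codegen) {
        VersionCheck::Match => println!("versions match"),
        VersionCheck::Mismatch { runtime, codegen } => {
            println!("cargo:warning=protobuf {runtime} != codegen {codegen}")
        }
        VersionCheck::Unavailable => println!("DEP_UPB_VERSION not set (not a direct dependent)"),
    }
}
```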
copybara-service bot pushed a commit to protocolbuffers/protobuf that referenced this pull request Jul 24, 2025
Same commit message as above, with PiperOrigin-RevId: 786760264
acozzette added a commit to acozzette/protobuf that referenced this pull request Jul 24, 2025
Same commit message, with PiperOrigin-RevId: 786760264
acozzette added a commit to protocolbuffers/protobuf that referenced this pull request Jul 24, 2025
Same commit message, with PiperOrigin-RevId: 786760264
@arjan-bal (Collaborator Author)

Hi @dfawley, @LucioFranco, can you please have another look?

@dfawley dfawley merged commit d244567 into hyperium:master Jul 30, 2025
20 checks passed

5 participants