-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(grpc): Add protobuf
codegen
#2320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
compiler/README.md
Outdated
@@ -0,0 +1,29 @@ | |||
## Usage example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename this containg folder? Compiler seems quite ambiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d39e0f5
to
de384b4
Compare
9e45ab4
to
dfc5213
Compare
@dfawley please have a look |
grpc/src/codec/protobuf.rs
Outdated
let serialized = item.serialize().map_err(from_decode_error)?; | ||
buf.put_slice(&serialized.as_slice()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LucioFranco I can think of the following ways to avoid a copy here:
- Request the protobuf crate to expose an API that allows deserialization into a destination buffer.
- Change the codec API to return a
Vec<u8>
instead of accepting the destination as a parameter. Tonic can then create anEncodedBuf
from the returnedVec<u8>
. - Make the constructor of
EncodedBuf
public. This would allow a codec to create its ownEncodedBuf
andstd::mem::swap
it with the one passed to theencode
method, avoiding a copy.
I think option 3 is a reasonable workaround that doesn't require API changes. Wanted to get your thoughts on this.
PLUGIN_PATH="$(pwd)/bazel-bin/src/protoc-gen-rust-grpc" | ||
|
||
# Run protoc with the Rust and Rust gRPC plugins | ||
protoc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably advise users to use grpc_build::CodeGen
and/or protobuf_codegen::CodeGen
instead of directly invoking protoc. If you directly invoke protoc and check in the generated code, then it's at risk of becoming stale if/when you upgrade your protobuf dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note discouraging the direct use of the plugin. I've also added a README in the tonic_protobuf_build
(formerly grpc_build
) crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to recommending it that way, but is this a real concern? I thought the protobuf library would support older generated code, but not vice-versa? I'm fairly sure that's true for Go at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are going to make an exception for some libraries, but we do generally plan to require the generated code to match the runtime version. In most other languages we are more relaxed about this but we have found that it's more difficult to maintain protobuf when we have to accommodate stale gencode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's quite a lot here, but it mostly looks good to me, pending some comments that are mostly nits. I loosely skimmed the interop code since you said that was copied from the other tonic tests. And my C++ is very rusty so it might be good to get a review from someone on the gRPC-C++ team. Maybe @drfloob ?
PLUGIN_PATH="$(pwd)/bazel-bin/src/protoc-gen-rust-grpc" | ||
|
||
# Run protoc with the Rust and Rust gRPC plugins | ||
protoc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to recommending it that way, but is this a real concern? I thought the protobuf library would support older generated code, but not vice-versa? I'm fairly sure that's true for Go at least.
Co-authored-by: Doug Fawley <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any logic errors in the C++ code. There are quite a few Google C++ style guide issues, though I'm not sure they're enforced for this codebase. Many of them are good practices regardless, so I've left suggestions.
#ifndef NET_GRPC_COMPILER_RUST_GENERATOR_H_ | ||
#define NET_GRPC_COMPILER_RUST_GENERATOR_H_ | ||
|
||
#include "absl/log/absl_log.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project could benefit from a C++ formatter & linter. System included should precede other libraries' headers. See https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged. I'll raise a follow-up PR with scripts to run clang-format and buildifier, both locally and on CI. I don't want to make this PR bigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drfloob Thanks for the thorough review — I've addressed your comments.
#ifndef NET_GRPC_COMPILER_RUST_GENERATOR_H_ | ||
#define NET_GRPC_COMPILER_RUST_GENERATOR_H_ | ||
|
||
#include "absl/log/absl_log.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged. I'll raise a follow-up PR with scripts to run clang-format and buildifier, both locally and on CI. I don't want to make this PR bigger.
@arjan-bal we need to get CI passing before we can merge. I think we can just bump the msrv to 1.79 but lets do that in a separate PR. As for the semver CI job I think we can just add a publish = false for now to that crate and it should resolve the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, lets get these last few things addressed and CI passing then lets merge.
# Share repository cache between workflows. | ||
repository-cache: true | ||
module-root: ./protoc-gen-rust-grpc | ||
- name: Build protoc plugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this step take a long time? Generally, speaking I would rather that we build the plugin in each test job to ensure we are building the latest code and there is no caching issues. Could you explain the reasoning for breaking it up like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment in the yaml. Building the protoc plugin from scratch takes 6–14 minutes, depending on the OS. This delays the execution of workflows that use the plugin in build.rs files.
Example workflow execution: https://github.com/hyperium/tonic/actions/runs/16217230891/job/45789317103
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay yeah if this works its fine for now, we probably want to improve this in the future..
.unwrap(); | ||
} | ||
let crate_mapping_path = if self.generate_message_code { | ||
self.output_dir.join("crate_mapping.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to have multiple codegens override the same file? Do we need to add a warning incase say you do that by accident and its surprising?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to have multiple codegens override the same file?
Yes it's possible. A bigger problem when having multiple protoc
invocations using the same output directory is the generated.rs
file getting overwritten. The generated.rs
file exports the generated message symbols for all the proto files that were part of the input protos list.
Do we need to add a warning incase say you do that by accident and its surprising?
@acozzette wanted to get your views on this. Have you considered this issue?
}; | ||
|
||
// Generate the service code. | ||
let mut cmd = std::process::Command::new("protoc"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to allow users to set the path to protoc via an env var as well. We do this in the prost code so we should follow how it does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the relevant prost docs: https://docs.rs/prost-build/latest/prost_build/index.html#sourcing-protoc
The message codegen also doesn't support setting the path to protoc using an env var. I think protobuf-codegen
should support this before gRPC. Do you want me to file a feature request for protobuf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be good I think it shouldn't be too contentious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type Error = Status; | ||
|
||
fn encode(&mut self, item: Self::Item, buf: &mut EncodeBuf<'_>) -> Result<(), Self::Error> { | ||
let serialized = item.serialize().map_err(from_decode_error)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a todo here mentioning that we want to figure out how to remove this extra copy that happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. Do you want me to file an issue for the protobuf-rust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be good, we discussed this with @acozzette and co last week but probably good to track it somewhere as well. I would be okay if this was also just a tonic issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened a tonic issue with context about why protobuf doesn't provide the required API: #2345
let item = U::parse(slice).map_err(from_decode_error)?; | ||
buf.advance(slice.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its possible that the incoming buf slice is larger than the actual message, we should see if we can pull the len from the decoded amount of bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, each field in an encoded proto message either has a length prefix or a fixed length. The parser should continue reading fields until it reaches the end of the buffer. Fields not present in the proto descriptor should be ignored to ensure forward compatibility—for example, when a new field is added to the proto, but the receiver is using an older version. A parsing error should be returned if the bytes cannot be parsed. Since there is no length prefix for the entire message, the parser must consume all the bytes it is given.
I think its possible that the incoming buf slice is larger than the actual message
Based on the above, I believe it may be incorrect to pass a slice to a parser but not consume the entire buffer. Can you clarify when this situation arises?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that is a good question, I will look into it, maybe we also want to actually provide some type safe wrapper that enforces this invariant. I wouldn't worry about this for now I think your assumption is probably right on what we actually do.
Co-authored-by: Lucio Franco <[email protected]>
This PR includes the following:
grpc-build
crate that helps generate code during cargo builds.To keep the CI fast, the protoc plugin binary for each OS is cached. The plugin binaries are re-built only if the plugin's code or build files are updated.
Notes