-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unfork bindgen #124
Comments
Not sure I understand -- I always thought a struct was exactly identical to a class, just that it has public visibility by default? And I believe it's legal for forward declarations of a |
This is #54 - let's discuss over there. |
Thanks for the clarification -- interesting! |
For #414 and #102 many types have 'annotations' attached (the |
Note from #426: The Rust function signature for copy constructors and move constructors is the same, so bindgen needs to add an annotation that allows us to distinguish between them. |
I've renamed this issue to track the hopes of unforking I intend to work on this somewhat over the next couple of weeks to see how far I can get. Even if I can reduce the scale of the fork, that's still beneficial. Right now, the need to roll bindgen is the major ongoing maintenance headache for autocxx, and we need to zap that. Ways forward:
|
This is a major change to autocxx-bindgen which aims to improve maintainability by reducing divergence from upstream bindgen. Specifically, it: * Significantly reduces the textual diffence * Makes the remaining changes less invasive, such that merge conflicts should be easier to resolve * Redoes the changes such that they're more attuned to the current evolution of upstream bindgen, so it may be possible to upstream some or ideally all of these changes. (The ultimate goal is to unfork bindgen!) See google/autocxx#124 and rust-lang#2943 for the background here. Specifically this change: * Removes the #[bindgen_semantic_attributes] which were added to all sorts of items. Instead, * Much more is communicated via the existing ParseCallbacks mechansim. * In some cases, it's still necessary to annotate individual types - in this case we generate a newtype wrapper instead of attributes. This commit also re-enables the bindgen test suite. It does not yet add tests for all the above new functionality; that's yet to come.
autocxx-bindgen has just been changed substantially to have less dramatic forking from bindgen. This PR makes the corresponding changes in autocxx. It's not quite perfect yet (e.g. some O(n^2) behavior) but should be good enough to run tests. Part of #124.
ParseCallbacks previously reported structs but not enums. Enhance it to do so. At the moment, little information is provided about enums - but bindgen doesn't handle (rare) anonymous enums so this seems the right amount of information to report. At the moment, effectively this just provides a mapping between name and DiscoveredItemId. One of a number of PRs I'll be raising for google/autocxx#124. In future PRs I'll be hoping to add further callbacks which report more information based on DiscoveredItemId, so having the DiscoveredItemId for each enum is an important pre-requisite.
ParseCallbacks previously reported structs but not enums. Enhance it to do so. At the moment, little information is provided about enums - but bindgen doesn't handle (rare) anonymous enums so this seems the right amount of information to report. At the moment, effectively this just provides a mapping between name and DiscoveredItemId. One of a number of PRs I'll be raising for google/autocxx#124. In future PRs I'll be hoping to add further callbacks which report more information based on DiscoveredItemId, so having the DiscoveredItemId for each enum is an important pre-requisite.
These files aspects of bindgen behavior which may not be generally useful to most consumers but are more important to downstream postprocessors such as autocxx. One of them tests enums embedded within classes, and the other tests various types of C++ constructor. Part of google/autocxx#124.
With a new command-line option, this ensures that char16_t is distinct from uint16_t in generated bindings. On some platforms these are distinct types, so it can be important for downstream post processors to spot the difference. See the documentation on the new command-line option for expected behavior and usage here. Part of google/autocxx#124.
This makes two complementary improvements to the ParseCallbacks. The first is that Mods are now announced, as a new type of DiscoveredItem. The second is that the parentage of each item is announced. The parent of an item is often a mod (i.e. a C++ namespace) but not necessarily - it might be a struct within a struct, or similar. The reported information here is dependent on two pre-existing bindgen options: * whether to report C++ namespaces at all * whether to report inline namespaces conservatively. For that reason, the test suite gains two new tests. Part of google/autocxx#124
This makes two complementary improvements to the ParseCallbacks. The first is that Mods are now announced, as a new type of DiscoveredItem. The second is that the parentage of each item is announced. The parent of an item is often a mod (i.e. a C++ namespace) but not necessarily - it might be a struct within a struct, or similar. The reported information here is dependent on two pre-existing bindgen options: * whether to report C++ namespaces at all * whether to report inline namespaces conservatively. For that reason, the test suite gains two new tests. Part of google/autocxx#124
This change reports extra C++ information about items: * Whether methods are virtual or pure virtual or neither * Whether a method is a "special member", e.g. a move constructor * Whether a method is defaulted or deleted * C++ visibility (for structs, enums, unions and methods) It builds on top of rust-lang#3145. This PR is not yet ready for merge, since we need to merge rust-lang#3139 and then enhance the tests to cover those cases too. Part of google/autocxx#124
This change reports extra C++ information about items: * Whether methods are virtual or pure virtual or neither * Whether a method is a "special member", e.g. a move constructor * Whether a method is defaulted or deleted * C++ visibility (for structs, enums, unions and methods) It builds on top of rust-lang#3145. A follow up PR should enhance the tests once rust-lang#3139 is merged. Part of google/autocxx#124
Downstream postprocessors such as autocxx may want to use bindgen's representation of types to generate additional C++ code. bindgen is remarkably faithful at passing through enough information to make this possible - but for some C++ types, bindgen chooses to elide some template parameters. It's not safe for additional C++ code to be generated which references that type. This adds a callback by which such tools can recognize types where template parameters have been elided like this, so that extra C++ is avoided. This information could be provided in the existing `new_item_found` callback, but it's very niche and unlikely to be used by the majority of consumers, so a new callback is added instead. Part of google/autocxx#124
Downstream postprocessors such as autocxx may want to use bindgen's representation of types to generate additional C++ code. bindgen is remarkably faithful at passing through enough information to make this possible - but for some C++ types, bindgen chooses to elide some template parameters. It's not safe for additional C++ code to be generated which references that type. This adds a callback by which such tools can recognize types where template parameters have been elided like this, so that extra C++ is avoided. This information could be provided in the existing `new_item_found` callback, but it's very niche and unlikely to be used by the majority of consumers, so a new callback is added instead. An alternative approach here is to provide a mode to bindgen where it *always* uses all template params, by adding additional PhantomData fields to structs where those params are not currently used. This is being prototyped in google/autocxx#1425 but is unlikely to be successful, on the assumption that lots of the templated types can't actually be properly represented by bindgen/Rust, so the current strategy of discarding them is more likely to work in the broad strokes. Part of google/autocxx#124
As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124
As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124
As standard, bindgen treats C++ pointers and references the same in its output. Downstream postprocessors might want to treat these things differently; for example to use a CppRef<T> type for C++ references to encode the fact that they're not (usually) null. This PR emits references wrapped in a newtype wrapper called bindgen_marker_Reference<T> This behavior is enabled by an option flag; it isn't default. This type isn't actually defined in bindgen; users are expected to define or replace this during postprocessing (e.g. by using syn to reinterpret bindgen's output, or perhaps there are ways to make this useful using --raw-line or other means to inject a transparent newtype wrapper). The same approach is taken to types which bindgen chooses to make opaque. This is done in various circumstances but the most common case is for non-type template parameters. Alternative designs considered: * Apply an attribute to the function taking or returning such a param, e.g. #[bindgen_marker_takes_reference_arg(1)] fn takes_reference(a: u32, b: *const Foo) This avoids the marker type, but the problem here is a more invasive change to bindgen because type information can no longer be contained in a syn::Type; type metadata needs to be communicated around inside bindgen. * Make a ParseCallbacks call each time a type is opaque or a reference. This would work for standalone opaque types, e.g. pub struct Bar { pub _bindgen_opaque_blob: u8 } but the main case where we need these is where bindgen is using an opaque or reference type in the signature of some function, and there's no real opportunity to describe this in any kind of callback, since the callback would have to describe where exactly in the function signature the opaque or reference type has been used (and then we run into the same problems of passing this metadata around in the innards of bindgen). In order to maintain the current simple design where any C++ type is represented as a simple syn::Type, the best approach seems to be to do this wrapping in a fake marker type. Another design decision here was to represent an RValue reference as: TypeKind::Reference(_, ReferenceKind::RValue) rather than a new variant: TypeKind::RValueReference(_) In the majority of cases references are treated the same way whether they're rvalue or lvalue, so this was less invasive, but either is fine. Part of google/autocxx#124
Provide an option to add a line to every mod generated by bindgen. Part of google/autocxx#124, though this is only in fact necessary if we make the changes given in google/autocxx#1456.
This uses an existing callback to allow overriding of constructor name. Part of google/autocxx#124, though only necessary if we also do google/autocxx#1456.
This extends the existing discovery callback mechanism to report on functions and methods. At this stage, we don't say much about them, in order to be consistent with other discovery callbacks. Subsequent PRs will add extra callbacks to provide information especially about methods (virtualness, C++ visibility, etc.) Please request changes if you think that sort of information should arrive in these callbacks. Because methods are a fundamentally C++ thing, this splits the current ParseCallbacks test to cover both a .h and a .hpp header. Part of google/autocxx#124
Downstream code generators may need to know about the existence of certain C++ functions even if those functions can't be called. This is counterintuitive but: * A type can't even be allocated if it contains pure virtual functions or if its constructor is private. * A type may not be relocatable if it contains a deleted move constructor. This PR provides command line options to reveal the existence of these functions. Subsequent PRs will announce their special status using the ParseCallbacks mechanism. Part of google/autocxx#124.
Downstream code generators may need to know about the existence of certain C++ functions even if those functions can't be called. This is counterintuitive but: * A type can't even be allocated if it contains pure virtual functions or if its constructor is private. * A type may not be relocatable if it contains a deleted move constructor. This PR provides command line options to reveal the existence of these functions. Subsequent PRs will announce their special status using the ParseCallbacks mechanism. Part of google/autocxx#124.
This extends the existing discovery callback mechanism to report on functions and methods. At this stage, we don't say much about them, in order to be consistent with other discovery callbacks. Subsequent PRs will add extra callbacks to provide information especially about methods (virtualness, C++ visibility, etc.) Please request changes if you think that sort of information should arrive in these callbacks. Because methods are a fundamentally C++ thing, this splits the current ParseCallbacks test to cover both a .h and a .hpp header. Part of google/autocxx#124
No functional change - just deduplicating the logic which calls this callback, which will make it easier to make further changes in future. Part of google/autocxx#124
This adds more information to ParseCallbacks which indicates the location in the original source code at which a given item was found. This has proven to be useful in downstream code generators in providing diagnostics to explain why a given item can't be represented in Rust. (There are lots of reasons why this might not be the case - autocxx has around 100 which can be found here - https://github.com/google/autocxx/blob/d85eac76c9b3089d0d86249e857ff0e8c36b988f/engine/src/conversion/convert_error.rs#L39 - but irrespective of the specific reasons, it's useful to be able to point to the original location when emitting diagnostics). Should we make this a new callback or include this information within the existing callback? Pros of making it a new callback: * No compatibility breakage. Pros of including it in this existing callback: * No need to specify and test a policy about whether such callbacks always happen together, or may arrive individually * Easier for recipients (including bindgen's own test suite) to keep track of the source code location received. * Because we add new items to the DiscoveryItem enum anyway, we seem to have accepted it's OK to break compatibility in this callback (for now at least). Therefore I'm adding it as a parameter to the existing callback. If it's deemed acceptable to break compatibility in this way, I will follow the same thought process for some other changes too. Part of google/autocxx#124.
This extends the existing discovery callback mechanism to report on functions and methods. At this stage, we don't say much about them, in order to be consistent with other discovery callbacks. Subsequent PRs will add extra callbacks to provide information especially about methods (virtualness, C++ visibility, etc.) Please request changes if you think that sort of information should arrive in these callbacks. Because methods are a fundamentally C++ thing, this splits the current ParseCallbacks test to cover both a .h and a .hpp header. Part of google/autocxx#124
Downstream code generators may need to know about the existence of certain C++ functions even if those functions can't be called. This is counterintuitive but: * A type can't even be allocated if it contains pure virtual functions or if its constructor is private. * A type may not be relocatable if it contains a deleted move constructor. This PR provides command line options to reveal the existence of these functions. Subsequent PRs will announce their special status using the ParseCallbacks mechanism. Part of google/autocxx#124.
In its default behavior, bindgen does not emit information for any C++ operatorXYZ function (unless operatorXYZ is a valid identifier, which it isn't for C++ overloaded operators). This PR introduces a new command line option to represent all operators. This is not useful for those who directly consume the output of bindgen, but is useful for post-processors. Specifically, consumers will need to implement the callback to rename these items to something more useful. Part of google/autocxx#124
This extends the existing discovery callback mechanism to report on functions and methods. At this stage, we don't say much about them, in order to be consistent with other discovery callbacks. Subsequent PRs will add extra callbacks to provide information especially about methods (virtualness, C++ visibility, etc.) Please request changes if you think that sort of information should arrive in these callbacks. Because methods are a fundamentally C++ thing, this splits the current ParseCallbacks test to cover both a .h and a .hpp header. Part of google/autocxx#124
bindgen doesn't pass on quite all the information we need in order to be able to generate autocxx bindings. Options in the future might be to fork bindgen, ask very nicely if they mind us upstreaming patches to add metadata, or switch away from bindgen and use llvm directly.
In any case this bug will be a live tracker of all known cases where we don't have all the information that we require about the underlying C++.
struct
s in the bindgen output. When we then generate extra C++ we have to plump for one or the other. In some ABIs this will result in a binary incompatibility or failure to compile. Per Classes result in build warnings; build failure on Windows #54.a(uint32_t)
anda(uint8_t)
will be output asa
anda1
. There is no way for us to distinguish that from methods really calleda
anda1
. Worse, this is across all the types in a namespace, soMyStruct::a
andMyOtherStruct::a
becomeMyStruct::a
andMyOtherStruct::a1
.struct
nested inside anotherstruct
is not noted in the bindgen output, so we can't generate compatible C++ code - Nested types don't work #115.The text was updated successfully, but these errors were encountered: