Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unfork bindgen #124

Open
8 tasks
adetaylor opened this issue Nov 24, 2020 · 8 comments
Open
8 tasks

Unfork bindgen #124

adetaylor opened this issue Nov 24, 2020 · 8 comments
Labels
architectural Architectural improvement

Comments

@adetaylor
Copy link
Collaborator

adetaylor commented Nov 24, 2020

bindgen doesn't pass on quite all the information we need in order to be able to generate autocxx bindings. Options in the future might be to fork bindgen, ask very nicely if they mind us upstreaming patches to add metadata, or switch away from bindgen and use llvm directly.

In any case this bug will be a live tracker of all known cases where we don't have all the information that we require about the underlying C++.

@martinboehme
Copy link
Collaborator

  • Classes vs structs. Both present as structs in the bindgen output. When we then generate extra C++ we have to plump for one or the other. In some ABIs this will result in a binary incompatibility or failure to compile.

Not sure I understand -- I always thought a struct was exactly identical to a class, just that it has public visibility by default? And I believe it's legal for forward declarations of a struct to use class and vice versa (see this StackOverflow answer). I'm pretty sure there isn't an ABI difference either. Or am I misunderstanding what you're referring to here?

@adetaylor
Copy link
Collaborator Author

This is #54 - let's discuss over there.

@martinboehme
Copy link
Collaborator

Thanks for the clarification -- interesting!

@adetaylor
Copy link
Collaborator Author

For #414 and #102 many types have 'annotations' attached (the RustTy type in our fork of bindgen). Such annotations are currently discarded using an ignore_annotations method. Each time that happens, it's probably a bug. For example this means we would try (and fail) to generate POD struct fields using types with template parameters which bindgen has dropped. Once a plan presents itself here, we should ensure we never drop information like this.

@martinboehme
Copy link
Collaborator

Note from #426: The Rust function signature for copy constructors and move constructors is the same, so bindgen needs to add an annotation that allows us to distinguish between them.

@bsilver8192
Copy link
Contributor

#799 is also on this list, and I'm pretty sure #815 is too.

@adetaylor
Copy link
Collaborator Author

I've renamed this issue to track the hopes of unforking autocxx-bindgen and using plain old bindgen here.

I intend to work on this somewhat over the next couple of weeks to see how far I can get. Even if I can reduce the scale of the fork, that's still beneficial. Right now, the need to roll bindgen is the major ongoing maintenance headache for autocxx, and we need to zap that.

Ways forward:

  • bindgen's ParseCallbacks now receive quite a bit more information than they used to. Specifically, new_item_found probably enables us to eliminate a few of the bindgen_original_name annotations already, and I hope bindgen will accept PRs to expand the callbacks here — it seems that this callbacks mechanism is exactly intended for Fancy Postprocessing Use-Cases like ours.
  • A fair amount of the forking is attempting to identify unused template parameters. Perhaps bindgen could add a mode where it, err, doesn't — it just ensures all template parameters are used, if necessary by adding in extra PhantomData. I don't know if this is actually possible but this would greatly increase the percentage of C++ types which can be handled by autocxx. I'm trying this in Attempts to include all type params #1425.
  • I had a chat with one of the bindgen maintainers, and it resulted in this issue - they were open in principle to the idea of "pluggable output backends". Maybe that's not necessary given the callback mechanism.

adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 14, 2025
This is a major change to autocxx-bindgen which aims to improve maintainability
by reducing divergence from upstream bindgen. Specifically, it:

* Significantly reduces the textual diffence
* Makes the remaining changes less invasive, such that merge conflicts should
  be easier to resolve
* Redoes the changes such that they're more attuned to the current
  evolution of upstream bindgen, so it may be possible to upstream some or
  ideally all of these changes. (The ultimate goal is to unfork bindgen!)

See google/autocxx#124 and
rust-lang#2943 for the background here.

Specifically this change:
* Removes the #[bindgen_semantic_attributes] which were added to all
  sorts of items. Instead,
* Much more is communicated via the existing ParseCallbacks mechansim.
* In some cases, it's still necessary to annotate individual types -
  in this case we generate a newtype wrapper instead of attributes.

This commit also re-enables the bindgen test suite. It does not yet add
tests for all the above new functionality; that's yet to come.
adetaylor added a commit that referenced this issue Feb 14, 2025
autocxx-bindgen has just been changed substantially to have less dramatic
forking from bindgen.  This PR makes the corresponding changes in autocxx. It's
not quite perfect yet (e.g. some O(n^2) behavior) but should be good enough to
run tests.

Part of #124.
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 16, 2025
ParseCallbacks previously reported structs but not enums. Enhance it to do so.
At the moment, little information is provided about enums - but bindgen doesn't
handle (rare) anonymous enums so this seems the right amount of information to
report. At the moment, effectively this just provides a mapping between name
and DiscoveredItemId.

One of a number of PRs I'll be raising for google/autocxx#124.

In future PRs I'll be hoping to add further callbacks which report more
information based on DiscoveredItemId, so having the DiscoveredItemId
for each enum is an important pre-requisite.
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 16, 2025
ParseCallbacks previously reported structs but not enums. Enhance it to do so.
At the moment, little information is provided about enums - but bindgen doesn't
handle (rare) anonymous enums so this seems the right amount of information to
report. At the moment, effectively this just provides a mapping between name
and DiscoveredItemId.

One of a number of PRs I'll be raising for google/autocxx#124.

In future PRs I'll be hoping to add further callbacks which report more
information based on DiscoveredItemId, so having the DiscoveredItemId
for each enum is an important pre-requisite.
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 16, 2025
These files aspects of bindgen behavior which may not be generally useful to
most consumers but are more important to downstream postprocessors such as
autocxx.

One of them tests enums embedded within classes, and the other tests various
types of C++ constructor.

Part of google/autocxx#124.
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 16, 2025
With a new command-line option, this ensures that char16_t
is distinct from uint16_t in generated bindings. On some platforms
these are distinct types, so it can be important for downstream
post processors to spot the difference.

See the documentation on the new command-line option for
expected behavior and usage here.

Part of google/autocxx#124.
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 21, 2025
This makes two complementary improvements to the ParseCallbacks.
The first is that Mods are now announced, as a new type of
DiscoveredItem. The second is that the parentage of each item is
announced. The parent of an item is often a mod (i.e. a
C++ namespace) but not necessarily - it might be a struct within
a struct, or similar.

The reported information here is dependent on two pre-existing
bindgen options:
* whether to report C++ namespaces at all
* whether to report inline namespaces conservatively.

For that reason, the test suite gains two new tests.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 21, 2025
This makes two complementary improvements to the ParseCallbacks.
The first is that Mods are now announced, as a new type of
DiscoveredItem. The second is that the parentage of each item is
announced. The parent of an item is often a mod (i.e. a
C++ namespace) but not necessarily - it might be a struct within
a struct, or similar.

The reported information here is dependent on two pre-existing
bindgen options:
* whether to report C++ namespaces at all
* whether to report inline namespaces conservatively.

For that reason, the test suite gains two new tests.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 21, 2025
This change reports extra C++ information about items:

* Whether methods are virtual or pure virtual or neither
* Whether a method is a "special member", e.g. a move constructor
* Whether a method is defaulted or deleted
* C++ visibility (for structs, enums, unions and methods)

It builds on top of rust-lang#3145.

This PR is not yet ready for merge, since we need to merge rust-lang#3139
and then enhance the tests to cover those cases too.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 21, 2025
This change reports extra C++ information about items:

* Whether methods are virtual or pure virtual or neither
* Whether a method is a "special member", e.g. a move constructor
* Whether a method is defaulted or deleted
* C++ visibility (for structs, enums, unions and methods)

It builds on top of rust-lang#3145.

A follow up PR should enhance the tests once rust-lang#3139 is merged.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 22, 2025
Downstream postprocessors such as autocxx may want to use bindgen's
representation of types to generate additional C++ code. bindgen is remarkably
faithful at passing through enough information to make this possible - but for
some C++ types, bindgen chooses to elide some template parameters. It's not
safe for additional C++ code to be generated which references that type.

This adds a callback by which such tools can recognize types where template
parameters have been elided like this, so that extra C++ is avoided.

This information could be provided in the existing
`new_item_found` callback, but it's very niche and unlikely to be used by
the majority of consumers, so a new callback is added instead.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 22, 2025
Downstream postprocessors such as autocxx may want to use bindgen's
representation of types to generate additional C++ code. bindgen is remarkably
faithful at passing through enough information to make this possible - but for
some C++ types, bindgen chooses to elide some template parameters. It's not
safe for additional C++ code to be generated which references that type.

This adds a callback by which such tools can recognize types where template
parameters have been elided like this, so that extra C++ is avoided.

This information could be provided in the existing
`new_item_found` callback, but it's very niche and unlikely to be used by
the majority of consumers, so a new callback is added instead.

An alternative approach here is to provide a mode to bindgen where it
*always* uses all template params, by adding additional PhantomData
fields to structs where those params are not currently used.
This is being prototyped in google/autocxx#1425
but is unlikely to be successful, on the assumption that lots of the
templated types can't actually be properly represented by bindgen/Rust,
so the current strategy of discarding them is more likely to work
in the broad strokes.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 24, 2025
As standard, bindgen treats C++ pointers and references the same in its
output. Downstream postprocessors might want to treat these things
differently; for example to use a CppRef<T> type for C++ references
to encode the fact that they're not (usually) null.

This PR emits references wrapped in a newtype wrapper called
  bindgen_marker_Reference<T>
This behavior is enabled by an option flag; it isn't default.
This type isn't actually defined in bindgen; users are expected
to define or replace this during postprocessing (e.g. by using
syn to reinterpret bindgen's output, or perhaps there are ways
to make this useful using --raw-line or other means to inject
a transparent newtype wrapper).

The same approach is taken to types which bindgen chooses to make
opaque. This is done in various circumstances but the most common
case is for non-type template parameters.

Alternative designs considered:
* Apply an attribute to the function taking or returning
  such a param, e.g.
    #[bindgen_marker_takes_reference_arg(1)]
    fn takes_reference(a: u32, b: *const Foo)
  This avoids the marker type, but the problem here is a
  more invasive change to bindgen because type information
  can no longer be contained in a syn::Type; type metadata
  needs to be communicated around inside bindgen.

* Make a ParseCallbacks call each time a type is opaque
  or a reference. This would work for standalone opaque types,
  e.g.
    pub struct Bar {
     pub _bindgen_opaque_blob: u8
    }
  but the main case where we need these is where bindgen is
  using an opaque or reference type in the signature of some
  function, and there's no real opportunity to describe this
  in any kind of callback, since the callback would have to
  describe where exactly in the function signature the
  opaque or reference type has been used (and then we run into
  the same problems of passing this metadata around in the
  innards of bindgen).

In order to maintain the current simple design where any C++
type is represented as a simple syn::Type, the best approach
seems to be to do this wrapping in a fake marker type.

Another design decision here was to represent an RValue
reference as:
  TypeKind::Reference(_, ReferenceKind::RValue)
rather than a new variant:
  TypeKind::RValueReference(_)
In the majority of cases references are treated the same way
whether they're rvalue or lvalue, so this was less invasive,
but either is fine.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 24, 2025
As standard, bindgen treats C++ pointers and references the same in its
output. Downstream postprocessors might want to treat these things
differently; for example to use a CppRef<T> type for C++ references
to encode the fact that they're not (usually) null.

This PR emits references wrapped in a newtype wrapper called
  bindgen_marker_Reference<T>
This behavior is enabled by an option flag; it isn't default.
This type isn't actually defined in bindgen; users are expected
to define or replace this during postprocessing (e.g. by using
syn to reinterpret bindgen's output, or perhaps there are ways
to make this useful using --raw-line or other means to inject
a transparent newtype wrapper).

The same approach is taken to types which bindgen chooses to make
opaque. This is done in various circumstances but the most common
case is for non-type template parameters.

Alternative designs considered:
* Apply an attribute to the function taking or returning
  such a param, e.g.
    #[bindgen_marker_takes_reference_arg(1)]
    fn takes_reference(a: u32, b: *const Foo)
  This avoids the marker type, but the problem here is a
  more invasive change to bindgen because type information
  can no longer be contained in a syn::Type; type metadata
  needs to be communicated around inside bindgen.

* Make a ParseCallbacks call each time a type is opaque
  or a reference. This would work for standalone opaque types,
  e.g.
    pub struct Bar {
     pub _bindgen_opaque_blob: u8
    }
  but the main case where we need these is where bindgen is
  using an opaque or reference type in the signature of some
  function, and there's no real opportunity to describe this
  in any kind of callback, since the callback would have to
  describe where exactly in the function signature the
  opaque or reference type has been used (and then we run into
  the same problems of passing this metadata around in the
  innards of bindgen).

In order to maintain the current simple design where any C++
type is represented as a simple syn::Type, the best approach
seems to be to do this wrapping in a fake marker type.

Another design decision here was to represent an RValue
reference as:
  TypeKind::Reference(_, ReferenceKind::RValue)
rather than a new variant:
  TypeKind::RValueReference(_)
In the majority of cases references are treated the same way
whether they're rvalue or lvalue, so this was less invasive,
but either is fine.

Part of google/autocxx#124
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 24, 2025
As standard, bindgen treats C++ pointers and references the same in its
output. Downstream postprocessors might want to treat these things
differently; for example to use a CppRef<T> type for C++ references
to encode the fact that they're not (usually) null.

This PR emits references wrapped in a newtype wrapper called
  bindgen_marker_Reference<T>
This behavior is enabled by an option flag; it isn't default.
This type isn't actually defined in bindgen; users are expected
to define or replace this during postprocessing (e.g. by using
syn to reinterpret bindgen's output, or perhaps there are ways
to make this useful using --raw-line or other means to inject
a transparent newtype wrapper).

The same approach is taken to types which bindgen chooses to make
opaque. This is done in various circumstances but the most common
case is for non-type template parameters.

Alternative designs considered:
* Apply an attribute to the function taking or returning
  such a param, e.g.
    #[bindgen_marker_takes_reference_arg(1)]
    fn takes_reference(a: u32, b: *const Foo)
  This avoids the marker type, but the problem here is a
  more invasive change to bindgen because type information
  can no longer be contained in a syn::Type; type metadata
  needs to be communicated around inside bindgen.

* Make a ParseCallbacks call each time a type is opaque
  or a reference. This would work for standalone opaque types,
  e.g.
    pub struct Bar {
     pub _bindgen_opaque_blob: u8
    }
  but the main case where we need these is where bindgen is
  using an opaque or reference type in the signature of some
  function, and there's no real opportunity to describe this
  in any kind of callback, since the callback would have to
  describe where exactly in the function signature the
  opaque or reference type has been used (and then we run into
  the same problems of passing this metadata around in the
  innards of bindgen).

In order to maintain the current simple design where any C++
type is represented as a simple syn::Type, the best approach
seems to be to do this wrapping in a fake marker type.

Another design decision here was to represent an RValue
reference as:
  TypeKind::Reference(_, ReferenceKind::RValue)
rather than a new variant:
  TypeKind::RValueReference(_)
In the majority of cases references are treated the same way
whether they're rvalue or lvalue, so this was less invasive,
but either is fine.

Part of google/autocxx#124
@adetaylor adetaylor added the architectural Architectural improvement label Feb 24, 2025
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 26, 2025
Provide an option to add a line to every mod generated by bindgen.

Part of google/autocxx#124, though
this is only in fact necessary if we make the changes given in
google/autocxx#1456.
adetaylor added a commit to adetaylor/rust-bindgen that referenced this issue Feb 26, 2025
This uses an existing callback to allow overriding of constructor name.

Part of google/autocxx#124, though only
necessary if we also do google/autocxx#1456.
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
This extends the existing discovery callback mechanism to report on functions
and methods. At this stage, we don't say much about them, in order to be
consistent with other discovery callbacks. Subsequent PRs will add
extra callbacks to provide information especially about methods
(virtualness, C++ visibility, etc.) Please request changes if you think
that sort of information should arrive in these callbacks.

Because methods are a fundamentally C++ thing, this splits the
current ParseCallbacks test to cover both a .h and a .hpp header.

Part of google/autocxx#124
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
Downstream code generators may need to know about the existence of certain C++
functions even if those functions can't be called. This is counterintuitive but:

* A type can't even be allocated if it contains pure virtual functions
  or if its constructor is private.
* A type may not be relocatable if it contains a deleted move constructor.

This PR provides command line options to reveal the existence of these
functions. Subsequent PRs will announce their special status using the
ParseCallbacks mechanism.

Part of google/autocxx#124.
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
Downstream code generators may need to know about the existence of certain C++
functions even if those functions can't be called. This is counterintuitive but:

* A type can't even be allocated if it contains pure virtual functions
  or if its constructor is private.
* A type may not be relocatable if it contains a deleted move constructor.

This PR provides command line options to reveal the existence of these
functions. Subsequent PRs will announce their special status using the
ParseCallbacks mechanism.

Part of google/autocxx#124.
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
This extends the existing discovery callback mechanism to report on functions
and methods. At this stage, we don't say much about them, in order to be
consistent with other discovery callbacks. Subsequent PRs will add
extra callbacks to provide information especially about methods
(virtualness, C++ visibility, etc.) Please request changes if you think
that sort of information should arrive in these callbacks.

Because methods are a fundamentally C++ thing, this splits the
current ParseCallbacks test to cover both a .h and a .hpp header.

Part of google/autocxx#124
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
No functional change - just deduplicating the logic which calls this callback,
which will make it easier to make further changes in future.

Part of google/autocxx#124
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
This adds more information to ParseCallbacks which indicates the location in
the original source code at which a given item was found.

This has proven to be useful in downstream code generators in providing
diagnostics to explain why a given item can't be represented in Rust. (There
are lots of reasons why this might not be the case - autocxx has around 100
which can be found here -
https://github.com/google/autocxx/blob/d85eac76c9b3089d0d86249e857ff0e8c36b988f/engine/src/conversion/convert_error.rs#L39
- but irrespective of the specific reasons, it's useful to be able to point to
the original location when emitting diagnostics).

Should we make this a new callback or include this information within
the existing callback?

Pros of making it a new callback:
* No compatibility breakage.

Pros of including it in this existing callback:
* No need to specify and test a policy about whether such callbacks
  always happen together, or may arrive individually
* Easier for recipients (including bindgen's own test suite) to
  keep track of the source code location received.
* Because we add new items to the DiscoveryItem enum anyway,
  we seem to have accepted it's OK to break compatibility in this
  callback (for now at least).

Therefore I'm adding it as a parameter to the existing callback. If it's
deemed acceptable to break compatibility in this way, I will follow the
same thought process for some other changes too.

Part of google/autocxx#124.
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
This extends the existing discovery callback mechanism to report on functions
and methods. At this stage, we don't say much about them, in order to be
consistent with other discovery callbacks. Subsequent PRs will add
extra callbacks to provide information especially about methods
(virtualness, C++ visibility, etc.) Please request changes if you think
that sort of information should arrive in these callbacks.

Because methods are a fundamentally C++ thing, this splits the
current ParseCallbacks test to cover both a .h and a .hpp header.

Part of google/autocxx#124
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
Downstream code generators may need to know about the existence of certain C++
functions even if those functions can't be called. This is counterintuitive but:

* A type can't even be allocated if it contains pure virtual functions
  or if its constructor is private.
* A type may not be relocatable if it contains a deleted move constructor.

This PR provides command line options to reveal the existence of these
functions. Subsequent PRs will announce their special status using the
ParseCallbacks mechanism.

Part of google/autocxx#124.
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
In its default behavior, bindgen does not emit information for any
C++ operatorXYZ function (unless operatorXYZ is a valid identifier,
which it isn't for C++ overloaded operators).

This PR introduces a new command line option to represent all operators.
This is not useful for those who directly consume the output of bindgen,
but is useful for post-processors. Specifically, consumers will need to
implement the callback to rename these items to something more useful.

Part of google/autocxx#124
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this issue Apr 4, 2025
This extends the existing discovery callback mechanism to report on functions
and methods. At this stage, we don't say much about them, in order to be
consistent with other discovery callbacks. Subsequent PRs will add
extra callbacks to provide information especially about methods
(virtualness, C++ visibility, etc.) Please request changes if you think
that sort of information should arrive in these callbacks.

Because methods are a fundamentally C++ thing, this splits the
current ParseCallbacks test to cover both a .h and a .hpp header.

Part of google/autocxx#124
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
architectural Architectural improvement
Projects
None yet
Development

No branches or pull requests

3 participants