Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate external bindings during translation #480

Open
TravisCardwell opened this issue Mar 5, 2025 · 11 comments
Open

Generate external bindings during translation #480

TravisCardwell opened this issue Mar 5, 2025 · 11 comments

Comments

@TravisCardwell
Copy link
Collaborator

We should support generation of external bindings configuration for a module during translation. This is only relevant to the preprocessor mode. We can add an option to the CLI preprocessor command to enable this as well as specify the output file path.

@TravisCardwell
Copy link
Collaborator Author

External bindings configuration should be generated from the Hs declarations. We currently do not expose this stage of the pipeline, however. We can generate them from the C.Header and (re-) translate internally, as is currently done with test generation. Alternatively, we could add a new opaque HsDecls wrapper and expose the stage.

The current pipeline can be seen in #498 (comment).

If we add the HsDecls wrapper, we will have the following:

graph TD

  Clang-- parseCHeader -->C
  C-- genHsDecls -->Hs
  Hs-- genSHsDecls --> SHs

  subgraph Preprocessor
    PPIO["`source
    (IO)`"]
    PPPure["`source
    (pure)`"]
    HsModule-- genPP -->PPIO
    HsModule-- genPPString -->PPPure
  end

  SHs-- genModule -->HsModule

  subgraph Template Haskell
    DecsQ
    Exts[Set TH.Extension]
  end

  SHs-- genTH --> DecsQ
  SHs-- genExtensions --> Exts

  Dump["`dump
  (IO)`"]
  C-- dumpCHeader -->Dump

  Hs-- genExtBindings -->ExtBindings

  Tests["`tests
  (IO)`"]
  Hs-- genTests -->Tests
Loading

Any opinions? My vote is to go ahead and add HsDecls.

@TravisCardwell
Copy link
Collaborator Author

TravisCardwell commented Mar 12, 2025

Our external bindings design includes the Haskell package name. It is not actually used anywhere, though we may use it in the future, and it is probably useful information for users. We do not yet have this information (when using the preprocessor command via the CLI), however.

The current plan is to add a new --gen-external-bindings option that takes the path to write the external bindings configuration to. How should we specify the package name? We could use a separate --package option. It would not be used unless --gen-external-bindings is specified, in which case it would be required. This is a bit annoying, but it would work. Are there any better ideas?

Example usage:

$ hs-bindgen preprocess \
    --external-bindings .../path/to/base.yaml \
    --external-bindings .../path/to/hs-bindgen-runtime.yaml \
    --include-path cbits \
    --input foo.h \
    --package acme \
    --module Acme.Foo \
    --output src/Acme/Foo.hs \
    --gen-external-bindings bindings/acme.yaml

That example demonstrates how it would work now. Improvements include:

After we implement support for getting information from a .cabal file in the future, we would know the package name from the .cabal file. Usage could become:

$ hs-bindgen preprocess \
    --cabal library \
    --input foo.h \
    --module Acme.Foo \
    --gen-external-bindings bindings/acme.yaml

Notes:

  • The .cabal file located in the current working directory is used.
  • The library component is specified by the user.
  • The external bindings are either specified in the .cabal file somehow or are enabled by default.
  • The project include path is specified in the .cabal file.
  • The package name is specified in the .cabal file.
  • The output file is determined using the module name and the source directory specified for the library in the .cabal file.

@TravisCardwell
Copy link
Collaborator Author

Consider the case of creating bindings with multiple modules for C headers foo.h and bar.h where foo.h imports bar.h.

First translate Acme.Bar:

$ hs-bindgen preprocess \
    --external-bindings .../base.yaml \
    --external-bindings .../hs-bindgen-runtime.yaml \
    ...
    --input bar.h
    ...
    --gen-external-bindings bindings/acme-bar.yaml

Then translate Acme.Foo:

$ hs-bindgen preprocess \
    --external-bindings .../base.yaml \
    --external-bindings .../hs-bindgen-runtime.yaml \
    --external-bindings bindings/acme-bar.yaml \
    ...
    --input foo.h
    ...
    --gen-external-bindings bindings/acme-foo.yaml

Those commands create separate external bindings configurations for each module, which need to be merged to create an external bindings configuration for the package. Note that this is already implemented in function mergeExtBindings. (ExtBindings does not have a semigroup instance because errors occur on conflict.) We can add this functionality to the Lib API as well as create a new command in the CLI. In the future, we can provide some automation for dealing with multiple modules.

Related discussion: #75 (comment)

@TravisCardwell
Copy link
Collaborator Author

Any opinions? My vote is to go ahead and add HsDecls.

@edsko and I discussed this yesterday. We decided to go ahead and move development commands to a separate CLI, as @phadej suggested. Main CLI hs-bindgen-cli will only use the public API, while development CLI hs-bindgen-dev will be able to use the internal API.

With that change, CHeader is no longer needed. I removed it and implemented opaque type HsDecls, wrapping [Hs.Decl]. Test generation (genTests) is changed to use this type direction, which is better than the previous implementation. The new external bindings generation (genExtBindings) API will also use this type.

FWIW, I like the new public API much better!

@TravisCardwell
Copy link
Collaborator Author

TravisCardwell commented Mar 13, 2025

Test simple_structs.h:

$ cabal run hs-bindgen -- preprocess \
    --include-path hs-bindgen/examples \
    --input simple_structs.h \
    --package acme \
    --module Acme.SimpleStructs \
    --output SimpleStruct.hs \
    --gen-external-bindings simple_structs.yaml

There is an issue. Type S5 is a C struct with matching tags:

typedef struct S5 {
    char a;
    int b;
} S5;

In this case, we translate to a single Haskell type S5. The current implementation generates the following external binding configuration:

- cname: struct S5
  headers:
  - simple_structs.h
  identifier: S5
  module: Acme.SimpleStructs
  package: acme

This is correct, but we also need to generate the following:

- cname: S5
  headers:
  - simple_structs.h
  identifier: S5
  module: Acme.SimpleStructs
  package: acme

Currently, I do not think we represent this case in the C AST. We just register a type alias in the DeclState. To resolve this issue, we need to also represent it in the AST. Actually, it looks like the information is included in the DeclPath... Experimenting...

@TravisCardwell
Copy link
Collaborator Author

Nope, the information is not included in the DeclPath. This is easily confirmed in the simple_structs.hs fixture:

structDeclPath = DeclPathStruct (DeclNameTag (CName "S5")) DeclPathTop

My initial understanding was correct.

@TravisCardwell
Copy link
Collaborator Author

In the processing of typedefs, the DeclPath uses DeclPathStruct without checking the kind of underlying type, here. Perhaps we should use DeclPathUnion if the underlying type is a union. We should also support typedefs of enum types.

We need to be able to distinguish the case when a struct, union, or enum tag is the same as a typedef alias for the type. One way to do this is with DeclPath nesting like the following:

DeclPathStruct
  (DeclNameTag "foo")
  (DeclPathStruct (DeclNameTypedef "foo") DeclPathTop)

The two uses of DeclPathStruct is questionable. They should always match, but pattern matching must handle all cases.

An alternative way to do this is to add a constructor:

-- | Structure/union/enum tag and typedef name are the same
DeclNameBoth CName

Thoughts?

@TravisCardwell
Copy link
Collaborator Author

I am using trace debugging to try to understand the exact order of processing as we fold the Clang AST. I am currently considering the following declaration, in simple_structs.h:

// struct with typedef
typedef struct S2 {
    char a;
    int b;
    float c;
} S2_t;

As noted in a comment, a typedef struct is processed as a struct followed by a typedef. Indeed, trace debugging shows that the CXType_Record CXCursor_StructDecl is processed before the CXType_Typedef. Type struct S2 is therefore already registered as a TypeDecl in the DeclState when typedef S2_t is processed.

Consider this line in the code that processes the typedef:

use <-
  processTypeDeclRec
    (DeclPathStruct (DeclNameTypedef tag) DeclPathTop)
    extBindings
    unit
    Nothing
    ty'

ty' is the underlying type: struct S2. In processTypeDeclRec, the OMap.lookup returns Nothing, passing the DeclPath to another processTypeDecl' call. That underlying type turns out to be an elaborated type. The underlying type of that elaborated type is the actual struct S2. In the next call to processTypeDeclRec, OMap.lookup returns a Just TypeDecl{}. The previously processed type is returned, and the DeclPath is not used. Logs indicate that the DeclPath argument to this processTypeDeclRec call is only used when there are pointers.

The DeclPath stored in the C AST for S2 does not include a DeclPathTypedef because of this behavior. A DeclNameTypedef constructor just indicates that a struct/union is anonymous but has a typedef name, not the general existence of a typedef around the declaration of a struct/union

Since a type may have any number of typedef aliases, I do not think we can resolve the issue using (just) DeclPath.

One way to resolve the issue is to add aliases to the C AST. Note that aliases should be separate from the canonical DeclPath. While folding, we would need to update the AST node when adding an alias. We do not do any updates yet, but it looks possible using unionWithL.

Alternatively, we could change parseCHeader to return aliases (separate from the declarations), retrieved from the final DeclState. We would need to add some information to the TypeAlias constructor. We would then need to query aliases when generating external bindings. This is a bit convoluted, but the option is available if adding aliases to the C AST proves problematic.

@phadej
Copy link
Collaborator

phadej commented Mar 14, 2025

@TravisCardwell you overcomplicate the matter.

AFAIK (if it's not please tell),

typedef struct S2 {
    char a;
    int b;
    float c;
} S2_t;

typedef struct S2  S2_t_b;

and

struct S2 {
    char a;
    int b;
    float c;
} S2_t;

typedef struct S2 S2_t;
typedef struct S2 S2_t_b;

generate the same HsBindgen.C.AST structure (sans locations).

In particular, I made sure that

typedef struct s { .. } t;

and

struct s { .. };
typedef struct s t;

work the same.

This was in length discussed in #306

@TravisCardwell
Copy link
Collaborator Author

Yes, the generated Haskell looks good. I am working on generating the external bindings.

For this example, we can already generate external bindings without issue because each typedef has a different name and therefore translates to different Haskell newtype types.

I am running into trouble when the typedef name matches the tag, such as in this example:

typedef struct S5 {
    char a;
    int b;
} S5;

In this case, we only translate to a single Haskell type. We need to generate bindings for both struct S5 and S5, referencing the same Haskell type. The C.Struct does not reflect the fact that it has an S5 alias; the DeclPath (only) represents the struct S5 spelling.

My idea for resolving this is to add alias information to the C AST. If we assume that the case where the typedef name is equal to the tag is the only time we will create such aliases, then Maybe DeclPath would suffice. Perhaps that is fine... Even if we allow users to optionally translate typedefs to types (instead of newtypes), we would translate to a separate Hs.Decl with an origin that contains the DeclPath for the typedef.

Are there any other (perhaps not-yet-implemented) cases where there are more than one C name spelling that map to the same translated Haskell type?

@phadej
Copy link
Collaborator

phadej commented Mar 14, 2025

typedef struct S5 {
    char a;
    int b;
} S5;

is the same as

typedef struct {
    char a;
    int b;
} S5;

We don't differentiate these (partly because libclang really doesn't make it easy, partly because e.g. in C++ these are indeed the same).

That's also discussed in #306

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants