Tracking issue: stub generation #420

Dear all (cc @cansik @torokati44 @qnzhou @tmsrise @rzhikharevich-wmt @njroussel @Speierers),

I am interested in providing a stub generation mechanism as part of nanobind. This is a tracking issue to brainstorm solutions.

**Context**: @cansik's [nanobind-stubgen](https://github.com/cansik/nanobind-stubgen) package is the only solution at the moment and works well in many cases. My goal is to overcome the limitations raised in discussion #163:

1. Enabling a better "out-of-the-box" experience by integrating stub generation into the CMake build system.

2. Stub generation currently involves complicated parsing, which is fragile and not always well-defined. Nanobind has this information in a more structured form and could provide it.

3. Stubs serve two purposes, and stub generation should cater to both needs:
  - To get autocomplete in VS Code and similar tools, which requires extracting function signatures and docstrings. I am mainly interested in this use case.

  - Type checkers like MyPy. I haven't used them before and know very little (hence this issue to exchange experience). It seems to me that stubs only need to contain typed signatures but no docstrings. But nanobind often generates type annotations that MyPy isn't happy with, so some sort of postprocessing may be needed.

Here is what I have in mind, before having actually done anything. There may be roadblocks I haven't considered.

1. The CMake build system gets a new command `nanobind_add_stubs`. This will register a command that is run at _install time_. Basically we need the whole package to be importable, and doing that in a non-installed build might be tricky. 

```cmake
nanobind_add_stubs(
  PATH ${CMAKE_INSTALL_PREFIX}
  PACKAGE nanobind_example
  DEPENDS nanobind_example_ext
)
```
When the user installs the extension to `${CMAKE_INSTALL_PREFIX}`, this will run a Python file (shipped as part of the nanobind distribution) that imports the package and then generates `nanobind_example/__init__.pyi`.
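
As a rough illustration of what such an install-time script might look like, here is a minimal sketch. It is not part of nanobind: the function name `generate_stub` is hypothetical, it assumes the target is a regular package with an `__init__.py`, and it uses `inspect.signature` as a stand-in for whatever structured information nanobind would eventually expose.

```python
import importlib
import inspect
from pathlib import Path


def generate_stub(package_name: str) -> Path:
    """Import a package and write a skeleton __init__.pyi beside its __init__.py.

    Hypothetical helper: assumes a regular (non-namespace) package.
    """
    module = importlib.import_module(package_name)
    lines = []
    for name, obj in sorted(vars(module).items()):
        # Only public functions are handled in this sketch; classes,
        # submodules and constants would need their own treatment.
        if name.startswith("_") or not inspect.isroutine(obj):
            continue
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Some built-in callables expose no introspectable signature.
            signature = "(*args, **kwargs)"
        lines.append(f"def {name}{signature}: ...")
    stub_path = Path(module.__file__).with_name("__init__.pyi")
    stub_path.write_text("\n".join(lines) + "\n")
    return stub_path
```

Calling `generate_stub("nanobind_example")` after installation would then place the stub next to the installed package, which is why the package has to be importable at that point.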

Here, I am already getting confused because of unfamiliarity with stub generation. I've seen that packages sometimes contain multiple `.pyi` files. How does one decide where to put what? Can `.pyi` files import each other? What would be the best way to expose this in the `nanobind_add_stubs()` function?

2. I also wanted to modify nanobind's function class (`nb_func`) so that it exposes information in a more structured way, a bit like `__signature__` from `inspect.signature`. But `__signature__` is too limited because it (like Python) has no concept of overload chains.

Therefore, I am thinking of adding a function `__nb_signature__` that returns a list of pairs of strings `[("signature", "docstring"), ...]` that the stub generator can turn into something like this:

```python
from typing import overload

@overload
def func(x: int):
    """docstring 1"""

@overload
def func(x: str):
    """docstring 2"""
```
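
The conversion from such a list of pairs to stub text could be sketched as follows. This is only an illustration under assumptions: the exact format of the signature string is not settled here, so the sketch assumes each one is a complete `def` header without the trailing colon (e.g. `"def func(x: int)"`), and `render_overloads` is a hypothetical helper name.

```python
from typing import List, Tuple


def render_overloads(signatures: List[Tuple[str, str]]) -> str:
    """Render (signature, docstring) pairs as stub source text.

    Assumes each signature is a full `def` header without the trailing colon.
    """
    lines = []
    is_overloaded = len(signatures) > 1
    if is_overloaded:
        lines += ["from typing import overload", ""]
    for signature, docstring in signatures:
        if is_overloaded:
            lines.append("@overload")
        lines.append(f"{signature}:")
        if docstring:
            # repr() yields a valid string literal for any docstring,
            # including ones with quotes, backslashes or newlines.
            lines.append(f"    {docstring!r}")
        else:
            lines.append("    ...")
        lines.append("")
    return "\n".join(lines)
```

A function with a single entry is emitted without the `@overload` decorator, since type checkers expect at least two overloads when the decorator is used.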

3. Some type signatures in nanobind aren't parseable in Python. There are a few things that I think could go wrong:

  - What if a C++ type hasn't been mapped yet when the extension is imported? The nanobind docstring then includes the raw type (something like `std::__1::vector<Item *>`). The stubs could omit that overload entirely, substitute a generic placeholder (`object`?), or put the type name into a string. Thoughts?
  - The representation of default arguments (via `__repr__`) might not make sense as a Python expression. This seems like an unsolvable problem because nanobind simply does not know the Python expression to re-create an object. One option would be to try to `eval()` the expression in the stub generator and omit it or replace it by some kind of placeholder if an issue is found. Not sure -- thoughts?
  - Some nanobind type features don't have equivalents in `typing.*`. One example is the nd-array type annotations, which are AFAIK too complex to be handled by anything that currently exists. I'm thinking that it could be useful if the stub generator command `nanobind_add_stubs(..)` could be called with a user-provided Python file that implements some kind of post-processing of the type signatures.
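
To make the fallback options above concrete, here is a minimal sketch of how a stub generator might handle them. Everything in it is an assumption for illustration: `format_parameter` and `parses_as_expression` are hypothetical names, the check uses `ast.parse` rather than `eval()` (so it only verifies syntax, not whether names resolve), and `object` and `...` are just one possible choice of placeholders.

```python
import ast
from typing import Callable, Optional


def parses_as_expression(text: str) -> bool:
    """Return True if `text` is syntactically a single Python expression."""
    try:
        ast.parse(text, mode="eval")
    except (SyntaxError, ValueError):
        return False
    return True


def format_parameter(
    name: str,
    annotation: str,
    default_repr: Optional[str] = None,
    postprocess: Optional[Callable[[str], str]] = None,
) -> str:
    """Format one stub parameter, replacing unparseable pieces by placeholders."""
    # A user-provided hook runs first so that it can rewrite annotations
    # that have no direct equivalent in `typing`.
    if postprocess is not None:
        annotation = postprocess(annotation)
    # Raw C++ names such as "std::__1::vector<Item *>" are not valid Python.
    if not parses_as_expression(annotation):
        annotation = "object"
    if default_repr is None:
        return f"{name}: {annotation}"
    # A repr like "<Foo object at 0x10>" cannot be re-created; elide it.
    if not parses_as_expression(default_repr):
        default_repr = "..."
    return f"{name}: {annotation} = {default_repr}"
```

With this sketch, `format_parameter("items", "std::__1::vector<Item *>")` falls back to `object`, and an unparseable default is replaced by `...`, which is the usual way stubs elide default values. A `repr` that happens to parse but refers to an unknown name would still slip through, which is where an `eval()`-based check would be stricter.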
  
I'm curious about your thoughts on this! Thanks!
