
Tracking issue: stub generation #420

@wjakob

Dear all (cc @cansik @torokati44 @qnzhou @tmsrise @rzhikharevich-wmt @njroussel @Speierers),

I am interested in providing a stub generation mechanism as part of nanobind. This is a tracking issue to brainstorm solutions.

Context: @cansik's nanobind-stubgen package is the only solution at the moment and works well in many cases. My goal is to overcome limitations discussed in discussion #163:

  1. Enabling a better "out-of-the-box" experience by integrating stub generation into the CMake build system.

  2. Stub generation currently involves complicated parsing, which is fragile and not always well-defined. Nanobind has this information in a more structured form and could provide it.

  3. Stubs serve two purposes, and stub generation should cater to both needs:

  • To get autocomplete in VS Code and similar tools, which requires extracting function signatures and docstrings. I am mainly interested in this use case.

  • Type checkers like MyPy. I haven't used them before and know very little (hence this issue to exchange experience). It seems to me that stubs only need to contain typed signatures but no docstrings. But nanobind often generates type annotations that MyPy isn't happy with, so some sort of postprocessing may be needed.

Here is what I have in mind, before actually having done anything. There may be roadblocks I haven't considered.

  1. The CMake build system gets a new command nanobind_add_stubs. It registers a step that runs at install time, because the whole package needs to be importable and doing that in a non-installed build might be tricky.

nanobind_add_stubs(
  PATH ${CMAKE_INSTALL_PREFIX}
  PACKAGE nanobind_example
  DEPENDS nanobind_example_ext
)

When the user installs the extension to ${CMAKE_INSTALL_PREFIX}, this will run a Python file (shipped as part of the nanobind distribution) that imports the package and then generates nanobind_example/__init__.pyi.

Here, I am already getting confused because of unfamiliarity with stub generation. I've seen that packages sometimes contain multiple .pyi files. How does one decide where to put what? Can .pyi files import each other? What would be the best way to expose this in the nanobind_add_stubs() function?
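The install-time step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: write_stub and generate_stub_text are hypothetical names, not part of nanobind, and inspect.signature stands in for whatever structured information nanobind would eventually expose. It only handles module-level functions and writes a single __init__.pyi.

```python
import importlib
import inspect
import sys
import types
from pathlib import Path


def generate_stub_text(module: types.ModuleType) -> str:
    """Return .pyi source for the public module-level callables of `module`."""
    lines = []
    for name, obj in sorted(vars(module).items()):
        # Skip private names; classes are left out to keep the sketch short.
        if name.startswith("_") or inspect.isclass(obj) or not callable(obj):
            continue
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # No introspectable signature: fall back to a permissive stub.
            signature = "(*args, **kwargs)"
        lines.append(f"def {name}{signature}: ...")
    return "\n".join(lines) + "\n"


def write_stub(package_name: str, install_prefix: Path) -> Path:
    """Import `package_name` from `install_prefix` and write its __init__.pyi."""
    prefix = str(install_prefix)
    sys.path.insert(0, prefix)
    try:
        module = importlib.import_module(package_name)
    finally:
        sys.path.remove(prefix)
    out = install_prefix / package_name / "__init__.pyi"
    out.write_text(generate_stub_text(module))
    return out
```

A CMake install step would then only need to invoke such a script with the install prefix and the package name.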

  2. I also wanted to modify nanobind's function class (nb_func) so that it exposes information in a more structured way, a bit like the __signature__ attribute consulted by inspect.signature(). But __signature__ is too limited because it (like Python) has no concept of overload chains.

Therefore, I am thinking of adding a function __nb_signature__ that returns a list of pairs of strings [("signature", "docstring"), ...] that the stub generator can turn into something like this:

from typing import overload

@overload
def func(x: int):
    """docstring 1"""

@overload
def func(x: str):
    """docstring 2"""
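The conversion from such a list of pairs to stub text could look roughly like the sketch below. The name render_overloads is hypothetical, and the sketch assumes that each signature string holds only the parameter list plus an optional return annotation (for example "(x: int)"), which is a guess at a format that has not been decided yet.

```python
from typing import List, Tuple


def render_overloads(name: str, entries: List[Tuple[str, str]]) -> str:
    """Turn [(signature, docstring), ...] into .pyi source text.

    Assumes each signature looks like "(x: int) -> None": the parameter
    list plus an optional return annotation, without the function name.
    Docstrings containing triple quotes are not escaped in this sketch.
    """
    use_overload = len(entries) > 1  # @overload needs at least two variants
    blocks = []
    for signature, docstring in entries:
        lines = []
        if use_overload:
            lines.append("@overload")
        lines.append(f"def {name}{signature}:")
        if docstring:
            lines.append(f'    """{docstring}"""')
        else:
            lines.append("    ...")
        blocks.append("\n".join(lines))
    header = "from typing import overload\n\n" if use_overload else ""
    return header + "\n\n".join(blocks) + "\n"
```

Calling render_overloads("func", [("(x: int)", "docstring 1"), ("(x: str)", "docstring 2")]) reproduces the stub shown above; a function with a single entry is emitted without the @overload decorator.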

  3. Some type signatures in nanobind aren't parseable in Python. There are a few things that I think could go wrong:
  • What if a C++ type hasn't been mapped yet when the extension is imported? In that case, the nanobind docstring includes the raw type (something like std::__1::vector<Item *>). The stubs could then omit that overload entirely, put in some generic placeholder (object?), or put the type name into a string. Thoughts?
  • The representation of default arguments (via __repr__) might not make sense as a Python expression. This seems like an unsolvable problem because nanobind simply does not know the Python expression needed to re-create an object. One option would be to try to eval() the expression in the stub generator and omit it or replace it with some kind of placeholder if an issue is found. Not sure -- thoughts?
  • Some nanobind type features don't have equivalents in typing.*. One example is the nd-array type annotations, which are AFAIK too complex to be handled by anything currently existing. I'm thinking that it could be useful if the stub generator command nanobind_add_stubs(..) could be called with a user-provided Python file that implements some kind of postprocessing on the type signatures.
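The three fallbacks above could be combined in one small sanitizing pass, sketched below. All names here are hypothetical. Instead of eval(), the sketch uses ast.parse as a syntax-only check, which is safer but weaker: an expression such as Foo(1) parses even if Foo cannot be resolved in the stub. The postprocess argument stands in for the user-provided hook and runs before the fallback, so that it gets to see the raw annotation text.

```python
import ast
from typing import Callable, Optional


def is_python_expression(text: str) -> bool:
    """Return True if `text` parses as a single Python expression."""
    try:
        ast.parse(text, mode="eval")
    except (SyntaxError, ValueError):
        return False
    return True


def sanitize_annotation(text: str, placeholder: str = "object") -> str:
    """Replace unparseable annotations (e.g. raw C++ type names)."""
    return text if is_python_expression(text) else placeholder


def sanitize_default(text: str) -> str:
    """Replace unparseable default-value reprs with the stub ellipsis."""
    return text if is_python_expression(text) else "..."


def render_parameter(
    name: str,
    annotation: str,
    default: Optional[str] = None,
    postprocess: Optional[Callable[[str], str]] = None,
) -> str:
    """Render one parameter, applying the user hook and then the fallbacks."""
    if postprocess is not None:
        annotation = postprocess(annotation)  # hypothetical user hook
    rendered = f"{name}: {sanitize_annotation(annotation)}"
    if default is not None:
        rendered += f" = {sanitize_default(default)}"
    return rendered
```

Whether a placeholder, a quoted type name, or dropping the overload is the better fallback is exactly the open question; the placeholder is used here only because it is the simplest of the three to show.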

I'm curious about your thoughts on this! Thanks!
