
set_variant(): try next requested variant if one fails to import #1522


Open: merlinND wants to merge 3 commits into master

Conversation

merlinND
Member

@merlinND merlinND commented Mar 18, 2025

Description

If the user specifies several variants in order, e.g.:

mi.set_variant("cuda_ad_rgb", "llvm_ad_rgb")

and they are all compiled (available) but some fail to import, then keep trying with the next requested variant instead of throwing an exception.

This can happen, for example, when requesting a CUDA variant on a machine that has the NVIDIA driver installed (so CUDA is available) but no GPU available, or when LLVM is not installed.
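The fallback described above can be sketched in plain Python. This is only an illustration of the control flow, not the code from this PR: `pick_first_importable`, its `compiled` list, and the `load` callback are hypothetical names, and `fake_load` stands in for the real import of a variant module.

```python
from typing import Callable, Optional, Sequence


def pick_first_importable(
    requested: Sequence[str],
    compiled: Sequence[str],
    load: Callable[[str], object],
) -> str:
    """Return the first requested variant that is compiled and loads without error.

    Hypothetical helper: `load` is assumed to raise ImportError on failure.
    """
    if not requested:
        raise ValueError("at least one variant name is required")

    # Only variants that were compiled are candidates at all.
    candidates = [name for name in requested if name in compiled]
    if not candidates:
        raise ImportError(f"none of {list(requested)} are compiled")

    last_error: Optional[ImportError] = None
    for name in candidates:
        try:
            load(name)
            return name
        except ImportError as exc:
            # Remember the failure and move on to the next requested variant.
            last_error = exc

    # Every candidate failed; with a single candidate this is the "no fallback" case.
    raise ImportError(f"could not import any of {candidates}") from last_error


def fake_load(name: str) -> object:
    """Stand-in loader (assumption): pretend that CUDA variants cannot be imported."""
    if name.startswith("cuda"):
        raise ImportError("no CUDA device available")
    return object()


compiled = ["scalar_rgb", "cuda_ad_rgb", "llvm_ad_rgb"]
chosen = pick_first_importable(["cuda_ad_rgb", "llvm_ad_rgb"], compiled, fake_load)
print(chosen)  # llvm_ad_rgb
```

Requesting a single name leaves no candidate to fall back to, so the ImportError still reaches the caller.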

Testing

On a machine with the NVIDIA driver installed,

CUDA_VISIBLE_DEVICES= python -c 'import mitsuba as mi; mi.set_variant("cuda_ad_rgb", "llvm_ad_rgb"); print(mi.variant())'

Old behavior: raises ImportError and exits.
New behavior: fails to load cuda_ad_rgb, but correctly goes to the second choice llvm_ad_rgb.

The old behavior can still be obtained by requesting a single variant at a time (no fallback): mi.set_variant("cuda_ad_rgb")

Checklist

  • My code follows the style guidelines of this project
  • My changes generate no new warnings
  • My code also compiles for cuda_* and llvm_* variants. If you can't test this, please leave below
  • I have commented my code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I cleaned the commit history and removed any "Merge" commits
  • I give permission that the Mitsuba 3 project may redistribute my contributions under the terms of its license

@merlinND merlinND added the enhancement New feature or request label Mar 18, 2025
@merlinND merlinND self-assigned this Mar 18, 2025
@merlinND merlinND requested a review from njroussel April 2, 2025 07:45
Member

@njroussel njroussel left a comment


I like this! I've had it on my backlog to have mi.variants() only report valid variants, but there's one nasty problem: if that logic is ever wrong/broken, we'll be skipping hundreds of tests on the CI. Keeping this fallback logic in set_variant() shouldn't ever mess with how we've set up the test suite 👍

If the user specifies several variants in order:
    mi.set_variant("cuda_ad_rgb", "llvm_ad_rgb")
and they are all compiled (available) but some fail to import,
then keep trying with the next requested variant instead of throwing an exception.

This could happen e.g. when requesting a CUDA variant with CUDA installed but no GPU available,
or when LLVM is not installed.
+ remove unnecessary borrow.
@merlinND merlinND force-pushed the set-variant-choices branch from 240eebb to a4dcf11 Compare April 16, 2025 13:07
@merlinND
Member Author

merlinND commented Apr 16, 2025

Thanks for the review @njroussel and @wjakob!
I've rebased and incorporated the feedback.

Currently, both on master and with this PR, failing to set a variant results in a lot of leaks when exiting, even if the exception is caught and we then successfully set another variant:

# CUDA_VISIBLE_DEVICES= python ./test_failed_variant.py

import mitsuba as mi

if __name__ == "__main__":
    mi.set_log_level(mi.LogLevel.Debug)
    try:
        mi.set_variant("cuda_ad_rgb")
    except ImportError as e:
        print("Failed to set variant")

    print("Current variant:", mi.variant())
    # Still leaks if we then set a valid variant
    mi.set_variant("llvm_ad_rgb")
    print("Current variant:", mi.variant())

Output on exit:

nanobind: leaked 23 instances!
 - leaked instance 0x769448183008 of type "mitsuba.Color0f"
 - leaked instance 0x769443f3bac8 of type "drjit.cuda.ad.TensorXf"
 - leaked instance 0x769443f1ebd0 of type "drjit.scalar.Array3f"
 - ... skipped remainder
nanobind: leaked 273 types!
 - leaked type "drjit.cuda.ad.Int"
 - leaked type "mitsuba.Ray3d"
 - ... skipped remainder
nanobind: leaked 1486 functions!
- ...

It's unrelated to this PR, though.

Edit: hopefully fixed with 4306f55

When importing a variant, immediately try initializing the backend.
Without this change, initialization could fail much later, in `color_management_static_initialization()`.
By that point, many other things have been loaded, which led to reference leaks to various types and functions, even if the import error was properly handled.
Comment on lines +124 to +130
// Before loading everything in and creating a lot of references to
// various objects, we ensure that this backend can be initialized
// without issues by creating a simple variable.
// If initialization fails, an exception will be raised, which the user
// can catch and handle if desired.
// Leaving initialization to fail later would lead to reference leaks.
MI_VARIANT_FLOAT(0);
Member Author


This should hopefully be enough to avoid leaks that have been happening when failing to import a variant.
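The idea behind this change (probe the backend before anything else is loaded) can be illustrated with a small Python analogy. This is not the C++ code from the commit: `load_variant`, the `registry` dictionary, and the probe callbacks are hypothetical, and the registry merely stands in for the types and functions that a variant import would register.

```python
from typing import Callable, Dict


def load_variant(
    registry: Dict[str, object],
    name: str,
    probe: Callable[[], None],
    members: Dict[str, object],
) -> None:
    """Register the members of a variant, but only after the backend probe succeeds."""
    # Fail fast: if the backend cannot be initialized, raise before touching the
    # registry, so a failed import leaves nothing behind to clean up.
    probe()
    for key, value in members.items():
        registry[f"{name}.{key}"] = value


def failing_probe() -> None:
    """Stand-in for a backend that cannot be initialized (assumption)."""
    raise ImportError("backend could not be initialized")


def working_probe() -> None:
    """Stand-in for a backend that initializes fine (assumption)."""


registry: Dict[str, object] = {}
members = {"Color0f": object(), "Ray3d": object()}

try:
    load_variant(registry, "cuda_ad_rgb", failing_probe, members)
except ImportError:
    print("Failed to set variant")

# Nothing from the failed variant was registered.
print(len(registry))  # 0

load_variant(registry, "llvm_ad_rgb", working_probe, members)
print(sorted(registry))  # ['llvm_ad_rgb.Color0f', 'llvm_ad_rgb.Ray3d']
```

If the probe instead ran after the registration loop, the failed attempt would leave its entries in the registry, which is the analogue of the leaked instances, types and functions reported above.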

@merlinND
Member Author

Pinging this PR, would it be okay to merge?

The latest commit (4306f55) avoids partial initialization of variants when a backend turns out to be unavailable, which used to result in leaks (even before this PR).

3 participants