[BUG] UMAP transform throws illegal memory access error when data_on_host=True #6216

Closed
btepera opened this issue Jan 10, 2025 · 0 comments · Fixed by #6259
Labels
3 - Ready for Review Ready for review by team bug Something isn't working

btepera commented Jan 10, 2025

When running UMAP with batched NN Descent, `fit` is supported today but `transform` is not; `transform` falls back to brute-force kNN (#6215). If I run `fit` with `data_on_host=True`, the subsequent `transform` call throws an illegal memory access error.

import numpy as np
from cuml.manifold import UMAP

N = 10000
K = 32

rng = np.random.default_rng()
data = rng.random((N, K), dtype="float32")

reducer = UMAP(
    n_components=2,
    n_neighbors=15,
    build_algo="nn_descent",
    build_kwds={"nnd_n_clusters": 4},
)

fitted_umap = reducer.fit(data, data_on_host=True)
embeddings = fitted_umap.transform(data)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 9
      1 reducer = UMAP(
      2     n_components=2,
      3     n_neighbors=15,
      4     build_algo="nn_descent",
      5     build_kwds={"nnd_n_clusters": 4},
      6 )
      8 fitted_umap = reducer.fit(data, data_on_host=True)
----> 9 embeddings = fitted_umap.transform(data)

File /raid/btepera/miniforge3/envs/rapids-24.12/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File /raid/btepera/miniforge3/envs/rapids-24.12/lib/python3.12/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File /raid/btepera/miniforge3/envs/rapids-24.12/lib/python3.12/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:720, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:841, in cuml.manifold.umap.UMAP.transform()

RuntimeError: CUDA error encountered at: file=/opt/conda/conda-bld/work/cpp/src/umap/fuzzy_simpl_set/naive.cuh line=257: call='cudaPeekAtLastError()', Reason=cudaErrorIllegalAddress:an illegal memory access was encountered
Obtained 39 stack frames

Independently of adding batched NN Descent support to `transform` (which I imagine is a larger effort), the brute-force fallback should work correctly when `data_on_host=True` was passed to `fit`.
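A regression check for this case could look roughly like the sketch below. The helper name `check_fit_then_transform` and the `_StandInReducer` class are hypothetical and exist only so the snippet runs without a GPU; they are not part of cuML. With cuML installed, the same helper would presumably be called with the `UMAP` reducer from the reproduction above.

```python
import numpy as np


def check_fit_then_transform(reducer, data, n_components, **fit_kwargs):
    """Fit, then transform the same data, and sanity-check the embedding."""
    fitted = reducer.fit(data, **fit_kwargs)
    embedding = np.asarray(fitted.transform(data))
    # The fallback path should return one finite row per input sample.
    assert embedding.shape == (data.shape[0], n_components)
    assert np.isfinite(embedding).all()
    return embedding


class _StandInReducer:
    """Hypothetical CPU stand-in exposing the same fit/transform shape."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, data_on_host=False):
        # The stand-in ignores data_on_host; it only mimics the call signature.
        self._mean = X.mean(axis=0)
        return self

    def transform(self, X):
        # Centre the data and keep the first n_components columns.
        return (X - self._mean)[:, : self.n_components]


rng = np.random.default_rng(0)
data = rng.random((100, 8), dtype="float32")
embedding = check_fit_then_transform(
    _StandInReducer(n_components=2), data, n_components=2, data_on_host=True
)
```

Checking only shape and finiteness keeps the test independent of UMAP's stochastic output while still failing loudly if `transform` raises.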

@btepera btepera added ? - Needs Triage Need team to review and classify bug Something isn't working labels Jan 10, 2025
@csadorf csadorf added 2 - In Progress Currently a work in progress and removed ? - Needs Triage Need team to review and classify labels Jan 23, 2025
csadorf added 5 commits to csadorf/cuml that referenced this issue Jan 24, 2025
@csadorf csadorf added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 24, 2025
rapids-bot bot pushed a commit that referenced this issue Jan 28, 2025
…#6259)

Fixes #6216 by identifying whether the original input data is on host or device and conditionally building the brute force index (required for a separate `transform()` call) for the correct matrix view.

- [x] Identify and fix root cause
- [x] Clean up implementation
- [x] Implement unit test
- [x] Document fix
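
The idea of building the fallback index from the view that matches where the training data lives can be pictured with the toy sketch below. It is an illustration under stated assumptions, not the cuML implementation: `FallbackIndex` and `build_fallback_index` are hypothetical names, and NumPy arrays stand in for both host and device memory.

```python
import numpy as np


class FallbackIndex:
    """Toy brute-force index that records which memory space it was built from."""

    def __init__(self, points, memory_space):
        self.points = points
        self.memory_space = memory_space  # "host" or "device"

    def query(self, queries, k):
        # Exact k nearest neighbours by squared Euclidean distance.
        diff = queries[:, None, :] - self.points[None, :, :]
        dist2 = (diff ** 2).sum(axis=-1)
        return np.argsort(dist2, axis=1)[:, :k]


def build_fallback_index(train_data, data_on_host):
    """Choose the view that matches where the training data actually lives."""
    memory_space = "host" if data_on_host else "device"
    return FallbackIndex(train_data, memory_space)


rng = np.random.default_rng(0)
train = rng.random((50, 4), dtype="float32")
index = build_fallback_index(train, data_on_host=True)
neighbours = index.query(train[:5], k=3)
```

The point of the sketch is only that the host/device decision is made once, when the index is built, so a later `transform()` call never dereferences a pointer from the wrong memory space.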

Closes #6216

Authors:
  - Simon Adorf (https://github.com/csadorf)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #6259