I'm developing a distributed inference server with the data partitioned across two machines. When I use `DistNeighborLoader` as in the following code, I run into a problem: both machines must execute the sampling operations synchronously, and the same number of times. These conditions are hard to meet in real-world scenarios. What solutions are available?

If the two machines do not execute `DistNeighborLoader` the same number of times, the following error is reported:
```
Exception in thread Thread-1 (server):
Traceback (most recent call last):
  File ".../main.py", line 46, in server
    loader = DistNeighborLoader(
             ^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/torch_geometric/distributed/dist_neighbor_loader.py", line 90, in __init__
    DistLoader.__init__(
  File ".../python3.12/site-packages/torch_geometric/distributed/dist_loader.py", line 92, in __init__
    self.worker_init_fn(0)
  File ".../python3.12/site-packages/torch_geometric/distributed/dist_loader.py", line 135, in worker_init_fn
    self.dist_sampler.register_sampler_rpc()
  File ".../python3.12/site-packages/torch_geometric/distributed/dist_neighbor_sampler.py", line 123, in register_sampler_rpc
    partition2workers = rpc_partition_to_workers(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/torch/distributed/rpc/api.py", line 94, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/torch_geometric/distributed/rpc.py", line 123, in rpc_partition_to_workers
    for worker_name, (role, nparts, idx) in gathered_results.items():
                     ^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object
```
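
The final frame of the traceback suggests that at least one value in `gathered_results` was `None` when the loop tried to unpack it into `(role, nparts, idx)`. The sketch below reproduces only that unpacking failure in plain Python. The dictionary contents and the `partition_to_workers` helper are hypothetical stand-ins, not the PyG implementation, and the sketch makes no claim about why a peer's entry ends up empty.

```python
# Hypothetical stand-in for the gathered results:
# worker name -> (role, num_partitions, partition_idx).
# Assumption: the second worker never contributed a value, so its slot is None.
gathered_results = {
    "worker-0": ("server", 2, 0),
    "worker-1": None,
}


def partition_to_workers(gathered):
    """Group worker names by partition index, using the same loop shape
    as the last frame of the traceback (illustrative only)."""
    mapping = {}
    for worker_name, (role, nparts, idx) in gathered.items():
        mapping.setdefault(idx, []).append(worker_name)
    return mapping


try:
    partition_to_workers(gathered_results)
    error_message = None
except TypeError as exc:
    # Unpacking None into (role, nparts, idx) raises TypeError.
    error_message = str(exc)

print(error_message)
```

If this reading is right, the `TypeError` is a symptom of a peer not having taken part in the gather step, rather than a bug in the unpacking loop itself.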