Description:
After starting the Celeborn Worker, the following issues occur:
Worker not getting resources (Slots)
The logs show:
This indicates that the Worker is not being allocated any resources, preventing it from executing any tasks.
Incorrect disk space information
In the heartbeat message sent from the Worker to the Master, the disk space shows:
This suggests that the Worker is not correctly detecting the disk space, which may indicate a problem with the storage path or mount.
No running Shuffle tasks
The logs show:
This indicates that no tasks are being submitted to the Worker, likely due to insufficient resources or failed shuffle allocation.
Memory management is normal, but no Shuffle operations are taking place
Even though memory usage is normal:
No shuffle operations are occurring due to the lack of available resources.
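To cross-check the disk space symptom described above, one option is to ask the operating system directly what it reports for each storage directory on the Worker host. The following is only a minimal sketch: the function name `check_storage_dirs` and the example paths are hypothetical, and the directories to pass in are whatever the Worker is actually configured to use (for example via `celeborn.worker.storage.dirs`, if that is how the deployment sets them). It does not read Celeborn's own state; it only shows whether each path exists, is writable, and has free space.

```python
import os
import shutil


def check_storage_dirs(paths):
    """Return a list of (path, status, total_bytes, free_bytes) tuples.

    status is "missing" if the directory does not exist, "unusable" if it
    is not writable or has no free space, and "ok" otherwise.
    """
    results = []
    for path in paths:
        if not os.path.isdir(path):
            results.append((path, "missing", 0, 0))
            continue
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        writable = os.access(path, os.W_OK)
        status = "ok" if writable and usage.free > 0 else "unusable"
        results.append((path, status, usage.total, usage.free))
    return results


if __name__ == "__main__":
    # Hypothetical paths; replace with the Worker's real storage directories.
    for row in check_storage_dirs(["/mnt/disk1", "/mnt/disk2"]):
        print(row)
```

If a directory comes back as "missing" or "unusable" here, that would be consistent with the Worker reporting wrong disk space in its heartbeat, and the mount or permissions are worth checking before looking further into slot allocation.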
Steps to Reproduce:
maxSlots, activeSlots, usableSpace, totalSpace, and committed shuffles fields.
Log Summary:
Here are the relevant log entries:
Expected Result:
We expect the Celeborn Worker to correctly allocate resources, detect disk space, and handle shuffle tasks.
Actual Result:
The Worker is not getting any resources, the disk space is incorrectly detected, and no shuffle tasks are being processed.