- "'2025-06-23 15:47:22,272\\tINFO job_manager.py:531 -- Runtime env is setting up.\\nINFO 06-23 15:53:36 [__init__.py:244] Automatically detected platform cuda.\\n2025-06-23 15:53:54,307\\tINFO worker.py:1554 -- Using address 10.128.2.45:6379 set in the environment variable RAY_ADDRESS\\n2025-06-23 15:53:54,308\\tINFO worker.py:1694 -- Connecting to existing Ray cluster at address: 10.128.2.45:6379...\\n2025-06-23 15:53:54,406\\tINFO worker.py:1879 -- Connected to Ray cluster. View the dashboard at \\x1b[1m\\x1b[32mhttp://10.128.2.45:8265 \\x1b[39m\\x1b[22m\\nNo cloud storage mirror configured\\n2025-06-23 15:53:57,501\\tWARNING util.py:589 -- The argument ``compute`` is deprecated in Ray 2.9. Please specify argument ``concurrency`` instead. For more information, see https://docs.ray.io/en/master/data/transforming-data.html#stateful-transforms.\\n2025-06-23 15:53:58,095\\tINFO logging.py:290 -- Registered dataset logger for dataset dataset_33_0\\n2025-06-23 15:53:59,702\\tINFO streaming_executor.py:117 -- Starting execution of Dataset dataset_33_0. Full logs are in /tmp/ray/session_2025-06-23_10-53-41_019757_1/logs/ray-data\\n2025-06-23 15:53:59,702\\tINFO streaming_executor.py:118 -- Execution plan of Dataset dataset_33_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadRange->Map(_preprocess)] -> ActorPoolMapOperator[MapBatches(ChatTemplateUDF)] -> ActorPoolMapOperator[MapBatches(TokenizeUDF)] -> ActorPoolMapOperator[MapBatches(vLLMEngineStageUDF)] -> ActorPoolMapOperator[MapBatches(DetokenizeUDF)] -> TaskPoolMapOperator[Map(_postprocess)]\\n\\nRunning 0: 0.00 row [00:00, ? row/s]\\n \\n\\x1b[33m(raylet)\\x1b[0m [2025-06-23 15:54:00,800 E 829 829] (raylet) node_manager.cc:3287: 2 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: b72a45799ac9496bf52347fb9f9ef218722683d7bd8dd14702e821f0, IP: 10.128.2.45) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 10.128.2.45`\\n\\nRunning 0: 0.00 row [00:01, ? row/s]\\n \\n\\x1b[33m(raylet)\\x1b[0m \\n\\nRunning 0: 0.00 row [00:01, ? row/s]\\n \\n\\x1b[33m(raylet)\\x1b[0m Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.\\n\\nRunning 0: 0.00 row [00:01, ? row/s]\\n \\n\\x1b[33m(raylet)\\x1b[0m \\n\\nRunning 0: 0.00 row [01:01, ? row/s]\\n \\n\\x1b[33m(raylet)\\x1b[0m [2025-06-23 15:55:00,824 E 829 829] (raylet) node_manager.cc:3287: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: b72a45799ac9496bf52347fb9f9ef218722683d7bd8dd14702e821f0, IP: 10.128.2.45) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 10.128.2.45`\\n\\nRunning 0: 0.00 row [01:01, ? row/s]\\n \\n\\x1b[33m(raylet)\\x1b[0m Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. 
To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.\\n\\nRunning 0: 0.00 row [01:01, ? row/s]'"
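For reference, a minimal sketch of the two remedies the raylet message suggests: tuning the memory-monitor environment variables and lowering per-stage parallelism. This is not the job behind the log above (its processor config and UDFs are not shown); `Echo`, the threshold value `0.98`, and the resource numbers are illustrative placeholders. Note that `RAY_memory_usage_threshold` and `RAY_memory_monitor_refresh_ms` are read by the raylet, so on an existing cluster they must be exported before `ray start` on each node rather than from the driver.

```python
# Hedged sketch only: assumes a local cluster started by ray.init() so the
# raylet inherits the environment variables set here.
import os

# Raise the OOM kill threshold (default is 0.95 of node memory); value is a
# placeholder, tune for your node.
os.environ.setdefault("RAY_memory_usage_threshold", "0.98")
# Or disable worker killing entirely (not recommended for production):
# os.environ.setdefault("RAY_memory_monitor_refresh_ms", "0")

import ray

ray.init()  # a local cluster picks up the env vars above

# The other remedy from the log: reduce parallelism. The deprecation warning
# ("compute" -> "concurrency") points at the knob; capping concurrency and
# reserving more resources per worker lowers peak memory pressure.
class Echo:
    def __call__(self, batch):
        return batch  # stand-in UDF, not the real pipeline stage

ds = ray.data.range(100)
ds = ds.map_batches(Echo, concurrency=1, num_cpus=2)  # fewer, bigger workers
print(ds.take(1))
```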