GPU utilization in spark error #1294

m15pradeep · 2025-02-04T06:29:43Z

I'm using the attached install_gpu_driver.sh on Dataproc 2.2. The GPU is not being recognized in Spark. Installation logs are attached for reference.

dataproc-gpu-main.txt
dataproc-initialization-script-0.log
install_gpu_driver.txt

Library: tensorflow[and-cuda]

Command:
import tensorflow as tf
print(tf.config.list_physical_devices('CPU'))
print(tf.config.list_physical_devices('GPU'))

Log:
2025-02-04 06:04:29.413125: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-02-04 06:04:31.327492: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738649071.978220 80542 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738649072.400870 80542 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-04 06:04:36.546724: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-04 06:04:50.776297: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2025-02-04 06:04:50.776349: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:137] retrieving CUDA diagnostic information for host: gpu-nvidia-l4-a363a292-a17a0463-m
2025-02-04 06:04:50.776359: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:144] hostname: gpu-nvidia-l4-a363a292-a17a0463-m
2025-02-04 06:04:50.776482: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:168] libcuda reported version is: 570.86.15
2025-02-04 06:04:50.776521: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:172] kernel reported version is: 570.86.15
2025-02-04 06:04:50.776531: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:259] kernel version seems to match DSO: 570.86.15
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
[]
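
To check whether the cuInit failure in the log above also occurs outside TensorFlow, the driver can be queried directly. The following is a minimal sketch and not part of the attached install_gpu_driver.sh. The function name `gpu_diagnostics` is hypothetical, and it assumes a Linux node where the driver library is installed as `libcuda.so.1` and device nodes appear under `/dev/nvidia*`.

```python
import ctypes
import glob
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic NVIDIA driver facts without importing TensorFlow.

    Hypothetical helper for illustration; paths and library name are assumptions.
    """
    report = {}

    # Device nodes created by the NVIDIA kernel module (empty list if none exist).
    report["device_nodes"] = sorted(glob.glob("/dev/nvidia*"))

    # Output of `nvidia-smi -L` (lists GPUs), or None if the tool is missing or fails to run.
    report["nvidia_smi"] = None
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            proc = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi"] = proc.stdout.strip() or proc.stderr.strip()
        except (OSError, subprocess.SubprocessError):
            pass

    # Call cuInit directly through the driver library.
    # 0 is CUDA_SUCCESS; 999 is CUDA_ERROR_UNKNOWN (the error named in the log).
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
        report["cuInit"] = int(libcuda.cuInit(0))
    except OSError as exc:
        report["cuInit"] = f"libcuda.so.1 not loadable: {exc}"

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `cuInit` returns a non-zero code here as well, the failure is likely at the driver or kernel-module level and not specific to TensorFlow or Spark.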
