Question about device option #6
Comments
That is an interesting question and I don't know the answer. I assume that TensorFlow won't be able to see the other GPUs, so GPU0 will be the first GPU that it can see. I added the following lines to the script to see which GPUs TensorFlow can see:

```python
print("GPU Devices:")
print(tf.config.list_physical_devices('GPU'))
```

The job is in the queue and I'll report back when it is done.
The test showed that TensorFlow sees only the GPU that HTCondor assigned to the job, and that GPU appears as GPU0. In case it is useful, here is more detail. The code to print the list of devices that I wrote above only works on TensorFlow 2.0+, and it had to be changed to the following for TensorFlow 1.4:

```python
from tensorflow.python.client import device_lib

local_device_protos = device_lib.list_local_devices()
gpus = [x.name for x in local_device_protos if x.device_type == 'GPU']
print(gpus)
```

The output from this job listed a single GPU.
In the stderr file, TensorFlow says that it assigned the device with PCI bus id 0000:5e:00.0 to GPU0. When we look at the GPUs on the machine, we see that this PCI bus id belongs to CUDA1, not CUDA0:

```
$ condor_status -long gitter2002.chtc.wisc.edu | grep -i 0000:5e:00.0
CUDA1DevicePciBusId = "0000:5E:00.0"
```

So, to sum up: this server has 4 GPUs (CUDA0, CUDA1, CUDA2, CUDA3). HTCondor assigned this job the 2nd GPU, i.e. CUDA1, and TensorFlow mapped that to GPU0. Let me know if you need anything else.
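The renumbering described above can be sketched without TensorFlow at all. This is a hypothetical helper (the function name `logical_to_physical_gpu` is mine, not from the thread), assuming HTCondor restricts the job to its assigned GPU via the `CUDA_VISIBLE_DEVICES` environment variable, which is consistent with the behavior observed in this test:

```python
import os

def logical_to_physical_gpu(logical_index, visible_devices):
    # Map a TensorFlow logical GPU index (0 for GPU0 / '/GPU:0') back to
    # the physical CUDA device index, given the CUDA_VISIBLE_DEVICES
    # value the batch system set for the job.
    if visible_devices is None:
        # No masking: logical and physical indices coincide.
        return logical_index
    mask = [int(d) for d in visible_devices.split(",") if d.strip()]
    return mask[logical_index]

# The thread's case: HTCondor assigned CUDA1, so the job would see
# CUDA_VISIBLE_DEVICES="1", and TensorFlow's GPU0 is physical device 1.
print(logical_to_physical_gpu(0, "1"))  # -> 1
```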
Awesome, thanks @sameerd! I'll pass this on.
Thanks @sameerd for looking into this; Christina asked about this on my behalf. Do you recommend that users always use `tf.device` to instruct TensorFlow to use only the GPU that HTCondor has allocated to the job? I have someone whose TensorFlow log output looks like it is trying to use all the GPUs on the machine the job landed on.
@jmvera255 If someone's logs look like TensorFlow was trying to use all the GPUs then either
I'm not sure what else could be causing TensorFlow to use all the GPUs.
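One quick way to debug a case like this is to check, at the start of the job, what GPU masking the job actually received. This is a sketch of mine (the helper name `describe_gpu_mask` is hypothetical, not from the thread), assuming the masking is done through `CUDA_VISIBLE_DEVICES` as the earlier test suggests:

```python
import os

def describe_gpu_mask(env=os.environ):
    # Summarize the GPU masking the job received. If the variable is
    # unset, TensorFlow will enumerate (and may grab memory on) every
    # GPU on the machine, which would match the symptom in the logs.
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return "CUDA_VISIBLE_DEVICES is unset; TensorFlow can see every GPU on the machine"
    if visible.strip() == "":
        return "CUDA_VISIBLE_DEVICES is empty; no GPUs are visible"
    return "restricted to physical GPU(s): " + visible

print(describe_gpu_mask())
```

Printing this alongside `tf.config.list_physical_devices('GPU')` in the job's stdout makes it easy to tell whether the job ever received a mask in the first place.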
templates-GPUs/docker/tensorflow_python/test_tensorflow.py
Line 19 in 21c9139
Does this line make this script ALWAYS use the "first" GPU on a server? What if HTCondor has assigned you a different one (e.g. GPU device 3 instead of GPU device 0)?
@sameerd
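Given the thread's finding that TensorFlow renumbers the visible devices so the assigned GPU always appears as GPU0, a hardcoded device 0 should be safe under HTCondor. A minimal sketch of a defensive variant (the helper `pick_device` is hypothetical, and it assumes HTCondor exports `CUDA_VISIBLE_DEVICES` for GPU jobs, as the observed renumbering suggests):

```python
import os

def pick_device(env=os.environ):
    # Choose a TensorFlow device string for an HTCondor job. Logical GPU
    # indices are renumbered within the visible set, so '/gpu:0' always
    # refers to the (first) GPU the job was assigned; fall back to the
    # CPU when no GPU was assigned at all.
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    if visible.strip() == "":
        return "/cpu:0"
    return "/gpu:0"

# Usage sketch (TensorFlow 1.x style):
# with tf.device(pick_device()):
#     ...  # build the graph here
```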