Hello everyone,
I am trying to write a multi-GPU version of the training code for this repository, referring to the cifar-multi-gpu example.
However, it always ends up with None gradients from opt.compute_gradients:
with tf.variable_scope(tf.get_variable_scope()) as tower_graph:
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            # this should be out of the loop or not, the work of pipeline
            with tf.name_scope('%s_%d' % ('tower', i)) as scope:
                # image, ih, iw, gt_boxes, gt_masks, num_instances, img_id
                input_list = coco.read(file_name_list)
                input_list = list(input_list)
                input_list[0], input_list[3], input_list[4] = coco_preprocess.preprocess_image(
                    input_list[0], input_list[3], input_list[4], is_training=True)
                with slim.arg_scope(resnet_v1.resnet_arg_scope()):
                    logits, end_points = resnet50(input_list[0], 1000, is_training=False)
                loss = tower_loss(scope, input_list, end_points)
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()
                # Retain the summaries from the final tower.
                summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
                # Calculate the gradients for the batch of data on this tower.
                grads = opt.compute_gradients(loss)
                # Keep track of the gradients across all towers.
                tower_grads.append(grads)

# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
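For what it is worth, here is how I am checking which variables come back without a gradient (just a quick diagnostic sketch, with opt and loss as in the snippet above). compute_gradients returns a (None, var) pair for every trainable variable that the loss graph does not reach, so printing those names shows which variables are disconnected:

# Diagnostic sketch: list the variables the loss does not depend on,
# i.e. the ones whose gradient comes back as None.
for grad, var in opt.compute_gradients(loss):
    if grad is None:
        print('no gradient for:', var.op.name)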
The mechanism as I understand it: name_scope distinguishes the towers so that gradients are computed separately for each one, while all variables are reused across towers so they can be updated at once with the averaged gradients. I think the main problem is resnet50: because each tower has a different name scope, the names in end_points change from tower to tower, so I updated the dictionary lookup by passing the scope name. However, I still cannot get valid gradients. Does anyone have an idea?
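To be concrete about the dictionary update (a sketch only; the exact key format is my assumption): inside each tower, the end_points keys carry that tower's name-scope prefix, so I strip it before tower_loss looks them up:

def remap_end_points(end_points, scope):
    # 'scope' is the tower's name scope as returned by tf.name_scope,
    # e.g. 'tower_0/' (assumed format). Strip the per-tower prefix so
    # the keys match the names tower_loss expects.
    return {key.replace(scope, '', 1): value for key, value in end_points.items()}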
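For reference, my average_gradients follows the one in the CIFAR-10 multi-GPU example, roughly like this. Note that it implicitly assumes no gradient in tower_grads is None (tf.expand_dims cannot take None), so the None values from compute_gradients break this step as well:

def average_gradients(tower_grads):
    # tower_grads is a list (one entry per tower) of lists of
    # (gradient, variable) pairs from opt.compute_gradients.
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # grad_and_vars pairs the same variable across all towers:
        # ((grad0_gpu0, var0_gpu0), ..., (grad0_gpuN, var0_gpuN))
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        # Stack the per-tower gradients and average over the tower axis.
        grad = tf.reduce_mean(tf.concat(grads, 0), 0)
        # Variables are shared across towers, so the first tower's
        # pointer to the variable is sufficient.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads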