Questions about training Gibson scenes #13

ZhengdiYu opened this issue Jun 19, 2021 · 13 comments

ZhengdiYu commented Jun 19, 2021

Hey Julian,

Thank you for your great work!

Cars:
Previously, I trained NDF on ShapeNet cars and found that the learning rate in your code is 1e-6. This hurts convergence: training is too slow and the performance is not good. So I changed the learning rate to 1e-4, which worked as well as the pre-trained model.

The batch_size and the initialization did not affect the performance. Here are the loss values for different batch sizes and learning rates:
[screenshot: loss values for different batch sizes and learning rates]
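
For reference, this is roughly the change I made (a minimal sketch; the actual optimizer construction in your training code may look slightly different, and `net` here just stands for the NDF network):

    import torch.optim as optim

    # the released code uses lr=1e-6; with lr=1e-4 Adam converged for me
    # about as well as the released pre-trained model
    optimizer = optim.Adam(net.parameters(), lr=1e-4)  # changed from lr=1e-6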

Scenes:
I also tried the 1e-4 learning rate for scenes. Judging by the loss value, it is much better than before as well. However, the loss is still 3~4 times higher than the loss on cars. I also tried applying your model pre-trained on cars directly to scenes, and its generalization is even better than the model trained on scenes. Please see the examples below.

Loss:
My training settings: trained for 160 hours, batch size 1 (I think we can only use 1 because the number of points in each cube is different), lr 1e-4, and the other arguments (e.g. the threshold) the same as for cars. For preprocessing I used the script you released in the NDF-1 repo, which produces many cubes for each scene. During training I adapted the dataloader from cars and fed the cubes into the network one by one (batch size 1), so the number of steps per epoch equals the total number of cubes. For cars, 50,000 boundary sampled points are fed into the network, which is half of the 100,000 boundary points sampled from the mesh. So I tried two strategies for scenes: one uses all points in the cube as the boundary-point input (decoder input), the other uses half of them (a sketch of the half-sampling is shown after the screenshot below). Neither worked well:

[screenshot: training loss on scenes]
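
A minimal sketch of the half-sampling strategy in my adapted dataloader (the function, path handling and .npz key names are my own assumptions, not your actual code):

    import numpy as np

    def load_cube(npz_path, use_half=True):
        """Load one cube's boundary samples; optionally keep a random half (as for cars: 50k of 100k)."""
        data = np.load(npz_path)
        points, df = data['points'], data['df']  # assumed key names in the preprocessed .npz
        if use_half:
            idx = np.random.choice(len(points), len(points) // 2, replace=False)
            points, df = points[idx], df[idx]
        return points, df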

Qualitative results:
[images: four qualitative result pictures; the title of each picture indicates the setting]

Questions:
Q1. Do you remember how long it would take to train on scenes? Could you provide a pre-trained model if possible?

Q2. When do you plan to release the code for scenes? Or could you give us a brief introduction or guideline on how to train and test on scenes?

Q3. Have you changed any arguments for training and testing on scenes (e.g. thresholds such as filter_val)? How do you train on scenes? Do you use a batch size of 1 with the pre-processed data? Is there any other difference between training on cars and on scenes (e.g. the sample number)?

Q4. In the supplementary, what does this sentence from section 1 (Hyperparameters - Network training) mean: 'To speed up training we initialized all networks with parameters gained by training on the full ShapeNet'?

  • I didn't see any code related to this; instead, you just train from scratch. I also tried loading your pre-trained cars model and continuing to train on scenes, but the loss is higher than for the model trained on scenes (0.02 vs. 0.003), which is really weird, because the generation results using the cars model are obviously much better than those of the model trained on scenes, yet its loss is much higher. At the same time, the performance degenerates if I load the pre-trained model and continue training on scenes. The best-performing model for me is still the one pre-trained on cars.

(Update: I found that during the generation of each cube's dense point cloud, the gradients produced by the model trained on scenes are not as precise as those from the model trained on cars. For the scenes model, the df is generally smaller at the beginning (more points with df < 0.03), but after 7 refinement steps only a small fraction of the points have a smaller df than before, resulting in fewer points with df < 0.009, so generation takes many more steps and much more time. I think this is due to inaccurate gradients, because after moving the points along the gradients the df should become smaller. The cars model, on the other hand, starts with fewer points with df < 0.03, but after 7 refinement steps almost all of them are correctly moved closer to the surface, resulting in more points with df < 0.009.) I think something went wrong during training; it doesn't make sense that the loss is smaller but the gradients are less accurate. It seems like the ground truth might be incorrect, but the only adaptation I made is switching from pymesh to igl to compute gt_df. That's all. (The warm-start procedure I tried is sketched below.)
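
For reference, this is roughly how I warm-start from the cars checkpoint (a sketch; the checkpoint path and key names are my assumptions about the usual PyTorch checkpoint layout, not your exact code):

    import torch

    checkpoint = torch.load('experiments/cars/checkpoint_best.tar', map_location='cpu')  # hypothetical path
    net.load_state_dict(checkpoint['model_state_dict'])  # assumed key name

    # I do not restore the optimizer state, so training on scenes starts
    # with a fresh Adam state at lr=1e-4
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)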

Q5. Do you use different dense point cloud generation strategies for cars and scenes? Do you use the same script for both of them?

  • My method: I generate the scenes by first generating each cube with the same generation script as for cars (only decreasing sample_num per cube to ~50,000) and then stacking the cubes together based on their cube corners.

  • However, the iteration for generating the dense point cloud sometimes becomes infinite, because after an iteration no new points fall within the filter_val threshold (you keep the points with df < filter_val after each iteration). So I tried increasing sample_num to 200,000 to raise the chance of selecting correct points, but that did not improve the performance; it only avoided the infinite loop to some extent (a condensed sketch of this loop, with an added guard, follows this list).

  • I think there might be 2 reasons:

  1. Some arguments for generation have changed for scenes (e.g. filter_val, sample_num). But I don't think this is the key issue, because I also applied your model pre-trained on cars directly to scenes with exactly the same generation settings, and it looks even better than the model I trained on scenes.
  2. You might have a different training strategy or arguments that I don't know about.
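
To make the infinite-loop issue concrete, here is a condensed sketch of the dense point cloud generation loop as I understand it, with an added max_iters guard (this is simplified and paraphrased, not the exact generation.py; the udf_fn callable, the 0.1 clamp and the perturbation at the end are my own simplifications):

    import numpy as np
    import torch
    import torch.nn.functional as F

    def dense_point_cloud(udf_fn, device, num_points=900000, num_steps=7,
                          filter_val=0.009, sample_num=200000, max_iters=100):
        """udf_fn maps a (1, N, 3) tensor of points to predicted unsigned distances of shape (1, N)."""
        samples_cpu = np.zeros((0, 3))
        samples = torch.rand(1, sample_num, 3, device=device) * 3 - 1.5  # uniform in [-1.5, 1.5]^3
        samples.requires_grad = True

        for _ in range(max_iters):  # guard: stop even if no new points ever pass filter_val
            for _ in range(num_steps):  # refinement: p <- p - df(p) * grad / |grad|
                df_pred = torch.clamp(udf_fn(samples), max=0.1)
                df_pred.sum().backward()
                grad = samples.grad.detach()
                samples = samples.detach() - F.normalize(grad, dim=2) * df_pred.detach().unsqueeze(-1)
                samples.requires_grad = True

            df_pred = torch.clamp(udf_fn(samples), max=0.1).detach()
            accepted = samples.detach()[df_pred < filter_val]  # keep only points close to the surface
            samples_cpu = np.vstack((samples_cpu, accepted.cpu().numpy()))
            if samples_cpu.shape[0] >= num_points:
                break
            # perturb the current samples and try again (the real script also resamples
            # around the accepted points; omitted here for brevity)
            samples = (samples.detach() + 0.01 * torch.randn_like(samples)).requires_grad_()
        return samples_cpu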

Hope you can help me, thanks for your time!

Best,
Zhengdi

jchibane (Owner) commented Jun 30, 2021

Hi @ZhengdiYu ,

thanks for the detailed post!

Cars: Your analysis is good - for our experiments we initially had a larger learning rate as well; I lowered it to stabilize training, but it seems that was too drastic. Thanks for pointing it out!

Scenes:
Q1. Do you remember how long it would take to train on scenes? Could you provide a pre-trained model if possible?
Try using a model trained on cars as initialization for the scenes. That's what we did. I think we trained for ~3-4 days.
Yes, I will upload the model for scenes.

Q2. When do you plan to release the code for scenes? Or could you give us a brief introduction or guideline of how to train and test on scenes?
We do plan to upload code and a readme for scenes too, but this might take some time, unfortunately.

Q3. Yes, using the same setup should be fine, details can be found in the supplementary.

Q4. See my answer on Q1. We used a network checkpoint obtained from ShapeNet training, as initialization to speed up training.
"I also tried loading your pre-trained model on cars to continue train on scenes but the loss is bigger than the ones trained on scenes (0.02 and 0.003), which is really weird."
This probably suggests that something within your training setup of scenes has a bug.

Q5. Do you use different dense pc generation strategies for cars and scenes? Do you use the same script for both of them?
Yes, we used the same script.

Summary: probably there is something off in your training code, maybe some alignment issue.
You added qualitative examples, but did not comment - what do you want to illustrate there?

Best,
Julian

ZhengdiYu (Author) commented Jun 30, 2021


Hey @jchibane ,

Thank you for your detailed reply. Responding to your points:

First of all, I don't think it is an alignment issue, because I just use your preprocessing script and adapt the filename and data_path of the .npz data so that your dataloader can read the scene data; that's all I did. There is no transformation operation during training, I just feed the '/pymesh_boundary_{}_samples.npz' files the script produces to train the network. The only difference between cars and scenes is the preprocessing; the other scripts are exactly the same.

Q1. Does this mean that I can load a pre-trained model from cars and then continue training for 3~4 days on scenes with lr 1e-4? Does this mean we need about a week in total to train on scenes?
Q2. Thanks for that.
Q3. Alright, but I couldn't find anything about the sample number. Could you tell me how many points you feed to the network at once? Can I assume the batch size is just 1 and the threshold is 1? Do you feed all the points of a single cube into the network at once instead of 50,000?
(For example, given three boundary sampled point clouds a, b, c, whose point counts should be similar, the sample num could simply be set to a.shape[0], and then use the strategy you describe in the supplementary: 1% of samples from σ = 0.16, 49% of samples from σ = 0.04 and 50% of samples from σ = 0.01. The final sample would be a mix of 1% of a, 49% of b and 50% of c; see the sketch below.) Is this correct?
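
A minimal sketch of the mixing I have in mind (a, b, c are the per-cube point arrays from the σ = 0.16 / 0.04 / 0.01 files; names are illustrative, and the gt df arrays would have to be subsampled with the same indices):

    import numpy as np

    def mix_boundary_samples(a, b, c, sample_num):
        """1% of samples from a (sigma=0.16), 49% from b (sigma=0.04), 50% from c (sigma=0.01)."""
        n_a = int(0.01 * sample_num)
        n_b = int(0.49 * sample_num)
        n_c = sample_num - n_a - n_b
        pick = lambda x, n: x[np.random.choice(len(x), n, replace=False)]
        return np.concatenate([pick(a, n_a), pick(b, n_b), pick(c, n_c)], axis=0)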

Q4. This is what I meant: 0.02 and 0.003 are the losses on scenes that I get with your pre-trained model and with my model trained on scenes, respectively; the pre-trained model has the higher loss but the better performance.
0.02 is the loss at the very first step when I load your pre-trained model to train on scenes, i.e. before any training from that checkpoint. 0.003 is the converged loss of the model trained solely on scenes. Why is this weird? As you can see from the qualitative results, the best-performing setting is using the pre-trained model directly, without any training. So if we load your pre-trained model to train on scenes, its loss at the starting point (the first step, before training) should already be lower than the loss converged on scenes. However, it is not.

Do you also load the learning rate? I'll keep trying this one and post updates.

Q5. Isn't 90K points per cube too much? Some of the scenes have hundreds of cubes.

About your summary ("probably there is something off in your training code, maybe some alignment issue"):
I don't think this is the case, because I just use your preprocessing code and adapted your dataloader to feed these cubes into the network. There is no transformation operation at all.

Updates (7.7.2021):
I tried loading your cars pre-trained model and continuing to train on scenes for another 4 days, but the performance is still not as good as directly using the pre-trained model. It seems something is wrong with the data. Could you check your preprocessing scripts and provide a model pre-trained on scenes? The split.txt you provided also has some problems (see below). Thank you very much.


About the qualitative examples: as the title of each picture indicates, I did several kinds of evaluation. Let me put it more clearly:

  • Setting 1: pictures 3 and 4 (no training phase, just testing). Load a cars pre-trained model and directly test its generalization ability on scenes. I got the best qualitative results with this setting, but note that its loss value is the highest (0.02).

  • Setting 2: pictures 1 and 2. Trained on Gibson scenes from scratch. The loss value is much lower than setting 1 (0.003), but as you can see from the pictures, the results are much worse than setting 1.

  • Setting 3: no picture presented. Load a cars pre-trained model and continue training on scenes (basically what you suggested in your reply to Q1). I haven't trained it long enough yet; the loss has decreased from 0.02 (as in setting 1) to about 0.003~0.005. However, the qualitative performance has also degenerated compared to setting 1.

Conclusion:
I also think there might be two reasons.

  1. The training data (e.g. the distance fields) are wrong. However, I was just using the script you provided, only switching pymesh to igl because our server cannot use pymesh. Apart from that, I used exactly the same script as for cars. The dataloader is also adapted from the cars one, just feeding the cubes to the network one by one. It doesn't make sense, because the loss is lower but the performance is poorer, as you can see.

  2. Maybe I should train the model for a longer time.

By the way, there are 2 possible bugs in the data (a small sanity check is sketched after this list):

  1. split.txt has an error too. In 'split_scenes_name.npz' there is a name 'BiltmoreCallicoon' in the test split, which should be two scenes ('Biltmore' and 'Callicoon') but has somehow become one name. Also, the name 'Cebolla' appears twice, in both the val and the test split.
  2. Some boundary sample files contain no points at all, which leads to NaN when computing the average loss.
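
A quick sanity check along these lines might help (a sketch; the .npz keys and directory layout are my assumptions about the preprocessing output, so adjust as needed):

    import numpy as np
    from glob import glob

    # 1) look for scene names that appear in more than one split
    split = np.load('split_scenes_name.npz', allow_pickle=True)
    val, test = set(split['val']), set(split['test'])  # assumed key names
    print('scenes in both val and test:', val & test)  # e.g. 'Cebolla'

    # 2) look for cubes whose boundary sample files contain no points (these give NaN losses)
    for path in glob('processed_scenes/*/*/*_samples.npz'):  # hypothetical layout
        if np.load(path)['points'].shape[0] == 0:  # assumed key name
            print('empty cube:', path)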

Here are some screenshots taken during the dense point cloud generation to help you see the problem; I printed the pred_df at each refinement step:

[screenshots: pred_df printed at each refinement step for the different settings]

We can see that after each refinement step, setting 2's df sometimes even remains unchanged; it seems the points are not moving towards the surface at all but circling around it. So either the df or the gradients are wrong, I think. And when I load the cars model and continue training on scenes, the behavior gradually becomes similar to setting 2's.

Best,
Zhengdi

jchibane (Owner) commented:

Hey Zhengdi,

you can try whether our trained model for scenes works for you:
https://nextcloud.mpi-klsb.mpg.de/index.php/s/QMoPT26Dnp92yL6

Best,
Julian

ZhengdiYu (Author) commented Jul 15, 2021


Hi Julian,

Thank you for your model. I tried it today. Unfortunately, the problem still exists: the iteration takes much longer than with the cars model, and I found the performance is not even better than before. The inside is good, but there are too many false points outside and on the walls.

Are you sure you are using the same settings in the generation script? I use exactly the same script as for cars to generate the dense point cloud for each cube, and then transform the cubes into the world coordinate system. In Open3D it looks like this:
[screenshots: fused scene point cloud viewed in Open3D]
The inside looks good, but there are too many false points outside the room.

The only difference between cars and scenes is that we need to fuse the individual cubes into the complete scene after generation. But I don't think the way I fuse the cubes is the problem, because the performance with the cars model looks reasonable (see the qualitative results in my previous comment). With this model, however, I can't get a better result.

This is how I transform each cube into the world coordinate system and stack them together, just a few lines of code based on your scene_process.py:

your scene_process.py:
https://github.com/aymenmir1/ndf-1/blob/7919efe87f577399b5b3d64239091fbc1261dbe1/dataprocessing/scene_process.py#L156-L167

            # select the boundary samples that fall inside this 2.5 m cube
            verts_inds = find_verts(boundary_points, min_x, min_x + 2.5, min_y, min_y + 2.5, min_z, min_z + 2.5)

            cube_df = df[verts_inds]
            cube_points = boundary_points[verts_inds]

            # world coords -> cube-local coords (origin at the cube corner)
            cube_points2 = cube_points[:] - cube_corner
            # swap the x and z axes, then map [0, 2.5] -> [-1, 1] grid coordinates
            grid_cube_points = cube_points2.copy()
            grid_cube_points[:, 0], grid_cube_points[:, 2] = cube_points2[:, 2], cube_points2[:, 0]
            grid_cube_points = grid_cube_points / 2.5
            grid_cube_points = 2 * grid_cube_points - 1

            out_path = output_path + '/{}/{}/'.format(scan_name, cube_corner)
            os.makedirs(out_path, exist_ok=True)

How I fuse individual cubes:

import numpy as np
from glob import glob
from os.path import join

for scene in scene_list:
    cube_path = join(dense_point_cloud_path, scene)

    duration = 0
    scene_point_cloud = np.zeros((0, 3))
    for cube_name in glob(cube_path + '/*'):
        generation = np.load(cube_name)
        grid_cube_points = generation['point_cloud']
        duration += generation['duration']

        # recover the cube corner from the directory name written by scene_process.py
        cube_corner = cube_name.split('_')[0].split('/')[-1][1:-1]
        cube_corner = np.array(cube_corner.split(', ')).astype(float)

        # invert the preprocessing transform: [-1, 1] grid coords -> cube-local -> world
        cube_points = grid_cube_points.copy()
        cube_points[:, 0], cube_points[:, 2] = grid_cube_points[:, 2], grid_cube_points[:, 0]
        cube_points = (cube_points + 1) / 2
        cube_points = cube_points * 2.5
        cube_points = cube_points[:] + cube_corner

        scene_point_cloud = np.vstack((scene_point_cloud, cube_points))

    print(cube_path)
    np.savez(cube_path + '/' + 'dense_point_cloud_{}'.format(num_steps),
             point_cloud=scene_point_cloud, duration=duration)
    print(f'finish {scene}')

I also tried decreasing the sampling range here from [-1.5, 1.5] to [-1, 1] (i.e. * 2 - 1); this helps eliminate some of the false points, but many of them remain:

ndf/models/generation.py, lines 27 to 30 (commit 570d770):

    sample_num = 200000
    samples_cpu = np.zeros((0, 3))
    samples = torch.rand(1, sample_num, 3).float().to(self.device) * 3 - 1.5
    samples.requires_grad = True

[screenshot: fused scene after decreasing the sampling range]

Best,
Zhengdi

ZhengdiYu (Author) commented:

Hey Julian,

I found that the model you gave is recorded as 150 hours and 108 epochs. I was wondering how many epochs were for cars and how many for scenes? It takes ~100 hours to train 40 epochs on cars and ~200 hours to train 20 epochs on scenes, so it seems impossible either way to train ~100 epochs in 150 hours.

Could you tell me? Thank you!

Best,
Zhengdi

miraymen commented:

Hi Zhengdi

Thanks for pointing out the problems in the scene processing. Both of them should be resolved now.

Thanks
Aymen

ZhengdiYu (Author) commented:

Hey Aymen,

Are there any other updates? I only found that you updated the split file, but there are still no scripts for training and testing on scenes.

Best,
Zhengdi

miraymen commented:

There are also some changes in the scene processing file.

ZhengdiYu (Author) commented:

I can only see that you now skip cubes where len(verts_inds) == 0, plus some path fixes. However, I don't think these changes make a big difference to training or to the performance. I still can't reproduce the performance on scenes with your provided model and your scripts.

jchibane (Owner) commented:

Have you tried warm-starting the model with a pre-trained model?

ZhengdiYu (Author) commented:

Have you tried warm-starting the model with a pre-trained model?

Yes, I did. It's even worse than directly using the pre-trained model: the longer I train on scenes, the worse the performance. I don't know whether that's because I was using igl to compute the UDF.

ZhengdiYu (Author) commented:

Have you tried warm-starting the model with a pre-trained model?

Is it due to a wrong df computed by np.abs(igl.signed_distance())? I saw another issue mentioning that this cannot be used to compute the df for open surfaces. But I think you have also used this function on the cars' open surfaces.
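
For reference, this is roughly how I compute the ground-truth df with igl instead of pymesh (a sketch of my own adaptation, not code from your repo). As far as I understand, the magnitude returned by igl.signed_distance is just the closest-point distance, so only the sign should be unreliable on open surfaces; igl.point_mesh_squared_distance avoids the sign computation entirely:

    import igl
    import numpy as np

    def unsigned_distance(query, v, f):
        """query: (N, 3) sample points; v: (V, 3) mesh vertices; f: (F, 3) faces."""
        # what I currently use: absolute value of the signed distance
        s, _, _ = igl.signed_distance(query, v, f)
        df_from_signed = np.abs(s)

        # alternative without any sign computation (should match the magnitude above)
        sqr_d, _, _ = igl.point_mesh_squared_distance(query, v, f)
        df_unsigned = np.sqrt(sqr_d)
        return df_unsigned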

ausumezbas commented:

@ZhengdiYu have you had any luck resolving any of the issues you brought up in this thread?

I am also trying to train and test NDF on Gibson scenes dataset. Would you be able to share the dataloader files and training configs that you used for Gibson scenes dataset?

I have pre-processed the data using the scene_process.py script in the NDF-1 repository, and I'm trying to figure out how to get started with training the network on these cubes. Is it correct that we can only use a batch size of 1 for the Gibson scenes dataset because each cube has a different number of points?

Thanks!
