
Performance Evaluation on Pavement Crack Datasets | Discussion #159

Open
mzg0108 opened this issue Jan 7, 2025 · 2 comments

Comments

@mzg0108

mzg0108 commented Jan 7, 2025

Hi,
I wanted to evaluate the model on a pavement crack dataset, but the performance was not great. The model seems to work very well on other datasets, but for pavement crack images it doesn't live up to expectations.
I have looked at the pavement images from the DIS5K dataset together with the predictions that the author has shared. The evaluation shows an F1-score of around 85% and a mean IoU of around 75%. While those scores are not bad, they are not extraordinary either.
Here are some of the images from the DIS5K dataset, with input images, GT, and predictions. I looked into the predictions of the pvt_v2_b2 model.
[Six example images: input, GT, and pvt_v2_b2 prediction]

When training BiRefNet from scratch or fine-tuning it on pavement crack datasets, can you share what I should modify to obtain a better score and very precise segmentation of the cracks?

Currently, I trained and fine-tuned (in two separate experiments) BiRefNet (pvt_v2_b2) on the DeepCrack dataset and obtained only okay scores. The dataset contains 300 training samples.
Scores:

| Experiment | Mean IoU | Mean F1-score |
| --- | --- | --- |
| Training from scratch | 0.570722621 | 0.726700708 |
| Fine-tuning | 0.575622402 | 0.730660343 |
A good F1-score on DeepCrack in the literature is over 85%.
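
For reference, here is a minimal sketch of how mean IoU and F1 can be computed over binary crack masks (this is not the repo's evaluation script; the 128 threshold, PNG format, and matching filenames are assumptions):

```python
import numpy as np
from PIL import Image
from pathlib import Path

def binary_iou_f1(pred, gt):
    """IoU and F1 (Dice) for two boolean masks of equal shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    denom = pred.sum() + gt.sum()              # 2*TP + FP + FN
    f1 = 2 * inter / denom if denom else 1.0
    return iou, f1

def evaluate(pred_dir, gt_dir, thr=128):
    """Average per-image IoU/F1; assumes prediction and GT PNGs share filenames."""
    ious, f1s = [], []
    for gt_path in sorted(Path(gt_dir).glob("*.png")):
        gt = np.array(Image.open(gt_path).convert("L")) >= thr
        pred = np.array(Image.open(Path(pred_dir) / gt_path.name).convert("L")) >= thr
        iou, f1 = binary_iou_f1(pred, gt)
        ious.append(iou)
        f1s.append(f1)
    return float(np.mean(ious)), float(np.mean(f1s))

# mean_iou, mean_f1 = evaluate("preds/deepcrack", "gts/deepcrack")
```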

During training from scratch or fine-tuning, I didn't freeze any part of the model, not even the backbone. Should I try that?

@ZhengPeng7
Owner

Hi, thanks for your interest and for the experiments you conducted. I have been too tired these days due to illness and have only just come back to reply to you.

I also downloaded the DeepCrack dataset. The images there seem to be low-resolution. You'd better change the resolution in the configuration to a lower one (e.g., 512x512).
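
For illustration, the change could look roughly like this (the attribute name is an assumption; check config.py in the repo for the actual field that controls resolution):

```python
# Illustrative sketch only -- not the exact config.py of BiRefNet.
class Config:
    def __init__(self):
        # ... other settings unchanged ...
        # Lower training/inference resolution for low-resolution crack images
        # such as DeepCrack (BiRefNet normally trains at a higher resolution).
        self.size = (512, 512)
```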

Since you can obtain decent results from the general model without any crack-specific training, I think it's very possible to achieve a very high score by fine-tuning or training from scratch on crack data.
The BiRefNet with the pvt_v2_b2 backbone was trained only on DIS5K, for a more comprehensive and fair comparison in the original paper, so I don't particularly recommend it; otherwise, I would have deployed that version on HuggingFace.

The backbone is better left unfrozen, as I found in my experiments a long time ago.
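
For completeness, toggling the backbone's trainability in PyTorch would look roughly like this (`model.bb` as the backbone attribute is an assumption; adjust it to the actual module name in the repo):

```python
def set_backbone_trainable(model, trainable: bool = True):
    """Toggle gradients for the backbone; keep it trainable when fine-tuning."""
    for p in model.bb.parameters():
        p.requires_grad = trainable

# Recommended: leave everything trainable.
# set_backbone_trainable(model, trainable=True)
```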

I suggest you fine-tune or train from scratch with the standard BiRefNet, using swin_v1_l or swin_v1_b as the backbone. Benefiting from the FP16 support I added a week ago and the 512x512 resolution, training should be possible on 24 GB GPUs like the 4090 with a batch size >= 2.
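
A generic FP16 (AMP) fine-tuning loop at 512x512 could look like the sketch below; the tiny stand-in model, loss, and random batches are placeholders for BiRefNet, its loss, and a DeepCrack DataLoader, not the repo's actual train.py:

```python
import torch
import torch.nn as nn

# Placeholders so the sketch runs standalone; swap in BiRefNet, its loss,
# and a real DataLoader over DeepCrack.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1)).cuda()
criterion = nn.BCEWithLogitsLoss()
train_loader = [(torch.randn(2, 3, 512, 512), torch.rand(2, 1, 512, 512)) for _ in range(4)]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()         # FP16 gradient scaling

for images, gts in train_loader:             # batch size >= 2 at 512x512
    images, gts = images.cuda(), gts.cuda()
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():          # FP16 autocast on CUDA
        loss = criterion(model(images), gts)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```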

@mzg0108
Author

mzg0108 commented Jan 17, 2025

Thank you so much for your response. I greatly appreciate it.
I am sorry to hear about your illness. I hope you get well soon and fully.

I'll try the swin models.

Thank you.
