Firstly, why are the test results better than the training results?
In FLD, I noticed that your validation (val) results report an mAP of 0.754, while the best result in the training log is only 0.72456. I would like to understand why the test results are better than the training results. As you mention in your paper, the FLD dataset is split in half, with one part for training and the other for testing (val), so I would expect the best mAP reached during training to match the best val result. Indeed, the best result in both cases occurs at epoch 19, as seen in https://github.com/tusharsangam/TransVisDrone/blob/main/runs/train/FL/image_size_1280_temporal_YOLO5L_5_frames_FL_end/results.csv. However, the mAP values at epoch 19 differ: 0.72456 in the training log versus 0.754 reported for val. Does the training log correspond to the reported value, or were different data augmentations or evaluation settings used to produce the published results?
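For reference, this is roughly how I am reading the best-epoch number out of results.csv; it is only a sketch that assumes the standard YOLOv5-style column names (e.g. `metrics/mAP_0.5`, padded with spaces), so please correct me if your log uses different columns:

```python
# Minimal sketch (not from the repo): extract the best-epoch mAP from results.csv.
# Assumes YOLOv5-style column names, which may not match this training log exactly.
import pandas as pd

df = pd.read_csv("results.csv")
df.columns = df.columns.str.strip()  # YOLOv5 logs pad column names with spaces

best = df.loc[df["metrics/mAP_0.5"].idxmax()]
print(f"best epoch: {int(best['epoch'])}")
print(f"mAP@0.5 at best epoch: {best['metrics/mAP_0.5']:.5f}")
```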
Secondly, I would like to confirm that all the models in Table 1 of your paper were compared experimentally on the NPS and FLD datasets. Is that correct? That is, all methods were trained and evaluated with the same image annotations, right?
Thirdly, regarding the specific quantity of image annotations used in your paper: you mention using the Dogfight annotations (https://github.com/mwaseema/Drone-Detection/tree/main/annotations). Since the Dogfight annotations cover significantly fewer frames than the original dataset, I would like to ask whether you used only the frames re-annotated by Dogfight, or whether you combined the original annotations with Dogfight's detailed annotations. For example, the FLD dataset originally has 38,948 frames, but Dogfight provides detailed annotations for only 20,017 of them. Did you use the union of the original and corrected versions, or only the frames with detailed annotations?
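To make the question concrete, this is the kind of check I have in mind; the directory names and file extensions below are hypothetical placeholders, not the actual layout of your data:

```python
# Hypothetical illustration: count how many extracted FLD frames have a matching
# annotation file. The "FLD/images" and "FLD/labels" paths are placeholders.
from pathlib import Path

frames = {p.stem for p in Path("FLD/images").glob("*.jpg")}
labels = {p.stem for p in Path("FLD/labels").glob("*.txt")}

print(f"total frames:       {len(frames)}")
print(f"annotated frames:   {len(frames & labels)}")
print(f"unannotated frames: {len(frames - labels)}")
```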
Fourthly, I am curious whether detection is run only on frames that have labels, or on all frames, regardless of whether they have detailed annotations.