GroupMultiSourceSampler has a dead loop #12327

Open
chenjiaqiang-a opened this issue Mar 12, 2025 · 0 comments

Describe the bug

While using my own dataset for semi-supervised object detection, data loading hung in an infinite loop without raising any error. After investigation, I traced the problem to a logic flaw in GroupMultiSourceSampler.

GroupMultiSourceSampler divides the data into four parts by group and source. There are two groups: images whose width is greater than or equal to their height, and images whose width is less than their height. There are two sources: labeled data and unlabeled data. The figure below shows the four parts.

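[Figure: the dataset divided into four parts by group (width >= height, width < height) and source (labeled, unlabeled)]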
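The four-way split described above can be sketched in plain Python. This is only an illustration, not the library's code: the function name `partition_by_source_and_group`, the group numbering (0 for width < height, 1 for width >= height), and the toy image sizes are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height)


def partition_by_source_and_group(
        sources: List[List[Size]]) -> Dict[Tuple[int, int], List[int]]:
    """Split each source's image indices into two aspect-ratio groups.

    Hypothetical helper: group 0 holds images with width < height,
    group 1 holds images with width >= height.
    """
    parts = {(source, group): []
             for source in range(len(sources)) for group in (0, 1)}
    for source, images in enumerate(sources):
        for idx, (width, height) in enumerate(images):
            group = 0 if width < height else 1
            parts[(source, group)].append(idx)
    return parts


# Toy data mirroring the situation in this report: every labeled image is
# landscape or square, while the unlabeled images cover both groups.
labeled = [(640, 480), (800, 800)]
unlabeled = [(480, 640), (1024, 768), (300, 500)]
parts = partition_by_source_and_group([labeled, unlabeled])

for key in sorted(parts):
    print(key, parts[key])
```

With this toy data, the (labeled, width < height) part is empty, which is the configuration that triggers the problem.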

In my dataset, every image in the labeled data has a width greater than or equal to its height, while the unlabeled data contains images from both groups. One of the four parts (labeled, width less than height) is therefore empty, which causes a problem in the following code:

```python
class GroupMultiSourceSampler(MultiSourceSampler):
    def __init__(self,
......
        self.group_source2inds = [{
            source:
            self._indices_of_rank(self.group2size_per_source[source][group])
            for source in range(len(dataset.datasets))
        } for group in range(len(self.group_ratio))]
......
    def __iter__(self) -> Iterator[int]:
        batch_buffer = []
        while True:
            group = np.random.choice(
                list(range(len(self.group_ratio))), p=self.group_ratio)
            for source, num in enumerate(self.num_per_source):
                batch_buffer_per_source = []
                for idx in self.group_source2inds[group][source]:
......
```

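The infinite loop occurs in _infinite_indices. When sample_size is 0, which appears to be the case for the empty part, each pass of the while loop yields nothing, so the generator never produces an index: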

```python
class MultiSourceSampler(Sampler):
......
    def _infinite_indices(self, sample_size: int) -> Iterator[int]:
        """Infinitely yield a sequence of indices."""
        g = torch.Generator()
        g.manual_seed(self.seed)
        while True:
            if self.shuffle:
                yield from torch.randperm(sample_size, generator=g).tolist()
            else:
                yield from torch.arange(sample_size).tolist()
......
```
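The behavior can be reproduced without torch. The sketch below is a simplified stand-in for the generator above, not the library code: it drops shuffling and seeding, and it adds a hypothetical `max_passes` guard purely so that the demonstration terminates. Without that guard, asking for the first index of an empty part would spin forever.

```python
from typing import Iterator, Optional


def infinite_indices(sample_size: int,
                     max_passes: Optional[int] = None) -> Iterator[int]:
    """Simplified stand-in for _infinite_indices (no torch, no shuffle).

    max_passes is a hypothetical guard used only in this demo; with
    max_passes=None the loop is unbounded, as in the original generator.
    """
    passes = 0
    while max_passes is None or passes < max_passes:
        # For sample_size == 0 this yields nothing, so control goes
        # straight back to the top of the while loop.
        yield from range(sample_size)
        passes += 1


def first_index(sample_size: int, max_passes: int) -> Optional[int]:
    """Return the first yielded index, or None if the guard ran out."""
    return next(infinite_indices(sample_size, max_passes), None)


non_empty = first_index(3, max_passes=1000)  # yields immediately
empty = first_index(0, max_passes=1000)      # 1000 empty passes, no index
print(non_empty, empty)
```

Any consumer that iterates such a generator while waiting for an index, like the inner for loop in `__iter__`, blocks indefinitely once the empty part's group is drawn.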

I am not sure whether this needs to be fixed, since the program runs correctly after I switched to MultiSourceSampler.
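If a fix is wanted, one possible direction is to detect empty parts at construction time and either raise a clear error or stop sampling the affected group. The sketch below is an assumption-laden illustration, not a patch for the library: `empty_partitions` and `safe_group_ratio` are hypothetical helpers, and the shape of `group2size_per_source` (one dict per source, mapping group to size) is inferred from how it is indexed in the snippet above.

```python
from typing import Dict, List, Tuple


def empty_partitions(
        group2size_per_source: List[Dict[int, int]]) -> List[Tuple[int, int]]:
    """Return every (source, group) pair that contains no images."""
    return [(source, group)
            for source, sizes in enumerate(group2size_per_source)
            for group, size in sizes.items() if size == 0]


def safe_group_ratio(
        group2size_per_source: List[Dict[int, int]]) -> List[float]:
    """Group sampling probabilities that skip unusable groups.

    A group is treated as unusable if any source has no images in it,
    because a batch for that group could never be filled.
    """
    groups = sorted(group2size_per_source[0])
    totals = []
    for group in groups:
        sizes = [per_source[group] for per_source in group2size_per_source]
        totals.append(0 if min(sizes) == 0 else sum(sizes))
    total = sum(totals)
    if total == 0:
        raise ValueError('every group has an empty source; '
                         f'empty parts: {empty_partitions(group2size_per_source)}')
    return [t / total for t in totals]


# Assumed numbering for this demo: group 0 = width < height,
# group 1 = width >= height. Source 0 is labeled, source 1 is unlabeled.
sizes = [{0: 0, 1: 100}, {0: 40, 1: 60}]
missing = empty_partitions(sizes)
ratio = safe_group_ratio(sizes)
print(missing, ratio)
```

The trade-off is that the 40 unlabeled images in the dropped group would never be sampled, so raising an error that names the empty part may be the safer default.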

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.