Feature: Add Imbalanced Data Policy to Active Learning Environment #7

@isaquepim

Context

Currently, the active learning environment does not explicitly handle class imbalance. This can hurt model performance, particularly on minority classes, which receive few labeled examples when queries follow the skewed distribution of the unlabeled pool.

There is relevant literature that could guide this implementation, such as:
Minority Class Oriented Active Learning for Imbalanced Datasets
https://arxiv.org/pdf/2202.00390

Proposal

Add an imbalance-aware data policy to the Active Learning module.

Possible directions:

  • Implement a minority-class-oriented sampling strategy
  • Introduce heuristics to prioritize underrepresented classes
  • Allow dynamic policy adjustment based on dataset distribution
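
To make the first two directions concrete, here is a minimal sketch of one possible heuristic: weight each pool sample's predictive entropy by the expected inverse frequency of its class in the labeled set, so uncertain samples that are likely to belong to a rare class are queried first. This is not the method from the paper linked above, and none of these names exist in the codebase yet; `inverse_frequency_weights`, `entropy`, and `select_queries` are hypothetical, and the add-one smoothing is an assumption.

```python
import math
from collections import Counter


def inverse_frequency_weights(labeled_labels, classes):
    """Weight each class by the inverse of its smoothed share of the labeled set."""
    counts = Counter(labeled_labels)
    # Add-one smoothing (an assumption) keeps unseen classes from dividing by zero.
    total = len(labeled_labels) + len(classes)
    return {c: total / (counts[c] + 1) for c in classes}


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, classes, labeled_labels, k):
    """Return indices of the k pool samples with the highest imbalance-aware score.

    pool_probs[i][j] is the model's predicted probability that pool sample i
    belongs to classes[j].
    """
    weights = inverse_frequency_weights(labeled_labels, classes)
    scored = []
    for idx, probs in enumerate(pool_probs):
        # Expected class weight under the model's own prediction.
        expected_weight = sum(p * weights[c] for p, c in zip(probs, classes))
        scored.append((expected_weight * entropy(probs), idx))
    # Highest score first; ties are broken by pool index for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:k]]
```

Because the weights are recomputed from the current labeled set on every call, the ranking shifts as the class distribution changes across AL cycles, which is one simple way to get the dynamic adjustment mentioned in the last bullet.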

Open Questions

  • Should we support heuristic rules based on RegEx queries?
  • Should the active learning policy be dynamically adjusted based on RegEx-defined filters?
  • How configurable should the imbalance strategy be (fixed strategy vs user-defined)?
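
To ground the RegEx questions, here is a rough sketch of what user-defined rules could look like, assuming the samples are raw text and that the policy already produces a per-sample query score. `RegexRule` and `apply_regex_rules` are hypothetical names for illustration, not existing API, and the multiplicative boost is only one possible design.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegexRule:
    pattern: str  # regular expression searched in the raw sample text
    boost: float  # multiplier applied to the query score when the pattern matches


def apply_regex_rules(texts, base_scores, rules):
    """Return query scores after multiplying in the boost of every matching rule."""
    compiled = [(re.compile(rule.pattern, re.IGNORECASE), rule.boost) for rule in rules]
    adjusted = []
    for text, score in zip(texts, base_scores):
        for regex, boost in compiled:
            if regex.search(text):
                score *= boost
        adjusted.append(score)
    return adjusted


# Example: boost samples that mention terms tied to an underrepresented class.
rules = [RegexRule(r"\brefund\b", 2.0), RegexRule(r"chargeback", 1.5)]
texts = ["payment refund requested", "hello world", "chargeback and refund"]
scores = apply_regex_rules(texts, [1.0, 1.0, 1.0], rules)
```

Keeping the rules as plain data would let a fixed default strategy and a user-defined one share the same code path: the fixed strategy is simply an empty rule list.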

💡 Future Related Ideas

  • Model personalization based on feature subsets
  • Adaptive strategies that evolve during AL cycles
  • Visualization of class distribution over iterations
