Find features that qualify as training data #30

@hafezissa

Expanding on ISSUE 19, we want to narrow down the features used for training. Ideally, there should be eligibility criteria that determine whether a feature can be used for training. We have an idea of what those criteria should look like:

  • Look only at float64 and int64 columns. Some float64 columns should be converted to integers.
  • Disregard features that have strictly one unique value, i.e. all ones, all zeros, or all nulls.
  • Calculate the percentage of NULLs for each feature; if it is higher than 1/3 (this threshold is just an example), disregard that feature.
  • Are there any other ways we can eliminate features?

DoD: Number/List of features that pass the criteria.
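
A minimal sketch of how the three criteria above could be checked, assuming the features live in a pandas DataFrame. The function names, the `max_null_fraction` parameter, and the sample columns are hypothetical and only there for illustration; the 1/3 default is the example threshold from the list, not a decided value.

```python
from typing import List

import pandas as pd


def eligible_features(df: pd.DataFrame, max_null_fraction: float = 1 / 3) -> List[str]:
    """Return the columns that pass the dtype, constant-value, and null-share checks."""
    # Criterion 1: keep only float64 and int64 columns.
    numeric = df.select_dtypes(include=["float64", "int64"])

    keep = []
    for name in numeric.columns:
        col = numeric[name]
        # Criterion 2: drop columns with a single unique value.
        # dropna=False counts NaN as a value, so an all-null column is caught here.
        # (Assumption: a constant column with a few nulls is NOT treated as constant.)
        if col.nunique(dropna=False) <= 1:
            continue
        # Criterion 3: drop columns whose share of nulls exceeds the threshold.
        if col.isna().mean() > max_null_fraction:
            continue
        keep.append(name)
    return keep


def integer_like_floats(df: pd.DataFrame, columns: List[str]) -> List[str]:
    """Return the float64 columns whose non-null values are all whole numbers."""
    result = []
    for name in columns:
        col = df[name]
        if col.dtype == "float64" and (col.dropna() % 1 == 0).all():
            result.append(name)
    return result


# Made-up sample data covering each case in the criteria.
nan = float("nan")
df = pd.DataFrame(
    {
        "age": [34, 51, 28, 45, 62, 39],               # int64, passes
        "score": [0.5, 1.25, nan, 3.0, 2.75, 0.1],     # float64, 1/6 null, passes
        "visits": [1.0, 2.0, 3.0, nan, 5.0, 6.0],      # float64 holding whole numbers
        "all_zero": [0, 0, 0, 0, 0, 0],                # one unique value
        "all_null": [nan, nan, nan, nan, nan, nan],    # one unique value (NaN)
        "mostly_null": [1.0, nan, nan, nan, 2.0, nan], # 4/6 null, above 1/3
        "label": ["a", "b", "c", "d", "e", "f"],       # object dtype, ignored
    }
)

features = eligible_features(df)
int_candidates = integer_like_floats(df, features)
print(len(features), features)
print(int_candidates)
```

The printed count and list would cover the DoD. Columns returned by `integer_like_floats` are only candidates for integer conversion; if they contain nulls, a plain int64 cast will fail, so pandas' nullable `Int64` dtype is one option to consider.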
