Dataset Preparation for ML #159

@vbramhadevi

Description

Prepare the labeled dataset for machine learning by cleaning, formatting, and structuring it for training. This step ensures high-quality input data for the model. All outputs must be stored inside the inference/ folder.

Tasks

  • Create script prepare_data.py inside inference/
  • Read dataset from inference/labeled_flight_data.csv
  • Remove unnecessary columns (e.g., timestamp if not used for training)
  • Handle missing or null values (drop or fill appropriately)
  • Encode event_label into a numerical format (e.g., label encoding)
  • Select input features: altitude, heading, vertical_speed, velocity, roll, pitch, yaw, g_force
  • Separate features (X) and labels (y)
  • Split dataset into training and testing sets (e.g., 80/20 split)
  • Normalize or scale feature values
  • Save and push the processed dataset to inference/dataset
  • Push code to inference/
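
The tasks above could be sketched roughly as follows. This is a minimal outline, not a prescribed implementation: it assumes pandas and scikit-learn are available (the issue does not name any libraries), uses standard scaling as one possible normalization, and the output file names `train.csv`, `test.csv`, and `label_classes.csv` are hypothetical.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

FEATURES = ["altitude", "heading", "vertical_speed", "velocity",
            "roll", "pitch", "yaw", "g_force"]
LABEL = "event_label"
SRC = Path("inference/labeled_flight_data.csv")
OUT = Path("inference/dataset")


def prepare(df, test_size=0.2, seed=42):
    """Clean, encode, split, and scale a labeled flight DataFrame."""
    # Keep only the training columns (this drops timestamp and other extras),
    # then drop rows with missing values. Filling is an alternative.
    df = df[FEATURES + [LABEL]].dropna()

    # Map each distinct event_label value to an integer class id.
    encoder = LabelEncoder()
    y = encoder.fit_transform(df[LABEL])
    X = df[FEATURES].astype(float)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit the scaler on the training split only so test statistics do not leak.
    scaler = StandardScaler().fit(X_train)
    train = pd.DataFrame(scaler.transform(X_train), columns=FEATURES)
    test = pd.DataFrame(scaler.transform(X_test), columns=FEATURES)
    train[LABEL] = y_train
    test[LABEL] = y_test
    return train, test, encoder.classes_.tolist()


def main():
    OUT.mkdir(parents=True, exist_ok=True)
    train, test, classes = prepare(pd.read_csv(SRC))
    # Hypothetical output names; the issue only fixes the target folder.
    train.to_csv(OUT / "train.csv", index=False)
    test.to_csv(OUT / "test.csv", index=False)
    # Persist the label order so encoded ids can be mapped back later.
    pd.Series(classes, name=LABEL).to_csv(OUT / "label_classes.csv", index_label="id")


if __name__ == "__main__":
    if SRC.exists():
        main()
    else:
        print(f"{SRC} not found; run from the repository root")
```

Dropping incomplete rows is the simplest way to handle nulls. If event classes are imbalanced, passing `stratify=y` to `train_test_split` would keep class proportions similar in both splits.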

Acceptance Criteria

  • Dataset is cleaned and ready for ML training
  • Features and labels are correctly separated
  • Train-test split is implemented
  • Encoded labels are consistent and valid
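
One way to make these criteria checkable is a small validation helper. The sketch below is an assumption-laden example: it supposes the two splits are pandas DataFrames holding the eight feature columns followed by an integer `event_label` column, and the function name `check_splits` is hypothetical.

```python
import pandas as pd

FEATURES = ["altitude", "heading", "vertical_speed", "velocity",
            "roll", "pitch", "yaw", "g_force"]
LABEL = "event_label"


def check_splits(train, test):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for name, part in (("train", train), ("test", test)):
        # Features and label must be present, in the expected order.
        if list(part.columns) != FEATURES + [LABEL]:
            problems.append(f"{name}: unexpected columns")
            continue
        if len(part) == 0:
            problems.append(f"{name}: split is empty")
        if part.isna().any().any():
            problems.append(f"{name}: contains missing values")
        if not pd.api.types.is_integer_dtype(part[LABEL]):
            problems.append(f"{name}: labels are not integer encoded")
    if not problems:
        # Every class id seen in test should also appear in train.
        unseen = set(test[LABEL]) - set(train[LABEL])
        if unseen:
            problems.append(f"test: class ids missing from train: {sorted(unseen)}")
    return problems
```

An empty return value indicates that the splits are free of nulls, correctly structured, and use consistent label ids.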

Metadata

Assignees

Labels

enhancement: New feature or request

Projects

Status

Ready

Relationships

None yet

Development

No branches or pull requests
