Dataset Preparation for ML #159
Open
Description
Prepare the labeled dataset for machine learning by cleaning, formatting, and structuring it for training. This step ensures high-quality input data for the model. All outputs must be stored inside the inference/ folder.
Tasks
- Create script prepare_data.py inside inference/
- Read dataset from inference/labeled_flight_data.csv
- Remove unnecessary columns (e.g., timestamp if not used for training)
- Handle missing or null values (drop or fill appropriately)
- Encode event_label into a numerical format (e.g., label encoding)
- Select input features: altitude, heading, vertical_speed, velocity, roll, pitch, yaw, g_force
- Separate features (X) and labels (y)
- Split the dataset into training and testing sets (e.g., an 80/20 split)
- Normalize or scale feature values
- Save and push the processed dataset to inference/dataset
- Push code to inference/
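The tasks above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the required implementation; pandas and scikit-learn would be a common alternative. Assumptions not stated in this issue: rows with missing or non-numeric feature values are dropped rather than filled, scaling is z-score standardization fitted on the training split only, the split is a seeded random shuffle, and every function name here is hypothetical.

```python
import csv
import math
import random
from pathlib import Path
from statistics import mean, pstdev

# Feature and label columns named in the task list.
FEATURES = ["altitude", "heading", "vertical_speed", "velocity",
            "roll", "pitch", "yaw", "g_force"]
LABEL = "event_label"


def load_rows(csv_path):
    """Read the labeled CSV into a list of dicts, one per row."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Keep only feature and label columns; drop rows with missing values.

    Extra columns such as timestamp are ignored because only FEATURES and
    LABEL are read. Dropping (instead of filling) is an assumption.
    """
    cleaned = []
    for row in rows:
        try:
            x = [float(row[name]) for name in FEATURES]
        except (KeyError, TypeError, ValueError):
            continue  # missing column, None, or empty/non-numeric string
        label = (row.get(LABEL) or "").strip()
        if label and all(math.isfinite(v) for v in x):
            cleaned.append((x, label))
    return cleaned


def encode_labels(labels):
    """Label encoding: map each distinct label to an integer in sorted order."""
    mapping = {name: code for code, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def split(X, y, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out test_fraction of rows."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_scaler(X_train):
    """Compute per-column mean and standard deviation from training rows only."""
    cols = list(zip(*X_train))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant column: divide by 1
    return means, stds


def transform(X, means, stds):
    """Apply z-score scaling using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def prepare(rows, test_fraction=0.2, seed=42):
    """Clean, encode, split, and scale; returns the splits and label mapping."""
    cleaned = clean(rows)
    X = [x for x, _ in cleaned]
    y, mapping = encode_labels([label for _, label in cleaned])
    X_train, X_test, y_train, y_test = split(X, y, test_fraction, seed)
    means, stds = fit_scaler(X_train)
    return (transform(X_train, means, stds), transform(X_test, means, stds),
            y_train, y_test, mapping)


def save_split(path, X, y):
    """Write one split as CSV, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FEATURES + [LABEL])
        writer.writerows(row + [label] for row, label in zip(X, y))
```

In prepare_data.py this might be wired together by passing load_rows("inference/labeled_flight_data.csv") to prepare() and then calling save_split() once per split, for example with the hypothetical file names inference/dataset/train.csv and inference/dataset/test.csv. Fitting the scaler on the training rows alone keeps test-set statistics from leaking into training, and saving the returned label mapping alongside the data keeps the encoding reproducible.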
Acceptance Criteria
- Dataset is cleaned and ready for ML training
- Features and labels are correctly separated
- Train-test split is implemented
- Encoded labels are consistent and valid
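One way these criteria might be checked automatically is sketched below. It assumes the prepared data is held in memory as lists of feature rows and integer labels plus a label-to-code dict; the function name check_prepared and the specific checks are illustrative assumptions, not part of this issue.

```python
import math

# Number of input features listed in the tasks:
# altitude, heading, vertical_speed, velocity, roll, pitch, yaw, g_force
N_FEATURES = 8


def check_prepared(X_train, X_test, y_train, y_test, mapping):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        # Train-test split is implemented: both splits must contain rows.
        if not X:
            problems.append(f"{name} split is empty")
        # Features and labels are separated but still aligned row by row.
        if len(X) != len(y):
            problems.append(f"{name}: {len(X)} feature rows, {len(y)} labels")
        # Dataset is cleaned: right width, no NaN or infinite values.
        for i, row in enumerate(X):
            if len(row) != N_FEATURES:
                problems.append(f"{name} row {i}: expected {N_FEATURES} features")
            elif not all(math.isfinite(v) for v in row):
                problems.append(f"{name} row {i}: non-finite feature value")
    # Encoded labels are consistent: codes are exactly 0..k-1.
    codes = sorted(mapping.values())
    if codes != list(range(len(codes))):
        problems.append("label codes are not the contiguous integers 0..k-1")
    # Encoded labels are valid: every label in the data is a known code.
    unknown = (set(y_train) | set(y_test)) - set(codes)
    if unknown:
        problems.append(f"labels not present in mapping: {sorted(unknown)}")
    return problems
```

A script or test could call check_prepared() right before saving and fail if the returned list is non-empty.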
Metadata
Labels: enhancement (New feature or request)
Status: Ready