Skip to content

Conversation

@mitokic
Copy link
Collaborator

@mitokic mitokic commented Dec 16, 2025

This pull request includes a version bump and minor improvements to the data preprocessing steps in the exploratory data analysis (EDA) functions. The most important changes are focused on improving data quality by handling missing values before analysis.

Data preprocessing improvements:

  • In the run_xreg_analysis function in R/agent_eda.R, added a step to drop rows with any NA values from the filtered input data before analysis to ensure cleaner data is used.
  • In the run_all_eda_per_combo function in R/agent_eda.R, added a step to drop rows where the Target column is NA after filtering by date, improving the reliability of downstream analysis.

Version updates:

  • Bumped the package version from 0.6.0.9015 to 0.6.0.9016 in both the DESCRIPTION and NEWS.md files to reflect these changes. [1] [2]

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds NA value handling to two EDA functions to prevent analysis failures when missing values are present. The version is bumped from 0.6.0.9015 to 0.6.0.9016 to reflect these changes.

  • Added tidyr::drop_na() to run_xreg_analysis to remove rows with any NA values before external regressor correlation analysis
  • Added tidyr::drop_na(Target) to run_all_eda_per_combo to remove rows with NA Target values before running EDA analyses
  • Updated version number in DESCRIPTION and NEWS.md

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
R/agent_eda.R Added NA value filtering in two EDA functions: aggressive filtering (all NAs) in run_xreg_analysis and targeted filtering (Target NAs only) in run_all_eda_per_combo
NEWS.md Updated development version number to 0.6.0.9016
DESCRIPTION Updated package version to 0.6.0.9016

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

dplyr::filter(Date <= hist_end_date) %>%
dplyr::arrange(Date)
dplyr::arrange(Date) %>%
tidyr::drop_na() # drop rows with any NA values
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tidyr::drop_na() without arguments drops rows with NA in ANY column (including Target, Combo, Date, and all regressors). This is overly aggressive for external regressor analysis.

Looking at the correlation calculation below (lines 1408-1415), NA values are already handled appropriately per-regressor with keep <- !(is.na(x) | is.na(y)). The current approach may unnecessarily drop rows where a specific regressor being analyzed has valid values, but another regressor in the dataset has NA.

Consider using tidyr::drop_na(Target) instead to only drop rows where the Target is missing, allowing the existing per-regressor NA handling logic to work as designed.

Suggested change
tidyr::drop_na() # drop rows with any NA values
tidyr::drop_na(Target) # drop rows where Target is NA

Copilot uses AI. Check for mistakes.
@mitokic mitokic marked this pull request as ready for review December 16, 2025 03:09
@mitokic mitokic requested a review from smdp2000ofc December 16, 2025 03:09
@mitokic mitokic merged commit 4030c7f into main Dec 16, 2025
9 of 16 checks passed
@mitokic mitokic deleted the mitokic/12152025/eda-na-fix branch December 16, 2025 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants