fix issue in eda process when NA values are present #197

mitokic · 2025-12-16T01:41:14Z

This pull request includes a version bump and minor improvements to the data preprocessing steps in the exploratory data analysis (EDA) functions. The most important changes are focused on improving data quality by handling missing values before analysis.

Data preprocessing improvements:

In the run_xreg_analysis function in R/agent_eda.R, added a step to drop rows with any NA values from the filtered input data before analysis to ensure cleaner data is used.
In the run_all_eda_per_combo function in R/agent_eda.R, added a step to drop rows where the Target column is NA after filtering by date, improving the reliability of downstream analysis.

Version updates:

Bumped the package version from 0.6.0.9015 to 0.6.0.9016 in both the DESCRIPTION and NEWS.md files to reflect these changes. [1] [2]

Copilot

Pull request overview

This pull request adds NA value handling to two EDA functions to prevent analysis failures when missing values are present. The version is bumped from 0.6.0.9015 to 0.6.0.9016 to reflect these changes.

Added tidyr::drop_na() to run_xreg_analysis to remove rows with any NA values before external regressor correlation analysis
Added tidyr::drop_na(Target) to run_all_eda_per_combo to remove rows with NA Target values before running EDA analyses
Updated version number in DESCRIPTION and NEWS.md

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
R/agent_eda.R	Added NA value filtering in two EDA functions: aggressive filtering (all NAs) in `run_xreg_analysis` and targeted filtering (Target NAs only) in `run_all_eda_per_combo`
NEWS.md	Updated development version number to 0.6.0.9016
DESCRIPTION	Updated package version to 0.6.0.9016

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-16T01:43:55Z

R/agent_eda.R

        dplyr::filter(Date <= hist_end_date) %>%
-        dplyr::arrange(Date)
+        dplyr::arrange(Date) %>%
+        tidyr::drop_na() # drop rows with any NA values


Using tidyr::drop_na() without arguments drops rows with NA in ANY column (including Target, Combo, Date, and all regressors). This is overly aggressive for external regressor analysis.

Looking at the correlation calculation below (lines 1408-1415), NA values are already handled appropriately per-regressor with keep <- !(is.na(x) | is.na(y)). The current approach may unnecessarily drop rows where a specific regressor being analyzed has valid values, but another regressor in the dataset has NA.

Consider using tidyr::drop_na(Target) instead to only drop rows where the Target is missing, allowing the existing per-regressor NA handling logic to work as designed.

Suggested change

tidyr::drop_na() # drop rows with any NA values

tidyr::drop_na(Target) # drop rows where Target is NA

R/agent_eda.R

fix issue in eda process when NA values are present

bbd1866

mitokic requested a review from Copilot December 16, 2025 01:41

Copilot started reviewing on behalf of mitokic December 16, 2025 01:41 View session

Copilot AI reviewed Dec 16, 2025

View reviewed changes

mitokic marked this pull request as ready for review December 16, 2025 03:09

mitokic requested a review from smdp2000ofc December 16, 2025 03:09

smdp2000ofc approved these changes Dec 16, 2025

View reviewed changes

mitokic merged commit 4030c7f into main Dec 16, 2025
9 of 16 checks passed

mitokic deleted the mitokic/12152025/eda-na-fix branch December 16, 2025 13:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix issue in eda process when NA values are present #197

fix issue in eda process when NA values are present #197

Uh oh!

mitokic commented Dec 16, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 16, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	tidyr::drop_na() # drop rows with any NA values
	tidyr::drop_na(Target) # drop rows where Target is NA

fix issue in eda process when NA values are present #197

fix issue in eda process when NA values are present #197

Uh oh!

Conversation

mitokic commented Dec 16, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants