-
Notifications
You must be signed in to change notification settings - Fork 20
Quant methods #291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
dajmcdon
wants to merge
20
commits into
tidymodels:main
Choose a base branch
from
dajmcdon:quant-methods
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Quant methods #291
Changes from all commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
e797cd4
implementation complete
dajmcdon 0dee1f8
vctrs is already imported
dajmcdon c96d012
switch back to map()
dajmcdon 1350550
cover various cases for median
dajmcdon 1f9cede
add tests for quantile_pred vec methods
dajmcdon 09983db
test for `quantile_pred` type
dajmcdon 69cbe89
add quantile imputation, document
dajmcdon 254fec2
minor doc edits
dajmcdon 92a51b4
add tests for quantile imputation
dajmcdon 2fffc7d
need dplyr, replace %>% with |>
dajmcdon f7bfb23
redocument for links
dajmcdon 8ebb6b4
muffle pkgdown build error
dajmcdon e890000
document is_quantile_pred within quantile_pred
dajmcdon dce6894
remove dplyr import
dajmcdon 0714987
Apply suggestions from code review
dajmcdon c7c7896
resolve most PR comments
dajmcdon 892884a
redocument
dajmcdon d2801f8
Merge branch 'main' into quant-methods
dajmcdon 4dc100f
fix: incorrect snap with probs as mandatory
dajmcdon bd2121e
fix: refactor interior imputation to avoid switch
dajmcdon File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,228 @@ | ||
| #' Restrict numeric data to the interval \[lower, upper\] | ||
| #' | ||
| #' @param x a numeric vector | ||
| #' @param lower number, the lower bound | ||
| #' @param upper number, the upper bound | ||
| #' @param ... unused | ||
| #' @export | ||
| #' | ||
| #' @return An object of the same type as `x` | ||
| #' | ||
| #' @keywords internal | ||
| snap <- function(x, lower, upper, ...) { | ||
| UseMethod("snap") | ||
| } | ||
|
|
||
| #' @export | ||
| snap.default <- function(x, lower, upper, ...) { | ||
| cli::cli_abort( | ||
| "No {.fn snap} method provided for {.obj_type_friendly {x}}." | ||
| ) | ||
| } | ||
|
|
||
| #' @export | ||
| snap.numeric <- function(x, lower, upper, ...) { | ||
| rlang::check_dots_empty() | ||
| check_number_decimal(lower) | ||
| check_number_decimal(upper) | ||
|
|
||
| pmin(pmax(x, lower), upper) | ||
| } | ||
|
|
||
| #' @export | ||
| snap.quantile_pred <- function(x, lower, upper, ...) { | ||
dajmcdon marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| rlang::check_dots_empty() | ||
| if (vec_size(x) == 0) { | ||
| return(x) | ||
| } | ||
| values <- as.matrix(x) | ||
| quantile_levels <- extract_quantile_levels(x) | ||
| values <- map(vctrs::vec_chop(values), ~ snap(.x, lower, upper)) | ||
| quantile_pred(do.call(rbind, values), quantile_levels = quantile_levels) | ||
| } | ||
|
|
||
|
|
||
| #' Impute additional quantiles from a `quantile_pred` | ||
| #' | ||
| #' While a [quantile_pred] describes evaluations for the inverse | ||
| #' cummulative distribution function (CDF, sometimes called the "quantile | ||
| #' function") at particular quantile levels, this is not enough | ||
| #' to fully describe the distribution. For example, | ||
| #' ```r | ||
| #' p <- c(.1, .5, .9) | ||
| #' quantile_pred(matrix(qnorm(p), nrow = 1), p) | ||
| #' ``` | ||
| #' encapsulates the 10%, 50%, and 90% quantile levels of the standard normal distribution. | ||
| #' But, what if we need, say, the 25% and 75% levels? This function imputes | ||
| #' them if possible. | ||
| #' | ||
| #' @details | ||
| #' If `probs` is simply a subset of `quantile_levels` that already exist in `x`, | ||
| #' then these will be returned (up to numeric error). Small errors are possible | ||
| #' due to difficulties matching double vectors. | ||
| #' | ||
| #' For `probs` that do not exist in `x`, these will be interpolated or | ||
| #' extrapolated as needed. The process has 3 steps. | ||
| #' | ||
| #' First, by default (`middle = "cubic"`), missing _internal_ quantile levels are | ||
dajmcdon marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| #' interpolated using a cubic spline fit to the observed values + quantile | ||
| #' levels with | ||
| #' [stats::splinefun]. Second, if cubic interpolation fails (or if | ||
| #' `middle = "linear"`), linear interpolation is used via [stats::approx]. | ||
| #' Finally, missing _external_ quantile levels (those outside the range of | ||
| #' `quantile_levels`) are extrapolated. This is done using a linear fit on the | ||
| #' logistic scale to the two closest tail values. | ||
| #' | ||
| #' This procedure results in sorted quantiles that interpolate/extrapolate | ||
| #' smoothly, while also enforcing heavy tails beyond the range. | ||
| #' | ||
| #' Optionally, the resulting quantiles can be constrained to a compact interval | ||
| #' using `lower` and/or `upper`. This is done after extrapolation, so it may | ||
| #' result in multiple quantile levels having the same value (a CDF with a spike). | ||
| #' | ||
| #' | ||
| #' @param x an object of class [quantile_pred] | ||
| #' @param probs vector. probabilities at which to evaluate the inverse CDF | ||
| #' @param lower number. lower bound for the resulting values | ||
| #' @param upper number. upper bound for the resulting values | ||
| #' @param middle character. if `middle = 'cubic'` (the default), a cubic | ||
| #' spline is used for interpolation where possible; `middle='linear'` | ||
| #' interpolates linearly; see Details below. | ||
| #' | ||
| #' @returns A matrix with `length(probs)` columns and `length(x)` rows. Each | ||
| #' row contains the inverse CDF (quantile function) given by `x`, | ||
| #' extrapolated/interpolated to `probs`. | ||
| #' @export | ||
| #' | ||
| #' @examples | ||
| #' p <- c(.1, .5, .9) | ||
| #' qp <- quantile_pred(matrix(c(qnorm(p), qexp(p)), nrow = 2, byrow = TRUE), p) | ||
| #' impute_quantiles(qp, p) | ||
| #' as.matrix(qp) # same as the imputation | ||
| #' | ||
| #' p1 <- c(.05, .25, .75, .95) | ||
| #' impute_quantiles(qp, p1) | ||
| #' rbind(qnorm(p1), qexp(p1)) # exact values, for comparison | ||
| impute_quantiles <- function( | ||
| x, | ||
| probs, | ||
| lower = -Inf, | ||
| upper = Inf, | ||
| middle = c("cubic", "linear") | ||
| ) { | ||
| if (!is_quantile_pred(x)) { | ||
| cli::cli_abort( | ||
| "{.arg x} must be a {.cls quantile_pred} object, not | ||
| {.obj_type_friendly {x}}." | ||
| ) | ||
| } | ||
| if (length(extract_quantile_levels(x)) < 2) { | ||
| cli::cli_abort( | ||
| "Quantile interpolation is not possible when fewer than 2 quantiles | ||
| are avaliable." | ||
| ) | ||
| } | ||
| if (is.unsorted(probs)) { | ||
| probs <- sort(probs) | ||
| } | ||
| check_quantile_level_values(probs, "probs", call = caller_env()) | ||
| check_number_decimal(lower) | ||
| check_number_decimal(upper) | ||
| if (lower >= upper) { | ||
| cli::cli_abort( | ||
| "{.arg lower} ({lower}) must be less than {.arg upper} ({upper})." | ||
| ) | ||
| } | ||
| middle <- rlang::arg_match(middle) | ||
| snap(impute_quantile_internal(x, probs, middle), lower, upper) | ||
| } | ||
|
|
||
| impute_quantile_internal <- function(x, probs_out, middle) { | ||
| probs_in <- extract_quantile_levels(x) | ||
| vals_in <- as.matrix(x) | ||
| if (all(probs_out %in% probs_in) && !anyNA(vals_in)) { | ||
| return(vals_in[, match(probs_out, probs_in), drop = FALSE]) | ||
| } | ||
| vals_out <- map( | ||
| vctrs::vec_chop(vals_in), | ||
| ~ impute_quantiles_single(.x, probs_in, probs_out, middle) | ||
| ) | ||
| vals_out <- do.call(rbind, vals_out) | ||
| vals_out | ||
| } | ||
|
|
||
| impute_quantiles_single <- function(vals_in, probs_in, probs_out, middle) { | ||
| vals_out <- rep(NA, length(probs_out)) | ||
| good <- !is.na(vals_in) | ||
| if (!any(good)) { | ||
| return(vals_out) | ||
| } | ||
| vals_in <- vals_in[good] | ||
| probs_in <- probs_in[good] | ||
|
|
||
| # in case we only have one point, and it matches something we wanted | ||
| if (length(good) < 2) { | ||
| matched_one <- probs_out %in% probs_in | ||
| vals_out[matched_one] <- vals_in[matched_one] | ||
| return(vals_out) | ||
| } | ||
|
|
||
| below <- probs_out < min(probs_in) | ||
| above <- probs_out > max(probs_in) | ||
| interior <- !below & !above | ||
|
|
||
| if (any(interior)) { | ||
| if (middle == "cubic") { | ||
| result <- tryCatch( | ||
| { | ||
| interp_fun <- stats::splinefun(probs_in, vals_in, method = "hyman") | ||
| quartiles <- interp_fun(c(.25, .5, .75)) | ||
| }, | ||
| error = function(e) { | ||
| return(NA) | ||
| } | ||
| ) | ||
| if (any(is.na(result))) middle <- "linear" | ||
| } | ||
| if (middle == "linear") { | ||
| interp_fun <- function(probs) stats::approx(probs_in, vals_in, probs)$y | ||
| quartiles <- interp_fun(c(.25, .5, .75)) | ||
| } | ||
| vals_out[interior] <- interp_fun(probs_out[interior]) | ||
| } | ||
| if (any(below) || any(above)) { | ||
| interior_data <- data.frame( | ||
| probs = c(probs_in, probs_out[interior]), | ||
| vals = c(vals_in, vals_out[interior]) | ||
| ) | ||
| interior_data <- interior_data[vctrs::vec_unique_loc(interior_data$probs), ] | ||
| interior_data <- interior_data[vctrs::vec_order(interior_data$probs), ] | ||
| } | ||
| if (any(below)) { | ||
| left_tail_data <- utils::head(interior_data, 2) | ||
| vals_out[below] <- tail_extrapolate(probs_out[below], left_tail_data) | ||
| } | ||
| if (any(above)) { | ||
| right_tail_data <- utils::tail(interior_data, 2) | ||
| vals_out[above] <- tail_extrapolate(probs_out[above], right_tail_data) | ||
| } | ||
| vals_out | ||
| } | ||
|
|
||
| logit <- function(p) { | ||
| p <- pmax(pmin(p, 1), 0) | ||
| log(p) - log(1 - p) | ||
| } | ||
|
|
||
| # extrapolates linearly on the logistic scale using | ||
| # the two points nearest the tail | ||
| tail_extrapolate <- function(tail_probs, interior) { | ||
| if (nrow(interior) == 1L) { | ||
| return(rep(interior$vals[1], length(tail_probs))) | ||
| } | ||
| x <- logit(interior$probs) | ||
| x0 <- logit(tail_probs) | ||
| y <- interior$vals | ||
| m <- diff(y) / diff(x) | ||
| m * (x0 - x[1]) + y[1] | ||
| } | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.