Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New admiralonco function: derive_param_confirmed_bor #24

Closed
rossfarrugia opened this issue Apr 6, 2022 · 19 comments · Fixed by #35
Closed

New admiralonco function: derive_param_confirmed_bor #24

rossfarrugia opened this issue Apr 6, 2022 · 19 comments · Fixed by #35
Assignees

Comments

@rossfarrugia
Copy link
Contributor

rossfarrugia commented Apr 6, 2022

Feature Idea

derive_param_confirmed_bor: checks for best confirmed overall response from OVR (where ANL01FL=”Y”) as default

Flexibility needed:

  • to change the input source parameter from OVR, or the source of PD date
  • to change the ADSL reference date (e.g. TRTSDT/RANDDT) and the xxx days after this time to allow an SD or NON-CR/NON-PD to count from
  • greater than or equals yy days apart check for confirming the response

Details: checks for confirmed best overall response from PARAMCD=”OVR” (where ANL01FL=”Y”) up to/including first PD as default and sets new parameter record with AVAL/AVALC from the source record. Set ADT as the first date where the condition is fulfilled.
BOR is set to 'NE', in the case where the subject has only AVALC = 'SD' or 'NON-CR/NON-PD' less than xxx days after the reference date from ADSL.
Note: it is important that ANL01FL=”Y” provides only the eligible records to be considered for this derivation, as explained in the ADRS vignette.

Relevant Input

pd <- date_source(
  dataset_name = “adrs”,
  date = ADT,
  filter = PARAMCD == “PD” & AVALC == “Y”
)

derive_param_confirmed_bor(
  dataset,
  dataset_adsl,
  filter_source,
  source_pd,
  source_datasets,
  reference_date,
  ref_start_window,
  ref_confirm,
  max_nr_ne = 1,
  accept_sd = FALSE,
  missing_as_ne = FALSE,
  aval_fun = aval_resp,
  set_values_to,
  subject_keys = vars(STUDYID, USUBJID)
)

Note: for source_pd a date_source() object is expected. See https://pharmaverse.github.io/admiral/reference/derive_param_tte.html as an example of how this works.

Relevant Output

New parameter: Covers CBOR (for confirmed responses) from ADRS workflow doc.

Reproducible Example/Pseudo Code

derive_param_confirmed_bor(
  dataset = adrs,
  dataset_adsl = adsl,
  filter_source = PARANCD == "OVR" & ANL01FL == "Y",
  source_pd = list(pd),
  source_datasets = list(adrs = adrs),
  reference_date = TRTSDT,
  ref_start_window = 28,
  ref_confirm = 28,
  max_nr_ne = 1,
  accept_sd = FALSE,
  missing_as_ne = FALSE,
  aval_fun = aval_resp,
  set_values_to = vars(
	PARAMCD = "CBOR",
	PARAM = " Best Confirmed Overall Response by Investigator",
	PARCAT1 = "Tumor Response",
	PARCAT2 = "Investigator",
	PARCAT3 = "Recist 1.1",
	ANL01FL = "Y")
)
@rossfarrugia
Copy link
Contributor Author

Wait for #19 before starting

@rossfarrugia
Copy link
Contributor Author

Reminder to use filter_pd from #18 once available

@bundfussr
Copy link
Collaborator

bundfussr commented Apr 21, 2022

If data should be cut at start of anti-cancer therapy, a new variable could be created defined as minimum of first PD date and start of anti-cancer therapy. This variable is specified in the date_source() object passed to source_pd.

@bundfussr
Copy link
Collaborator

bundfussr commented Apr 25, 2022

Possible Workflow

  1. Select observations up to first PD date. (filter_pd() see New helper function: filter_pd #18)
  2. For each possible response value ("CR", "PR", "SD", ...): select valid observations and assign values for AVAL and temp_AVALN
    There are two type of selections:
    1. selections taking more than one observation into account ("CR" and "PR"), most likely this requires a helper function (see below)
    2. selections based on a single observation ("SD", "PD", "NE", "NON-CR/NON-PD"), filter() can be used, e.g., for "SD": filter(AVAL %in% c("CR", "PR", "SD") & ADT >= TRTSDT + days(ref_start_window)
      For AVALC = NA the observations from dataset_adsl are used.
  3. bind all observations from previous step (bind_rows())
  4. select the first observation with respect to temp_AVALN and ADTM for each subject (filter_extreme())
  5. create variables specified by set_values_to and AVALN as defined by avaln_fun (mutate())
  6. add new observations to input dataset (bind_rows())

Possible Workflow for "selections taking more than one observation into account"

  1. create unique observation number (derive_var_obs_nr(dataset_ovr, new_var = temp_obs_nr))
  2. join dataset with itself by subject (left_join(dataset_ovr, select(dataset_ovr, AVALC, ADT), by = USUBJID, suffix = c("", ".join")))
  3. select observations up to first possible confirming event, e.g., AVALC.join == "CR" & ADT.join >= ADT + days(ref_confirm)
    For this we could implement filter_relative as suggested in Feature Request: Implement filter_relative() function admiral#1023.
  4. select groups (defined by temp_obs_nr) which fulfill confirmation criteria, e.g., all(AVALC.join %in% c("CR", "NE")) & count_vals(var = AVALC, val = "NE") <= 1
    where count_vals() is another helper function defined as
    count_vals <- function(var, val){
      length(var[var == val])
    }
    
  5. reduce selected observations to one per temp_obs_nr (filter_extreme()) and drop AVALC.join and ADT.join (select()).

@rossfarrugia
Copy link
Contributor Author

@bundfussr probably makes sense you take on this one as it's the most complex piece and your knowledge of where to re-use things from admiral core will help, and as you're involved in the discussions with Amit and Catherine around confirmation rules.

one comment for the above is when you say around the "cut at start of anti-cancer therapy", we had agreed before in the ADRS workflow that this could be handled in ANL01FL, in case that simplifies anything.

@bundfussr
Copy link
Collaborator

@rossfarrugia , sorry I forgot about using ANL01FL for "cut at start of anti-cancer therapy". Maybe we should provide function flagging observations up to a certain date or a certain condition.

However, we may need to rethink filter_pd() due to the requirement Catherine mentioned in #19.

@rossfarrugia
Copy link
Contributor Author

@bundfussr i added a reply in that issue but for me any extra specific rules about what constitutes a PD would have to be implemented when the PD date itself gets derived, and then this is fed into filter_pd so i dont see impact to filter_pd.

@bundfussr
Copy link
Collaborator

@rossfarrugia , I started implementing the function and I wonder how we should handle the selection of the source observations. I think the initial idea was to select observations with PARAMCD == source_param & ANL01FL == "Y" from the input dataset. However, this seems very restrictive to me. If ANL01FL does not provide the correct restriction, you could not use the function. Assume for example ANL01FL is used to select one observation in case of multiple results per date and ANL02FL is used to select records from treatment start to start of anti-cancer therapy.

What about replacing source_param by filter_source? Then you could specify filter_source = PARAMCD == "OVR" & ANL01FL == "Y" & ANL02FL == "Y".

@rossfarrugia
Copy link
Contributor Author

@bundfussr sure that makes sense to me - are you OK to notify the team of this change via the Slack and update all the other issues containing source_param and the ADRS workflow doc? sorry would have done this myself but snowed under catching up from PHUSE

@bundfussr
Copy link
Collaborator

@rossfarrugia , yes, will do.

@rossfarrugia
Copy link
Contributor Author

@bundfussr not suggesting to add even more arguments to this function, but just wanted to ask what's your opinion on how best we could cover different response values like iCR / iPR when we move beyond solid tumor/RECIST 1.1 for future releases? i was wondering if simplest way could be a pre-processing step creates a temp variable with these renamed and fitting with our BOR functions expectation. This fits in with #12 on our backlog for later.

@bundfussr
Copy link
Collaborator

@rossfarrugia , the example specs for non solid tumors do not contain confirmed BOR. Thus I am not sure if recoding the response values will work. For unconfirmed BOR it will not work because there are more responses (9 responses for non solid tumors versus 6 responses for solid tumors).

I think it is unlikely that we can move beyond solid tumor/RECIST 1.1 without major changes of the functions we are implementing at the moment.

@rossfarrugia
Copy link
Contributor Author

Makes sense @bundfussr - we can look deeper into this after our foundational release for solid tumor/RECIST 1.1. we could always make different flavours of these functions dedicated to different indications and response criteria down the line (assuming we can find specs), as including all possibilities in one function would make it very difficult to maintain.

@sgorm123
Copy link
Collaborator

@rossfarrugia @bundfussr Should the reference_date column be populated for the records created in this function call? I ask, as I have done this in derive_param_bor and do not see this in derive_param_confirmed_bor.

@bundfussr
Copy link
Collaborator

@sgorm123 , good question!

I agree that it looks strange at the moment. For derive_param_confirmed_bor() we could just keep it and expect that it is included in both dataset and dataset_adsl. However, it does not work in general. For example, derive_param_confirmed_resp() does not have a reference_date parameter. So if we derive confirmed BOR and confirmed response, TRTSDT would be populated for BOR but not for response.

Maybe we could add a keep_source_vars parameter to all derive_param_* functions. All specified variables are kept if they exist in at least one of the source datasets.

@sgorm123 , @rossfarrugia , @amitjaingsk , what do you think?

@rossfarrugia
Copy link
Contributor Author

Hmm... i'm personally hoping we can avoid adding yet more arguments in the functions as they're already getting complex. One of the first steps in https://github.com/pharmaverse/admiralonco/blob/10_adrs_vignette_template/vignettes/adrs.Rmd is to join any ADSL vars you need for the downstream derivations (e.g. TRTSDT or RANDDT), so could we not just as default keep all these populated for all newly created parameter records? And then at the end the user has to merge any remaining ADSL vars anyway. I can't see any good reason where as a user I would want my already merged ADSL variables to suddenly be left blank in my newly created ADRS records.

@bundfussr
Copy link
Collaborator

The issue is that we do not know which ADSL variables were added in the first steps of the ADRS flow. Furthermore, the user may want to keep other variables like AVISIT, APHASE, ...

If we want to avoid an extra argument, we could keep all variables from the input dataset and for subjects without observations in the input dataset we could keep all variables from ADSL which are also in the input dataset. Variables which should not be populated for the new parameter or populated differently (e.g., RSSTRESC, VISIT, PARCATy, ANLzzFL, ...) could be overwritten using the set_value_to parameter. Am not sure if this is less complex than adding a keep_source_vars parameter.

@sgorm123 , @rossfarrugia , @amitjaingsk , what do you think?

@sgorm123
Copy link
Collaborator

sgorm123 commented Jun 13, 2022

I do not have a strong opinion, though I would prefer not to have a new argument and go with your suggestion @rossfarrugia , but also happy to defer to others. I do feel that the reference_date in this case should be populataed, somehow. At the moment I just assert that reference_date is in adsl and adrs.

As an aside, I think ideally I would prefer only the new columns and rows created in the function are returned back (then leave it up to the user to join back with what is passed), over binding back with the input dataset in the function. However, I presume this is an admiral decision, and pros and cons with both and I am sure discussed previously.

@rossfarrugia
Copy link
Contributor Author

i like your suggestion @bundfussr - i find it more logical from a user perspective as when i take a source record as input to create a new record i just expect that everything will be kept except the set_value_to overwrites. otherwise it would be a pain to fill the gaps later for purpose of ADaM compliance checks. can you mention on the slack channel to let everyone know this approach for their functions please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants