
Parameter Tuning Code Integration: update evaluation and add egfr gold standard #193

Merged · 31 commits merged into Reed-CompBio:master on Mar 21, 2025

Conversation


@ntalluri ntalluri commented Nov 4, 2024

No description provided.

@ntalluri ntalluri requested a review from agitter November 18, 2024 18:33
@ntalluri:

Note for self: Try seeing if it makes more sense to separate the parameter tuning and evaluation code into their own classes for organization.

@ntalluri:

I think I need to redo the idea for using ensembling for parameter tuning.

Currently, the code takes in the edge ensemble file and builds a node ensemble file by assigning each node the highest frequency among its incident edges (y_scores). These scores are then compared to the node gold standard (y_trues) and plotted (with no point labels) as a precision-recall curve (PRC) between the nodes in the output pathways and the gold standard, along with the average precision across all nodes. A sketch of this computation follows below.

I don't think this can be used for ensemble parameter tuning the way I had envisioned. It could still help with parameter tuning, though, for example to narrow the grid search, or for evaluation in general.

  • Also, the visualization for this could be much better; it lacks label information, making it hard to fully understand.
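A minimal sketch of that computation (the file names and column layout here are hypothetical placeholders, not the actual SPRAS formats):

```python
# Sketch of the node-ensemble PRC evaluation described above.
# File names and column layout are illustrative, not the real SPRAS formats.
import pandas as pd
from sklearn.metrics import average_precision_score, precision_recall_curve

# Edge ensemble: one row per edge with its frequency across pathway outputs.
edge_ensemble = pd.read_csv("ensemble-pathway.txt", sep="\t",
                            names=["node1", "node2", "frequency"])

# Collapse edge frequencies to nodes: each node keeps the highest frequency
# of any edge incident to it (the y_scores described above).
stacked = pd.concat([
    edge_ensemble[["node1", "frequency"]].rename(columns={"node1": "node"}),
    edge_ensemble[["node2", "frequency"]].rename(columns={"node2": "node"}),
])
node_ensemble = stacked.groupby("node")["frequency"].max()

# Node gold standard: the set of nodes in the reference pathway (y_trues).
gold_nodes = set(pd.read_csv("gs-nodes.txt", names=["node"])["node"])
y_true = [int(node in gold_nodes) for node in node_ensemble.index]
y_score = node_ensemble.to_numpy()

# PRC between the output-pathway nodes and the gold standard, plus
# the average precision over all nodes.
precision, recall, _ = precision_recall_curve(y_true, y_score)
avg_precision = average_precision_score(y_true, y_score)
print(f"Average precision: {avg_precision:.3f}")
```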

@agitter agitter left a comment

> I think I need to redo the idea for using ensembling for parameter tuning.

Is there a way to break this huge pull request into smaller parts? Currently it combines evaluation, parameter tuning, ensembling, etc., which makes it hard to coordinate all of those decisions and to review the big pull request. I'm wondering if we can start small and merge in a subset of the file changes to lock in some progress, like the evaluation code.

If it is all interdependent, we'll have to deal with that and proceed as is.

@agitter agitter commented Mar 7, 2025

> Is there a way to break this huge pull request into smaller parts?

In response, we now have separate follow-up pull requests #207, #208, and #209.

@ntalluri ntalluri commented Mar 11, 2025

@agitter Start with this PR first, then follow up with pull requests #207, #208, and #209 after this one is merged.

Included in this PR:

  1. Updated code to decouple the ML and evaluation code from each other in the config file (a sketch of the resulting toggles follows after this list)
  2. New test cases for the config file to test different coupling situations
  3. The new egfr gold standard node dataset (still need to add where this came from)
  4. A new egfr parameter-tuned config.yaml file
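For item 1, a minimal sketch of the decoupled toggles (the key names mirror SPRAS's `analysis` config section, but the exact layout here is an assumption):

```yaml
# Sketch only: ml and evaluation toggled independently of each other;
# exact key names in the real config may differ.
analysis:
  ml:
    include: true      # ensembling and other ML post-analysis
  evaluation:
    include: false     # PRC evaluation against the gold standard
```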

@agitter agitter left a comment

There is a lot of complexity involved just in this config file parsing. It is good you added so many test cases.
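A sketch of what one such coupling test might look like (the `parse_config` helper, dict layout, and the rule that evaluation requires ml are illustrative assumptions, not the actual SPRAS API):

```python
# Hypothetical sketch of a config coupling test; parse_config and the
# "evaluation requires ml" rule are stand-ins for the real SPRAS logic.
import pytest


def parse_config(raw: dict) -> dict:
    """Toy config loader enforcing one coupling rule: evaluation depends
    on the ML ensembles, so it cannot be enabled without ml."""
    ml = raw.get("analysis", {}).get("ml", {}).get("include", False)
    evaluation = raw.get("analysis", {}).get("evaluation", {}).get("include", False)
    if evaluation and not ml:
        raise ValueError("evaluation requires ml to be enabled")
    return {"ml": ml, "evaluation": evaluation}


def test_evaluation_without_ml_rejected():
    raw = {"analysis": {"ml": {"include": False},
                        "evaluation": {"include": True}}}
    with pytest.raises(ValueError):
        parse_config(raw)


def test_ml_without_evaluation_allowed():
    raw = {"analysis": {"ml": {"include": True},
                        "evaluation": {"include": False}}}
    assert parse_config(raw) == {"ml": True, "evaluation": False}
```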

@agitter:

What is the overall goal of this huge config file? Why does it use a long list of individual runs instead of a list or other syntax that enumerates parameter combinations? Some context would be helpful.

If this is being added to support the SPRAS benchmarking, does it belong in the spras-benchmarking repo and not here?

@ntalluri:

I think it is huge because I used YAML prettify.

This was for benchmarking the egfr dataset. We can keep it in the spras-benchmarking repo, but a while back you mentioned that it would be an example we would want to keep in the SPRAS repo.

@agitter:

I don't mind the YAML syntax and whether we use a more compact or expanded format. I was referring to how many separate runs there are, >300 I believe. When I previously suggested that we put the parameter tuning config in this repo, I may have anticipated we would have far fewer parameter combinations. This now seems like a heavy-duty analysis instead of an example of how to use the SPRAS software. My perspective is that config files demonstrating how to use (or test) SPRAS can stay here, and those used to do analysis with SPRAS can go in the benchmarking repo. Does that seem like a good division?

@ntalluri:

I need to go through and clean this up. I think when I was tuning before, I had code that made separate runs based on my chosen heuristics; that isn't really needed, and everything can be in a list (see the sketch below).
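A sketch of that collapse, assuming a run-block syntax where a list of parameter values expands into one run per combination (the algorithm and values here are illustrative):

```yaml
# Before (sketch): one run block per hand-picked parameter value
algorithms:
  - name: pathlinker
    params:
      include: true
      run1: {k: [10]}
      run2: {k: [20]}
      run3: {k: [50]}
---
# After (sketch): a single run block whose list enumerates the combinations
algorithms:
  - name: pathlinker
    params:
      include: true
      run1: {k: [10, 20, 50]}
```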

@ntalluri:

> I don't mind the YAML syntax and whether we use a more compact or expanded format. I was referring to how many separate runs there are, >300 I believe. When I previously suggested that we put the parameter tuning config in this repo, I may have anticipated we would have far fewer parameter combinations. This now seems like a heavy-duty analysis instead of an example of how to use the SPRAS software. My perspective is that config files demonstrating how to use (or test) SPRAS can stay here, and those used to do analysis with SPRAS can go in the benchmarking repo. Does that seem like a good division?

Yes, this is a good division.

ntalluri and others added 2 commits March 17, 2025 10:15
Co-authored-by: Anthony Gitter <[email protected]>
@ntalluri changed the title from "Parameter Tuning Code Integration" to "Parameter Tuning Code Integration: update evaluation" on Mar 18, 2025
@ntalluri changed the title from "Parameter Tuning Code Integration: update evaluation" to "Parameter Tuning Code Integration: update evaluation and add egfr dataset" on Mar 18, 2025
@ntalluri changed the title from "Parameter Tuning Code Integration: update evaluation and add egfr dataset" to "Parameter Tuning Code Integration: update evaluation and add egfr gold standard" on Mar 18, 2025
Inline comment on the documentation's `#### Edges` section (currently "TBD"):

@agitter:

@sumedhars once you have the EGFR pathway edge gold standard ready, we can update this documentation in a follow-up pull request.

@agitter agitter merged commit c7e2b18 into Reed-CompBio:master Mar 21, 2025
13 checks passed