Parameter Tuning Code Integration: update evaluation and add egfr gold standard #193
Conversation
… what idea to use
Note for self: Try seeing if it makes more sense to separate the parameter tuning and evaluation code into their own classes for organization.
I think I need to redo the idea of using ensembling for parameter tuning. Currently the code takes in the ensemble file and builds a node ensemble file by processing the ensemble of edge frequencies to identify the highest frequency associated with each node (y_scores). These scores are then compared to the node gold standard (y_trues) and plotted (with no point labels) on a graph showing the precision-recall curve (PRC) between the nodes in the output pathways and the gold standard, along with the average precision over all the nodes. I don't think this can be used to do ensemble parameter tuning the way I had in mind, but it could still help with parameter tuning, e.g. to narrow a grid search, or with evaluation in general.
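As a reference for this discussion, here is a minimal sketch of the node-ensemble evaluation described above. The file paths and the Node1/Node2/Frequency column names are assumptions for illustration; the actual SPRAS file formats and function names may differ.

```python
# Hypothetical sketch of the node-ensemble evaluation described above.
# The file paths and column names (Node1, Node2, Frequency) are assumed
# for illustration and may not match the actual SPRAS file formats.
import pandas as pd
from sklearn.metrics import average_precision_score, precision_recall_curve

edge_ensemble = pd.read_csv("ensemble-pathway.txt", sep="\t")
gold_nodes = set(pd.read_csv("gs-nodes.txt", header=None)[0])

# Collapse edge frequencies to one score per node: the highest frequency
# of any edge incident to that node (the y_scores described above).
node_scores = (
    pd.concat([
        edge_ensemble[["Node1", "Frequency"]].rename(columns={"Node1": "Node"}),
        edge_ensemble[["Node2", "Frequency"]].rename(columns={"Node2": "Node"}),
    ])
    .groupby("Node")["Frequency"]
    .max()
)

# Label each scored node by membership in the gold standard (the y_trues).
y_trues = [int(node in gold_nodes) for node in node_scores.index]
y_scores = node_scores.to_list()

precision, recall, _ = precision_recall_curve(y_trues, y_scores)
avg_precision = average_precision_score(y_trues, y_scores)
print(f"Average precision: {avg_precision:.3f}")
```

Plotting `recall` against `precision` (e.g. with matplotlib) then gives the PRC described above, with `avg_precision` as the summary value.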
> I think I need to redo the idea of using ensembling for parameter tuning.
Is there a way to break this huge pull request into smaller parts? Currently it combines evaluation, parameter tuning, ensembling, etc. That is making it hard to coordinate all of those decisions and also review the big pull request. I'm wondering if we can start small and merge in a subset of the file changes to lock in some progress, like the evaluation code.
If it is all interdependent, we'll have to deal with that and proceed as is.
…ameter tuning methods
@agitter Start with this PR first, then follow up with pull requests #207, #208, and #209 after this one is merged. Included in this PR:
There is a lot of complexity involved just in this config file parsing. It is good you added so many test cases.
config/egfr-param-tuning.yaml
What is the overall goal of this huge config file? Why does it use a long list of individual runs instead of a list or other syntax that enumerates parameter combinations? Some context would be helpful.
If this is being added to support the SPRAS benchmarking, does it belong in the spras-benchmarking repo and not here?
I think it is huge because I used YAML prettify.
This was for benchmarking the EGFR dataset. We can keep it in the spras-benchmarking repo, but a while back you mentioned that it would be an example we would want to keep in the SPRAS repo.
I don't mind the YAML syntax and whether we use a more compact or expanded format. I was referring to how many separate runs there are, >300 I believe. When I previously suggested that we put the parameter tuning config in this repo, I may have anticipated we would have far fewer parameter combinations. This now seems like a heavy-duty analysis instead of an example of how to use the SPRAS software. My perspective is that config files demonstrating how to use (or test) SPRAS can stay here, and those used to do analysis with SPRAS can go in the benchmarking repo. Does that seem like a good division?
I need to go through and clean this up. I think when I was tuning before, I had code that made separate runs based on my chosen heuristics; this isn't really needed and everything can be in a list.
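For example, a compact version might look something like this hypothetical sketch (the algorithm and parameter names here are illustrative only, not the exact SPRAS config schema):

```yaml
# Hypothetical sketch of a compact config: parameter values enumerated as
# lists under a single run, instead of >300 separately written runs.
# Algorithm and parameter names are illustrative only.
algorithms:
  - name: pathlinker
    params:
      include: true
      run1:
        k: [10, 50, 100, 500]   # one list replaces many single-value runs
  - name: omicsintegrator1
    params:
      include: true
      run1:
        b: [0.1, 1, 10]
        w: [1, 5]
```

Each list entry would stand in for one of the previously separate runs, which should shrink the file considerably.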
> My perspective is that config files demonstrating how to use (or test) SPRAS can stay here, and those used to do analysis with SPRAS can go in the benchmarking repo. Does that seem like a good division?

Yes, this is a good division.
Co-authored-by: Anthony Gitter <[email protected]>
#### Edges
TBD
@sumedhars once you have the EGFR pathway edge gold standard ready, we can update this documentation in a follow-up pull request.