Add goals to create general stats and synonym stats and rare and community stats #8812
base: master
I didn't check all the
Co-authored-by: Nico Matentzoglu <[email protected]>
I ran these queries, and found the following:
- The total count doesn't match:

| Query | Count |
|---|---|
| AllDiseaseExcludingSusceptibility | 25379 |
| HumanDisease | 25661 |
| NonHumanAnimalDisease | 2959 |

Human disease + non-human disease does not equal all disease excluding susceptibility.
Is it possible that "susceptibilities" are counted in the "human disease" count? If yes, then let's update to human disease = human disease excluding susceptibilities.
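For reference, the size of the mismatch can be checked with a few lines of Python. This is only a sketch using the three counts reported in the table; the variable names are illustrative and not part of the repository.

```python
# Counts copied from the table above.
counts = {
    "AllDiseaseExcludingSusceptibility": 25379,
    "HumanDisease": 25661,
    "NonHumanAnimalDisease": 2959,
}

# If the two branches partitioned the overall set, their sum would equal the total.
parts_total = counts["HumanDisease"] + counts["NonHumanAnimalDisease"]
surplus = parts_total - counts["AllDiseaseExcludingSusceptibility"]

print(f"human + non-human = {parts_total}")
print(f"surplus over AllDiseaseExcludingSusceptibility = {surplus}")
```

The surplus is the number of terms that would have to be excluded from the human count (for example, susceptibility terms) for the totals to agree.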
| Query | Count |
|---|---|
| HumanDiseaseInfectious | 1074 |
| HumanGeneticDiseases | 11468 |
| RareSubset | 15682 |
Numbers look ok.
2 changes requested:
- Can we please update the name of this query to indicate that it is "human-rare disease"?
- Can we also add the number of cancer terms? It looks like it is missing from the original spreadsheet, but it is in the statistics that we often report (e.g. here).
- non-human genetic diseases: src/sparql/reports/COUNT-non-human-genetic-diseases.sparql
- The count is 1021 (@katiermullen, does this number make sense to you?)
Changes requested:
In the report, the variable name is `?countHumanGeneticDiseases` instead of `?countNonHumanGeneticDiseases`.
- sh run.sh make report-reason-query-COUNT-non-human_diseases_infectious
`?countNonHumanDiseaseInfectious` = 87 (@katiermullen, does this number make sense to you?)
Someone should confirm that the changes in the mondo.Makefile are ok. @matentzn, could you please check and/or confirm it is ok? Thank you
@sabrinatoro I pushed new changes based on your comments above that include these updates:
- Issue 2: Updated the name of this query to indicate that it is "human-rare disease"
- There is now a query to get the number of cancer terms
- Changed the variable name for COUNT-non-human-genetic-diseases.sparql to `?countNonHumanGeneticDiseases`
- TODO: Counts for non-human-genetic-diseases and non-human_diseases_infectious need to be reviewed by @katiermullen

I can confirm that the counts for non-human-genetic-diseases (1021) and non-human_diseases_infectious (87) correspond to the number of terms in the 'hereditary disease, non-human animal' and 'infectious disease, non-human animal' branches, respectively.
I ran both `sh run.sh make create-general-mondo-stats-all` and `sh run.sh make create-synonym-mondo-stats-all`.
Both got the following error:
reports/mondo_stats/tmp/COUNT-all_disease_excluding_susceptibility.tsv] Error 1
I will run the other SPARQL queries individually and test whether they work.
UPDATE: all the reports ran fine on their own.
| Report | Count |
|---|---|
| ?countAllDiseaseExcludingSusceptibility | 25379 |
| ?countHumanDiseaseExcludingSusceptibility | 22419 |
| ?countNonHumanAnimalDisease | 2959 |
Sum of human and non-human diseases = 25378.
That is a difference of 1 from the overall count. I wonder if the overall query is also counting the term "disease" itself.
The counts and titles in the reports are ok.
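The comparison above can be written as a small reusable check. This is a sketch only: `partition_gap` is a hypothetical helper, not something in the Mondo repository, and the numbers are the ones reported in the table.

```python
def partition_gap(total, parts):
    """Return total minus the sum of the parts (0 means the parts add up exactly)."""
    return total - sum(parts)


# Counts copied from the report table above.
all_disease = 25379        # ?countAllDiseaseExcludingSusceptibility
human_disease = 22419      # ?countHumanDiseaseExcludingSusceptibility
non_human_disease = 2959   # ?countNonHumanAnimalDisease

gap = partition_gap(all_disease, [human_disease, non_human_disease])
print(f"terms not covered by either branch: {gap}")
```

A gap of exactly 1 would be consistent with the overall query also matching a single extra term, such as the root class, though the query itself would need to be inspected to confirm that.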
Thanks for reviewing @sabrinatoro. The count queries use For the error running the Update: I pushed an update that may fix the issue running the |
src/sparql/reports/COUNT-all_disease_excluding_susceptibility.sparql
@sabrinatoro this is ready to review with the updates from the Tech call discussion to remove the explicit removal of 'disease susceptibility' terms from the queries. The initial post has all |
closes #8809
closes #8906
closes #8916
closes #8908
Create general stats as requested in this gSheet
The queries can either be run individually or as:
sh run.sh make create-general-mondo-stats-all
Update: added make goal and query to create synonym stats. The query can either be run as a "report" or as
sh run.sh make create-synonym-mondo-stats-all
Update 27Mar2025: added make goal and query to create rare subset stats. The query can either be run as a "report" or as
sh run.sh make create-rare-mondo-stats-all
Update 29Mar2025: added make goal to create a report file of issues opened and closed between two dates, with the count of open/closed issues by tag. The dates can be entered manually; by default the goal uses the dates of the two most recent tagged releases relative to the date the goal is run on. It also creates a file of unique users, the count of issues each opened or closed, and the unique tags for these issues.
Since it does not use ODK, this can be run from the command line, given some prerequisites, or from the GitHub Action "Generate GitHub Issue Statistics".
Example:
and
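The per-tag tally described for the issue report can be illustrated with a short Python sketch. It is not the implementation in this PR: the record layout (`opened`, `closed`, `tags`), the `count_by_tag` helper, and the sample issues are all hypothetical and only show the counting logic for a date window.

```python
from collections import Counter
from datetime import date


def count_by_tag(issues, start, end):
    """Count issues opened and closed within [start, end], grouped by tag."""
    opened, closed = Counter(), Counter()
    for issue in issues:
        # Issues with no tags are grouped under a placeholder label.
        tags = issue["tags"] or ["(untagged)"]
        if start <= issue["opened"] <= end:
            opened.update(tags)
        closed_on = issue.get("closed")
        if closed_on is not None and start <= closed_on <= end:
            closed.update(tags)
    return opened, closed


# Hypothetical sample data, for illustration only.
issues = [
    {"opened": date(2025, 3, 3), "closed": date(2025, 3, 10), "tags": ["new term request"]},
    {"opened": date(2025, 2, 1), "closed": date(2025, 3, 5), "tags": ["new term request", "rare"]},
    {"opened": date(2025, 3, 20), "closed": None, "tags": []},
]

opened, closed = count_by_tag(issues, date(2025, 3, 1), date(2025, 3, 29))
print("opened:", dict(opened))
print("closed:", dict(closed))
```

An issue carrying several tags is counted once under each tag, so per-tag totals can exceed the number of distinct issues.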