Add goals to create general stats and synonym stats and rare and community stats #8812

Open · wants to merge 32 commits into base: master

@twhetzel (Collaborator) commented Mar 5, 2025

closes #8809
closes #8906
closes #8916
closes #8908

Create general stats as requested in this gSheet.
The queries can be run either individually or all at once with:
sh run.sh make create-general-mondo-stats-all

  • curator review and approval of general statistics values
  • developer review of general statistics queries and make goals
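The thread does not show how the make goal assembles its final table, but the per-query files under reports/mondo_stats/tmp/ (see the error reported further down) and the two-column "report count" table in a later review suggest a merge step along these lines. This is a minimal Python sketch, assuming each COUNT-*.tsv holds one header cell (the SPARQL variable, e.g. ?countHumanDisease) above one value cell; parse_count_tsv and merge_count_reports are hypothetical names, not part of the repository.

```python
import csv
from pathlib import Path


def parse_count_tsv(text: str) -> tuple[str, int]:
    """Parse a single-count TSV: one header cell (the SPARQL variable) over one value cell."""
    lines = [line for line in text.splitlines() if line.strip()]
    name = lines[0].split("\t")[0].strip()
    raw = lines[1].split("\t")[0].strip()
    # Assumption: the value may come back as a typed literal such as
    # "25379"^^<http://www.w3.org/2001/XMLSchema#integer>, so strip the datatype and quotes.
    raw = raw.split("^^")[0].strip('"')
    return name, int(raw)


def merge_count_reports(tmp_dir: Path, out_file: Path) -> list[tuple[str, int]]:
    """Combine every COUNT-*.tsv in tmp_dir into one 'report<TAB>count' table."""
    rows = [parse_count_tsv(p.read_text()) for p in sorted(tmp_dir.glob("COUNT-*.tsv"))]
    with out_file.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["report", "count"])
        writer.writerows(rows)
    return rows
```

Sorting the input paths keeps the merged table in a stable order between runs, which makes curator comparisons across releases easier.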

Update: added a make goal and query to create synonym stats. The query can be run either as a "report" or with:
sh run.sh make create-synonym-mondo-stats-all

  • curator review and approval of synonym statistics values
  • developer review of synonym statistics queries and make goals

Update 27Mar2025: added a make goal and query to create rare subset stats. The query can be run either as a "report" or with:
sh run.sh make create-rare-mondo-stats-all

  • curator review and approval of rare statistics values
  • developer review of rare statistics queries and make goals

Update 29Mar2025: added a make goal that creates a report of issues opened and closed between two dates, with the count of opened and closed issues per label (tag). The dates can be entered manually; by default, the goal uses the dates of the two most recent tagged releases relative to the date it is run. It also creates a file of unique users with the number of issues each one opened or closed and the unique labels on those issues.
Because it does not use ODK, this can be run from the command line (given some prerequisites) or from the GitHub Action "Generate GitHub Issue Statistics".
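The default date window can be read in more than one way. As a minimal Python sketch, assuming it means "the two most recent tagged-release dates on or before the run date", the selection could look like this; default_issue_window and in_window are hypothetical names, and how the real goal obtains the release dates is not shown.

```python
from datetime import date


def default_issue_window(release_dates: list[date], run_date: date) -> tuple[date, date]:
    """Return (start, end): the two most recent release dates on or before run_date.

    One possible reading of the PR description, not the goal's confirmed behavior.
    """
    past = sorted(d for d in release_dates if d <= run_date)
    if len(past) < 2:
        raise ValueError("need at least two tagged releases on or before the run date")
    return past[-2], past[-1]


def in_window(day: date, window: tuple[date, date]) -> bool:
    """Inclusive check that an issue's opened/closed date falls inside the window."""
    start, end = window
    return start <= day <= end
```

Releases dated after the run date are ignored, so running the goal on an older checkout still yields a window in the past.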

  • curator review
    Example:
# Stats generated on 2025-03-30 06:02:16Z for GitHub issues from 2025-03-15 to 2025-03-18
type	label	count
new_issue		1
closed_issue		12
label_new	New term request	1
label_new	user request	1
label_closed	New term request	2
...

and

# Stats generated on 2025-03-30 04:58:22Z for GitHub issues from 2025-03-15 to 2025-03-18
type	github_handle	count	labels
opened	J-Siew	1	New term request,user request
closed	galyea123	3	effort-XS,relabel term,synonym,user request
closed	sagehrke	1	outreach
closed	eedoh01	1	New term request,user request
...
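Both example files can be derived from a flat list of opened/closed issue events. This is a minimal Python sketch, assuming a hypothetical IssueEvent record (kind, github_handle, labels) already filtered to the date window; it is not the code behind the make goal or the GitHub Action, and rows here are ordered alphabetically, which may differ from the real reports.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueEvent:
    """Hypothetical record: one issue opened or closed inside the reporting window."""
    kind: str                 # "opened" or "closed"
    github_handle: str        # user who opened or closed the issue
    labels: tuple[str, ...]   # labels (tags) on the issue


def label_stats(events: list[IssueEvent]) -> list[tuple[str, str, int]]:
    """Rows for the first file: type, label, count."""
    opened = [e for e in events if e.kind == "opened"]
    closed = [e for e in events if e.kind == "closed"]
    rows = [("new_issue", "", len(opened)), ("closed_issue", "", len(closed))]
    for row_type, group in (("label_new", opened), ("label_closed", closed)):
        # Count each label at most once per issue.
        counts = Counter(label for e in group for label in set(e.labels))
        rows.extend((row_type, label, n) for label, n in sorted(counts.items()))
    return rows


def user_stats(events: list[IssueEvent]) -> list[tuple[str, str, int, str]]:
    """Rows for the second file: type, github_handle, count, comma-joined unique labels."""
    counts = Counter()
    labels = defaultdict(set)
    for e in events:
        key = (e.kind, e.github_handle)
        counts[key] += 1
        labels[key].update(e.labels)
    return [
        (kind, handle, n, ",".join(sorted(labels[(kind, handle)])))
        for (kind, handle), n in sorted(counts.items())
    ]


def to_tsv(header: list[str], rows: list[tuple]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    lines = ["\t".join(header)] + ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

For example, to_tsv(["type", "label", "count"], label_stats(events)) produces text in the shape of the first example file, without its "# Stats generated on ..." comment line.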


twhetzel added 3 commits March 4, 2025 13:10
WIP
WIP
@twhetzel twhetzel requested a review from matentzn as a code owner March 5, 2025 07:14
@twhetzel twhetzel requested review from matentzn and removed request for matentzn March 5, 2025 07:15
@matentzn (Member) left a comment

I didn't check all the

twhetzel and others added 3 commits March 10, 2025 12:51

Co-authored-by: Nico Matentzoglu
@twhetzel twhetzel requested a review from matentzn March 10, 2025 22:11
@sabrinatoro (Collaborator) left a comment

I ran these queries, and found the following:

  1. Total counts don't match:
Query	Count
AllDiseaseExcludingSusceptibility	25379
HumanDisease	25661
NonHumanAnimalDisease	2959

Human disease + non-human disease (25661 + 2959 = 28620) does not equal all disease excluding susceptibility (25379).
Is it possible that "susceptibilities" are counted in the "human disease" count? If so, let's update the query so that human disease = human disease excluding susceptibilities.

Query	Count
HumanDiseaseInfectious	1074
HumanGeneticDiseases	11468
RareSubset	15682

Numbers look OK.
Two changes requested:

  • Can we please update the name of this query to indicate that it is "human-rare disease"?
  • Can we also add the number of cancer terms? (It looks like it is missing from the original spreadsheet, but it is in the statistics that we often report, e.g. here.)
  2. Non-human genetic diseases: src/sparql/reports/COUNT-non-human-genetic-diseases.sparql
  • The count is 1021 (@katiermullen, does this number make sense to you?)

Change requested:
In the report, the variable name is "?countHumanGeneticDiseases" instead of "?countNonHumanGeneticDiseases".

  3. sh run.sh make report-reason-query-COUNT-non-human_diseases_infectious
    ?countNonHumanDiseaseInfectious = 87 (@katiermullen, does this number make sense to you?)

@sabrinatoro (Collaborator)

Someone should confirm that the changes in mondo.Makefile are OK. @matentzn, could you please check and confirm? Thank you.

@twhetzel (Collaborator, Author)

@sabrinatoro I pushed new changes based on your comments above that include these updates:
Issue 1: Human disease + non-human disease are not equal to all disease excluding susceptibility.
Update: the count of "human disease" now excludes any classes that are also subclasses of susceptibility.

Issue 2: Update the name of this query to indicate that it is "human-rare disease".
I updated the name of the query to indicate that it counts human rare diseases.
I also updated the query to count subclasses of MONDO:0700096 'human disease' (instead of MONDO:0000001 'disease') that are in the rare subset.

There is now a query to get the number of cancer terms.

Changed the variable name in COUNT-non-human-genetic-diseases.sparql to ?countNonHumanGeneticDiseases.

TODO: Counts for non-human-genetic-diseases and non-human_diseases_infectious need to be reviewed by @katiermullen
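A toy Python sketch of the two query changes described in this comment: excluding susceptibility classes from the human disease count, and restricting the rare-subset count to descendants of 'human disease' instead of 'disease'. The class names and hierarchy are made up, the descendants helper mimics rdfs:subClassOf* by including the root itself, and none of this is the actual SPARQL. Note that a later comment in this thread reports that the explicit susceptibility exclusion was removed after a Tech call.

```python
from collections import defaultdict

# Made-up child -> parents links; real MONDO classes and IDs are not used here.
PARENTS = {
    "human disease": ["disease"],
    "non-human animal disease": ["disease"],
    "disease susceptibility": ["disease"],
    "rare disorder A": ["human disease"],
    "common disorder B": ["human disease"],
    "susceptibility to C": ["human disease", "disease susceptibility"],
    "rare animal disorder D": ["non-human animal disease"],
}
RARE_SUBSET = {"rare disorder A", "rare animal disorder D"}


def descendants(root: str, parents: dict[str, list[str]]) -> set[str]:
    """root plus every class below it, like ?class rdfs:subClassOf* root."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:  # a class with two parents is visited once
                seen.add(child)
                stack.append(child)
    return seen


human = descendants("human disease", PARENTS)
human_excl_susceptibility = human - descendants("disease susceptibility", PARENTS)
human_rare = human & RARE_SUBSET                               # rare subset under 'human disease' only
all_rare = descendants("disease", PARENTS) & RARE_SUBSET      # rare subset under 'disease'
```

In this toy hierarchy, the rare-subset count under 'disease' also picks up the animal class, which is the distinction the renamed "human rare disease" query makes explicit.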

@katiermullen (Collaborator)

I can confirm that the counts for non-human-genetic-diseases (1021) and non-human_diseases_infectious (87) correspond to the number of terms in the 'hereditary disease, non-human animal' and 'infectious disease, non-human animal' branches, respectively.

@twhetzel twhetzel requested a review from sabrinatoro March 27, 2025 06:21
@twhetzel twhetzel assigned sabrinatoro and unassigned twhetzel Mar 27, 2025
@twhetzel twhetzel changed the title Add goal to create general stats Add goals to create general stats and synonym stats Mar 27, 2025
@sabrinatoro (Collaborator) left a comment

I ran both
sh run.sh make create-general-mondo-stats-all and sh run.sh make create-synonym-mondo-stats-all.
Both got the following error:
reports/mondo_stats/tmp/COUNT-all_disease_excluding_susceptibility.tsv] Error 1

I will run the other SPARQL queries individually and test whether they work.
UPDATE: all the reports ran fine on their own.

@sabrinatoro sabrinatoro assigned twhetzel and unassigned sabrinatoro Mar 27, 2025
@sabrinatoro (Collaborator) left a comment

Report	Count
?countAllDiseaseExcludingSusceptibility	25379
?countHumanDiseaseExcludingSusceptibility	22419
?countNonHumanAnimalDisease	2959

Sum of human and non-human diseases = 25378
There is a difference of 1. I wonder whether this comes from counting the term "disease" itself.

The counts and titles in the reports are ok.

@twhetzel (Collaborator, Author) commented Mar 27, 2025

Thanks for reviewing, @sabrinatoro. The count queries use ?class rdfs:subClassOf* obo:MONDO_0000001, and because of the * the query counts the class itself as well as all of its subclasses.
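That reflexive closure accounts for the off-by-one in the review above, provided every disease sits under either 'human disease' or 'non-human animal disease': counting with subClassOf* from the root also counts the root class itself exactly once. A minimal Python sketch on a made-up tree; it assumes no multiple inheritance, whereas a real count over MONDO's hierarchy would need to count distinct classes.

```python
# Made-up tree (child lists); every disease sits under exactly one of the two branches.
CHILDREN = {
    "disease": ["human disease", "non-human animal disease"],
    "human disease": ["human disorder 1", "human disorder 2"],
    "non-human animal disease": ["animal disorder 1"],
}


def count_reflexive(node: str, children: dict[str, list[str]]) -> int:
    """Count node itself plus all descendants (tree only: shared subclasses would be double-counted)."""
    return 1 + sum(count_reflexive(child, children) for child in children.get(node, []))


total = count_reflexive("disease", CHILDREN)                        # 6
human = count_reflexive("human disease", CHILDREN)                  # 3
non_human = count_reflexive("non-human animal disease", CHILDREN)   # 2
assert total == human + non_human + 1  # the extra 1 is the root class itself

# The same relationship holds for the counts reported earlier in this thread.
assert 22419 + 2959 + 1 == 25379
```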

For the error when running the make goals: is "reports/mondo_stats/tmp/COUNT-all_disease_excluding_susceptibility.tsv] Error 1" the only error message displayed, or were there other lines above it?

Update: I pushed an update that may fix the issue running the make commands.

@twhetzel twhetzel requested a review from sabrinatoro March 27, 2025 23:19
@twhetzel twhetzel assigned sabrinatoro and unassigned twhetzel Mar 27, 2025

@twhetzel twhetzel changed the title Add goals to create general stats and synonym stats Add goals to create general stats and synonym stats and rare stats Mar 28, 2025

@twhetzel (Collaborator, Author) commented Mar 29, 2025

@sabrinatoro, this is ready for review with the updates from the Tech call discussion, which removed the explicit exclusion of 'disease susceptibility' terms from the queries. The initial post lists all the make commands to run.

@twhetzel twhetzel changed the title Add goals to create general stats and synonym stats and rare stats Add goals to create general stats and synonym stats and rare and community stats Mar 30, 2025
twhetzel added 11 commits March 29, 2025 19:23

Labels: None yet
Projects: None yet
4 participants