Issue with EDAM ontologies file extension #3511

Open
toniher opened this issue Mar 26, 2025 · 3 comments
bug Something isn't working


toniher commented Mar 26, 2025

Description of the bug

I am checking EDAM ontologies automatic assignment loading process here:

def load_edam():

Currently, the extension is retrieved and processed from the second column (index 1) of EDAM.tsv. I would rather use the field intended for that, the 15th column (index 14). With the present assignment, some file extensions are picked up that are not used in real life (e.g., uniprotkb). As a drawback, some extensions may appear several times, e.g., json:

json,http://edamontology.org/format_3464
json,http://edamontology.org/format_3969
json,http://edamontology.org/format_3970

In these cases, I would favour assigning the most generic one, which I would hope is the first one encountered, i.e. the first entry created in the dictionary (so no overwriting would be allowed).
This will lead to fewer automatic filetype assignments, but more precise ones. Users would need to be advised to curate manually when a more specific EDAM mapping might exist for a given file extension such as JSON (e.g., http://edamontology.org/format_3970).
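A minimal sketch of the "first entry wins" idea described above, not the actual `load_edam()` code. The function name `build_extension_map`, the assumption that the EDAM URI sits in column index 0, and the `|` separator for multi-valued extension cells are all hypothetical and would need checking against the real EDAM.tsv:

```python
import csv
import io


def build_extension_map(tsv_text, ext_col=14, uri_col=0, sep="|"):
    """Map each file extension to the first EDAM URI seen for it.

    Assumptions (not confirmed by the issue): the URI is in column
    `uri_col`, and a cell may hold several extensions joined by `sep`.
    """
    mapping = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) <= max(ext_col, uri_col):
            continue  # short or malformed row
        uri = row[uri_col].strip()
        for ext in row[ext_col].split(sep):
            ext = ext.strip().lower().lstrip(".")
            if ext:
                # setdefault keeps the first URI and ignores later duplicates
                mapping.setdefault(ext, uri)
    return mapping
```

With rows ordered as in the json example above, only the first URI encountered for `json` would be stored; whether that first row is really the most generic format depends on the ordering of EDAM.tsv, which this sketch does not verify.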

Command used and terminal output

System information

No response

@toniher toniher added the bug Something isn't working label Mar 26, 2025
@toniher toniher self-assigned this Mar 26, 2025
toniher added a commit to toniher/nf-core-tools that referenced this issue Mar 26, 2025

toniher commented Mar 26, 2025

It's worth noting that even though we removed many false positives (e.g., the uniprotkb file extension), with this change we now miss some useful ones (such as gff or bed). My take would be to check how we could add these file extensions upstream in https://github.com/edamontology/edamontology/ (a separate debate, maybe for another issue, could be how to accommodate different EDAM releases; for now the latest one is used).

toniher added a commit that referenced this issue Mar 27, 2025
Changing retrieval of file extension from EDAM - Referred at: #3511
toniher added a commit to toniher/edamontology that referenced this issue Mar 31, 2025

matuskalas commented Mar 31, 2025

Thanks, merged upstream in EDAM. Please post more if needed.

Just one warning: EDAM.tsv is not (at all) up to date with EDAM.owl, as we're currently lacking that part of the process. Please let us know if this is a showstopper for you, and we can consider prioritising it. Otherwise, please let us know if/when you need http://edamontology.org/EDAM.owl updated with these file extensions (there is no continuous deployment after each merge to main).


toniher commented Apr 1, 2025

Thanks @matuskalas! The tool used during the linting process adds EDAM links for the file formats stated in the module specifications (details: https://nf-co.re/blog/2025/modules-ontology). Contributors need to review the additions anyway, but since many of them might not be familiar with EDAM, any initial automation helps. So, nothing critical, but having a more up-to-date TSV would be beneficial and time-saving. Thanks for the prompt response, and let us know if there is anything we could help with as well.

Projects
Status: To do
Development

No branches or pull requests

2 participants