Refactor sort_star_input.py #222

Open
adthrasher opened this issue Feb 26, 2025 · 1 comment

Comments

@adthrasher
Member

adthrasher commented Feb 26, 2025

I guess it's fine for now, but can you open an issue for refactoring this? I think the best setup would probably be to rename `sort_star_input.py`, move it into the `util` image, and then do the STAR input checking as part of `parse_input` instead of as part of the STAR task. I think that would also remove the need to host our own STAR image, wouldn't it?

Originally posted by @a-frantz in #139 (comment)

  • Rename the script and move it to a different container (util?).
  • Refactor the STAR task to remove the use of the script.
  • Add the script call to parse-input at the RNA-Seq workflow-level.
  • Remove our STAR Docker image in favor of a clean image from BioContainers.
@a-frantz
Member

Complication: currently, `sort_star_input.py` is fine with the RGs and FASTQs being out of order (e.g. R1s: [rg1.fq, rg2.fq, rg3.fq], R2s: [rg2.fq, rg3.fq, rg1.fq], RGs: [rg3, rg1, rg2]) because it sorts them in the output files. Those sorted outputs would be messy to parse back into WDL, but we could instead have the script fail on bad orderings.

Point being, a bad order passed to rnaseq-standard is acceptable and recoverable, but it will go undetected (and maybe lead to strange behavior?) in the Hi-C workflow.
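
A rough sketch of the "fail on bad orderings" option, to make the idea concrete. This is not the current script: the function name `check_star_input_order` is hypothetical, and so is the matching rule (it assumes each FASTQ basename starts with its read group ID followed by a dot, as in the `rg1.fq` example above). The real script may associate FASTQs with read groups differently.

```python
from pathlib import Path


def check_star_input_order(read_one_fastqs, read_two_fastqs, read_groups):
    """Raise ValueError unless FASTQs and read groups share one ordering.

    Hypothetical rule (an assumption, not the current script's logic):
    the FASTQ at index i must have a basename starting with
    "<read group ID>." for the read group at index i.
    """
    if len(read_one_fastqs) != len(read_groups):
        raise ValueError("Number of R1 FASTQs and read groups differ")
    # An empty R2 list is treated as single-end input and skipped.
    if read_two_fastqs and len(read_two_fastqs) != len(read_groups):
        raise ValueError("Number of R2 FASTQs and read groups differ")

    for i, rg in enumerate(read_groups):
        for label, fastqs in (("R1", read_one_fastqs), ("R2", read_two_fastqs)):
            if not fastqs:
                continue
            name = Path(fastqs[i]).name
            if not name.startswith(f"{rg}."):
                raise ValueError(
                    f"{label} FASTQ '{name}' at position {i} "
                    f"does not match read group '{rg}'"
                )


if __name__ == "__main__":
    # The out-of-order example from this comment is rejected
    # rather than silently re-sorted.
    try:
        check_star_input_order(
            ["rg1.fq", "rg2.fq", "rg3.fq"],
            ["rg2.fq", "rg3.fq", "rg1.fq"],
            ["rg3", "rg1", "rg2"],
        )
    except ValueError as err:
        print(f"Rejected: {err}")
```

Failing early like this would keep the check at the `parse_input` level, so a workflow that does not re-sort its inputs would also see the error instead of proceeding with mismatched read groups.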
