-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: catch when FORMAT field is present in truth #114
base: main
Are you sure you want to change the base?
Conversation
WalkthroughA new file ( Changes
Sequence Diagram(s)sequenceDiagram
participant Rule as add_format_field Rule
participant VCF as vcf-genotype-annotator
participant Fallback as bcftools reheader & conversion
Rule->>VCF: Execute vcf-genotype-annotator
alt Success
VCF-->>Rule: Return VCF with added FORMAT field
else Failure
VCF-->>Rule: Error (GT field exists)
Rule->>Fallback: Execute bcftools reheader & convert BCF to VCF using truth.txt
Fallback-->>Rule: Return modified VCF
end
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
workflow/rules/common.smk
(1 hunks)workflow/rules/download.smk
(1 hunks)workflow/rules/eval.smk
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Formatting
🔇 Additional comments (2)
workflow/rules/download.smk (1)
81-97
: Good addition of a new rule to check for FORMAT fieldThe new
has_format_field
rule is a well-structured solution to detect FORMAT fields in BCF files. This addresses the core issue mentioned in the PR objectives, providing a reliable way to check if a FORMAT field exists in the truth data.The implementation using
bcftools view -h | grep -q FORMAT
is clean and concise, writing "true" or "false" to the output file accordingly.workflow/rules/eval.smk (1)
65-76
: Good refactoring of theadd_format_field
ruleThe changes to this rule improve the structure and flexibility of how the FORMAT field is handled. The use of named input parameters makes the code more maintainable, and the integration with the new functions provides a cleaner solution to the FORMAT field issue.
The use of
format_field_cmd
parameterization allows for dynamic handling based on the presence of the FORMAT field, which directly addresses the problem described in the PR objectives.
workflow/rules/common.smk
Outdated
def has_format_field(wildcards): | ||
file_path = input.format_field_file | ||
with open(file_path, "r") as infile: | ||
content = infile.read().strip() | ||
if content == "true": | ||
return True | ||
else: | ||
return False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix input parameter handling in has_format_field
function
The function assumes input.format_field_file
is defined, but this variable isn't passed as a parameter to the function. This could cause runtime errors when the function is called outside the context where input
is defined.
Apply this fix to properly handle the input parameter:
-def has_format_field(wildcards):
- file_path = input.format_field_file
+def has_format_field(wildcards, input=None):
+ file_path = input.format_field_file if input else f"resources/variants/{wildcards.genome}/has-format-field.txt"
with open(file_path, "r") as infile:
content = infile.read().strip()
if content == "true":
return True
else:
return False
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def has_format_field(wildcards): | |
file_path = input.format_field_file | |
with open(file_path, "r") as infile: | |
content = infile.read().strip() | |
if content == "true": | |
return True | |
else: | |
return False | |
def has_format_field(wildcards, input=None): | |
file_path = input.format_field_file if input else f"resources/variants/{wildcards.genome}/has-format-field.txt" | |
with open(file_path, "r") as infile: | |
content = infile.read().strip() | |
if content == "true": | |
return True | |
else: | |
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
workflow/rules/common.smk (2)
285-292
:⚠️ Potential issueFix input parameter handling in
has_format_field
functionThe function assumes
input.format_field_file
is defined, but this variable might not be available when the function is called outside the context whereinput
is defined.Apply this fix to properly handle the input parameter:
-def has_format_field(wildcards, input): - file_path = input.format_field_file +def has_format_field(wildcards, input=None): + if input is None or not hasattr(input, "format_field_file"): + file_path = f"resources/variants/{wildcards.genome}/has-format-field.txt" + else: + file_path = input.format_field_file with open(file_path, "r") as infile: content = infile.read().strip() if content == "true": return True else: return False
295-302
:⚠️ Potential issueFix parameter handling in
get_format_field_command
functionThe function assumes
params.format_field
is defined, but this might not be available in all contexts where the function is called.Apply this fix to handle cases where
params
might not be defined:-def get_format_field_command(wildcards, params): +def get_format_field_command(wildcards, params=None): + has_format = params.format_field if params and hasattr(params, "format_field") else has_format_field(wildcards) if params.format_field: # bcftools reheader changes sample name which is associated with GT field to truth return "bcftools reheader -s {input.truth_name} {input.bcf} | bcftools view -Oz > {output}" else: # bcftools convert makes sure that input for vcf-genotype-annotator is in vcf format # adds FORMAT field with GT field and sample name 'truth' return "vcf-genotype-annotator <(bcftools convert -Ov {input.bcf}) truth 0/1 -o {output} &> {log}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/common.smk
(1 hunks)workflow/rules/eval.smk
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Formatting
🔇 Additional comments (2)
workflow/rules/eval.smk (2)
65-69
: Good improvement: Named inputs for clarityThe change to use named inputs (
bcf
,truth_name
,format_field_file
) instead of a single string input makes the code more readable and maintainable.
79-79
: Parameter reference is correct, but dependent on function fixesThe shell command now correctly uses
{params.format_field_cmd}
instead of hardcoded logic, which is a good improvement. However, the function generating this parameter needs to be fixed as noted in the common.smk review.
params: | ||
format_field=has_format_field, | ||
format_field_cmd=get_format_field_command, | ||
conda: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Check that function parameter handling is fixed
The added parameters properly connect to the new functions, but make sure the issues with parameter handling in those functions are fixed first, as noted in the common.smk review.
stratify_truth
currently fails with:This fix might work - but maybe a second rule to check for the GT field first is needed.
Summary by CodeRabbit
New Features
Chores