4 changes: 3 additions & 1 deletion tools/usher/macros.xml
@@ -1,5 +1,5 @@
<macros>
- <token name="@TOOL_VERSION@">0.2.1</token>
+ <token name="@TOOL_VERSION@">0.6.6</token>
<token name="@GALAXY_TOOL_VERSION@">galaxy0</token>
<xml name="xrefs">
<xrefs>
@@ -15,6 +15,7 @@
<citations>
<citation type="doi">10.1101/2020.09.26.314971</citation>
<citation type="doi">10.1101/2021.04.03.438321</citation>
+ <citation type="doi">10.1101/2021.08.04.455157</citation>
</citations>
</xml>
<macro name="sanitize_string" >
@@ -24,6 +25,7 @@
<add value="-"/>
<add value="."/>
<add value=":"/>
+ <add value=","/>
</valid>
</sanitizer>
</macro>
337 changes: 194 additions & 143 deletions tools/usher/matutils.xml

Large diffs are not rendered by default.

94 changes: 94 additions & 0 deletions tools/usher/ripples.xml
@@ -0,0 +1,94 @@
<tool id='usher_ripples' name='UShER RIPPLES' version='@TOOL_VERSION@+@GALAXY_TOOL_VERSION@' profile='23.2'>
<description>detects recombination events in large mutation annotated tree (MAT) files.</description>
Member

Suggested change
- <description>detects recombination events in large mutation annotated tree (MAT) files.</description>
+ <description>detect recombination events in large mutation annotated tree (MAT) files</description>

<macros>
<import>macros.xml</import>
</macros>
<expand macro="xrefs"/>
<expand macro='requirements' />
<version_command>usher --version</version_command>
Member

should we put this also into the macro?

Contributor Author

I can and will

from a teaching perspective: what's the rationale behind this request? Is it to avoid redundancy between the different wrappers? simplicity? good practice? thx :-)

Member

Redundancy ... DRY. Here it's not that important, I don't expect this command changes much over time - but who knows :)

Contributor Author

ok, makes sense indeed :-)

and done so in the meantime.
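
For readers following the DRY point in this thread, here is a minimal sketch of what moving the version command into `macros.xml` could look like. The macro name `version_command` is an assumption made for illustration; the diff shown here does not reveal the name actually used.

```xml
<!-- macros.xml: hypothetical macro wrapping the shared version command -->
<macros>
    <xml name="version_command">
        <version_command>usher --version</version_command>
    </xml>
</macros>

<!-- ripples.xml, or any other wrapper that imports macros.xml -->
<expand macro="version_command"/>
```

With this layout, a change to the version command would only need to be made once, in `macros.xml`.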

<command detect_errors='exit_code'><![CDATA[
## get correct extension filenames
ln -sf '$input_mat' '$input_mat.element_identifier' &&

ripples
--input-mat '$input_mat.element_identifier'

--branch-length $branch_length
--min-coordinate-range $min_coordinate_range
--max-coordinate-range $max_coordinate_range
--samples-filename '$samples_filename'
--parsimony-improvement $parsimony_improvement
--num-descendants $num_descendants

--outdir ./
--threads \${GALAXY_SLOTS:-1} > output_stdout.txt

]]> </command>
<inputs>
<param argument="--input-mat" type="data" format="protobuf3" label="Mutation-annotated tree object" help="Load a mutation annotated tree file, in protocol-buffers format (protobuf3)."/>
Member

please consider adding min/max to all integers/float params

Contributor Author
@lsterck, Oct 7, 2025

I would like to, but I have no idea what sensible ranges would be ...
As I understand it, this depends heavily on the input/output trees used, so it does not really make sense to "fix" ranges.
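
One possible middle ground, sketched here as an assumption rather than a recommendation from the RIPPLES documentation, is to declare only a lower bound and leave the upper bound open. The value `min="0"` mirrors the other integer parameters in this diff; whether 0 is a meaningful value for this option is not confirmed here.

```xml
<!-- Sketch only: lower bound without an upper bound; min="0" is an assumed value -->
<param argument="--num-descendants" type="integer" value="10" min="0"
       label="Number of descendants"
       help="Minimum number of leaves that a node should have to be considered for recombination. Default = 10." />
```

This rejects negative input in the Galaxy form without committing to a tree-dependent maximum.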

<param argument="--branch-length" type="integer" value="3" min="0" label="Minimum branch length" help="Minimum length of the branch to consider for recombination events. Default = 3." />
<param argument="--min-coordinate-range" type="integer" value="1000" min="0" label="Minimal coordinate range" help="Minimum range of the genomic coordinates of the mutations on the recombinant branch. Default = 1,000." />
<param argument="--max-coordinate-range" type="integer" value="10000000" min="0" label="Maximal coordinate range" help="Maximum range of the genomic coordinates of the mutations on the recombinant branch. Default = 10,000,000." />
<param argument="--samples-filename" type="data" format="txt" label="Sample restriction file" help="Restrict the search to the ancestors of the samples specified in the input file." />
<param argument="--parsimony-improvement" type="integer" value="3" min="0" label="Parsimony improvement" help="Minimum improvement in parsimony score of the recombinant sequences during the partial placement. Default = 3." />
<param argument="--num-descendants" type="integer" value="10" label="Number of descendants" help="Minimum number of leaves that a node should have to be considered for recombination. Default = 10." />
</inputs>
<outputs>
<data name="recombination" format="tabular" from_work_dir='recombination.tsv' label="${tool.name} on ${on_string}: recombinations" >
<actions>
<action name="column_names" type="metadata" default="recomb_node_id,breakpoint-1_interval,breakpoint-2_interval,donor_node_id,donor_is_sibling,donor_parsimony,acceptor_node_id,acceptor_is_sibling,acceptor_parsimony,original_parsimony,min_starting_parsimony,recomb_parsimony" />
</actions>
</data>
<data name="descendants" format="tabular" from_work_dir='descendants.tsv' label="${tool.name} on ${on_string}: descendants" >
<actions>
<action name="column_names" type="metadata" default="node_id,descendants" />
</actions>
</data>

</outputs>
<tests>
<test expect_num_outputs="2">
<param name="input_mat" value="mutation_annotation.pb" ftype="protobuf3"/>
<param name="samples_filename" value="sample_names.txt" ftype="txt"/>
<output name="descendants" file="test_26_descendants.tabular" ftype="tabular"/>
<output name="recombination" file="test_26_recombination.tabular" ftype="tabular"/>
</test>
<test expect_num_outputs="2">
<param name="input_mat" value="mutation_annotation.pb" ftype="protobuf3"/>
<param name="samples_filename" value="sample_names.txt" ftype="txt"/>
<param name="num_descendants" value="20" />
<param name="parsimony_improvement" value="5" />
<param name="branch_length" value="2" />
<output name="descendants" file="test_27_descendants.tabular" ftype="tabular"/>
<output name="recombination" file="test_27_recombination.tabular" ftype="tabular"/>
</test>
</tests>
<help><![CDATA[

.. class:: infomark

**Purpose**

RIPPLES (Recombination Inference using Phylogenetic PLacEmentS) is a program used to detect recombination events in large mutation annotated tree (MAT) files.

----

RIPPLES is a program to rapidly and sensitively detect recombinant nodes and their ancestors in a mutation-annotated tree (MAT). RIPPLES exploits the fact that recombinant lineages arising from diverse genomes will often be found on “long branches” which result from accommodating the divergent evolutionary histories of the two parental haplotypes. Therefore, RIPPLES first identifies long branches in a MAT. RIPPLES then exhaustively breaks the potential recombinant sequence into distinct segments that are differentiated by mutations on the recombinant sequence and separated by up to two breakpoints. For each set of breakpoints, RIPPLES places each of its corresponding segments using maximum parsimony to find the two parental nodes – a donor and an acceptor – that result in the highest parsimony score improvement relative to the original placement on the global phylogeny. The nodes for which a set of breakpoints along with two parental nodes can be identified that provide a parsimony score improvement above a user-specified threshold are reported as recombinants.

.. class:: infomark

**RIPPLES Common Options**

- input-mat: Input mutation-annotated tree file [REQUIRED]. If only this argument is set, print the count of samples and nodes in the tree.
- branch-length (-l): Minimum length of the branch to consider for recombination events. Default = 3.
- min-coordinate-range (-r): Minimum range of the genomic coordinates of the mutations on the recombinant branch. Default = 1,000.
- max-coordinate-range (-R): Maximum range of the genomic coordinates of the mutations on the recombinant branch. Default = 10,000,000.
- samples-filename (-s): Restrict the search to the ancestors of the samples specified in the input file.
- parsimony-improvement (-p): Minimum improvement in parsimony score of the recombinant sequences during the partial placement. Default = 3.
- num-descendants (-n): Minimum number of leaves that a node should have to be considered for recombination. Default = 10.

You can find more information in the `RIPPLES official documentation page <https://usher-wiki.readthedocs.io/en/latest/ripples.html>`_.

]]> </help>
<expand macro="citations" />
</tool>
8 changes: 4 additions & 4 deletions tools/usher/test-data/rename_samples.tabular
@@ -1,4 +1,4 @@
England/BRIS-1853249/2020|20-04-02 Spain/BRIS-1853249/2020|20-04-02
Wales/PHWC-25B04/2020|20-03-24 Spain/BRIS-1853249/2020|20-04-02
NPL/61-TW/2020|MT072688.1|20-01-13 Spain/BRIS-1853249/2020|20-04-02
Wales/LIVE-A6831/2020|20-03-16 Spain/BRIS-1853249/2020|20-04-02
England/BRIS-1853249/2020|20-04-02 Spain/BRIS-1853249/2020|20-04-02_A
Wales/PHWC-25B04/2020|20-03-24 Spain/BRIS-1853249/2020|20-04-02_B
NPL/61-TW/2020|MT072688.1|20-01-13 Spain/BRIS-1853249/2020|20-04-02_C
Wales/LIVE-A6831/2020|20-03-16 Spain/BRIS-1853249/2020|20-04-02_D
Binary file modified tools/usher/test-data/test_01_annotated_tree.pb
Binary file not shown.