From a808e46d26be69239c7bc4fc9dd1a259982ace65 Mon Sep 17 00:00:00 2001 From: Helena Rasche Date: Wed, 17 May 2023 19:21:19 +0200 Subject: [PATCH] fix remaining discovered issues --- topics/admin/tutorials/sentry/tutorial.md | 6 +- .../tutorials/mrsa-illumina/tutorial.md | 9 +- .../tutorials/mrsa-nanopore/tutorial.md | 9 +- .../tutorials/unicycler-assembly/tutorial.md | 3 +- .../vgp_workflow_training/tutorial.md | 2 +- .../tutorials/pangeo-notebook/tutorial.md | 1 - .../tutorials/cheminformatics/tutorial.md | 102 +++++++++--------- .../setting-up-molecular-systems/tutorial.md | 18 ++-- .../tutorials/learning-principles/tutorial.md | 5 +- .../tutorials/bash-git/tutorial.md | 12 +-- .../bash-variant-calling/tutorial.md | 2 +- .../tutorials/python-conda/tutorial.md | 2 +- .../tutorials/python-iterables/tutorial.md | 5 +- .../tutorials/python-linting/tutorial.md | 3 +- .../tutorials/python-venv/tutorial.md | 30 +++--- topics/dev/tutorials/debugging/tutorial.md | 9 +- .../tutorials/champs-blocs/tutorial.md | 14 +-- .../ecology/tutorials/regionalGAM/tutorial.md | 4 +- .../tutorials/x-array-map-plot/tutorial.md | 30 +++--- .../tutorial.md | 4 +- .../tutorials/rstudio/tutorial.md | 16 +-- .../tutorials/upload-data-to-ena/tutorial.md | 18 ++-- .../tutorials/workflow-automation/tutorial.md | 43 ++++---- .../tutorials/crispr-screen/tutorial.md | 4 +- .../tutorials/hpc-for-lsgc/tutorial.md | 4 +- .../multiplex-tissue-imaging-TMA/tutorial.md | 12 +-- .../tutorials/gc_ms_with_xcms/tutorial.md | 12 +-- .../metabolomics/tutorials/gcms/tutorial.md | 14 +-- .../tutorials/lcms-preprocessing/tutorial.md | 14 +-- .../tutorial.md | 4 +- .../tutorial.md | 10 +- .../bulk-music-4-compare/tutorial.md | 2 +- .../scrna-case_monocle3-rstudio/tutorial.md | 2 +- .../tutorial.md | 17 +-- .../tutorials/scrna-umis/tutorial.md | 2 +- .../tutorial.md | 55 +++++----- .../tutorials/gpu_jupyter_lab/tutorial.md | 10 +- .../tutorials/ref-based/tutorial.md | 2 +- .../rna-seq-bash-star-align/tutorial.md | 6 +- .../tutorial.md | 12 +-- .../tutorials/sars-cov-2/tutorial.md | 2 +- .../somatic-variant-discovery/tutorial.md | 4 +- .../tutorials/tb-variant-analysis/tutorial.md | 8 +- .../tutorials/circos/tutorial.md | 2 +- 44 files changed, 277 insertions(+), 268 deletions(-) diff --git a/topics/admin/tutorials/sentry/tutorial.md b/topics/admin/tutorials/sentry/tutorial.md index 92e2f5727b9c3..eededae764eda 100644 --- a/topics/admin/tutorials/sentry/tutorial.md +++ b/topics/admin/tutorials/sentry/tutorial.md @@ -53,7 +53,6 @@ To proceed from here it is expected that: > > 1. You have set up a working Galaxy instance as described in the [ansible-galaxy](../ansible-galaxy/tutorial.html) tutorial. > - {: .comment} # Installing and Configuring @@ -354,7 +353,7 @@ In addition to sending logging errors to Sentry you can also collect failing too > {: data-commit="Configure error reporting"} > > 2. Create a testing tool in `files/galaxy/tools/job_properties.xml`. -. +> > {% raw %} > ```diff > --- /dev/null @@ -481,7 +480,6 @@ To generate a tool error, run the job properties testing tool and set the `failb > Open the Galaxy Project in Sentry > 1. Go to your Sentry instance and click on issues. You should see an issue for the tool run error. - {: .hands_on } ## Reporting errors from the Pulsar server @@ -518,7 +516,7 @@ It is also possible to report errors from the Pulsar server. You can either use > ``` > {: data-commit="Configure pulsar for error reporting"} > -> > 4. Run the pulsar playbook. +> 4. Run the pulsar playbook. 
> > > Bash > > ```bash diff --git a/topics/assembly/tutorials/mrsa-illumina/tutorial.md b/topics/assembly/tutorials/mrsa-illumina/tutorial.md index 3352af7ae8ef4..f9658f13c258b 100644 --- a/topics/assembly/tutorials/mrsa-illumina/tutorial.md +++ b/topics/assembly/tutorials/mrsa-illumina/tutorial.md @@ -58,10 +58,11 @@ In this training you're going to make an assembly of data produced by Japan" from {% cite Hikichi_2019 %} which describes: > Methicillin-resistant *Staphylococcus aureus* (MRSA) is a major pathogen -causing nosocomial infections, and the clinical manifestations of MRSA -range from asymptomatic colonization of the nasal mucosa to soft tissue -infection to fulminant invasive disease. Here, we report the complete -genome sequences of eight MRSA strains isolated from patients in Japan. +> causing nosocomial infections, and the clinical manifestations of MRSA +> range from asymptomatic colonization of the nasal mucosa to soft tissue +> infection to fulminant invasive disease. Here, we report the complete +> genome sequences of eight MRSA strains isolated from patients in Japan. +{: .quote} > > diff --git a/topics/assembly/tutorials/mrsa-nanopore/tutorial.md b/topics/assembly/tutorials/mrsa-nanopore/tutorial.md index d79263d753f53..b709d3d73c76c 100644 --- a/topics/assembly/tutorials/mrsa-nanopore/tutorial.md +++ b/topics/assembly/tutorials/mrsa-nanopore/tutorial.md @@ -61,10 +61,11 @@ In this training you're going to make an assembly of data produced by Japan" from {% cite Hikichi_2019 %} which describes: > Methicillin-resistant *Staphylococcus aureus* (MRSA) is a major pathogen -causing nosocomial infections, and the clinical manifestations of MRSA -range from asymptomatic colonization of the nasal mucosa to soft tissue -infection to fulminant invasive disease. Here, we report the complete -genome sequences of eight MRSA strains isolated from patients in Japan. +> causing nosocomial infections, and the clinical manifestations of MRSA +> range from asymptomatic colonization of the nasal mucosa to soft tissue +> infection to fulminant invasive disease. Here, we report the complete +> genome sequences of eight MRSA strains isolated from patients in Japan. +{: .quote} > > diff --git a/topics/assembly/tutorials/unicycler-assembly/tutorial.md b/topics/assembly/tutorials/unicycler-assembly/tutorial.md index c44a340270f98..99caf163917a9 100644 --- a/topics/assembly/tutorials/unicycler-assembly/tutorial.md +++ b/topics/assembly/tutorials/unicycler-assembly/tutorial.md @@ -70,8 +70,7 @@ There are 12,738 [2d-reads](http://www.nature.com/nmeth/journal/v12/n4/fig_tab/n You can see that there many reads under the second peak with median of approximately 7.5 kb. > Oxford Nanopore Data Format -> Oxford Nanopore machines output - data in [fast5](http://bioinformatics.cvr.ac.uk/blog/exploring-the-fast5-format/) format that contains additional information besides sequence data. In this tutorial we assume that this data is *already* converted into [fastq](https://en.wikipedia.org/wiki/FASTQ_format). An additional tutorial dedicated to handling fast5 datasets will be developed shortly. +> Oxford Nanopore machines output data in [fast5](http://bioinformatics.cvr.ac.uk/blog/exploring-the-fast5-format/) format that contains additional information besides sequence data. In this tutorial we assume that this data is *already* converted into [fastq](https://en.wikipedia.org/wiki/FASTQ_format). An additional tutorial dedicated to handling fast5 datasets will be developed shortly. 
{: .warning} diff --git a/topics/assembly/tutorials/vgp_workflow_training/tutorial.md b/topics/assembly/tutorials/vgp_workflow_training/tutorial.md index ff3bb8e736bcd..4028e7cb1441c 100644 --- a/topics/assembly/tutorials/vgp_workflow_training/tutorial.md +++ b/topics/assembly/tutorials/vgp_workflow_training/tutorial.md @@ -72,7 +72,7 @@ This tutorial assumes you are comfortable getting data into Galaxy, running jobs The {VGP} assembly pipeline has a modular organization, consisting in five main subworkflows (fig. 1), each one integrated by a series of data manipulation steps. Firstly, it allows the evaluation of intermediate steps, which facilitates the modification of parameters if necessary, without the need to start from the initial stage. Secondly, it allows to adapt the workflow to the available data. -> ![Figure 1: VGP pipeline modules](../../images/vgp_assembly/VGP_workflow_modules.png "VGP assembly pipeline. The VGP workflow is implemented in a modular fashion: it consists of five independent subworkflows. In addition, it includes some additional workflows (not shown in the figure), required for exporting the results to GenomeArk.") +![Figure 1: VGP pipeline modules](../../images/vgp_assembly/VGP_workflow_modules.png "VGP assembly pipeline. The VGP workflow is implemented in a modular fashion: it consists of five independent subworkflows. In addition, it includes some additional workflows (not shown in the figure), required for exporting the results to GenomeArk.") The VGP pipeline first uses an assembly program to generate {contigs}. When {Hi-C} data and Bionano data are avilable, then they are used to generate {scaffolds}. When both data types are available, then Bionano scaffolding is run first before Hi-C scaffolding, but if optical maps are not available then HiC scaffolding can be run on the contigs. diff --git a/topics/climate/tutorials/pangeo-notebook/tutorial.md b/topics/climate/tutorials/pangeo-notebook/tutorial.md index e0a63dad2b7d9..f7283fea5c20e 100755 --- a/topics/climate/tutorials/pangeo-notebook/tutorial.md +++ b/topics/climate/tutorials/pangeo-notebook/tutorial.md @@ -348,7 +348,6 @@ dset.sel(time=(np.timedelta64(2,'D') + np.timedelta64(12,'h')))['pm2p5_conc'].pl plt.title("Copernicus Atmosphere Monitoring Service PM2.5, 2 day forecasts\n 24th December 2021 at 12:00 UTC", fontsize=18) plt.savefig("CAMS-PM2_5-fc-20211224.png") ``` -> {: .code-in} And you should get the following plot: diff --git a/topics/computational-chemistry/tutorials/cheminformatics/tutorial.md b/topics/computational-chemistry/tutorials/cheminformatics/tutorial.md index 8241027e6144f..38683e7696f42 100644 --- a/topics/computational-chemistry/tutorials/cheminformatics/tutorial.md +++ b/topics/computational-chemistry/tutorials/cheminformatics/tutorial.md @@ -163,56 +163,58 @@ We will generate our compound library by searching ChEMBL for compounds which ha > Problems using the ChEMBL tool? > A number of users encounter issues with the ChEMBL tool - sometimes the tool fails, or the output is returned successfully but is empty. If this happens to you, try the following: -> > Rerun the tool - if a transient error on the ChEMBL server was at fault, this might be enough to fix it. -> > Try modifying some of the parameters. For example, reducing the Tanimoto coefficient should increase the number of compounds returned. 
-> > If all else fails, you can use the following list of SMILES: -> ``` -> Cc1n[nH]c(c2ccc(O)c(Cl)c2)c1c3ccc4OCCOc4c3 CHEMBL187670 -> COc1ccc(cc1)c2c(C)[nH]nc2c3ccc(O)cc3O CHEMBL192894 -> COc1ccc(c(O)c1)c2onc(C)c2c3ccc4OCCOc4c3 CHEMBL1541585 -> CCOc1ccc(c(O)c1)c2[nH]nc(C)c2c3ccccc3OC CHEMBL1504505 -> CN(CCc1c(C)n[nH]c1C)Cc2cn(C)nc2c3ccc4OCCOc4c3 CHEMBL1560480 -> COc1ccc(c(O)c1)c2[nH]nc(C)c2c3ccc4OCCCOc4c3 CHEMBL362893 -> CCCc1c(OCCCOc2cc(O)c(cc2CC)c3cc[nH]n3)ccc4CCC(Oc14)C(=O)O CHEMBL81401 -> Cc1cccc(n1)c2[nH]nc(C)c2c3ccnc4ccccc34 CHEMBL129153 -> COc1ccc(cc1OC)c2c(C)n[nH]c2c3ccc(O)cc3O CHEMBL1595327 -> COc1ccc(c(O)c1)c2noc(C)c2c3ccc4ccccc4n3 CHEMBL1486235 -> CCc1cc(c(O)cc1O)c2[nH]nc(C)c2c3ccc4OCCOc4c3 CHEMBL399530 -> Cc1[nH]nc(c2cc(Cl)c(O)cc2O)c1c3ccc4OCCOc4c3 CHEMBL191074 -> COc1ccc(c(O)c1)c2[nH]ncc2c3ccc4OCCOc4c3 CHEMBL1415374 -> Cc1n[nH]c(c2cc(Cl)ccc2O)c1c3ccc4OCCOc4c3 CHEMBL187678 -> Cc1noc(c2ccc(O)cc2O)c1c3ccc4OCCOc4c3 CHEMBL582320 -> CCOC(=O)c1oc(cc1)c2c(C)[nH]nc2c3cc(CC)c(O)cc3O CHEMBL3932805 -> NC(=O)c1ccc2[nH]nc(c3ccc4OCCOc4c3)c2c1 CHEMBL3900406 -> COc1ccc(cc1OC)c2cc([nH]n2)c3c(O)c(OC)c4occc4c3OC CHEMBL1351838 -> CCCc1cc(c(O)cc1OC)c2[nH]ncc2c3ccc4OCCCOc4c3 CHEMBL1443258 -> Oc1cc(O)c(cc1Cl)c2[nH]ncc2c3ccc4OCCOc4c3 CHEMBL191228 -> CCc1cc(c(O)cc1O)c2n[nH]cc2c3ccc4OCCOc4c3 CHEMBL3187010 -> Oc1ccccc1c2cc([nH]n2)c3ccc4OCCOc4c3 CHEMBL1567097 -> CCCc1cc(c(O)cc1O)c2[nH]ncc2c3ccc4OCCCOc4c3 CHEMBL1578064 -> COc1ccc(c(O)c1)c2[nH]nc(C)c2c3ccc(OC)c(OC)c3 CHEMBL1335688 -> COc1ccc(cc1)c2c(N)n[nH]c2c3cc(OC)c4OCCOc4c3 CHEMBL2408971 -> CCC(C)c1cc(c(O)cc1O)c2[nH]ncc2c3ccc4OCOc4c3 CHEMBL187674 -> Cc1noc(c2ccc(O)cc2O)c1c3ccc4OCCCOc4c3 CHEMBL587334 -> OCc1cn(nc1c2ccc3OCCOc3c2)c4ccccc4 CHEMBL1549407 -> Oc1ccc(c(O)c1)c2[nH]ncc2c3cccc4cccnc34 CHEMBL1305951 -> Cc1n[nH]c(c2ccc(O)cc2O)c1c3ccc4OCCOc4c3 CHEMBL188965 -> Cc1[nH]nc(c2ccc3OCC(=O)Nc3c2)c1c4ccc(F)cc4 CHEMBL3337723 -> CCOc1ccc(c(O)c1)c2n[nH]c(C)c2c3ccc(OC)cc3 CHEMBL1698243 -> COc1ccc(cc1)c2cc([nH]n2)c3c(O)c(OC)c4occc4c3OC CHEMBL1402615 -> CCc1cc(c(O)cc1O)c2[nH]ncc2c3ccc4OCOc4c3 CHEMBL1412538 -> CCOc1cc(O)c(cc1CC)c2nc(N)ncc2c3ccc4OCCOc4c3 CHEMBL547662 -> COc1ccc(cc1)c2c(N)onc2c3cc(OC)c4OCCOc4c3 CHEMBL3113121 -> CCCc1cc(c(O)cc1O)c2n[nH]cc2c3ccc4OCCOc4c3 CHEMBL3956397 -> CCC(C)c1cc(c(O)cc1O)c2[nH]nc(C)c2c3ccc4OCCOc4c3 CHEMBL190919 -> CCC(C)c1cc(c(O)cc1OC)c2[nH]ncc2c3ccc4OCCOc4c3 CHEMBL435501 -> Cn1cc(CNCc2c[nH]nc2c3ccc(F)cc3)c(n1)c4ccc5OCCOc5c4 CHEMBL1537178 -> Oc1ccc(F)cc1c2cc([nH]n2)c3ccc4OCCOc4c3 CHEMBL1451528 -> CCc1cc(c(O)cc1OCC(=O)O)c2n[nH]c(C)c2c3ccc4OCCCOc4c3 CHEMBL3952001 -> Cc1[nH]nc(c2ccc(O)c(O)c2O)c1c3ccc(Cl)cc3 CHEMBL1092945 -> COc1cc(cc(OC)c1OC)c2n[nH]nc2c3ccc4OCCOc4c3 CHEMBL3740841 -> Oc1ccc(c(O)c1)c2n[nH]cc2c3ccc4OCOc4c3 CHEMBL577176 -> ``` +> 1. Rerun the tool - if a transient error on the ChEMBL server was at fault, this might be enough to fix it. +> 1. Try modifying some of the parameters. For example, reducing the Tanimoto coefficient should increase the number of compounds returned. +> 1. 
If all else fails, you can use the following list of SMILES: +> +> ``` +> Cc1n[nH]c(c2ccc(O)c(Cl)c2)c1c3ccc4OCCOc4c3 CHEMBL187670 +> COc1ccc(cc1)c2c(C)[nH]nc2c3ccc(O)cc3O CHEMBL192894 +> COc1ccc(c(O)c1)c2onc(C)c2c3ccc4OCCOc4c3 CHEMBL1541585 +> CCOc1ccc(c(O)c1)c2[nH]nc(C)c2c3ccccc3OC CHEMBL1504505 +> CN(CCc1c(C)n[nH]c1C)Cc2cn(C)nc2c3ccc4OCCOc4c3 CHEMBL1560480 +> COc1ccc(c(O)c1)c2[nH]nc(C)c2c3ccc4OCCCOc4c3 CHEMBL362893 +> CCCc1c(OCCCOc2cc(O)c(cc2CC)c3cc[nH]n3)ccc4CCC(Oc14)C(=O)O CHEMBL81401 +> Cc1cccc(n1)c2[nH]nc(C)c2c3ccnc4ccccc34 CHEMBL129153 +> COc1ccc(cc1OC)c2c(C)n[nH]c2c3ccc(O)cc3O CHEMBL1595327 +> COc1ccc(c(O)c1)c2noc(C)c2c3ccc4ccccc4n3 CHEMBL1486235 +> CCc1cc(c(O)cc1O)c2[nH]nc(C)c2c3ccc4OCCOc4c3 CHEMBL399530 +> Cc1[nH]nc(c2cc(Cl)c(O)cc2O)c1c3ccc4OCCOc4c3 CHEMBL191074 +> COc1ccc(c(O)c1)c2[nH]ncc2c3ccc4OCCOc4c3 CHEMBL1415374 +> Cc1n[nH]c(c2cc(Cl)ccc2O)c1c3ccc4OCCOc4c3 CHEMBL187678 +> Cc1noc(c2ccc(O)cc2O)c1c3ccc4OCCOc4c3 CHEMBL582320 +> CCOC(=O)c1oc(cc1)c2c(C)[nH]nc2c3cc(CC)c(O)cc3O CHEMBL3932805 +> NC(=O)c1ccc2[nH]nc(c3ccc4OCCOc4c3)c2c1 CHEMBL3900406 +> COc1ccc(cc1OC)c2cc([nH]n2)c3c(O)c(OC)c4occc4c3OC CHEMBL1351838 +> CCCc1cc(c(O)cc1OC)c2[nH]ncc2c3ccc4OCCCOc4c3 CHEMBL1443258 +> Oc1cc(O)c(cc1Cl)c2[nH]ncc2c3ccc4OCCOc4c3 CHEMBL191228 +> CCc1cc(c(O)cc1O)c2n[nH]cc2c3ccc4OCCOc4c3 CHEMBL3187010 +> Oc1ccccc1c2cc([nH]n2)c3ccc4OCCOc4c3 CHEMBL1567097 +> CCCc1cc(c(O)cc1O)c2[nH]ncc2c3ccc4OCCCOc4c3 CHEMBL1578064 +> COc1ccc(c(O)c1)c2[nH]nc(C)c2c3ccc(OC)c(OC)c3 CHEMBL1335688 +> COc1ccc(cc1)c2c(N)n[nH]c2c3cc(OC)c4OCCOc4c3 CHEMBL2408971 +> CCC(C)c1cc(c(O)cc1O)c2[nH]ncc2c3ccc4OCOc4c3 CHEMBL187674 +> Cc1noc(c2ccc(O)cc2O)c1c3ccc4OCCCOc4c3 CHEMBL587334 +> OCc1cn(nc1c2ccc3OCCOc3c2)c4ccccc4 CHEMBL1549407 +> Oc1ccc(c(O)c1)c2[nH]ncc2c3cccc4cccnc34 CHEMBL1305951 +> Cc1n[nH]c(c2ccc(O)cc2O)c1c3ccc4OCCOc4c3 CHEMBL188965 +> Cc1[nH]nc(c2ccc3OCC(=O)Nc3c2)c1c4ccc(F)cc4 CHEMBL3337723 +> CCOc1ccc(c(O)c1)c2n[nH]c(C)c2c3ccc(OC)cc3 CHEMBL1698243 +> COc1ccc(cc1)c2cc([nH]n2)c3c(O)c(OC)c4occc4c3OC CHEMBL1402615 +> CCc1cc(c(O)cc1O)c2[nH]ncc2c3ccc4OCOc4c3 CHEMBL1412538 +> CCOc1cc(O)c(cc1CC)c2nc(N)ncc2c3ccc4OCCOc4c3 CHEMBL547662 +> COc1ccc(cc1)c2c(N)onc2c3cc(OC)c4OCCOc4c3 CHEMBL3113121 +> CCCc1cc(c(O)cc1O)c2n[nH]cc2c3ccc4OCCOc4c3 CHEMBL3956397 +> CCC(C)c1cc(c(O)cc1O)c2[nH]nc(C)c2c3ccc4OCCOc4c3 CHEMBL190919 +> CCC(C)c1cc(c(O)cc1OC)c2[nH]ncc2c3ccc4OCCOc4c3 CHEMBL435501 +> Cn1cc(CNCc2c[nH]nc2c3ccc(F)cc3)c(n1)c4ccc5OCCOc5c4 CHEMBL1537178 +> Oc1ccc(F)cc1c2cc([nH]n2)c3ccc4OCCOc4c3 CHEMBL1451528 +> CCc1cc(c(O)cc1OCC(=O)O)c2n[nH]c(C)c2c3ccc4OCCCOc4c3 CHEMBL3952001 +> Cc1[nH]nc(c2ccc(O)c(O)c2O)c1c3ccc(Cl)cc3 CHEMBL1092945 +> COc1cc(cc(OC)c1OC)c2n[nH]nc2c3ccc4OCCOc4c3 CHEMBL3740841 +> Oc1ccc(c(O)c1)c2n[nH]cc2c3ccc4OCOc4c3 CHEMBL577176 +> ``` +> > Don't worry if you can't get it to work - successfully generating this list is a very minor part of the tutorial! 
{: .tip} diff --git a/topics/computational-chemistry/tutorials/setting-up-molecular-systems/tutorial.md b/topics/computational-chemistry/tutorials/setting-up-molecular-systems/tutorial.md index 90f38482b83f0..4c16bd222ce34 100644 --- a/topics/computational-chemistry/tutorials/setting-up-molecular-systems/tutorial.md +++ b/topics/computational-chemistry/tutorials/setting-up-molecular-systems/tutorial.md @@ -78,10 +78,10 @@ In this section we'll access the PDB, download the correct structure, import it > > More resources: > - - [https://en.wikipedia.org/wiki/Cellulase](https://en.wikipedia.org/wiki/Cellulase) - - [https://en.wikipedia.org/wiki/Biofuel](https://en.wikipedia.org/wiki/Biofuel) - - [Fungal Cellulases](https://pubs.acs.org/doi/full/10.1021/cr500351c) - - [Cellobiohydrolase I Induced Conformational Stability and Glycosidic Bond Polarization ](https://pubs.acs.org/doi/10.1021/ja103766w) +> - [https://en.wikipedia.org/wiki/Cellulase](https://en.wikipedia.org/wiki/Cellulase) +> - [https://en.wikipedia.org/wiki/Biofuel](https://en.wikipedia.org/wiki/Biofuel) +> - [Fungal Cellulases](https://pubs.acs.org/doi/full/10.1021/cr500351c) +> - [Cellobiohydrolase I Induced Conformational Stability and Glycosidic Bond Polarization ](https://pubs.acs.org/doi/10.1021/ja103766w) {: .details} ## Get data @@ -237,11 +237,11 @@ Go to the correct section depending on which MD engine you will be using. ### Upload to Galaxy > Upload files to Galaxy -Upload the following files to your Galaxy instance and ensure the correct datatype is selected: - - step3_pbcsetup.psf -> xplor psf input (psf format) - - step3_pbcsetup.pdb -> pdb input (pdb format) - - Checkfft.str -> PME grid specs (txt format) -- step2.1_waterbox.prm -> waterbox prm input (txt format) +> Upload the following files to your Galaxy instance and ensure the correct datatype is selected: +> - step3_pbcsetup.psf -> xplor psf input (psf format) +> - step3_pbcsetup.pdb -> pdb input (pdb format) +> - Checkfft.str -> PME grid specs (txt format) +> - step2.1_waterbox.prm -> waterbox prm input (txt format) {: .hands_on} You are now ready to run the NAMD workflow, which is discussed in another [tutorial]({% link topics/computational-chemistry/tutorials/md-simulation-namd/tutorial.md %}). diff --git a/topics/contributing/tutorials/learning-principles/tutorial.md b/topics/contributing/tutorials/learning-principles/tutorial.md index c774c9c4e3513..818fddbabac18 100644 --- a/topics/contributing/tutorials/learning-principles/tutorial.md +++ b/topics/contributing/tutorials/learning-principles/tutorial.md @@ -32,10 +32,12 @@ contributions: --- + > *Learning results from what the student does and thinks and only from what the student does and thinks. The teacher can advance learning only by influencing what the student does to learn* > > [H.A. Simon](https://en.wikipedia.org/wiki/Herbert_A._Simon) (one of the founders of the field of [Cognitive Science](https://en.wikipedia.org/wiki/Cognitive_science) and Nobel Laureate) -> +{: .quote} + --- This quotation from Herbert A. Simon clearly indicates that we cannot talk about teaching, teaching practices or effective teaching techniques if we don't understand first how people learn. @@ -211,6 +213,7 @@ We can use metaphors to shape and reveal our way of thinking about learning and, > > Plutard > +{: .quote} --- How the mind-as-a-vessel-to-be-filled metaphor may affect your way of teaching? You are likely to spend your time in the class at the blackboard, trying to 'transmit' to the students your own knowledge. 
diff --git a/topics/data-science/tutorials/bash-git/tutorial.md b/topics/data-science/tutorials/bash-git/tutorial.md index 2be00cc4e87bc..1aead2eb92652 100644 --- a/topics/data-science/tutorials/bash-git/tutorial.md +++ b/topics/data-science/tutorials/bash-git/tutorial.md @@ -170,7 +170,7 @@ Before diving in the tutorial, we need to open {% tool [RStudio](interactive_too > > $ conda create -n name_of_your_env nano git > > $ conda activate name_of_your_env > > ``` -> {: .code_in} +> {: .code-in} > > > | Software | Version | Manual | Available for | Description | @@ -359,10 +359,9 @@ same commands to choose another editor or update your email address. > More generally, you can get the list of available `git` commands and further resources of the Git manual typing: > > > Access available commands -> >```bash -> >$ git help -> >``` -> +> > ```bash +> > $ git help +> > ``` > {: .code-in} > {: .tip} @@ -499,7 +498,6 @@ wording of the output might be slightly different. > > to the `suspects` directory. > > > {: .solution} - {: .question} > "Nested" repositories @@ -1143,7 +1141,7 @@ Let's save our changes: > > ``` > {: .code-in} > -> > Note, our newly created empty directory `mysteries` does not appear in +> Note, our newly created empty directory `mysteries` does not appear in > the list of untracked files even if we explicitly add it (_via_ `git add`) to our > repository. This is the reason why you will sometimes see `.gitkeep` files > in otherwise empty directories. Unlike `.gitignore`, these files are not special diff --git a/topics/data-science/tutorials/bash-variant-calling/tutorial.md b/topics/data-science/tutorials/bash-variant-calling/tutorial.md index f13a7c0cc0f40..3aca4e29413e9 100644 --- a/topics/data-science/tutorials/bash-variant-calling/tutorial.md +++ b/topics/data-science/tutorials/bash-variant-calling/tutorial.md @@ -124,7 +124,7 @@ The alignment process consists of two steps: > > $ mv sub/ ~/dc_workshop/data/trimmed_fastq_small > > ``` > {: .code-in} -> > +> > You will also need to create directories for the results that will be generated as part of this workflow. We can do this in a single line of code, because `mkdir` can accept multiple new directory names as input. > > > Create result directories diff --git a/topics/data-science/tutorials/python-conda/tutorial.md b/topics/data-science/tutorials/python-conda/tutorial.md index c4ce063e812e3..0e0b9473a2ae3 100644 --- a/topics/data-science/tutorials/python-conda/tutorial.md +++ b/topics/data-science/tutorials/python-conda/tutorial.md @@ -135,7 +135,7 @@ They also enable you to use a specific older version of a package for your proje > A Specific Package Version is Only Ever Installed Once > Note that you will not have a separate package installations for each of your projects - they will only -ever be installed once on your system (in `$CONDA/pkgs`) but will be referenced from different environments. +> ever be installed once on your system (in `$CONDA/pkgs`) but will be referenced from different environments. 
{: .tip} ### Managing Conda Environments diff --git a/topics/data-science/tutorials/python-iterables/tutorial.md b/topics/data-science/tutorials/python-iterables/tutorial.md index 916f2bd8d918c..919fda7f2dd59 100644 --- a/topics/data-science/tutorials/python-iterables/tutorial.md +++ b/topics/data-science/tutorials/python-iterables/tutorial.md @@ -573,8 +573,9 @@ print(f'second time: {values}') # should print [3, 5] > > lithium > > > > ``` -> The first statement prints the whole string, since the slice goes beyond the total length of the string. -> The second statement returns an empty string, because the slice goes "out of bounds" of the string. +> > +> > The first statement prints the whole string, since the slice goes beyond the total length of the string. +> > The second statement returns an empty string, because the slice goes "out of bounds" of the string. > {: .solution} {: .question} diff --git a/topics/data-science/tutorials/python-linting/tutorial.md b/topics/data-science/tutorials/python-linting/tutorial.md index f2cc4c37aaae7..322dcc19a7aa4 100644 --- a/topics/data-science/tutorials/python-linting/tutorial.md +++ b/topics/data-science/tutorials/python-linting/tutorial.md @@ -51,6 +51,7 @@ worth spending some time learning a bit about Python coding style conventions to make sure that your code is consistently formatted and readable by yourself and others. > *"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."* - [Martin Fowler](https://en.wikiquote.org/wiki/Martin_Fowler), British software engineer, author and international speaker on software development +{: .quote} > > @@ -483,4 +484,4 @@ PyCharm also displays the docstring for a function/module in a little help popup ```python help(fibonacci) -``` \ No newline at end of file +``` diff --git a/topics/data-science/tutorials/python-venv/tutorial.md b/topics/data-science/tutorials/python-venv/tutorial.md index f75fc43281657..9bc22f6421360 100644 --- a/topics/data-science/tutorials/python-venv/tutorial.md +++ b/topics/data-science/tutorials/python-venv/tutorial.md @@ -117,7 +117,7 @@ They also enable you to use a specific older version of a package for your proje > A Specific Python or Package Version is Only Ever Installed Once > Note that you will not have a separate Python or package installations for each of your projects - they will only -ever be installed once on your system but will be referenced from different virtual environments. +> ever be installed once on your system but will be referenced from different virtual environments. {: .tip} ### Managing Python Virtual Environments @@ -211,20 +211,20 @@ environment and the standard Python library, > Naming Virtual Environments > What is a good name to use for a virtual environment? Using "venv" or ".venv" as the -name for an environment and storing it within the project's directory seems to be the recommended way - -this way when you come across such a subdirectory within a software project, -by convention you know it contains its virtual environment details. -A slight downside is that all different virtual environments -on your machine then use the same name and the current one is determined by the context of the path -you are currently located in. A (non-conventional) alternative is to -use your project name for the name of the virtual environment, with the downside that there is nothing to indicate -that such a directory contains a virtual environment. 
In our case, we have settled to use the name "venv" since it is -not a hidden directory and we want it to be displayed by the command line when listing directory contents (hence, -no need for the "." in its name that would, by convention, make it hidden). In the future, -you will decide what naming convention works best for you. Here are some references for each of the naming conventions: -- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/dev/virtualenvs/) notes that "venv" is the general convention used globally -- [The Python Documentation](https://docs.python.org/3/library/venv.html) indicates that ".venv" is common -- ["venv" vs ".venv" discussion](https://discuss.python.org/t/trying-to-come-up-with-a-default-directory-name-for-virtual-environments/3750) +> name for an environment and storing it within the project's directory seems to be the recommended way - +> this way when you come across such a subdirectory within a software project, +> by convention you know it contains its virtual environment details. +> A slight downside is that all different virtual environments +> on your machine then use the same name and the current one is determined by the context of the path +> you are currently located in. A (non-conventional) alternative is to +> use your project name for the name of the virtual environment, with the downside that there is nothing to indicate +> that such a directory contains a virtual environment. In our case, we have settled to use the name "venv" since it is +> not a hidden directory and we want it to be displayed by the command line when listing directory contents (hence, +> no need for the "." in its name that would, by convention, make it hidden). In the future, +> you will decide what naming convention works best for you. Here are some references for each of the naming conventions: +> - [The Hitchhiker's Guide to Python](https://docs.python-guide.org/dev/virtualenvs/) notes that "venv" is the general convention used globally +> - [The Python Documentation](https://docs.python.org/3/library/venv.html) indicates that ".venv" is common +> - ["venv" vs ".venv" discussion](https://discuss.python.org/t/trying-to-come-up-with-a-default-directory-name-for-virtual-environments/3750) {: .tip} Once you’ve created a virtual environment, you will need to activate it: diff --git a/topics/dev/tutorials/debugging/tutorial.md b/topics/dev/tutorials/debugging/tutorial.md index c1f24ca586a7c..2c9d2cc8bde3e 100644 --- a/topics/dev/tutorials/debugging/tutorial.md +++ b/topics/dev/tutorials/debugging/tutorial.md @@ -560,7 +560,8 @@ Our last error happens at runtime, which means we don't have a failing test; ins > > Here's the bug report: > -> ```In the User menu, clicking the Datasets option causes an error message to be displayed on the page: "Uncaught exception in exposed API method".``` +> > In the User menu, clicking the Datasets option causes an error message to be displayed on the page: "Uncaught exception in exposed API method". +> {: .quote} > > Make sure you are in `GALAXY_ROOT`. Then start your local Galaxy using the `uvicorn` command*: > @@ -716,9 +717,9 @@ Our last error happens at runtime, which means we don't have a failing test; ins > > > Pdb > > ``` -> (Pdb) str_as_bool('True') -> False -> ``` +> > (Pdb) str_as_bool('True') +> > False +> > ``` > {: .code-in} > > That doesn't look right! Exit the debugger (type in `q`, then `Enter`, then `CTRL-C ` to exit Galaxy. 
diff --git a/topics/ecology/tutorials/champs-blocs/tutorial.md b/topics/ecology/tutorials/champs-blocs/tutorial.md index 7dff8cc0c6375..22c4df6b10def 100644 --- a/topics/ecology/tutorials/champs-blocs/tutorial.md +++ b/topics/ecology/tutorials/champs-blocs/tutorial.md @@ -80,12 +80,14 @@ Now let's focus on our workflow on boulder field ecological state {% include _includes/cyoa-choices.html option1="Yes" option2="No" default="Yes" text="Are your ESTAMP data ready ?" %}
-> 1. Download your data on ESTAMP [estamp.afbiodiversite.fr](https://estamp.afbiodiversite.fr/) website, clicking on "Accédez aux données" at the bottom of the page. You will get a zip folder.
->
-> 2. Unzip your folder. In the folder three files .csv will interest us :
-> - champbloc_ivr.csv
-> - champbloc_qecb.csv
-> - ficheterrain.csv
+
+1. Download your data from the ESTAMP website [estamp.afbiodiversite.fr](https://estamp.afbiodiversite.fr/) by clicking on "Accédez aux données" at the bottom of the page. You will get a zip folder.
+2. Unzip your folder. Three .csv files in the folder will interest us:
+
+   - champbloc_ivr.csv
+   - champbloc_qecb.csv
+   - ficheterrain.csv
+
diff --git a/topics/ecology/tutorials/regionalGAM/tutorial.md b/topics/ecology/tutorials/regionalGAM/tutorial.md index 6d165a0243147..c047f191ea1e2 100644 --- a/topics/ecology/tutorials/regionalGAM/tutorial.md +++ b/topics/ecology/tutorials/regionalGAM/tutorial.md @@ -143,8 +143,8 @@ Here, we will only keep the sites that are in the Netherlands (NLBMS.XX). We wan > > > > > > You can do that using: > > > 1. **Paste two files side by side tool** {% icon tool %} with the following parameters: -> > - {% icon param-file %} *"paste"*: output from **RData parser** {% icon tool %} headed with "SPECIES" -> > - {% icon param-file %}*"and"*: output from **RData parser** {% icon tool %} with headed with "SITE" +> > > - {% icon param-file %} *"paste"*: output from **RData parser** {% icon tool %} headed with "SPECIES" +> > > - {% icon param-file %}*"and"*: output from **RData parser** {% icon tool %} with headed with "SITE" > > > 2. Repeat **Paste two files side by side** {% icon tool %} executions as many times as there are separated files in order to create a final dataset with all the columns: > > > 1. Repeat **Paste two files side by side tool** {% icon tool %} to paste the file containing 2 columns with the one headed by `YEAR` > > > 1. Repeat **Paste two files side by side tool** {% icon tool %} to paste the file containing 3 columns with the one headed by `MONTH` diff --git a/topics/ecology/tutorials/x-array-map-plot/tutorial.md b/topics/ecology/tutorials/x-array-map-plot/tutorial.md index 607de204cc665..7494ce593047f 100644 --- a/topics/ecology/tutorials/x-array-map-plot/tutorial.md +++ b/topics/ecology/tutorials/x-array-map-plot/tutorial.md @@ -47,14 +47,14 @@ contributors: > {: .agenda} ->Background +> Background > ->According to [UN](https://www.un.org/en/climatechange/what-is-climate-change) , Climate is the long term shift in temperature and weather patterns which may be due to natural or artificial causes. To learn more about climate, refer this [tutorial]({% link topics/climate/tutorials/climate-101/tutorial.md %}) from the GTN. Due to the frequently changing nature of the weather patterns, the size of the collected data is huge. -The climate data is mainly represented in these three categories : [NetCDF](https://en.wikipedia.org/wiki/NetCDF) (Network Common Data Form), [HDF](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) (Hierarchical Data Format) , [GRIB](https://en.wikipedia.org/wiki/GRIB) (GRIdded Binary or General Regularly-distributed Information in Binary form). +> According to [UN](https://www.un.org/en/climatechange/what-is-climate-change) , Climate is the long term shift in temperature and weather patterns which may be due to natural or artificial causes. To learn more about climate, refer this [tutorial]({% link topics/climate/tutorials/climate-101/tutorial.md %}) from the GTN. Due to the frequently changing nature of the weather patterns, the size of the collected data is huge. +> The climate data is mainly represented in these three categories : [NetCDF](https://en.wikipedia.org/wiki/NetCDF) (Network Common Data Form), [HDF](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) (Hierarchical Data Format) , [GRIB](https://en.wikipedia.org/wiki/GRIB) (GRIdded Binary or General Regularly-distributed Information in Binary form). > ->The NetCDF file format is basically used for storing multidimensional data which generally consists of variables such as temperature, precipitation, wind direction, etc. 
The variation of climate variables over a period of time is suitably plotted using this dataset. The entire earth is divided into both horizontal as well as vertical coordinates which makes plotting of the variables such as the ocean temperatures possible. +> The NetCDF file format is basically used for storing multidimensional data which generally consists of variables such as temperature, precipitation, wind direction, etc. The variation of climate variables over a period of time is suitably plotted using this dataset. The entire earth is divided into both horizontal as well as vertical coordinates which makes plotting of the variables such as the ocean temperatures possible. > ->The coordinate system, types of projections and colormaps are some of the very important considerations in achieving the most suitable visualization option. +> The coordinate system, types of projections and colormaps are some of the very important considerations in achieving the most suitable visualization option. {: .comment} @@ -257,13 +257,14 @@ We have hourly data. In order to plot it, we must first extract the hours from t > > > > The syntax of using the `seltimestep` is `(initial data number / final data entry)`. An important thing to pay attention to is how the data entries are numbered: are they numbered starting from 0 or 1. Accordingly we can add or skip adding 1 to the data number to achieve the desired result. > > -Although we are not using `splithour` here, you can find below the syntax for future uses. +> > Although we are not using `splithour` here, you can find below the syntax for future uses. > > -> >1. {% tool [CDO Operations](toolshed.g2.bx.psu.edu/repos/climate/cdo_operations/cdo_operations/2.0.0+galaxy0) %} with the following parameters: -> >- In *"CDO Operators"*: -> >- {% icon param-repeat %} *"Insert CDO Operators"* -> >- *"Select cdo operator"*: `splithour (Split hours)` -> >- {% icon param-file %} *"Additional input file"*: `outfile.netcdf` generated from the previous step. +> > 1. {% tool [CDO Operations](toolshed.g2.bx.psu.edu/repos/climate/cdo_operations/cdo_operations/2.0.0+galaxy0) %} with the following parameters: +> > +> > - In *"CDO Operators"*: +> > - {% icon param-repeat %} *"Insert CDO Operators"* +> > - *"Select cdo operator"*: `splithour (Split hours)` +> > - {% icon param-file %} *"Additional input file"*: `outfile.netcdf` generated from the previous step. > > > > This step generates that `N` number of `outfiles.netcdf` files where `N` is the range of selection. > > Suppose your selected range was `744/744` for the `seltimestep` , then it will generate `2` files which can be plotted further. @@ -875,9 +876,10 @@ Although we are not using `splithour` here, you can find below the syntax for fu > > > > 1. Every piece of data recites a story. The air temperature at a certain height has a lot of significance in major commercial and day to day activities. Read this data-blog on the above analysis. [Click Here](https://quickbeasts51429.github.io/Outreachy_Galaxy_Community_contributor/). > > 2. The tutorial has summed up a proper way of plotting data from a netcdf file. It has discussed everything from loading of data to its final display. Some other key points to keep in mind are : -> > > 1. It may take some time while plotting the maps. It depends on traffic / load on the Galaxy server. It is suggested to have a 64-bit processor with 8GB RAM storage. Be patient. -> > > 2. You can view as well as download the generated plots to use further. -> > > 3. 
Plotting over global maps is very convenient, as you saw above. But many times you may want to plot only a specific region; this is easy to do using the CDO tool. Refer to [this tutorial]({% link topics/climate/tutorials/pangeo/tutorial.md %}) for more info.
> >
> > 3. If you wish to present all the plotted maps at one place for comparision or analysis. It is a short and simple step and can be doe using the tool name `Image Montage`.
> >
diff --git a/topics/epigenetics/tutorials/formation_of_super-structures_on_xi/tutorial.md b/topics/epigenetics/tutorials/formation_of_super-structures_on_xi/tutorial.md
index 963f18f54feb3..43016dc82e323 100644
--- a/topics/epigenetics/tutorials/formation_of_super-structures_on_xi/tutorial.md
+++ b/topics/epigenetics/tutorials/formation_of_super-structures_on_xi/tutorial.md
@@ -410,8 +410,8 @@ To learn how to do the normalization, we will take the `wt_H3K4me3_rep1` sample
> >
> > >
> > > 1. This tool generates a table with 4 columns: reference sequence identifier, reference sequence length, number of mapped reads and number of placed but unmapped reads. Here it estimates how many reads mapped to which chromosome. Furthermore, it tells the chromosome lengths and naming convention (with or without 'chr' in the beginning)
-> > 2. 1,204,821 for ChIP-seq samples and 1,893,595 for the input
-> > 3. The number of reads can be different because of different sequencing depth. It can bias the interpretation of the number of reads mapped to a specific genome region and the identification of the H3K4me3 sites. Specially here, as the number of reads for the input is higher than the ChIP data less regions could be identified having a significantly higher read coverage for the ChIP data comparing to the corresponding input.
+> > > 2. 1,204,821 for ChIP-seq samples and 1,893,595 for the input
+> > > 3. The number of reads can be different because of different sequencing depth. It can bias the interpretation of the number of reads mapped to a specific genome region and the identification of the H3K4me3 sites. Especially here, as the number of reads for the input is higher than for the ChIP data, fewer regions could be identified as having a significantly higher read coverage for the ChIP data compared to the corresponding input.
> > {: .solution }
> {: .question}
{: .hands_on}
diff --git a/topics/galaxy-interface/tutorials/rstudio/tutorial.md b/topics/galaxy-interface/tutorials/rstudio/tutorial.md
index d14d7a00e14b1..5de2e550d737a 100644
--- a/topics/galaxy-interface/tutorials/rstudio/tutorial.md
+++ b/topics/galaxy-interface/tutorials/rstudio/tutorial.md
@@ -198,13 +198,15 @@ You have hopefully noticed a pattern - an R function has three key properties:
2. A pair of `()` after the name
3. 0 or more arguments inside the parentheses

-   An argument may be a specific input for your function and/or may modify the function's behavior. 
For example the function `round()` will round a number with a decimal:
-
-   ```R
-   # This will round a number to the nearest integer
-   > round(3.14)
-   [1] 3
-   ```
+   An argument may be a specific input for your function and/or may modify the function's behavior. For example, the function `round()` will round a number with a decimal:
+
+   >
+   > ```R
+   > # This will round a number to the nearest integer
+   > > round(3.14)
+   > [1] 3
+   > ```
+   {: .code-in}

## Getting help
diff --git a/topics/galaxy-interface/tutorials/upload-data-to-ena/tutorial.md b/topics/galaxy-interface/tutorials/upload-data-to-ena/tutorial.md
index d644e55609deb..e306f1bade580 100644
--- a/topics/galaxy-interface/tutorials/upload-data-to-ena/tutorial.md
+++ b/topics/galaxy-interface/tutorials/upload-data-to-ena/tutorial.md
@@ -263,15 +263,15 @@ We will link it to the reads submitted in the first step using the accession num
> - *"Submit to test ENA server?"*: `yes`
> - *"Validate files and metadata but do not submit"*: `no`
> - Fill the assembly metadata. For our assembly:
-> > - *"Assembly type"*: `Clone`
-> > - *"Assembly program"*: `BWA-MEM`
-> > - *"Molecule type"*: `genomic RNA`
-> > - *"Coverage"*: `1000`
-> > - *"Select the method to load study and sample metadata"*: `Fill in required metadata`
-> > - *"Assembly name"*: give a name to your assembly.
-> > - *"Study accession"*: `ERP139884` (you can find the Study accession number from your raw data submission metadata ticket)
-> > - *"Sample accession"*: `ERS12519941` (you can find the Sample accession number from your raw data submission metadata ticket)
-> > - *"Sequencing platform"*: `Illumina`
+> - *"Assembly type"*: `Clone`
+> - *"Assembly program"*: `BWA-MEM`
+> - *"Molecule type"*: `genomic RNA`
+> - *"Coverage"*: `1000`
+> - *"Select the method to load study and sample metadata"*: `Fill in required metadata`
+> - *"Assembly name"*: give a name to your assembly.
+> - *"Study accession"*: `ERP139884` (you can find the Study accession number from your raw data submission metadata ticket)
+> - *"Sample accession"*: `ERS12519941` (you can find the Sample accession number from your raw data submission metadata ticket)
+> - *"Sequencing platform"*: `Illumina`
> - Select the consensus sequence assembly file from your history: `SRR10903401.fasta`
>
{: .hands_on}
diff --git a/topics/galaxy-interface/tutorials/workflow-automation/tutorial.md b/topics/galaxy-interface/tutorials/workflow-automation/tutorial.md
index 8e07b9611f336..30873dd673ecc 100644
--- a/topics/galaxy-interface/tutorials/workflow-automation/tutorial.md
+++ b/topics/galaxy-interface/tutorials/workflow-automation/tutorial.md
@@ -173,15 +173,16 @@ The `tutorial.ga` file defines the workflow in JSON format; if we are confident
>
> 3. Replace the placeholder values in the job file, so that it looks like the following:
>
-> > ```yaml
-> > Dataset 1:
-> >   class: File
-> >   path: dataset1.txt
-> > Dataset 2:
-> >   class: File
-> >   path: dataset2.txt
-> > Number of lines: 3
-> > ```
+> ```yaml
+> Dataset 1:
+>   class: File
+>   path: dataset1.txt
+> Dataset 2:
+>   class: File
+>   path: dataset2.txt
+> Number of lines: 3
+> ```
+>
> Now we are ready to execute the workflow with our chosen parameters!
{: .hands_on}
@@ -225,17 +226,19 @@ Every object associated with Galaxy, including workflows, datasets and dataset c
> 1. Click on the {% icon galaxy-info %} *View details* icon on the dataset in the history.
> 2. 
Under the heading `Dataset Information`, find the row `History Content API ID` and copy the hexadecimal ID next to it. > 2. Modify `tutorial-init-job.yml` to look like the following: -> > ```yaml -> > Dataset 1: -> > class: File -> > # path: dataset1.txt -> > galaxy_id: -> > Dataset 2: -> > class: File -> > # path: dataset2.txt -> > galaxy_id: -> > Number of lines: 3 -> > ``` +> +> ```yaml +> Dataset 1: +> class: File +> # path: dataset1.txt +> galaxy_id: +> Dataset 2: +> class: File +> # path: dataset2.txt +> galaxy_id: +> Number of lines: 3 +> ``` +> > 3. Now we need to get the workflow ID: > 1. Go to the workflows panel in Galaxy and find one of the workflows that have just been uploaded. > 2. From the dropdown menu, select `Edit`, to take you to the workflow editing interface. diff --git a/topics/genome-annotation/tutorials/crispr-screen/tutorial.md b/topics/genome-annotation/tutorials/crispr-screen/tutorial.md index e986e8960e864..021f3c4647396 100644 --- a/topics/genome-annotation/tutorials/crispr-screen/tutorial.md +++ b/topics/genome-annotation/tutorials/crispr-screen/tutorial.md @@ -382,8 +382,8 @@ If we want to compare the drug treatment (T8-APR-246) to the vehicle control (T8 > Replicates > > If we have biological and/or technical replicates we can handle them in a similar way to that described on the [MAGeCK website](https://sourceforge.net/p/mageck/wiki/QA/#how-to-deal-with-biological-replicates-and-technical-replicates). -For biological replicates, we input them in MAGeCK test Treated Sample Labels/Control Sample Labels fields separated by a comma. -For technical replicates, we could combine the fastqs for each sample/biological replicate, for example with the **Concatenate datasets** tool, before running MAGeCK count. +> For biological replicates, we input them in MAGeCK test Treated Sample Labels/Control Sample Labels fields separated by a comma. +> For technical replicates, we could combine the fastqs for each sample/biological replicate, for example with the **Concatenate datasets** tool, before running MAGeCK count. > {: .comment} diff --git a/topics/genome-annotation/tutorials/hpc-for-lsgc/tutorial.md b/topics/genome-annotation/tutorials/hpc-for-lsgc/tutorial.md index eeeae4aee3445..96dff181db0d3 100644 --- a/topics/genome-annotation/tutorials/hpc-for-lsgc/tutorial.md +++ b/topics/genome-annotation/tutorials/hpc-for-lsgc/tutorial.md @@ -322,7 +322,7 @@ Let us now jump into the hands-on! We will learn how to compare chromosomes with > > 4. The *"Add grid to plot for multi-fasta data sets"* parameter adds grid lines to the plot to separate multiple sequences. This is useful when using multi-fasta inputs. > > 5. The *"Generate image of detected events"* parameter will include, if enabled, a plot of the rearrangements detected with colors, one per type of rearrangement. More on that later! > {: .comment} -> > +> > {: .hands_on} @@ -339,7 +339,7 @@ Figure 3 shows the comparison plot for the plant chromosomes. Notice that the or - Blocks that are not in the main diagonal are "transposed", meaning that they have been rearranged in one of the sequences but not in the other. These can also be inverted. > Note on the comparison plot -Notice that the comparison plot is only an approximation. It is aimed at showing the general location and direction of syntenies. For example, if `CHROMEISTER` was run with parameter **Output dotplot size** equal to `1000`, then each pixel in the plot contains the averaged information of nearly 500,000 base pairs! 
Thus, any block can contain lots of smaller rearrangements, mutations, inversions, etc., that are ignored for the sake of providing a clean overview of the general alignment direction in a pairwise comparison.
{: .comment}

Also note that a `score` value can be seen in the title of the plot (this value is also available in the **Comparison score** file). This value is calculated based on the alignment coverage and number of rearrangements and can be used to automatically filter out similar from dissimilar sequence comparisons. A value of `0` means that the sequences are nearly equal (rearrangement-wise), whereas a value closer to `1` means that the sequences are more dissimilar.
diff --git a/topics/imaging/tutorials/multiplex-tissue-imaging-TMA/tutorial.md b/topics/imaging/tutorials/multiplex-tissue-imaging-TMA/tutorial.md
index 2ff16b11c8aa4..bd8571b3b1221 100644
--- a/topics/imaging/tutorials/multiplex-tissue-imaging-TMA/tutorial.md
+++ b/topics/imaging/tutorials/multiplex-tissue-imaging-TMA/tutorial.md
@@ -136,13 +136,13 @@ After illumination is corrected across round tiles, the tiles must be stitched t
> **Imaging platform differences**
>
> ASHLAR, among other tools in the MCMICRO and Galaxy-ME pre-processing tools have some parameters that are specific to the
-imaging patform used. By default, ASHLAR is oriented to work with images from RareCyte scanners. AxioScan scanners render images
-in a different orientation. Because of this, when using ASHLAR on AxioScan images, it is important to select the **Flip Y-Axis**
-parameter to *Yes*
+> imaging platform used. By default, ASHLAR is oriented to work with images from RareCyte scanners. AxioScan scanners render images
+> in a different orientation. Because of this, when using ASHLAR on AxioScan images, it is important to select the **Flip Y-Axis**
+> parameter to *Yes*.
>
> ASHLAR will work for most imaging modalities; however, certain modalities require different tools to be registered. For example,
-multiplex immunohistochemistry (mIHC) images must use an aligner that registers each moving image to a reference Hematoxylin image.
-For this, Galaxy-ME includes the alternative registration tool {% tool **PALOM** %}.
+> multiplex immunohistochemistry (mIHC) images must use an aligner that registers each moving image to a reference Hematoxylin image.
+> For this, Galaxy-ME includes the alternative registration tool {% tool **PALOM** %}.
>
{: .warning}

@@ -253,7 +253,7 @@ Learn more about this file format at the [anndata documentation](https://anndata
> > Important parameter: Unique names for cells/rows
> >
> > Setting *"Whether to use unique name for cells/rows"* to `No` to ensures that downstream interactive visualizations will be able to map observational features to the mask CellIDs.
-> {: .comment} +> {: .warning} > {: .hands_on} diff --git a/topics/metabolomics/tutorials/gc_ms_with_xcms/tutorial.md b/topics/metabolomics/tutorials/gc_ms_with_xcms/tutorial.md index b031ad366dd4e..917ff51c0409e 100644 --- a/topics/metabolomics/tutorials/gc_ms_with_xcms/tutorial.md +++ b/topics/metabolomics/tutorials/gc_ms_with_xcms/tutorial.md @@ -328,7 +328,7 @@ The spectral data comes as an `.msp` file, which is a text file structured accor > > Click *"View data"* {% icon galaxy-eye %} icon next to the dataset in the Galaxy history. The contents of the file would look like this: > -> > {% snippet faqs/galaxy/datasets_icons.md %} +> {% snippet faqs/galaxy/datasets_icons.md %} > > ``` > NAME:C001 @@ -454,19 +454,19 @@ We use the cosine score with a greedy peak pairing heuristic to compute the numb {: .hands_on} > Overview of the spectral similarity scores -> -> > ### Cosine Greedy +> +> ### Cosine Greedy > The cosine score, also known as the dot product, is based on representing the similarity of two spectra through the cosine of an angle between the vectors that the spectra produce. Two peaks are considered as matching if their *m/z* values lie within the given tolerance. Cosine greedy looks up matching peaks in a "greedy" way, which does not always lead to the most optimal alignments. > > This score was among the first to be used for looking up matching spectra in spectral libraries and, to this day, remains one of the most popular scoring methods for both library matching and molecular networking workflows. > -> > ### Cosine Hungarian +> ### Cosine Hungarian > This method computes the similarities in the same way as the *Cosine Greedy* but with a difference in *m/z* peak alignment. The difference lies in that the Hungarian algorithm is used here to find matching peaks. This leads to the best peak pairs match but can take significantly longer than the "greedy" algorithm. > -> > ### Modified Cosine +> ### Modified Cosine > Modified Cosine is another, as its name states, representative of the family of cosine-based scores. This method aligns peaks by finding the best possible matches and considers two peaks a match if their *m/z* values are within tolerance before or after a mass-shift is applied. A mass shift is essentially a difference of precursor-*m/z* of two compared spectra. The similarity is then again expressed as a cosine of the angle between two vectors. > -> > ### Neutral Losses Cosine +> ### Neutral Losses Cosine > Neutral Loss metric works similarly to all described above with one major difference: instead of encoding the spectra as "intensity vs *m/z*" vector, it encodes it to an "intensity vs *Δm/z*", where delta is computed as an *m/z* difference between precursor and a fragment *m/z*. This, in theory, could better capture the underlying structural similarities between molecules. > {: .details} diff --git a/topics/metabolomics/tutorials/gcms/tutorial.md b/topics/metabolomics/tutorials/gcms/tutorial.md index 83f95fa5969f4..25ebdfd7032e1 100644 --- a/topics/metabolomics/tutorials/gcms/tutorial.md +++ b/topics/metabolomics/tutorials/gcms/tutorial.md @@ -147,11 +147,11 @@ Concerning the current GC-MS tutorial, you **just have to compute the following > > > > > > To merge your data, you need to **input a sampleMetadata file** containing filenames and their metadata informations like their class for example. -If you don't add a sampleMetadata file, the **xcms findChromPeaks Merger** {% icon tool %} tool will **group all your files together**. 
-You can also **create your sampleMetadata file** with W4M Galaxy tool **xcms get a sampleMetadata file** {% icon tool %} with the following parameters: *"RData file"* outputed from **MSnbase readMSData** {% icon tool %}. -Here is an example of the minimum expectations about a sampleMetadata file (**important**: don't write the format of the file, just their names): -> > {: .text-justify} -> > +> > If you don't add a sampleMetadata file, the **xcms findChromPeaks Merger** {% icon tool %} tool will **group all your files together**. +> > You can also **create your sampleMetadata file** with W4M Galaxy tool **xcms get a sampleMetadata file** {% icon tool %} with the following parameters: *"RData file"* outputed from **MSnbase readMSData** {% icon tool %}. +> > Here is an example of the minimum expectations about a sampleMetadata file (**important**: don't write the format of the file, just their names): +> > +> > > > | sample_name | class | > > |:-----------:|:-------:| > > | file1 | man | @@ -159,7 +159,7 @@ Here is an example of the minimum expectations about a sampleMetadata file (**im > > | file2 | woman | > > |-------------+---------| > > | file3 | man | -> > +> > > {: .comment} > {: .hands_on} @@ -198,7 +198,7 @@ The outputs of this strategy are similar to the ones discribed in the LC-MS tuto > > During each step of pre-processing, your dataset has its format changed and can have also its name changed. > To be able to continue to MSMS processing, you need to have a RData object wich is **merged and grouped** (from **xcms findChromPeaks Merger** {% icon tool %} and **xcms groupChromPeaks (group)** {% icon tool %}) at least. -It means that you should have a file named `xset.merged.groupChromPeaks.RData` (and maybe with some step more in it). +> It means that you should have a file named `xset.merged.groupChromPeaks.RData` (and maybe with some step more in it). {: .comment} diff --git a/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.md b/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.md index 6ae3d08c9594f..4b99f30d26176 100644 --- a/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.md +++ b/topics/metabolomics/tutorials/lcms-preprocessing/tutorial.md @@ -390,10 +390,10 @@ Once your sampleMetadata table is ready, you can proceed to the upload. In this > > > > > > > > > 1. At least 2, with the identifiers and the class column. But as many as you need to describe the potential variability of your samples -> (*e.g.* the person in charge of the sample preparation, the temperature...). This will allow later statistical analysis to expose the relevant parameters. +> > > (*e.g.* the person in charge of the sample preparation, the temperature...). This will allow later statistical analysis to expose the relevant parameters. > > > 2. Sample, QC, blank... The class (the 2nd column) is useful for the preprocessing step with XCMS to detect the metabolite across the samples. -> Thus, it can be important to separate very different types of samples, as biological ones and blank ones for example. If you don't have any specific class -> that you want to consider in XCMS preprocessing, just fill everywhere with `sample` or a dot `.` for example. +> > > Thus, it can be important to separate very different types of samples, as biological ones and blank ones for example. If you don't have any specific class +> > > that you want to consider in XCMS preprocessing, just fill everywhere with `sample` or a dot `.` for example. 
> > > 
> > 
> > {: .solution}
> > 
@@ -746,10 +746,10 @@ The algorithm uses statistical smoothing methods. You can choose between linear
> > 
> > 
> > If you have a very large number of samples (*e.g.* a thousand), it might be impossible to find peaks that are present in 100% of your samples.
-> If that is the case and you still set a very high value for the minimum required fraction of samples, the tool can not complete successfully the retention
-> time correction.
-> A special attention should also be given to this parameter when you expect a large number of peaks not to be present in part of your samples
-> (*e.g.* when dealing with some blank samples).
+> > If that is the case and you still set a very high value for the minimum required fraction of samples, the tool cannot successfully complete the retention
+> > time correction.
+> > Special attention should also be given to this parameter when you expect a large number of peaks not to be present in part of your samples
+> > (*e.g.* when dealing with some blank samples).
> {: .comment}
> 
> > Comment to W4M users
diff --git a/topics/metagenomics/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md b/topics/metagenomics/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md
index ac016adeb4842..172158069cfec 100644
--- a/topics/metagenomics/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md
+++ b/topics/metagenomics/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md
@@ -908,7 +908,7 @@ To identify variants, we
> 
> [__Medaka consensus tool__ and __medaka variant tool__](https://github.com/nanoporetech/medaka) can be also used instead of **Clair3**, they give similar results but they are much slower then **Clair3** and offer fewer options.

- {: .comment-on}
+ {: .comment}
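
One practical way to test the claim above — that Clair3 and Medaka give similar results — is to intersect the two call sets. Below is a minimal Python sketch under stated assumptions: the VCFs are uncompressed and single-sample, the file names are placeholders, and calls are matched on coordinates and alleles only (a real comparison would first normalise variant representations, e.g. with `bcftools norm`).

```python
# Rough concordance check between two variant callers' VCF outputs.
# Sketch only: compares calls by (chrom, pos, ref, alt); assumes plain-text,
# uncompressed VCFs. File names below are placeholders for your own calls.

def load_calls(path):
    calls = set()
    with open(path) as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            calls.add((chrom, int(pos), ref, alt))
    return calls

clair3 = load_calls("clair3_calls.vcf")   # placeholder path
medaka = load_calls("medaka_calls.vcf")   # placeholder path

print(f"Clair3 only: {len(clair3 - medaka)}")
print(f"Medaka only: {len(medaka - clair3)}")
print(f"Called by both: {len(clair3 & medaka)}")
```
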
@@ -948,7 +948,7 @@ To identify variants, we > > [__LoFreq filter__](https://csb5.github.io/lofreq/) can be also used instead, both tools performs equal and fast results. - {: .comment-on} + {: .comment}
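
The filtering that **LoFreq filter** performs amounts to keeping variants whose metrics clear chosen thresholds. The sketch below illustrates the idea only — it is not LoFreq's implementation; the input path is a placeholder, the thresholds are arbitrary, and it assumes `DP` and `AF` keys are present in the INFO column.

```python
# Conceptual variant filtering: keep VCF records whose coverage depth (DP)
# and allele frequency (AF) pass thresholds. Not LoFreq's implementation —
# just an illustration; thresholds and the input path are made up.

MIN_DP = 10
MIN_AF = 0.5

def info_dict(info_field):
    """Turn 'DP=42;AF=0.97;...' into a dict; flag-style keys are skipped."""
    return dict(item.split("=", 1) for item in info_field.split(";") if "=" in item)

with open("variants.vcf") as vcf:
    for line in vcf:
        if line.startswith("#"):
            print(line, end="")  # pass header lines through unchanged
            continue
        info = info_dict(line.split("\t")[7])
        if float(info.get("DP", 0)) >= MIN_DP and float(info.get("AF", 0)) >= MIN_AF:
            print(line, end="")
```
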
diff --git a/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md b/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md
index 04f9f0ad8d241..d470a281e8922 100644
--- a/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md
+++ b/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md
@@ -58,11 +58,11 @@ MAdLandDB is a protein database comprising of a comprehensive collection of full
> 
> {: .hands_on}

-> We just imported a FASTA file into Galaxy. Now, the next would be to perfrom the BLAST analysis against MAdLandDB.
+We just imported a FASTA file into Galaxy. The next step is to perform the BLAST analysis against MAdLandDB.

## Perform NCBI Blast+ on Galaxy

-> Since MAdLandDB is the collection of protein sequences, You can perform {% tool [BLASTp](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.10.1+galaxy2) %} and {% tool [BLASTx](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/2.10.1+galaxy2) %} tools.
+Since MAdLandDB is a collection of protein sequences, you can use the {% tool [BLASTp](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.10.1+galaxy2) %} and {% tool [BLASTx](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/2.10.1+galaxy2) %} tools.

> Similarity search against MAdLand Database
> 
@@ -80,9 +80,7 @@ MAdLandDB is a protein database comprising of a comprehensive collection of full

## Blast output

-> {% icon tool %} The BLAST output will be in tabular format (you can select the desired output format from the drop down menu) and include the following fields :
-
-> 
+{% icon tool %} The BLAST output will be in tabular format (you can select the desired output format from the drop-down menu) and will include the following fields:

| Column | NCBI name | Description |
|-------|------------|-------------|
@@ -99,7 +97,7 @@ MAdLandDB is a protein database comprising of a comprehensive collection of full
| 11 | evalue | Expectation value (E-value) |
| 12 | bitscore | Bit score |

-> The fields are separated by tabs, and each row represents a single hit. For more details for BLAST analysis and output, we recommand you to follow the [Similarity-searches-blast](https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/genome-annotation/tutorial.html#similarity-searches-blast) tutorial.
+The fields are separated by tabs, and each row represents a single hit. For more details on BLAST analysis and output, we recommend following the [Similarity-searches-blast](https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/genome-annotation/tutorial.html#similarity-searches-blast) tutorial.


> Further Reading about BLAST Tools in Galaxy
diff --git a/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md b/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md
index 54a2025be6725..c94c3505b1e57 100644
--- a/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md
+++ b/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md
@@ -262,7 +262,7 @@ Let's try one more inference - this time, we'll use only healthy cells as a refe
> > 
> > 
> 
-> > > > ![Three graphs showing two rows for each cell type (gamma, ductal, delta, beta, alpha, and acinar cells) comparison normal or T2D proportions by either read or by sample, with the top graph labelled #altogether; the middle labelled #like4like; and the bottom labelled #healthyscref. 
Differences are most pronounced in the bottom #healthyscref graph.](../../images/bulk-music/compare_3.png "The impact of the single cell reference")
+> > ![Three graphs showing two rows for each cell type (gamma, ductal, delta, beta, alpha, and acinar cells) comparison normal or T2D proportions by either read or by sample, with the top graph labelled #altogether; the middle labelled #like4like; and the bottom labelled #healthyscref. Differences are most pronounced in the bottom #healthyscref graph.](../../images/bulk-music/compare_3.png "The impact of the single cell reference")
> > 
> > 1. If using a like4like inference reduced the difference between the phenotype, aligning both phenotypes to the same (healthy) reference exacerbated them - there are even fewer beta cells in the output of this analysis.
> > 
diff --git a/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md b/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md
index 18ff68b6813cc..321e540a1b1b1 100644
--- a/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md
+++ b/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md
@@ -503,7 +503,7 @@ plot_cells(cds_clustered, genes=c('Il2ra','Cd8b1','Cd8a','Cd4','Itm2a','Aif1','H
> > - `Itm2a` (T-mat): expressed in cluster 3
> > - `Aif1` (macrophages): barely anything here, minimal expression spread across the sample with some more cells in cluster 4 and 3 – not enough to form a distinct cluster though). In theory, we shouldn’t have any macrophages in our sample. If you remember from the previous tutorials, we actually filtered out macrophages from the sample during the processing step, because we worked on annotated data. When analysing unannotated data, we could only assign macrophages and then filter them out, provided that Monocle clusters them into a separate group. As you can see, it’s not the case here, so we will just carry on with the analysis, interpreting this as a contamination.
> > - `Hba-a1` (RBC): appears throughout the entire sample in low numbers suggesting some background contamination of red blood cell debris in the cell samples during library generation, but also shows higher expression in a distinct tiny bit of cluster 3, at the border between clusters 1 and 5. However, it’s too small to be clustered into a separate group and filtered out in this case.
-If you remember, this gene was found to be expressed in the previous Scanpy tutorial also in low numbers across the sample, and in the other Monocle tutorial (using Galaxy tools and annotated data) algorithms allowed us to gather the cells expressing that gene into a distinct group. Our result now sits somewhere in between.
+> > If you remember, this gene was also found to be expressed in low numbers across the sample in the previous Scanpy tutorial, while in the other Monocle tutorial (using Galaxy tools and annotated data) the algorithms allowed us to gather the cells expressing that gene into a distinct group. Our result now sits somewhere in between.
> > ![In Scanpy graph the marker gene appears throughout the entire sample in low numbers, in Monocle in Galaxy cells expressing hemoglobin gene were grouped into a small branch of DP-M4, allowing to group those cells. 
Monocle in RStudio graph is somewhere in between showing mostly low expression across the sample, but also having a tiny bit of grouped cells, less distinct than in Galaxy though.](../../images/scrna-casestudy-monocle/hb_all.png "Hemoglobin across clusters - comparison between Scanpy, Monocle using Galaxy tools and Monocle run in RStudio.") > > > {: .solution} diff --git a/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md b/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md index 5585fbdd9dada..d23edc73e0b1f 100644 --- a/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md +++ b/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md @@ -483,9 +483,9 @@ In the mentioned tutorial, we annotated the cells so that we know what type they ## Clustering Don't get confused - we haven't clustered our cells yet, for now we have only plotted them based on cell type annotation. Now it's time to create clusters, which - in an ideal world where all computation picks up the exact biological phenomenons - would yield the same areas as the clusters determined by the Scanpy algorithms. Is this the case here? Do Monocle and Scanpy identify the same clusters? -> + Monocle uses a technique called "community detection" ({% cite Traag_2019 %}) to group cells. This approach was introduced by {% cite Levine_2015 %} as part of the phenoGraph algorithm. -> + Monocle also divides the cells into larger, more well separated groups called partitions, using a statistical test from {% cite Wolf_2019 %}, introduced as part of their [PAGA](https://github.com/theislab/paga) algorithm. > Clusters vs partitions @@ -540,8 +540,9 @@ If we compare the annotated cell types and the clusters that were just formed, w ## Gene expression -> We haven't looked at gene expression yet! This step is particularly important when working with data which is not annotated. Then, based on the expression of marker genes, you are able to identify which clusters correspond to which cell types. This is indeed what we did in the previous tutorial using scanpy. We can do the same using Monocle3! Since we work on annotated data, we can directly check if the expressed genes actually correspond to the previously assigned cell types. If they do, that’s great - if two different methods are consistent, that gives us more confidence that our results are valid. -> Below is the table that we used in the previous tutorial to identify the cell types. + +We haven't looked at gene expression yet! This step is particularly important when working with data which is not annotated. Then, based on the expression of marker genes, you are able to identify which clusters correspond to which cell types. This is indeed what we did in the previous tutorial using scanpy. We can do the same using Monocle3! Since we work on annotated data, we can directly check if the expressed genes actually correspond to the previously assigned cell types. If they do, that’s great - if two different methods are consistent, that gives us more confidence that our results are valid. +Below is the table that we used in the previous tutorial to identify the cell types. | Marker | Cell type | |--------------------| @@ -646,9 +647,9 @@ We’re getting closer and closer! The next step is to learn the trajectory grap {: .hands_on} As you can see, the learned trajectory path is just a line connecting the clusters. However, there are some important points to understand here. 
-If the resolution of the clusters is high, then the trajectory path will be very meticulous, strongly branched and curved. There's a danger here that we might start seeing things that don't really exist.
-You can set an option to learn a single tree structure for all the partitions or use the partitions calculated when clustering and identify disjoint graphs in each. To make the right decision, you have to understand how/if the partitions are related and what would make more biolgical sense. In our case, we were only interested in a big partition containing all the cells and we ignored the small 'dot' classified as another partition.
-There are many trajectory patterns: linear, cycle, bifurcation, tree and so on. Those patterns might correspond to various biological processes: transition events for different phases, cell cycle, cell differentiation. Therefore, branching points are quite important on the trajectory path. You can always plot them, {% icon history-share %} checking the correct box in {% tool Monocle3 plotCells %}.
+If the resolution of the clusters is high, then the trajectory path will be very meticulous, strongly branched and curved. There's a danger here that we might start seeing things that don't really exist.
+You can set an option to learn a single tree structure for all the partitions or use the partitions calculated when clustering and identify disjoint graphs in each. To make the right decision, you have to understand how/if the partitions are related and what would make more biological sense. In our case, we were only interested in a big partition containing all the cells and we ignored the small 'dot' classified as another partition.
+There are many trajectory patterns: linear, cycle, bifurcation, tree and so on. Those patterns might correspond to various biological processes: transition events for different phases, cell cycle, cell differentiation. Therefore, branching points are quite important on the trajectory path. You can always plot them by {% icon history-share %} checking the correct box in {% tool Monocle3 plotCells %}.

![A trajectory path, branching out to connect all the clusters and thus show their relationships.](../../images/scrna-casestudy-monocle/learned_trajectory.png "Learned trajectory path")

@@ -700,7 +701,7 @@ Finally, it's time to see our cells in pseudotime! We have already learned a tra
 {: .tip}

Now we can see how all our hard work has come together to give a final pseudotime trajectory analysis. DN cells gently switching to DP-M which change into DP-L to finally become mature T-cells. Isn't it beautiful? But wait, don't be too enthusiastic - why on earth DP-M1 group branches out? We didn't expect that... What could that mean?
-> 
+
There are a lot of such questions in bioinformatics, and we're always get excited to try to answer them. However, with analysing scRNA-seq data, it's almost like you need to know about 75% of your data to make sure that your analysis is reasonable, before you can identify the 25% new information. Additionally, pseudotime analysis crucially depends on choosing the right analysis and parameter values, as we showed for example with initial dimensionality reduction during pre-processing. The outputs here, at least in our hands, are more sensitive to parameter choice than standard clustering analysis with Scanpy. 

![Pseudotime plot, showing the development of T-cells – starting in dark blue on DN cells and ending up on mature T-cells, marked in yellow on pseudotime scale and (going in the opposite direction) DP-M1 branch which is marked in light orange.](../../images/scrna-casestudy-monocle/pseudotime.png "Trajectory analysis - pseudotime")

diff --git a/topics/single-cell/tutorials/scrna-umis/tutorial.md b/topics/single-cell/tutorials/scrna-umis/tutorial.md
index 9ee242fafbcd0..27da075a1d8ed 100644
--- a/topics/single-cell/tutorials/scrna-umis/tutorial.md
+++ b/topics/single-cell/tutorials/scrna-umis/tutorial.md
@@ -179,7 +179,7 @@ This then provides us with the true count of the number of true transcripts for
> > 
> > 
> 
-> 1. Yes, UMIs are not specific to genes and the same UMI barcode can tag the transcripts of different genes. UMIs are not universal tags, they are just 'added randomness' that help reduce amplification bias.
+> > 1. Yes, UMIs are not specific to genes and the same UMI barcode can tag the transcripts of different genes. UMIs are not universal tags; they are just 'added randomness' that helps reduce amplification bias.
> > 2. Yes, UMIs are not precise but operate probabilistically. In most cases, two transcripts of the same gene will be tagged by different UMIs. In rarer (but still prevalent) cases, the same UMI will capture different transcripts of the same gene.
> > * One helpful way to think about how quantification is performed is to observe the following hierarchy of data `Cell Barcode → Gene → UMI`
> > 
diff --git a/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md b/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md
index 18d1e91ef4c5f..a288d317dd1cf 100644
--- a/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md
+++ b/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md
@@ -50,7 +50,8 @@ In this tutorial we plan to measure aberrant PI3K pathway activity in TCGA datas
 {: .agenda}

# **Pre-installed tutorial tools, datasets and workflows from the docker image**
-> An efficient way to install and run the tutorial using papaa tools is available on docker based galaxy instance that has pre-installed papaa tool-suite as **papaa** under tools section. Additionally this local galaxy instance comes with datasets and workflow for generating PI3K_OG classifier. Instructions to run the docker image is below.
+
+An efficient way to install and run the tutorial with the papaa tools is to use a Docker-based Galaxy instance that has the papaa tool suite pre-installed as **papaa** in the tools section. Additionally, this local Galaxy instance comes with the datasets and workflow for generating the PI3K_OG classifier. Instructions for running the Docker image are below.

> Tutorial for galaxy docker container installation and running the workflow:
> 1. Pulling the docker image from docker hub: Open a terminal and type the following command:
@@ -115,29 +116,29 @@ In this tutorial we plan to measure aberrant PI3K pathway activity in TCGA datas

> Datasets descriptions
> 
-- **pancan_rnaseq_freeze.tsv:** Publicly available gene expression data for the TCGA Pan-cancer dataset. This file has gene-expression data for ~20,000 genes (columns) in ~10,000 samples (rows).
-- **pancan_mutation_freeze.tsv:** Publicly available Mutational information for TCGA Pan-cancer dataset. This file has mutational data for all genes (columns) as binary valued (0/1) in all samples (rows). 
-- **mutation_burden_freeze.tsv:** Publicly available Mutational information for TCGA Pan-cancer dataset. This file has mutational burden information for all samples(rows). -- **sample_freeze.tsv:** The file lists the frozen samples as determined by TCGA PanCancer Atlas consortium along with raw RNA-Seq and mutation data. These were previously determined and included for all downstream analysis All other datasets were processed and subset according to the frozen samples. -- **cosmic_cancer_classification.tsv:** Compendium of OG and TSG used for the analysis. This file has list of cancer genes(rows) from [cosmic database](https://cancer.sanger.ac.uk/cosmic) classified as Oncogene or tumor suppressor (columns). -- **CCLE_DepMap_18Q1_maf_20180207.txt:** Publicly available Mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia [CCLE](https://portals.broadinstitute.org/ccle)/[DepMap Portal](https://depmap.org/portal/). Variant classification along with nucleotide and protein level changes are provided in the columns for genes(rows). -- **ccle_rnaseq_genes_rpkm_20180929_mod.tsv:** Publicly available Expression data for 1,019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. This file has gene-expression data for genes(rows) in various cell lines (columns). -- **CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv:** Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This file has mutational/copy number variation data for all cancer genes (rows) as binary valued (0/1) in all CCLE cell lines (columns). -- **GDSC_cell_lines_EXP_CCLE_names.tsv:** Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer [GDSC](https://www.cancerrxgene.org/) cell-lines. This data was subset to 382 cell lines that are common among CCLE and GDSC. This file has gene-expression data for genes(rows) in various cell lines (columns). -- **GDSC_CCLE_common_mut_cnv_binary.tsv:** A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. -- **gdsc1_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-1 cell lines. This data was subset to 379 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, AUC(columns) of 304 tested compounds in various cell-lines(rows). -- **gdsc2_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-2 cell lines. This data was subset to 347 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, AUC(columns) of 170 tested compounds in various cell-lines(rows). -- **compounds_of_interest.txt:** This file contains the compounds of interest for generation of pharmacological correlations with classifier scores. List of inhibitor compounds against EGFR-signaling, ERK-MAPK-signaling, Other-kinases, PI3K/MTOR-signaling, and RTK-signaling pathways. -- **tcga_dictonary.tsv:** List of cancer types used in the analysis. -- **GSE69822_pi3k_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession:[GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822). 
-- **GSE69822_pi3k_trans.csv:** Variant stabilized transformed values for the RNA expression levels in the external samples from [GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
-- **path_rtk_ras_pi3k_genes.txt:** List of genes belong to RTK,RAS,PI3K used in our study.
-- **path_myc_genes.txt:** List of genes belong to Myc signaling pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **path_ras_genes.txt:** List of genes belong to Ras signaling Pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **path_cell_cycle_genes.txt:** List of genes belong to cell cycle pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **path_wnt_genes.txt:** List of genes belong to wnt signaling pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **GSE94937_rpkm_kras.csv:** RNA expression levels in the external samples from [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
-- **GSE94937_kras_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession: [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
+> - **pancan_rnaseq_freeze.tsv:** Publicly available gene expression data for the TCGA Pan-cancer dataset. This file has gene-expression data for ~20,000 genes (columns) in ~10,000 samples (rows).
+> - **pancan_mutation_freeze.tsv:** Publicly available mutational information for the TCGA Pan-cancer dataset. This file has mutational data for all genes (columns) as binary values (0/1) in all samples (rows).
+> - **mutation_burden_freeze.tsv:** Publicly available mutational information for the TCGA Pan-cancer dataset. This file has mutational burden information for all samples (rows).
+> - **sample_freeze.tsv:** The file lists the frozen samples as determined by the TCGA PanCancer Atlas consortium along with raw RNA-Seq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples.
+> - **cosmic_cancer_classification.tsv:** Compendium of OG and TSG used for the analysis. This file has a list of cancer genes (rows) from the [cosmic database](https://cancer.sanger.ac.uk/cosmic) classified as Oncogene or tumor suppressor (columns).
+> - **CCLE_DepMap_18Q1_maf_20180207.txt:** Publicly available mutational data for CCLE cell lines from the Broad Institute Cancer Cell Line Encyclopedia [CCLE](https://portals.broadinstitute.org/ccle)/[DepMap Portal](https://depmap.org/portal/). Variant classification along with nucleotide and protein level changes are provided in the columns for genes (rows).
+> - **ccle_rnaseq_genes_rpkm_20180929_mod.tsv:** Publicly available expression data for 1,019 cell lines (RPKM) from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. This file has gene-expression data for genes (rows) in various cell lines (columns).
+> - **CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv:** Publicly available merged mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This file has mutational/copy number variation data for all cancer genes (rows) as binary values (0/1) in all CCLE cell lines (columns). 
+> - **GDSC_cell_lines_EXP_CCLE_names.tsv:** Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer [GDSC](https://www.cancerrxgene.org/) cell-lines. This data was subset to 382 cell lines that are common among CCLE and GDSC. This file has gene-expression data for genes (rows) in various cell lines (columns).
+> - **GDSC_CCLE_common_mut_cnv_binary.tsv:** A subset of merged mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE.
+> - **gdsc1_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-1 cell lines. This data was subset to 379 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, and AUC (columns) of 304 tested compounds in various cell-lines (rows).
+> - **gdsc2_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-2 cell lines. This data was subset to 347 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, and AUC (columns) of 170 tested compounds in various cell-lines (rows).
+> - **compounds_of_interest.txt:** This file contains the compounds of interest for generation of pharmacological correlations with classifier scores. List of inhibitor compounds against EGFR-signaling, ERK-MAPK-signaling, Other-kinases, PI3K/MTOR-signaling, and RTK-signaling pathways.
+> - **tcga_dictonary.tsv:** List of cancer types used in the analysis.
+> - **GSE69822_pi3k_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession: [GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
+> - **GSE69822_pi3k_trans.csv:** Variance stabilized transformed values for the RNA expression levels in the external samples from [GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
+> - **path_rtk_ras_pi3k_genes.txt:** List of genes belonging to RTK, RAS, and PI3K used in our study.
+> - **path_myc_genes.txt:** List of genes belonging to the Myc signaling pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **path_ras_genes.txt:** List of genes belonging to the Ras signaling pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **path_cell_cycle_genes.txt:** List of genes belonging to the cell cycle pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **path_wnt_genes.txt:** List of genes belonging to the Wnt signaling pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **GSE94937_rpkm_kras.csv:** RNA expression levels in the external samples from [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
+> - **GSE94937_kras_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession: [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
 {:.details}

# PanCancer aberrant pathway activity analysis (PAPAA)

@@ -155,11 +156,9 @@ Where *alpha* and *l* are regularization and elastic net mixing hyperparameters

***Sample Processing step:***

-- **x-matrix:**
-   > Gene-expression data comprises of expression levels for ~20,000 genes/sample and ~10,000 samples. 
Top 8,000 highly variable genes per sample with in each disease were measured by median absolute deviation (MAD) and considered for analysis.
+- **x-matrix:** Gene-expression data comprises expression levels for ~20,000 genes/sample and ~10,000 samples. The top 8,000 highly variable genes within each disease were measured by median absolute deviation (MAD) and considered for analysis.

-- **y-matrix:**
-   > Copy number and mutational data as binary valued (0/1) datasets for all samples. This matrix is subset to given pathway target genes and cancer types.
+- **y-matrix:** Copy number and mutational data as binary valued (0/1) datasets for all samples. This matrix is subset to the given pathway target genes and cancer types.

We then randomly held out 10% of the samples to create a test set and rest 90% for training. The testing set is used as the validation to evaluate the performance of any machine learning algorithm and the remaining parts are used for learning/training. The training set is balanced for different cancer-types and PI3K status.

diff --git a/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md b/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md
index 1160f46cecc5e..cd1960b3c3c3c 100644
--- a/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md
+++ b/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md
@@ -111,9 +111,9 @@ To use Git version control for cloning any codebase from GitHub, the following s
> 1. Create a new folder named `covid_ct_segmentation` alongside other folders such as "data", "outputs", "elyra" or you can use your favourite folder name.
> 2. Inside the created folder, clone a code repository by clicking on "Git" icon as shown in Figure 6.
> 3. In the shown popup, provide the repository path as shown below and then, click on "clone":
-> > ```
-> > https://github.com/anuprulez/gpu_jupyterlab_ct_image_segmentation
-> > ```
+> ```
+> https://github.com/anuprulez/gpu_jupyterlab_ct_image_segmentation
+> ```
> 4. The repository "anuprulez/gpu_jupyterlab_ct_image_segmentation" gets immediately cloned.
> 5. Move inside the created folder `gpu_jupyterlab_ct_image_segmentation`. A few notebooks can be found inside that are numbered.
> ![Clone repository](../../images/git_clone.png "Clone a code repository using Git")
@@ -171,7 +171,7 @@ The training task completed in the notebook above can also be sent to a Galaxy c
> 
> > {: .comment}
> 
-> > ![Galaxy history](../../images/finished_history_remote_ai.png "Galaxy history showing finished datasets after remote training on a Galaxy cluster")
+> ![Galaxy history](../../images/finished_history_remote_ai.png "Galaxy history showing finished datasets after remote training on a Galaxy cluster")
> 
> **Note**: The training may take longer depending on how busy Galaxy's queueing is as it sends the training task to be done on a Galaxy cluster. Therefore, this feature should be used when the training task is expected to run for several hours. The training time is higher because a large Docker container is downloaded on the assigned cluster and only then, the training task can proceed.
> 
@@ -211,7 +211,7 @@ In this mode, the GPU Jupyterlab tool executes the input `ipynb` file and produc

When the parameter `Execute notebook and return a new one` is set to `yes`, the GPU Jupyterlab tool can be used as a part of any workflow. In this mode, it requires an `ipynb` file/notebook that gets executed in Galaxy and output datasets if any become available in the Galaxy history. 
Along with a notebook, multiple input datasets can also be attached that become automatically available inside the notebook. They can be accessed inside the notebook and processed to produce desired output datasets. These output datasets can further be used with other Galaxy tools. The following image shows a sample workflow for illustration purposes. Similarly, high-quality workflows to analyse scientific datasets can be created. -> !["A sample Galaxy workflow that uses GPU Jupyterlab as a tool"](../../images/workflow_gpu_jupyterlab.png "A sample Galaxy workflow that uses GPU Jupyterlab as a tool which takes input datasets from one tool, trains a machine learning model to predict classes and then the predicted datasets is used as input to another Galaxy tool.") +!["A sample Galaxy workflow that uses GPU Jupyterlab as a tool"](../../images/workflow_gpu_jupyterlab.png "A sample Galaxy workflow that uses GPU Jupyterlab as a tool which takes input datasets from one tool, trains a machine learning model to predict classes and then the predicted datasets is used as input to another Galaxy tool.") Let's look at how can this workflow be created in a step-wise manner. There are 3 steps - first, the training dataset is filtered using the `Filter` tool. The output of this tool along with 2 other datasets (`test_rows` and `test_rows_labels`), a sample IPython notebook is executed by the GPU Jupyterlab tool. The sample IPython notebook trains a simple machine learning model using the train dataset and creates a classification model using `RandomForestClassifier`. The trained model is then used to predict classes using the test dataset. The predicted classes is produced as a file in an output collection by the GPU Jupyterlab tool. As a last step, `Cut` tool is used to extract the first column of the output collection. Together, these steps showcase how the GPU Jupyterlab tool is used with other Galaxy tools in a workflow. diff --git a/topics/transcriptomics/tutorials/ref-based/tutorial.md b/topics/transcriptomics/tutorials/ref-based/tutorial.md index 42f7ccc306f50..ffaf5e41f4cfd 100644 --- a/topics/transcriptomics/tutorials/ref-based/tutorial.md +++ b/topics/transcriptomics/tutorials/ref-based/tutorial.md @@ -1615,7 +1615,7 @@ For more information about **DESeq2** and its outputs, you can have a look at th > > The log2 fold-change is negative so it is indeed downregulated and the adjusted p-value is below 0.05 so it is part of the significantly changed genes. > > > > 3. DESeq2 in Galaxy returns the comparison between the different levels for the 1st factor, after -correction for the variability due to the 2nd factor. In our current case, treated against untreated for any sequencing type. To compare sequencing types, we should run DESeq2 again switching factors: factor 1 (treatment) becomes factor 2 and factor 2 (sequencing) becomes factor 1. +> > correction for the variability due to the 2nd factor. In our current case, treated against untreated for any sequencing type. To compare sequencing types, we should run DESeq2 again switching factors: factor 1 (treatment) becomes factor 2 and factor 2 (sequencing) becomes factor 1. > > 4. To add the interaction between two factors (e.g. 
treated for paired-end data vs untreated for single-end), we should run DESeq2 another time but with only one factor with the following 4 levels: > > - treated-PE > > - untreated-PE diff --git a/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md b/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md index 8e1c4f4a96ac6..ee3df320a5be6 100644 --- a/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md +++ b/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md @@ -55,7 +55,6 @@ Each sample constitutes a separate biological replicate of the corresponding con > > > This tutorial is significantly based on Galaxy's ["Reference-based RNA-Seq data analysis"]({% link topics/transcriptomics/tutorials/ref-based/tutorial.md %}) tutorial. - > {: .comment} @@ -96,7 +95,7 @@ The "Data Upload" process is the only one in this tutorial that takes place dire > > > {: .comment} > -> >Change the datatype from `fastqsanger` to `fastq`. +> Change the datatype from `fastqsanger` to `fastq`. > > {% snippet faqs/galaxy/datasets_change_datatype.md datatype="fastq" %} > @@ -189,7 +188,7 @@ Sequence quality control is therefore an essential first step in your analysis. > > ``` > {: .code-in} > The same trimming procedure should take place for the second pair of reads (forward and reverse as above). After that, the files we are going to work with are the ones located in the **trimmedData** folder (4 in our case). - +> {: .hands_on} > FastQC on trimmed data @@ -227,7 +226,6 @@ The alignment process consists of two steps: > Our first step is to index the reference genome for use by STAR. Indexing allows the aligner to quickly find potential alignment sites for query sequences in a genome, which saves time during alignment. Indexing the reference only has to be run once. The only reason you would want to create a new index is if you are working with a different reference genome or you are using a different tool for alignment. > > > Indexing with `STAR` - > > ```bash > > $ mkdir index > > $ STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ~/index --genomeFastaFiles /import/14 --sjdbGTFfile /import/15 --sjdbOverhang 100 --genomeSAindexNbases 12 diff --git a/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md b/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md index 3467c17847ba6..c538a20242ae1 100644 --- a/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md +++ b/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md @@ -298,9 +298,9 @@ We'll make the points a bit smaller. We'll change to 0.5. > > > > > > We could use `alpha =`. For example -> ```R -> geom_point(aes(colour = sig), alpha = 0.5) -> ``` +> > ```R +> > geom_point(aes(colour = sig), alpha = 0.5) +> > ``` > > > {: .solution} {: .question} @@ -336,9 +336,9 @@ We'll make the font size of the labels a bit smaller. 
> > 
> > 
> > > We could change the 10 to 20 here
-> ```R
-> top <- slice_min(results, order_by = pvalue, n = 20)
-> ```
+> > ```R
+> > top <- slice_min(results, order_by = pvalue, n = 20)
+> > ```
> > 
> {: .solution}
 {: .question}
diff --git a/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md b/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md
index b6528e96a9b70..5fe9c7d5b4539 100644
--- a/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md
+++ b/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md
@@ -141,7 +141,7 @@ SRA can be reached either directly through it's website, or through the tool pan
> > You may have noticed this text earlier when you were exploring Entrez search. This text only appears some of the time, when the number of search results falls within a fairly broad window. You won't see it if you only have a few results, and you won't see it if you have more results than the Run Selector can accept.
> > 
> > *You need to get to Run Selector to send your results to Galaxy.* What if you don't have enough results to trigger this link being shown? In that case you call get to the Run Selector by **clicking** on the `Send to` pulldown menu at the top right of the results panel. To get to Run Selector, **select** `Run Selector` and then **click** the `Go` button.
-> ![sra entrez send to](../../images/sra_entrez_send_to.png)
+> > ![sra entrez send to](../../images/sra_entrez_send_to.png)
> {: .tip}
> 
> 
diff --git a/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md b/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md
index 86a31d2208fdc..c8afcbb26de9e 100644
--- a/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md
+++ b/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md
@@ -318,9 +318,9 @@ However, because of the high average data quality, there was no need to perform
> - *"Read group sample name (SM)"*: `Not available.`
> - *"Platform/technology used to produce the reads (PL)"*: `ILLUNINA`
> - *"Select analysis mode"*: `Simple illumina mode`
- {: .hands_on}
+{: .hands_on}

- > Name the created list as **Mapping-lsit**
+Name the created list as **Mapping-list**



diff --git a/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md b/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
index ed331ccf3c6fa..54a4fc3b791c2 100644
--- a/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
+++ b/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
@@ -269,8 +269,8 @@ We still cannot entirely trust the proposed variants. In particular, there are r
> variants predicted in the M. tuberculosis
> genome, using multiple different strategies.
> Firstly, certain regions of the Mtb genome
-contain repetitive sequences, e.g. from
-the PE/PPE gene family. Historically all of the genomic regions corresponding to
+> contain repetitive sequences, e.g. from
+> the PE/PPE gene family. Historically all of the genomic regions corresponding to
> those genes were filtered out but
> the new default draws on work from
> Maximillian Marin and others. This
@@ -278,12 +278,12 @@ the PE/PPE gene family. Historically all of the genomic regions corresponding to
> regions is the current region filter in
> TB Variant Filter for reads over 100 bp.
> If you are using shorter reads (e.g. from Illumina iSeq) the "Refined Low Confidence and Low Mappability" region list should be used instead. 
->For more on how these regions were calculated read the [paper](https://academic.oup.com/bioinformatics/article-abstract/38/7/1781/6502279?login=false) or [preprint](https://www.biorxiv.org/content/10.1101/2021.04.08.438862v3.full).
+> For more on how these regions were calculated, read the [paper](https://academic.oup.com/bioinformatics/article-abstract/38/7/1781/6502279?login=false) or [preprint](https://www.biorxiv.org/content/10.1101/2021.04.08.438862v3.full).
> 
> In addition to region filters, filters for variant type, allele frequency, coverage depth and distance from indels are provided.
> Older variant callers struggled to accurately
> call insertions and deletions (indels) but more recent tools (e.g. GATK v4 and the variant caller used in Snippy, Freebayes) no longer have this weakness. One remaining reason to filter SNVs/SNPs near indels is that they might have a different
-evolutionary history to "free standing" SNVs/SNPs, so the "close to indel filter" is still available in TB Variant Filter in case such SNPs/SNVs should be filtered out.
+> evolutionary history to "free standing" SNVs/SNPs, so the "close to indel filter" is still available in TB Variant Filter in case such SNPs/SNVs should be filtered out.
 {: .details}

Now that we have a collection of *high quality variants* we can search them against variants known to be associated with drug resistance. The *TB Profiler* tool does this using a database of variants curated by Dr Jody Phelan at the London School of Hygiene and Tropical Medicine. It can do its own mapping and variant calling but also accepts mapped reads in BAM format as input. It does its own variant calling and filtering.
diff --git a/topics/visualisation/tutorials/circos/tutorial.md b/topics/visualisation/tutorials/circos/tutorial.md
index 1d8fd7f261514..601da8188a390 100644
--- a/topics/visualisation/tutorials/circos/tutorial.md
+++ b/topics/visualisation/tutorials/circos/tutorial.md
@@ -531,7 +531,7 @@ You should see a plot like:
> Background: Chromothripsis
> 
> **Chromothripsis** is a phenomenon whereby (part of) a chromosome is shattered in a single catastrophic event, and subsequently imprecisely stitched
-together by the cell's repair mechanisms. This leads to a huge number of SV junctions.
+> together by the cell's repair mechanisms. This leads to a huge number of SV junctions.
> 
> ![Chromothripsis](../../images/circos/chromothripsis.png "Chromothripsis is a scattering of the DNA, followed by an imprecise repair process, leading to many structural rearrangements."){: width="60%"}
> 
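
The "close to indel filter" described in the TB Variant Filter details box above boils down to discarding SNVs that fall within a small window of any indel. A minimal sketch of that idea — with a made-up variant list and an arbitrary window size, not TB Variant Filter's actual code:

```python
# Sketch of a "close to indel" filter: discard SNVs within WINDOW bp of an
# indel, keep the indels themselves. Illustrative only — made-up variants,
# arbitrary window, and not TB Variant Filter's actual implementation.

WINDOW = 5  # bp on either side of an indel

# (chrom, pos, ref, alt); an indel has ref and alt of different lengths
variants = [
    ("chr1", 100, "A", "G"),   # SNV far from any indel -> kept
    ("chr1", 200, "AT", "A"),  # 1 bp deletion -> kept (indels are not dropped)
    ("chr1", 203, "C", "T"),   # SNV 3 bp from the deletion -> dropped
]

indels = [(c, p) for c, p, ref, alt in variants if len(ref) != len(alt)]

def near_indel(chrom, pos):
    return any(c == chrom and abs(p - pos) <= WINDOW for c, p in indels)

kept = [
    (c, p, ref, alt)
    for c, p, ref, alt in variants
    if len(ref) != len(alt) or not near_indel(c, p)
]
print(kept)  # [('chr1', 100, 'A', 'G'), ('chr1', 200, 'AT', 'A')]
```
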