@@ -948,7 +948,7 @@ To identify variants, we
>
> [__LoFreq filter__](https://csb5.github.io/lofreq/) can be also used instead, both tools performs equal and fast results.
- {: .comment-on}
+ {: .comment}
diff --git a/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md b/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md
index 04f9f0ad8d241..d470a281e8922 100644
--- a/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md
+++ b/topics/sequence-analysis/tutorials/ncbi-blast-against-the-madland/tutorial.md
@@ -58,11 +58,11 @@ MAdLandDB is a protein database comprising of a comprehensive collection of full
>
{: .hands_on}
-> We just imported a FASTA file into Galaxy. Now, the next would be to perfrom the BLAST analysis against MAdLandDB.
+We just imported a FASTA file into Galaxy. The next step is to perform the BLAST analysis against MAdLandDB.
## Perform NCBI Blast+ on Galaxy
-> Since MAdLandDB is the collection of protein sequences, You can perform {% tool [BLASTp](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.10.1+galaxy2) %} and {% tool [BLASTx](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/2.10.1+galaxy2) %} tools.
+Since MAdLandDB is a collection of protein sequences, you can use the {% tool [BLASTp](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.10.1+galaxy2) %} and {% tool [BLASTx](toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/2.10.1+galaxy2) %} tools.
> Similarity search against MAdLand Database
>
@@ -80,9 +80,7 @@ MAdLandDB is a protein database comprising of a comprehensive collection of full
## Blast output
-> {% icon tool %} The BLAST output will be in tabular format (you can select the desired output format from the drop down menu) and include the following fields :
-
->
+{% icon tool %} The BLAST output will be in tabular format (you can select the desired output format from the drop-down menu) and includes the following fields:
| Column | NCBI name | Description |
|-------|------------|-------------|
@@ -99,7 +97,7 @@ MAdLandDB is a protein database comprising of a comprehensive collection of full
| 11 | evalue | Expectation value (E-value) |
| 12 | bitscore | Bit score |
-> The fields are separated by tabs, and each row represents a single hit. For more details for BLAST analysis and output, we recommand you to follow the [Similarity-searches-blast](https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/genome-annotation/tutorial.html#similarity-searches-blast) tutorial.
+The fields are separated by tabs, and each row represents a single hit. For more details on BLAST analysis and output, we recommend following the [Similarity-searches-blast](https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/genome-annotation/tutorial.html#similarity-searches-blast) tutorial.
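
As a rough illustration of working with this tab-separated output, the sketch below parses hits and keeps the highest-scoring hit per query. It assumes the standard 12-column BLAST tabular layout (the column names in `BLAST_COLUMNS` are the usual NCBI names; adjust them if you selected a different output format), and the helper names and example rows are made up for illustration.

```python
import csv
import io

# Assumed: standard 12-column BLAST tabular layout (NCBI column names).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tabular(handle):
    """Yield one dict per hit from a tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        # Convert the numeric fields used below.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def best_hit_per_query(hits):
    """Return a dict mapping each query ID to its highest-bitscore hit."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical example rows, not real MAdLandDB results.
example = (
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
    "query1\tsubjB\t75.0\t180\t45\t2\t10\t189\t20\t199\t2e-40\t150\n"
    "query2\tsubjC\t88.2\t150\t17\t1\t1\t150\t1\t150\t3e-60\t220\n"
)
best = best_hit_per_query(parse_blast_tabular(io.StringIO(example)))
for query, hit in best.items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

To use it on a file downloaded from Galaxy, pass an open file handle instead of the `io.StringIO` example.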
> Further Reading about BLAST Tools in Galaxy
diff --git a/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md b/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md
index 54a2025be6725..c94c3505b1e57 100644
--- a/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md
+++ b/topics/single-cell/tutorials/bulk-music-4-compare/tutorial.md
@@ -262,7 +262,7 @@ Let's try one more inference - this time, we'll use only healthy cells as a refe
>
> >
> >
-> > > > data:image/s3,"s3://crabby-images/7fc86/7fc862607ae0bcd18ff91f634d97fedb9ec7b36d" alt="Three graphs showing two rows for each cell type (gamma, ductal, delta, beta, alpha, and acinar cells) comparison normal or T2D proportions by either read or by sample, with the top graph labelled #altogether; the middle labelled #like4like; and the bottom labelled #healthyscref. Differences are most pronounced in the bottom #healthyscref graph."
+> > data:image/s3,"s3://crabby-images/7fc86/7fc862607ae0bcd18ff91f634d97fedb9ec7b36d" alt="Three graphs showing two rows for each cell type (gamma, ductal, delta, beta, alpha, and acinar cells) comparison normal or T2D proportions by either read or by sample, with the top graph labelled #altogether; the middle labelled #like4like; and the bottom labelled #healthyscref. Differences are most pronounced in the bottom #healthyscref graph."
> >
> > 1. If using a like4like inference reduced the difference between the phenotype, aligning both phenotypes to the same (healthy) reference exacerbated them - there are even fewer beta cells in the output of this analysis.
> >
diff --git a/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md b/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md
index 18ff68b6813cc..321e540a1b1b1 100644
--- a/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md
+++ b/topics/single-cell/tutorials/scrna-case_monocle3-rstudio/tutorial.md
@@ -503,7 +503,7 @@ plot_cells(cds_clustered, genes=c('Il2ra','Cd8b1','Cd8a','Cd4','Itm2a','Aif1','H
> > - `Itm2a` (T-mat): expressed in cluster 3
> > - `Aif1` (macrophages): barely anything here, minimal expression spread across the sample with some more cells in cluster 4 and 3 – not enough to form a distinct cluster though). In theory, we shouldn’t have any macrophages in our sample. If you remember from the previous tutorials, we actually filtered out macrophages from the sample during the processing step, because we worked on annotated data. When analysing unannotated data, we could only assign macrophages and then filter them out, provided that Monocle clusters them into a separate group. As you can see, it’s not the case here, so we will just carry on with the analysis, interpreting this as a contamination.
> > - `Hba-a1` (RBC): appears throughout the entire sample in low numbers suggesting some background contamination of red blood cell debris in the cell samples during library generation, but also shows higher expression in a distinct tiny bit of cluster 3, at the border between clusters 1 and 5. However, it’s too small to be clustered into a separate group and filtered out in this case.
-If you remember, this gene was found to be expressed in the previous Scanpy tutorial also in low numbers across the sample, and in the other Monocle tutorial (using Galaxy tools and annotated data) algorithms allowed us to gather the cells expressing that gene into a distinct group. Our result now sits somewhere in between.
+> > If you remember, this gene was found to be expressed in the previous Scanpy tutorial also in low numbers across the sample, and in the other Monocle tutorial (using Galaxy tools and annotated data) algorithms allowed us to gather the cells expressing that gene into a distinct group. Our result now sits somewhere in between.
> > data:image/s3,"s3://crabby-images/9915a/9915a4daca6ce6bb3bb74fbcbefd66d14d10e137" alt="In Scanpy graph the marker gene appears throughout the entire sample in low numbers, in Monocle in Galaxy cells expressing hemoglobin gene were grouped into a small branch of DP-M4, allowing to group those cells. Monocle in RStudio graph is somewhere in between showing mostly low expression across the sample, but also having a tiny bit of grouped cells, less distinct than in Galaxy though."
> >
> {: .solution}
diff --git a/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md b/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md
index 5585fbdd9dada..d23edc73e0b1f 100644
--- a/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md
+++ b/topics/single-cell/tutorials/scrna-case_monocle3-trajectories/tutorial.md
@@ -483,9 +483,9 @@ In the mentioned tutorial, we annotated the cells so that we know what type they
## Clustering
Don't get confused - we haven't clustered our cells yet, for now we have only plotted them based on cell type annotation. Now it's time to create clusters, which - in an ideal world where all computation picks up the exact biological phenomenons - would yield the same areas as the clusters determined by the Scanpy algorithms. Is this the case here? Do Monocle and Scanpy identify the same clusters?
->
+
Monocle uses a technique called "community detection" ({% cite Traag_2019 %}) to group cells. This approach was introduced by {% cite Levine_2015 %} as part of the phenoGraph algorithm.
->
+
Monocle also divides the cells into larger, more well separated groups called partitions, using a statistical test from {% cite Wolf_2019 %}, introduced as part of their [PAGA](https://github.com/theislab/paga) algorithm.
> Clusters vs partitions
@@ -540,8 +540,9 @@ If we compare the annotated cell types and the clusters that were just formed, w
## Gene expression
-> We haven't looked at gene expression yet! This step is particularly important when working with data which is not annotated. Then, based on the expression of marker genes, you are able to identify which clusters correspond to which cell types. This is indeed what we did in the previous tutorial using scanpy. We can do the same using Monocle3! Since we work on annotated data, we can directly check if the expressed genes actually correspond to the previously assigned cell types. If they do, that’s great - if two different methods are consistent, that gives us more confidence that our results are valid.
-> Below is the table that we used in the previous tutorial to identify the cell types.
+
+We haven't looked at gene expression yet! This step is particularly important when working with data which is not annotated. Then, based on the expression of marker genes, you are able to identify which clusters correspond to which cell types. This is indeed what we did in the previous tutorial using scanpy. We can do the same using Monocle3! Since we work on annotated data, we can directly check if the expressed genes actually correspond to the previously assigned cell types. If they do, that’s great - if two different methods are consistent, that gives us more confidence that our results are valid.
+Below is the table that we used in the previous tutorial to identify the cell types.
| Marker | Cell type |
|--------------------|
@@ -646,9 +647,9 @@ We’re getting closer and closer! The next step is to learn the trajectory grap
{: .hands_on}
As you can see, the learned trajectory path is just a line connecting the clusters. However, there are some important points to understand here.
-> If the resolution of the clusters is high, then the trajectory path will be very meticulous, strongly branched and curved. There's a danger here that we might start seeing things that don't really exist.
-> You can set an option to learn a single tree structure for all the partitions or use the partitions calculated when clustering and identify disjoint graphs in each. To make the right decision, you have to understand how/if the partitions are related and what would make more biolgical sense. In our case, we were only interested in a big partition containing all the cells and we ignored the small 'dot' classified as another partition.
-> There are many trajectory patterns: linear, cycle, bifurcation, tree and so on. Those patterns might correspond to various biological processes: transition events for different phases, cell cycle, cell differentiation. Therefore, branching points are quite important on the trajectory path. You can always plot them, {% icon history-share %} checking the correct box in {% tool Monocle3 plotCells %}.
+If the resolution of the clusters is high, then the trajectory path will be very meticulous, strongly branched and curved. There's a danger here that we might start seeing things that don't really exist.
+You can set an option to learn a single tree structure for all the partitions or use the partitions calculated when clustering and identify disjoint graphs in each. To make the right decision, you have to understand how/if the partitions are related and what would make more biological sense. In our case, we were only interested in a big partition containing all the cells and we ignored the small 'dot' classified as another partition.
+There are many trajectory patterns: linear, cycle, bifurcation, tree and so on. Those patterns might correspond to various biological processes: transition events for different phases, cell cycle, cell differentiation. Therefore, branching points are quite important on the trajectory path. You can always plot them, {% icon history-share %} checking the correct box in {% tool Monocle3 plotCells %}.
data:image/s3,"s3://crabby-images/cf6fb/cf6fba37b2f943d4b559cb0d39784960181c324f" alt="A trajectory path, branching out to connect all the clusters and thus show their relationships."
@@ -700,7 +701,7 @@ Finally, it's time to see our cells in pseudotime! We have already learned a tra
{: .tip}
Now we can see how all our hard work has come together to give a final pseudotime trajectory analysis. DN cells gently switching to DP-M which change into DP-L to finally become mature T-cells. Isn't it beautiful? But wait, don't be too enthusiastic - why on earth DP-M1 group branches out? We didn't expect that... What could that mean?
->
+
There are a lot of such questions in bioinformatics, and we're always get excited to try to answer them. However, with analysing scRNA-seq data, it's almost like you need to know about 75% of your data to make sure that your analysis is reasonable, before you can identify the 25% new information. Additionally, pseudotime analysis crucially depends on choosing the right analysis and parameter values, as we showed for example with initial dimensionality reduction during pre-processing. The outputs here, at least in our hands, are more sensitive to parameter choice than standard clustering analysis with Scanpy.
data:image/s3,"s3://crabby-images/78de8/78de80b132253eec9e5999e7698d3b6a35aa2cac" alt="Pseudotime plot, showing the development of T-cells – starting in dark blue on DN cells and ending up on mature T-cells, marked in yellow on pseudotime scale and (going in the opposite direction) DP-M1 branch which is marked in light orange."
diff --git a/topics/single-cell/tutorials/scrna-umis/tutorial.md b/topics/single-cell/tutorials/scrna-umis/tutorial.md
index 9ee242fafbcd0..27da075a1d8ed 100644
--- a/topics/single-cell/tutorials/scrna-umis/tutorial.md
+++ b/topics/single-cell/tutorials/scrna-umis/tutorial.md
@@ -179,7 +179,7 @@ This then provides us with the true count of the number of true transcripts for
>
> >
> >
-> 1. Yes, UMIs are not specific to genes and the same UMI barcode can tag the transcripts of different genes. UMIs are not universal tags, they are just 'added randomness' that help reduce amplification bias.
+> > 1. Yes, UMIs are not specific to genes and the same UMI barcode can tag the transcripts of different genes. UMIs are not universal tags, they are just 'added randomness' that help reduce amplification bias.
> > 2. Yes, UMIs are not precise but operate probabilistically. In most cases, two transcripts of the same gene will be tagged by different UMIs. In rarer (but still prevalent) cases, the same UMI will capture different transcripts of the same gene.
> > * One helpful way to think about how quantification is performed is to observe the following hierarchy of data `Cell Barcode → Gene → UMI`
> >
diff --git a/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md b/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md
index 18d1e91ef4c5f..a288d317dd1cf 100644
--- a/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md
+++ b/topics/statistics/tutorials/aberrant_pi3k_pathway_analysis/tutorial.md
@@ -50,7 +50,8 @@ In this tutorial we plan to measure aberrant PI3K pathway activity in TCGA datas
{: .agenda}
# **Pre-installed tutorial tools, datasets and workflows from the docker image**
-> An efficient way to install and run the tutorial using papaa tools is available on docker based galaxy instance that has pre-installed papaa tool-suite as **papaa** under tools section. Additionally this local galaxy instance comes with datasets and workflow for generating PI3K_OG classifier. Instructions to run the docker image is below.
+
+An efficient way to install and run the tutorial using the papaa tools is a Docker-based Galaxy instance that has the papaa tool suite pre-installed as **papaa** under the tools section. Additionally, this local Galaxy instance comes with the datasets and workflow for generating the PI3K_OG classifier. Instructions for running the Docker image are below.
> Tutorial for galaxy docker container installation and running the workflow:
> 1. Pulling the docker image from docker hub: Open a terminal and type the following command:
@@ -115,29 +116,29 @@ In this tutorial we plan to measure aberrant PI3K pathway activity in TCGA datas
> Datasets descriptions
>
-- **pancan_rnaseq_freeze.tsv:** Publicly available gene expression data for the TCGA Pan-cancer dataset. This file has gene-expression data for ~20,000 genes (columns) in ~10,000 samples (rows).
-- **pancan_mutation_freeze.tsv:** Publicly available Mutational information for TCGA Pan-cancer dataset. This file has mutational data for all genes (columns) as binary valued (0/1) in all samples (rows).
-- **mutation_burden_freeze.tsv:** Publicly available Mutational information for TCGA Pan-cancer dataset. This file has mutational burden information for all samples(rows).
-- **sample_freeze.tsv:** The file lists the frozen samples as determined by TCGA PanCancer Atlas consortium along with raw RNA-Seq and mutation data. These were previously determined and included for all downstream analysis All other datasets were processed and subset according to the frozen samples.
-- **cosmic_cancer_classification.tsv:** Compendium of OG and TSG used for the analysis. This file has list of cancer genes(rows) from [cosmic database](https://cancer.sanger.ac.uk/cosmic) classified as Oncogene or tumor suppressor (columns).
-- **CCLE_DepMap_18Q1_maf_20180207.txt:** Publicly available Mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia [CCLE](https://portals.broadinstitute.org/ccle)/[DepMap Portal](https://depmap.org/portal/). Variant classification along with nucleotide and protein level changes are provided in the columns for genes(rows).
-- **ccle_rnaseq_genes_rpkm_20180929_mod.tsv:** Publicly available Expression data for 1,019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. This file has gene-expression data for genes(rows) in various cell lines (columns).
-- **CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv:** Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This file has mutational/copy number variation data for all cancer genes (rows) as binary valued (0/1) in all CCLE cell lines (columns).
-- **GDSC_cell_lines_EXP_CCLE_names.tsv:** Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer [GDSC](https://www.cancerrxgene.org/) cell-lines. This data was subset to 382 cell lines that are common among CCLE and GDSC. This file has gene-expression data for genes(rows) in various cell lines (columns).
-- **GDSC_CCLE_common_mut_cnv_binary.tsv:** A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE.
-- **gdsc1_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-1 cell lines. This data was subset to 379 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, AUC(columns) of 304 tested compounds in various cell-lines(rows).
-- **gdsc2_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-2 cell lines. This data was subset to 347 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, AUC(columns) of 170 tested compounds in various cell-lines(rows).
-- **compounds_of_interest.txt:** This file contains the compounds of interest for generation of pharmacological correlations with classifier scores. List of inhibitor compounds against EGFR-signaling, ERK-MAPK-signaling, Other-kinases, PI3K/MTOR-signaling, and RTK-signaling pathways.
-- **tcga_dictonary.tsv:** List of cancer types used in the analysis.
-- **GSE69822_pi3k_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession:[GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
-- **GSE69822_pi3k_trans.csv:** Variant stabilized transformed values for the RNA expression levels in the external samples from [GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
-- **path_rtk_ras_pi3k_genes.txt:** List of genes belong to RTK,RAS,PI3K used in our study.
-- **path_myc_genes.txt:** List of genes belong to Myc signaling pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **path_ras_genes.txt:** List of genes belong to Ras signaling Pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **path_cell_cycle_genes.txt:** List of genes belong to cell cycle pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **path_wnt_genes.txt:** List of genes belong to wnt signaling pathway [Sanchez Vega et.al 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
-- **GSE94937_rpkm_kras.csv:** RNA expression levels in the external samples from [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
-- **GSE94937_kras_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession: [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
+> - **pancan_rnaseq_freeze.tsv:** Publicly available gene expression data for the TCGA Pan-cancer dataset. This file has gene-expression data for ~20,000 genes (columns) in ~10,000 samples (rows).
+> - **pancan_mutation_freeze.tsv:** Publicly available Mutational information for TCGA Pan-cancer dataset. This file has mutational data for all genes (columns) as binary valued (0/1) in all samples (rows).
+> - **mutation_burden_freeze.tsv:** Publicly available Mutational information for TCGA Pan-cancer dataset. This file has mutational burden information for all samples(rows).
+> - **sample_freeze.tsv:** The file lists the frozen samples as determined by TCGA PanCancer Atlas consortium along with raw RNA-Seq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples.
+> - **cosmic_cancer_classification.tsv:** Compendium of OG and TSG used for the analysis. This file has list of cancer genes(rows) from [cosmic database](https://cancer.sanger.ac.uk/cosmic) classified as Oncogene or tumor suppressor (columns).
+> - **CCLE_DepMap_18Q1_maf_20180207.txt:** Publicly available Mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia [CCLE](https://portals.broadinstitute.org/ccle)/[DepMap Portal](https://depmap.org/portal/). Variant classification along with nucleotide and protein level changes are provided in the columns for genes(rows).
+> - **ccle_rnaseq_genes_rpkm_20180929_mod.tsv:** Publicly available Expression data for 1,019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. This file has gene-expression data for genes(rows) in various cell lines (columns).
+> - **CCLE_MUT_CNA_AMP_DEL_binary_Revealer.tsv:** Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This file has mutational/copy number variation data for all cancer genes (rows) as binary valued (0/1) in all CCLE cell lines (columns).
+> - **GDSC_cell_lines_EXP_CCLE_names.tsv:** Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer [GDSC](https://www.cancerrxgene.org/) cell-lines. This data was subset to 382 cell lines that are common among CCLE and GDSC. This file has gene-expression data for genes(rows) in various cell lines (columns).
+> - **GDSC_CCLE_common_mut_cnv_binary.tsv:** A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE.
+> - **gdsc1_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-1 cell lines. This data was subset to 379 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, AUC(columns) of 304 tested compounds in various cell-lines(rows).
+> - **gdsc2_ccle_pharm_fitted_dose_data.txt:** Pharmacological data for GDSC-2 cell lines. This data was subset to 347 cell lines that are common among CCLE and GDSC. This file has pharmacological data like IC50, Z-scores, drug information, concentrations used, AUC(columns) of 170 tested compounds in various cell-lines(rows).
+> - **compounds_of_interest.txt:** This file contains the compounds of interest for generation of pharmacological correlations with classifier scores. List of inhibitor compounds against EGFR-signaling, ERK-MAPK-signaling, Other-kinases, PI3K/MTOR-signaling, and RTK-signaling pathways.
+> - **tcga_dictonary.tsv:** List of cancer types used in the analysis.
+> - **GSE69822_pi3k_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession:[GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
+> - **GSE69822_pi3k_trans.csv:** Variant stabilized transformed values for the RNA expression levels in the external samples from [GSE69822](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE69822).
+> - **path_rtk_ras_pi3k_genes.txt:** List of genes belonging to RTK, RAS, and PI3K used in our study.
+> - **path_myc_genes.txt:** List of genes belonging to the Myc signaling pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **path_ras_genes.txt:** List of genes belonging to the Ras signaling pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **path_cell_cycle_genes.txt:** List of genes belonging to the cell cycle pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **path_wnt_genes.txt:** List of genes belonging to the Wnt signaling pathway [Sanchez Vega et al. 2018](https://www.cell.com/action/showPdf?pii=S0092-8674%2818%2930359-3).
+> - **GSE94937_rpkm_kras.csv:** RNA expression levels in the external samples from [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
+> - **GSE94937_kras_sign.txt:** File with values assigned for tumor [1] or normal [-1] for external data samples deposited in Gene Expression Omnibus database accession: [GSE94937](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94937).
{:.details}
# PanCancer aberrant pathway activity analysis (PAPAA)
@@ -155,11 +156,9 @@ Where *alpha* and *l* are regularization and elastic net mixing hyperparameters
***Sample Processing step:***
-- **x-matrix:**
- > Gene-expression data comprises of expression levels for ~20,000 genes/sample and ~10,000 samples. Top 8,000 highly variable genes per sample with in each disease were measured by median absolute deviation (MAD) and considered for analysis.
+- **x-matrix:** Gene-expression data comprises expression levels for ~20,000 genes/sample and ~10,000 samples. The top 8,000 highly variable genes per sample within each disease, as measured by median absolute deviation (MAD), were considered for analysis.
-- **y-matrix:**
- > Copy number and mutational data as binary valued (0/1) datasets for all samples. This matrix is subset to given pathway target genes and cancer types.
+- **y-matrix:** Copy number and mutational data as binary-valued (0/1) datasets for all samples. This matrix is subset to the given pathway target genes and cancer types.
We then randomly held out 10% of the samples to create a test set and rest 90% for training. The testing set is used as the validation to evaluate the performance of any machine learning algorithm and the remaining parts are used for learning/training. The training set is balanced for different cancer-types and PI3K status.
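
The two ideas above, ranking genes by median absolute deviation and holding out 10% of samples, can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the function names, the gene names, and the tiny expression values are hypothetical, and the split below is a simple random shuffle that does not reproduce the balancing by cancer type and PI3K status described in the text.

```python
import random
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def top_variable_genes(expression, k):
    """Return the k gene names with the largest MAD.

    `expression` maps gene name -> list of expression values across samples.
    """
    ranked = sorted(expression, key=lambda gene: mad(expression[gene]), reverse=True)
    return ranked[:k]


def hold_out_split(sample_ids, test_fraction=0.1, seed=0):
    """Randomly split sample IDs into (train, test) lists.

    Note: a plain random split, not stratified by cancer type or gene status.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: three genes measured in four samples.
expression = {
    "GENE_A": [1, 1, 1, 1],    # MAD = 0
    "GENE_B": [1, 5, 9, 13],   # MAD = 4
    "GENE_C": [2, 3, 2, 3],    # MAD = 0.5
}
selected = top_variable_genes(expression, k=2)

samples = [f"sample_{i}" for i in range(20)]
train, test = hold_out_split(samples, test_fraction=0.1)
print(selected, len(train), len(test))
```

In the real analysis the same ranking would be applied with `k=8000` to the full expression matrix; a dedicated numerical library would be a more practical choice at that scale.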
diff --git a/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md b/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md
index 1160f46cecc5e..cd1960b3c3c3c 100644
--- a/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md
+++ b/topics/statistics/tutorials/gpu_jupyter_lab/tutorial.md
@@ -111,9 +111,9 @@ To use Git version control for cloning any codebase from GitHub, the following s
> 1. Create a new folder named `covid_ct_segmentation` alongside other folders such as "data", "outputs", "elyra" or you can use your favourite folder name.
> 2. Inside the created folder, clone a code repository by clicking on "Git" icon as shown in Figure 6.
> 3. In the shown popup, provide the repository path as shown below and then, click on "clone":
-> > ```
-> > https://github.com/anuprulez/gpu_jupyterlab_ct_image_segmentation
-> > ```
+> ```
+> https://github.com/anuprulez/gpu_jupyterlab_ct_image_segmentation
+> ```
> 4. The repository "anuprulez/gpu_jupyterlab_ct_image_segmentation" gets immediately cloned.
> 5. Move inside the created folder `gpu_jupyterlab_ct_image_segmentation`. A few notebooks can be found inside that are numbered.
> data:image/s3,"s3://crabby-images/4289a/4289a7aabe8c6fba6d369cd24ff5a9e80c27f609" alt="Clone repository"
@@ -171,7 +171,7 @@ The training task completed in the notebook above can also be sent to a Galaxy c
> >
> {: .comment}
>
-> > data:image/s3,"s3://crabby-images/3168c/3168cdd97c105c32a628cc55e0436c160f9f7eab" alt="Galaxy history"
+> data:image/s3,"s3://crabby-images/3168c/3168cdd97c105c32a628cc55e0436c160f9f7eab" alt="Galaxy history"
>
> **Note**: The training may take longer depending on how busy Galaxy's queueing is as it sends the training task to be done on a Galaxy cluster. Therefore, this feature should be used when the training task is expected to run for several hours. The training time is higher because a large Docker container is downloaded on the assigned cluster and only then, the training task can proceed.
>
@@ -211,7 +211,7 @@ In this mode, the GPU Jupyterlab tool executes the input `ipynb` file and produc
When the parameter `Execute notebook and return a new one` is set to `yes`, the GPU Jupyterlab tool can be used as a part of any workflow. In this mode, it requires an `ipynb` file/notebook that gets executed in Galaxy and output datasets if any become available in the Galaxy history. Along with a notebook, multiple input datasets can also be attached that become automatically available inside the notebook. They can be accessed inside the notebook and processed to produce desired output datasets. These output datasets can further be used with other Galaxy tools. The following image shows a sample workflow for illustration purposes. Similarly, high-quality workflows to analyse scientific datasets can be created.
-> data:image/s3,"s3://crabby-images/a141a/a141a0105a91b314b91229d3d7bb959fdc5675d5" alt=""A sample Galaxy workflow that uses GPU Jupyterlab as a tool""
+data:image/s3,"s3://crabby-images/a141a/a141a0105a91b314b91229d3d7bb959fdc5675d5" alt=""A sample Galaxy workflow that uses GPU Jupyterlab as a tool""
Let's look at how can this workflow be created in a step-wise manner. There are 3 steps - first, the training dataset is filtered using the `Filter` tool. The output of this tool along with 2 other datasets (`test_rows` and `test_rows_labels`), a sample IPython notebook is executed by the GPU Jupyterlab tool. The sample IPython notebook trains a simple machine learning model using the train dataset and creates a classification model using `RandomForestClassifier`. The trained model is then used to predict classes using the test dataset. The predicted classes is produced as a file in an output collection by the GPU Jupyterlab tool. As a last step, `Cut` tool is used to extract the first column of the output collection. Together, these steps showcase how the GPU Jupyterlab tool is used with other Galaxy tools in a workflow.
diff --git a/topics/transcriptomics/tutorials/ref-based/tutorial.md b/topics/transcriptomics/tutorials/ref-based/tutorial.md
index 42f7ccc306f50..ffaf5e41f4cfd 100644
--- a/topics/transcriptomics/tutorials/ref-based/tutorial.md
+++ b/topics/transcriptomics/tutorials/ref-based/tutorial.md
@@ -1615,7 +1615,7 @@ For more information about **DESeq2** and its outputs, you can have a look at th
> > The log2 fold-change is negative so it is indeed downregulated and the adjusted p-value is below 0.05 so it is part of the significantly changed genes.
> >
> > 3. DESeq2 in Galaxy returns the comparison between the different levels for the 1st factor, after
-correction for the variability due to the 2nd factor. In our current case, treated against untreated for any sequencing type. To compare sequencing types, we should run DESeq2 again switching factors: factor 1 (treatment) becomes factor 2 and factor 2 (sequencing) becomes factor 1.
+> > correction for the variability due to the 2nd factor. In our current case, treated against untreated for any sequencing type. To compare sequencing types, we should run DESeq2 again switching factors: factor 1 (treatment) becomes factor 2 and factor 2 (sequencing) becomes factor 1.
> > 4. To add the interaction between two factors (e.g. treated for paired-end data vs untreated for single-end), we should run DESeq2 another time but with only one factor with the following 4 levels:
> > - treated-PE
> > - untreated-PE
diff --git a/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md b/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md
index 8e1c4f4a96ac6..ee3df320a5be6 100644
--- a/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md
+++ b/topics/transcriptomics/tutorials/rna-seq-bash-star-align/tutorial.md
@@ -55,7 +55,6 @@ Each sample constitutes a separate biological replicate of the corresponding con
>
>
> This tutorial is significantly based on Galaxy's ["Reference-based RNA-Seq data analysis"]({% link topics/transcriptomics/tutorials/ref-based/tutorial.md %}) tutorial.
-
>
{: .comment}
@@ -96,7 +95,7 @@ The "Data Upload" process is the only one in this tutorial that takes place dire
> >
> {: .comment}
>
-> >Change the datatype from `fastqsanger` to `fastq`.
+> Change the datatype from `fastqsanger` to `fastq`.
>
> {% snippet faqs/galaxy/datasets_change_datatype.md datatype="fastq" %}
>
@@ -189,7 +188,7 @@ Sequence quality control is therefore an essential first step in your analysis.
> > ```
> {: .code-in}
> The same trimming procedure should take place for the second pair of reads (forward and reverse as above). After that, the files we are going to work with are the ones located in the **trimmedData** folder (4 in our case).
-
+>
{: .hands_on}
> FastQC on trimmed data
@@ -227,7 +226,6 @@ The alignment process consists of two steps:
> Our first step is to index the reference genome for use by STAR. Indexing allows the aligner to quickly find potential alignment sites for query sequences in a genome, which saves time during alignment. Indexing the reference only has to be run once. The only reason you would want to create a new index is if you are working with a different reference genome or you are using a different tool for alignment.
>
> > Indexing with `STAR`
-
> > ```bash
> > $ mkdir index
> > $ STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ~/index --genomeFastaFiles /import/14 --sjdbGTFfile /import/15 --sjdbOverhang 100 --genomeSAindexNbases 12
diff --git a/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md b/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md
index 3467c17847ba6..c538a20242ae1 100644
--- a/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md
+++ b/topics/transcriptomics/tutorials/rna-seq-viz-with-volcanoplot-r/tutorial.md
@@ -298,9 +298,9 @@ We'll make the points a bit smaller. We'll change to 0.5.
> >
> >
> > We could use `alpha =`. For example
-> ```R
-> geom_point(aes(colour = sig), alpha = 0.5)
-> ```
+> > ```R
+> > geom_point(aes(colour = sig), alpha = 0.5)
+> > ```
> >
> {: .solution}
{: .question}
@@ -336,9 +336,9 @@ We'll make the font size of the labels a bit smaller.
> >
> >
> > We could change the 10 to 20 here
-> ```R
-> top <- slice_min(results, order_by = pvalue, n = 20)
-> ```
+> > ```R
+> > top <- slice_min(results, order_by = pvalue, n = 20)
+> > ```
> >
> {: .solution}
{: .question}
diff --git a/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md b/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md
index b6528e96a9b70..5fe9c7d5b4539 100644
--- a/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md
+++ b/topics/variant-analysis/tutorials/sars-cov-2/tutorial.md
@@ -141,7 +141,7 @@ SRA can be reached either directly through it's website, or through the tool pan
> > You may have noticed this text earlier when you were exploring Entrez search. This text only appears some of the time, when the number of search results falls within a fairly broad window. You won't see it if you only have a few results, and you won't see it if you have more results than the Run Selector can accept.
> >
> > *You need to get to Run Selector to send your results to Galaxy.* What if you don't have enough results to trigger this link being shown? In that case you can get to the Run Selector by **clicking** on the `Send to` pulldown menu at the top right of the results panel. To get to Run Selector, **select** `Run Selector` and then **click** the `Go` button.
-> data:image/s3,"s3://crabby-images/91b35/91b3547751dd8cc5eec3c11553f5d485dd681ecc" alt="sra entrez send to"
+> > data:image/s3,"s3://crabby-images/91b35/91b3547751dd8cc5eec3c11553f5d485dd681ecc" alt="sra entrez send to"
> {: .tip}
>
>
diff --git a/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md b/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md
index 86a31d2208fdc..c8afcbb26de9e 100644
--- a/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md
+++ b/topics/variant-analysis/tutorials/somatic-variant-discovery/tutorial.md
@@ -318,9 +318,9 @@ However, because of the high average data quality, there was no need to perform
> - *"Read group sample name (SM)"*: `Not available.`
> - *"Platform/technology used to produce the reads (PL)"*: `ILLUNINA`
> - *"Select analysis mode"*: `Simple illumina mode`
- {: .hands_on}
+{: .hands_on}
- > Name the created list as **Mapping-lsit**
+Name the created list as **Mapping-lsit**
diff --git a/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md b/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
index ed331ccf3c6fa..54a4fc3b791c2 100644
--- a/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
+++ b/topics/variant-analysis/tutorials/tb-variant-analysis/tutorial.md
@@ -269,8 +269,8 @@ We still cannot entirely trust the proposed variants. In particular, there are r
> variants predicted in the M. tuberculosis
> genome, using multiple different strategies.
> Firstly, certain regions of the Mtb genome
-contain repetitive sequences, e.g. from
-the PE/PPE gene family. Historically all of the genomic regions corresponding to
+> contain repetitive sequences, e.g. from
+> the PE/PPE gene family. Historically all of the genomic regions corresponding to
> those genes were filtered out but
> the new default draws on work from
> Maximillian Marin and others. This
@@ -278,12 +278,12 @@ the PE/PPE gene family. Historically all of the genomic regions corresponding to
> regions is the current region filter in
> TB Variant Filter for reads over 100 bp.
> If you are using shorter reads (e.g. from Illumina iSeq) the "Refined Low Confidence and Low Mappability" region list should be used instead.
->For more on how these regions were calculated read the [paper](https://academic.oup.com/bioinformatics/article-abstract/38/7/1781/6502279?login=false) or [preprint](https://www.biorxiv.org/content/10.1101/2021.04.08.438862v3.full).
+> For more on how these regions were calculated read the [paper](https://academic.oup.com/bioinformatics/article-abstract/38/7/1781/6502279?login=false) or [preprint](https://www.biorxiv.org/content/10.1101/2021.04.08.438862v3.full).
>
> In addition to region filters, filters for variant type, allele frequency, coverage depth and distance from indels are provided.
> Older variant callers struggled to accurately
> call insertions and deletions (indels) but more recent tools (e.g. GATK v4 and the variant caller used in Snippy, Freebayes) no longer have this weakness. One remaining reason to filter SNVs/SNPs near indels is that they might have a different
-evolutionary history to "free standing" SNVs/SNPs, so the "close to indel filter" is still available in TB Variant Filter in case such SNPs/SNVs should be filtered out.
+> evolutionary history to "free standing" SNVs/SNPs, so the "close to indel filter" is still available in TB Variant Filter in case such SNPs/SNVs should be filtered out.
{: .details}
Now that we have a collection of *high quality variants* we can search them against variants known to be associated with drug resistance. The *TB Profiler* tool does this using a database of variants curated by Dr Jody Phelan at the London School of Hygiene and Tropical Medicine. It can do its own mapping and variant calling but also accepts mapped reads in BAM format as input. It does its own variant calling and filtering.
diff --git a/topics/visualisation/tutorials/circos/tutorial.md b/topics/visualisation/tutorials/circos/tutorial.md
index 1d8fd7f261514..601da8188a390 100644
--- a/topics/visualisation/tutorials/circos/tutorial.md
+++ b/topics/visualisation/tutorials/circos/tutorial.md
@@ -531,7 +531,7 @@ You should see a plot like:
> Background: Chromothripsis
>
> **Chromothripsis** is a phenomenon whereby (part of) a chromosome is shattered in a single catastrophic event, and subsequently imprecisely stitched
-together by the cell's repair mechanisms. This leads to a huge number of SV junctions.
+> together by the cell's repair mechanisms. This leads to a huge number of SV junctions.
>
> data:image/s3,"s3://crabby-images/9c9db/9c9dbf5e1d7e2281e042343088ec3cab297381ed" alt="Chromothripsis"{: width="60%"}
>