UserGuide: Co Expression Analysis

Gene clustering for Co-expression analysis

ClustAndGO function is based on "coseq" package for gene clustering and enables to highlight DE genes (in at least 1 contrast) that have the same expression profile. Genes profiles are clustered using adapted transformations and mixture models or K-means algorithm. A model selection criteria is proposed to choose an appropriate number of clusters.

Through ClustAndGO, the list of the genes of each cluster is extracted as a table and graphical outputs are produced to visualize cluster profiles. The function provided in AskoR also allows automatically to describe each cluster (gene expression, gene lists shared with several contrasts, GO-enrichmment).

First, you have to define some parameters for the analysis. Here are the parameters to define for gene clustering analysis:

parameters$coseq_data the type of data you want to cluster ...
- "LogScaledData": log2(data+1) - for this you have to put parameters$coseq_transformation="none"
- "ExpressionProfiles": sample expression / sum of expression in all samples - choosen default approach by coseq creators.
parameters$coseq_model the algorythm for the clustering 'kmeans', 'Normal' (gaussian mixture model).
parameters$coseq_transformation the transformation applied to the expression profiles for the clustering - sample expression / sum of expression in all samples, ("voom", "logRPKM", "arcsin", "logit", "logMedianRef", "logclr", "clr", "alr", "ilr", "none").
parameters$coseq_ClustersNb the number of clusters to be build ...
- a fixed number from 2 to 25, i.e. parameters$coseq_ClustersNb=4.
- [by default] fixed range 2:25, i.e. parameters$coseq_ClustersNb=2:25. Coseq choose a number chosen between 2 to 25 clusters based on the Integrated Completed Likelihood (ICL) criterion (in the case of the Gaussian mixture model) or on the slope heuristics (in the case of the K-means algorithm).
parameters$coseq_HeatmapOrderSample choose "TRUE" if you don't want the samples to be clustered and prefer to keep the initial sample order.

Note: If a GO annotation file has been provided by the user, don't forget also to define parameters for GO enrichment analysis in each cluster (Cf. GO enrichment Section for parameters).

You can then run clustering analysis to highlight co-expression profiles:

# Parameters for gene clustering
parameters$coseq_data = "ExpressionProfiles"
parameters$coseq_model = "kmeans"     
parameters$coseq_transformation = "clr"
parameters$coseq_ClustersNb = 4   
parameters$coseq_HeatmapOrderSample = FALSE

# Parameters for GO enrichment 
parameters$GO_threshold = 0.05
parameters$GO_min_num_genes = 10
parameters$GO_algo = "weight01"
parameters$GO_stats = "fisher"

# Parameters for GO enrichment graphs
parameters$GO_max_top_terms = 10
parameters$GO_min_sig_genes = 2
parameters$Ratio_threshold = 2

# run analysis
clust<-ClustAndGO(asko_norm,resDEG,parameters, data)

A directory "DEG_test/Clustering/OnDEgenes/" is created, in which you will find one directory per analysis (the name will depend on the model, the transformation, and the number of clusters chosen).

Recommendations

Coseq is designed to clusterize expression profiles (sample expression / sum of expression in all samples) with one of the transformations available but some users may want to clusterize directly the scaled log+1 expression. In that case, set parameters$coseq_data="LogScaledData" and
parameters$coseq_transformation="none".
"K-means" model is widely used and quite fast but it is a particular case of a gaussian mixture model where samples are suposed to be independant (correlation between samples = 0 and same variance). If you want to use a more flexible model, you set the parameters$coseq_model="Normal" to apply a global gaussian mixture model.
When using "K-means" model, the "clr" transformation is recommended by coseq in many cases but, if you are trying to identify very specific clusters (for example tissue specific clusters) you can test the "logclr" transformation.
When using "Normal" model, "arcsin" or "logit" transformations should be tested at the beginning.
Analyze your results of GO-enrichment very carefully if you have less than 100 genes in the cluster.

Global outputs

Several files (tables and plots) are saved in the directory of the analysis and one directory per cluster is created (which contains detailed files for each cluster).

Tables

A global table is generated including the name of the genes, their cluster, their expression in all experimental conditions (normalized counts, in CPM), their DE status in all contrasts, and their description if an annotation file is provided.

	clusters.coexpr.	AC1	AC2	AC3	BC1	BC2	BC3	AC1vsAC2	AC1vsAC3	AC2vsAC3	BC1vsBC2	BC1vsBC3	BC2vsBC3	AC1vsBC1	AC2vsBC2	AC3vsBC3	Description
Gene_000003	2	0.4419681	1.765120	1.041042	1.698956	1.0034991	0.8513948	-1	0	0	0	0	0	-1	0	0	histone-lysine n-methyltransferase nsd2
Gene_000004	1	47.5222800	33.976130	33.823440	42.246010	51.9250600	46.6207000	1	1	0	0	0	0	0	-1	0	hypothetical protein pbra 009496
Gene_000006	2	0.4632453	2.305171	1.047562	1.402427	0.6600896	0.6161691	-1	0	0	0	0	0	0	1	0	cilia- and flagella-associated 57-like
...

Plots

A boxplot is produced to visualize the log scaled expression of all the genes in each cluster and each experimental condition:

boxplot

This is also represented as a heatmap, which can more easily enable to see gene proportion in each cluster and each experimental condition:

heatmap

Coseq package produces probability graphs that are concatenated in one graph in ClustAndGO function. On the left you can see if clusters are robusts (if the gene is affiliated to the same cluster for more than 80% of the iterations it appears in black and its affiliation is supposed to be robust - the affiliation of red genes is less robust). On the right, you can see the number (and so evaluate the proportion) of robust and non-robust genes in each cluster:

clusterProb

Per-cluster outputs

In the directory of the clustering analysis, a supplementary directory is created for each cluster in which you can find tables, plots, and a sub-directory which contains the lists of the genes for each GO-term enriched in the cluster.

Scaled-expression boxplot

First, for each cluster, log scaled expression is represented for the genes of the cluster.

cluster3a

GO-enrichment

If a GO-annotation file is provided by the user, ClustAndGO performs automatically GO-term enrichment analysis for each cluster.

A) Graphical outputs

Barpgraphs for each GO category are saved with the information of the enrichment ratio, the number of genes behind the enrichment and the p-value :

BP_GOgraph CC_GOgraph MF_GOgraph

And a synthetic plot concatenates the 3 categories :

Ratio_cluster3

B) Tabular outputs

Tabular outputs are created, identical to those produced by the GOenrichment function. Cf. GO enrichment Section

Enrichment of contrast-specific genes

Since not all genes are DE in all contrasts (clustering performed on the list of DE genes in at least 1 contrast of the experiment), a representation highlighting the number of DE genes in each cluster is created.

The plot shows, for each contrast of interest, the percentage of DE genes in the contrast in the cluster compared to the total number of DE genes in the contrast : "*" indicates whether the contrast is enriched in genes of the cluster. To define this significance, a Chi2 test is performed between 1) the (observed) proportion of genes in each contrast belonging to the cluster relative to the total number of genes in the contrast, and 2) the (expected) proportion of total genes in the cluster relative to the total number of DE genes in at least 1 of the contrasts (total number of genes used for clustering).
If the Chi2 p-value is less than 0.05, and the observed proportion is greater than the expected proportion in a particular contrast, the contrast is statistically significantly enriched in genes of the cluster (data embedded in the graph: "***" 0.001; "**" 0.01; "*" 0.05).

For example in cluster 3, the following plot shows that more than 750 genes are DE in contrast AC1vsAC3 and account for 40.5% of the total DE genes in contrast AC1vsAC3 (whereas the cluster 3 represents only 16.8% of the total number of total DE genes); this proportion shows that contrast AC1vsAC3 is significantly ("***") enriched in genes of the cluster 3 (the contrast would not have been significantly enriched with genes of the cluster 3 if 16.8% of its genes had belonged to cluster 3, according to a random distribution of genes in the contrasts).

Be careful: the lower the number of genes (in the contrast and/or in the cluster), the less reliable the Chi2 test is. The interpretation of the significance is then difficult.

GenesInContrasts_cluster3

UpSet plot of DEGs in contrasts

To identify in each cluster the DE genes specific to each contrast or common to several, the intersections between DE gene lists are represented with the UpSetR package.

UpSet_cluster3

To go further : Include genes that are NOT DE

If you want, then you can include genes that are not DE in an additional (artificial) cluster and visualize it with the clusters identified on the DE genes with ClustAndGO function. This will also realize a GO-enrichment on these genes in a "NOT_DE" folder.

NOTE: you have to run this function after creating "clust" object with your ClustAndGO analysis (you can create multiple "clust" objects with different names and run the function on it).

IncludeNonDEgenes_InClustering(data, asko_norm, resDEG, parameters, clust)

It can allow you to visualize the profile of the genes that are not DE and their proportion in the whole experiment:

BP_ScaledCPM_WithNonDEgenes

HT_ScaledCPM_WithNonDEgenes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UserGuide: Co Expression Analysis

Gene clustering for Co-expression analysis

Recommendations

Global outputs

Tables

Plots

Per-cluster outputs

Scaled-expression boxplot

GO-enrichment

Enrichment of contrast-specific genes

UpSet plot of DEGs in contrasts

To go further : Include genes that are NOT DE

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally