An Example of Running TransDecoder

Xiaolong Cao edited this page Oct 9, 2022 · 2 revisions

Installing TransDecoder

Refer to the TransDecoder wiki for details on how to install and run TransDecoder:
https://github.com/TransDecoder/TransDecoder/wiki

Running TransDecoder with example files

We provide example files for running TransDecoder. The files are stored in PATH_OF_PRECISONPRODB/examples/TransDecoder.

PATH_OF_PRECISONPRODB is the location of PrecisionProDB.
PATH_OF_TRANSDECODER is the location of TransDecoder.
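The commands on this page refer to these two locations. One way to make them available is to define them as shell variables before running anything else. This is only a sketch: the directories under $HOME/software are hypothetical and must be replaced with the actual install locations on your system.

```shell
# Hypothetical install locations (assumptions); replace with your own paths.
PATH_OF_PRECISONPRODB="$HOME/software/PrecisionProDB"
PATH_OF_TRANSDECODER="$HOME/software/TransDecoder"

# Print the values so they can be checked before running the pipeline.
echo "PrecisionProDB: $PATH_OF_PRECISONPRODB"
echo "TransDecoder:   $PATH_OF_TRANSDECODER"
```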

# change working directory to example folder
cd $PATH_OF_PRECISONPRODB/examples/TransDecoder

# decompress the example files
gzip -d *.gz

# get transcript sequences from the GTF file and the genome
$PATH_OF_TRANSDECODER/util/gtf_genome_to_cdna_fasta.pl TransDecoder.gtf TransDecoder.genome.fa >TransDecoder.transcripts.fa

# convert the GTF file to GFF3 format
$PATH_OF_TRANSDECODER/util/gtf_to_alignment_gff3.pl TransDecoder.gtf >TransDecoder.gff3

# find long ORFs and predict proteins. -m sets the minimum protein length (in amino acids).
$PATH_OF_TRANSDECODER/TransDecoder.LongOrfs -t TransDecoder.transcripts.fa -m 60
$PATH_OF_TRANSDECODER/TransDecoder.Predict -t TransDecoder.transcripts.fa

# map translated proteins to the genome
$PATH_OF_TRANSDECODER/util/cdna_alignment_orf_to_genome_orf.pl \
                            TransDecoder.transcripts.fa.transdecoder.gff3 \
                            TransDecoder.gff3 \
                            TransDecoder.transcripts.fa > TransDecoder.transcripts.fa.transdecoder.genome.gff3

# to save disk space, compress these files
gzip *

The output files are:

  • TransDecoder.gff3
  • TransDecoder.transcripts.fa
  • TransDecoder.transcripts.fa.transdecoder.bed
  • TransDecoder.transcripts.fa.transdecoder.cds
  • TransDecoder.transcripts.fa.transdecoder.genome.gff3: GFF3 annotation of the predicted proteins in genome coordinates.
  • TransDecoder.transcripts.fa.transdecoder.gff3: GFF3 annotation of the predicted proteins in transcript coordinates.
  • TransDecoder.transcripts.fa.transdecoder.pep: the final translated protein sequences.

Of these files, TransDecoder.transcripts.fa.transdecoder.pep and TransDecoder.transcripts.fa.transdecoder.genome.gff3 are the inputs for PrecisionProDB.
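Before passing these two files to PrecisionProDB, it can help to confirm that both were actually produced. The sketch below is not part of TransDecoder or PrecisionProDB; check_input is a hypothetical helper that accepts either the plain file or its gzip-compressed form, since the last step above compresses everything in the folder.

```shell
# Hypothetical helper: succeed if the file exists and is non-empty,
# either as-is or with a .gz suffix.
check_input() {
    if [ -s "$1" ] || [ -s "$1.gz" ]; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the two files that PrecisionProDB needs; report but do not abort.
for f in TransDecoder.transcripts.fa.transdecoder.pep \
         TransDecoder.transcripts.fa.transdecoder.genome.gff3; do
    check_input "$f" || true
done
```

Run it from the example folder after the pipeline has finished.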

You might see some warning messages while the commands run. These are normal.

In some GFF models generated by StringTie2, single-exon transcripts may be assigned the strand ".", because the program cannot determine their strand on the chromosome. TransDecoder will still translate those transcripts, but the protein strand assigned in TransDecoder.transcripts.fa.transdecoder.genome.gff3 might be wrong. PrecisionProDB will try to determine the correct strand in this case.
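To see how many transcripts are affected, the strand column of the input annotation can be inspected directly. The sketch below assumes a tab-separated GTF in which each transcript has its own line with "transcript" in the third column (as StringTie output typically does) and the strand in the seventh column; count_unstranded_transcripts is a hypothetical helper name, not a TransDecoder or PrecisionProDB tool.

```shell
# Hypothetical helper: count "transcript" records whose strand (column 7) is ".".
# Comment lines starting with "#" are skipped.
count_unstranded_transcripts() {
    awk -F'\t' '$0 !~ /^#/ && $3 == "transcript" && $7 == "." { n++ } END { print n + 0 }' "$1"
}

# Apply it to the example annotation if the uncompressed file is present.
if [ -f TransDecoder.gtf ]; then
    count_unstranded_transcripts TransDecoder.gtf
fi
```

A result of 0 means no transcript record carries an undetermined strand. If the GTF has no "transcript" lines at all, the count is also 0, so this check is only meaningful for files that follow the assumed layout.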
