An Example of Running TransDecoder
Refer to the TransDecoder wiki for details on how to install and run TransDecoder:
https://github.com/TransDecoder/TransDecoder/wiki
We provide example files for running TransDecoder. The files are stored in PATH_OF_PRECISONPRODB/examples/TransDecoder, where PATH_OF_PRECISONPRODB is the location of PrecisionProDB and PATH_OF_TRANSDECODER is the location of TransDecoder.
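The commands below reference these locations as shell variables (for example, $PATH_OF_TRANSDECODER). A minimal sketch of setting them, assuming both tools were placed under $HOME/software (a hypothetical location; substitute your own paths):

```shell
# Hypothetical install locations; replace with the real paths on your system.
export PATH_OF_PRECISONPRODB="$HOME/software/PrecisionProDB"
export PATH_OF_TRANSDECODER="$HOME/software/TransDecoder"

# Warn early if either directory is missing, before running any step.
for dir in "$PATH_OF_PRECISONPRODB" "$PATH_OF_TRANSDECODER"; do
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```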
# change the working directory to the example folder
cd $PATH_OF_PRECISONPRODB/examples/TransDecoder
# decompress the example files
gzip -d *.gz
# get transcript sequences from the GTF file
$PATH_OF_TRANSDECODER/util/gtf_genome_to_cdna_fasta.pl TransDecoder.gtf TransDecoder.genome.fa >TransDecoder.transcripts.fa
# convert the GTF file to GFF3 format
$PATH_OF_TRANSDECODER/util/gtf_to_alignment_gff3.pl TransDecoder.gtf >TransDecoder.gff3
# translate and predict proteins; -m sets the minimum protein length (in amino acids)
$PATH_OF_TRANSDECODER/TransDecoder.LongOrfs -t TransDecoder.transcripts.fa -m 60
$PATH_OF_TRANSDECODER/TransDecoder.Predict -t TransDecoder.transcripts.fa
# map translated proteins to the genome
$PATH_OF_TRANSDECODER/util/cdna_alignment_orf_to_genome_orf.pl \
TransDecoder.transcripts.fa.transdecoder.gff3 \
TransDecoder.gff3 \
TransDecoder.transcripts.fa > TransDecoder.transcripts.fa.transdecoder.genome.gff3
# to save disk space, compress these files
gzip *
The output files are:
- TransDecoder.gff3
- TransDecoder.transcripts.fa
- TransDecoder.transcripts.fa.transdecoder.bed
- TransDecoder.transcripts.fa.transdecoder.cds
- TransDecoder.transcripts.fa.transdecoder.genome.gff3: GFF3 annotation of the predicted proteins in genome coordinates.
- TransDecoder.transcripts.fa.transdecoder.gff3: GFF3 annotation of the predicted proteins in transcript coordinates.
- TransDecoder.transcripts.fa.transdecoder.pep: the final translated protein sequences.
Of these files, TransDecoder.transcripts.fa.transdecoder.pep and TransDecoder.transcripts.fa.transdecoder.genome.gff3 are the inputs for PrecisionProDB.
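Before moving on to PrecisionProDB, it can help to confirm that both input files were actually produced. The sketch below defines check_precisionprodb_inputs, a hypothetical helper that is not part of either tool. It accepts each file in plain or gzip-compressed form, since the last step above runs gzip on everything in the folder.

```shell
# Hypothetical helper: report whether the two PrecisionProDB input files
# exist and are non-empty in a directory (default: current directory).
# A file counts as present if either NAME or NAME.gz is found.
check_precisionprodb_inputs() {
    dir="${1:-.}"
    missing=0
    for name in \
        TransDecoder.transcripts.fa.transdecoder.pep \
        TransDecoder.transcripts.fa.transdecoder.genome.gff3
    do
        if [ -s "$dir/$name" ] || [ -s "$dir/$name.gz" ]; then
            echo "found:   $name"
        else
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # Return 0 only when both files were found.
    return "$missing"
}
```

Running check_precisionprodb_inputs inside the example folder prints one line per file and returns a non-zero status if either file is absent.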
You might see some warning messages while running these commands. These are expected.
For some annotations generated by StringTie2, single-exon transcripts may be assigned the strand ".", because the program cannot determine which strand of the chromosome they come from. TransDecoder will still translate those transcripts, but the protein strand assigned in TransDecoder.transcripts.fa.transdecoder.genome.gff3 might be wrong. PrecisionProDB will try to determine the correct strand in this case.
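To see how many transcripts in an annotation are affected, the strand column of the GTF file can be inspected directly. The sketch below defines count_unstranded_transcripts, a hypothetical helper that is not part of TransDecoder or PrecisionProDB. It assumes a tab-separated GTF file that contains "transcript" feature lines, with the strand in column 7.

```shell
# Hypothetical helper: count transcript records whose strand column
# (field 7) is "." in an uncompressed, tab-separated GTF file.
count_unstranded_transcripts() {
    awk -F '\t' '
        !/^#/ && $3 == "transcript" && $7 == "." { n++ }
        END { print n + 0 }   # print 0 when no record matched
    ' "$1"
}
```

For the example data, count_unstranded_transcripts TransDecoder.gtf would report the count, provided the file is still uncompressed (run it before the final gzip step, or decompress the file first).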