Prevalence and Length of Open Reading Frames Vary Across Randomly Generated Sequences of Different Nucleotide Compositions.

Citation: Neo, CY, Ling, MHT. 2020. Prevalence and Length of Open Reading Frames Vary Across Randomly Generated Sequences of Different Nucleotide Compositions. EC Microbiology 16(7): 72-78.

Link to [abstract] and [PDF].

Here is the permanent [PDF] and [data set] link to my archive.

The emergence of open reading frames (ORFs) is an important step in the origination of de novo genes. However, the conditions leading to the origination of de novo genes are not well understood. This study aims to determine the effect of nucleotide composition on the length and occurrence of ORFs by examining various ORF parameters using randomly generated sequences from 85 different nucleotide compositions. Our results suggest that various ORF parameters differ significantly across nucleotide compositions (p-value < 1E-120). The average length, standard error of the average length, average maximum length, and standard error of the average maximum length of ORFs are moderately predictable (0.43 < r^2 < 0.59) from nucleotide composition. These results suggest that the prevalence and length of ORFs may be a function of the underlying nucleotide composition.
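
The general idea in the abstract, drawing random sequences at a chosen nucleotide composition and measuring the ORFs they contain, can be illustrated with a minimal Python sketch. This is not the code used in the paper. The function names (`random_sequence`, `find_orf_lengths`, `orf_summary`), the ORF definition (ATG start to the first in-frame TAA, TAG, or TGA stop, forward strand only), and the example composition are all assumptions made for illustration.

```python
import random

# Assumption: standard stop codons and ATG as the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def random_sequence(length, composition, rng):
    """Draw a random DNA sequence whose bases follow the given probabilities."""
    bases = list(composition)
    weights = [composition[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def find_orf_lengths(seq):
    """Return ORF lengths in nucleotides (stop codon included).

    Scans the three forward reading frames only; an ORF runs from the first
    ATG seen after the previous stop to the next in-frame stop codon.
    """
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                lengths.append(i + 3 - start)
                start = None
    return lengths


def orf_summary(lengths):
    """Summarise ORF count, mean length, and maximum length."""
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return {
        "count": len(lengths),
        "mean_length": mean,
        "max_length": max(lengths, default=0),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical AT-rich composition, chosen only as an example.
    composition = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
    seq = random_sequence(10_000, composition, rng)
    print(orf_summary(find_orf_lengths(seq)))
```

Repeating the run over several compositions and comparing the summaries gives a rough feel for how ORF count and length shift with base frequencies; the statistical tests reported in the abstract are not reproduced here.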

