CDS entries in GFF3 file are not merged to a CompoundLocation #95

Open
@mikdur

Given a gene that looks something like this in GFF3 notation:

##gff-version 3
scf_001 maker   gene    36837   38790   .       +       .       ID=BN869_G00000007;Name=BN869_G00000007;
scf_001 maker   mRNA    36837   38790   .       +       .       ID=BN869_T00000007_1;Parent=BN869_G00000007;Name=BN869_T00000007_1;
scf_001 maker   exon    36837   37491   .       +       .       ID=BN869_T00000007_1:exon:0;Parent=BN869_T00000007_1;
scf_001 maker   exon    37547   38790   .       +       .       ID=BN869_T00000007_1:exon:1;Parent=BN869_T00000007_1;
scf_001 maker   CDS     36837   37491   .       +       0       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker   CDS     37547   38790   .       +       2       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;

The GFF parser fails to join the two CDS entries that share the same ID into a single feature with a CompoundLocation. As a result, GenBank and EMBL files produced when merging (and flattening) GFF3 annotations contain multiple CDS features where there should be a single CDS whose position is a join, e.g.:

FT   CDS             join(36837..37491,37547..38790)
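
To make the expected behaviour concrete, here is a minimal, standard-library-only sketch of the grouping I have in mind. It is not the parser's actual code: the function name `merge_cds_locations` is hypothetical, it assumes the columns contain no embedded whitespace, and it only builds the EMBL-style location string rather than a real CompoundLocation object.

```python
from collections import defaultdict

SAMPLE = """\
##gff-version 3
scf_001 maker   CDS     36837   37491   .       +       0       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker   CDS     37547   38790   .       +       2       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
"""


def merge_cds_locations(gff3_text):
    """Group CDS rows by their ID attribute and build one location per ID.

    Hypothetical helper for illustration only. Returns a dict mapping each
    CDS ID to an EMBL/GenBank-style location string.
    """
    segments = defaultdict(list)
    strands = {}
    for line in gff3_text.splitlines():
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: the nine columns are separated by tabs or spaces and
        # none of the first eight contains whitespace.
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            pair.split("=", 1)
            for pair in cols[8].strip().split(";")
            if "=" in pair
        )
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue
        segments[feature_id].append((int(cols[3]), int(cols[4])))
        strands[feature_id] = cols[6]

    merged = {}
    for feature_id, parts in segments.items():
        parts.sort()  # order segments by start coordinate
        spans = ",".join(f"{start}..{end}" for start, end in parts)
        location = f"join({spans})" if len(parts) > 1 else spans
        if strands[feature_id] == "-":
            location = f"complement({location})"
        merged[feature_id] = location
    return merged


if __name__ == "__main__":
    for feature_id, location in merge_cds_locations(SAMPLE).items():
        print(f"FT   CDS             {location}")
```

For the two CDS rows above this prints the single `join(36837..37491,37547..38790)` line shown earlier, which is the result I would expect from the parser once rows sharing an ID are merged.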
