Skip to content

Latest commit

 

History

History
28 lines (24 loc) · 1.44 KB

README.md

File metadata and controls

28 lines (24 loc) · 1.44 KB

This directory contains the knowledge graph data files required to train and evaluate the ProtHGT model.

Downloading the Data

You can download the ProtHGT knowledge graph data files from this Google Drive link.

Available Files

├── prothgt-kg.pt                      # The default full knowledge graph containing TAPE embeddings as the initial protein representations.
├── prothgt-train-graph.pt             # Training set (80% of the default full KG).
├── prothgt-val-graph.pt               # Validation set (10% of the default full KG).
├── prothgt-test-graph.pt              # Test set (10% of the default full KG).
└── alternative_protein_embeddings/    # Contains alternative KGs with different protein representations (e.g., APAAC, ESM-2 and ProtT5).
    ├──apaac/
        ├── prothgt-apaac-train-graph.pt
        ├── prothgt-apaac-val-graph.pt
        └── prothgt-apaac-test-graph.pt
    ├──esm2/
    │   ├── prothgt-esm2-train-graph.pt
    │   ├── prothgt-esm2-val-graph.pt
    │   └── prothgt-esm2-test-graph.pt
    └──prott5/
        ├── prothgt-prott5-train-graph.pt
        ├── prothgt-prott5-val-graph.pt
        └── prothgt-prott5-test-graph.pt

Once downloaded, place the files inside this data/ directory so that train.py and predict.py can access them.