Update data_processing.py #3

Open
wants to merge 2 commits into
base: main


KemalOrer

No description provided.

@@ -294,13 +306,85 @@ def create_act_inact_files_similarity_based_neg_enrichment_threshold(act_inact_f
act_inact_count_fl.close()
act_inact_comp_fl.close()

-def create_final_randomized_training_val_test_sets(activity_data,max_cores,targetid,target_prediction_dataset_path, pchembl_threshold=6):
+def create_final_randomized_training_val_test_sets(activity_data,max_cores,scaffold,targetid,target_prediction_dataset_path,moleculenet ,pchembl_threshold=6):
Collaborator

We need a different approach here. Please first read this code to see how we did the MoleculeNet split in SELFormer: https://github.com/HUBioDataLab/SELFormer/blob/65e686feb72185cc95f5b81176444e20586848e1/prepare_finetuning_data.py#L42

What you did is not wrong, but it is a bit problematic. For classification and regression we can simply take the last column (-1) as the label, and where several label columns are needed we can select a slice (e.g. 1:17). Please study the code above and rewrite the function accordingly. You can also add a MoleculeNet flag in the main_training part to route the MoleculeNet datasets to the new train_test function.
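To make the suggestion concrete, here is a minimal sketch of the two ideas: picking the label either as the last column or as a slice, and using a flag to route MoleculeNet datasets to a separate split function. It is not the SELFormer code; the names `select_labels`, `load_smiles_and_labels`, `split_default`, `split_moleculenet` and `route_split` are hypothetical, the split functions are placeholders, and the CSV layout (SMILES in the first column, no missing label values) is an assumption.

```python
import csv
import io


def select_labels(row, label_cols=-1):
    """Return the label value(s) of one row as a list.

    label_cols is either a single index (-1 means the last column)
    or a slice such as slice(1, 17) for several label columns.
    """
    picked = row[label_cols]
    # A slice already yields a list; wrap a single value so the
    # return type is the same in both cases.
    return picked if isinstance(label_cols, slice) else [picked]


def load_smiles_and_labels(csv_text, smiles_col=0, label_cols=-1):
    """Parse CSV text into SMILES strings and numeric labels.

    Assumption: one header row, SMILES in `smiles_col`, and every
    label cell holds a number (no missing values).
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    smiles, labels = [], []
    for row in reader:
        smiles.append(row[smiles_col])
        labels.append([float(v) for v in select_labels(row, label_cols)])
    return smiles, labels


def split_default(data):
    """Placeholder for the existing train/val/test function."""
    return "default"


def split_moleculenet(data):
    """Placeholder for the new MoleculeNet train/test function."""
    return "moleculenet"


def route_split(data, moleculenet=False):
    """Send the data to the MoleculeNet path only when the flag is set."""
    split_fn = split_moleculenet if moleculenet else split_default
    return split_fn(data)
```

With this shape, a single-label dataset is loaded with the default `label_cols=-1`, while a dataset with several label columns passes something like `label_cols=slice(1, 17)`; the caller in the main training code only needs to forward the flag to `route_split`.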

2 participants