Update data_processing.py #3

KemalOrer · 2025-02-06T21:17:40Z

No description provided.

atabeyunlu · 2025-02-13T20:50:16Z

data_processing.py

@@ -294,13 +306,85 @@ def create_act_inact_files_similarity_based_neg_enrichment_threshold(act_inact_f
    act_inact_count_fl.close()
    act_inact_comp_fl.close()

-def create_final_randomized_training_val_test_sets(activity_data,max_cores,targetid,target_prediction_dataset_path, pchembl_threshold=6):
+def create_final_randomized_training_val_test_sets(activity_data,max_cores,scaffold,targetid,target_prediction_dataset_path,moleculenet ,pchembl_threshold=6):


We need a different approach here. Please first read this code to see how we did the MoleculeNet split in SELFormer: https://github.com/HUBioDataLab/SELFormer/blob/65e686feb72185cc95f5b81176444e20586848e1/prepare_finetuning_data.py#L42

What you did is not wrong, but it is somewhat problematic. We can simply take the last column (-1) as the label for classification and regression, and we can select a slice of columns (e.g. 1:17). Please examine the linked code and rewrite the function. You can also use a MoleculeNet flag in the main_training part to route the MoleculeNet datasets to the new train_test function.
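A minimal sketch of the suggestion, not the SELFormer implementation: labels are taken from the last column by default or from a column slice when one is given, and a boolean flag routes MoleculeNet data to a separate split function. The function names, the assumption that column 0 holds the SMILES string, the plain random split, and the 80/10/10 fractions are all hypothetical and chosen only for illustration.

```python
import random


def select_labels(row, label_slice=None):
    """Return the label(s) of one row.

    By default the last column (-1) is the label; if label_slice is a
    (start, stop) pair, the columns row[start:stop] are returned instead.
    """
    if label_slice is None:
        return row[-1]
    start, stop = label_slice
    return row[start:stop]


def split_moleculenet_rows(rows, label_slice=None,
                           fractions=(0.8, 0.1, 0.1), seed=42):
    """Hypothetical random train/val/test split over (smiles, label) pairs.

    Assumes column 0 of every row is the SMILES string.
    """
    samples = [(row[0], select_labels(row, label_slice)) for row in rows]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(samples)
    n_train = int(len(samples) * fractions[0])
    n_val = int(len(samples) * fractions[1])
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


def build_splits(rows, moleculenet=False, label_slice=None):
    """Route on the flag: MoleculeNet data goes to the new split function."""
    if moleculenet:
        return split_moleculenet_rows(rows, label_slice)
    # Placeholder: other datasets would keep using the existing function.
    raise NotImplementedError("non-MoleculeNet path is not part of this sketch")
```

With this shape, a single-label dataset would call `build_splits(rows, moleculenet=True)`, while a dataset needing several columns would pass something like `label_slice=(1, 17)`.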

KemalOrer added 2 commits February 7, 2025 00:09

Update data_processing.py

a854293

Update data_processing.py

dd6b254

atabeyunlu requested changes Feb 13, 2025
