Description
The `QlibDataset` class loads training/validation data from pickle files (`train_data.pkl` or `val_data.pkl`) using Python's `pickle.load`. This poses a remote code execution (RCE) risk if an attacker can supply a malicious `.pkl` file.
```python
class QlibDataset(Dataset):
    def __init__(self, data_type: str = 'train'):
        self.config = Config()
        # codes...
        if data_type == 'train':
            # self.config.dataset_path = "./data/processed_datasets"
            self.data_path = f"{self.config.dataset_path}/train_data.pkl"
            self.n_samples = self.config.n_train_iter
        else:
            self.data_path = f"{self.config.dataset_path}/val_data.pkl"
            self.n_samples = self.config.n_val_iter
        with open(self.data_path, 'rb') as f:
            self.data = pickle.load(f)
```

PoC

```python
train_dataset = QlibDataset(data_type='train')
```

By placing a malicious `train_data.pkl` inside `./data/processed_datasets/`, simply instantiating `QlibDataset` will execute arbitrary code.
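For concreteness, the attack can be demonstrated with a standard `__reduce__`-based payload. This is a generic illustration of pickle's documented behavior, not code from the repository; the `echo pwned` command is a harmless stand-in for an attacker-chosen callable:

```python
import os
import pickle

# A class whose __reduce__ tells pickle to call an arbitrary callable
# during unpickling -- the core of the pickle deserialization risk.
class Malicious:
    def __reduce__(self):
        # os.system is used for illustration; an attacker could
        # substitute any importable callable and arguments.
        return (os.system, ('echo pwned',))

# Write the payload where QlibDataset expects its training data.
# Any later pickle.load of this file runs the command immediately,
# before the caller ever sees the "data".
with open('train_data.pkl', 'wb') as f:
    pickle.dump(Malicious(), f)
```

No interaction with the payload object is needed; the command executes as a side effect of deserialization itself.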
Security Impact
If users download or reuse untrusted or third-party datasets, a malicious pickle file can lead to arbitrary code execution at dataset loading time.
This is especially dangerous in scenarios where datasets are shared, downloaded automatically, or reused across environments.
Recommendation
- If pickle must be used, consider adding a clear security warning indicating that only trusted datasets should be loaded.
- It is recommended to introduce a mechanism similar to `trust_code` (e.g., `trust_dataset` or `trust_pickle`) that requires users to explicitly acknowledge the risk before loading pickle-based datasets.
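As a sketch of the second recommendation, dataset loading could route untrusted files through a restricted unpickler unless the user explicitly opts in. The names `safe_load`, `trust_dataset`, and the allow-list below are hypothetical, not existing project API:

```python
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to resolve globals outside a small allow-list, blocking
    payloads that import callables (e.g. os.system) during unpickling."""

    # Hypothetical allow-list; extend only with types known to be safe.
    ALLOWED = {('builtins', 'list'), ('builtins', 'dict')}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden in untrusted datasets")

def safe_load(path, trust_dataset=False):
    # trust_dataset mirrors the proposed explicit opt-in: only users who
    # acknowledge the risk get plain pickle.load behavior.
    with open(path, 'rb') as f:
        if trust_dataset:
            return pickle.load(f)
        return RestrictedUnpickler(f).load()
```

This follows the "restricting globals" pattern from the `pickle` documentation; it limits, but does not fully eliminate, the risks of the pickle format, so a security warning in the docs would still be warranted.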