Security Issue: Unsafe deserialization in `QlibDataset` #216

## Description
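The `QlibDataset` class loads training/validation data from pickle files (`train_data.pkl` or `val_data.pkl`) using Python's `pickle.load`. Because unpickling can invoke arbitrary callables, an attacker who can supply a malicious `.pkl` file gains arbitrary code execution in the loading process, which becomes remote code execution (RCE) when the file arrives over a network.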
```python
class QlibDataset(Dataset):
    def __init__(self, data_type: str = 'train'):
        self.config = Config()
        # ... (other initialization omitted)
        if data_type == 'train':
            # self.config.dataset_path = "./data/processed_datasets"
            self.data_path = f"{self.config.dataset_path}/train_data.pkl"
            self.n_samples = self.config.n_train_iter
        else:
            self.data_path = f"{self.config.dataset_path}/val_data.pkl"
            self.n_samples = self.config.n_val_iter

        # Unsafe: pickle.load runs any code embedded in the file at self.data_path.
        with open(self.data_path, 'rb') as f:
            self.data = pickle.load(f)
```
## PoC
```python
train_dataset = QlibDataset(data_type='train')
```
If an attacker places a malicious `train_data.pkl` inside `./data/processed_datasets/`, simply instantiating `QlibDataset` executes arbitrary code.
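
For illustration, here is a minimal sketch of how such a file could be crafted. It relies on the documented `__reduce__` hook, which lets an object tell `pickle` which callable to invoke when it is loaded. The names `Payload`, `build_malicious_pickle`, and `load_like_qlibdataset` are hypothetical and not part of the project, and the payload is a harmless `print` standing in for an attacker's command.

```python
import os
import pickle
import tempfile


class Payload:
    """Hypothetical attacker-controlled object (not part of the project)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it at load time.
        # A harmless print stands in for an attacker's command here.
        return (print, ("code executed during pickle.load",))


def build_malicious_pickle(path):
    """Write a pickle whose loading triggers the payload above."""
    with open(path, "wb") as f:
        pickle.dump(Payload(), f)


def load_like_qlibdataset(path):
    """Mirror the loading pattern quoted in the Description."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkl_path = os.path.join(tmp, "train_data.pkl")
        build_malicious_pickle(pkl_path)
        # The message is printed while the file is being loaded.
        load_like_qlibdataset(pkl_path)
```

The loader never references `Payload`; the call is encoded in the file itself, so inspecting the loading code alone does not reveal the risk.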

## Security Impact
If users download or reuse untrusted or third-party datasets, a malicious pickle file can lead to arbitrary code execution at dataset loading time.
This is especially dangerous in scenarios where datasets are shared, downloaded automatically, or reused across environments.

## Recommendation
- If pickle must be used, add a clear security warning stating that only trusted datasets should be loaded.
- Introduce an opt-in mechanism similar to `trust_code` (e.g., `trust_dataset` or `trust_pickle`) that requires users to explicitly acknowledge the risk before loading pickle-based datasets.
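
One possible shape for these two recommendations is sketched below, assuming a `trust_pickle` keyword that defaults to `False`. The helper `load_pickle_dataset` and the exception `UntrustedPickleError` are hypothetical names chosen for this example; they are not existing project APIs.

```python
import pickle
import warnings


class UntrustedPickleError(RuntimeError):
    """Raised when a pickle file is loaded without explicit opt-in (hypothetical)."""


def load_pickle_dataset(path, trust_pickle=False):
    """Load a pickle file only if the caller explicitly accepts the risk."""
    if not trust_pickle:
        raise UntrustedPickleError(
            f"Refusing to unpickle {path!r}: loading a pickle file can execute "
            "arbitrary code. Pass trust_pickle=True only for files you trust."
        )
    # Even with opt-in, remind the user what they have agreed to.
    warnings.warn(
        f"Loading pickle file {path!r}; only load datasets from trusted sources.",
        RuntimeWarning,
        stacklevel=2,
    )
    with open(path, "rb") as f:
        return pickle.load(f)
```

Under this sketch, `QlibDataset.__init__` could accept the same `trust_pickle` argument and pass it through to the helper, so the default behavior refuses to load until the user opts in. The flag does not make unpickling safe; it only makes the risk an explicit decision.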

