Error importing Open Catalyst 2022 LMDB files #1031

Open
allaffa opened this issue Feb 25, 2025 · 0 comments
Labels
bug Something isn't working

allaffa commented Feb 25, 2025

Python version

Python 3.11.5

fairchem-core version

1.2.1

pytorch version

2.6.0

cuda version

12.1

Operating system version

Linux

Minimal example

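import pickle

import lmdb

# iterate_tqdm is assumed to be a progress-bar helper defined elsewhere (not shown here).
def traj_to_torch_geom(self, traj_file):
    # Open the LMDB file read-only (a single file, not a directory)
    env = lmdb.open(traj_file, subdir=False, readonly=True, lock=False, readahead=False, meminit=False)

    with env.begin() as txn:
        cursor = txn.cursor()

        for key, value in iterate_tqdm(cursor, verbosity_level=2, desc="Processing OC22 LMDB"):
            old_data = pickle.loads(value)  # Unpickle the stored PyG Data object
            print(old_data)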

Current behavior

When I import the LMDB files of the Open Catalyst 2022 dataset and try to load the PyG Data objects, I get the following error:

RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.
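One possible workaround, sketched here under assumptions rather than as a confirmed fix: as far as I can tell, PyG raises this error from the Data attribute hook when the unpickled object has no `_store` entry in its `__dict__`, so reading `__dict__` directly avoids the hook. The helper names `legacy_fields` and `upgrade` are hypothetical, and `data_cls` is a placeholder for `torch_geometric.data.Data`, which is deliberately not imported so the sketch stays self-contained.

```python
def legacy_fields(obj):
    """Return the non-None instance attributes of an unpickled object.

    vars() reads obj.__dict__ directly, so it does not go through the
    __getattr__ hook that raises the "older version of PyG" error.
    """
    return {k: v for k, v in vars(obj).items() if v is not None}


def upgrade(obj, data_cls=None):
    """Rebuild an old-layout record; leave everything else untouched.

    data_cls is an assumption: in real use pass torch_geometric.data.Data.
    When it is None, the plain dict of fields is returned instead.
    """
    if not hasattr(obj, "__dict__"):
        return obj  # e.g. a plain int stored under a metadata key
    if "_store" in vars(obj):
        return obj  # already in the newer layout
    fields = legacy_fields(obj)
    return data_cls(**fields) if data_cls is not None else fields
```

In the loop above this would be used as `new_data = upgrade(pickle.loads(value), data_cls=Data)`, with `Data` imported from `torch_geometric.data`. Whether every field survives the rebuild for OC22 records is untested here.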

To my understanding, Open Catalyst 2022 was previously released in .traj format, which allowed a more flexible import that did not depend strongly on package versions.
Is there any way to:

  1. resolve the incompatibility between PyG versions? (I do not want to downgrade my version of PyG.)
  2. make the raw data of Open Catalyst 2022 available, which would give users more flexibility?

Expected Behavior

I would like the code not to complain about PyG versions when importing Data objects.

Releasing the dataset in a format that enforces a specific version of PyG severely limits its usability. Providing the raw output in XYZ format would enable much wider use of the dataset.
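To illustrate what a version-independent export could look like, here is a minimal sketch that writes one structure as plain XYZ text from atomic numbers and Cartesian positions. It assumes each record exposes fields along the lines of `atomic_numbers` and `pos`; the function name `to_xyz` and the partial `SYMBOLS` table are hypothetical and for illustration only.

```python
# Partial lookup table for illustration only; a real export would need
# the full periodic table (or a library such as ASE).
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}


def to_xyz(atomic_numbers, positions, comment=""):
    """Format one structure as XYZ text: atom count, comment line, then
    one "symbol x y z" line per atom."""
    if len(atomic_numbers) != len(positions):
        raise ValueError("atomic_numbers and positions differ in length")
    lines = [str(len(atomic_numbers)), comment]
    for number, (x, y, z) in zip(atomic_numbers, positions):
        symbol = SYMBOLS[int(number)]  # KeyError for elements not in the table
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"
```

Tensor fields would first need converting to plain lists, for example with `.tolist()`. Energies and forces do not fit the plain XYZ layout and would need an extended format.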

Relevant files to reproduce this bug

No response

@allaffa allaffa added the bug Something isn't working label Feb 25, 2025