Cleaning up structure of input files to the pipeline

Currently, each line of the split file must have the format `<pdb_id>_final`, and the dataset pipeline constructs the path to the structure file as `<BASE_DIR>/<pdb_id>/<pdb_id>_final.pdb`. A cleaner approach would likely be to have the user specify `<BASE_DIR>` and make each line of the split file a path to the pdb/cif file relative to that base directory.
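
To make the proposal concrete, here is a minimal sketch of both behaviours. It is not the pipeline's actual code: the function names `legacy_path` and `resolve_split_paths`, the `ALLOWED_SUFFIXES` set, and the choice to reject absolute paths and unknown extensions are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: only these structure-file extensions are accepted.
ALLOWED_SUFFIXES = {".pdb", ".cif"}


def legacy_path(base_dir, entry):
    """Current behaviour as described above: entry is '<pdb_id>_final'."""
    suffix = "_final"
    if not entry.endswith(suffix):
        raise ValueError(f"expected '<pdb_id>_final', got {entry!r}")
    pdb_id = entry[: -len(suffix)]
    return Path(base_dir) / pdb_id / f"{entry}.pdb"


def resolve_split_paths(base_dir, split_lines):
    """Proposed behaviour: each line is a path relative to base_dir."""
    base = Path(base_dir)
    paths = []
    for raw in split_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        rel = Path(line)
        if rel.is_absolute():
            raise ValueError(f"split entries must be relative paths: {line!r}")
        if rel.suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported file type: {line!r}")
        paths.append(base / rel)
    return paths
```

With this layout, an existing entry such as `1abc_final` would become `1abc/1abc_final.pdb` in the split file, and files that do not follow the `<pdb_id>/<pdb_id>_final.pdb` convention could be listed directly.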