wrong number of shards generated in dataset_info.json #15

aliciaji1993 · 2024-10-03T02:03:55Z

After dataset generation, the output is expected to contain a total of 32 TFRecord files in train and 8 in val.

But in reality, dataset_info.json lists only 25 shards for train, and some shard files are missing from the dataset folder (see below). What could be the cause?

dataset folder: (missing some shards)

from dataset_info.json

"splits": [
    {
      "filepathTemplate": "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}",
      "name": "train",
      "numBytes": "3124634980",
      "shardLengths": [
        "49",
        "65",
        "51",
        "71",
        "70",
        "54",
        "67",
        "66",
        "61",
        "63",
        "56",
        "53",
        "60",
        "61",
        "71",
        "62",
        "68",
        "71",
        "58",
        "68",
        "64",
        "71",
        "71",
        "70",
        "58"
      ]
    },
    {
      "filepathTemplate": "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}",
      "name": "val",
      "numBytes": "758231159",
      "shardLengths": [
        "72",
        "67",
        "58",
        "72",
        "82",
        "68",
        "70",
        "61"
      ]
    }
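
A rough way to quantify the mismatch is to compare the `shardLengths` entries against the shard files actually on disk. The sketch below is only a diagnostic aid, not part of the dataset builder. It assumes shard files end in an `-XXXXX-of-YYYYY` suffix, as the `{SHARD_X_OF_Y}` template suggests. `summarize_shards` and the report keys are hypothetical names chosen for illustration.

```python
import json
import re
from pathlib import Path

# Assumed file name shape: "<dataset>-<split>.<format>-00003-of-00032".
SHARD_RE = re.compile(
    r"-(?P<split>[^-.]+)\.(?P<fmt>[^-]+)-(?P<idx>\d+)-of-(?P<total>\d+)$"
)


def summarize_shards(dataset_dir):
    """Compare shardLengths in dataset_info.json with shard files on disk."""
    dataset_dir = Path(dataset_dir)
    info = json.loads((dataset_dir / "dataset_info.json").read_text())

    # Group the shard indices found on disk by split name.
    on_disk = {}
    for path in dataset_dir.iterdir():
        match = SHARD_RE.search(path.name)
        if match:
            entry = on_disk.setdefault(
                match["split"], {"total": 0, "found": set()}
            )
            entry["total"] = max(entry["total"], int(match["total"]))
            entry["found"].add(int(match["idx"]))

    report = {}
    for split in info.get("splits", []):
        name = split["name"]
        entry = on_disk.get(name, {"total": 0, "found": set()})
        report[name] = {
            # Number of shards recorded in dataset_info.json.
            "shards_in_info": len(split.get("shardLengths", [])),
            # The "of-YYYYY" total claimed by the file names.
            "shards_in_filenames": entry["total"],
            # Distinct shard files actually present.
            "files_on_disk": len(entry["found"]),
            # Indices below the claimed total that have no file.
            "missing_indices": sorted(
                set(range(entry["total"])) - entry["found"]
            ),
        }
    return report
```

Calling `summarize_shards("path/to/dataset")` (the path is a placeholder) returns one entry per split, which makes it easy to see whether the missing file indices account for the gap between the shard count in the file names and the length of `shardLengths`.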
