Thank you for organizing and open-sourcing these time series anomaly detection datasets.
We are merging the individual data shards of each dataset into consolidated datasets for large-scale analysis. In doing so, we ran into feature-consistency problems that we would like to bring to your attention.
Specifically, the shards do not share a common schema, which prevents automated merging:
In the MITDB dataset, column names differ across shards:
Shard 1 uses: {'MLII', 'V4', 'Label'}
Shard 2 uses: {'MLII', 'V1', 'Label'}
Shard 3 uses: {'V5', 'V2', 'Label'}
Similarly, in the LTDB dataset, the number of feature columns differs: Shards 1, 3, 4, and 5 contain 2 feature columns, while Shard 2 contains 3, with an additional column named "ECG3".
Because of these inconsistencies, the data structures have to be aligned by hand before merging, which compromises the reproducibility and scalability of our analysis pipeline.
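To make the mismatch easy to reproduce, here is a minimal sketch of the kind of check that surfaces it. The helper name `schema_report` and the shard keys (`shard_1`, ...) are our own hypothetical names; the column sets are the MITDB ones listed above. It uses only the Python standard library and assumes the column names of each shard have already been read.

```python
from typing import Dict, Set


def schema_report(shard_columns: Dict[str, Set[str]]) -> Dict[str, object]:
    """Compare column sets across shards and report where they differ.

    `shard_columns` maps a (hypothetical) shard identifier to its column names.
    """
    if not shard_columns:
        raise ValueError("at least one shard is required")
    column_sets = list(shard_columns.values())
    common = set.intersection(*column_sets)  # columns present in every shard
    union = set.union(*column_sets)          # columns present in at least one shard
    # For each shard, the columns it lacks relative to the union schema.
    missing = {name: sorted(union - cols) for name, cols in shard_columns.items()}
    return {
        "common": sorted(common),
        "union": sorted(union),
        "missing": missing,
        "consistent": common == union,  # True only if all shards share one schema
    }


# Column sets as reported above for MITDB; shard keys are illustrative.
mitdb = {
    "shard_1": {"MLII", "V4", "Label"},
    "shard_2": {"MLII", "V1", "Label"},
    "shard_3": {"V5", "V2", "Label"},
}

report = schema_report(mitdb)
print(report["common"])      # ['Label']
print(report["consistent"])  # False
```

For the three MITDB shards above, the only column shared by all of them is `Label`, so a naive column-wise concatenation has no common feature column to align on.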
We would greatly appreciate any guidance or clarification you could provide regarding:
The intended data schema for each dataset
Recommended approaches for handling these inconsistencies
Whether there are established protocols for merging these shards
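To make the question concrete, the sketch below shows one workaround we are considering, not an established protocol: align every row to the union of all feature columns, fill absent features with `None`, and record the originating shard. The function `merge_shards`, the `shard` column, and the numeric values are hypothetical placeholders rather than real data; only the column names come from the MITDB shards listed above. We are unsure whether treating differently named columns as separate features is the intended reading, which is part of what we are asking.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_shards(shards: Dict[str, List[Row]], label_col: str = "Label") -> List[Row]:
    """Align rows from all shards to the union of their feature columns.

    Features a shard does not have are filled with None. The label column is
    placed after the features, followed by a 'shard' column (our own addition)
    recording where each row came from.
    """
    # Collect feature columns in first-seen order, excluding the label.
    features: List[str] = []
    for rows in shards.values():
        for row in rows:
            for col in row:
                if col != label_col and col not in features:
                    features.append(col)

    merged: List[Row] = []
    for shard_id, rows in shards.items():
        for row in rows:
            aligned: Row = {col: row.get(col) for col in features}
            aligned[label_col] = row.get(label_col)
            aligned["shard"] = shard_id
            merged.append(aligned)
    return merged


# Placeholder values for illustration only; not taken from the datasets.
example = {
    "shard_1": [{"MLII": 0.1, "V4": 0.2, "Label": 0}],
    "shard_2": [{"MLII": 0.3, "V1": 0.4, "Label": 1}],
}

merged = merge_shards(example)
for row in merged:
    print(row)
```

This keeps the merge automatic, but it produces sparse columns wherever shards disagree, so guidance on whether some columns should instead be mapped onto each other would be very helpful.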
Thank you again for your valuable contribution to the research community. We look forward to your insights on this matter.