Regarding Inconsistencies in Feature Dimensions and Column Names in TSB-AD-M Datasets

First, thank you for organizing and open-sourcing these valuable time series anomaly detection datasets.

In our current work, we are merging the individual data shards of each dataset into consolidated datasets for large-scale analysis. During this process, we encountered feature-consistency problems that we would like to bring to your attention.

Specifically, we've identified inconsistencies that prevent automated merging of the data shards:

In the MITDB dataset, we noticed significant discrepancies in column names across different shards:
Shard 1 uses: {'MLII', 'V4', 'Label'}
Shard 2 uses: {'MLII', 'V1', 'Label'}
Shard 3 uses: {'V5', 'V2', 'Label'}
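
For illustration, a check along the following lines surfaces this kind of mismatch before any merge is attempted. It is only a minimal sketch: the helper name `group_by_schema` and the shard keys (`shard_1`, ...) are hypothetical, and the column lists are copied from the MITDB report above rather than read from the files.

```python
from collections import defaultdict


def group_by_schema(shard_columns):
    """Group shard names by their order-insensitive set of column names."""
    groups = defaultdict(list)
    for shard, columns in shard_columns.items():
        groups[frozenset(columns)].append(shard)
    return dict(groups)


# Column names as reported above for MITDB (shard keys are hypothetical).
mitdb = {
    "shard_1": ["MLII", "V4", "Label"],
    "shard_2": ["MLII", "V1", "Label"],
    "shard_3": ["V5", "V2", "Label"],
}

schemas = group_by_schema(mitdb)
# A dataset can be merged automatically only if all shards share one schema.
consistent = len(schemas) == 1

for columns, shards in schemas.items():
    print(sorted(columns), "->", shards)
```

With the three MITDB shards listed above, each shard ends up in its own schema group, so `consistent` is false.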

Similarly, in the LTDB dataset, we observed inconsistencies in feature dimensions: most shards (1, 3, 4, and 5) contain 2 feature columns, while Shard 2 contains 3, with an additional column named "ECG3".

These inconsistencies require manual intervention to align the data structures before merging, which compromises the reproducibility and scalability of our analysis pipeline.
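
One workaround we could imagine, pending your guidance, is to align every shard to the union of all column names and fill the missing entries with a placeholder. The sketch below shows the idea on toy rows. It makes several assumptions: the feature names `ECG1` and `ECG2` are made up (only "ECG3" is named above), the values are synthetic, and `align_shards` and the `shard_id` column are hypothetical names introduced here.

```python
def align_shards(shards, fill=None):
    """Align rows from all shards to the union of their column names.

    shards: dict mapping a shard id to a list of row dicts.
    Returns (columns, rows); values missing from a shard are set to `fill`.
    """
    # Collect the union of column names in first-seen order.
    columns = []
    for rows in shards.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    # Re-emit every row with the full column set plus its shard of origin.
    merged = []
    for shard_id, rows in shards.items():
        for row in rows:
            aligned = {name: row.get(name, fill) for name in columns}
            aligned["shard_id"] = shard_id
            merged.append(aligned)
    return columns + ["shard_id"], merged


# Toy LTDB-like shards; ECG1/ECG2 and all values are assumptions.
ltdb = {
    "shard_1": [{"ECG1": 0.1, "ECG2": 0.3, "Label": 0}],
    "shard_2": [{"ECG1": 0.5, "ECG2": 0.6, "ECG3": 0.7, "Label": 1}],
}

columns, rows = align_shards(ltdb)
```

This only makes the shards structurally mergeable. It does not tell us whether a column missing from a shard should be treated as absent data, and for MITDB it would produce mostly empty columns because each shard records different leads, which is why we are asking about the intended schema.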

We would greatly appreciate any guidance or clarification you could provide regarding:

The intended data schema for each dataset
Recommended approaches for handling these inconsistencies
Whether there are established protocols for merging these shards

Thank you again for your valuable contribution to the research community. We look forward to your insights on this matter.

Regarding Inconsistencies in Feature Dimensions and Column Names in TSB-AD-M Datasets #46

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Regarding Inconsistencies in Feature Dimensions and Column Names in TSB-AD-M Datasets #46

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions