Errors spotted during dataset conversion #7

Open
stachu3478 opened this issue Nov 22, 2022 · 3 comments

@stachu3478

Hi,

I have written a script that converts the MUSCIMA++ v2.1 dataset into another, similar, internal dataset. The problems I ran into seem to result from inaccuracies in the annotations. To isolate the problematic objects, I have prepared the log file below. I discovered three things while writing my script:

  1. Duplicated bounding boxes: pairs of objects that have the same features except for id, relations and sometimes mask (?!). These are logged as: Found X duplicates in Y tensor([<coordinates of the duplicated bounding boxes>]). I spotted them while exploring the next two points.
  2. Some objects failed to convert, probably because they are misclassified (usually as articStaccato when they are really an augmentationDot). For articulationStaccato, my converter needs to know whether the dot is below or above the related notes, and in those cases the predictions are ambiguous, so the lines start with Found multiple matches for articStaccatoBelow [...]
  3. Missing relations, logged as articulationStaccato at [X1, Y1, X2, Y2] is not related to any object [...]. Conversion also fails here, in the same way as in point 2.
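
The three checks above can be sketched in plain Python. This is only an illustration, not the converter that produced the log: the `Node` record, its field names and the three helper functions are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` in image coordinates with y growing downward, and relations are assumed to be stored as a plain list of ids.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one annotated object."""
    id: int
    class_name: str
    bbox: tuple  # (x1, y1, x2, y2); assumed image coordinates, y grows downward
    related: list = field(default_factory=list)  # ids of linked objects (assumed)


def find_duplicates(nodes):
    """Group nodes by (class name, bounding box) and keep groups of two or more."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node.class_name, node.bbox)].append(node.id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def find_unrelated(nodes, class_name="articulationStaccato"):
    """Return nodes of the given class that are not linked to any other object."""
    return [n for n in nodes if n.class_name == class_name and not n.related]


def dot_position(dot, nodes_by_id):
    """Return 'above' or 'below' relative to the related objects.

    Returns None when there are no related objects or when they disagree,
    which corresponds to the ambiguous case described in point 2.
    """
    dot_cy = (dot.bbox[1] + dot.bbox[3]) / 2
    sides = set()
    for related_id in dot.related:
        other = nodes_by_id[related_id]
        other_cy = (other.bbox[1] + other.bbox[3]) / 2
        sides.add("above" if dot_cy < other_cy else "below")
    return sides.pop() if len(sides) == 1 else None


# Made-up example data: one notehead, one linked dot, one unlinked copy of the dot.
nodes = [
    Node(1, "noteheadFull", (100, 50, 120, 70), related=[2]),
    Node(2, "articulationStaccato", (105, 30, 112, 37), related=[1]),
    Node(3, "articulationStaccato", (105, 30, 112, 37)),
]
nodes_by_id = {n.id: n for n in nodes}

duplicates = find_duplicates(nodes)
unrelated = find_unrelated(nodes)
position = dot_position(nodes[1], nodes_by_id)
```

Because the grouping key includes the class name, two objects of different classes that share a box (for example barline and measureSeparator) are not counted as duplicates of each other. The mask is deliberately left out of the key, since point 1 notes that duplicates sometimes differ in mask.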

And here is the log:

datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-02_N-13_D-ideal.xml:
Found 1 duplicates in barline tensor([[713, 269, 723, 385]])
Found 1 duplicates in measureSeparator tensor([[713, 269, 723, 385]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-02_N-17_D-ideal.xml:
Found 1 duplicates in barline tensor([[ 705,  985,  714, 1101]])
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-02_N-17_D-ideal.xml [331.0, 919.0, 342.0, 926.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-03_N-01_D-ideal.xml:
Found 1 duplicates in articulationStaccato tensor([[1615,  270, 1620,  276]])
Found 1 duplicates in tuple tensor([[2271,  521, 2406,  554]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-06_N-02_D-ideal.xml:
Found 2 duplicates in barline tensor([[ 585,  261,  594,  363],
        [1082,  495, 1092,  612]])
Found 2 duplicates in measureSeparator tensor([[ 585,  261,  594,  363],
        [1082,  495, 1092,  612]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-06_N-16_D-ideal.xml:
Found 8 duplicates in articulationStaccato tensor([[1129, 1591, 1137, 1598],
        [1166, 1583, 1174, 1590],
        [1276, 1629, 1285, 1636],
        [1336, 1627, 1343, 1634],
        [1373, 1628, 1381, 1634],
        [1416, 1622, 1424, 1630],
        [1370, 1654, 1379, 1662],
        [1411, 1654, 1419, 1662]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-07_N-05_D-ideal.xml:
Found 1 duplicates in augmentationDot tensor([[2897,  557, 2904,  565]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-07_N-08_D-ideal.xml:
articulationStaccato at [1198.0, 346.0, 1206.0, 354.0] is not related to any object. Used tensor([[1188.,  307., 1219.,  329.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-08_N-15_D-ideal.xml:
Found 1 duplicates in augmentationDot tensor([[3087,  547, 3095,  556]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-10_N-01_D-ideal.xml:
Found 1 duplicates in articulationStaccato tensor([[848, 606, 855, 613]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-12_N-04_D-ideal.xml:
Found 1 duplicates in articulationStaccato tensor([[ 691, 1150,  698, 1157]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-13_N-02_D-ideal.xml:
Found 1 duplicates in articulationStaccato tensor([[664, 515, 669, 519]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-13_N-16_D-ideal.xml:
Found 1 duplicates in articulationStaccato tensor([[ 983, 1068,  991, 1075]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-16_N-06_D-ideal.xml:
Found 1 duplicates in accidentalFlat tensor([[2092, 1128, 2110, 1163]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-16_N-17_D-ideal.xml:
articulationStaccato at [647.0, 1498.0, 653.0, 1506.0] is not related to any object. Used tensor([[ 634., 1457.,  661., 1486.]]) as fallback
articulationStaccato at [2603.0, 2352.0, 2611.0, 2357.0] is not related to any object. Used tensor([[2601., 2298., 2636., 2334.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-18_N-12_D-ideal.xml:
Found 10 duplicates in repeatDot tensor([[ 457, 1258,  464, 1265],
        [ 457, 1278,  466, 1285],
        [ 457, 1495,  465, 1501],
        [ 457, 1516,  466, 1523],
        [ 456, 1732,  463, 1737],
        [ 456, 1748,  463, 1754],
        [ 461, 2008,  469, 2014],
        [2412,  789, 2418,  799],
        [2419, 1026, 2423, 1031],
        [2420, 1045, 2424, 1051]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-19_N-04_D-ideal.xml:
articulationStaccato at [2576.0, 223.0, 2584.0, 230.0] is not related to any object. Used tensor([[2566.,  239., 2594.,  268.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-21_N-08_D-ideal.xml:
Found 1 duplicates in stem tensor([[1557, 1341, 1567, 1399]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-22_N-10_D-ideal.xml:
Found 1 duplicates in characterSmallP tensor([[1290, 1387, 1320, 1439]])
Found 1 duplicates in dynamicLetterP tensor([[1290, 1387, 1320, 1439]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-23_N-17_D-ideal.xml:
Found 1 duplicates in beam tensor([[732, 498, 792, 508]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-24_N-01_D-ideal.xml:
articulationStaccato at [2401.0, 241.0, 2409.0, 249.0] is not related to any object. Used tensor([[2394.,  271., 2419.,  298.]]) as fallback
articulationStaccato at [2699.0, 261.0, 2706.0, 269.0] is not related to any object. Used tensor([[2691.,  290., 2718.,  312.]]) as fallback
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-24_N-01_D-ideal.xml [2196.0, 519.0, 2203.0, 527.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-24_N-18_D-ideal.xml:
Found 2 duplicates in restHalf tensor([[2435, 2023, 2471, 2045],
        [2178, 2020, 2214, 2041]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-27_N-02_D-ideal.xml:
articulationStaccato at [2621.0, 240.0, 2626.0, 248.0] is not related to any object. Used tensor([[2591.,  261., 2618.,  280.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-28_N-09_D-ideal.xml:
Found 1 duplicates in repeat tensor([[1151,  488, 1210,  611]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-30_N-17_D-ideal.xml:
Found 5 duplicates in articulationStaccato tensor([[1969,  777, 1976,  783],
        [1391,  627, 1401,  634],
        [1503,  634, 1510,  641],
        [2248, 1106, 2255, 1114],
        [1331, 2207, 1341, 2214]])
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-30_N-17_D-ideal.xml [445.0, 1812.0, 452.0, 1819.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-33_N-04_D-ideal.xml:
articulationStaccato at [1131.0, 379.0, 1138.0, 387.0] is not related to any object. Used tensor([[1129.,  328., 1143.,  347.]]) as fallback
articulationStaccato at [1921.0, 213.0, 1927.0, 220.0] is not related to any object. Used tensor([[1914.,  235., 1927.,  266.]]) as fallback
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-33_N-04_D-ideal.xml [1519.0, 461.0, 1526.0, 468.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-34_N-03_D-ideal.xml:
Found 1 duplicates in restWhole tensor([[780, 753, 829, 770]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-35_N-05_D-ideal.xml:
Found 1 duplicates in barline tensor([[1290,  959, 1300, 1082]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-36_N-14_D-ideal.xml:
Found 3 duplicates in characterDot tensor([[ 494, 1160,  503, 1166],
        [1332, 1139, 1341, 1145],
        [ 589, 1624,  597, 1631]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-37_N-13_D-ideal.xml:
Found 1 duplicates in barline tensor([[2579,  979, 2587, 1087]])
Found 1 duplicates in measureSeparator tensor([[2579,  979, 2587, 1087]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-37_N-17_D-ideal.xml:
Found 1 duplicates in augmentationDot tensor([[2400, 1006, 2408, 1014]])
Found 1 duplicates in articulationStaccato tensor([[ 380, 1622,  388, 1630]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-38_N-18_D-ideal.xml:
Found 2 duplicates in augmentationDot tensor([[ 511, 1466,  518, 1472],
        [1323, 1992, 1332, 2000]])
Found 1 duplicates in restWhole tensor([[1100, 1706, 1145, 1727]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-41_N-02_D-ideal.xml:
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-41_N-02_D-ideal.xml [810.0, 241.0, 817.0, 248.0], treating previous as augmentationDot
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-41_N-02_D-ideal.xml [1269.0, 220.0, 1275.0, 229.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-41_N-16_D-ideal.xml:
articulationStaccato at [1410.0, 1653.0, 1418.0, 1659.0] is not related to any object. Used tensor([[1398., 1663., 1429., 1685.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-43_N-14_D-ideal.xml:
Found 3 duplicates in articulationStaccato tensor([[2287, 1526, 2294, 1531],
        [2407, 1437, 2414, 1443],
        [2557, 1403, 2564, 1408]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-44_N-06_D-ideal.xml:
Found 2 duplicates in barline tensor([[2041, 1202, 2050, 1315],
        [2790, 1434, 2799, 1553]])
Found 2 duplicates in measureSeparator tensor([[2041, 1202, 2050, 1315],
        [2790, 1434, 2799, 1553]])
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-44_N-06_D-ideal.xml [1513.0, 484.0, 1524.0, 494.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-44_N-13_D-ideal.xml:
Found 1 duplicates in characterSmallF tensor([[2943,  871, 2972,  915]])
Found 1 duplicates in dynamicLetterF tensor([[2943,  871, 2972,  915]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-44_N-17_D-ideal.xml:
articulationStaccato at [1556.0, 893.0, 1562.0, 899.0] is not related to any object. Used tensor([[1551.,  861., 1573.,  883.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-47_N-04_D-ideal.xml:
Found 1 duplicates in barline tensor([[1427,  265, 1435,  377]])
Found 1 duplicates in measureSeparator tensor([[1427,  265, 1435,  377]])
Found 1 duplicates in tuple tensor([[1877,  157, 1894,  182]])
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-48_N-02_D-ideal.xml:
Found 1 duplicates in augmentationDot tensor([[1935, 1403, 1942, 1409]])
articulationStaccato at [2755.0, 244.0, 2762.0, 250.0] is not related to any object. Used tensor([[2741.,  266., 2762.,  292.]]) as fallback
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-48_N-16_D-ideal.xml:
Found multiple matches for articStaccatoBelow at datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-48_N-16_D-ideal.xml [1669.0, 1637.0, 1681.0, 1651.0], treating previous as augmentationDot
datasets/muscima/v2.1/data/cropobjects_withstaff/CVC-MUSCIMA_W-49_N-11_D-ideal.xml:
Found 1 duplicates in repeatDot tensor([[ 792, 1473,  799, 1483]])
@apacha
Member

apacha commented Sep 3, 2023

That's great that you found those issues. Is there any chance that you could use this script to fix the reported issues, e.g., remove duplicates? I'm willing to create a new release with the issues fixed, but I don't have the time to fix the issues myself.
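
One way such a clean-up could look, as a rough sketch under stated assumptions rather than a description of the existing script: objects are represented here as plain dicts with hypothetical keys `id`, `class_name`, `bbox` and `related`; the first object of each duplicate group is kept, the relations of the dropped copies are merged into it, and links that pointed at a dropped copy are redirected to the survivor. Masks are ignored, so duplicates whose masks differ would still need a manual decision.

```python
def merge_duplicates(nodes):
    """Collapse objects sharing class name and bounding box into one.

    `nodes` is a list of dicts with the hypothetical keys
    'id', 'class_name', 'bbox' and 'related' (ids of linked objects).
    Returns a new list; the input dicts are not modified.
    """
    kept = {}      # (class_name, bbox) -> surviving node
    replaced = {}  # id of a dropped duplicate -> id of its survivor
    for node in nodes:
        key = (node["class_name"], tuple(node["bbox"]))
        if key in kept:
            survivor = kept[key]
            replaced[node["id"]] = survivor["id"]
            # Keep every relation that any of the copies carried.
            survivor["related"] |= set(node["related"])
        else:
            kept[key] = {**node, "bbox": key[1], "related": set(node["related"])}

    # Redirect links that pointed at a dropped copy and remove self-links.
    for node in kept.values():
        redirected = {replaced.get(r, r) for r in node["related"]}
        node["related"] = sorted(redirected - {node["id"]})
    return list(kept.values())


# Made-up example: the notehead is linked to the second copy of the dot only.
example = [
    {"id": 1, "class_name": "noteheadFull", "bbox": (100, 50, 120, 70), "related": [3]},
    {"id": 2, "class_name": "articulationStaccato", "bbox": (105, 30, 112, 37), "related": []},
    {"id": 3, "class_name": "articulationStaccato", "bbox": (105, 30, 112, 37), "related": [1]},
]
cleaned = merge_duplicates(example)
```

In the example, object 3 is dropped, object 2 inherits its link to the notehead, and the notehead's link is redirected from 3 to 2.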

@stachu3478
Author

> That's great that you found those issues. Is there any chance that you could use this script to fix the reported issues, e.g., remove duplicates? I'm willing to create a new release with the issues fixed, but I don't have the time to fix the issues myself.

Thank you for the reply. Will the new release include any brand-new changes, or only fixes for these issues?

@apacha
Member

apacha commented Sep 12, 2023

I'm not planning to add new features or changes myself. However, the new release would include your fixes and, if you're willing to add something else, that as well.
