During a sanity check of this data I noticed that quite a lot of the training examples have identical sequences but different PSSMs and entropy. The coordinates of these duplicates are also not identical, even after accounting for translation and rotation. In the one example I actually plotted after superimposing the two structures, the coordinates were close to identical but deviated in a few places.
See the attached example (it was too long to paste here).
identical_sequences.zip
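For anyone who wants to reproduce the comparison, here is a minimal sketch of the superposition step using the Kabsch algorithm. It assumes both duplicates have been loaded as (N, 3) NumPy arrays with the same residue order and no missing coordinates (for example one atom per residue). The function name `kabsch_rmsd` is hypothetical, not part of any dataset tooling. It returns the overall RMSD and the per-residue deviations, which shows where the structures differ.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Superimpose P onto Q (both (N, 3)) and return (rmsd, per-residue deviations)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both point sets on their centroids
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its SVD
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (no reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centred P onto the centred Q and measure the residual
    diff = P_c @ R.T - Q_c
    per_residue = np.linalg.norm(diff, axis=1)
    rmsd = float(np.sqrt((per_residue ** 2).mean()))
    return rmsd, per_residue
```

A near-zero RMSD means the two entries are the same structure up to a rigid motion; large values in `per_residue` point at the regions that actually differ.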
Other training examples were repeated 6 times in the data.
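To count how often each sequence recurs, a sketch like the following groups entries by exact sequence. It assumes the training file has already been parsed into `(record_id, sequence)` pairs; `duplicate_report` is a hypothetical helper, not an existing API.

```python
from collections import Counter, defaultdict

def duplicate_report(records):
    """Group record ids by identical sequence and tally how many times each repeats."""
    groups = defaultdict(list)
    for record_id, sequence in records:
        groups[sequence].append(record_id)
    # Keep only sequences that occur more than once
    duplicates = {seq: ids for seq, ids in groups.items() if len(ids) > 1}
    # multiplicity[k] = number of distinct sequences that appear exactly k times
    multiplicity = Counter(len(ids) for ids in duplicates.values())
    return duplicates, multiplicity
```

Each group in `duplicates` can then be passed pairwise to a superposition check to see whether the repeated entries are truly the same structure.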
Is there any good reason for this or is this an error?