You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am comparing the behavior of ksw_extd() and ksw_extd2_sse(). One difference I noticed is the case when tlen > qlen + w, i.e. when target is long enough for some rows of the matrix to not contain a band while when doing a "full" alignment (KSW_EZ_EXTZ_ONLY not specified). In that case backtracking starts at position (tlen-1, qlen-1) which is outside of band.
For ksw_extd2_sse() in that case this condition is true:
so i (target index) keeps being reduced until tlen == qlen + w, at which point we are within the band and a good (although likely not optimal) alignment is found.
For ksw_extd() in this case this condition is true:
so in this case j (query index) gets reduced and we are effectively moving away from the band until we reach qlen == 0, after which we move upwards, thus only generating a lot of insertions followed by a lot of deletions.
I would say that in this case ksw_backtrack() should be edited so that for is_rot == false instead of
if (j < off[i]) force_state = 2;
if (off_end && j > off_end[i]) force_state = 1;
there should be
if (j < off[i]) force_state = 1;
if (off_end && j > off_end[i]) force_state = 2;
In this case we would properly move upwards (reducing target index) until reaching the band, like for ksw_extd2_sse().
The same argument holds for cases when qlen > tlen + w, i.e. starting element is right of the band. In that case ksw_extd2_sse() performs insertions (reduces query index) until it reaches the band, whereas ksw_extd() decreases target index index until reaching tlen == 0, after which it just decreases query index, again effectively doing a lot of deletions follow by insertions, completely avoiding the band. The change I proposed above would also solve this issue.
The case when qlen > tlen + w is actually even worse for ksw_extd() as it does not pass off_end:
accesses a wrong element (never set or even out of bound) of p is accessed.
To fix this issue one would have to generate off_end and pass it, which is a relatively easy task.
To test this I created a reproducer. It uses w==3 and qlen==6 and both query and target have all the same basepairs.
For tlen==9 CIGAR has 3 deletions and 6 matches, as expected. For tlen==10 it generates 10 deletions and 6 insertions, in accordance with what I explained above, but significantly worse than 4 deletions and 6 matches it could generate.
For ksw_extd2_sse() I did not manage to get the expected result because this check
leads to a zdrop even when zdrop is disabled (zdrop < 0) (not sure if this is by design, but I am planning to file a separate issue). If this check is deleted the generated CIGAR has 4 deletions and 6 matches, as expected.
To run the reproducer copy the code from below into main.cpp and copy ksw2.h, ksw2_extd.c and ksw2_extd2_sse.c to the same folder. Then build with g++ --std=c++17 main.cpp ksw2_extd.c ksw2_extd2_sse.c -o ksw2_repro and execute with ./ksw2_repro
I am comparing the behavior of
ksw_extd()
andksw_extd2_sse()
. One difference I noticed is the case whentlen > qlen + w
, i.e. when target is long enough for some rows of the matrix to not contain a band while when doing a "full" alignment (KSW_EZ_EXTZ_ONLY
not specified). In that case backtracking starts at position(tlen-1, qlen-1)
which is outside of band.For
ksw_extd2_sse()
in that case this condition is true:ksw2/ksw2.h
Line 129 in 4e0a1cc
As
force_state == 1
this condition is true:ksw2/ksw2.h
Line 141 in 4e0a1cc
so
i
(target index) keeps being reduced untiltlen == qlen + w
, at which point we are within the band and a good (although likely not optimal) alignment is found.For
ksw_extd()
in this case this condition is true:ksw2/ksw2.h
Line 132 in 4e0a1cc
followed by this one:
ksw2/ksw2.h
Line 143 in 4e0a1cc
so in this case
j
(query index) gets reduced and we are effectively moving away from the band until we reachqlen == 0
, after which we move upwards, thus only generating a lot of insertions followed by a lot of deletions.I would say that in this case
ksw_backtrack()
should be edited so that foris_rot == false
instead ofthere should be
In this case we would properly move upwards (reducing target index) until reaching the band, like for
ksw_extd2_sse()
.The same argument holds for cases when
qlen > tlen + w
, i.e. starting element is right of the band. In that caseksw_extd2_sse()
performs insertions (reduces query index) until it reaches the band, whereasksw_extd()
decreases target index index until reachingtlen == 0
, after which it just decreases query index, again effectively doing a lot of deletions follow by insertions, completely avoiding the band. The change I proposed above would also solve this issue.The case when
qlen > tlen + w
is actually even worse forksw_extd()
as it does not passoff_end
:ksw2/ksw2_extd.c
Line 170 in 4e0a1cc
This means that this condition
ksw2/ksw2.h
Line 133 in 4e0a1cc
is never true so on this line
ksw2/ksw2.h
Line 134 in 4e0a1cc
accesses a wrong element (never set or even out of bound) of
p
is accessed.To fix this issue one would have to generate
off_end
and pass it, which is a relatively easy task.To test this I created a reproducer. It uses
w==3
and qlen==6
and both query and target have all the same basepairs.For
tlen==9
CIGAR has 3 deletions and 6 matches, as expected. Fortlen==10
it generates 10 deletions and 6 insertions, in accordance with what I explained above, but significantly worse than 4 deletions and 6 matches it could generate.For
ksw_extd2_sse()
I did not manage to get the expected result because this checkksw2/ksw2_extd2_sse.c
Line 134 in 4e0a1cc
leads to a zdrop even when zdrop is disabled (
zdrop < 0
) (not sure if this is by design, but I am planning to file a separate issue). If this check is deleted the generated CIGAR has 4 deletions and 6 matches, as expected.To run the reproducer copy the code from below into
main.cpp
and copyksw2.h
,ksw2_extd.c
andksw2_extd2_sse.c
to the same folder. Then build withg++ --std=c++17 main.cpp ksw2_extd.c ksw2_extd2_sse.c -o ksw2_repro
and execute with./ksw2_repro
The text was updated successfully, but these errors were encountered: