Apologies if I am missing something obvious here, but I was doing a bunch of testing with chardetng and noticed that the second return value from `EncodingDetector.guess_assess()` (the boolean indicating whether the guess was any good) never seems to be `false`. Looking at the source, it actually seems like it’s not possible:

The `max` variable that tracks the best score starts at 0 (`chardetng/src/lib.rs`, line 3003 in 143dadd).

It only ever gets updated with scores that are greater than the current value, so it is always >= 0 (`chardetng/src/lib.rs`, lines 3043 to 3048 in 143dadd).

The final boolean is just whether `max` is >= 0, which it always is, per the above points (`chardetng/src/lib.rs`, line 3062 in 143dadd).
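A stripped-down model of the three points above, as a minimal sketch. This is not chardetng’s actual code: `pick_best`, the `(name, score)` candidate list, the integer scores and the `tld_default` parameter are hypothetical stand-ins. It only mirrors the pattern described (start at 0, update on strictly greater scores, report `max >= 0`).

```rust
/// Hypothetical model of the selection loop described in this issue.
/// NOT chardetng's code; names and types are illustrative only.
fn pick_best(scores: &[(&'static str, i64)], tld_default: &'static str) -> (&'static str, bool) {
    // `max` starts at 0, as at line 3003.
    let mut max: i64 = 0;
    let mut encoding = tld_default;
    for &(name, score) in scores {
        // Only strictly greater scores replace the current best,
        // so `max` can never drop below its starting value of 0.
        if score > max {
            max = score;
            encoding = name;
        }
    }
    // The check as described at line 3062: always true, since max >= 0.
    (encoding, max >= 0)
}

fn main() {
    // Every candidate scores negatively, so the TLD default is kept...
    let all_bad: [(&str, i64); 2] = [("ISO-8859-2", -50), ("Shift_JIS", -200)];
    // ...yet the "good guess" flag is still true: prints ("windows-1252", true)
    println!("{:?}", pick_best(&all_bad, "windows-1252"));
    // No candidates at all: also prints ("windows-1252", true)
    println!("{:?}", pick_best(&[], "windows-1252"));
}
```

In this model no input can produce `false`, which matches what I observed in testing.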
I think this should probably be `max > 0`, which would mean you’d get a `false` if there were no better guesses than the default encoding for the TLD (or if the only good guess is ISO-8859-8? That’s a bit odd…). I’m not familiar enough with the internals here to know for sure what the right thing would be.

It looks like `max` used to start with a negative value (`chardetng/src/lib.rs`, line 1734 in f15d0f8), so maybe that’s how this issue came to be? (That said, it changed in 0d26e7e, which was before `guess_assess()` existed. 🤷)
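The proposed `max > 0` change, sketched in the same hypothetical model (again, `pick_best_fixed` and its inputs are illustrative assumptions, not chardetng’s API). With the strict comparison, the flag is `false` exactly when no candidate scored above the starting value of 0, i.e. when the TLD default was kept.

```rust
/// Hypothetical model of the proposed fix; NOT chardetng's code.
fn pick_best_fixed(scores: &[(&'static str, i64)], tld_default: &'static str) -> (&'static str, bool) {
    let mut max: i64 = 0;
    let mut encoding = tld_default;
    for &(name, score) in scores {
        // Same update rule as before: strictly greater scores win.
        if score > max {
            max = score;
            encoding = name;
        }
    }
    // Proposed check: true only if some candidate beat the starting score.
    (encoding, max > 0)
}

fn main() {
    let all_bad: [(&str, i64); 2] = [("ISO-8859-2", -50), ("Shift_JIS", -200)];
    // Nothing beat the TLD default: prints ("windows-1252", false)
    println!("{:?}", pick_best_fixed(&all_bad, "windows-1252"));
    let one_good: [(&str, i64); 2] = [("ISO-8859-2", 120), ("Shift_JIS", -200)];
    // A candidate scored above 0: prints ("ISO-8859-2", true)
    println!("{:?}", pick_best_fixed(&one_good, "windows-1252"));
}
```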
Again, apologies if I’ve missed something obvious here and this isn’t a real issue; I don’t have a lot of experience with Rust. But this does seem to match up with what I’ve seen so far throwing lots of different data at chardetng and never seeing a `false` result.