Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

guess_assess() can’t return false for second return value #13

Open
Mr0grog opened this issue Nov 3, 2023 · 0 comments
Open

guess_assess() can’t return false for second return value #13

Mr0grog opened this issue Nov 3, 2023 · 0 comments

Comments

@Mr0grog
Copy link

Mr0grog commented Nov 3, 2023

Apologies if I am missing something obvious here, but I was doing a bunch of testing with chardetng and noticed that the second return value from EncodingDetector.guess_assess() (the boolean indicating whether the guess was any good) never seems to be false. Looking at the source, it actually seems like it’s not possible:

  • The max variable that tracks the best score starts at 0:

    let mut max = 0i64;

  • It only ever gets updated with scores that are greater than the current value (so: always >= 0):

    chardetng/src/lib.rs

    Lines 3043 to 3048 in 143dadd

    for (i, candidate) in self.candidates.iter().enumerate().skip(Self::FIRST_NORMAL) {
    if let Some(score) = candidate.score(i, tld_type, expectation_is_valid) {
    if score > max {
    max = score;
    encoding = candidate.encoding();
    }

  • The final boolean is just whether max is >= 0, which it always is, per the above points:

    (encoding, max >= 0)

    I think this should probably be max > 0, which would mean you’d get a false if there were no better guesses than the default encoding for the TLD (or if the only good guess is ISO-8859-8? That’s a bit odd…). I’m not familiar enough with the internals here to know for sure what the right thing would be.

It looks like max used to start with a negative value, so maybe that’s how this issue came to be? (That said, it changed in 0d26e7e, which was before guess_assess() existed. 🤷)

let mut max = i64::min_value() + 1;

Again, apologies if I’ve missed something obvious here and this isn’t a real issue — I don’t have a lot of experience with Rust. But this does seem to match up with what I’ve seen so far throwing lots of different data at chardetng and never seeing a false result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant