Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitwuzla: Fix handling of quoted symbol names while parsing SMTLIB formulas #417

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

daniel-raffler
Copy link
Contributor

This addresses an issue in the Bitwuzla backend where quoted SMTLIB symbols like |var| were added to the variable cache without removing the quotes first. As a result there could be two entries for the same symbol, once with quotes and once without the quotes, which may then cause the parser to fail.

@@ -137,12 +137,16 @@ public Term parseImpl(String formulaStr) throws IllegalArgumentException {
// Process the symbols from the parser
Map_TermTerm subst = new Map_TermTerm();
for (Term term : declared) {
if (cache.containsRow(term.symbol())) {
String symbol = term.symbol();
if (symbol.startsWith("|") && symbol.endsWith("|")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it dangerous to perform this operation manually with inline string manipulation. Isn't there more that needs to be done? For example unescaping or further symbols, such as nested |? And obviously that should be a general utility function somewhere where it is properly maintainable, with unit tests, links to the standard, etc. instead of hidden deeply inside some other code where nobody will be able to find and update it if one for example needs to update the escaping.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well its OK if its thoroughly documented and tested in my opinion. The handling is unique for this solver and best-effort based and can be improved at any point.

But we definitely need tests for this and similar cases. (Multiple |, other special characters in names etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the code and unescaping for SMTLIB variables is now handled directly in the variable cache itself. This should avoid similar bugs where we forget to unescape symbols when accessing the cache.

Isn't there more that needs to be done? For example unescaping or further symbols, such as nested |?

This is some very low level code and we will only encounter variable names that are valid in SMTLIB. This means there can't be any nested | and the escape/unescape functions from FormulaManager should also be unaffected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this issue only appears when parsing SMTLIB, why do we need to handle quoting in the complete cache and not only in the parser?

Copy link
Contributor Author

@daniel-raffler daniel-raffler Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this issue only appears when parsing SMTLIB, why do we need to handle quoting in the complete cache and not only in the parser?

Hello Karlheinz,
I've refactored the code once more and restored the formula cache to its original version. Removing SMTLIB quotes is now handled in a helper function named BitwuzlaCreator.removeQuotes, which needs to be called from BitwuzlaFormulaManager.parseImpl and BitwuzlaCreator.visit. I missed BitwuzlaCreator.visit in my original patch, but noted that several other crashes went away when I removed the quotes for the entire cache. The new version of the code should now cover both functions without requiring any changes to the cache itself.

Copy link
Member

@PhilippWendler PhilippWendler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without knowing anything about this code, browsing through the code raises the following questions:

  • Why does this need to be done at all? Where would two different entries in the cache for the same symbol come from? Wouldn't this mean that some other part uses an illegal name of a variable and should be fixed instead of tapering over this by blindly removing all quotes?
  • If it needs to be done, this fact and the reason for it should be documented in the comment for the cache, such that future developers also have a change to know this.
  • The PR indicates that already quite a few places exist where removeQuotes needs to be called. This means that likely in the future also more calls need to be added, which is error prone because it can easily be forgotten. So it seems the cache should be better encapsulated with a single method for accessing it that also handles removeQuotes.
  • Are there tests for this?

@daniel-raffler
Copy link
Contributor Author

@PhilippWendler
Can we go ahead with this? If the current solution is too complicated, there might be another way of fixing this directly in the SWIG wrapper. However, this should then be included in #427 before it is merged.

@PhilippWendler
Copy link
Member

I do not know about Bitwuzla nor about any other ongoing work, and so I do not know what is the best solution here. I merely looked at the code and raised a few questions that came to my mind. Actual review needs to be done by the JavaSMT maintainers.

Though I have to note that my questions from #417 (review) were not addressed yet.

@daniel-raffler
Copy link
Contributor Author

@baierd: Could you have another look at this?

We can either use this solution or remove the quotes directly in the SWIG script:

%extend Term {
  std::string symbol() {
    std::string r = $self->symbol().value();
    if (r.front() == '|' && r.back() == '|') {
      r = r.substr(1, r.size() - 2);
    }
    return r;
  }
}

This will make sure that Term.symbol() always returns the variable name without the quotes. I don't think there are any cases where we actually want the quotes, but this should probably be tested by running some CPAchecker benchmarks first.

If we want to patch the SWIG script, this change should be included in #427 before it is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants