-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitwuzla: Fix handling of quoted symbol names while parsing SMTLIB formulas #417
base: master
Are you sure you want to change the base?
Conversation
@@ -137,12 +137,16 @@ public Term parseImpl(String formulaStr) throws IllegalArgumentException { | |||
// Process the symbols from the parser | |||
Map_TermTerm subst = new Map_TermTerm(); | |||
for (Term term : declared) { | |||
if (cache.containsRow(term.symbol())) { | |||
String symbol = term.symbol(); | |||
if (symbol.startsWith("|") && symbol.endsWith("|")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it dangerous to perform this operation manually with inline string manipulation. Isn't there more that needs to be done? For example unescaping or further symbols, such as nested |
? And obviously that should be a general utility function somewhere where it is properly maintainable, with unit tests, links to the standard, etc. instead of hidden deeply inside some other code where nobody will be able to find and update it if one for example needs to update the escaping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well its OK if its thoroughly documented and tested in my opinion. The handling is unique for this solver and best-effort based and can be improved at any point.
But we definitely need tests for this and similar cases. (Multiple |
, other special characters in names etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored the code and unescaping for SMTLIB variables is now handled directly in the variable cache itself. This should avoid similar bugs where we forget to unescape symbols when accessing the cache.
Isn't there more that needs to be done? For example unescaping or further symbols, such as nested |?
This is some very low level code and we will only encounter variable names that are valid in SMTLIB. This means there can't be any nested |
and the escape
/unescape
functions from FormulaManager
should also be unaffected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this issue only appears when parsing SMTLIB, why do we need to handle quoting in the complete cache and not only in the parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this issue only appears when parsing SMTLIB, why do we need to handle quoting in the complete cache and not only in the parser?
Hello Karlheinz,
I've refactored the code once more and restored the formula cache to its original version. Removing SMTLIB quotes is now handled in a helper function named BitwuzlaCreator.removeQuotes
, which needs to be called from BitwuzlaFormulaManager.parseImpl
and BitwuzlaCreator.visit
. I missed BitwuzlaCreator.visit
in my original patch, but noted that several other crashes went away when I removed the quotes for the entire cache. The new version of the code should now cover both functions without requiring any changes to the cache itself.
…var` directly in the variable cache
…|` and `var` directly in the variable cache" This reverts commit ce2688e.
ad7c574
to
4f69f14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without knowing anything about this code, browsing through the code raises the following questions:
- Why does this need to be done at all? Where would two different entries in the cache for the same symbol come from? Wouldn't this mean that some other part uses an illegal name of a variable and should be fixed instead of tapering over this by blindly removing all quotes?
- If it needs to be done, this fact and the reason for it should be documented in the comment for the cache, such that future developers also have a change to know this.
- The PR indicates that already quite a few places exist where
removeQuotes
needs to be called. This means that likely in the future also more calls need to be added, which is error prone because it can easily be forgotten. So it seems the cache should be better encapsulated with a single method for accessing it that also handlesremoveQuotes
. - Are there tests for this?
@PhilippWendler |
I do not know about Bitwuzla nor about any other ongoing work, and so I do not know what is the best solution here. I merely looked at the code and raised a few questions that came to my mind. Actual review needs to be done by the JavaSMT maintainers. Though I have to note that my questions from #417 (review) were not addressed yet. |
@baierd: Could you have another look at this? We can either use this solution or remove the quotes directly in the SWIG script:
This will make sure that If we want to patch the SWIG script, this change should be included in #427 before it is merged. |
This addresses an issue in the Bitwuzla backend where quoted SMTLIB symbols like
|var|
were added to the variable cache without removing the quotes first. As a result there could be two entries for the same symbol, once with quotes and once without the quotes, which may then cause the parser to fail.