Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blacklist more SMTLIB keywords as variable/uf names and automatically escape all variable names that would be illegal #424

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

daniel-raffler
Copy link
Contributor

Hello,
this will update the list of forbidden variable/uf names to include all SMTLIB commands, such as check-sat or assert, and all predefined function names, like mod or fp.add. While such symbol names are fine when using the API, they cause issues when printing the formulas as most solvers will fail to add the necessary quotes to the symbols. This will then causing issues in FormulaManager.translateFrom() as the generated output is invalid and can't to be read back by the target solver to translate the formula.

By forbidding such symbol names entirely we circumvent the issues. User should always use FormulaManager.isValidName() to check if their name is on the list, and substitute it with FormulaManager.escape() if necessary.

// + any symbol starting with "str."
"str.concat",
// + any symbol starting with "re."
"re.opt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String and Regex operations are matched via "startsWith" and not via String equality. Do we need these entries here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keywords in RESERVED are used by VariableNamesTest and I've included these two entries as "examples" for the tests. Otherwise, they could be removed. Come to think of it, VariableNamesTest is already kind of slow and it might even need to test all the entries in RESERVED. Maybe one or two examples for each "kind" of reserved word would be sufficient? Then we could just include those test cases directly in VariableNamesTest and remove these two dummy keywords from RESERVED.

Copy link
Member

@kfriedberger kfriedberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change disallows quite a few names for symbols.
Are we sure that we want to go in that direction?

What is the motivation for this PR? Parsing and writing SMTLIB is complex, and solvers need to escape all unexpeted names anyway. This PR seems to forbid such cases upfront. This might be valid, but it might be unfriendly for existing callers.

@daniel-raffler
Copy link
Contributor Author

This change disallows quite a few names for symbols. Are we sure that we want to go in that direction?

What is the motivation for this PR? Parsing and writing SMTLIB is complex, and solvers need to escape all unexpeted names anyway. This PR seems to forbid such cases upfront. This might be valid, but it might be unfriendly for existing callers.

This was motivated by an issue in CPAchecker (Bitwuzla issues when using a distinct solver for interpolation) were we try to use Bitwuzla as the main solver for CPAchecker. Since interpolation is not supported in Bitwuzla, we need to translate formulas from/to MathSAT whenever CPAchecker wants to calculate interpolants. For this we need to first dump the formulas to SMTLIB and then read them back in with the "target" solver. However, this will often fail with a parse error:

Exception in thread "main" java.lang.IllegalArgumentException: <string>:48:14: expected symbol after 'declare-fun'
	at org.sosy_lab.java_smt.solvers.bitwuzla.api.BitwuzlaNativeJNI.Parser_parse__SWIG_0(Native Method)
	at org.sosy_lab.java_smt.solvers.bitwuzla.api.Parser.parse(Unknown Source)
	at org.sosy_lab.java_smt.solvers.bitwuzla.BitwuzlaFormulaManager.parse(BitwuzlaFormulaManager.java:131)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.SeparateInterpolatingProverEnvironment.convertToMain(SeparateInterpolatingProverEnvironment.java:116)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.SeparateInterpolatingProverEnvironment.getInterpolant(SeparateInterpolatingProverEnvironment.java:92)
	at org.sosy_lab.cpachecker.util.predicates.smt.InterpolatingProverEnvironmentView.getInterpolant(InterpolatingProverEnvironmentView.java:32)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.strategy.ITPStrategy.getInterpolantFromSublist(ITPStrategy.java:166)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.strategy.SequentialInterpolation.getFwdInterpolants(SequentialInterpolation.java:157)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.strategy.SequentialInterpolation.getInterpolants(SequentialInterpolation.java:110)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.getInterpolants(InterpolationManager.java:783)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager$Interpolator.buildCounterexampleTrace(InterpolationManager.java:996)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.buildCounterexampleTrace0(InterpolationManager.java:449)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.lambda$buildCounterexampleTrace$0(InterpolationManager.java:328)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.callWithTimelimit(InterpolationManager.java:337)
	at org.sosy_lab.cpachecker.util.predicates.interpolation.InterpolationManager.buildCounterexampleTrace(InterpolationManager.java:327)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateCPARefiner.performInterpolatingRefinement(PredicateCPARefiner.java:400)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateCPARefiner.checkCounterexample(PredicateCPARefiner.java:390)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateCPARefiner.performRefinementForPath(PredicateCPARefiner.java:281)
	at org.sosy_lab.cpachecker.cpa.predicate.PredicateStaticRefiner.performRefinementForPath(PredicateStaticRefiner.java:183)
	at org.sosy_lab.cpachecker.cpa.arg.AbstractARGBasedRefiner.performRefinementForPath(AbstractARGBasedRefiner.java:158)
	at org.sosy_lab.cpachecker.cpa.arg.AbstractARGBasedRefiner.performRefinement(AbstractARGBasedRefiner.java:104)
	at org.sosy_lab.cpachecker.core.algorithm.CEGARAlgorithm.refine(CEGARAlgorithm.java:310)
	at org.sosy_lab.cpachecker.core.algorithm.CEGARAlgorithm.run(CEGARAlgorithm.java:256)
	at org.sosy_lab.cpachecker.core.algorithm.ParallelAlgorithm.runParallelAnalysis(ParallelAlgorithm.java:390)
	at org.sosy_lab.cpachecker.core.algorithm.ParallelAlgorithm.lambda$createParallelAnalysis$4(ParallelAlgorithm.java:317)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

The issue is that many of the programs use the C function exit() and CPAchecker will model this with a UF that has the same name. When the formula is then printed we get output that looks like this:

(declare-fun exit ((_ BitVec 32) ) (_ BitVec 8))

Since exit is already a SMTLIB command, this is not valid input and that's what leads to the parse error. From what I can see this mostly seems to happen for exit and true. The latter is already a reserved keyword in AbstractFormulaManager and the easiest solution would be to just add exit to the list. However, I'm somewhat concerned that we will eventually run into the same problem for other keywords, and that's why I've tried to include as many of them as possible.

As @PhilippWendler pointed out there is already an old JavaSMT issues here, but it's never been entirely resolved. The proper fix would be to escape such names transparently, but this would require us to write our own parser and printer for SMTLIB formulas, which still seem out of reach. Blacklisting the reserved symbols for now at least makes sure that the user gets an error immediately. Once the parser/printer is ready we can then allow them again.

Note that I added some tests in supported_symbols_test to see what symbol names are actually allowed by the solvers. You can run ParserSymbolsTest.dumpTest() to see the results. If the test fails for the assertion, it means that the (quoted) symbol can be parsed, but the solver fails to add quotes when printing it. If there is a different exception, it means that the solver won't parse the symbol at all, even when properly quoted.

From the tests it seems that most solvers don't quote reserved words when printing formulas and then fail to read back their own output. It's even worse for Bitwuzla, which fails to parse any reserved word, even when quotes are used. This means that just fixing formula printing won't solve the issue either.

@daniel-raffler
Copy link
Contributor Author

I've added transparent escaping/unescapig with a43cb6d. This means that all variables are now allowed, and the escaping is done automatically when needed.

We still have to figure out what to do about all the tests that expect makeVariable to fail for certain illegal names. And there are couple of parsing bugs in the solvers that should be reported.

@daniel-raffler daniel-raffler changed the title Blacklist more SMTLIB keywords as variable/uf names Blacklist more SMTLIB keywords as variable/uf names and automatically escape all variable names that would be illegal Jan 8, 2025
@daniel-raffler daniel-raffler force-pushed the blacklist_more_smtlib_keywords branch from 7573c1f to b1fa368 Compare January 9, 2025 01:23
@daniel-raffler daniel-raffler force-pushed the blacklist_more_smtlib_keywords branch from cdcb730 to bc047e0 Compare January 9, 2025 14:50
@baierd baierd self-requested a review February 4, 2025 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants