VERSION 4
Boolean satisfiability, or SAT, is a classic computational problem: given a boolean formula, determine whether there exists an assignment to the variables that satisfies the formula. SAT is NP-complete; intuitively, this means we can efficiently translate any bounded search problem into an equivalent SAT problem.
It turns out these reductions to SAT are often useful for automated reasoning applications. Circuit equivalence checking can be reduced to SAT, which hardware companies use to verify correctness of circuits. Satisfiability modulo theories (SMT) solvers, which are common tools for program verification, generate and solve many SAT problems during execution. Reductions to SAT have even been used in pure mathematics, performing combinatorial casework to solve previously unsolved conjectures. Bernardo Subercaseaux recently wrote about SAT-for-mathematics on the CSD Blog, go check it out!
For many automated reasoning applications, reductions to SAT are correctness-critical. This means a mistake in the reduction can lead a user to believe incorrect claims about the original problem domain. As an example, suppose a hardware company wants to use SAT for circuit equivalence. The hardware company design a reduction to SAT. Perhaps they even write a proof on paper of the following theorem: for two circuits , the circuits are equivalent iff the reduction is unsatisfiable:
They implement as a computer program, taking two circuits as input and outputting a boolean formula in the standard format taken by SAT solvers. But, alas! their implementation has a small bug which is missed by their test suite. Unaware, the company use this buggy implementation to check two of their circuits equivalent, a SAT solver says the boolean formula is unsatisfiable, and everyone is happy. Except, oh no! the circuits are actually not equivalent, and by the time the company notices, they've wasted $1B preparing to mass-produce this buggy chip!
The story above is contrived and fictional, but the risks around SAT reductions are genuine and significant. I'll give two intuitive justifications here.
SAT solvers, as a black box for solving SAT problems, are remarkably trustworthy. Modern solvers produce certificates of unsatisfiability, and those certificates are checked by formally verified software, essentially ruling out any possibility that SAT solvers give wrong results. Yet this remarkable trustworthiness can be completely undermined before a SAT solver even comes into the picture! How do we improve this situation? How do we expand the circle of trust beyond SAT solvers, to also encompass SAT reductions?
Today we will be discussing some recent work by myself and collaborators
to make it easy to formally verify SAT reductions.
We are developing a library in Lean 4 called trestle,
which provides programming and proving utilities for SAT reductions.
We will focus on just one component of the library:
its CNF encoding utilities.
"CNF encoding" is a big part of the SAT reduction recipe,
and I think the trestle approach to this step is quite cool.
We will learn what CNF encodings are,
discuss what "correctness" means for CNF encodings,
and rediscover some key ideas for verification that we use in trestle.
Let's review some key definitions from the SAT world. A boolean formula is an expression containing the usual boolean operators, and variables from some set . Given a truth assignment mapping the unknowns to truth values , we can evaluate the boolean formula to a truth value .
Conjunctive normal form (CNF) describes a restricted class of boolean formulas. Literals consist of either a variable or the negation of a variable. Clauses are a disjunction of literals, and a formula is a conjunction of clauses:
CNF formulas are 'nice' for a variety of reasons, most of which are not relevant here (curious readers should look up: Tseitin transformation, resolution proofs, conflict-driven clause learning). The one fact to remember is that a formula is essentially a list of clauses, and a clause is essentially a list of literals.
The variable set in our definition is rather arbitrary.
Elements of merely serve as labels for the unknowns in a formula.
So, in practice, SAT solvers take formulas in the DIMACS format,
which specifies that variables are labelled by positive natural numbers, .
There is some tension here:
when humans describe a CNF, we generally use descriptive labels,
and only in the implementation do we map those descriptive labels to concrete numbers.
This tension will motivate some design decisions about how trestle handles variables.
Let's look at how to encode a simple constraint. The at-most-one constraint is a classic and common constraint in practical applications. It says: given a list of literals, require that at most one literal from the list evaluates to true. We will write as shorthand for an at-most-one constraint with a list of literals . AMO is useful when we want to represent choosing a single element from a set. For example, might represent choosing a number from .
How can we express as a CNF formula? The simplest encoding is called the pairwise encoding:
For all pairs of distinct literals in the list, both cannot be true. This CNF expresses the AMO constraint, because . Also, note how many clauses this encoding generates; there are pairs, each of which generates a clause. Can we do better?
Imagine we split the list of literals into two blocks: . Let be a fresh boolean variable (i.e. it is not used anywhere else in our formula), and consider the constraints . If is true, then all the literals in must be false, and at most one of the literals in is true. If instead is set to false, then all the literals in are false, and at most one in is true. This is semantically equivalent to : all satisfying assignments to the split constraints also satisfy the original constraint, and all assignments satisfying the original constraint can be extended with a truth value for .
So, let's apply this splitting lemma many times to reach this equivalent constraint:
There are remaining AMO constraints,
but they each have literals.
So, if we pairwise encode these constant length AMO constraints,
they will each generate clauses,
giving a total of clauses --
we improved (asymptotically) on the pairwise encoding,
at the cost of adding fresh variables.
I am not aware of a standard name for this encoding,
but trestle refers to it as the cut-4 encoding (since we cut it into pairwise AMOs with 4 literals).
The pairwise and cut-4 encodings are semantically equivalent, so why have both? Experimentally, we find that SAT solvers perform differently on these encodings. Sometimes the pairwise encoding is best, sometimes cut-4, sometimes other AMO encodings (see TODO Sheng et al re-encoding paper, and Joseph's thesis?). In general, auxiliary variables can make a CNF encoding more compact (fewer clauses), but increasing the number of variables has costs during SAT solving. This tradeoff is an area of active research. Modern SAT solvers will often eliminate variables by adding clauses, a technique called bounded variable elimination (cite papers). The point is that encodings must carefully manage where and how auxiliary variables are used, because this will significantly affect how SAT solvers perform on the resultant formulas.
Ultimately, we want to prove our CNF encodings are correct. But in order to do that, we need to define what exactly we mean by "correct." Let's use the at-most-one constraints above as a motivating example. We have some abstract, high level constraint , and we are translating this into a CNF formula. We'll call the abstract, high level constraint a specification for the CNF formula.
Specifications can be formalized as predicates over boolean assignments. A specification describes which assignments should satisfy the CNF formula. We'll write specifications as for boolean assignments . For example, a specification for the constraint might look like
This is a direct translation of the informal meaning of an at-most-one constraint: it accepts assignments for which the number of satisfied literals is at most one.
Notice that any auxiliary variables, while potentially present in the implementation, do not appear in the specification. This is, in fact, what distinguishes "auxiliary" variables from "problem" variables. Problem variables are those in the specification, and auxiliary variables are everything else.
Specifications can be thought of as descriptions (generated from an implementation to describe what it does), or requirements (imposed on an implementation to constraint what it does). We will use both mindsets in the following sections.
PySAT library provides encoding utilities in Python,
and we can see what a program looks like in PySAT# Initialize a pool of variable ids
vpool = IDPool(start_from=1)
# x(i,j,n) returns the variable ID for x_{i,j,n}
x = lambda i, j, n: vpool.id(f'x_{i}_{j}_{n}')
cnf = CNF() # create a new formula
# Every cell has exactly one number assigned
for i in range(9):
for j in range(9):
cell_vars = [ x(i,j,n) for n in range(1,10) ]
# at least one of cell_vars is true
cnf.append([cell_vars])
# at most one of cell_vars is true
cnf.append(CardEnc.atmost(cell_vars, bound=1, vpool=vpool))
# Row, column, and block constraints
...
Recall that a CNF formula is a conjunction of many clauses. This structure makes it easy to conjoin () two CNF formulas together, since we can simply append their clause lists. This is how we write the sudoku encoding above: encode many simple constraints into CNFs, and then append each of the CNFs into the big formula. Using the language of specifications:
cnf.extend([[ x(1,2,3), x(4,4,4)]]) corresponds to .
In the code, x is a function returning variable IDs,
whereas in the spec it is some abstract set of labels.also, when we run some sub-component, we use compositional reasoning to justify correctness
Speaking of auxiliaries, there’s some additional state with vpool
We want CNF encodings which
Trestle's encoding utilities achieve all of these goals. The core component of this library, the encoding monad, began as my personal utility for writing composable SAT encodings (goals 1 & 3). Eventually, I realized it could be useful for verification as well (goal 2). Useful programs motivate verification, and verification techniques (particularly compositional reasoning) can motivate better programs.
PySAT program in prior section is a stateful, imperative program
Back in 2022 was working on SAT encodings, but also was exclusively programming in Lean, a pure functional language. Wanted to be able to write PySAT style code, but can't do all this fancy imperative mutation. Instead, the standard approach is to pass around state explicitly.
Let's categorize the state in the program above:
Then, an encoding program is a function which takes an initial state and returns a modified state.
Taking an initial state as a parameter is vital for composing encodings with auxiliary variables. suppose we use the linear AMO as part of a larger encoding; it needs to allocate fresh variables, so it needs to know what has already been allocated.
We implement this in Lean using a state monad, details not important except to say that Lean makes it easy to write programs passing around state
compositional reasoning about encodings requires extremely careful management of auxiliary variables
key idea #1: keep track of which variables are "problem" variables, and only add clauses involving those problem variables
V to the encoding monad, so now it is EncCNF Vkey idea #2: auxiliary variables are scoped problem variables
what is the spec for withTemps
Here is a spec for whatever encoding happens within the scope of the auxiliaries.
Re-invented the key concepts behind trestle's CNF encoding utilities. To re-iterate the design goals, we wanted: