
Auxiliary Variables in Verified SAT Encodings

Posted on 2025 Nov 30



Boolean satisfiability, or SAT, is a classic computational problem: given a boolean formula, determine whether there exists an assignment to the variables that satisfies the formula. SAT is NP-complete; intuitively, this means we can efficiently translate any bounded search problem into an equivalent SAT problem.

It turns out these reductions to SAT are often useful for automated reasoning applications. Circuit equivalence checking can be reduced to SAT, which hardware companies use to verify the correctness of circuits. Satisfiability modulo theories (SMT) solvers, which are common tools for program verification, generate and solve many SAT problems during execution. Reductions to SAT have even been used in pure mathematics, performing combinatorial casework to resolve previously open conjectures. Bernardo Subercaseaux recently wrote about SAT-for-mathematics on the CSD Blog; go check it out!

For many automated reasoning applications, reductions to SAT are correctness-critical. This means a mistake in the reduction can lead a user to believe incorrect claims about the original problem domain. As an example, suppose a hardware company wants to use SAT for circuit equivalence. The hardware company designs a reduction to SAT. Perhaps they even write a proof on paper of the following theorem: for two circuits $C_1, C_2$, the circuits are equivalent iff the reduction $R(C_1, C_2)$ is unsatisfiable:

$C_1 \equiv C_2 \quad \iff \quad \lnot \exists \tau,\ \tau \vDash R(C_1, C_2)$

They implement $R$ as a computer program, taking two circuits as input and outputting a boolean formula in the standard format taken by SAT solvers. But, alas! their implementation has a small bug which is missed by their test suite. Unaware, the company uses this buggy implementation to check that two of their circuits are equivalent, a SAT solver says the boolean formula is unsatisfiable, and everyone is happy. Except, oh no! the circuits are actually not equivalent, and by the time the company notices, they've wasted $1B preparing to mass-produce this buggy chip!

The story above is contrived and fictional, but the risks around SAT reductions are genuine and significant. I'll give two intuitive justifications here.

  1. SAT reductions are sensitive to very small changes. We can construct boolean formulas that are 100 megabytes in size, where tweaking a single byte can determine whether the formula is SAT or UNSAT. So, any rare edge case or minor typo in a reduction implementation can drastically affect the result.
  2. UNSAT results are generally not interpretable or explainable. We can be confident that a formula is UNSAT (see next paragraph), but have little or no insight into why it is UNSAT. Maybe the reduction is implemented correctly, and an UNSAT result tells us something meaningful about the original problem domain. Or, maybe the formula is UNSAT because the reduction implementation has a bug. SAT solvers cannot give us the information to distinguish these outcomes.

SAT solvers, as a black box for solving SAT problems, are remarkably trustworthy. Modern solvers produce certificates of unsatisfiability, and those certificates are checked by formally verified software, essentially ruling out any possibility that SAT solvers give wrong results. Yet this remarkable trustworthiness can be completely undermined before a SAT solver even comes into the picture! How do we improve this situation? How do we expand the circle of trust beyond SAT solvers, to also encompass SAT reductions?

Today we will be discussing some recent work by my collaborators and me to make it easy to formally verify SAT reductions. We are developing a library in Lean 4 called trestle, which provides programming and proving utilities for SAT reductions. We will focus on just one component of the library: its CNF encoding utilities.

"CNF encoding" is a big part of the SAT reduction recipe, and I think the trestle approach to this step is quite cool. We will learn what CNF encodings are, discuss what "correctness" means for CNF encodings, and rediscover some key ideas for verification that we use in trestle.

SAT Definitions

Let's review some key definitions from the SAT world. A boolean formula $F$ is an expression containing the usual boolean operators, and variables from some set $V$. Given a truth assignment $\tau$ mapping the unknowns $V$ to truth values $\{\top, \bot\}$, we can evaluate the boolean formula to a truth value $\tau(F) \in \{\top, \bot\}$.

Conjunctive normal form (CNF) describes a restricted class of boolean formulas. Literals $\ell$ consist of either a variable or the negation of a variable. Clauses $C$ are a disjunction of literals, and a formula $F$ is a conjunction of clauses:

$x \in V$

$\ell ::= x \mid \overline{x}$

$C ::= \bigvee_i \ell_i$

$F ::= \bigwedge_i C_i$

CNF formulas are 'nice' for a variety of reasons, most of which are not relevant here (curious readers should look up: Tseitin transformation, resolution proofs, conflict-driven clause learning). The one fact to remember is that a formula is essentially a list of clauses, and a clause is essentially a list of literals.

The variable set $V$ in our definition is rather arbitrary. Elements of $V$ merely serve as labels for the unknowns in a formula. So, in practice, SAT solvers take formulas in the DIMACS format, which specifies that variables are labelled by positive natural numbers, $V = \mathbb{N}^+$. There is some tension here: when humans describe a CNF, we generally use descriptive labels, and only in the implementation do we map those descriptive labels to concrete numbers. This tension will motivate some design decisions about how trestle handles variables.
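For example, the two-clause formula $(x_1 \lor \overline{x_2}) \land (x_2 \lor x_3)$ looks like this in DIMACS:

p cnf 3 2
1 -2 0
2 3 0

The header line declares the number of variables and clauses; each subsequent line is one clause, with negative integers denoting negated literals and a 0 terminating each clause.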

Simple Encodings

Let's look at how to encode a simple constraint. The at-most-one constraint is a classic and common constraint in practical applications. It says: given a list of literals, require that at most one literal from the list evaluates to true. We will write $\text{AMO}(L)$ as shorthand for an at-most-one constraint with a list of literals $L$. AMO is useful when we want to represent choosing a single element from a set. For example, $\text{AMO}(x_1, \dots, x_9)$ might represent choosing a number from $\{1,\dots,9\}$.

How can we express $\text{AMO}(L)$ as a CNF formula? The simplest encoding is called the pairwise encoding:

$\bigwedge_{\ell_1 \ne \ell_2 \in L} (\overline{\ell_1} \lor \overline{\ell_2})$

For all pairs of distinct literals in the list, both cannot be true. This CNF expresses the AMO constraint: if an assignment satisfied two or more literals in $L$, it would falsify the clause for that pair, and conversely an assignment satisfying at most one literal satisfies every clause. Also, note how many clauses this encoding generates; there are $O(|L|^2)$ pairs, each of which generates a clause. Can we do better?
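To make this concrete, here is a minimal Python sketch of the pairwise encoding (the function name is my own; literals follow the DIMACS convention of nonzero integers, with negation written as -l):

from itertools import combinations

def amo_pairwise(lits):
    # For each pair of distinct literals, emit a clause
    # saying they cannot both be true.
    return [[-a, -b] for a, b in combinations(lits, 2)]

# Example: at most one of x1, x2, x3 is true.
amo_pairwise([1, 2, 3])  # [[-1, -2], [-1, -3], [-2, -3]]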

Imagine we split the list of literals into two blocks: $L = A, B$. Let $t$ be a fresh boolean variable (i.e. it is not used anywhere else in our formula), and consider the constraints $\text{AMO}(A, t) \land \text{AMO}(\overline{t}, B)$. If $t$ is true, then all the literals in $A$ must be false, and at most one of the literals in $B$ is true. If instead $t$ is set to false, then all the literals in $B$ are false, and at most one in $A$ is true. This is semantically equivalent to $\text{AMO}(A, B)$: all satisfying assignments to the split constraints also satisfy the original constraint, and all assignments satisfying the original constraint can be extended with a truth value for $t$.

So, let's apply this splitting lemma many times to reach this equivalent constraint:

$\text{AMO}(\ell_1, \ell_2, \ell_3, t_1) \land \text{AMO}(\overline{t_1}, \ell_4, \ell_5, t_2) \land \dots \land \text{AMO}(\overline{t_m}, \ell_{n-2}, \ell_{n-1}, \ell_{n})$

There are $O(|L|)$ remaining AMO constraints, but they each have $\le 4$ literals. So, if we pairwise encode these constant-length AMO constraints, they will each generate $O(1)$ clauses, giving a total of $O(|L|)$ clauses: we improved (asymptotically) on the pairwise encoding, at the cost of adding $O(|L|)$ fresh variables. I am not aware of a standard name for this encoding, but trestle refers to it as the cut-4 encoding (since we cut it into pairwise AMOs with 4 literals).
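Here is a sketch of the cut-4 encoding in the same style, reusing amo_pairwise from above (again, the function name and the explicit next_var threading are my own illustration, not trestle's API):

def amo_cut4(lits, next_var):
    # Repeatedly peel off a block of 3 literals plus a fresh
    # variable t, encode that block pairwise, and continue
    # with ~t prepended to the remaining literals.
    clauses = []
    while len(lits) > 4:
        t = next_var
        next_var += 1
        clauses += amo_pairwise(lits[:3] + [t])  # AMO(l1, l2, l3, t)
        lits = [-t] + lits[3:]                   # AMO(~t, l4, ...)
    clauses += amo_pairwise(lits)                # final block, <= 4 literals
    return clauses, next_var

Note how the caller must supply next_var, the first unused variable id, so the fresh variables do not collide with anything else in the formula. Managing this kind of bookkeeping by hand is exactly what becomes error-prone at scale.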

The pairwise and cut-4 encodings are semantically equivalent, so why have both? Experimentally, we find that SAT solvers perform differently on these encodings: sometimes the pairwise encoding is best, sometimes cut-4, sometimes other AMO encodings. In general, auxiliary variables can make a CNF encoding more compact (fewer clauses), but increasing the number of variables has costs during SAT solving. This tradeoff is an area of active research; modern SAT solvers will often eliminate variables by adding clauses, a technique called bounded variable elimination. The point is that encodings must carefully manage where and how auxiliary variables are used, because this significantly affects how SAT solvers perform on the resulting formulas.

Specification

Ultimately, we want to prove our CNF encodings are correct. But in order to do that, we need to define what exactly we mean by "correct." Let's use the at-most-one constraints above as a motivating example. We have some abstract, high-level constraint $\text{AMO}(L)$, and we are translating this into a CNF formula. We'll call the abstract, high-level constraint a specification for the CNF formula.

Specifications can be formalized as predicates over boolean assignments. A specification describes which assignments should satisfy the CNF formula. We'll write specifications as $P(\tau)$ for boolean assignments $\tau : V \rightarrow \{\top,\bot\}$. For example, a specification for the $\text{AMO}(L)$ constraint might look like

$P(\tau) \iff \sum_{\ell \in L} \tau(\ell) \le 1$

This is a direct translation of the informal meaning of an at-most-one constraint: it accepts assignments for which the number of satisfied literals is at most one.
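As an executable illustration (a reference check, not part of any encoder), the same specification can be written as a Python predicate over an assignment:

def amo_spec(tau, lits):
    # tau maps variable ids to booleans; a literal l is
    # satisfied when tau agrees with its sign.
    satisfied = sum(tau[abs(l)] == (l > 0) for l in lits)
    return satisfied <= 1

amo_spec({1: True, 2: False, 3: False}, [1, 2, 3])  # True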

Notice that any auxiliary variables, while potentially present in the implementation, do not appear in the specification. This is, in fact, what distinguishes "auxiliary" variables from "problem" variables. Problem variables are those in the specification, and auxiliary variables are everything else.

Specifications can be thought of as descriptions (generated from an implementation to describe what it does) or requirements (imposed on an implementation to constrain what it does). We will use both mindsets in the following sections.

Encodings as Programs

Let's see what an encoding looks like as an actual program. Below is the beginning of a sudoku encoding in Python, using the PySAT library:

from pysat.formula import CNF, IDPool
from pysat.card import CardEnc

# Initialize a pool of variable ids
vpool = IDPool(start_from=1)
# x(i,j,n) returns the variable ID for x_{i,j,n}
x = lambda i, j, n: vpool.id(f'x_{i}_{j}_{n}')

cnf = CNF() # create a new formula

# Every cell has exactly one number assigned
for i in range(9):
    for j in range(9):
        cell_vars = [ x(i,j,n) for n in range(1,10) ]

        # at least one of cell_vars is true (a single clause)
        cnf.append(cell_vars)

        # at most one of cell_vars is true (a sub-encoding that
        # may allocate auxiliary variables from vpool)
        cnf.extend(CardEnc.atmost(cell_vars, bound=1, vpool=vpool).clauses)

# Row, column, and block constraints
...

Recall that a CNF formula is a conjunction of many clauses. This structure makes it easy to conjoin ($\land$) two CNF formulas together, since we can simply append their clause lists. This is how we write the sudoku encoding above: encode many simple constraints into CNFs, and then append each of the CNFs into the big formula. Using the language of specifications: the big formula's specification is the conjunction of the specifications of its sub-constraints.

Also, when we run a sub-component like CardEnc.atmost, we use compositional reasoning to justify correctness: if the sub-encoding is correct with respect to its own specification, we can rely on that specification without re-examining the clauses it emitted.

Speaking of auxiliaries, there is some additional state threaded through the program: vpool. Sub-encoders allocate their auxiliary variables from this shared pool, which keeps them from colliding with the problem variables $x_{i,j,n}$ or with each other.
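In symbols: if sub-formulas $F_1, \dots, F_k$ meet specifications $P_1, \dots, P_k$, and their auxiliary variables do not collide (that is vpool's job), then the concatenated formula meets the conjunction of the specifications:

$P(\tau) \iff P_1(\tau) \land \dots \land P_k(\tau)$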

Verifying Encoding Programs

We want CNF encodings which

  1. support reuse / composition
  2. support compositional reasoning about their correctness
  3. maintain the expressiveness of unverified approaches.

Trestle's encoding utilities achieve all of these goals. The core component of this library, the encoding monad, began as my personal utility for writing composable SAT encodings (goals 1 & 3). Eventually, I realized it could be useful for verification as well (goal 2). Useful programs motivate verification, and verification techniques (particularly compositional reasoning) can motivate better programs.

Compositional reasoning about encodings requires extremely careful management of auxiliary variables: if two sub-encodings accidentally share an auxiliary variable, their clauses interact, and reasoning about each piece in isolation is no longer sound. Trestle's encoding utilities are built on two key ideas.

Key idea #1: keep track of which variables are "problem" variables, and only add clauses involving those problem variables.

Key idea #2: auxiliary variables are scoped problem variables. A combinator, withTemps, brings fresh temporaries into scope, runs an inner encoding that may treat them like problem variables, and then removes them from scope.
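Trestle implements these ideas in Lean 4, but the shape is easy to sketch in Python (EncoderState, add_clause, and with_temps are hypothetical names for this illustration, not trestle's actual API):

class EncoderState:
    def __init__(self, problem_vars):
        self.clauses = []
        self.in_scope = set(problem_vars)
        self.next_var = max(problem_vars, default=0) + 1

    def add_clause(self, clause):
        # Key idea #1: only variables currently in scope may appear.
        assert all(abs(l) in self.in_scope for l in clause)
        self.clauses.append(clause)

    def with_temps(self, k, body):
        # Key idea #2: temporaries act as problem variables,
        # but only inside `body`; afterwards they leave scope.
        temps = list(range(self.next_var, self.next_var + k))
        self.next_var += k
        self.in_scope |= set(temps)
        body(self, temps)
        self.in_scope -= set(temps)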

So, what is the spec for withTemps? If the inner encoding meets a spec $Q$ over the problem variables together with the temporaries $T$, then the outer encoding meets the spec that existentially projects the temporaries away:

$P(\tau) \quad\iff\quad \exists \sigma : T \rightarrow \{\top,\bot\}, \quad Q(\tau \oplus \sigma)$

Here $Q$ is a spec for whatever encoding happens within the scope of the auxiliaries, and $\tau \oplus \sigma$ denotes the assignment $\tau$ extended with values $\sigma$ for the temporaries.
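The splitting lemma from earlier is exactly an instance of this spec. Take $Q$ to be the spec of $\text{AMO}(A, t) \land \text{AMO}(\overline{t}, B)$, with a single temporary $T = \{t\}$; projecting $t$ away recovers the spec of the original constraint:

$P_{\text{AMO}(A,B)}(\tau) \iff \exists \sigma : \{t\} \rightarrow \{\top,\bot\},\ Q(\tau \oplus \sigma)$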

Conclusions

We have re-invented the key concepts behind trestle's CNF encoding utilities. To reiterate the design goals, we wanted:

  1. reusable / composable encodings
  2. compositional reasoning about their correctness
  3. the expressiveness of unverified approaches.

And this is exactly where we ended up. We can compose arbitrary encodings; we can reason about an encoding by reasoning about its parts; and we can emit clauses and allocate auxiliary variables with just as much freedom as unverified code.

