
Auxiliary Variables in Verified SAT Encodings

Posted on 2025 Nov 30



Boolean satisfiability, or SAT, is a classic computational problem: given a boolean formula, determine whether there exists an assignment to the variables that satisfies the formula. SAT is NP-complete; intuitively, this means we can efficiently translate any bounded search problem into an equivalent SAT problem.

It turns out these reductions to SAT are often useful for automated reasoning applications. Circuit equivalence checking can be reduced to SAT, which hardware companies use to verify the correctness of circuits. Satisfiability modulo theories (SMT) solvers, which are common tools for program verification, generate and solve many SAT problems during execution. Reductions to SAT have even been used in pure mathematics, performing combinatorial casework to resolve previously open conjectures. Bernardo Subercaseaux recently wrote about SAT-for-mathematics on the CSD Blog; go check it out!

For many automated reasoning applications, reductions to SAT are correctness-critical. This means a mistake in the reduction can lead a user to believe incorrect claims about the original problem domain. As an example, suppose a hardware company wants to use SAT for circuit equivalence. The hardware company designs a reduction to SAT. Perhaps they even write a proof on paper of the following theorem: for two circuits $C_1, C_2$, the circuits are equivalent iff the reduction $R(C_1, C_2)$ is unsatisfiable:

$C_1 \equiv C_2 \quad \iff \quad \lnot \exists \tau,\ \tau \vDash R(C_1, C_2)$

They implement $R$ as a computer program, taking two circuits as input and outputting a boolean formula in the standard format taken by SAT solvers. But, alas! their implementation has a small bug which is missed by their test suite. Unaware, the company uses this buggy implementation to check that two of their circuits are equivalent, a SAT solver says the boolean formula is unsatisfiable, and everyone is happy. Except, oh no! the circuits are actually not equivalent, and by the time the company notices, they've wasted $1B preparing to mass-produce this buggy chip!

The story above is contrived and fictional, but the risks around SAT reductions are genuine and significant. I'll give two intuitive justifications here.

  1. SAT reductions are sensitive to very small changes. We can construct boolean formulas that are 100 megabytes in size, where tweaking a single byte can determine whether the formula is SAT or UNSAT. So, any rare edge case or minor typo in a reduction implementation can drastically affect the result.
  2. UNSAT results are generally not interpretable or explainable. We can be confident that a formula is UNSAT (see next paragraph), but have little or no insight into why it is UNSAT. Maybe the reduction is implemented correctly, and an UNSAT result tells us something meaningful about the original problem domain. Or, maybe the formula is UNSAT because the reduction implementation has a bug. SAT solvers cannot give us the information to distinguish these outcomes.

SAT solvers, as a black box for solving SAT problems, are remarkably trustworthy. Modern solvers produce certificates of unsatisfiability, and those certificates are checked by formally verified software, essentially ruling out any possibility that SAT solvers give wrong results. Yet this remarkable trustworthiness can be completely undermined before a SAT solver even comes into the picture! How do we improve this situation? How do we expand the circle of trust beyond SAT solvers, to also encompass SAT reductions?

Today we will be discussing some recent work by my collaborators and me to make it easy to formally verify SAT reductions. We are developing a library in Lean 4 called trestle, which provides programming and proving utilities for SAT reductions. We will focus on just one component of the library: its CNF encoding utilities.

"CNF encoding" is a big part of the SAT reduction recipe, and I think the trestle approach to this step is quite cool. We will learn what CNF encodings are, discuss what "correctness" means for CNF encodings, and rediscover some key ideas for verification that we use in trestle.

SAT Definitions

Let's review some key definitions from the SAT world. A boolean formula $F$ is an expression containing the usual boolean operators, and variables from some set $V$. Given a truth assignment $\tau$ mapping the unknowns $V$ to truth values $\{\top, \bot\}$, we can evaluate the boolean formula to a truth value $\tau(F) \in \{\top, \bot\}$.

Conjunctive normal form (CNF) describes a restricted class of boolean formulas. Literals $\ell$ consist of either a variable or the negation of a variable. Clauses $C$ are a disjunction of literals, and a formula $F$ is a conjunction of clauses:

$x \in V$

$\ell ::= x \mid \overline{x}$

$C ::= \bigvee_i \ell_i$

$F ::= \bigwedge_i C_i$

CNF formulas are 'nice' for a variety of reasons, most of which are not relevant here (curious readers should look up: Tseitin transformation, resolution proofs, conflict-driven clause learning). The one fact to remember is that a formula is essentially a list of clauses, and a clause is essentially a list of literals.

The variable set $V$ in our definition is rather arbitrary. Elements of $V$ merely serve as labels for the unknowns in a formula. So, in practice, SAT solvers take formulas in the DIMACS format, which specifies that variables are labelled by positive natural numbers, $V = \mathbb{N}^+$. There is some tension here: when humans describe a CNF, we generally use descriptive labels, and only in the implementation do we map those descriptive labels to concrete numbers. This tension will motivate some design decisions about how trestle handles variables.
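For example, the two-clause formula $(x_1 \lor \overline{x_2}) \land (x_2 \lor x_3)$ looks like this in DIMACS:

p cnf 3 2
1 -2 0
2 3 0

The header line declares the number of variables and clauses; each subsequent line is one clause, with negative integers denoting negated literals and a 0 terminating each clause.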

Simple Encodings

Let's look at how to encode a simple constraint. The at-most-one constraint is a classic and common constraint in practical applications. It says: given a list of literals, require that at most one literal from the list evaluates to true. We will write $\text{AMO}(L)$ as shorthand for an at-most-one constraint with a list of literals $L$. AMO is useful when we want to represent choosing a single element from a set. For example, $\text{AMO}(x_1, \dots, x_9)$ might represent choosing a number from $\{1,\dots,9\}$.

How can we express $\text{AMO}(L)$ as a CNF formula? The simplest encoding is called the pairwise encoding:

$\bigwedge_{\ell_1 \ne \ell_2 \in L} (\overline{\ell_1} \lor \overline{\ell_2})$

For all pairs of distinct literals in the list, both cannot be true. This CNF expresses the AMO constraint: if an assignment satisfied two or more literals in $L$, it would falsify the clause for that pair, and conversely an assignment satisfying at most one literal satisfies every clause. Also, note how many clauses this encoding generates; there are $O(|L|^2)$ pairs, each of which generates a clause. Can we do better?
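To make this concrete, here is a minimal Python sketch of the pairwise encoding (the function name is my own; literals follow the DIMACS convention of nonzero integers, with negation written as -l):

from itertools import combinations

def amo_pairwise(lits):
    # For each pair of distinct literals, emit a clause
    # saying they cannot both be true.
    return [[-a, -b] for a, b in combinations(lits, 2)]

# Example: at most one of x1, x2, x3 is true.
amo_pairwise([1, 2, 3])  # [[-1, -2], [-1, -3], [-2, -3]]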

Imagine we split the list of literals into two blocks: $L = A, B$. Let $t$ be a fresh boolean variable (i.e. it is not used anywhere else in our formula), and consider the constraints $\text{AMO}(A, t) \land \text{AMO}(\overline{t}, B)$. If $t$ is true, then all the literals in $A$ must be false, and at most one of the literals in $B$ is true. If instead $t$ is set to false, then all the literals in $B$ are false, and at most one in $A$ is true. This is semantically equivalent to $\text{AMO}(A, B)$: all satisfying assignments to the split constraints also satisfy the original constraint, and all assignments satisfying the original constraint can be extended with a truth value for $t$.

So, let's apply this splitting lemma many times to reach this equivalent constraint:

$\text{AMO}(\ell_1, \ell_2, \ell_3, t_1) \land \text{AMO}(\overline{t_1}, \ell_4, \ell_5, t_2) \land \dots \land \text{AMO}(\overline{t_m}, \ell_{n-2}, \ell_{n-1}, \ell_{n})$

There are $O(|L|)$ remaining AMO constraints, but they each have $\le 4$ literals. So, if we pairwise encode these constant-length AMO constraints, they will each generate $O(1)$ clauses, giving a total of $O(|L|)$ clauses: we improved (asymptotically) on the pairwise encoding, at the cost of adding $O(|L|)$ fresh variables. I am not aware of a standard name for this encoding, but trestle refers to it as the cut-4 encoding (since we cut it into pairwise AMOs with 4 literals).
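Here is a sketch of the cut-4 encoding in the same style, reusing amo_pairwise from above (again, the function name and the explicit next_var threading are my own illustration, not trestle's API):

def amo_cut4(lits, next_var):
    # Repeatedly peel off a block of 3 literals plus a fresh
    # variable t, encode that block pairwise, and continue
    # with ~t prepended to the remaining literals.
    clauses = []
    while len(lits) > 4:
        t = next_var
        next_var += 1
        clauses += amo_pairwise(lits[:3] + [t])  # AMO(l1, l2, l3, t)
        lits = [-t] + lits[3:]                   # AMO(~t, l4, ...)
    clauses += amo_pairwise(lits)                # final block, <= 4 literals
    return clauses, next_var

Note how the caller must supply next_var, the first unused variable id, so the fresh variables do not collide with anything else in the formula. Managing this kind of bookkeeping by hand is exactly what becomes error-prone at scale.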

The pairwise and cut-4 encodings are semantically equivalent, so why have both? Experimentally, we find that SAT solvers perform differently on these encodings: sometimes the pairwise encoding is best, sometimes cut-4, sometimes other AMO encodings. In general, auxiliary variables can make a CNF encoding more compact (fewer clauses), but increasing the number of variables has costs during SAT solving. This tradeoff is an area of active research; modern SAT solvers will often eliminate variables by adding clauses, a technique called bounded variable elimination. The point is that encodings must carefully manage where and how auxiliary variables are used, because this significantly affects how SAT solvers perform on the resulting formulas.

Specification

Ultimately, we want to prove our CNF encodings are correct. But in order to do that, we need to define what exactly we mean by "correct." Let's use the at-most-one constraints above as a motivating example. We have some abstract, high-level constraint $\text{AMO}(L)$, and we are translating this into a CNF formula. We'll call the abstract, high-level constraint a specification for the CNF formula.

Specifications can be formalized as predicates over boolean assignments. A specification describes which assignments should satisfy the CNF formula. We'll write specifications as $P(\tau)$ for boolean assignments $\tau : V \rightarrow \{\top,\bot\}$. For example, a specification for the $\text{AMO}(L)$ constraint might look like

$P(\tau) \iff \sum_{\ell \in L} \tau(\ell) \le 1$

This is a direct translation of the informal meaning of an at-most-one constraint: it accepts assignments for which the number of satisfied literals is at most one.
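As an executable illustration (a reference check, not part of any encoder), the same specification can be written as a Python predicate over an assignment:

def amo_spec(tau, lits):
    # tau maps variable ids to booleans; a literal l is
    # satisfied when tau agrees with its sign.
    satisfied = sum(tau[abs(l)] == (l > 0) for l in lits)
    return satisfied <= 1

amo_spec({1: True, 2: False, 3: False}, [1, 2, 3])  # True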

Notice that any auxiliary variables, while potentially present in the implementation, do not appear in the specification. This is, in fact, what distinguishes "auxiliary" variables from "problem" variables. Problem variables are those in the specification, and auxiliary variables are everything else.

Specifications can be thought of as descriptions (generated from an implementation to describe what it does) or requirements (imposed on an implementation to constrain what it does). We will use both mindsets in the following sections.

Encodings as Programs

Let's see what an encoding looks like as an actual program. Below is the beginning of a sudoku encoding in Python, using the PySAT library:

from pysat.formula import CNF, IDPool
from pysat.card import CardEnc

# Initialize a pool of variable ids
vpool = IDPool(start_from=1)
# x(i,j,n) returns the variable ID for x_{i,j,n}
x = lambda i, j, n: vpool.id(f'x_{i}_{j}_{n}')

cnf = CNF() # create a new formula

# Every cell has exactly one number assigned
for i in range(9):
    for j in range(9):
        cell_vars = [ x(i,j,n) for n in range(1,10) ]

        # at least one of cell_vars is true (a single clause)
        cnf.append(cell_vars)

        # at most one of cell_vars is true (a sub-encoding that
        # may allocate auxiliary variables from vpool)
        cnf.extend(CardEnc.atmost(cell_vars, bound=1, vpool=vpool).clauses)

# Row, column, and block constraints
...

Recall that a CNF formula is a conjunction of many clauses. This structure makes it easy to conjoin ($\land$) two CNF formulas together, since we can simply append their clause lists. This is how we write the sudoku encoding above: encode many simple constraints into CNFs, and then append each of the CNFs into the big formula. Using the language of specifications: the big formula's specification is the conjunction of the specifications of its sub-constraints.

Also, when we run a sub-component like CardEnc.atmost, we use compositional reasoning to justify correctness: if the sub-encoding is correct with respect to its own specification, we can rely on that specification without re-examining the clauses it emitted.

Speaking of auxiliaries, there is some additional state threaded through the program: vpool. Sub-encoders allocate their auxiliary variables from this shared pool, which keeps them from colliding with the problem variables $x_{i,j,n}$ or with each other.
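In symbols: if sub-formulas $F_1, \dots, F_k$ meet specifications $P_1, \dots, P_k$, and their auxiliary variables do not collide (that is vpool's job), then the concatenated formula meets the conjunction of the specifications:

$P(\tau) \iff P_1(\tau) \land \dots \land P_k(\tau)$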

Verifying Encoding Programs

We want CNF encodings which

  1. support reuse / composition
  2. support compositional reasoning about their correctness
  3. maintain the expressiveness of unverified approaches.

Trestle's encoding utilities achieve all of these goals. The core component of this library, the encoding monad, began as my personal utility for writing composable SAT encodings (goals 1 & 3). Eventually, I realized it could be useful for verification as well (goal 2). Useful programs motivate verification, and verification techniques (particularly compositional reasoning) can motivate better programs.

Compositional reasoning about encodings requires extremely careful management of auxiliary variables: if two sub-encodings accidentally share an auxiliary variable, their clauses interact, and reasoning about each piece in isolation is no longer sound. Trestle's encoding utilities are built on two key ideas.

Key idea #1: keep track of which variables are "problem" variables, and only add clauses involving those problem variables.

Key idea #2: auxiliary variables are scoped problem variables. A combinator, withTemps, brings fresh temporaries into scope, runs an inner encoding that may treat them like problem variables, and then removes them from scope.
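Trestle implements these ideas in Lean 4, but the shape is easy to sketch in Python (EncoderState, add_clause, and with_temps are hypothetical names for this illustration, not trestle's actual API):

class EncoderState:
    def __init__(self, problem_vars):
        self.clauses = []
        self.in_scope = set(problem_vars)
        self.next_var = max(problem_vars, default=0) + 1

    def add_clause(self, clause):
        # Key idea #1: only variables currently in scope may appear.
        assert all(abs(l) in self.in_scope for l in clause)
        self.clauses.append(clause)

    def with_temps(self, k, body):
        # Key idea #2: temporaries act as problem variables,
        # but only inside `body`; afterwards they leave scope.
        temps = list(range(self.next_var, self.next_var + k))
        self.next_var += k
        self.in_scope |= set(temps)
        body(self, temps)
        self.in_scope -= set(temps)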

So, what is the spec for withTemps? If the inner encoding meets a spec $Q$ over the problem variables together with the temporaries $T$, then the outer encoding meets the spec that existentially projects the temporaries away:

$P(\tau) \quad\iff\quad \exists \sigma : T \rightarrow \{\top,\bot\}, \quad Q(\tau \oplus \sigma)$

Here $Q$ is a spec for whatever encoding happens within the scope of the auxiliaries, and $\tau \oplus \sigma$ denotes the assignment $\tau$ extended with values $\sigma$ for the temporaries.
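The splitting lemma from earlier is exactly an instance of this spec. Take $Q$ to be the spec of $\text{AMO}(A, t) \land \text{AMO}(\overline{t}, B)$, with a single temporary $T = \{t\}$; projecting $t$ away recovers the spec of the original constraint:

$P_{\text{AMO}(A,B)}(\tau) \iff \exists \sigma : \{t\} \rightarrow \{\top,\bot\},\ Q(\tau \oplus \sigma)$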

Conclusions

We have re-invented the key concepts behind trestle's CNF encoding utilities. To reiterate the design goals, we wanted:

  1. reusable / composable encodings
  2. compositional reasoning about their correctness
  3. the expressiveness of unverified approaches.

And this is exactly where we ended up. We can compose arbitrary encodings; we can reason about an encoding by reasoning about its parts; and we can emit clauses and allocate auxiliary variables with just as much freedom as unverified code.

