**Improvement of Fundamentals**

**Synthesis**- Technology-Independent Combinational Synthesis
- Technology mapping
- Resynthesis
- Use of “boxes”
- Sequential synthesis
- Synthesis with placement metrics

**Formal Verification**

**Programming environment**

### Improvement of Fundamentals

The dual areas of Verification and Synthesis share common foundations in how logic functions are stored and manipulated on the computer. Logic functions and relations can be stored in one of basically four different ways: as a circuit, as a binary decision diagram, as a truth table, or as set of satisfiability clauses. Improvements in these capabilities enable everything else, synthesis, optimization, model checking, equivalence checking, bug finding, etc. It is therefore a fundamental tenet of the Berkeley Verification and Synthesis Research Center (BVSRC), that work on improving these capabilities should be a continuous and ongoing research and development effort. Improvements come in the form of improved memory footprints, smaller cache misses, exploring alternate methods, such as clause-based versus circuit-based satisfiability (SAT) solvers, and specializing methods to a particular applications, to name a few. The improvements can combine to make an algorithm sometimes orders of magnitude faster, thus enabling its use in an application or enabling its iteration many times.

#### 1. AIG Packages

And-Inverter Graphs (AIGs) are networks of two-input AND gates and inverters. In ABC, AIGs are used as the data-structure of choice for solving almost all problems in synthesis as well as verification. AIGs as implemented in ABC are quite different from two-input gate networks, such as those produced in SIS by command “tech_decomp -a 2 -o 2”. Several key differences from other logic representations are:

- Inverters are not separate nodes but complemented attributes on the edges
- Uniform representation of both ANDs and ORs (as ANDs) using the de Morgan rule
- Two-input ANDs are structurally hashed on-the-fly to remove structural redundancies
- The AND nodes are always topologically ordered (and therefore ready to be traversed)
- Constants are always propagated (only POs can have constant fanins)
- Dangling nodes are not allowed (this simplifies many logic manipulation procedures)
- Efficient memory management (due to all nodes having the same size)
- Efficient on-demand fanout representation (constant-time addition and deletion of fanouts without memory reallocation)

In applications with large AIGs, the performance of AIG-based code in several algorithms greatly depends on the amount of memory used. For example, sequential AIG simulation is 100x faster when programmed with several principles in mind:

- DFS ordering of the AIG nodes from primary inputs
- Minimalist representation of AIGs using 8 bytes per AIG node (only two fanins are stored)
- Recycling simulation memory when nodes are added/removed to/from the traversal frontier
- Modifying the AIG representation to be more CPU-cache friendly, by going from an absolute representation (each node stores its fanin locations) to the relative one (each node stores the relative position of its fanins in the traversal frontier)

These observations led to the rethinking of the foundations of AIG manipulation. Our earlier AIG package requires 32 bytes of memory per AIG node, not counting the optional fanout representation and structural hashing table. In contrast, much faster and more scalable AIG-based computations have been developed using a package with only 8 bytes per node, while adding temporary data-members (such as reference counter, level counter, cuts, fanout, etc) on demand and freeing them as soon as possible. These speedups are only achieved for AIGs with millions of AIG nodes, while the performance on smaller AIGs (around 10K-50K nodes) is about 1-2x faster.

#### 2. SAT Methods

SAT has grown to become the dominating technique for formal verification, largely replacing BDD-based engines as the scalable alternative that can conquer industrial sized designs. It has also become the work horse of many logic synthesis algorithm, such as clock gating and redundancy removal.

A main theme in the application of modern SAT-solvers is
*incrementality*: how to best craft the SAT-based algorithm to
reuse the information derived during one problem to improve the
performance of successive and related SAT-queries. It is also
important to keep an eye on the overhead generated, going from one
query to the next. E.g., equivalence checking typically generates
hundreds of thousands of very easy and highly related SAT queries,
mixed together with a few thousand really hard problems. The solver
must have minimal overhead for the easy problems, but still work
efficiently on the few hard ones.

Part of our research relates to improving this interaction between the top-level algorithm and the underlying SAT solver. This may require reformulating algorithms to take full advantage of the incremental SAT interface of a modern solver, but may also require extensions and improvements to the SAT-solver itself.

We are also looking into new ways of integrating an AIG representation with a SAT-solver natively to avoid the overhead of translating circuits into clauses. To truly reap the benefit of this approach, the SAT-solver has to support some circuit specific operations, such as transfering the fanouts of one node to another equivalent node, or restricting the boolean propagation to the transitive fanin cones of a set of nodes. By working out the details of how to do this and other things efficiently, we hope to get a significant speedup in this application domain of SAT.

#### 3. BDD package

BDDs are one of only a few ways in which a final definitive answer can be given to a model checking or equivalence checking question. They can answer SAT, UNSAT, or run out of a memory or time bound. Alternate methods are interpolation, induction, bounded model checking (BMC), and synthesis, each of which have their areas of strength. Often a problem can be trimmed down to a reasonable size, say a few hundred flip-flops and inputs, in which case BDDs are a good choice to try. New and more powerful abstraction methods combined with synthesis and retiming have been developed in the past few years. These are capable of reducing a problem significantly, while spending less time on this. It is therefore well motivated to continue to invest in improved BDD techniques even though much has been done in the past. We aim to put together the best BDD package we can, exploring the best methods from past research, and to explore further improvements based on experience with industrial applications.

### Synthesis

#### 4. Technology-Independent Combinational Synthesis

Combinational logic synthesis restructures and optimizes a technology-independent representation of a logic network. Traditionally, logic functions of the nodes have been represented using sums-of-products (SOPs) [33]. In ABC, the networks are represented as And-Inverter-Graphs (AIGs). AIGs can be stored compactly in files or in memory, taking 3.5 bytes per node on average [4]. AIGs have no fixed node boundaries [7] allowing optimization algorithms to scan though all boundaries choosing the ones that lead to the largest improvement during synthesis.

In the last few years, we developed several efficient AIG
optimization algorithms. The main workhorse has been AIG rewriting
[5][22], which looks at all 4-input cuts of each AND-node, compares
its structure with a set of pre-computed AIG subgraphs, and chooses
the subgraph with the fewest AIG nodes, possibly subject to
constraints, such as on AIG levels. Other algorithms for AIG
minimization have been developed such as redundancy removal and
resubstitution [24] using SAT to compute and use local don.t
cares. *Synthesis with structural choices* [10] records and
stores multiple snapshots of the same design. This improves the
quality of technology mapping [23] and can lead to scalable
verification, which also extends into the sequential domain [28].

Clock-gating had become of great interest recently. It method prevents unnecessary switching of registers and reduces dynamic power consumption. Based on a simulation and satisfiability paradigm [20], we have developed a method [13] that can be made scalable and can save, on average, 14% of all register clock transitions. Further, the extraction of a gating cover from pre-existing signals allows don.t cares to be exploited to reduce logic size by 7%. Even more power can be saved using sequential methods. However, this often leads to extremely hard sequential equivalence checking problems. In some cases, we have shown how these problems can be solved much more easily by reducing the problem to a combinational equivalence checking problem.

Often logic synthesis methods do not do well on XOR-rich circuits, such as cryptographic applications and cyclic redundancy checkers. This motivates the development of special methods for such circuits, detecting these and implementing synthesis methods which take advantage of the existence of XORs.

One method that has been very successful in the last few years has
been *sequential signal correspondence* where two or more signals
are identified as being equivalent on the set of states that can be
reached from the initial state. Such state equivalence classes occur
quite often and are not always obvious to the designer, since their
equivalence depends on the set of reachable states. One saving grace
is that often there are large subsets of these that form an inductive
invariant. When equivalence classes have been identified and proved,
they can be used to simplify the circuit by eliminating all but one
and transferring all the fanouts to the class representative. On the
other hand, many equivalences are true but can.t be proved
inductively. One way of enhancing this method is to find other
invariants and prove them all at once. Another way is to speculate on
equivalences and prove them by formal verification techniques.

Signal correspondence is one application where a specialized SAT solver can be employed. This is because during the induction proving process, many small problems are proved using SAT. We found the use of a circuit-based solver in this application can speed up the computation several times as well as allowing it to be applied to much larger circuits not only because it is faster but also requires less memory. This is a good example where an algorithm can be specialized to an application yielding significant speedups.

#### 5. Technology mapping

Technology mapping transforms an AIG into a network of logic nodes. In standard-cell mapping, each node is represented by a gate from a given library. In FPGA mapping, each logic node is represented using a K-input look-up table (LUT) implementing any Boolean function up to K inputs. The target and goals for a mapper can vary; standard cells, various styles of FPGAs, minimization of delay, area, power, and wires. Mappers can be cut-based, structurebased, or Boolean machers.

A recent development was for standard cells based on structural matching and disjoint-support decomposition. The approach uses of structural choices, uniform handling of XORs and MUXes, improved phase assignment, reduced runtime and memory of structural matching using priority matches, similar to priority cuts [25]. It was motivated by requests from industrial collaborators who wanted one that gave small area, leaving the problem of meeting timing requirements mostly to resizing and buffering algorithms. However, we also developed a new timing-aware resynthesis method that can be in conjunction to this to improve the delay if required [30]. The method uses mapping with choices developed earlier as a regular part of any of our mappers. Unlike previous methods, the algorithm is fast, is applicable to very large networks, and incurs little area penalty.

#### 6. Resynthesis

Resynthesis works on mapped networks and tries to improve the results by re-decomposing groups of mapped cells or by rewiring or resubstitution. Recent techniques, [26] and [27], are used for re-synthesizing a mapped circuit to further minimize the number of gates or LUTs. The first is based on resubstitution using don.t cares and the second does gate or LUT packing. By iterating these two Boolean techniques, it is not uncommon to reduce the LUT count by 10% or more, compared to a mapping produced by a highly optimized structural mapper.

#### 7. Use of “boxes”

In many applications, the user does not want synthesis to touch designated parts of a design. This feature is extensively used in FPGA companies where boxes represent parts where there is a known highly optimal implementation, or a box may designate a specialized type of memory element. In the past, these have been dealt with by making them .black. where their outputs become primary inputs to the rest of the logic and their inputs become primary outputs of the logic. We have just completed an extensive development which elevates all the algorithms of ABC to use .white-boxes. and .black-boxes.. This allows application to industrial designs with internal memories, DSP blocks, and arithmetic circuits [15]. White boxes are logic components (combinational or sequential blocks) which are not to be touched, but contain information about their delays and logic structures, which can be very useful for synthesizing the logic outside the boxes or even for deciding how some of the boxes or hierarchy can be collapsed to improve synthesis. As opposed to black boxes, white boxes expose all their logic to synthesis, but synthesis is only constrained to not change the interior of the boxes.

#### 8. Sequential synthesis

Sequential synthesis is still rarely used in commercial software. In contrast to combinational synthesis, sequential synthesis can arbitrarily modify circuit behavior on the unreachable states. Since it is common that reachable states constitute a small fraction of all states, sequential synthesis can produce vastly superior results over combinational synthesis. Sequential synthesis may involve the following optimizations: retiming, re-encoding, minimization with sequential don.t-cares, merging sequentially equivalent nodes [11], etc. Some of these are quite scalable and still produce impressive results. Sequential AIGs are used to extend combinational AIGs by including memory elements [1].

One scalable sequential synthesis method uses induction to detect and merge sequentially-equivalent nodes, that is, nodes that are equivalent on reachable states, but possibly not equivalent otherwise [28]. We have developed an inductive prover that is extremely fast, using a number of new techniques, including speculative reduction [31][17], circuit-based SAT, etc. This type of sequential synthesis is not only scalable, but also is much more acceptable to industry because state-encoding, scan chains, and test vectors are preserved. Moreover, the synthesis results are guaranteed to be sequentially verifiable using an independent inductive prover similar to that used for synthesis. Experiments show impressive effectiveness; when applied to a set of 20 industrial benchmarks ranging up to 26K registers and up to 53K 6-LUTs, the average reductions in the number of registers and LUT count were 12.9% and 13.1% respectively while delay was reduced by 1.4%. These improvements were done on logic after it was synthesized by a commercial FPGA tool.

#### 9. Synthesis with placement metrics

Especially in FPGAs, connecting wires becomes the most critical factor in determining chip area, critical path length, switching power consumption, and so on. It is therefore important to consider connection costs early within a design flow. However, design tools for logic synthesis and technology mapping do not accurately estimate the connect cost, which is available only after placement and routing. A unit delay model that does not reflect wire delay is commonly used in stages prior to placement and routing. Consequently, results are suboptimal and biased. A better way is to initially do synthesis using the best estimates one can get for the wire delays and then apply incremental methods which alternate between placement and synthesis. The idea is that placement feeds actual wire delays for a synthesized, placed and routed design. Then synthesis uses these to re-synthesize critical parts of a design, trying to keep the changes small. Incremental layout is then invoked which works only on the changed logic, leaving the rest of the placement and routing undisturbed. This can be iterated for improved results. This technique was recently implemented using retiming as the synthesis technique.

### Formal Verification

#### 10. Sequential equivalence checking

Sequential equivalence checking (SEC) tests the correctness of designs after sequential synthesis (SS). Efficient SEC is important because it enables wider adoption of innovative SS methods. Recently we developed an integrated engine for general-case SEC based on structural analysis, simulation, combinational synthesis, retiming, and other types of transforms [28]. In particular, our induction-based sequential synthesis, which merges sequentially equivalent registers and nodes, is used to reduce the size of the problem. This was tested on industrial benchmarks and found capable of proving equivalence of the original design to that obtained by using most of the synthesis scenarios involving sequential transforms, particularly retiming and signal correspondence

#### 11. Model checking

Model checking (MC) attempts to prove properties for one copy of a design. It subsumes SEC where the property is the equivalence to the original circuit. SEC often benefits from structural similarities among the two copies of the miter produced; this is not available for MC. Since k-step induction is one of the most widely used but incomplete proof methods in MC [32], several techniques were developed (for example, [6]) to strengthen induction so that MC can be converged for those problems where induction fails without any strengthening of it. In the past, we explored several ways of strengthening induction as a proof method for the target property in model checking by developing efficient frameworks for considering implications as candidate invariants [8] and by selecting invariants using counter-examples [9]. The latter method systematically explores the space of candidate inductive invariants, which allows us to find and prove invariants that are few in number and immediately help the problem at hand. This method is applied to interpolation where invariants are used to refute an error trace and help discard spurious counterexamples. This work of finding additional invariants and using them continues since induction is one of the key methods available for proving properties.

We have implemented and experimented with three sophisticated abstraction/refinement algorithms in ABC. .abstract. starts by replacing all registers by primary inputs and executing BMC procedure. This abstraction is then refined using counter-examples, reintroducing some of the registers. This is called counter-example based abstraction. Another method is .proof-based. abstraction. We have recently combined CBA and PBA into a single method for abstraction which additionally uses techniques where a single instance of a SAT problem is used during the entire abstraction process. This speeds up the process greatly and the combination of CBA/PBA seems to be able to produce smaller abstractions on average. The method is very scalable and can solve hard industrial benchmarks with several million AIG nodes, well beyond the scope of other verification methods.

Speculative reduction starts by deriving and refining as many speculated equivalences as it can within a certain resource limit. At that point it assumes the remaining equivalences and creates an abstracted model to which other verification transformations can be applied. If UNSAT is proved on the speculative reduced model, then the original problem has been proved; else, a counterexample is produced, which can be used to refine the remaining equivalence classes. This method has been shown to be effective in some hard verification problems by reducing the model to a manageable size, allowing other methods, like BDD reachability to succeed.

### Programming environment

R&D of new CAD algorithms is very limited without prototyping them in a software system and rigorously testing the prototypes on industrial designs. In addition, an efficient and well-structured software source code is necessary to enable others to experiment with new ideas and adapt to their own environments and problems. ABC is a growing CAD tool for synthesis and verification of binary sequential logic circuits appearing in synchronous hardware designs. It combines scalable logic optimization based on And-Inverter Graphs (AIGs), optimal-delay DAG-based technology mapping for look-up tables and standard cells, and innovative algorithms for sequential synthesis and verification. ABC provides a base of very efficient and scalable algorithms and implementations of fundamental algorithms in synthesis and verification, including SAT, AIG packages, BMC, retiming, induction, etc. It provides experimental implementations of verification and synthesis algorithms and a programming environment for building new applications. The full source code and documentation on the use of ABC is publicly available [3].

Recently we saw the need for a more sophisticated scripting language to be tied into ABC. We chose PYTHON and now have a set of python functions that can better control a sequence of verification commands. The basis of this is to try something based on the current size of the circuit, interrogate the results and use this to decide on the next thing to try. We have incorporated all of our ideas for verification into a single python function called prove. It uses sequtntial synthesis via signal correspondence and retiming, phase abstraction, trimming and reparameterization, all the abstraction methods (CBA/PBA and speculative reduction), induction, reachability via interpolation or BDDs, extraction of implicit constraints and using them in signal correspondence. All of these techniques are being continually refined and improved via better and/or specialized SAT and AIG packages, improved memory use and fewer cache misses, etc. By applying prove to real industrial examples that we have access to through our industrial partners, we can experience where are the bottle-necks and which methods need more work. This allows to set our priorities for new research based on real applications.