Towards Formal Relaxed Equivalence Checking in Approximate Computing Methodology

Lukáš Holík, Ondřej Lengál, Adam Rogalewicz Lukáš Sekanina, Zdeněk VAŠÍČEK, Tomáš Vojnar

> Faculty of Information Technology Brno University of technology vasicek@fit.vutbr.cz



# Outline



- Approximate computing
- Functional approximations
- Relaxed equivalence checking
- Towards relaxed equivalence checking

# The idea of approximate computing

- T FIT
- The requirement of exact numerical or Boolean equivalence between the specification and implementation of a circuit is relaxed in order to achieve improvements in performance or energy efficiency [Venkatesan et al., 2011].
- The requirement of exactness can be relaxed because of:
  - 1. limited perceptual capability of humans
  - 2. a golden result is impossible (or difficult) to define
  - worst-case design would lead to large power consumption (process parameter variations are large in sub-45 nm technologies)

- The concept of approximate computing has been developed in different ways and at various levels of computing stack, for example
- Software-level approximations
  - EnerJ: approximate data types in JAVA [Sampson el al., 2011]
  - Neural network replaces general purpose code [Esmaeilzadeh el a., 2013]
  - Axilog: language annotations [Yazdanbakhs, 2015]
- Specialized processors supporting approximate computing
  - Improving Efficiency of Extensible Processors by Using Approximate Custom Instructions, [Kamal et al., 2014]
- Hardware-level approximations
  - over-scaling based approximations
  - functional approximations

 Principle: Approximations are introduced by over-clocking or voltage over-scaling (i.e. the circuits operate correctly under normal conditions).



• Limits: The over-scaling based approach works well when there exists few long paths in the target circuit.

# **Functional approximations**



- Goal: To implement a slightly different Boolean function that has faster or more power-efficient implementation.
- Two main methodologies were developed
  - Ad hoc methods optimizing a particular component, e.g. multipliers [Kulkarni, 2011], adders [Gupta, 2013], median filters [Monajati, 2015]
  - Design automation methods

SASIMI: Substitute-and-simplify [Venkataramani et al, 2013] SALSA: Systematic logic synthesis using Quality Constraint Circuits [Venkataramani et al, 2012] ABACUS: AST-based approach [Nepal et al., 2014]

## Design automation methods – issues and limits

- T FIT
- Although approximate computing techniques have shown significant promise, moving them to the mainstream will require several issues to be addressed, foremost among which is the issue of modeling and analysis of accuracy. [Venkatesan, 2011]

| Approach    | Circuits                              | Accuracy measured using               |
|-------------|---------------------------------------|---------------------------------------|
| ABACUS 2014 | FIR filter, perceptron, block matcher | training data                         |
| ASLAN 2014  | MPEG encoder                          | sequential quality constraint circuit |
| SASIMI 2013 | Benchmarks, multipliers, adders, SAD  | training data                         |
| SALSA 2012  | Adders, multipliers, FIR, IIR, DCT    | quality constraint circuit            |

# Design automation methods – issues and limits

- FIT
- Several approximate designs have been proposed that compromise accuracy in different ways; unfortunately there is no simple and systematic analysis methodology to evaluate quality of candidate designs and compare them with conventional designs or with each other.

The evaluation of quality based on training data cannot

- guarantee that a given approximate implementation meets my accuracy requirements.
- ensure that there are no bugs in an approximate implementation.
- be applied when only a neglible error is acceptable.

# Relaxed equivalence checking

- Traditional formal verification techniques (combinational and sequential equivalence checking) are designed to prove exact Boolean equivalence between specification and implementation and do not directly address the previous questions.
- There is a need for relaxed equivalence checking, an approach which is able to prove the equivalence up to some bound.

# Combinational Equivalence Checking

- T FIT
- Goal: To prove that  $F1(x) \equiv F2(x)$  or that  $F1(x) \neq F2(x)$





#### SAT-based Combinational Equivalence Checking





# result: SATISFIABLE model: 001 111110101 $\swarrow$ $\downarrow$ $\checkmark$ $x_1$ $x_2$ $x_3$

FIT

## BDD-based Combinational Equivalence Checking







FIT

### How to determine arithmetic error?

**ROBDD** 

 $\mathbf{x}\mathbf{2}$ 

 $\mathbf{x1}$ 

x3





To calculate average arithmetic error,  $\epsilon_i$  can be successively increased from 1 to  $\epsilon_{max}$ 





- Known issues of the common methods in verification of arithmetic circuits
  - BDD application to verification of arithmetic circuits is limited by the prohibitively high memory requirement for complex arithmetic circuits, such as multipliers.
  - SAT-based approaches are known to be computationally expensive and not scalable.
- Proposed approach: Use a pseudo-Boolean polynomial representation [Ciesielski, 2015] of arithmetic circuits to determine the quality of an approximation.

# Combinational equivalence checking using polynoials





Specification:

pBSolver

z0\*2^0 + z1\*2^1 + z2\*2^2 + z3\*2^3 - (a0\*b0+2\*a0\*b1+2\*a1\*b0+4\*a1\*b1)

result: 0

Verification of Gate-level Arithmetic Circuits by Function Extraction [M. Ciesielski, C. Yu, D. Liu, and W. Brown, 2015]

# Combinational equivalence checking using polynoials





pseudo-Booelan polynomial

- z3 = 0 z2 = s5\_and2 z1 = s8\_or2 z0 = s6\_and2 s8\_or2 = s4\_and2 + s7\_and2 - s4\_and2 \* s7\_and2
- s7\_and2 = b0 \* a1 s6\_and2 = a0 \* b0 s5\_and2 = b1 \* a1 s4\_and2 = b1 \* a0

2-bit approximate multiplier [Kulkarni et al., 2011]

#### Specification:

z0\*2^0 + z1\*2^1 + z2\*2^2 + z3\*2^3 - (a0\*b0+2\*a0\*b1+2\*a1\*b0+4\*a1\*b1)



result: -2\*a0\*a1\*b0\*b1 Error is equal to -2 IFF a0=a1=b0=b1=1

#### 4-bit approximate multiplier [Vasicek, 2014]





z7 = s71\_or2 z6 = s69 xor2 z5 = s66\_xor2 z4 = s60 xor2 z3 = s56\_xor2 z2 = s34 xor2 z1 = s48\_or2 z0 = s29 and2 s71 or2 = s65 and2 + s70 and2 - s65 and2 \* s70 and2 s70\_and2 = s44\_and2 \* s68\_or2 s69\_xor2 = s64\_xor2+s68\_or2 - 2\*s64\_xor2\*s68\_or2 s68\_or2 = s63\_and2 + s67\_and2 - s63\_and2 \* s67\_and2 s67 and2 = s62 xor2 \* s61 and2 s66\_xor2 = s62\_xor2+s61\_and2 - 2\*s62\_xor2\*s61\_and2 s65 and2 = s37 and2 \* s59 or2 s64\_xor2 = s44\_and2+s59\_or2 - 2\*s44\_and2\*s59\_or2 s63 and2 = s55 xor2 \* s58 or2 s62 xor2 = s55 xor2+s58 or2 - 2\*s55 xor2\*s58 or2 s61 and2 = s53 xor2 \* s57 or2 s60 xor2 = s53 xor2+s57 or2 - 2\*s53 xor2\*s57 or2 s59\_or2 = s32\_and2 + s54\_and2 - s32\_and2 \* s54\_and2 s58 or2 = s52 and2 + s51 and2 - s52 and2 \* s51 and2 s57\_or2 = s50\_and2 + s46\_and2 - s50\_and2 \* s46\_and2

s56\_xor2 = s45\_xor2+s13\_and2 - 2\*s45\_xor2\*s13\_and2 s55 xor2 = s49 xor2+s37 and2 - 2\*s49 xor2\*s37 and2 s54\_and2 = s49\_xor2 \* s37\_and2 s53 xor2 = s47 xor2+s16 and2 - 2\*s47 xor2\*s16 and2 s52\_and2 = s40\_xor2 \* s39\_and2 s51 and2 = s47 xor2 \* s16 and2 s50\_and2 = s38\_xor2 \* s42\_or2 s49 xor2 = s22 and2+s35 or2 - 2\*s22 and2\*s35 or2 s48 or2 = s25 and2 + s30 or2 - s25 and2 \* s30 or2 s47\_xor2 = s40\_xor2+s39\_and2 - 2\*s40\_xor2\*s39\_and2 s46\_and2 = s45\_xor2 \* s13\_and2 s45\_xor2 = s38\_xor2+s42\_or2 - 2\*s38\_xor2\*s42\_or2 s44 and2 = b3 \* a3 s42\_or2 = s27\_and2 + s26\_and2 - s27\_and2 \* s26\_and2 s40 xor2 = s20 and2+s18 and2 - 2\*s20 and2\*s18 and2 s39\_and2 = s28\_xor2 \* s19\_and2 s38 xor2 = s28 xor2+s19 and2 - 2\*s28 xor2\*s19 and2 s37 and2 = b3 \* a2 s35\_or2 = s25\_and2 + s32\_and2 - s25\_and2 \* s32\_and2 s34 xor2 = s11 and2+s31 nor2 - 2\*s11 and2\*s31 nor2 s32\_and2 = s18\_and2 \* s15\_and2 s31 nor2 = 1-s27 and2-s24 inva+s27 and2\*s24 inva s30\_or2 = s21\_and2 + s8\_and2 - s21\_and2 \* s8\_and2

s29\_and2 = s9\_and2 \* a0 s28 xor2 = s9 and2+s17 and2 - 2\*s9 and2\*s17 and2 s27\_and2 = s10\_and2 \* s12\_and2 s26 and2 = s11 and2 \* s23 or2 s25 and2 = s9 and2 \* s17 and2 s24 inva = 1-s23 or2 s23\_or2 = s10\_and2 + s12\_and2 - s10\_and2 \* s12\_and2 s22 and2 = a3 \* b2 s21 and2 = b1 \* a0 s20\_and2 = s15\_and2 \* s14\_inva s19 and2 = b2 \* a1 s18 and2 = a2 \* b2 s17 and2 = a2 \* b1 s16\_and2 = b3 \* a1 s15 and2 = b1 \* a3 s14\_inva = 1-s12\_and2 s13 and2 = b3 \* a0 s12 and2 = a2 \* b0 s11 and2 = a0 \* b2 s10 and2 = a1 \* b1 s9 and2 = a3 \* b0 s8 and2 = b0 \* a1

#### **Specification:**

z0\*2^0 + z1\*2^1 + z2\*2^2 + z3\*2^3 + z4\*2^4 + z5\*2^5 + z6\*2^6 + z7\*2^7 -(a0\*b0+2\*a0\*b1+2\*a1\*b0+4\*a0\*b2+4\*a1\*b1+4\*a2\*b0+8\*a0\*b3+8\*a1\*b2+8\*a2\*b1+8\*a3\*b0+16\*a1\*b3+16\*a2 \*b2+16\*a3\*b1+32\*a2\*b3+32\*a3\*b2+64\*a3\*b3)

#### Result:

E = -a0\*b0 + a2\*a3\*b0\*b1 + 2\*a0\*a1\*b0\*b1 Input variables: a0,a1,a2,a3 b0,b1,b2,b3

#### Quality metrics:

Worst-case error: 2

Number of invalid responses:  $4 \cdot 17 = 68$ 

Average error: 4 · 18/68 = 1.05





- A notion of formal relaxed equivalence checking has been briefly introduced.
- Even if some of the equivalence checking methods can be directly extended to support relaxed equivalence checking, the scalability of common approaches represents a limiting factor.
- A suitable formal relaxed equivalence checking method needs to be developed.

# Thank You For Your Attention !