## Library characterizer for open source VLSI design

Shinichi Nishizawa, Waseda Univ. Toru Nakura, Fukuoka Univ.





## Outline

- Background
- Proposed characterizer
  - Characterization flow
  - Characterize combinational cells
  - Characterize sequential cells
- Experimental results
- Conclusion and future work



### Open source VLSI design

- Design chips using open source EDAs
  - Google + Skywater (United States)
  - MakeLSI Project (Japan)
    - ☺ : Anyone can join this project and design their own circuits
    - ☹ : Some important tools are not ready



https://www.slideshare.net/junichiakita9/make-70153385

## Digital circuit design and STA

Static Timing Analysis (STA): estimation of path delay

- Calculate max/min path delay by accumulating cell delays
  - Max delay of the *i*th node: $a_i = \max_{j \in \text{fanin}(i)} \{a_j\} + t_{\text{pd},i}$
- Need both timing information (.lib) and a calculation engine (STA)
  - STA: OpenSTA (OpenROAD), sta (Yosys)
  - .lib: (Need characterizer)
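The max-delay recurrence above can be sketched in a few lines of Python. This is only an illustration of the formula, not how OpenSTA or any other STA engine is implemented; the function name, the tiny example circuit, and the delay values are all made up for this sketch.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def arrival_times(fanin, t_pd):
    """Compute a_i = max_{j in fanin(i)} a_j + t_pd[i] for every node.

    fanin: node -> list of predecessor nodes (hypothetical netlist graph)
    t_pd:  node -> cell propagation delay, as a .lib would provide it
    """
    arrival = {}
    # static_order() yields each node after all of its predecessors
    for node in TopologicalSorter(fanin).static_order():
        preds = fanin.get(node, [])
        arrival[node] = max((arrival[p] for p in preds), default=0.0) + t_pd[node]
    return arrival

# Illustrative circuit: two primary inputs and two gates (values in ns)
fanin = {"in1": [], "in2": [], "g1": ["in1", "in2"], "g2": ["g1", "in2"]}
t_pd = {"in1": 0.0, "in2": 0.0, "g1": 0.12, "g2": 0.30}
print(arrival_times(fanin, t_pd))
```

The min-delay case follows by replacing `max` with `min`.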





#### Related works

- (Based on our survey) no good candidate is available
  - 1. Google-Skywater-PDK: blank for characterizer
    - Timing library in JSON (why?)
  - 2. pharosc [8]
    - From the website "The Art of Standard Cell Library Design"
    - Binary file for each logic (?), WinSpice3
  - 3. AutoLibGen [9] in *Intl. Conf. on Electronic Design*
    - Binary file for each logic (?), HSPICE
  - 4. LiChEn [10] in *Euromicro Conf. on Digital System Design*
    - Analyze logic function, C++ source on GitHub, Spectre

DFF: 2 supports FFs without set/reset; 3 and 4 do not support FFs

[8] G. Bronstein, et al., "ASIC standard cell library design by Graham Petley," 1991.

[9] I.K. Rachit and M.S. Bhat, "AutoLibGen: An open source tool for standard cell library characterization at 65nm technology," Intl. Conf. on Electronic Design, 2008.

[10] M.T. Moreira, et al., "LiChEn: Automated electrical characterization of asynchronous standard cell libraries," Euromicro Conf. on Digital System Design, pp.933–940, 2013.

#### Proposed characterizer

#### Open source characterizer

- Language: Python3
- Simulator: ngspice, hspice
- Advantage
  - Characterize both combinational and Flip-Flops
    - Support Flip-Flops w/ async. set/reset
  - Easy to add functions
    - Two analysis engines: one for combinational cells, one for sequential cells
      - Do not prepare engines for each logic function
      - Use truth table to handle different logic functions
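As a rough illustration of the truth-table idea, the sketch below enumerates, for an arbitrary logic function, every input pin and side-input combination for which toggling that pin toggles the output. These are the arcs a single generic engine would need to simulate. The function names and the tuple layout are hypothetical and do not reflect the characterizer's internal data structures.

```python
from itertools import product

def sensitizing_arcs(func, n_inputs):
    """List (pin, side_inputs, unateness) for every arc that toggles the output.

    func:      Python callable returning 0/1, standing in for the truth table
    pin:       index of the switching input
    unateness: "pos" if the output follows the input, "neg" if it inverts
    """
    arcs = []
    for pin in range(n_inputs):
        for others in product((0, 1), repeat=n_inputs - 1):
            # insert the switching pin at position `pin` among the side inputs
            lo = others[:pin] + (0,) + others[pin:]
            hi = others[:pin] + (1,) + others[pin:]
            out_lo, out_hi = func(*lo), func(*hi)
            if out_lo != out_hi:
                arcs.append((pin, others, "pos" if out_hi > out_lo else "neg"))
    return arcs

def nand2(a, b):
    return int(not (a and b))

print(sensitizing_arcs(nand2, 2))
```

Adding a new cell then only requires a new truth table (here, a new `func`), not a new engine.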



# **Operation flow**

- Setup library and cells
- Branch to comb. or seq.
  - Generate test bench based on the target logic function
  - Launch simulator
- Seq. cells need iteration
  - Find min. delay in setup/hold search
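The iteration for sequential cells can be pictured as a search over the data-to-clock offset. The sketch below uses plain bisection, which is a swapped-in textbook technique and not necessarily the search the characterizer uses. `passes` is a hypothetical callback that would wrap one SPICE run and report whether the flip-flop captured the data, and the sketch assumes that pass/fail is monotonic in the setup time.

```python
def find_min_setup(passes, t_fail, t_pass, resolution):
    """Bisection for the smallest setup time at which the flip-flop still works.

    passes(t): hypothetical callback, True if capture succeeds with setup time t
    t_fail:    a setup time known to fail
    t_pass:    a setup time known to pass
    """
    if passes(t_fail) or not passes(t_pass):
        raise ValueError("search window does not bracket the pass/fail boundary")
    while t_pass - t_fail > resolution:
        mid = 0.5 * (t_fail + t_pass)
        if passes(mid):
            t_pass = mid   # boundary is at or below mid
        else:
            t_fail = mid   # boundary is above mid
    return t_pass

# Stand-in for a simulator run: capture succeeds above an arbitrary limit (ns)
def fake_sim(t_setup, limit=0.137):
    return t_setup >= limit

t_setup = find_min_setup(fake_sim, 0.0, 1.0, resolution=0.001)
print(t_setup)
```

Each call to `passes` costs one transient simulation, so the number of iterations directly drives runtime.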



#### Characterize: combinational cell

- Propagation delay: input 50% to output 50%
- Transition delay: output 20% to output 80%
- Dynamic power<sup>\*1</sup>: integrate current from input start (1%) to output end (99%)
- Static power: integrate current at the beginning of sim.<sup>\*2</sup>
- Set simulation end time, time step
  - Set by designer, or use auto set

\* Parameters can be changed by designer. Initial parameters come from Liberty User Guide Vol. 1

\*1: This should be in energy

\*2: Input dependency is not measured
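To make the threshold definitions concrete, here is a minimal sketch that measures propagation and transition delay on sampled waveforms using linear interpolation. In the actual flow these values come from SPICE `.measure` statements; the helper names and the piecewise-linear example waveform below are invented for illustration.

```python
def crossing_time(t, v, level, rising):
    """First time waveform v crosses `level` in the given direction."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        if (rising and v0 < level <= v1) or (not rising and v0 > level >= v1):
            # linear interpolation between the two neighbouring samples
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("no crossing found")

def measure_falling_output(t, vin, vout, vdd, logic=0.5, low=0.2, high=0.8):
    """Delays for a rising input and a falling output (e.g. an inverter)."""
    t_in = crossing_time(t, vin, logic * vdd, rising=True)
    t_out = crossing_time(t, vout, logic * vdd, rising=False)
    t_high = crossing_time(t, vout, high * vdd, rising=False)
    t_low = crossing_time(t, vout, low * vdd, rising=False)
    return {"propagation": t_out - t_in, "transition": t_low - t_high}

# Coarse illustrative waveform (time in ns, voltage in V)
t = [0.0, 1.0, 2.0, 3.0, 4.0]
vin = [0.0, 1.8, 1.8, 1.8, 1.8]
vout = [1.8, 1.8, 0.9, 0.0, 0.0]
result = measure_falling_output(t, vin, vout, vdd=1.8)
print(result)
```

The `logic`, `low`, and `high` defaults mirror the 50%, 20%, and 80% thresholds listed above.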



#### Characterize: sequential cell delay



#### Characterize: recovery removal

- Timing for asynchronous set/reset
  - Important for asynchronous circuit design
  - Recovery: correct operation after the release of set/reset
    - Min. time from set/reset going inactive to the clock edge
  - Removal: correct operation when set/reset is released during the clock
    - Min. time from the clock active edge to set/reset going inactive



# Library setting

#### Common setting for lib.

- Library name
- Prefix, suffix of cells
- Units (volt, current, cap.)
- Power name
- Temperature
- Voltage
- Logical threshold
- High/low threshold
- Operation directory
- Simulator

```
# common settings for library
set_lib_name ROHM180
set_dotlib_name ROHM180.lib
set_verilog_name ROHM180.v
set_cell_name_suffix ROHM180_
set_cell_name_prefix _V1
set_voltage_unit V
set_capacitance_unit pF
set_resistance_unit Ohm
set_current_unit mA
set_leakage_power_unit pW
set_energy_unit fJ
set_time_unit ns
set_vdd_name VDD
set_vss_name VSS
set_pwell_name VPW
set_nwell_name VNW
# characterization conditions
set_process typ
set_temperature 25
set_vdd_voltage 1.8
set_vss_voltage 0
set_pwell_voltage 0
set_nwell_voltage 1.8
set_logic_threshold_high 0.8
set_logic_threshold_low 0.2
set_logic_high_to_low_threshold 0.5
set_logic_low_to_high_threshold 0.5
set_work_dir work
set_simulator /usr/local/bin/ngspice
set_run_sim true
set_mt_sim true
set_supress_message false
set_supress_sim_message false
set_supress_debug_message true
set_energy_meas_low_threshold 0.01
set_energy_meas_high_threshold 0.99
set_energy_meas_time_extent 10
set_operating_conditions PVT_3P5V_25C
```

# Setting for each cell

#### Characterize cond.

- Add cell
- Input slope (in array)
- Output load (in array)
- Netlist
- Timestep\*, sim. end\*
- Flip-Flops need
  - Clock slope\*
  - Setup unit time\*
  - Hold unit time\*
  - \* Parameters can use auto set

Inverter:

<pre>
## add circuit
add_cell -n ROHM18INVP010 -l INV -i A -o Y -f Y=!A
add_slope {0.1 0.7 4.9}
add_load {0.01 0.1 1.0}
add_netlist rohmlib/ROHM18INVP010.sp
add_model rohmlib/model_rohm180.sp
add_simulation_timestep auto
characterize
export
</pre>

Flip-Flop:

<pre>
## add circuit
add_flop -n ROHM18DFP010 -l DFF_PCPU -i DATA -c CLK -o Q -q Q QN -f Q=IQ QN=IQN
add_slope {0.1 0.7 4.9}
add_load {0.01 0.1 1.0}
add_clock_slope auto
add_area 1
add_netlist rohmlib/ROHM18DFP010.sp
add_model rohmlib/model_rohm180.sp
add_simulation_timestep auto
add_simulation_setup_auto
add_simulation_hold_auto
characterize
export
</pre>

Command example: logic function, in/out pins, storage, logic expression, slew/cap. index, simulation time step
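A command line such as `add_cell -n ROHM18INVP010 -l INV -i A -o Y -f Y=!A` is easy to split into options. The parser below is a hypothetical sketch, not the characterizer's own command reader. It assumes single-letter flags, each followed by one or more values, and brace-delimited index lists.

```python
import shlex

def parse_cell_command(line):
    """Split an add_cell/add_flop style line into (command, {flag: [values]})."""
    tokens = shlex.split(line)
    command, options, flag = tokens[0], {}, None
    for tok in tokens[1:]:
        if len(tok) == 2 and tok.startswith("-"):
            flag = tok[1]            # e.g. "n", "l", "i", "o", "f"
            options[flag] = []
        elif flag is not None:
            options[flag].append(tok)
    return command, options

def parse_index(line):
    """Read the numbers between braces, e.g. 'add_slope {0.1 0.7 4.9}'."""
    body = line[line.index("{") + 1:line.index("}")]
    return [float(x) for x in body.split()]

cmd, opts = parse_cell_command("add_cell -n ROHM18INVP010 -l INV -i A -o Y -f Y=!A")
print(cmd, opts)
print(parse_index("add_slope {0.1 0.7 4.9}"))
```

A flag with several values, such as `-f Q=IQ QN=IQN`, simply collects them into one list.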

## Netlist gen. and simulation (comb. cell)

- *characterizeFiles()*: definition of logic function and its truth table
- *runCombInnOutm()*: set input/output pin setting (*n* inputs, *m* outputs)
- *runSpiceCombDelay()*: generate netlist, run SPICE
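The netlist generation step can be pictured with the sketch below, which builds an ngspice deck for one rising-input, falling-output arc of an inverter. It is not the body of *runSpiceCombDelay()*: the function name, node names, subcircuit pin order, and the interpretation of the slope as a full-swing ramp time are all assumptions made for illustration, and well pins are omitted.

```python
def make_comb_testbench(cell, model, netlist, vdd, slope_ns, load_pf,
                        logic=0.5, low=0.2, high=0.8, t_end_ns=10.0):
    """Return ngspice deck text for one inverter-like delay measurement."""
    t0 = 1.0  # ns, start of the input ramp (arbitrary choice)
    lines = [
        f"* testbench for {cell}",
        f".include '{model}'",
        f".include '{netlist}'",
        f"VDD vdd 0 DC {vdd:g}",
        f"VIN in 0 PWL(0 0 {t0:g}n 0 {t0 + slope_ns:g}n {vdd:g})",
        f"XDUT in out vdd 0 {cell}",  # pin order is an assumption
        f"CLOAD out 0 {load_pf:g}p",
        f".tran {slope_ns / 100:g}n {t_end_ns:g}n",
        # propagation delay: input 50% to output 50%
        f".measure tran prop_delay trig v(in) val={logic * vdd:g} rise=1 "
        f"targ v(out) val={logic * vdd:g} fall=1",
        # transition delay: output 80% to output 20% on a falling edge
        f".measure tran trans_delay trig v(out) val={high * vdd:g} fall=1 "
        f"targ v(out) val={low * vdd:g} fall=1",
        ".end",
    ]
    return "\n".join(lines) + "\n"

deck = make_comb_testbench("ROHM18INVP010", "rohmlib/model_rohm180.sp",
                           "rohmlib/ROHM18INVP010.sp", 1.8, 0.1, 0.01)
print(deck)
```

The deck could then be written to a file and run in batch mode, for example with `ngspice -b`.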





## Simulation result (combinational)

- (Waveform for debug)
- Read result of .measure statements, create timing and power LUT

Figure: result of .measure statement
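Reading the `.measure` results back and arranging them as a lookup table could look like the sketch below. It assumes that the simulator log contains lines of the form `name = value ...`; the regular expression, function names, and sample log lines are illustrative and are not taken from the characterizer.

```python
import re

# Matches "name = 1.234e-10 ..." at the start of a line
MEAS_RE = re.compile(r"^\s*(\w+)\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")

def parse_measures(log_text):
    """Collect {measure_name: value} from a simulator log."""
    results = {}
    for line in log_text.splitlines():
        m = MEAS_RE.match(line)
        if m:
            results[m.group(1)] = float(m.group(2))
    return results

def build_lut(slopes, loads, logs, name):
    """Rows indexed by input slope, columns by output load."""
    return [[parse_measures(logs[(s, c)])[name] for c in loads] for s in slopes]

# Made-up log fragments for a 1x2 table
logs = {
    (0.1, 0.01): "prop_delay           =  1.234560e-10 targ=  1.17e-09 trig=  1.05e-09\n",
    (0.1, 0.1):  "prop_delay           =  4.500000e-10 targ=  1.50e-09 trig=  1.05e-09\n",
}
lut = build_lut([0.1], [0.01, 0.1], logs, "prop_delay")
print(lut)
```

The nested list maps directly onto the `values` rows of a Liberty timing table.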



## Netlist gen. and simulation (seq. cell)

- *characterizeFiles()*: definition of logic function and its truth table
- *runFlop()*: set input/output pin setting
- *runSpiceFlopDelay()*: generate netlist, run SPICE



### Simulation result (sequential)

- ngspice example at $D(0\rightarrow 1)$, $CLK(0\rightarrow 1\rightarrow 0\rightarrow 1)$, and $Q(X\rightarrow 0\rightarrow 1)$
  - 1<sup>st</sup> CLK: set initial value, 2<sup>nd</sup> CLK: measure D2Q delay
- (Waveform for debug)
- Read result of .measure statements, create timing and power LUT





## **Registered logic family**

#### Support simple logic functions and several Flip-Flops

| Family      | Logic function      |
|-------------|---------------------|
| Inv/Buf     | Inverter, Buffer    |
| NAND        | NAND2, NAND3, NAND4 |
| NOR         | NOR2, NOR3, NOR4    |
| AND         | AND2, AND3, AND4    |
| OR          | OR2, OR3, OR4       |
| And-Or-Inv. | AOI21, AOI22        |
| Or-And-Inv. | OAI21, OAI22        |
| Exclusive   | XOR2, XNOR2         |
| Selector    | SEL2                |

| DFF code      | clk  | polarity\*1 | set  | rst  |
|---------------|------|-------------|------|------|
| DFF_PCPU\*2   | pos. | pos.        |      |      |
| DFF_PCNU      | pos. | neg.        |      |      |
| DFF_NCPU      | neg. | pos.        |      |      |
| DFF_NCNU      | neg. | neg.        |      |      |
| DFF_PCPU_NR   | pos. | pos.        |      | neg. |
| DFF_PCPU_NRNS | pos. | pos.        | neg. | neg. |

\*1: Pos.: in/out are in the same direction (H/H, L/L). Neg.: in/out are opposite (H/L, L/H)

\*2: D-Flip-Flop w/ pos. clock edge, positive polarity
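The DFF codes appear to follow a regular naming pattern. The sketch below decodes a code under the assumption, inferred from footnote \*2 and the table rather than from the tool's documentation, that the four letters after `DFF_` give the clock edge and polarity (`P`/`N`), and that an optional suffix lists reset (`R`) and set (`S`) with their active levels. The function name is hypothetical.

```python
SIGN = {"P": "pos.", "N": "neg."}

def decode_dff_code(code):
    """Decode e.g. 'DFF_PCPU_NRNS' under the assumed naming convention."""
    parts = code.split("_")
    if parts[0] != "DFF" or len(parts) < 2 or len(parts[1]) != 4:
        raise ValueError(f"unexpected DFF code: {code}")
    core = parts[1]                      # e.g. "PCPU"
    info = {"clk": SIGN[core[0]], "polarity": SIGN[core[2]],
            "set": None, "rst": None}
    if len(parts) > 2:
        ctrl = parts[2]                  # e.g. "NR" or "NRNS"
        for i in range(0, len(ctrl), 2):
            level, kind = ctrl[i], ctrl[i + 1]
            info["rst" if kind == "R" else "set"] = SIGN[level]
    return info

print(decode_dff_code("DFF_PCPU_NRNS"))
print(decode_dff_code("DFF_NCNU"))
```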

#### **Evaluation setup**

CentOS 7.8, Ryzen 2990WX 3 GHz 32 cores, 96 GB memory, 3 TB SSD

#### Use commercial 180 nm process for evaluation

Common conditions:

| Item               | Setting                          |
|--------------------|----------------------------------|
| Condition          | Typical (TT, 1.8 V, 25°C)        |
| Input slew         | 0.01 ns, 0.1 ns, 1.0 ns          |
| Output load        | 0.1 pF, 0.7 pF, 4.9 pF           |
| High/low threshold | 1.44 V, 0.36 V (80%, 20%)        |
| Logic threshold    | 0.9 V (50%)                      |
| Device under test  | Inverter, DFF w/ pos. edge clock |

Comparison:

|              | Proposed characterizer | SiliconSmart          | SiliconSmart           |
|--------------|------------------------|-----------------------|------------------------|
| Simulator    | ngspice                | hspice                | hspice                 |
| #parallelism | 1                      | 1                     | 32                     |
| Runtime      | 147 min.               | 65 min. (2.3x faster) | 41 sec. (215x faster)  |

Reasons for the long runtime: low parallelism, poor search algorithm, and separate timing and power simulations (ngspice does not support nesting of .meas)
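Since every (slew, load) point is an independent simulation, parallelism is a natural first remedy. The sketch below distributes the points over a thread pool; threads suffice when each task mostly waits on an external simulator process. `run_point` is a stand-in with a made-up delay formula, not a real SPICE launch, and none of these names come from the characterizer.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_point(point):
    """Stand-in for one simulator run; returns ((slope, load), delay)."""
    slope, load = point
    delay = 0.05 + 0.3 * slope + 2.0 * load   # arbitrary illustrative model
    return (slope, load), delay

def characterize_parallel(slopes, loads, workers=4):
    """Evaluate all (slope, load) points concurrently and collect a dict."""
    points = list(product(slopes, loads))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_point, points))

table = characterize_parallel([0.1, 0.7], [0.01, 0.1])
print(table)
```

In a real flow, `run_point` would write a testbench, call the simulator through `subprocess`, and parse the resulting log.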

## Result (comb. cell)

- No difference between simulators (ngspice, HSPICE)
- The characterizers show somewhat different results (both with HSPICE)
  - Propagation delay and transition delay are almost the same (max. error 0.5%, 24%)
  - Internal energy: large difference in the large-slew case (max. 418%)
  - Leakage power: matches
  - Input cap.: 10% difference



TABLE II: Capacitance and leakage power of Inverter.

| Characterizer          | libretto | libretto | SiliconSmart |
|------------------------|----------|----------|--------------|
| Simulator              | ngspice  | hspice   | hspice       |
| Leakage power [pW]     | 9.862    | 9.862    | 9.862        |
| Input capacitance [fF] | 4.120    | 4.063    | 4.599        |

## Result (seq. cell)

#### Different result by characterizer

- C2Q delay: 1125%, 2407% pessimistic
- Cause: setup/hold interdependence
  - Tight setup/hold worsens the other metrics (C2Q, hold/setup)
  - Method we use: find min. setup, then measure the others
  - SiliconSmart: balances setup/hold
  - Need to balance setup/hold



## Conclusion

#### Full open characterizer

- Generate timing/power library as .lib
- Used for timing analysis (simulation, STA)
- Support both combinational and sequential cells
  - Combinational cell: register the function as truth table
  - Sequential cell: support asynchronous set/reset
- Evaluate delay, energy, and performance
  - Slow processing speed (1/215), need to use parallelism
  - Combinational cell: delay and energy are close
  - Sequential cell: large gap
- Checked .lib with Synopsys Library Compiler



#### Issues and future work

- Support more logic cells
  - Combinational cells: multi-output cell support
  - Sequential cells: latch, scan support
- Performance improvement
  - Use parallelism
  - Algorithm improvement for accuracy, speedup
- Code: https://github.com/snishizawa/libretto
- Contact: <u>nishizawa@aoni.waseda.jp</u>
- Acknowledgment: VDEC, Univ. of Tokyo; Nihon Synopsys G.K. Funding from Logic Research and Fukuoka Univ.



## Appendix: rights to the Liberty format

#### Liberty format is open

The Liberty User Guide is publicly accessible


Synopsys Technology Access Program (TAP-in) promotes interoperability through open source licensing of interface formats. To access the following Synopsys formats, you must first register and accept the open source license agreement for each format you wish to download.

#### Interconnect Technology Format (ITF)

Provides detailed modeling of interconnect parasitic effects that enables designers to perform accurate parasitic extraction for timing, signal integrity, power and reliability signoff analysis.

#### Liberty

A gate-level modeling technology for timing, noise, power and test behavior that powers Synopsys Design Platform and sign-off tools. Learn more about Liberty and the Liberty Technical Advisory Board by visiting IEEE Industry Standards and Technology Organization.

https://www.synopsys.com/community/interoperability-programs/tap-in.html

#### Japanese, US, and European ASIC makers skirmish with US-based Synopsys over standardization of LSI circuit libraries

Ikutaro Kojima, Nikkei Microdevices

1999.02.17

Nikkei XTECH, 99/02/17, https://xtech.nikkei.com/dm/article/NEWS/20070402/129966/

#### Liberty User Guides and Reference Manual Suite Version 2020.09

The Liberty User Guides and Reference Manual Suite includes the following documents:

- Liberty Release Notes
- Liberty User Guide Volume 1
- Liberty User Guide Volume 2
- Liberty Reference Manual

### What is a characterizer?

- Extract delay and power of combinational and sequential cells
  - Generate SPICE file
  - Simulate w/ SPICE
  - Store the results as .lib
- Commercial tools are available





#### Q1: Why do we need a characterizer?

The fab should provide this information, so why?

- Ans.1: In some cases, a special-function cell drastically improves Power, Performance, and Area (PPA)
  - XOR helps synthesis of a multiplier and reduces area by 60% [a1]
- Ans.2: While a fabrication process is under development, the cell library may not be ready or may need modification

[a1] S. Nishizawa, et al., "Analysis and Comparison of XOR Cell Structures for Low Voltage Circuit Design," in *ISQED*, Mar. 2013, pp. 703–708.

#### Q2: What is the advantage over commercial tools?

- Ans.1: Nothing...
- Ans.2: Price?
  - One fabless company is using this characterizer
- Ans.3: Education?
  - Can check the generated netlist
- Ans.4: For open EDAs, this characterizer helps designers use their own standard cells