# A power router for gridded cell placement

Jiayuan He, Computer Science, UT Austin, Austin, TX, USA, hejy@cs.utexas.edu; Yihang Yang, Electrical Engineering, Yale University, New Haven, CT, USA, yihang.yang@yale.edu; Rajit Manohar, Electrical Engineering, Yale University, New Haven, CT, USA, rajit.manohar@yale.edu

Abstract—Electronic design automation (EDA) tools for asynchronous circuits, especially physical layout tools, are limited. To improve the adoption of asynchronous circuits, Dali, a gridded cell placement flow [1], was proposed. In this flow, the cell height and width can be any integer multiples of two respective grid values, so traditional power grid generation for standard cells does not apply and a dedicated power router is needed. In this work, we develop an open-source, DRC-clean power router that supports the gridded placement approach. The power router consists of two steps: power mesh generation and detailed power routing. The power router is verified by commercial tools and a chip tape-out, and is open source on GitHub [2].

Index Terms—Placement, Routing, Power, Cell

## I. INTRODUCTION

Asynchronous Very-Large-Scale-Integration (VLSI) provides a method for implementing digital computation without a global clock signal to synchronize all operations. Several potential benefits of asynchronous circuits over their synchronous counterparts have been demonstrated, including lower power consumption, elastic pipelining, and robustness to variation. However, the adoption of asynchronous circuits is limited by the lack of dedicated EDA support.

Many asynchronous circuits use customized logic gates to achieve small area and high performance [3], [4]. Converting these gates to commercial standard cells often increases area and power, and sometimes introduces additional timing constraints, resulting in an inferior final circuit [3], [5]. Therefore, asynchronous circuit developers usually adopt either a full-custom design methodology [6]–[8] or a hybrid approach [5], [9]–[11].

The full-custom design methodology can achieve state-of-the-art power and performance, but it is infeasible for most circuit designs because of its long design time and high design cost. The hybrid approach manually creates a custom standard cell library and uses commercial EDA tools for layout. Previous studies have shown low spatial utilization in custom cell designs [5], because the standard cell height is determined by the worst-case height across all the customized cells, and smaller cells have to be expanded to that height.

To reduce the wasted area in the small cells, a gridded cell approach for asynchronous circuits was proposed [1]. A gridded cell is similar to a standard cell, with two differences: (i) the cell height is flexible and can be any integer multiple of a grid value, which is set to the metal track pitch in our design; and (ii) the VDD and GND pins are not at the top or bottom edge of the cell. The flexible cell height provides new degrees of freedom and can potentially contribute to better spatial utilization.
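The area argument above can be illustrated with a small arithmetic sketch. The cell dimensions and the grid value below are hypothetical and chosen only for illustration; the helper names `gridded_height` and `total_cell_area` are likewise not part of Dali or the power router.

```python
def gridded_height(raw_height, grid):
    """Smallest integer multiple of `grid` that is >= raw_height."""
    return -(-raw_height // grid) * grid  # integer ceiling division


def total_cell_area(cells, grid=None):
    """Sum of cell areas for a list of (width, raw_height) pairs.

    grid=None models the custom standard cell library: every cell is
    stretched to the worst-case height. Otherwise each cell height is
    rounded up to a multiple of `grid`.
    """
    if grid is None:
        standard_height = max(h for _, h in cells)
        return sum(w * standard_height for w, _ in cells)
    return sum(w * gridded_height(h, grid) for w, h in cells)


# Hypothetical cells as (width, minimum height needed), in arbitrary units.
cells = [(4, 7), (6, 12), (3, 20)]
print(total_cell_area(cells))          # all cells stretched to height 20
print(total_cell_area(cells, grid=4))  # heights rounded up to 8, 12, 20
```

In this toy example the small cells dominate the waste: only the tallest cell keeps the same area under both schemes.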

The gridded cell approach introduces two additional problems for placement: (i) well legalization, since simply abutting cells with different heights can cause design rule errors; and (ii) power delivery, since cells with different heights cannot auto-connect to the power/ground metal straps as they do in the standard cell approach. In this work, we present a solution to the power delivery problem in the form of a dedicated power supply router.

Dali adopts a cluster-based approach to the well legalization problem. Figure 1 (a) and (b) show the result before and after well legalization, respectively. In clustering, the cells are organized into "mini-rows", and mini-rows of the same length are stacked vertically into a column. The cells in a mini-row have their N-wells on the same side and their P-wells on the other side. Because of the clustering, the dedicated power routing must adapt to the height and length of the mini-rows in the placement.

Our power router consists of two steps. First, it generates a power mesh according to the placement provided by Dali, as shown in Figure 1 (c). Then it runs detailed power routing to connect pins to the closest straps, as shown in Figure 1 (d).

To verify the result of the power router, we run a commercial detailed router for signal nets, then use a commercial DRC signoff tool to verify that the result is DRC clean. We have also taped out an asynchronous microprocessor (a stack machine) with this flow in a 65nm CMOS process.

## II. BACKGROUND

## A. Standard Cell and Row-based Placement

Standard cell design has two defining features: (i) almost all cells have the same standard height, and (ii) VDD and GND pins are at the top or bottom edge of the cell. Because of these features, row-based placement, also known as standard cell placement, is commonly adopted for standard cell design.

In the row-based approach, rows with the standard height are created in the chip area during floor planning to contain standard cells. Floor planning also generates the N/P-wells and VDD/GND straps for the rows. VDD/GND straps are placed alternately on the boundaries of rows and auto-connect to the VDD/GND pins of standard cells during placement. Row-based placement can achieve a compact physical layout and high spatial utilization if every standard cell conforms to these properties [5].
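As a minimal sketch of the alternating strap arrangement described above, the rails on row boundaries can be enumerated as follows. The function name `row_rails` and the choice of GND on the bottom boundary are assumptions made for illustration, not taken from any particular tool.

```python
def row_rails(num_rows, row_height, bottom_net="GND"):
    """Return (y, net) for each of the num_rows + 1 row boundaries.

    Nets alternate between VDD and GND from one boundary to the next,
    so every row has one VDD rail and one GND rail on its two edges.
    """
    other_net = "VDD" if bottom_net == "GND" else "GND"
    return [
        (i * row_height, bottom_net if i % 2 == 0 else other_net)
        for i in range(num_rows + 1)
    ]


# Three rows of (hypothetical) height 10 share four rails.
for y, net in row_rails(3, 10):
    print(y, net)
```

Because every cell has the same height and its power pins on its edges, no per-cell power routing is needed in this scheme.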



Fig. 1. Placement and power routing result: (a) placement before well legalization, (b) placement after well legalization, (c) power mesh generation, and (d) detailed power routing.

Standard cell placement has been studied extensively. Academic standard cell placers include TimberWolf [12], FengShui5 [13], Capo [14], NTUPlace3 [15], FastPlace [16], SimPL [17], POLAR [18], ePlace [19], and RePlAce [20].

## B. Gridded Cell and Cluster-based Placement

The layout of a gridded cell is similar to its standard cell counterpart, with two differences: the height is flexible, and the VDD/GND pins are not at the top or bottom edge of the cell. These differences introduce two new problems: N/P-well DRC violations and power delivery.

Figure 1 (a) shows the placement of cells after global placement and a legalization step to resolve overlaps, without resolving the N/P-well DRC violations. In this example, N-wells are colored in green and P-wells are colored in yellow. The N/P-wells are disorganized and lead to many DRC violations. To resolve the N/P-well problem, Dali uses a cluster-based placement approach. It first divides the placement region into sub-regions, which are the three columns in Figure 1 (b). Cells are grouped into "mini-rows" within each column. The placement result is shown in Figure 1 (b).

As the cell and mini-row groupings are not known a priori, the power supply routing problem has to be addressed in a separate phase. We present our solution to this problem, which is implemented in a dedicated power routing tool.

## III. POWER ROUTING FOR GRIDDED CELL PLACEMENT

The dedicated power router for gridded cell placement consists of two steps: (i) power mesh generation, to generate power and ground mesh adaptive to the clustering placement, and (ii) detailed power routing, to connect VDD/GND pins in the gridded cell to the closest straps.

## A. Power Mesh Generation

Power mesh generation reads cluster information from Dali and generates a matching mesh. First, it builds a chip-height vertical mesh, placed as pairs of VDD/GND wires between columns. Second, it generates a column-width horizontal mesh between the rows inside each column, alternating between VDD and GND. As shown in Figure 1 (c), these meshes partition the chip area into many rows and provide power delivery to each row.

The metal layer and width of the power mesh can be set by the user. The power mesh is usually wider than signal wires, so DRC violations, especially parallel run length violations, need to be considered. Figure 2 shows the detours on the connection between the horizontal mesh and the vertical mesh.
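The two mesh generation steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the router's actual implementation: the `Column` and `Strap` types, the `pair_pitch` parameter, centering each VDD/GND pair in the gap between adjacent columns, and starting each column with GND at the bottom are all hypothetical choices. Layer selection, strap width, and the DRC-driven detours are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    x_lo: int
    x_hi: int
    row_edges: tuple  # y of each mini-row boundary, bottom to top


@dataclass(frozen=True)
class Strap:
    net: str        # "VDD" or "GND"
    direction: str  # "V" (vertical) or "H" (horizontal)
    pos: int        # x for vertical straps, y for horizontal straps
    lo: int         # start of the strap along its direction
    hi: int         # end of the strap along its direction


def generate_mesh(columns, chip_height, pair_pitch):
    straps = []
    cols = sorted(columns, key=lambda c: c.x_lo)
    half = pair_pitch // 2

    # Step 1: one chip-height VDD/GND pair in the gap between columns.
    for left, right in zip(cols, cols[1:]):
        mid = (left.x_hi + right.x_lo) // 2
        straps.append(Strap("VDD", "V", mid - half, 0, chip_height))
        straps.append(Strap("GND", "V", mid + half, 0, chip_height))

    # Step 2: column-width horizontal straps on mini-row boundaries,
    # alternating between GND and VDD (assumed to start with GND).
    for col in cols:
        for i, y in enumerate(col.row_edges):
            net = "GND" if i % 2 == 0 else "VDD"
            straps.append(Strap(net, "H", y, col.x_lo, col.x_hi))
    return straps


# Two hypothetical columns with mini-rows of different heights.
columns = [Column(0, 100, (0, 30, 50)), Column(120, 200, (0, 40))]
for strap in generate_mesh(columns, chip_height=60, pair_pitch=10):
    print(strap)
```

Note that the horizontal straps in different columns sit at different heights, which is why a single chip-wide set of rails, as in row-based placement, cannot be reused here.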



Fig. 2. The horizontal mesh changes its connection points where a direct connection would cause DRC violations.

As in power design for standard cells, wide high-layer straps are also generated to reduce IR drop. Our power router generates wide horizontal straps on the highest horizontal layer and connects them to the vertical mesh with as many vias as possible. The high-layer straps also simplify the connection to pad frames during layout finishing. The wide high-layer straps are not shown in Figure 1.

## B. Detailed Power Routing

In gridded cell design, the VDD/GND pins are not at the top or bottom edge of the cell, so power routing requires a simple detailed routing step.

The detailed power routing first finds, on each track, the point of each VDD/GND pin closest to the corresponding horizontal mesh, and then connects one of these points to the mesh by pattern routing. The detailed power routing generates standard-width, on-track wires inside a cell while respecting all design rules. It tries the lowest metal layer first to avoid consuming space needed for signal routing; if no solution is found there, higher layers are used. Note that it only routes inside a cell: a solution using the free space of other cells is not allowed. Figure 1 (d) shows an example after detailed power routing.

| bmk   | #cells | #nets | Std. cell area | Std. die area | Std. HPWL | Std. wirelength | Gridded cell area | Gridded die area | Gridded HPWL | Gridded wirelength | Gridded wirelength w/o power routing |
|-------|--------|-------|----------------|---------------|-----------|-----------------|-------------------|------------------|--------------|--------------------|--------------------------------------|
| des1  | 16k    | 14k   | 134689         | 138756        | 216782    | 267017          | 80451             | 115801           | 195721       | 254938             | 250989                               |
| des2  | 19k    | 20k   | 84314          | 86724         | 228879    | 314284          | 51863             | 78792            | 229656       | 330857             | 320505                               |
| des3  | 81k    | 71k   | 754135         | 776949        | 1044571   | 1287450         | 447645            | 643685           | 980099       | 1242999            | 1221752                              |
| des4  | 718k   | 627k  | 6721014        | 6927911       | 8599981   | 10805787        | 3990622           | 5744170          | 8832368      | 11169479           | 10969942                             |
| ratio | -      | -     | 1.00           | 1.00          | 1.00      | 1.00            | 0.60              | 0.85             | 0.97         | 1.00               | 0.98                                 |

TABLE I. Comparison between gridded cell designs and their standard cell counterparts.

The two major DRC violations considered in detailed power routing are spacing to other pins in the same cell and minimum-area violations when using higher layers. In the future, we plan to extend the detailed power routing step with maze routing, which can utilize free space from neighboring cells.
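The detailed routing step can be sketched as follows, under simplifying assumptions that are ours rather than the router's: pins and obstacles are axis-aligned rectangles, tracks are vertical, the only pattern tried is a straight vertical segment from the pin to the strap, and the spacing rule is a single `spacing` value that already includes half the wire width. The names `Rect`, `closest_points`, `segment_is_clear`, and `route_pin` are hypothetical. Minimum-area handling on higher layers is not modeled.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x_lo: int
    y_lo: int
    x_hi: int
    y_hi: int


def closest_points(pin, tracks, strap_y):
    """On each vertical track crossing the pin, the pin point nearest the strap."""
    y = min(max(strap_y, pin.y_lo), pin.y_hi)  # clamp strap_y into the pin
    return [(x, y) for x in tracks if pin.x_lo <= x <= pin.x_hi]


def segment_is_clear(x, y0, y1, obstacles, spacing):
    """True if the vertical segment keeps `spacing` from every obstacle."""
    lo, hi = min(y0, y1), max(y0, y1)
    for ob in obstacles:
        near_in_x = ob.x_lo - spacing < x < ob.x_hi + spacing
        near_in_y = ob.y_lo - spacing < hi and lo < ob.y_hi + spacing
        if near_in_x and near_in_y:
            return False
    return True


def route_pin(pin, tracks, strap_y, obstacles_by_layer, layers, spacing):
    """Try layers from lowest to highest; return (layer, start, end) or None."""
    points = closest_points(pin, tracks, strap_y)
    for layer in layers:
        obstacles = obstacles_by_layer.get(layer, [])
        for x, y in points:
            if segment_is_clear(x, y, strap_y, obstacles, spacing):
                return layer, (x, y), (x, strap_y)
    return None


# Hypothetical cell: a VDD pin below a strap at y = 50, one M1 obstacle.
pin = Rect(10, 20, 30, 25)
blocked = {"M1": [Rect(8, 30, 16, 40)]}
print(route_pin(pin, [0, 12, 24, 36], 50, blocked, ["M1", "M2"], spacing=2))
```

In this example the track at x = 12 is blocked on M1, so the sketch connects on the next track that crosses the pin. Only when every track is blocked on a layer does it move up to the next one, mirroring the lowest-layer-first preference described above.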

## IV. EXPERIMENTAL RESULTS

To validate the correctness of the power router, we run a commercial detailed router for signal nets after power routing, then use a commercial DRC signoff tool to verify that the result is DRC clean. We have also taped out an asynchronous microprocessor (a stack machine) with this flow in a 65nm CMOS process.

## A. Comparison with the Standard Cell Approach

Table I shows the results of the end-to-end flow (Dali + power router). Among the benchmarks, des2 is a RISC-V microprocessor, and des1, des3, and des4 are asynchronous stack-based microprocessors generated with different parameters. The baseline standard cell approach uses a commercial placer and router, and its results are normalized to 1.

The gridded cell approach achieves 15% less die area because of its flexible gridded height. Since the HPWL and wirelength of the gridded cell approach are very close to those of the standard cell approach, the gridded cell flow introduces negligible loss of routing quality for asynchronous designs. The last column presents the wirelength without the power router, i.e., with the commercial detailed router run directly after placement. These results show that the power router introduces about 2% extra wirelength because it uses part of the routing space.
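The ratio row of Table I can be reproduced from the per-benchmark numbers. The source does not state how the ratios are aggregated; the sketch below assumes the arithmetic mean of per-benchmark ratios, with the last column compared against the standard cell wirelength. Both are our assumptions, though they are consistent with the reported values.

```python
# Standard cell approach: (cell area, die area, HPWL, wirelength)
STANDARD = {
    "des1": (134689, 138756, 216782, 267017),
    "des2": (84314, 86724, 228879, 314284),
    "des3": (754135, 776949, 1044571, 1287450),
    "des4": (6721014, 6927911, 8599981, 10805787),
}

# Gridded cell approach:
# (cell area, die area, HPWL, wirelength, wirelength w/o power routing)
GRIDDED = {
    "des1": (80451, 115801, 195721, 254938, 250989),
    "des2": (51863, 78792, 229656, 330857, 320505),
    "des3": (447645, 643685, 980099, 1242999, 1221752),
    "des4": (3990622, 5744170, 8832368, 11169479, 10969942),
}


def mean_ratios():
    """Mean gridded/standard ratio per column, rounded to two decimals."""
    ratios = []
    for col in range(5):
        base_col = min(col, 3)  # both wirelength columns use standard wirelength
        per_design = [GRIDDED[d][col] / STANDARD[d][base_col] for d in STANDARD]
        ratios.append(round(sum(per_design) / len(per_design), 2))
    return ratios


print(mean_ratios())  # [0.6, 0.85, 0.97, 1.0, 0.98]
```

Under this reading, the 15% die area reduction and the roughly 2% wirelength overhead of power routing follow directly from the second and the last two entries.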

## B. An Example Power Routing Result

Figure 3 shows the local view and the global view of a power routing result. In (a), the two wide orange wires are the highest-layer straps to reduce IR drop. Vertical red wires and horizontal yellow wires are power mesh for clustering. The short green vertical wires are standard-width wires to connect VDD/GND pins to the mesh.



Fig. 3. Screenshots of a power routing result

## V. CONCLUSION

This paper presents an open-source power router for the gridded cell placement flow. To adapt to the features of gridded cell layout and placement, the power router consists of two steps: power mesh generation and detailed power routing. It generates a power mesh according to the structure of the cluster-based placement and connects VDD/GND pins to the corresponding mesh. The power router is verified to be DRC clean by commercial tools and contributed to a chip tape-out in 65nm technology.

In the future, we plan to update the power router with more design rules for newer technologies and to implement maze routing in detailed power routing to leverage the free space in neighboring cells.

## REFERENCES

- [1] Y. Yang, J. He, and R. Manohar, "Dali: A gridded cell placement flow," in 2020 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2020.
- [2] https://github.com/asyncvlsi/PWRoute.
- [3] P. A. Beerel, R. O. Ozdag, and M. Ferretti, A designer's guide to asynchronous VLSI. Cambridge University Press, 2010.
- [4] A. M. Lines, "Pipelined asynchronous circuits," 1998.
- [5] R. Karmazin, C. T. O. Otero, and R. Manohar, "celltk: Automated layout for asynchronous circuits with nonstandard cells," in 2013 IEEE 19th International Symposium on Asynchronous Circuits and Systems. IEEE, 2013, pp. 58–66.
- [6] A. J. Martin, A. Lines, R. Manohar, M. Nystrom, P. Penzes, R. Southworth, U. Cummings, and T. K. Lee, "The design of an asynchronous mips r3000 microprocessor," in *Proceedings Seventeenth Conference on Advanced Research in VLSI*. IEEE, 1997, pp. 164–181.
- [7] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura *et al.*, "A million spiking-neuron integrated circuit with a scalable communication network and interface," *Science*, vol. 345, no. 6197, pp. 668–673, 2014.

- [8] S. M. Sait and H. Youssef, VLSI physical design automation: theory and practice. World Scientific Publishing Company, 1999, vol. 6.
- [9] P. A. Beerel, G. D. Dimou, and A. M. Lines, "Proteus: An ASIC flow for GHz asynchronous designs," *IEEE Design and Test of Computers*, vol. 28, no. 5, pp. 36–51, 2011.
- [10] M. Moreira, B. Oliveira, J. Pontes, and N. Calazans, "A 65nm standard cell set and flow dedicated to automated asynchronous circuits design," in 2011 IEEE International SOC Conference. IEEE, 2011, pp. 99–104.
- [11] M. Trevisan, M. Arendt, A. Ziesemer, R. Reis, and N. L. V. Calazans, "Automated synthesis of cell libraries for asynchronous circuits," in *Proceedings of the 27th Symposium on Integrated Circuits and Systems Design*, 2014, pp. 1–7.
- [12] C. Sechen and A. Sangiovanni-Vincentelli, "The timberwolf placement and routing package," *IEEE Journal of Solid-State Circuits*, vol. 20, no. 2, pp. 510–522, 1985.
- [13] A. R. Agnihotri, S. Ono, and P. H. Madden, "Recursive bisection placement: Feng shui 5.0 implementation details," in *Proceedings of the 2005 international symposium on Physical design*, 2005, pp. 230–232.
- [14] J. A. Roy, D. A. Papa, S. N. Adya, H. H. Chan, A. N. Ng, J. F. Lu, and I. L. Markov, "Capo: robust and scalable open-source mincut floorplacer," in *Proceedings of the 2005 international symposium on Physical design*, 2005, pp. 224–226.
- [15] T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W. Chang, "Ntuplace3: An analytical placer for large-scale mixed-size designs with preplaced blocks and density constraints," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, no. 7, pp. 1228–1240, 2008.
- [16] N. Viswanathan and C.-N. Chu, "Fastplace: efficient analytical placement using cell shifting, iterative local refinement, and a hybrid net model," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 24, no. 5, pp. 722–733, 2005.
- [17] M.-C. Kim, D.-J. Lee, and I. L. Markov, "Simpl: An effective placement algorithm," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 31, no. 1, pp. 50–60, 2011.
- [18] T. Lin, C. Chu, J. R. Shinnerl, I. Bustany, and I. Nedelchev, "Polar: Placement based on novel rough legalization and refinement," in 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2013, pp. 357–362.
- [19] J. Lu, P. Chen, C.-C. Chang, L. Sha, J. Dennis, H. Huang, C.-C. Teng, and C.-K. Cheng, "eplace: Electrostatics based placement using nesterov's method," in 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 2014, pp. 1–6.
- [20] C.-K. Cheng, A. B. Kahng, I. Kang, and L. Wang, "Replace: Advancing solution quality and routability validation in global placement," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 38, no. 9, pp. 1717–1730, 2018.