Engineering Electrical and Electronic Engineering

VLSI and FPGA Design Techniques

Description

This cluster of papers focuses on the design, optimization, and challenges of field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). It covers topics such as FPGA architecture, analog circuit design, graph partitioning for parallel computing, power optimization, CMOS design, and dynamic load balancing in computational mechanics.

Keywords

FPGA; ASIC; Placement; Routing; Analog Circuits; Graph Partitioning; Power Optimization; CMOS Design; Dynamic Load Balancing; Geometric Programming

Abstract In this article, we use orthogonal arrays (OAs) to construct Latin hypercubes. Besides preserving the univariate stratification properties of Latin hypercubes, these strength-r OA-based Latin hypercubes also stratify each r-dimensional margin. Therefore, such OA-based Latin hypercubes provide more suitable designs for computer experiments and numerical integration than do general Latin hypercubes. We prove that when used for integration, the sampling scheme with OA-based Latin hypercubes offers a substantial improvement over Latin hypercube sampling. Key Words: Computer experiments; Numerical integration; Orthogonal arrays
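Constructing the OA-based designs requires an orthogonal array, but the univariate stratification property that they preserve is easy to see in a plain Latin hypercube. The sketch below (not from the paper) generates one: with n points, every dimension gets exactly one point per width-1/n stratum.

```python
import random

def latin_hypercube(n, d, rng):
    """Plain Latin hypercube sample: n points in [0,1]^d with exactly one
    point per 1/n-width stratum in every single dimension."""
    sample = []
    # For each dimension, a random permutation of the n strata.
    perms = [rng.sample(range(n), n) for _ in range(d)]
    for i in range(n):
        # jitter each point uniformly inside its assigned stratum
        point = [(perms[j][i] + rng.random()) / n for j in range(d)]
        sample.append(point)
    return sample

rng = random.Random(0)
pts = latin_hypercube(8, 2, rng)
```

Strength-r OA-based Latin hypercubes additionally guarantee this balance on every r-dimensional margin, which a plain Latin hypercube does not.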
Partitions of the set of blocks of a computer logic graph, also called a block graph, into subsets called modules demonstrate that a two-region relationship exists between P, the average number of pins per module, and B, the average number of blocks per module. In the first region, P = KB^r, where K is the average number of pins per block and 0.57 ≤ r ≤ 0.75. In the second region, that is, where the number of modules is small (i.e., 1-5), P is less than predicted by the above formula and is given by a more complex relationship. These conclusions resulted from controlled partitioning experiments performed using a computer program to partition four logic graphs varying in size from 500 to 13 000 circuits representing three different computers. The size of a block varied from one NOR circuit in one of the block graphs to a 30-circuit chip in one of the other block graphs.
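The first-region relationship (now known as Rent's rule) is a one-line power law; a minimal sketch with illustrative constants (not the paper's measured data):

```python
def pins_per_module(K, B, r):
    """First-region pin estimate P = K * B**r.
    K: average pins per block, B: average blocks per module,
    r: Rent exponent, empirically 0.57 <= r <= 0.75 in the paper."""
    return K * B ** r

# Illustrative values: 3.5 pins per block, 64 blocks per module, r = 0.6.
P = pins_per_module(3.5, 64, 0.6)
```

The sublinear exponent is the point: quadrupling the blocks per module far less than quadruples the pins, which is why partitioning onto larger modules saves interconnect.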
In the layout design of LSI chips, channel routing is one of the key problems. The problem is to route a specified net list between two rows of terminals across a two-layer channel. Nets are routed with horizontal segments on one layer and vertical segments on the other. Connections between two layers are made through via holes. Two new algorithms are proposed. These algorithms merge nets instead of assigning horizontal tracks to individual nets. The algorithms were coded in Fortran and implemented on a VAX 11/780 computer. Experimental results are quite encouraging. Both programs generated optimal solutions in 6 out of 8 cases, using examples in previously published papers. The computation times of the algorithms for a typical channel (300 terminals, 70 nets) are 1.0 and 2.1 s, respectively.
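The baseline the paper improves on assigns horizontal tracks to individual nets; the classic left-edge procedure for that can be sketched as below. This simplified version packs each net's horizontal interval greedily and ignores vertical constraints, which the paper's net-merging algorithms handle.

```python
def left_edge_tracks(nets):
    """Greedy left-edge assignment: each net is a horizontal interval
    (left, right); pack intervals into the fewest tracks so that
    intervals on one track never overlap. Vertical constraints between
    nets are ignored in this sketch."""
    tracks = []       # rightmost occupied column of each track
    assignment = {}
    for name, (left, right) in sorted(nets.items(), key=lambda kv: kv[1][0]):
        for t, rightmost in enumerate(tracks):
            if left > rightmost:       # fits after the track's last interval
                tracks[t] = right
                assignment[name] = t
                break
        else:
            tracks.append(right)       # open a new track
            assignment[name] = len(tracks) - 1
    return assignment, len(tracks)

nets = {"a": (0, 3), "b": (4, 7), "c": (2, 5), "d": (6, 9)}
assign, n_tracks = left_edge_tracks(nets)
```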
Abstract An algorithm for solving the Steiner problem on a finite undirected graph is presented. This algorithm computes the set of graph arcs of minimum total length needed to connect a specified set of k graph nodes. If the entire graph contains n nodes, the algorithm requires time proportional to n^3/2 + n^2(2^(k−1) − k − 1) + n(3^(k−1) − 2^k + 3)/2. The time requirement above includes the term n^3/2, which can be eliminated if the set of shortest paths connecting each pair of nodes in the graph is available.
Efficient algorithms are presented for partitioning a graph into connected components, biconnected components and simple paths. The algorithm for partitioning a graph into simple paths is iterative, and each iteration produces a new path between two vertices already on paths. (The start vertex can be specified dynamically.) If V is the number of vertices and E is the number of edges, each algorithm requires time and space proportional to max(V, E) when executed on a random access computer.
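The simplest of the three partitions, connected components, can be sketched with an iterative depth-first search in the stated O(max(V, E)) time and space:

```python
def connected_components(n, edges):
    """Partition vertices 0..n-1 into connected components with an
    iterative depth-first search; linear in vertices plus edges."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    comp = [-1] * n   # component label per vertex, -1 = unvisited
    c = 0
    for s in range(n):
        if comp[s] != -1:
            continue
        stack = [s]
        comp[s] = c
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1
    return comp

labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```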
M. R. Garey and D. S. Johnson (Bell Laboratories, 600 Mountain Ave, Murray Hill, NJ). "Strong" NP-Completeness Results: Motivation, Examples, and Implications. Journal of the ACM, Volume 25, Issue 3, July 1978, pp. 499–508. https://doi.org/10.1145/322077.322090
An optimum rectilinear Steiner tree for a set A of points in the plane is a tree which interconnects A using horizontal and vertical lines of shortest possible total length. Such trees correspond to single net wiring patterns on printed backplanes which minimize total wire length. We show that the problem of determining this minimum length, given A, is $NP$-complete. Thus the problem of finding optimum rectilinear Steiner trees is probably computationally hopeless, and the emphasis of the literature for this problem on heuristics and special case algorithms is well justified. A number of intermediary lemmas concerning the $NP$-completeness of certain graph-theoretic problems are proved and may be of independent interest.
In array processors it is important to map problem modules onto processors such that modules that communicate with each other lie, as far as possible, on adjacent processors. This mapping problem is formulated in graph theoretic terms and shown to be equivalent, in its most general form, to the graph isomorphism problem. The problem is also very similar to the bandwidth reduction problem for sparse matrices and to the quadratic assignment problem.
Algorithms for finding shortest paths are presented which are faster than algorithms previously known on networks which are relatively sparse in arcs. Known results which the results of this paper extend are surveyed briefly and analyzed. A new implementation for priority queues is employed, and a class of "arc set partition" algorithms is introduced. For the single source problem on networks with nonnegative arcs a running time of O(min(n^(1+1/k) + e, (n + e) log n)) is achieved, where there are n nodes and e arcs, and k is a fixed integer satisfying k > 0. This bound is O(e) on dense networks. For the single source and all pairs problem on unrestricted networks the running time is O(min(n^(2+1/k) + ne, n^2 log n + ne log n)).
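The paper's bounds come from specialized priority-queue implementations; the familiar binary-heap version of Dijkstra's algorithm, which already achieves O((n + e) log n) for the nonnegative single-source case, can be sketched as:

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths with nonnegative arc lengths using a
    binary-heap priority queue: O((n + e) log n).
    adj[u] is a list of (v, length) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale queue entry, skip it
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
dist = dijkstra(adj, 0)
```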
Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc. of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445--452; Hendrickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301, Sandia National Laboratories, Albuquerque, NM, 1993]. From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan--Lin (KL) algorithm for refining during uncoarsening. We test our scheme on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that our scheme produces partitions that are consistently better than those produced by spectral partitioning schemes in substantially smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices, it produces orderings that have substantially smaller fill than the widely used multiple minimum degree algorithm.
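One coarsening pass of the heavy-edge idea can be sketched as a matching: visit each vertex and pair it with the still-unmatched neighbour that shares the heaviest edge, so the heaviest edges get collapsed first. This is a simplified sketch, not the paper's full implementation.

```python
def heavy_edge_matching(n, wedges):
    """One heavy-edge coarsening pass: match each unmatched vertex with
    the unmatched neighbour sharing the heaviest edge. Returns match[v]
    (v itself if it stays single). wedges: (u, v) with u < v -> weight."""
    adj = {v: [] for v in range(n)}
    for (u, v), w in wedges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    match = list(range(n))
    matched = [False] * n
    for u in range(n):
        if matched[u]:
            continue
        best, best_w = None, 0
        for v, w in adj[u]:
            if not matched[v] and v != u and w > best_w:
                best, best_w = v, w
        if best is not None:
            match[u], match[best] = best, u
            matched[u] = matched[best] = True
    return match

wedges = {(0, 1): 1, (0, 2): 5, (1, 3): 4, (2, 3): 1}
match = heavy_edge_matching(4, wedges)
```

Each matched pair is then collapsed into one coarse vertex, keeping heavy edges internal so the coarse graph's cut tracks the fine graph's cut.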
In this paper, we present a new hypergraph-partitioning algorithm that is based on the multilevel paradigm. In the multilevel paradigm, a sequence of successively coarser hypergraphs is constructed. A bisection of the smallest hypergraph is computed and it is used to obtain a bisection of the original hypergraph by successively projecting and refining the bisection to the next level finer hypergraph. We have developed new hypergraph coarsening strategies within the multilevel framework. We evaluate their performance both in terms of the size of the hyperedge cut on the bisection, as well as on the run time for a number of very large scale integration circuits. Our experiments show that our multilevel hypergraph-partitioning algorithm produces high-quality partitioning in a relatively small amount of time. The quality of the partitionings produced by our scheme is on average 6%-23% better than those produced by other state-of-the-art schemes. Furthermore, our partitioning algorithm is significantly faster, often requiring 4-10 times less time than that required by the other schemes. Our multilevel hypergraph-partitioning algorithm scales very well for large hypergraphs. Hypergraphs with over 100 000 vertices can be bisected in a few minutes on today's workstations. Also, on the large hypergraphs, our scheme outperforms other schemes (in hyperedge cut) quite consistently with larger margins (9%-30%).
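The objective being minimized here, the hyperedge cut of a bisection, is simply the number of hyperedges with vertices on both sides, and can be computed as:

```python
def hyperedge_cut(hyperedges, part):
    """Count hyperedges spanning both sides of a bisection.
    hyperedges: list of vertex lists; part: dict vertex -> 0 or 1."""
    cut = 0
    for net in hyperedges:
        sides = {part[v] for v in net}
        if len(sides) > 1:     # net touches both blocks: it is cut
            cut += 1
    return cut

nets = [[0, 1, 2], [2, 3], [3, 4], [0, 4]]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
cut = hyperedge_cut(nets, part)
```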
Graph partitioning is a fundamental problem in many scientific settings. This document describes the capabilities and operation of Chaco, a software package designed to partition graphs. Chaco allows for recursive application of any of several different methods for finding small edge separators in weighted graphs. These methods include inertial, spectral, Kernighan-Lin and multilevel methods in addition to several simpler strategies. Each of these methods can be used to partition the graph into two, four or eight pieces at each level of recursion. In addition, the Kernighan-Lin method can be used to improve partitions generated by any of the other methods. Brief descriptions of these methods are provided, along with references to relevant literature. The user interface, input/output formats and appropriate settings for a variety of code parameters are discussed in detail, and some suggestions on algorithm selection are offered.
A Steiner minimal tree for given points $A_1 , \cdots ,A_n $ in the plane is a tree which interconnects these points using lines of shortest possible total length. In order to achieve minimum length the Steiner minimal tree may contain other vertices (Steiner points) beside $A_1 , \cdots ,A_n $. We find conditions which simplify the task of constructing a Steiner minimal tree. Some of these use relationships with the easily constructed (ordinary) minimal tree which achieves minimum length among all trees having only $A_1 , \cdots ,A_n $ as vertices. Other questions concern the relative lengths of these two trees in extreme or typical cases. A review of the existing literature is included.
CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AA^T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLAB interfaces. It appears in MATLAB 7.2 as x = A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.
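CHOLMOD's sparse supernodal factorization is far beyond a sketch, but the factor-then-solve pattern it implements can be illustrated with a minimal dense Cholesky (A = LL^T) plus forward/backward substitution:

```python
import math

def cholesky(A):
    """Dense Cholesky factor L (lower triangular) with A = L L^T.
    A must be symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve(L, b):
    """Solve L L^T x = b: forward substitution then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A = [[4.0, 2.0], [2.0, 3.0]]
x = solve(cholesky(A), [8.0, 7.0])
```

The sparse case adds fill-reducing ordering and supernode detection on top of exactly this arithmetic.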
A unified and powerful approach is presented for devising polynomial approximation schemes for many strongly NP-complete problems. Such schemes consist of families of approximation algorithms for each desired performance bound on the relative error ε > 0, with running time that is polynomial when ε is fixed. Though the polynomiality of these algorithms depends on the degree of approximation ε being fixed, they cannot be improved, owing to a negative result stating that there are no fully polynomial approximation schemes for strongly NP-complete problems unless NP = P. The unified technique that is introduced here, referred to as the shifting strategy, is applicable to numerous geometric covering and packing problems. The method of using the technique and how it varies with problem parameters are illustrated. A similar technique, independently devised by B. S. Baker, was shown to be applicable for covering and packing problems on planar graphs.
The field programmable gate-array (FPGA) has become an important technology in VLSI ASIC designs. In the past few years, a number of heuristic algorithms have been proposed for technology mapping in lookup-table (LUT) based FPGA designs, but none of them guarantees optimal solutions for general Boolean networks, and little is known about how far their solutions are from the optimal ones. This paper presents a theoretical breakthrough which shows that the LUT-based FPGA technology mapping problem for depth minimization can be solved optimally in polynomial time. A key step in our algorithm is to compute a minimum-height K-feasible cut in a network, which is solved optimally in polynomial time based on network flow computation. Our algorithm also effectively minimizes the number of LUTs by maximizing the volume of each cut and by several post-processing operations. Based on these results, we have implemented an LUT-based FPGA mapping package called FlowMap. We have tested FlowMap on a large set of benchmark examples and compared it with other LUT-based FPGA mapping algorithms for delay optimization, including Chortle-d, MIS-pga-delay, and DAG-Map. FlowMap reduces the LUT network depth by up to 7% and reduces the number of LUTs by up to 50% compared to the three previous methods.
The author addresses the problem of routing connections in a large-scale packet-switched network supporting multipoint communications. He gives a formal definition of several versions of the multipoint problem, including both static and dynamic versions. He looks at the Steiner tree problem as an example of the static problem and considers the experimental performance of two approximation algorithms for this problem. A weighted greedy algorithm is considered for a version of the dynamic problem which allows endpoints to come and go during the life of a connection. One of the static algorithms serves as a reference to measure the performance of the proposed weighted greedy algorithm in a series of experiments.
In this paper, we present a new hypergraph partitioning algorithm that is based on the multilevel paradigm. In the multilevel paradigm, a sequence of successively coarser hypergraphs is constructed. A bisection of the smallest hypergraph is computed and it is used to obtain a bisection of the original hypergraph by successively projecting and refining the bisection to the next level finer hypergraph. We evaluate the performance both in terms of the size of the hyperedge cut on the bisection as well as run time on a number of VLSI circuits. Our experiments show that our multilevel hypergraph partitioning algorithm produces high-quality partitioning in a relatively small amount of time. The quality of the partitionings produced by our scheme is on average 4% to 23% better than those produced by other state-of-the-art schemes. Furthermore, our partitioning algorithm is significantly faster, often requiring 4 to 15 times less time than that required by the other schemes. Our multilevel hypergraph partitioning algorithm scales very well for large hypergraphs. Hypergraphs with over 100,000 vertices can be bisected in a few minutes on today's workstations. Also, on the large hypergraphs, our scheme outperforms other schemes (in hyperedge cut) quite consistently with larger margins (9% to 30%).
Partitioning of circuit netlists in VLSI design is considered. It is shown that the second smallest eigenvalue of a matrix derived from the netlist gives a provably good approximation of the optimal ratio cut partition cost. It is also demonstrated that fast Lanczos-type methods for the sparse symmetric eigenvalue problem are a robust basis for computing heuristic ratio cuts based on the eigenvector of this second eigenvalue. Effective clustering methods are an immediate by-product of the second eigenvector computation and are very successful on the difficult input classes proposed in the CAD literature. The intersection graph representation of the circuit netlist is considered as a basis for partitioning, and a heuristic based on spectral ratio cut partitioning of the netlist intersection graph is proposed. The partitioning heuristics were tested on industry benchmark suites, and the results were good in terms of both solution quality and runtime. Several types of algorithmic speedups and directions for future work are discussed.
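The spectral idea can be sketched on a toy graph: the eigenvector of the second-smallest Laplacian eigenvalue (the Fiedler vector) splits the vertices by sign. The sketch below uses shifted power iteration for simplicity, whereas the paper uses Lanczos-type methods on real netlists.

```python
def fiedler_partition(n, edges, iters=1000):
    """Toy spectral bisection: power iteration on (c*I - L), with the
    all-ones direction projected out, converges to the eigenvector of the
    second-smallest Laplacian eigenvalue; vertex signs give the two sides."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1
    c = 2 * max(deg) + 1.0   # shift so all eigenvalues of c*I - L are > 0
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n    # project out the all-ones eigenvector
        v = [x - mean for x in v]
        # multiply by c*I - L, where (L v)_u = deg[u]*v[u] - sum_{w~u} v[w]
        v = [c * v[u] - (deg[u] * v[u] - sum(v[w] for w in adj[u]))
             for u in range(n)]
        norm = max(abs(x) for x in v) or 1.0
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]

# Two triangles joined by one edge: the spectral split recovers them.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part = fiedler_partition(6, edges)
```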
This survey presents an overview of recent advances in the state of the art for computer-aided design (CAD) tools for analog and mixed-signal integrated circuits (ICs). Analog blocks typically constitute only a small fraction of the components on mixed-signal ICs and emerging systems-on-a-chip (SoC) designs. But due to the increasing levels of integration available in silicon technology and the growing requirement for digital systems to communicate with the continuous-valued external world, there is a growing need for CAD tools that increase the design productivity and improve the quality of analog integrated circuits. This paper describes the motivation and evolution of these tools and outlines progress on the various design problems involved: simulation and modeling, symbolic analysis, synthesis and optimization, layout generation, yield analysis and design centering, and test. This paper summarizes the problems for which viable solutions are emerging and those which are still unsolved.
In this work, we show that the standard graph-partitioning-based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem. The recently proposed successful multilevel framework is exploited to develop a multilevel hypergraph partitioning tool PaToH for the experimental verification of our proposed hypergraph models. Experimental results on a wide range of realistic sparse test matrices confirm the validity of the proposed hypergraph models. In the decomposition of the test matrices, the hypergraph models using PaToH and hMeTiS result in up to 63 percent less communication volume (30 to 38 percent less on the average) than the graph model using MeTiS, while PaToH is only 1.3-2.3 times slower than MeTiS on the average.
We consider the problem of partitioning the nodes of a graph with costs on its edges into subsets of given sizes so as to minimize the sum of the costs on all edges cut. This problem arises in several physical situations, for example, in assigning the components of electronic circuits to circuit boards to minimize the number of connections between boards. This paper presents a heuristic method for partitioning arbitrary graphs which is both effective in finding optimal partitions, and fast enough to be practical in solving large problems.
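The central quantity in this (Kernighan–Lin) heuristic is the gain of swapping a vertex pair across the cut: g = D_a + D_b − 2·c_ab, where D_v is the vertex's external minus internal edge cost. A minimal sketch:

```python
def kl_swap_gain(cost, part, a, b):
    """Kernighan-Lin gain of swapping a (in block 0) with b (in block 1):
    g = D_a + D_b - 2*c_ab, where D_v = external cost - internal cost.
    cost: dict of frozenset({u, v}) -> edge cost."""
    def D(v):
        ext = internal = 0
        for e, w in cost.items():
            if v in e:
                (u,) = e - {v}
                if part[u] == part[v]:
                    internal += w
                else:
                    ext += w
        return ext - internal
    c_ab = cost.get(frozenset({a, b}), 0)
    return D(a) + D(b) - 2 * c_ab

cost = {frozenset({0, 1}): 1, frozenset({0, 2}): 3,
        frozenset({1, 3}): 1, frozenset({2, 3}): 1}
part = {0: 0, 1: 0, 2: 1, 3: 1}
g = kl_swap_gain(cost, part, 0, 2)
```

The full heuristic repeatedly picks the best-gain swap, locks the pair, and keeps the best prefix of swaps per pass, which lets it escape local minima that single greedy swaps cannot.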
A brief introduction is given to the actual mechanics of simulated annealing, and a simple example from an IC layout is used to illustrate how these ideas can be applied. The complexities and tradeoffs involved in attacking a realistically complex design problem are illustrated by dissecting two very different annealing algorithms for VLSI chip floorplanning. Several current research problems aimed at determining more precisely how and why annealing algorithms work are examined. Some philosophical issues raised by the introduction of annealing are discussed.
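The basic annealing loop is short: accept downhill moves always, uphill moves with probability exp(−Δ/T), and cool T geometrically. The toy cost and move set below are illustrative; floorplanning uses far richer moves and cost functions.

```python
import math
import random

def anneal(cost, start, neighbor, rng, T0=10.0, alpha=0.95, steps=500):
    """Generic simulated-annealing loop: accept uphill moves with
    probability exp(-delta/T); geometric cooling schedule."""
    x, T = start, T0
    best = x
    for _ in range(steps):
        y = neighbor(x, rng)
        delta = cost(y) - cost(x)
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            x = y
            if cost(x) < cost(best):
                best = x
        T *= alpha
    return best

rng = random.Random(1)
# Toy problem: minimise (x - 7)^2 over the integers with +/-1 moves.
best = anneal(lambda x: (x - 7) ** 2, 50,
              lambda x, r: x + r.choice((-1, 1)), rng)
```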
Abstract The problem of determining a minimum cost connected network (i.e., weighted graph) G that spans a given subset of vertices is known in the literature as the Steiner problem in networks. We survey exact algorithms and heuristics which appeared in the published literature. We also discuss problems related to the Steiner problem in networks.
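Among the heuristics such surveys cover is the classic distance-network (KMB) construction, sketched below for unweighted graphs: compute shortest paths between terminals, take a minimum spanning tree of the complete terminal distance graph, and return the union of the corresponding paths. It yields a tree at most twice the optimal Steiner length.

```python
from collections import deque

def bfs_paths(adj, s):
    """BFS shortest-path tree from s; returns dist and parent maps."""
    dist, parent = {s: 0}, {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent

def kmb_steiner(adj, terminals):
    """Distance-network heuristic: Prim's MST over terminal-to-terminal
    shortest-path distances, expanding each MST edge into its path."""
    info = {t: bfs_paths(adj, t) for t in terminals}
    in_tree = {terminals[0]}
    edges = set()
    while len(in_tree) < len(terminals):
        # Prim step: closest terminal outside the current set.
        s, t = min(((a, b) for a in in_tree for b in terminals
                    if b not in in_tree),
                   key=lambda p: info[p[0]][0][p[1]])
        in_tree.add(t)
        dist, parent = info[s]
        v = t
        while parent[v] is not None:   # add the s-t path edge by edge
            edges.add(frozenset({v, parent[v]}))
            v = parent[v]
    return edges

adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
tree = kmb_steiner(adj, [0, 3, 4])
```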
In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.
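The quantized linear width rule can be sketched as follows; the parameter values and the rounding details here are illustrative, not taken from the paper's released configurations.

```python
import math

def regnet_widths(d, w0, wa, wm, q=8):
    """Sketch of a quantized linear width rule: a linear ramp
    u_j = w0 + wa * j is snapped to the nearest power of wm times w0,
    then rounded to a multiple of q. All parameter values illustrative."""
    widths = []
    for j in range(d):
        u = w0 + wa * j                       # linear width target
        s = round(math.log(u / w0, wm))       # nearest quantization step
        w = w0 * wm ** s                      # quantized width
        widths.append(int(round(w / q)) * q)  # hardware-friendly multiple
    return widths

widths = regnet_widths(d=13, w0=24, wa=36, wm=2.5, q=8)
```

Quantizing the ramp groups consecutive blocks into a handful of stages with equal widths, which is what makes the resulting networks simple and regular.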
An iterative mincut heuristic for partitioning networks is presented whose worst case computation time, per pass, grows linearly with the size of the network. In practice, only a very small number of passes are typically needed, leading to a fast approximation algorithm for mincut partitioning. To deal with cells of various sizes, the algorithm progresses by moving one cell at a time between the blocks of the partition while maintaining a desired balance based on the size of the blocks rather than the number of cells per block. Efficient data structures are used to avoid unnecessary searching for the best cell to move and to minimize unnecessary updating of cells affected by each move.
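The per-cell move gain in this (Fiduccia–Mattheyses) scheme is what the bucket data structures keep sorted: for each net on a cell, the gain rises by 1 if the cell is its net's only member in its current block (moving it uncuts the net) and falls by 1 if the whole net sits in that block (moving it newly cuts the net). A minimal sketch:

```python
def fm_gain(cell, nets, block):
    """Fiduccia-Mattheyses move gain for one cell.
    nets: list of cell lists; block: dict cell -> 0/1."""
    g = 0
    frm = block[cell]
    for net in nets:
        if cell not in net:
            continue
        on_from = sum(1 for c in net if block[c] == frm)
        if on_from == 1:
            g += 1          # net becomes uncut after the move
        if on_from == len(net):
            g -= 1          # net becomes cut after the move
    return g

nets = [[0, 1], [0, 2], [0, 3]]
block = {0: 0, 1: 0, 2: 1, 3: 1}
g = fm_gain(0, nets, block)
```

The real algorithm stores cells in gain buckets so the best move and all gain updates after a move take constant time per affected net, giving the linear per-pass bound.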
Efficient routing is critical for the performance and reliability of modern high-density printed circuit boards (PCBs). This paper presents an optimized automatic routing strategy for PCBs using an improved ant colony optimization (ACO) algorithm, focusing on equal-length and differential pair wiring requirements. The algorithm incorporates maximum and minimum length constraints for equal-length wiring to ensure precise length matching, thereby reducing signal delays and improving timing synchronization. For differential pair wiring, a two-ant synchronized routing method ensures equidistance, followed by fine-tuning to achieve length uniformity and improve signal integrity and electromagnetic interference immunity. The proposed enhancements to the ACO algorithm include refined pheromone-updating mechanisms, improving convergence speed, mitigating local optima, and handling large-scale multiconstraint routing effectively. Experimental results demonstrate superior performance in meeting equal-length and differential pair wiring constraints, validating the algorithm's robustness and practical applicability in modern PCB design.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
To this day, the design of analog integrated circuits is a predominantly manual task, heavily reliant on the knowledge and intuition of human experts. Many current automation approaches aim to be holistic solutions, attempting to take the human out of the loop. This work, in turn, does not intend to replace human designers with algorithms, but support their qualities in the established flow. Here, the performance space of analog ICs is modeled by PVT-aware neural networks and visualized with parallel coordinate plots. Such a responsive visualization gives insights into the relations of parameters through interactive exploration where any parameter can be the cause while all others show the immediate effect. Thus, complex decision-making problems based on the experience of seasoned designers, such as circuit sizing or topology selection, are transformed into intuitive perceptual problems. Through the responsiveness and immediacy of the implementation, designers are encouraged to explore the entire performance space instead of basing all decisions on previous designs, never leaving the beaten path. A data generation and training procedure for surrogate models is outlined. Models for three operational amplifiers in three different technologies illustrate the applicability and feasibility of the presented approach. Additionally, a web-based demo, including all source code, is available for review.
Koshiya Dev Nareshbhai | International Journal For Multidisciplinary Research
Transformers are critical, high-value components of the power system that operate continuously to supply electrical loads. Due to their constant operation and fixed capacity, sudden increases in demand can lead to overloading, resulting in overheating and insulation damage, ultimately causing supply interruptions. To mitigate this risk, this research proposes an embedded automation-based load sharing system for transformers operating in parallel. The approach enables auxiliary (slave) transformers to dynamically share excess load when the main transformer reaches its rated capacity. Utilizing microcontroller-based monitoring and control units, the system continuously evaluates voltage, current, and load conditions in real-time. When overloading is detected, the automation system activates standby transformers to distribute the load evenly, thus avoiding damage and maintaining uninterrupted power supply. The embedded system employs intelligent decision-making logic and communication protocols to coordinate between units seamlessly. Implementation of this method enhances system reliability, prevents outages due to transformer stress, and extends equipment life. This research contributes to smart energy distribution by offering a cost-effective and scalable solution to transformer overloading through intelligent load sharing.
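The decision logic can be sketched as a capacity-threshold policy: keep the main transformer loaded up to its rating and switch in standby units for the excess. The ratings and fill-in-order policy below are illustrative; the actual system runs on a microcontroller with real-time voltage and current measurements.

```python
def plan_load_sharing(load_kva, main_rating, slave_ratings):
    """Decide the kVA carried by the main transformer and each standby
    (slave) unit so that no unit runs above its rating. Returns the
    assigned loads in order; raises if installed capacity is exceeded."""
    if load_kva > main_rating + sum(slave_ratings):
        raise ValueError("total demand exceeds installed capacity")
    assigned = [min(load_kva, main_rating)]     # main carries up to rating
    remaining = load_kva - assigned[0]
    for rating in slave_ratings:                # fill slaves in order
        share = min(remaining, rating)
        assigned.append(share)
        remaining -= share
    return assigned

shares = plan_load_sharing(load_kva=180, main_rating=100,
                           slave_ratings=[100, 100])
```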
With the rapid development of global electronic design automation (EDA) technology and the intensification of technology competition between China and the United States, research on automatic layout algorithms for printed circuit board (PCB) modules has become particularly important. To overcome the drawbacks of conventional layout techniques in complex circuit design, this paper proposes an automatic PCB module layout approach based on deep reinforcement learning (DRL). First, the challenges faced by PCB design and the shortcomings of existing algorithms are analyzed, and two different state encoding methods are proposed: a direct feature embedding method and a graph-structure-based feature method. In the direct feature embedding method, the layout environment state is encoded into a state matrix using component feature information, with component features represented by separate channels of the feature matrix; the graph-structure-based method extracts node and edge information from PCB circuit diagrams through multi-layer perceptron (MLP) models to comprehensively capture the complexity and connectivity of PCB layouts. In addition, DRL neural network structures suited to the different state encoding methods were designed, and the advantages of the proposed algorithm in improving layout efficiency and optimizing design performance were verified through experiments. The experimental findings demonstrate the considerable layout optimization capabilities of the DRL-based PCB module automatic layout algorithm, offering fresh concepts and strategies for advancing autonomous EDA technology in China.
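A minimal sketch of the direct feature embedding idea, assuming a three-channel encoding (occupancy, component type, placement order; the specific channel choice is hypothetical, not the paper's):

```python
import numpy as np

def encode_state(grid_shape, components):
    """Encode a partial PCB layout as a multi-channel state matrix.
    components: list of (row, col, height, width, type_id), already placed.
    Channels (assumed): 0 = occupancy mask, 1 = component type,
    2 = placement order."""
    channels = 3
    state = np.zeros((channels,) + grid_shape)
    for order, (r, c, h, w, type_id) in enumerate(components, start=1):
        state[0, r:r + h, c:c + w] = 1.0
        state[1, r:r + h, c:c + w] = float(type_id)
        state[2, r:r + h, c:c + w] = float(order)
    return state
```

A DRL agent would receive this tensor as its observation after each placement step.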
Mixed-cell-height VLSI circuits are widely used to meet various design requirements. Due to design for manufacturability (DFM) considerations such as layout-dependent effects (LDEs), drain-to-drain abutment (DDA), and pattern coloring for multiple patterning, different spacings between adjacent cells affect performance, modeled as discrete spacing costs. A state-of-the-art dynamic programming (DP) approach can address this problem but can only handle a few cell rows simultaneously due to its high complexity. In this paper, we propose a novel DP algorithm that can solve the problem optimally and more efficiently. Additionally, several optimality-preserving reduction techniques are employed to derive full-chip optimal solutions for large-scale designs. Experimental results demonstrate that the proposed approach significantly outperforms existing methods in terms of total spacing cost and total displacement.
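The discrete-spacing-cost formulation can be illustrated with a toy single-row DP (a simplification, not the paper's multi-row algorithm): `dp` maps the total row width consumed so far to the minimal spacing cost, and each adjacency picks one of the allowed (gap, cost) options:

```python
def min_gap_cost(widths, gap_options, row_width):
    """Choose a discrete gap between each pair of adjacent cells so the
    row fits in row_width, minimizing total spacing cost.
    widths: cell widths, in row order (first cell at the left edge).
    gap_options: list of (gap, cost) pairs allowed between neighbors."""
    dp = {widths[0]: 0.0}  # width used so far -> minimal cost
    for w in widths[1:]:
        nxt = {}
        for used, cost in dp.items():
            for gap, gcost in gap_options:
                u = used + gap + w
                if u <= row_width and (u not in nxt or cost + gcost < nxt[u]):
                    nxt[u] = cost + gcost
        dp = nxt
    return min(dp.values()) if dp else None  # None: row cannot fit
```

With a tight row the DP is forced onto cheap-width gaps even when they cost more, which is exactly the trade-off the spacing costs model.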
High-performance analog and mixed-signal (AMS) circuits are mainly full-custom designed, which is time-consuming and labor-intensive. A significant portion of the effort is experience-driven, which makes the automation of AMS circuit design a formidable challenge. Large language models (LLMs) have emerged as powerful tools for Electronic Design Automation (EDA) applications, fostering advancements in the automatic design process for large-scale AMS circuits. However, the absence of high-quality datasets has led to issues such as model hallucination, which undermines the robustness of automatically generated circuit designs. To address this issue, this paper introduces AMSnet-KG, a dataset encompassing various AMS circuit schematics and netlists. We construct a knowledge graph with annotations on detailed functional and performance characteristics. Facilitated by AMSnet-KG, we propose an automated AMS circuit generation framework that utilizes the comprehensive knowledge embedded in LLMs. The flow first formulates a design strategy (e.g., a circuit architecture using a number of circuit components) based on the required specifications. Next, matched subcircuits are retrieved and assembled into a complete topology, and transistor sizing is obtained through Bayesian optimization. Simulation results of the netlist are automatically fed back to the LLM for further topology refinement, ensuring the circuit design specifications are met. We perform case studies of operational amplifier and comparator design to verify the automatic design flow from specifications to netlists with minimal human effort. The dataset used in this paper is available at https://ams-net.github.io/.
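The generate-simulate-refine sizing step can be sketched as black-box optimization. In this toy sketch, plain random search stands in for the Bayesian optimization the paper uses, and a hypothetical figure of merit replaces a real SPICE simulation:

```python
import random

def figure_of_merit(sizes):
    """Stand-in for a SPICE evaluation (hypothetical: peaks at W/L = 8)."""
    w, l = sizes
    return -(w / l - 8.0) ** 2

def size_circuit(iters=200, seed=1):
    """Random search standing in for the Bayesian optimization step of
    the generate -> simulate -> refine loop described above."""
    rng = random.Random(seed)
    best, best_fom = None, float("-inf")
    for _ in range(iters):
        sizes = (rng.uniform(1.0, 20.0), rng.uniform(0.5, 2.0))
        fom = figure_of_merit(sizes)
        if fom > best_fom:
            best, best_fom = sizes, fom
    return best, best_fom
```

In the real flow, each `figure_of_merit` call would be a netlist simulation whose results are also fed back to the LLM for topology refinement.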
As technology scaling exacerbates interconnect resistance in advanced nodes, FPGA architectures demand enhanced programmable logic blocks (PLBs) to minimize global metal routing. However, it is expensive to raise the functionality of LUTs due to exponential area growth with the number of inputs, resulting in poor scalability. Moreover, LUTs are redundant since practical functions in real-world benchmarks only account for an extremely small proportion of all the functions. For example, only 16424 out of more than 100 trillion NPN classes of 6-input functions are used in the mapped netlists of the VTR8 and KOIOS benchmarks. Therefore, we propose a reduced LUT architecture, named RLUT, to efficiently implement most of the frequent functions. The compact structure of the MUX tree in LUTs is preserved and reduced, while the reduced programmable bits are connected to the MUX tree according to the bit assignment generated automatically by the proposed algorithms. Results of evaluations by a full EDA flow show that, compared with the modified Stratix10 baseline, the proposed 8-input PLB with 75 SRAM bits, named Dual-RLUT6, reduces the maximum logic levels significantly by 20.85%, while the critical path delay is improved by 10.11% at the cost of 4.65% area overhead.
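NPN classes group functions that differ only by input negation, input permutation, and output negation. A brute-force canonical form for 3-input truth tables (feasible at this size; a generic illustration, not the paper's counting method) looks like:

```python
from itertools import permutations

def npn_canonical(tt, n=3):
    """Canonical representative of the NPN class of a truth table.
    tt: tuple of 2**n bits, tt[i] = f applied to the bits of index i.
    Enumerates all input permutations, input negations, and output
    negation, returning the lexicographically smallest transform."""
    best = None
    for perm in permutations(range(n)):
        for neg_mask in range(2 ** n):
            for out_neg in (0, 1):
                g = []
                for i in range(2 ** n):
                    bits = [(i >> b) & 1 for b in range(n)]
                    j = 0
                    for b in range(n):
                        bit = bits[perm[b]] ^ ((neg_mask >> b) & 1)
                        j |= bit << b
                    g.append(tt[j] ^ out_neg)
                g = tuple(g)
                if best is None or g < best:
                    best = g
    return best
```

Two functions are in the same NPN class exactly when their canonical forms match, which is how the "16424 classes actually used" statistic can be tallied over mapped netlists (with a smarter algorithm for 6 inputs).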
As an important intermediate between integrated circuits (ICs) and the printed circuit board (PCB), the routing in the package substrate plays a crucial role in the efficiency and accuracy of signal and power transmission. While numerous research efforts have focused on substrate routing to avoid inefficient, time-consuming, and error-prone manual processes, few of them have addressed the challenge of routing multi-pin nets, particularly those with a large number of pins. This paper presents a three-stage framework of multi-pin net routing for packages with fine-pitch ball grid arrays, consisting of pin grouping, minimum spanning tree topology generation, and group topology connection. Our framework classifies net connections into different categories, prioritizes their routing ordering, and applies different routing approaches and strategies to boost overall routability. The results of the experiments conducted on six real industrial designs demonstrate that our framework can simultaneously and effectively handle two-pin nets and multi-pin nets with better routing performance compared to the state-of-the-art work.
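The minimum-spanning-tree topology stage can be illustrated with Prim's algorithm over pin coordinates under Manhattan distance (a generic sketch of the idea, not the paper's exact procedure):

```python
def manhattan_mst(pins):
    """Prim's algorithm on pin coordinates with Manhattan distance,
    giving a spanning-tree topology for a multi-pin net.
    Returns (list of (parent_index, child_index) edges, total length)."""
    n = len(pins)
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest connection into the tree
    parent = [-1] * n
    best[0] = 0                 # grow the tree from pin 0
    edges, total = [], 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist(pins[u], pins[v]) < best[v]:
                best[v] = dist(pins[u], pins[v])
                parent[v] = u
    return edges, total
```

Each MST edge then becomes a two-pin connection that the later stages route and reorder by category.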
P. Agarwal | Journal of Information Systems Engineering & Management
Propagation of slew through a design is an important aspect of static timing analysis (STA). Slew has a direct impact on the delay of a timing path and can make the design pass or fail timing closure. This paper introduces the concept of slew, describes the two ways it is propagated through a design, and examines the problem of choosing between the incident slews at a slew merging point. In STA, changing waveforms are characterized by how fast or slow they transition: the slew is measured as the time a signal takes to transition from one level to the other, so slew rate and transition time are inversely related. A slow slew means a long transition time; a fast slew means a short one. In a digital circuit design, there are points where timing arcs merge; such points are termed slew merging points, and at them a choice must be made between two approaches. The first is graph-based static timing analysis, in which worst-case delays are computed by propagating the worst-case (slowest) slews along the timing paths for setup analysis, and best-case delays by propagating the best-case (fastest) slews for hold analysis. The second is path-based static timing analysis, in which the actual slews along each timing path are used to compute the actual delays of that path, for both setup and hold analysis.
Such delay calculation takes extra time, so there is a trade-off between this additional runtime and delay accuracy. The synthesized gate-level netlists are fed into the tool along with the required files. These include Liberty timing files, which contain the timing information for the cells in the design, and library exchange format files, which describe the metal layers used for routing; the latter make post-route timing analysis possible, since interconnect resistance and capacitance values can be extracted from them. The graph-based and path-based approaches are then applied to the design separately. In the graph-based approach, initial timing reports are generated; after the design is optimized for slack improvement, post-route timing reports are generated and analysed.
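The merge-point rule for graph-based analysis can be sketched with a toy propagation over a timing graph, where slew degradation per arc is simplified to an additive constant (a simplification; real tools derive output slew from input slew and output load via library tables):

```python
import collections

def propagate_slews(graph, input_slews, mode="setup"):
    """Graph-based STA slew propagation: at each slew merging point keep
    the worst (slowest) incident slew for setup analysis, or the best
    (fastest) for hold analysis.
    graph: node -> list of (successor, slew_degradation) arcs."""
    pick = max if mode == "setup" else min
    indeg = collections.defaultdict(int)
    nodes = set(graph)
    for u, outs in graph.items():
        for v, _ in outs:
            indeg[v] += 1
            nodes.add(v)
    slew = dict(input_slews)
    queue = [n for n in nodes if indeg[n] == 0]  # topological (Kahn) order
    while queue:
        u = queue.pop()
        for v, degr in graph.get(u, []):
            cand = slew[u] + degr
            slew[v] = pick(slew.get(v, cand), cand)  # merge-point choice
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return slew
```

Path-based analysis would instead keep the arriving slews separate per path, trading this single-pass efficiency for accuracy.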
This work details the capabilities of a major new release of the Verilog-to-Routing (VTR) open-source FPGA CAD tool flow. Enhancements include generalizations of VTR’s architecture modeling language and optimizers to enable a more diverse set of programmable routing fabrics, FPGAs with embedded hard Networks-on-Chip (NoCs), and three-dimensional FPGA systems that leverage stacked silicon integration. The new Parmys logic synthesis flow improves language coverage and result quality, and the physical implementation flow includes a more efficient placement engine, floorplanning constraints to guide placement, the ability to perform single-stage (flat) routing to improve quality, and parallel routing algorithms to reduce CPU time. This release also includes new architecture captures of recent commercial devices (Xilinx’s 7-series and Altera’s Stratix 10) and new benchmark suites (Titanium25 and Hermes) to aid FPGA architecture investigation. Verilog language coverage is greatly improved with the new Parmys logic synthesis flow, enabling more designs to be used with VTR. Finally, the placement and routing engines have been sped up by 4 \(\times\) and 2.2 \(\times\) vs. VTR 8, respectively, leading to an overall physical implementation flow CPU time reduction of 48% with better result quality on average compared to VTR 8.
Routing is a critical stage in achieving timing closure in integrated circuit design. Due to the time-consuming flow of detailed routing (DR), the lack of accurate routing information, and the impact of congestion during global routing (GR), rapidly obtaining precise timing information at the global routing stage to guide subsequent timing optimization is a significant challenge. These challenges lead to substantial discrepancies between the timing estimated at the GR stage and the actual post-DR results, producing inaccurate evaluations of chip performance. To address this issue, we propose an effective timing prediction and optimization framework, AiTPO. Its innovative KAN-UNet heterogeneous timing prediction model effectively combines UNet and KAN networks: by fusing spatial features extracted by UNet with numerical data, the model learns complex relationships across multi-modal data, enhancing robustness and accuracy. Additionally, building on this accurate timing evaluation, we introduce two timing optimization strategies during global routing. The first is net ordering based on predicted high-delay nets, prioritizing the routing of more timing-critical nets to reduce detours caused by congestion. The second employs timing estimation to select the best topology from multiple candidates generated by an enhanced A* algorithm in which congestion is considered as a cost factor, which contributes to optimizing Worst Negative Slack (WNS) and Total Negative Slack (TNS).
Experimental results on real circuits at a 28nm process node show that the wire delay prediction accuracy of the proposed KAN-UNet model improves by 34.6% and 25.4% in terms of Mean Absolute Error (MAE) and Max Absolute Error (MaxAE), respectively, compared to GR-based estimations, and demonstrate the effectiveness of our timing optimization strategies, which lead to a 2.0% and 4.2% improvement in TNS and WNS, respectively.
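Congestion-as-cost in an A*-style router can be illustrated with a toy grid search, where each cell carries a non-negative congestion penalty on top of the unit step cost so the Manhattan heuristic stays admissible (a generic sketch, not AiTPO's enhanced router):

```python
import heapq

def astar_route(grid_cost, start, goal):
    """A* over a routing grid. Entering cell (r, c) costs
    1 + grid_cost[r][c], where grid_cost holds non-negative congestion
    penalties; Manhattan distance is then an admissible heuristic.
    Returns the cheapest route cost, or None if unreachable."""
    rows, cols = len(grid_cost), len(grid_cost[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        f, g, (r, c) = heapq.heappop(open_heap)
        if (r, c) == goal:
            return g
        if g > best_g.get((r, c), float("inf")):
            continue  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                ng = g + 1 + grid_cost[nr][nc]  # base step + congestion
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

With a congested center cell, the router detours around it exactly as the net-ordering strategy intends for timing-critical nets.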
This paper introduces GenPart 2.0, an enhanced version of the hypergraph partitioner GenPart. While GenPart was limited to handling only unit vertex weights, GenPart 2.0 extends its capabilities to varying vertex weights. This extension is achieved through a variational graph neural network-based generative model and new feature preparation techniques. GenPart 2.0 addresses hypergraph partitioning challenges by establishing an embedding space that adheres to normalized cut and balance constraints. The generative model is designed to explore various partitioning configurations within this embedding space, and partitioning solutions are further improved using the V-cycle method. Testing on VLSI circuit benchmarks, including ISPD98, ISPD2005, and Titan23, under various balance constraints demonstrates the improved performance of GenPart 2.0.
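The two quantities the embedding space must respect, cut and balance, can be computed directly for a candidate partition (unit vertex weights here, for brevity):

```python
def cut_and_balance(hyperedges, part, k=2):
    """Evaluate a k-way hypergraph partition.
    hyperedges: iterable of vertex sets; part: vertex -> block index.
    cut = number of hyperedges spanning more than one block;
    balance = heaviest block's share of total (unit) vertex weight."""
    cut = sum(1 for e in hyperedges if len({part[v] for v in e}) > 1)
    sizes = [0] * k
    for v, p in part.items():
        sizes[p] += 1
    balance = max(sizes) / len(part)
    return cut, balance
```

A balance constraint such as "no block above 55% of total weight" is then simply `balance <= 0.55`; the weighted-vertex extension in GenPart 2.0 replaces the unit counts with vertex weights.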
Graph partitioning is important for the design of many CAD algorithms. However, as the graph size continues to grow, graph partitioning becomes increasingly time-consuming. Recent research has introduced parallel graph partitioners using either multi-core CPUs or GPUs. However, the speedup of existing CPU graph partitioners is typically limited to a few cores, while the performance of GPU-based solutions is algorithmically limited by available GPU memory. To overcome these challenges, we propose G-kway, an efficient multilevel GPU-accelerated k -way graph partitioner. G-kway introduces an effective union find-based coarsening and a novel independent set-based refinement algorithm to significantly accelerate both the coarsening and uncoarsening stages. Furthermore, when kernel launch overhead becomes substantial in the refinement algorithm, G-kway employs CUDA Graph-based uncoarsening to reduce the overhead and improve performance. Experimental results have shown that G-kway outperforms both the state-of-the-art CPU-based and GPU-based parallel partitioners with an average speedup of 8.6 × and 3.8 ×, respectively, while achieving comparable partitioning quality. Additionally, G-kway with CUDA Graph-based uncoarsening can further accelerate graph partitioning, achieving up to 1.93 × speedup over the default G-kway.
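Union-find-based coarsening contracts matched vertex pairs and rebuilds a smaller graph; a minimal sequential sketch of the idea (G-kway's version runs these stages on the GPU):

```python
class UnionFind:
    """Disjoint-set forest with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def coarsen(n, edges, matches):
    """Contract matched vertex pairs with union-find, then rebuild the
    coarse edge list, dropping self-loops and merging parallel edges."""
    uf = UnionFind(n)
    for a, b in matches:
        uf.union(a, b)
    roots = sorted({uf.find(v) for v in range(n)})
    remap = {r: i for i, r in enumerate(roots)}   # dense coarse vertex ids
    coarse = set()
    for u, v in edges:
        cu, cv = remap[uf.find(u)], remap[uf.find(v)]
        if cu != cv:
            coarse.add((min(cu, cv), max(cu, cv)))
    return len(roots), sorted(coarse)
```

Repeating this until the graph is small, then partitioning and refining back up the levels, is the multilevel scheme the abstract describes.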
Since Beineke's work in 1968 on linegraphs, attention has focused on the classification of graphs as linegraphs. It is known that every graph $G$ is the linegraph of a hypergraph, and the question is how to characterize that root graph. We introduce the $C_{p,q}$ classes, defined as sets of graphs where each vertex can be covered by at most $p$ cliques and each edge belongs to at most $q$ cliques. These classes provide a comprehensive classification of linegraphs through a unified and parameterized approach. They describe previously known graph classes, such as linegraphs of simple graphs, of $p$-uniform hypergraphs, and of $p$-uniform $1$-linear hypergraphs, while being capable of generalization. We study the complexity of determining membership in, and the edit distance of a graph to, one of these classes, and we present the first fixed-parameter tractable algorithm with respect to treewidth for computing the edit distance.
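For ordinary graphs the linegraph construction itself is simple: one vertex per edge of $G$, with two vertices adjacent exactly when the underlying edges share an endpoint:

```python
from itertools import combinations

def line_graph(edges):
    """Build the linegraph L(G) of a simple graph given as an edge list.
    Returns the adjacency list of L(G) as pairs of edge indices."""
    edges = [frozenset(e) for e in edges]
    adj = []
    for (i, e), (j, f) in combinations(enumerate(edges), 2):
        if e & f:              # edges share an endpoint
            adj.append((i, j))
    return adj
```

The classification problem the paper studies runs the other way: given a graph, decide which class of root (hyper)graphs could have produced it.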
This article presents a comprehensive overview of how artificial intelligence and machine learning technologies are revolutionizing functional verification in modern chip design. As semiconductor complexity escalates, with advanced process nodes enabling billions of transistors on a single die, traditional verification methods face insurmountable challenges in ensuring design correctness. The verification bottleneck has become the dominant constraint in chip development cycles, consuming the majority of resources and frequently allowing critical bugs to escape to silicon. The integration of AI/ML techniques offers transformative solutions across multiple verification domains, including intelligent test generation, coverage analysis optimization, and bug prediction. These technologies enable more efficient resource allocation, targeted verification of high-risk design areas, and significantly accelerated coverage closure. The article examines implementation strategies for AI-driven verification systems and presents concrete case studies demonstrating measurable improvements in verification efficiency, quality, and time-to-market.
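A common non-ML baseline for coverage closure, against which ML-based test selection is typically measured, is greedy set cover over coverage points (a generic sketch, not taken from the article):

```python
def select_tests(coverage, targets):
    """Greedy set cover for regression test selection.
    coverage: test name -> set of coverage points it hits;
    targets: coverage points still to be closed.
    Returns (chosen test names, points no test can cover)."""
    remaining = set(targets)
    chosen = []
    while remaining:
        # Pick the test hitting the most not-yet-covered points.
        name, hits = max(coverage.items(),
                         key=lambda kv: len(remaining & kv[1]))
        if not remaining & hits:
            break  # nothing left is coverable by any test
        chosen.append(name)
        remaining -= hits
    return chosen, remaining
```

ML-driven approaches aim to beat this baseline by predicting which stimuli reach rare coverage points before running them.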