Computer Science / Computational Theory and Mathematics

Numerical Methods and Algorithms

Description

This cluster of papers focuses on the theory, implementation, and optimization of floating-point arithmetic for scientific computation. It covers topics such as interval analysis, high-precision computation, hardware implementation on FPGAs, numerical verification methods, decimal floating-point arithmetic, accuracy optimization, Taylor models, and handling interval uncertainty.

Keywords

Floating-Point Arithmetic; Interval Analysis; High-Precision Computation; Hardware Implementation; Numerical Verification Methods; FPGA Acceleration; Decimal Floating-Point; Accuracy-Guaranteed Bit-Width Optimization; Taylor Models; Interval Uncertainty

Papers

Kolmogorov's goodness-of-fit measure, D<sub>n</sub>, for a sample CDF has consistently been set aside for methods such as the D<sup>+</sup><sub>n</sub> or D<sup>-</sup><sub>n</sub> of Smirnov, primarily, it seems, because of the difficulty of computing the distribution of D<sub>n</sub>. As far as we know, no easy way to compute that distribution has ever been provided in the 70+ years since Kolmogorov's fundamental paper. We provide one here, a C procedure that provides Pr(D<sub>n</sub> < d) with 13-15 digit accuracy for n ranging from 2 to at least 16000. We assess the (rather slow) approach to the limiting form, and because computing time can become excessive for probabilities > .999 with n's of several thousand, we provide a quick approximation that gives accuracy to the 7th digit for such cases.
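For readers who want these probabilities without porting the C procedure, SciPy exposes the distribution of D<sub>n</sub> as scipy.stats.kstwo; a minimal sketch, assuming a recent SciPy (1.4 or later):

```python
# Sketch: evaluating Pr(D_n <= d) for Kolmogorov's statistic via SciPy,
# which implements a method in the spirit of the C procedure described above.
from scipy.stats import kstwo

n = 100                       # sample size
d = 0.12                      # observed value of the statistic D_n
p_less = kstwo.cdf(d, n)      # Pr(D_n <= d)
p_value = kstwo.sf(d, n)      # Pr(D_n > d), the goodness-of-fit p-value
print(f"Pr(D_n <= {d}) = {p_less:.15f}, p-value = {p_value:.3e}")
```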
Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of low-precision fixed-point computations, we observe the rounding scheme to play a crucial role in determining the network's behavior during training. Our results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding.
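A minimal sketch of stochastic rounding to a fixed-point grid, the scheme the abstract identifies as crucial; the word-length and fractional-bit split here is illustrative, not the paper's exact format:

```python
import numpy as np

def stochastic_round_fixed(x, frac_bits=8, rng=np.random.default_rng(0)):
    """Round x to a fixed-point grid with spacing 2**-frac_bits.

    A value is rounded up with probability equal to its distance from the
    lower grid point, so the rounding is unbiased in expectation.
    """
    scaled = np.asarray(x, dtype=np.float64) * (1 << frac_bits)
    floor = np.floor(scaled)
    prob_up = scaled - floor                 # distance to the lower grid point
    up = rng.random(scaled.shape) < prob_up
    return (floor + up) / (1 << frac_bits)

x = np.full(100_000, 0.1)                    # 0.1 is not on the grid
# Mean is ~0.1; round-to-nearest would always give 26/256 = 0.1015625.
print(stochastic_round_fixed(x).mean())
```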
This article presents a multiple-precision binary floating-point library, written in the ISO C language, and based on the GNU MP library. Its particularity is to extend, to arbitrary precision, ideas from the IEEE 754 standard by providing correct rounding and exceptions. We demonstrate how these strong semantics are achieved, with no significant slowdown with respect to other arbitrary-precision tools, and discuss a few applications where such a library can be useful.
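The library described is MPFR. A short sketch of its correct-rounding semantics from Python, assuming the gmpy2 bindings (which wrap MPFR) are installed:

```python
# Sketch of MPFR's correctly rounded arbitrary precision via gmpy2.
import gmpy2
from gmpy2 import mpfr

gmpy2.get_context().precision = 200          # bits of significand
print(gmpy2.sqrt(mpfr(2)))                   # correctly rounded at 200 bits

gmpy2.get_context().round = gmpy2.RoundDown  # directed rounding, IEEE-754 style
print(gmpy2.sqrt(mpfr(2)))                   # a strict lower bound on sqrt(2)
```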
This paper describes a single unified algorithm for the calculation of elementary functions including multiplication, division, sin, cos, tan, arctan, sinh, cosh, tanh, arctanh, ln, exp and square-root. The basis for the algorithm is coordinate rotation in a linear, circular, or hyperbolic coordinate system depending on which function is to be calculated. The only operations required are shifting, adding, subtracting and the recall of prestored constants. The limited domain of convergence of the algorithm is calculated, leading to a discussion of the modifications required to extend the domain for floating point calculations.
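A minimal Python sketch of the circular, rotation-mode case of this unified algorithm, computing sin and cos from shifts, adds, and a table of prestored arctangents:

```python
import math

N = 40
ANGLES = [math.atan(2.0**-i) for i in range(N)]      # prestored constants
K = 1.0
for i in range(N):                                   # reciprocal of the CORDIC gain
    K /= math.sqrt(1 + 2.0**(-2 * i))

def cordic_sin_cos(theta):
    """Valid for |theta| within the domain of convergence (about 1.74 rad)."""
    x, y, z = 1.0, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i   # shift-and-add rotation
        z -= d * ANGLES[i]
    return y * K, x * K                               # (sin(theta), cos(theta))

print(cordic_sin_cos(0.5), (math.sin(0.5), math.cos(0.5)))
```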
Basic Linear Algebra Subprograms for Fortran Usage. C. L. Lawson (Jet Propulsion Laboratory), R. J. Hanson (Sandia Laboratories), D. R. Kincaid (The University of Texas at Austin), and F. T. Krogh (Jet Propulsion Laboratory). ACM Transactions on Mathematical Software, Vol. 5, No. 3, September 1979, pp. 308-323. https://doi.org/10.1145/355841.355847
From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method? The answers to these and many related questions are included here. This book gives a thorough, up-to-date treatment of the behavior of numerical algorithms in finite precision arithmetic. It combines algorithmic derivations, perturbation theory, and rounding error analysis. Software practicalities are emphasized throughout, with particular reference to LAPACK and MATLAB. The best available error bounds, some of them new, are presented in a unified format with a minimum of jargon. Because of its central role in revealing problem sensitivity and providing error bounds, perturbation theory is treated in detail. Historical perspective and insight are given, with particular reference to the fundamental work of Wilkinson and Turing, and the many quotations provide further information in an accessible format. The book is unique in that algorithmic developments and motivations are given succinctly and implementation details minimized, so that attention can be concentrated on accuracy and stability results. Here, in one place and in a unified notation, is error analysis for most of the standard algorithms in matrix computations. Not since Wilkinson's Rounding Errors in Algebraic Processes (1963) and The Algebraic Eigenvalue Problem (1965) has any volume treated this subject in such depth. A number of topics are treated that are not usually covered in numerical analysis textbooks, including floating point summation, block LU factorization, condition number estimation, the Sylvester equation, powers of matrices, finite precision behavior of stationary iterative methods, Vandermonde systems, and fast matrix multiplication. Although not designed specifically as a textbook, this volume is a suitable reference for an advanced course, and could be used by instructors at all levels as a supplementary text from which to draw examples, historical perspective, statements of results, and exercises (many of which have never before appeared in textbooks). The book is designed to be a comprehensive reference and its bibliography contains more than 1100 references from the research literature. Audience: Specialists in numerical analysis as well as computational scientists and engineers concerned about the accuracy of their results will benefit from this book. Much of the book can be understood with only a basic grounding in numerical analysis and linear algebra. About the Author: Nicholas J. Higham is a Professor of Applied Mathematics at the University of Manchester, England. He is the author of more than 40 publications and is a member of the editorial boards of the SIAM Journal on Matrix Analysis and Applications and the IMA Journal of Numerical Analysis. His book Handbook of Writing for the Mathematical Sciences was published by SIAM in 1993.
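As a taste of the blurb's opening question, here is a sketch of Kahan's compensated summation, one classic answer from the floating-point summation literature the book covers:

```python
import math

def kahan_sum(xs):
    s, c = 0.0, 0.0             # running sum and running compensation
    for x in xs:
        y = x - c               # correct the next term by the lost low-order part
        t = s + y
        c = (t - s) - y         # (t - s) recovers what was actually added
        s = t
    return s

xs = [0.1] * 10_000_000
print(sum(xs))                  # ~999999.99984: rounding errors accumulate
print(kahan_sum(xs))            # agrees with the exact sum to the last bit or nearly
print(math.fsum(xs))            # exactly rounded reference
```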
Employing a closed set-theoretic foundation for interval computations, Global Optimization Using Interval Analysis simplifies algorithm construction and increases the generality of interval arithmetic. This Second Edition contains an up-to-date discussion of interval methods for solving systems of nonlinear equations and global optimization problems. It expands and improves various aspects of its forerunner and features significant new discussions, such as those on the use of consistency methods to enhance algorithm performance. The provided algorithms are guaranteed to find and bound all solutions to these problems despite bounded errors in data, in approximations, and from the use of rounded arithmetic.
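A minimal sketch of the interval arithmetic such methods build on; a rigorous implementation would also round lower bounds down and upper bounds up (outward rounding), which is omitted here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        ps = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(ps), max(ps))

# The "dependency problem": x - x over [0, 1] encloses 0 but is not tight,
# which is why consistency methods and bisection matter in practice.
x = Interval(0.0, 1.0)
print(x - x)      # Interval(lo=-1.0, hi=1.0), encloses the true range {0}
print(x * x)      # Interval(lo=0.0, hi=1.0)
```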
We present a new method for accelerating matrix multiplication asymptotically. This work builds on recent ideas of Volker Strassen, by using a basic trilinear form which is not a matrix product. We make novel use of the Salem-Spencer Theorem, which gives a fairly dense set of integers with no three-term arithmetic progression. Our resulting matrix exponent is 2.376.
The history and fundamentals of distributed arithmetic (DA) are presented. Ways to increase the speed of DA multiplication are described. DA is applied to a biquadratic digital filter, providing an example of vector dot-product and vector-matrix-product mechanization. Applications to transformers and nonlinear and/or nonstationary processing with DA are discussed. It is seen that DA is a very efficient means to mechanize computations that are dominated by inner products.
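A sketch of the DA idea for a fixed-coefficient dot product: the multiplications are replaced by lookups into a precomputed table of coefficient sums, accumulated with shifts and adds. Unsigned inputs are assumed here for simplicity; hardware DA handles two's complement with a sign correction:

```python
def da_table(coeffs):
    """Table entry i holds the sum of coefficients whose bit is set in i."""
    return [sum(c for k, c in enumerate(coeffs) if (i >> k) & 1)
            for i in range(1 << len(coeffs))]

def da_dot(table, xs, bits):
    acc = 0
    for b in range(bits):                              # one lookup per bit plane
        idx = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
        acc += table[idx] << b                         # shift-and-add accumulation
    return acc

coeffs = [3, -5, 7]
table = da_table(coeffs)
xs = [9, 4, 13]
print(da_dot(table, xs, bits=4), sum(c * x for c, x in zip(coeffs, xs)))  # 98 98
```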
The COordinate Rotation DIgital Computer (CORDIC) is a special-purpose digital computer for real-time airborne computation. In this computer, a unique computing technique is employed which is especially suitable for solving the trigonometric relationships involved in plane coordinate rotation and conversion from rectangular to polar coordinates. CORDIC is an entire-transfer computer; it contains a special serial arithmetic unit consisting of three shift registers, three adder-subtractors, and special interconnections. By use of a prescribed sequence of conditional additions or subtractions, the CORDIC arithmetic unit can be controlled to solve either set of the following equations: Y' = K(Y cos λ + X sin λ), X' = K(X cos λ - Y sin λ); or R = K√(X<sup>2</sup> + Y<sup>2</sup>), θ = tan<sup>-1</sup>(Y/X), where K is an invariable constant. This special arithmetic unit is also suitable for other computations such as multiplication, division, and the conversion between binary and mixed radix number systems. However, only the trigonometric algorithms used in this computer and the instrumentation of these algorithms are discussed in this paper.
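Complementing the rotation-mode sketch above, here is vectoring mode, which solves the second set of equations (rectangular-to-polar conversion) with the same shift-and-add iteration; the constant gain K is compensated at the end:

```python
import math

N = 40
ANGLES = [math.atan(2.0**-i) for i in range(N)]
K = math.prod(math.sqrt(1 + 2.0**(-2 * i)) for i in range(N))   # CORDIC gain

def cordic_to_polar(x, y):
    """Assumes x > 0 (right half-plane), the mode's natural domain."""
    z = 0.0
    for i in range(N):
        d = -1.0 if y >= 0 else 1.0               # rotate toward the x-axis
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * ANGLES[i]
    return x / K, z                               # R = sqrt(x^2+y^2), theta

print(cordic_to_polar(3.0, 4.0), (5.0, math.atan2(4.0, 3.0)))
```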
This text gives an account of an ellipsoidal calculus and ellipsoidal techniques that allows presentation of the set-valued solutions to these problems in terms of approximating ellipsoidal-valued functions. Such an approach leads to effective computation schemes and opens the way to applications and implementations with computer animation, particularly in decision support systems. The problems treated here are those that involve calculation of attainability domains, of control synthesis under bounded controls, state constraints and unknown input disturbances, as well as those of viability and of the bounding approach to state estimation. The text ranges from a specially developed theory of exact set-valued solutions to the description of ellipsoidal calculus, related ellipsoidal-based methods and examples worked out with computer graphics. The calculus given here may also be interpreted as a generalized technique of the interval analysis type with an impact on scientific computation.
This paper describes a class of number representations which are called signed-digit representations. Signed-digit representations limit carry-propagation to one position to the left during the operations of addition and subtraction in digital computers. Carry-propagation chains are eliminated by the use of redundant representations for the operands. Redundancy in the number representation allows a method of fast addition and subtraction in which each sum (or difference) digit is a function only of the digits in two adjacent digital positions of the operands. The addition time for signed-digit numbers of any length is equal to the addition time for two digits. The paper discusses the properties of signed-digit representations and arithmetic operations with signed-digit numbers: addition, subtraction, multiplication, division and roundoff. A brief discussion of logical design problems for a signed-digit adder concludes the presentation.
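A sketch of this carry-free addition in radix 10 with the redundant digit set {-6, ..., 6} (one digit set the paper's conditions allow); each final sum digit depends only on two adjacent positions, so all positions can be computed in parallel:

```python
def sd_add(x, y):
    """x, y: little-endian lists of digits in -6..6. Returns their SD sum."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    t = [0] * (n + 1)                  # transfer (carry) digits, |t| <= 1
    w = [0] * n                        # interim sums, |w| <= 5
    for i in range(n):
        p = x[i] + y[i]                # -12 <= p <= 12
        if p >= 6:
            t[i + 1], w[i] = 1, p - 10
        elif p <= -6:
            t[i + 1], w[i] = -1, p + 10
        else:
            t[i + 1], w[i] = 0, p
    return [w[i] + t[i] for i in range(n)] + [t[n]]   # digits stay in -6..6

def sd_value(d):                       # little-endian radix-10 value
    return sum(di * 10**i for i, di in enumerate(d))

a, b = [4, 5, 6], [5, 5, 5]            # 654 and 555 in conventional digits
s = sd_add(a, b)
print(s, sd_value(s), 654 + 555)       # a redundant encoding of 1209
```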
This paper describes the CRAY-1, discusses the evolution of its architecture, and gives an account of some of the problems that were overcome during its manufacture. The CRAY-1 is the … This paper describes the CRAY-1, discusses the evolution of its architecture, and gives an account of some of the problems that were overcome during its manufacture. The CRAY-1 is the only computer to have been built to date that satisfies ERDA's Class VI requirement (a computer capable of processing from 20 to 60 million floating point operations per second) [1]. The CRAY-1's Fortran compiler (CFT) is designed to give the scientific user immediate access to the benefits of the CRAY-1's vector processing architecture. An optimizing compiler, CFT, “vectorizes” innermost DO loops. Compatible with the ANSI 1966 Fortran Standard and with many commonly supported Fortran extensions, CFT does not require any source program modifications or the use of additional nonstandard Fortran statements to achieve vectorization. Thus the user's investment of hundreds of man months of effort to develop Fortran programs for other contemporary computers is protected.
An Updated Set of Basic Linear Algebra Subprograms (BLAS). ACM Transactions on Mathematical Software, Vol. 28, No. 2, June 2002, pp. 135-151. https://doi.org/10.1145/567806.567807
Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy's syntax with the speed of optimized native machine language. The user composes mathematical expressions in a high-level description that mimics NumPy's syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic differentiation. Before performing computation, Theano optimizes the choice of expressions, translates them into C++ (or CUDA for the GPU), and compiles them into dynamically loaded Python modules, all automatically. Common machine learning algorithms implemented with Theano are several times faster than competitive alternatives (including those implemented with C/C++, NumPy/SciPy and MATLAB) when compiled for the CPU, and faster still when compiled for the GPU. This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.
With VLSI architecture, the chip area and design regularity represent a better measure of cost than the conventional gate count. We show that addition of n-bit binary numbers can be … With VLSI architecture, the chip area and design regularity represent a better measure of cost than the conventional gate count. We show that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.
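The carry computation behind logarithmic-time addition is a prefix scan under an associative (generate, propagate) operator. The sketch below models that dataflow with a Kogge-Stone-style scan; the paper's contribution is a regular VLSI layout achieving O(log n) time in O(n) area, which plain software cannot capture:

```python
def add_parallel_prefix(a_bits, b_bits):
    """a_bits, b_bits: little-endian 0/1 lists of equal length n."""
    n = len(a_bits)
    # (g, p) per position: g = carry generated, p = carry propagated
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]

    def combine(hi, lo):               # the associative carry operator
        g2, p2 = hi
        g1, p1 = lo
        return (g2 | (p2 & g1), p2 & p1)

    pref, d = list(gp), 1              # log2(n) combining levels
    while d < n:
        pref = [pref[i] if i < d else combine(pref[i], pref[i - d])
                for i in range(n)]
        d *= 2
    carries = [0] + [g for g, _ in pref]           # carry into each position
    return [a ^ b ^ c for (a, b), c in zip(zip(a_bits, b_bits), carries)]

a, b = 0b1011, 0b0110                  # 11 + 6 = 17
bits = lambda v: [(v >> i) & 1 for i in range(5)]
s = add_parallel_prefix(bits(a), bits(b))
print(sum(bit << i for i, bit in enumerate(s)), a + b)   # 17 17
```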
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with examples of how computer system builders can better support floating point.
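A few lines reproducing representation and rounding-error facts of the kind the tutorial covers, using Python's IEEE 754 binary64 doubles (math.ulp needs Python 3.9+):

```python
import math, sys

print(0.1 + 0.2 == 0.3)          # False: none of the three values is exact
print(f"{0.1:.20f}")             # the double nearest to 1/10
print(sys.float_info.epsilon)    # machine epsilon, 2**-52
print(math.ulp(1e16))            # spacing of doubles near 1e16 is 2.0
print(1e16 + 1.0 == 1e16)        # True: the addend vanishes entirely
```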
Let $N > 1$. We present a method for multiplying two integers (called N-residues) modulo N while avoiding division by N. N-residues are represented in a nonstandard way, so this … Let $N > 1$. We present a method for multiplying two integers (called N-residues) modulo N while avoiding division by N. N-residues are represented in a nonstandard way, so this method is useful only if several computations are done modulo one N. The addition and subtraction algorithms are unchanged.
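A minimal sketch of the reduction step (commonly called REDC) behind this method, with illustrative parameters; division happens only by the radix R = 2^k, via masks and shifts:

```python
def montgomery_setup(N, k):
    R = 1 << k
    assert N % 2 == 1 and N < R
    N_prime = (-pow(N, -1, R)) % R      # N * N_prime == -1 (mod R)
    return R, N_prime

def redc(T, N, k, N_prime):
    """For T < R*N, returns T * R^(-1) mod N using only masks and shifts by k."""
    mask = (1 << k) - 1
    m = ((T & mask) * N_prime) & mask   # m = T * N' mod R
    t = (T + m * N) >> k                # exact: T + m*N is divisible by R
    return t - N if t >= N else t

N, k = 97, 8
R, N_prime = montgomery_setup(N, k)
a, b = 42, 77
aR, bR = (a * R) % N, (b * R) % N       # map into Montgomery (N-residue) form
abR = redc(aR * bR, N, k, N_prime)      # represents a*b, still in residue form
print(redc(abR, N, k, N_prime), (a * b) % N)   # 33 33
```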
This paper provides tables of critical values for some popular tests of cointegration and unit roots. Although these tables are necessarily based on computer simulations, they are much more accurate than those previously available. The results of the simulation experiments are summarized by means of response surface regressions in which critical values depend on the sample size. From these regressions asymptotic critical values can be read off directly, and critical values for any finite sample size can easily be computed with a hand calculator.
In this work, we develop a method for rational approximation of the Fourier transform (FT) based on the real and imaginary parts of the complex error function w(z) = e<sup>-z<sup>2</sup></sup>(1 - erf(-iz)) = K(x,y) + iL(x,y), z = x + iy, where K(x,y) and L(x,y) are known as the Voigt and imaginary Voigt functions, respectively. In contrast to our previous rational approximation of the FT, the expansion coefficients in this method are not dependent on the values of a sampled function. As the values of the Voigt functions remain the same, this approach can be used for rapid computation with the help of look-up tables. Mathematica codes with some examples are presented.
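The complex error function in the abstract is the Faddeeva function, available in SciPy as scipy.special.wofz; K and L are then its real and imaginary parts:

```python
from scipy.special import wofz

x, y = 1.5, 0.1
z = x + 1j * y
K = wofz(z).real        # the Voigt function K(x, y)
L = wofz(z).imag        # the imaginary Voigt function L(x, y)
print(K, L)
```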
Floating-point constraint solving is challenging due to the complex representation and non-linear computations. Search-based constraint solving provides an effective method for solving floating-point constraints. In this paper, we propose QSF to improve the efficiency of search-based solving for floating-point constraints. The key idea of QSF is to model the floating-point constraint solving problem as a multi-objective optimization problem. Specifically, QSF considers both the number of unsatisfied constraints and the sum of the violation degrees of unsatisfied constraints as the objectives for search-based optimization. In addition, we propose a new evolutionary algorithm in which the mutation operators are specially designed for floating-point numbers, aiming to solve the multi-objective problem more efficiently. We have implemented QSF and conducted extensive experiments on both the SMT-COMP benchmark and a benchmark drawn from real-world floating-point programs. The results demonstrate that, compared to state-of-the-art floating-point solvers, QSF achieved an average speedup of 15.72X under a 60-second timeout and 87.48X under a 600-second timeout on the first benchmark. Similarly, on the second benchmark, QSF delivered average speedups of 22.44X and 29.23X, respectively, under the two timeout configurations. Furthermore, QSF has also enhanced the performance of symbolic execution for floating-point programs.
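A sketch of the two objectives described above for a toy constraint representation; the encoding here is illustrative, not QSF's actual one:

```python
def objectives(constraints, assignment):
    """constraints: list of (f, g, op) with op in {'<=', '=='}; a constraint
    holds when f(x) op g(x). Returns (num_unsat, total_violation_degree)."""
    num_unsat, violation = 0, 0.0
    for f, g, op in constraints:
        lhs, rhs = f(assignment), g(assignment)
        ok = lhs <= rhs if op == '<=' else lhs == rhs
        if not ok:
            num_unsat += 1
            violation += abs(lhs - rhs)      # degree of violation
    return num_unsat, violation

cs = [(lambda x: x * x, lambda x: 2.0, '<='),      # x^2 <= 2
      (lambda x: x + 1.0, lambda x: 0.5, '==')]    # x + 1 == 0.5
print(objectives(cs, 3.0))    # (2, 10.5): both unsatisfied at x = 3
```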
Logical qubits can be protected against environmental noise by encoding them into a highly entangled state of many physical qubits and actively intervening in the dynamics with stabilizer measurements. In this work, we numerically optimize the rate of these interventions: the number of stabilizer measurement rounds for a logical qubit encoded in a surface code patch and idling for a given time. We model the environmental noise on the circuit level, including gate errors, readout errors, and amplitude and phase damping. We find, qualitatively, that the optimal number of stabilizer measurement rounds decreases for better qubits and increases for better gates or larger code sizes. We discuss the implications of our results for some of the leading architectures: superconducting qubits and neutral atoms.
Our RLibm project has recently proposed methods to generate a single implementation for an elementary function that produces correctly rounded results for multiple rounding modes and representations with up to 32 bits. They are appealing for developing fast reference libraries without double rounding issues. The key insight is to build polynomial approximations that produce the correctly rounded result for a representation with two additional bits when compared to the largest target representation and with the "non-standard" round-to-odd rounding mode, which makes double rounding the RLibm math library result to any smaller target representation innocuous. The resulting approximations generated by the RLibm approach are implemented with machine-supported floating-point operations with the round-to-nearest rounding mode. When an application uses a rounding mode other than the round-to-nearest mode, the RLibm math library saves the application's rounding mode, changes the system's rounding mode to round-to-nearest, computes the correctly rounded result, and restores the application's rounding mode. This frequent change of rounding modes has a performance cost. This paper proposes two new methods, which we call rounding-invariant outputs and rounding-invariant input bounds, to avoid the frequent changes to the rounding mode and the dependence on the round-to-nearest mode. First, our new rounding-invariant outputs method proposes using the round-to-zero rounding mode to implement RLibm's polynomial approximations. We propose fast, error-free transformations to emulate a round-to-zero result from any standard rounding mode without changing the rounding mode. Second, our rounding-invariant input bounds method factors any rounding error due to different rounding modes using interval bounds in the RLibm pipeline. Both methods make a different set of trade-offs and improve the performance of the resulting libraries by more than 2X.
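A sketch of the kind of error-free transformation the paper builds on: Knuth's TwoSum splits a + b into a rounded sum and an exact residue using only ordinary floating-point operations. This illustrates the primitive; the paper's round-to-zero emulation composes such transformations in a specific way not reproduced here:

```python
def two_sum(a, b):
    s = a + b
    bb = s - a                      # the part of b actually absorbed into s
    e = (a - (s - bb)) + (b - bb)   # exact residue: s + e == a + b
    return s, e

s, e = two_sum(1e16, 1.0)
print(s, e)                         # 1e16 and 1.0: the residue captures the lost addend
```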
Ariel Kellison, Laura Zielinski, David Bindel, et al. | Proceedings of the ACM on Programming Languages
Backward error analysis offers a method for assessing the quality of numerical programs in the presence of floating-point rounding errors. However, techniques from the numerical analysis literature for quantifying backward error require substantial human effort, and there are currently no tools or automated methods for statically deriving sound backward error bounds. To address this gap, we propose Bean, a typed first-order programming language designed to express quantitative bounds on backward error. Bean's type system combines a graded coeffect system with strict linearity to soundly track the flow of backward error through programs. We prove the soundness of our system using a novel categorical semantics, where every Bean program denotes a triple of related transformations that together satisfy a backward error guarantee. To illustrate Bean's potential as a practical tool for automated backward error analysis, we implement a variety of standard algorithms from numerical linear algebra in Bean, establishing fine-grained backward error bounds via typing in a compositional style. We also develop a prototype implementation of Bean that infers backward error bounds automatically. Our evaluation shows that these inferred bounds match worst-case theoretical relative backward error bounds from the literature, underscoring Bean's utility in validating a key property of numerical programs: numerical stability.
Muhammed Emin Başak | Istanbul University - Journal of Electrical & Electronics Engineering
This paper introduces Lineal, a new C++20 linear algebra library designed to solve large sparse linear systems arising from PDE discretization on attainable hardware by optimizing runtime and especially memory consumption. Lineal supports matrix-free linear systems for stencil-based problems, which can store a single value per cell to minimize memory use. This can be paired with MAMGO, Lineal's Algebraic Multi-Grid implementation based on the algorithm used in DUNE ISTL, using a matrix-free linear system on the finest level and Compressed Sparse Row (CSR) matrices on coarser levels, drastically reducing memory consumption on the finest, and largest, level: a novel approach to our knowledge. It can additionally be combined with Lineal's very flexible support for mixed precision to minimize memory use and the effect of reduced precision on convergence. Almost all components are fully multithreaded and many support explicit SIMD operations and/or tiling. The main evaluation, solving the stationary diffusion problem using two sets of high-resolution scans of soil samples, shows Lineal's very good strong scaling and the effectiveness of the optimizations implemented. On the smaller dataset, Lineal outperforms hypre's BoomerAMG, DUNE ISTL, and Ginkgo, requiring less than half the runtime and memory with CSR matrices and less than a quarter with the hybrid matrix-free approach. Only Lineal can handle the larger dataset on the test system due to its lower memory use. Additional evaluation using the SPE10 Model 2 benchmark further highlights Lineal's strong performance and potential areas for improvement. MSC Classification: 65F10, 65F50, 65N22, 65N55, 65Y05, 65Y15, 65Y20
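For readers unfamiliar with the CSR layout used on the coarser levels, a generic sketch of CSR storage and matrix-vector product follows; this illustrates the data structure only and is not Lineal's API:

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for A stored as CSR: row i's nonzeros are
    data[indptr[i]:indptr[i+1]] in columns indices[indptr[i]:indptr[i+1]]."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = data[lo:hi] @ x[indices[lo:hi]]
    return y

# A 3x3 example: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
data    = np.array([2.0, 1.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])
print(csr_matvec(data, indices, indptr, np.ones(3)))   # [3. 3. 9.]
```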
Convolution plays a significant role in many scientific and technological computations, such as artificial intelligence and signal processing. Convolutional computations consist of many dot-product operations (multiplication-accumulation, or MAC), for which the Winograd algorithm is currently the most widely used method to reduce the number of MACs. The Karatsuba algorithm, since its introduction in the 1960s, has been traditionally used as a fast arithmetic method to perform multiplication between large-bit-width operands. It had not been exploited to accelerate 2D convolution computations before. In this paper, we revisit the Karatsuba algorithm and exploit it to reduce the number of MACs in 2D convolutions. The matrices are first segmented into tiles in a divide-and-conquer manner, and the resulting submatrices are overlapped to construct the final output matrix. Our analysis and benchmarks show that for convolution operations of the same dimensions, the Karatsuba algorithm requires the same number of multiplications but fewer additions than the Winograd algorithm. A pseudocode implementation is also provided to demonstrate the complexity reduction in Karatsuba-based convolution. An FPGA implementation of Karatsuba-based convolution also achieves a 33.6% reduction in LUTs (look-up tables) compared with a Winograd-based implementation.
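A sketch of the Karatsuba trick on the length-2 linear convolution that serves as the 1-D building block of such tiled schemes: three multiplications instead of the schoolbook four, at the cost of a few extra additions:

```python
def conv2_schoolbook(a, b):
    a0, a1 = a
    b0, b1 = b
    return [a0*b0, a0*b1 + a1*b0, a1*b1]          # 4 multiplications

def conv2_karatsuba(a, b):
    a0, a1 = a
    b0, b1 = b
    m0 = a0 * b0
    m2 = a1 * b1
    m1 = (a0 + a1) * (b0 + b1) - m0 - m2          # middle term from 1 extra mult
    return [m0, m1, m2]                           # 3 multiplications total

print(conv2_schoolbook([3, 5], [2, 7]))           # [6, 31, 35]
print(conv2_karatsuba([3, 5], [2, 7]))            # identical output, fewer MACs
```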
Sound static data race freedom verification has been a long-standing challenge in the field of programming languages. While actively researched a decade ago, most practical data race detection tools have … Sound static data race freedom verification has been a long-standing challenge in the field of programming languages. While actively researched a decade ago, most practical data race detection tools have since abandoned soundness. Is sound static race freedom verification for real-world C programs a lost cause? In this work, we investigate the obstacles to making significant progress in automated race freedom verification. We selected a benchmark suite of real-world programs and, as our primary contribution, extracted a set of coding idioms that represent fundamental barriers to verification. We expressed these idioms as micro-benchmarks and contributed them as evaluation tasks for the International Competition on Software Verification, SV-COMP. To understand the current state, we measure how sound automated verification tools competing in SV-COMP perform on these idioms and also when used out of the box on the real-world programs. For 8 of the 20 coding idioms, there does exist an automated race freedom verifier that can verify it; however, we also found significant unsoundness in leading verifiers, including Goblint and Deagle. Five of the 7 tools failed to return any result on any real-world benchmarks under our chosen resource limitations, with the remaining 2 tools verifying race freedom for 2 of the 18 programs and crashing or returning inconclusive results on the others. We thus show that state-of-the-art verifiers have both superficial and fundamental barriers to correctly analyzing real-world programs. These barriers constitute the open problems that must be solved to make progress on automated static data race freedom verification.
Multiplication is an arithmetic operation that has a significant impact on the performance of several real-life applications such as digital signals, image processing, and machine learning. The main concern of … Multiplication is an arithmetic operation that has a significant impact on the performance of several real-life applications such as digital signals, image processing, and machine learning. The main concern of electronic system designers is energy optimization with minimal penalties in terms of speed and area for designing portable devices. In this work, a very-large-scale integration (VLSI) design and delay/area performance comparison of array, Wallace tree, and radix-4 Booth multipliers was performed. This study employs different word lengths, with an emphasis on the design of floating-point multipliers. All multiplier circuits were designed and synthesized using Alliance open-source tools in 350 nm process technology with the minimum delay constraint. The findings indicate that the array multiplier has the highest delay and area for all the multiplier sizes. The Wallace multiplier exhibited the lowest delay in the mantissa multiplication of single-precision floating-point numbers. However, no significant difference was observed when compared with the double-precision floating-point multipliers. The Wallace multiplier uses the lowest area in both the single- and double-precision floating-point multipliers.
Probabilistic circuits are a unifying representation of functions as computation graphs of weighted sums and products. Their primary application is in probabilistic modeling, where circuits with non-negative weights (monotone circuits) can be used to represent and learn density/mass functions, with tractable marginal inference. Recently, it was proposed to instead represent densities as the square of the circuit function (squared circuits); this allows the use of negative weights while retaining tractability, and can be exponentially more expressive than monotone circuits. Unfortunately, we show the reverse also holds, meaning that monotone circuits and squared circuits are incomparable in general. This raises the question of whether we can reconcile, and indeed improve upon, the two modeling approaches. We answer in the positive by proposing Inception PCs, a novel type of circuit that naturally encompasses both monotone circuits and squared circuits as special cases, and employs complex parameters. Empirically, we validate that Inception PCs can outperform both monotone and squared circuits on a range of tabular and image datasets.
The Feynman path integral has revolutionized modern approaches to quantum physics. Although the path integral formalism has proven very successful and spawned several approximation schemes, the direct evaluation of real-time path integrals is still extremely expensive and numerically delicate due to its high-dimensional and oscillatory nature. We propose an efficient method for the numerical evaluation of the real-time world-line path integral for theories where the potential is dominated by a quadratic at infinity. This is done by rewriting the high-dimensional oscillatory integral in terms of a series of low-dimensional oscillatory integrals, which we efficiently evaluate with Picard-Lefschetz theory or approximate with the eikonal approximation. Subsequently, these integrals are stitched together with a series of fast Fourier transformations to recover the lattice-regularized Feynman path integral. Our method directly applies to problems in quantum mechanics, the world-line quantization of quantum field theory, and quantum gravity.
We develop the discrete uncertainty principle of the Fourier transform on a finite non-Abelian group. Also, we define the Zak basis and subsequently define the finite Zak transform on a finite non-Abelian group. Analogous to the conventional finite Zak transform, various properties of the Zak transform are studied, and the sparsity of the Zak transform is illustrated through its discrete uncertainty principle. It is proven that the Zak transform partially diagonalizes the left regular representation into a block diagonal matrix. A few applications of the Zak transform concerning quantum information theory and Cayley graphs are discussed. It is observed that the Zak transform characterizes a completely positive and trace-preserving operator. Finally, we prove that the Zak basis forms the eigenvectors of the adjacency operator of quasi-Abelian Cayley graphs.