A "bucket brigade" architecture for a quantum random memory of $N=2^{n}$ memory cells needs $n(n+5)/2$ quantum manipulations on control circuit nodes per memory call. Here we propose a scheme in which only $n/2$ manipulations are required on average to accomplish a memory call. This scheme may significantly decrease the time spent on a memory call and the average overall error rate per memory call. A physical implementation scheme is discussed for storing an arbitrary state in a selected memory cell and then reading it out.
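To make the two operation counts quoted above concrete, the following minimal sketch evaluates $n(n+5)/2$ and $n/2$ for a few address widths. The function names are hypothetical and only the two formulas come from the abstract; nothing here models the actual control circuit.

```python
def bucket_brigade_ops(n: int) -> float:
    """Manipulations per memory call quoted for the bucket-brigade design: n(n+5)/2."""
    return n * (n + 5) / 2


def proposed_scheme_avg_ops(n: int) -> float:
    """Average manipulations per memory call quoted for the proposed scheme: n/2."""
    return n / 2


if __name__ == "__main__":
    for n in (4, 10, 20):
        cells = 2 ** n  # N = 2^n memory cells
        ratio = bucket_brigade_ops(n) / proposed_scheme_avg_ops(n)
        print(f"n={n:2d}  N={cells:8d}  bucket brigade={bucket_brigade_ops(n):6.1f}  "
              f"proposed (avg)={proposed_scheme_avg_ops(n):5.1f}  ratio={ratio:5.1f}")
```

The ratio of the two counts is $n+5$, so under these formulas the saving grows linearly with the number of address bits.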
Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which makes it possible to compute pixel-wise uncertainty maps of the segmentation and provides an implicit ensemble of segmentations that boosts segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate this issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), designed specifically for medical image segmentation. The key idea is to obtain pre-segmentation results from a separately trained segmentation network and to construct noisy predictions (with a non-Gaussian distribution) according to the forward diffusion rule. We can then start from the noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results than representative baseline methods even when the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, which can be combined with it to further improve segmentation performance.
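The "forward diffusion rule" mentioned above can be illustrated with the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The sketch below applies it to a toy pre-segmentation mask; the linear noise schedule, the mask values, and all function names are assumptions for illustration and are not taken from the PD-DDPM paper.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Assumed linear noise schedule (a common DDPM default, not confirmed by the source)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]


def alpha_bar(betas, t: int) -> float:
    """Cumulative product of (1 - beta) over the first t steps; equals 1.0 for t = 0."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def diffuse(x0, t: int, betas, rng: random.Random):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar = alpha_bar(betas, t)
    return [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    betas = linear_beta_schedule(1000)
    pre_segmentation = [0.0, 1.0, 1.0, 0.0]  # hypothetical flattened mask
    # Start the reverse process from an intermediate step instead of pure noise.
    noisy_start = diffuse(pre_segmentation, 300, betas, random.Random(0))
    print(noisy_start)
```

Starting the reverse chain at an intermediate step such as `t = 300` rather than at the final step is what would reduce the number of denoising iterations in a scheme of this kind.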
Conventional discrete-to-continuum approaches are limited in describing the collective behavior of the multipolar configurations of dislocations, which are widely observed in crystalline materials. The reason is that dislocation dipoles, which play an important role in determining the mechanical properties of crystals, often get smeared out when traditional homogenization methods are applied. To address such difficulties, the collective behavior of a row of dislocation dipoles is studied by using matched asymptotic techniques. The discrete-to-continuum transition is facilitated by introducing two field variables that describe the dislocation pair density potential and the dislocation pair width, respectively. It is found that the dislocation pair width evolves much faster than the pair density. This hierarchy in evolution time scales enables us to describe the dislocation dynamics at the coarse-grained level by an evolution equation for the slowly varying variable (the pair density) coupled with an equilibrium equation for the fast-varying variable (the pair width). The time-scale separation method adopted here paves the way for properly incorporating dipole-like (zero net Burgers vector but nonvanishing) dislocation structures, known as the statistically stored dislocations, into macroscopic models of crystal plasticity in three dimensions. Moreover, the natural transition between different equilibrium patterns found here may also shed light on the emergence of persistent slip bands in fatigued metals under cyclic loads.
Materials containing a high proportion of grain boundaries offer significant potential for the development of radiation-resistant structural materials. However, a proper understanding of the connection between the radiation-induced microstructural behavior of a grain boundary and its impact at long natural time scales is still missing. In this Letter, point defect absorption at interfaces is summarized by a jump Robin-type condition at a coarse-grained level, wherein the role of interface microstructure is effectively taken into account. Then a concise formula linking the sink strength of a polycrystalline aggregate with its grain size is introduced, and it compares well with experimental observations. Based on the derived model, a coarse-grained formulation incorporating the coupled evolution of grain boundaries and point defects is proposed, so as to underpin the study of the long-time morphological evolution of grains induced by irradiation. Our simulation results suggest that the presence of point defect sources within a grain further accelerates its shrinking process, and that radiation tends to trigger the extension of twin boundary sections.
Article data. History: submitted 30 July 2019; accepted 25 February 2020; published online 06 May 2020. Keywords: grain boundary, triple junction, disconnection, grain growth, variational Onsager principle. AMS subject headings: 35Q74, 74K30, 74E15, 74P10, 49N10, 49S05. Publication data: ISSN (print) 0036-1399; ISSN (online) 1095-712X. Publisher: Society for Industrial and Applied Mathematics. CODEN: smjmap.
The entanglement fidelity provides a measure of how well the entanglement between two subsystems is preserved in a quantum process. By using a simple model, we show that in some cases this quantity, in its original definition, fails to measure entanglement preservation. In contrast, the modified entanglement fidelity, obtained by applying a proper local unitary transformation to a subsystem, is shown to exhibit behavior similar to that of the concurrence in quantum evolution.
Using the coordinate transformation method, we solve the one-dimensional Schrödinger equation with position-dependent mass. The explicit expressions for the potentials, energy eigenvalues, and eigenfunctions of the systems are given. The eigenfunctions can be expressed in terms of the Jacobi, Hermite, and generalized Laguerre polynomials. All potentials for these solvable systems have an extra term $V_m$, which is produced from the dependence of mass on the position, compared with those for the systems of constant mass. The properties of $V_m$ for several mass functions are discussed.
For a simple model we derive analytic expressions of entropy exchange and coherent information, from which relations between them and the concurrence are drawn. We find that in the quantum evolution the entropy exchange exhibits behavior opposite to that of the concurrence, whereas the coherent information shows features very similar to those of the concurrence. The meaning of this result for general systems is discussed.
The optical resonance problem resembles the time-independent Schrödinger equation but differs in that eigenfunctions in resonance problems grow exponentially. We introduce the perfectly-matched-layer method and the complex stretching technique to transform eigenfunctions from exponential growth to exponential decay. Accordingly, we construct a Hamiltonian operator to calculate eigenstates of optical resonance systems. We successfully apply our method to calculate the eigenvalues for whispering-gallery modes, and the results agree with existing theory, which has been developed only for regularly shaped cavities. We also apply the method to investigate the mode evolution near exceptional points, a special phenomenon that occurs only in non-Hermitian systems. The presented method is applicable to optical resonance systems with arbitrary dielectric distributions.
The properties of the s-wave for a quasi-free particle with position-dependent mass (PDM) are discussed in detail. Unlike the system with constant mass, in which the localization of the s-wave for the free quantum particle around the origin occurs only in two dimensions, the quasi-free particle with PDM can experience attractive forces in D dimensions, except D = 1, when its mass function satisfies certain conditions. The effective mass of a particle varying with its position can induce an effective interaction, which may be attractive in some cases. The analytical expressions of the eigenfunctions and the corresponding probability densities for the s-waves of the two- and three-dimensional systems with a special PDM are given, and the existence of localization around the origin for these systems is shown.
A continuum model of two-dimensional low-angle grain boundary motion and the dislocation structure evolution on the grain boundaries was developed in [L. Zhang and Y. Xiang, J. Mech. Phys. Solids, 117 (2018), pp. 157--178]. The model is based on the motion and reaction of the constituent dislocations of the grain boundaries. The long-range elastic interaction between dislocations is included in the continuum model, and it maintains a stable dislocation structure described by Frank's formula for grain boundaries. In this paper, we develop a new continuum model for the coupling and sliding motions of grain boundaries that avoids the time-consuming calculation of the long-range elastic interaction. In this model, the long-range elastic interaction is replaced by a constraint of Frank's formula. The constrained evolution problem in our new continuum model is further solved by using the projection method. Effects of the coupling and sliding motions in our new continuum model and its relationship with the classical motion-by-curvature model are discussed. The continuum model is validated by comparisons with a discrete dislocation dynamics model and the earlier continuum model [L. Zhang and Y. Xiang, J. Mech. Phys. Solids, 117 (2018), pp. 157--178], in which the long-range dislocation interaction is explicitly calculated.
High entropy alloys (HEAs) are single-phase crystals that consist of random solid solutions of multiple elements in approximately equal proportions. This class of novel materials has exhibited superb mechanical properties, such as high strength combined with other desired features. The strength of crystalline materials is associated with the motion of dislocations. In this paper, we derive a stochastic continuum model based on the Peierls--Nabarro framework for interlayer dislocations in a bilayer HEA from an atomistic model that incorporates the atomic-level randomness. We use asymptotic analysis and limit theorems in the convergence from the atomistic model to the continuum model. The total energy in the continuum model consists of a stochastic elastic energy in the two layers and a stochastic misfit energy that accounts for the interlayer nonlinear interaction. The obtained continuum model can be considered as a stochastic generalization of the classical, deterministic Peierls--Nabarro model for the dislocation core and related properties. This derivation also validates the stochastic model adopted by Zhang et al. [Acta Mater., 166 (2019), pp. 424--434].
Quantum mechanics is not the unique no-signaling theory endowed with stronger-than-classical correlations; there exists a broad class of no-signaling theories allowing even stronger-than-quantum correlations. The principle of information causality has been suggested to distinguish quantum theory from these nonphysical theories, together with an elegant information-theoretic proof of the quantum bound of two-particle correlations. In this work, we extend this to genuine $N$-particle correlations that cannot be reduced to mixtures of states in which a smaller number of particles are entangled. We first express Svetlichny's inequality in terms of multipartite no-signaling boxes, then prove that the strongest genuine multipartite correlations lead to the maximal violation of information causality. The maximal genuine multipartite correlations under the constraint of information causality are found to equal the quantum mechanical bound. This result consolidates information causality as a physical principle defining the possible correlations allowed by nature, and provides intriguing insights into the limits of genuine multipartite correlations in quantum theory.
We give an analytic quantitative relation between Hardy's non-locality and the Bell operator. We find that Hardy's non-locality is a sufficient condition for violation of the Bell inequality, that the upper bound of Hardy's non-locality allowed by information causality corresponds exactly to the Tsirelson bound of the Bell inequality, and that the upper bound of Hardy's non-locality allowed by the principle of no-signaling corresponds exactly to the algebraic maximum of the Bell operator. Then we study Cabello's argument of Hardy's non-locality (a generalization of Hardy's argument) and find a similar relation between it and violation of the Bell inequality. Finally, we give a simple derivation of the bound of Hardy's non-locality under the constraint of information causality with the aid of the relation derived above between Hardy's non-locality and the Bell operator. This bound is the main result of a recent work of Ahanj et al. [Phys. Rev. A 81, 032103 (2010)].
The excellent generalization, contextual learning, and emergent abilities of pre-trained large models (PLMs) allow them to handle specific tasks without direct training data, making them better foundation models for adversarial domain adaptation (ADA) methods that transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to account properly for the confounder, which is the root cause of the difference between the source data distribution and that of the target domains. This study proposes a confounder balancing method in adversarial domain adaptation for PLM fine-tuning (CadaFT), which includes a PLM as the foundation model for a feature extractor, a domain classifier, and a confounder classifier, all jointly trained with an adversarial loss. This loss is designed to improve domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. Compared with the newest ADA methods, CadaFT can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the features extracted from PLMs. The confounder classifier in CadaFT is designed as a plug-and-play component and can be applied in environments where the confounder is measurable, unmeasurable, or partially measurable. Empirical results on natural language processing and computer vision downstream tasks show that CadaFT outperforms the newest GPT-4, LLaMA2, ViT, and ADA methods.
For bipartite two-qubit systems, we derive the analytical expression of the bound of the Bell operator for any given pure state. Our result not only exhibits some properties of the Bell inequality, for example that it is violated by any pure entangled state and maximally violated only by a maximally entangled state, but also gives the explicit value of the maximal violation for any pure state. Finally, we point out that for two-qubit systems there is no mixed state that can produce the maximal violation of the Bell inequality.
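As an illustration of what such a pure-state bound looks like, the sketch below uses the well-known result (following from the Horodecki criterion) that the maximal CHSH value of a pure two-qubit state with concurrence $C$ is $2\sqrt{1+C^2}$. This is an assumption about the form of the bound: the abstract does not reproduce the paper's own expression, which may be parametrized differently. The function name is hypothetical.

```python
import math


def max_chsh_pure_state(concurrence: float) -> float:
    """Maximal CHSH value 2*sqrt(1 + C^2) for a pure two-qubit state with concurrence C.

    Assumed textbook form of the bound, not quoted from the paper.
    """
    if not 0.0 <= concurrence <= 1.0:
        raise ValueError("concurrence must lie in [0, 1]")
    return 2.0 * math.sqrt(1.0 + concurrence ** 2)


if __name__ == "__main__":
    for c in (0.0, 0.25, 0.5, 1.0):
        print(f"C={c:4.2f}  max CHSH={max_chsh_pure_state(c):.4f}")
```

Under this form, any $C > 0$ exceeds the classical bound 2, and the Tsirelson value $2\sqrt{2}$ is reached only at $C = 1$, which matches the two properties stated in the abstract.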
In this paper, we perform mathematical validation of the Peierls--Nabarro (PN) models, which are multiscale models of dislocations that incorporate the detailed dislocation core structure. We focus on the static and dynamic PN models of an edge dislocation. In a PN model, the total energy includes the elastic energy in the two half-space continua and a nonlinear potential energy across the slip plane, which is always infinite. We rigorously establish the relationship between the PN model in the full space and the reduced problem on the slip plane in terms of both governing equations and energy variations. The shear displacement jump is determined only by the reduced problem on the slip plane, while the displacement fields in the two half spaces are determined by linear elasticity. We establish the existence and sharp regularities of classical solutions in Hilbert space. For both the reduced problem and the full PN model, we prove that a static solution is a global minimizer in a perturbed sense. We also show that there is a unique classical, global-in-time solution of the dynamic PN model.
Conventional discrete-to-continuum approaches are limited in describing the collective behaviour of the multi-polar configurations of dislocations, which are widely observed in crystalline materials. The reason is that dislocation dipoles, which play an important role in determining the mechanical properties of crystals, often get smeared out when traditional homogenisation methods are applied. To address such difficulties, the collective behaviour of a row of dislocation dipoles is studied by using matched asymptotic techniques. The discrete-to-continuum transition is facilitated by introducing two field variables that describe the dislocation pair density potential and the dislocation pair width, respectively. It is found that the dislocation pair width evolves much faster than the pair density. This hierarchy in evolution time scales enables us to describe the dislocation dynamics at the coarse-grained level by an evolution equation for the slowly varying variable (the pair density) coupled with an equilibrium equation for the fast-varying variable (the pair width). The time-scale separation method adopted here paves the way for properly incorporating dipole-like (zero net Burgers vector but non-vanishing) dislocation structures, known as the statistically stored dislocations (SSDs), into macroscopic models of crystal plasticity in three dimensions. Moreover, the natural transition between different equilibrium patterns found here may also shed light on the emergence of persistent slip bands (PSBs) in fatigued metals under cyclic loads.
Dislocations are the main carriers of the permanent deformation of crystals. For simulations of engineering applications, continuum models in which material microstructures are represented by continuous density distributions of dislocations are preferred. It is challenging to capture in the continuum model the short-range dislocation interactions, which vanish after the standard averaging procedure from discrete dislocation models. In this study, we consider systems of parallel straight dislocation walls and develop continuum descriptions for the short-range interactions of dislocations by using asymptotic analysis. The obtained continuum short-range interaction formulas are incorporated in the continuum model for dislocation dynamics based on a pair of dislocation density potential functions that represent continuous distributions of dislocations. The derived continuum model is able to describe anisotropic dislocation interaction and motion. Mathematically, these short-range interaction terms ensure the strong stability property of the continuum model that is possessed by the discrete dislocation dynamics model. The derived continuum model is validated by comparisons with discrete dislocation dynamics simulation results.
Deep learning techniques have shown their success in medical image segmentation since they are easy to manipulate and robust to various types of datasets. The commonly used loss functions in the deep segmentation task are pixel-wise loss functions. This creates a bottleneck that prevents these models from achieving high precision for complicated structures in biomedical images. For example, the predicted small blood vessels in retinal images are often disconnected or even missed under the supervision of pixel-wise losses. This paper addresses this problem by introducing a long-range elastic interaction-based training strategy. In this strategy, a convolutional neural network (CNN) learns the target region under the guidance of the elastic interaction energy between the boundary of the predicted region and that of the actual object. Under the supervision of the proposed loss, the boundary of the predicted region is attracted strongly by the object boundary and tends to stay connected. Experimental results show that our method achieves considerable improvements compared with commonly used pixel-wise loss functions (cross entropy and Dice loss) and other recent loss functions on three retinal vessel segmentation datasets: DRIVE, STARE, and CHASEDB1.
For two qubits belonging to Alice and Bob, we derive an approach to set up the bound of the Bell operator under the condition that Alice and Bob continue to perform local vertical measurements. For pure states we find that if the entanglement of the two qubits is less than 0.2644 (measured with von Neumann entropy), the violation of the Bell inequality will never be realized, and only when the entanglement is equal to 1 can the maximal violation $(2\sqrt{2})$ occur. For a specific form of mixed states, we prove that the bound of the Bell inequality depends on the concurrence. Only when the concurrence is greater than 0.6 can the violation of the Bell inequality occur, and the maximal violation can never be achieved. We suggest that the bound of the Bell operator under the condition of local vertical measurements may be used as a measure of the entanglement.
Using the coordinate transformation method, we solve the one-dimensional Schrödinger equation with position-dependent mass (PDM). The explicit expressions for the potentials, energy eigenvalues, and eigenfunctions of the systems are given. The eigenfunctions can be expressed in terms of the Jacobi, Hermite, and generalized Laguerre polynomials. All potentials for these solvable systems have an extra term $V_m$, which is produced from the dependence of mass on the coordinate, compared with those for the systems of constant mass. The properties of $V_m$ for several mass functions are discussed.
As a crucial scheme to accelerate deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is implemented with the Ring-AllReduce architecture (Ring-SGD) and uses a computation-communication overlap strategy to address the overhead of the massive communications required by DSGD. However, we observe that although only $O(1)$ gradients need to be communicated per worker in Ring-SGD, the $O(n)$ handshakes required by Ring-SGD limit its usage when training with many workers or on high-latency networks. In this paper, we propose Shuffle-Exchange SGD (SESGD) to solve the dilemma of Ring-SGD. In a cluster of 16 workers with 0.1 ms Ethernet latency, SESGD can accelerate DNN training by a factor of $1.7\times$ without losing model accuracy. Moreover, the process can be accelerated by up to $5\times$ on high-latency networks (5 ms).
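The tension between $O(1)$ data volume and $O(n)$ handshakes in a ring all-reduce can be sketched with a simple latency-plus-bandwidth cost model. The model below assumes the textbook ring algorithm (a reduce-scatter phase followed by an all-gather phase, each with $n-1$ sequential steps); the function name, the cost model, and the example numbers other than the worker count and latencies are illustrative assumptions, and SESGD itself is not modeled.

```python
def ring_allreduce_cost(n_workers: int, grad_bytes: float, latency_s: float, bandwidth_bps: float):
    """Return (sequential steps, bytes sent per worker, estimated time in seconds).

    Assumes the standard ring all-reduce: 2*(n-1) sequential steps, each moving
    a chunk of grad_bytes / n per worker and paying one network latency.
    """
    steps = 2 * (n_workers - 1)                      # O(n) handshakes
    bytes_per_worker = steps * grad_bytes / n_workers  # approaches 2 * grad_bytes, i.e. O(1) in n
    time_s = steps * latency_s + bytes_per_worker / bandwidth_bps
    return steps, bytes_per_worker, time_s


if __name__ == "__main__":
    grad_bytes = 100e6      # hypothetical 100 MB of gradients
    bandwidth = 1.25e9      # hypothetical 10 Gbit/s link, in bytes per second
    for latency in (0.1e-3, 5e-3):  # the two latencies mentioned in the abstract
        for n in (4, 16, 64):
            steps, sent, t = ring_allreduce_cost(n, grad_bytes, latency, bandwidth)
            print(f"latency={latency * 1e3:4.1f} ms  n={n:3d}  steps={steps:4d}  "
                  f"sent={sent / 1e6:6.1f} MB  time={t:.3f} s")
```

In this model the bandwidth term stays nearly constant as workers are added, while the latency term grows linearly, which is the regime where the abstract reports the largest gains.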
We present a continuum model to determine the dislocation structure and energy of low-angle grain boundaries in three dimensions. The equilibrium dislocation structure is obtained by minimizing the grain boundary energy that is associated with the constituent dislocations subject to the constraint of Frank's formula. The orientation-dependent continuous distributions of dislocation lines on grain boundaries are described conveniently using the dislocation density potential functions, whose contour lines on the grain boundaries represent the dislocations. The energy of a grain boundary is the total energy of the constituent dislocations derived from a discrete dislocation dynamics model, incorporating both the dislocation line energy and reactions of dislocations. The constrained energy minimization problem is solved by the augmented Lagrangian method and projection method. Comparisons with atomistic simulation results show that our continuum model is able to give excellent predictions of the energy and dislocation densities of both planar and curved low-angle grain boundaries.
In this paper, we establish a neural network to approximate functionals, which are maps from infinite-dimensional spaces to finite-dimensional spaces. The approximation error of the neural network is $O(1/\sqrt{m})$, where $m$ is the size of the network, which overcomes the curse of dimensionality. The key idea of the approximation is to define a Barron spectral space of functionals.
Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. Since the amount of available labeled electroencephalogram (EEG) data is much smaller than that of text and image data, it is difficult for transformer models pre-trained from EEG to be developed as large as GPT-4 100T to fully unleash the potential of this architecture. In this paper, we show that transformers pre-trained from images as well as text can be directly fine-tuned for EEG-based prediction tasks. We design AdaCE, plug-and-play Adapters for Converting EEG data into image as well as text forms, to fine-tune pre-trained vision and language transformers. The proposed AdaCE module is highly effective for fine-tuning pre-trained transformers while achieving state-of-the-art performance on diverse EEG-based prediction tasks. For example, AdaCE on the pre-trained Swin-Transformer achieves 99.6%, an absolute improvement of 9.2%, on the EEG-decoding task of human activity recognition (UCI HAR). Furthermore, we empirically show that applying the proposed AdaCE to fine-tune larger pre-trained models can achieve better performance on EEG-based prediction tasks, indicating the potential of our adapters for even larger transformers. The plug-and-play AdaCE module can be applied to fine-tuning most of the popular pre-trained transformers on many other multi-channel time-series data, not limited to EEG data and the models we use. Our code will be available at https://github.com/wangbxj1234/AdaCE.
Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes large language models (LLMs) using only forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimation in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLM fine-tuning across various NLP tasks than MeZO and its momentum variants.
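To make the forward-pass-only idea concrete, the following is a minimal sketch of plain ZO-SGD with a two-point random-perturbation gradient estimate. It is not ZO-AdaMU and not the MeZO implementation: the quadratic `loss`, the function name `zo_sgd`, and the values of `lr`, `eps`, `steps` and `seed` are all assumptions chosen for illustration.

```python
import random


def loss(theta, target):
    """Toy objective standing in for a model's training loss (an assumption)."""
    return sum((t - s) ** 2 for t, s in zip(theta, target))


def zo_sgd(theta, target, steps=400, lr=0.05, eps=1e-3, seed=0):
    """Zeroth-order SGD: only loss evaluations are used, never a backward pass."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        # Draw one random Gaussian direction that perturbs all parameters at once.
        z = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + eps * zi for t, zi in zip(theta, z)]
        minus = [t - eps * zi for t, zi in zip(theta, z)]
        # Two forward passes estimate the directional derivative along z.
        proj = (loss(plus, target) - loss(minus, target)) / (2.0 * eps)
        # The gradient estimate is proj * z; take a plain SGD step along it.
        theta = [t - lr * proj * zi for t, zi in zip(theta, z)]
    return theta
```

The estimate `proj * z` is noisy because it follows a single random direction per step, which is one way to picture the oscillations that the abstract attributes to this family of estimators.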
With the development of laser technology, pulse length enters the optical cycle regime and hence the interaction time between the laser pulse and atoms becomes prominent. We investigate this problem in this Letter through the photoelectron spectrum of the hydrogen atom in few-cycle XUV laser pulses. By solving the one-dimensional time-dependent Schrödinger equation, we find that, due to the insufficient interaction time, the electron cannot gain enough energy from the optical field when escaping the binding of the nucleus, and an abnormality then appears in the photoelectron spectrum: the peak of the photoelectron spectrum shows a red shift compared with the well-known Einstein photoelectric effect formula. The shift grows as the pulse duration decreases.
The degree of entanglement between two spins may change due to interaction. In this regard, we find that a result in a recent work by Ge and Wadati [Phys. Rev. A 72, 052101 (2005)] is wrong, as it would breach the basic principle.
The 3rd Workshop on Maritime Computer Vision (MaCVi) 2025 addresses maritime computer vision for Unmanned Surface Vehicles (USV) and underwater. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 700 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi25.
In this paper, we study a constrained minimization problem that arises from materials science to determine the dislocation (line defect) structure of grain boundaries. The problem aims to minimize the energy of the grain boundary with dislocation structure subject to the constraint of Frank's formula. In this constrained minimization problem, the objective function, i.e., the grain boundary energy, is nonconvex and separable, and the constraints are linear. To solve this constrained minimization problem, we modify the alternating direction method of multipliers (ADMM) with an increasing penalty parameter. We provide a convergence analysis of the modified ADMM in this nonconvex minimization problem, with settings not considered by the existing ADMM convergence studies. Specifically, in the linear constraints, the coefficient matrix of each subvariable block is of full column rank. This property makes each subvariable minimization strongly convex if the penalty parameter is large enough, and contributes to the convergence of ADMM without any convexity assumption on the entire objective function. We prove that the limit of the sequence from the modified ADMM is primal feasible and is a stationary point of the augmented Lagrangian function. Furthermore, we obtain sufficient conditions to show that the objective function is quasi-convex and thus it has a unique minimum over the given domain. Numerical examples are presented to validate the convergence of the algorithm, and results of the penalty method, the augmented Lagrangian method, and the modified ADMM are compared.
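The mechanics of ADMM with a growing penalty can be illustrated on a deliberately simple problem. The sketch below is an assumption-laden toy, not the modified ADMM of the paper: it uses a convex separable objective $(x-1)^2 + (z-3)^2$ with the linear constraint $x - z = 0$ instead of the nonconvex grain boundary energy, and the names `growth` and `rho_max` (a cap on the penalty) are hypothetical choices for this example only.

```python
def admm_increasing_penalty(iters=200, rho=1.0, growth=1.1, rho_max=10.0):
    """Two-block ADMM for min (x-1)^2 + (z-3)^2 subject to x - z = 0.

    The augmented Lagrangian is
        (x-1)^2 + (z-3)^2 + y*(x - z) + (rho/2)*(x - z)^2,
    and the penalty rho is increased after every sweep (capped at rho_max,
    an assumption of this sketch).
    """
    x = z = y = 0.0
    for _ in range(iters):
        # x-update: closed-form minimizer of the augmented Lagrangian in x.
        x = (2.0 - y + rho * z) / (2.0 + rho)
        # z-update: closed-form minimizer in z, using the new x.
        z = (6.0 + y + rho * x) / (2.0 + rho)
        # Multiplier update driven by the constraint residual x - z.
        y += rho * (x - z)
        # Increase the penalty parameter for the next sweep.
        rho = min(growth * rho, rho_max)
    return x, z, y
```

For this toy problem the iterates approach $x = z = 2$ with multiplier $y = -2$. Each block update is a strongly convex one-dimensional problem, which loosely mirrors the role the abstract assigns to a sufficiently large penalty parameter.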
Recent experiments, atomistic simulations, and theoretical predictions have identified various new types of grain boundary motions that are controlled by the dynamics of the underlying microstructure of line defects (dislocations or disconnections), to which the classical motion by mean curvature model does not apply. Different continuum models have been developed by upscaling from discrete line defect dynamics models under different settings (dislocations or disconnections, low angle grain boundaries or high angle grain boundaries, etc.), to account for the specific detailed natures of the microscopic dynamics mechanisms, and these continuum models are not in the variational form. In this paper, we propose a unified variational framework to account for all the underlying line defect mechanisms for the dynamics of both low and high angle grain boundaries and the associated grain rotations. The variational formulation is based on the developed constraints of the dynamic Frank-Bilby equations that govern the microscopic line defect structures. The proposed variational framework is able to recover the available models for different motions under different conditions. The unified variational framework is more efficient for describing the collective behaviors of grain boundary networks at larger length scales. It also provides a mathematically tractable basis for rigorous analysis of these partial differential equation models and for the development of efficient numerical methods.
We revisit the unified two-timescale Q-learning algorithm as initially introduced by Angiuli et al. (2022). This algorithm demonstrates efficacy in solving mean field game (MFG) and mean field control (MFC) problems, simply by tuning the ratio of two learning rates for mean field distribution and the Q-functions respectively. In this paper, we provide a comprehensive theoretical explanation of the algorithm's bifurcated numerical outcomes under fixed learning rates. We achieve this by establishing a diagram that relates continuous-time mean field problems to their discrete-time Q-function counterparts, forming the basis of the algorithm. Our key contribution lies in the construction of a Lyapunov function integrating both mean field distribution and Q-function iterates. This Lyapunov function facilitates a unified convergence of the algorithm across the entire spectrum of learning rates, thus providing a cohesive framework for analysis.
The excellent generalization, contextual learning, and emergence abilities of pre-trained large models (PLMs) let them handle specific tasks without direct training data, making them better foundation models for adversarial domain adaptation (ADA) methods that transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to account for the confounder properly, which is the root cause of the source data distribution that differs from the target domains. This study proposes a confounder balancing method in adversarial domain adaptation for PLM fine-tuning (CadaFT), which includes a PLM as the foundation model for a feature extractor, a domain classifier and a confounder classifier, and they are jointly trained with an adversarial loss. This loss is designed to improve the domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. Compared with the newest ADA methods, CadaFT can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the extracted features from PLMs. The confounder classifier in CadaFT is designed as a plug-and-play module and can be applied in confounder-measurable, unmeasurable, or partially measurable environments. Empirical results on natural language processing and computer vision downstream tasks show that CadaFT outperforms the newest GPT-4, LLaMA2, ViT and ADA methods.
High entropy alloys (HEAs) are a class of novel materials that exhibit superb engineering properties. It has been demonstrated by extensive experiments and first principles/atomistic simulations that short-range order in the atomic level randomness strongly influences the properties of HEAs. In this paper, we derive stochastic continuum models for HEAs with short-range order from atomistic models. A proper continuum limit is obtained such that the mean and variance of the atomic level randomness together with the short-range order described by a characteristic length are kept in the process from the atomistic interaction model to the continuum equation. The obtained continuum model with short-range order is in the form of an Ornstein–Uhlenbeck (OU) process. This validates the continuum model based on the OU process adopted phenomenologically by Zhang et al. [Acta Mater., 166 (2019), pp. 424–434] for HEAs with short-range order. We derive such stochastic continuum models with short-range order for both (i) the elastic deformation in HEAs without defects and (ii) HEAs with dislocations (line defects). The obtained stochastic continuum models are based on the energy formulations, whose variations lead to stochastic partial differential equations.
Dislocation climb plays an important role in understanding plastic deformation of metallic materials at high temperature. In this paper, we present a continuum formulation for dislocation climb velocity based on densities of dislocations. The obtained continuum formulation is an accurate approximation of the Green's function based discrete dislocation dynamics method (Gu et al., J. Mech. Phys. Solids 83:319-337, 2015). The continuum dislocation climb formulation has the advantage of accounting for both the long-range effect of vacancy bulk diffusion and that of the Peach-Koehler climb force, and the two long-range effects are canceled into a short-range effect (integral with fast-decaying kernel) and, in some special cases, a completely local effect. This significantly simplifies the calculation in the Green's function based discrete dislocation dynamics method, in which a linear system has to be solved over the entire system for the long-range effect of vacancy diffusion and the long-range Peach-Koehler climb force has to be calculated. The obtained continuum dislocation climb velocity can be applied in any available continuum dislocation dynamics framework. We also present numerical validations for this continuum climb velocity and simulation examples for implementation in continuum dislocation dynamics frameworks.
The Merriman-Bence-Osher threshold dynamics method is an efficient algorithm to simulate the motion by mean curvature. It has the advantages of being easy to implement and highly efficient. In this paper, we propose a threshold dynamics method for dislocation dynamics in a slip plane, in which the spatial operator is essentially an anisotropic fractional Laplacian. We show that this threshold dislocation dynamics method is able to give two correct leading orders in dislocation velocity, including both the $O(\log\varepsilon)$ local curvature force and the $O(1)$ nonlocal force due to the long-range stress field generated by the dislocations as well as the force due to the applied stress, where $\varepsilon$ is the dislocation core size, if the time step is set to be $\Delta t=\varepsilon$. This generalizes the available result of threshold dynamics with the corresponding fractional Laplacian, which is on the leading order $O(\log\Delta t)$ local curvature velocity under the isotropic kernel. We also propose a numerical method based on spatial variable stretching to correct the mobility and to rescale the velocity for efficient and accurate simulations, which can be applied generally to any threshold dynamics method. We validate the proposed threshold dislocation dynamics method by numerical simulations of various motions and interaction of dislocations.
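For readers unfamiliar with threshold dynamics, the following is a minimal sketch of the classical Merriman-Bence-Osher scheme for motion by mean curvature: diffuse the indicator function of a region for a short time, then threshold at 1/2. It uses the ordinary Laplacian with a simple explicit finite-difference heat solver, not the anisotropic fractional Laplacian of the dislocation method described above. The grid size, the time step `dt`, the substep `tau`, and all function names are assumptions made for this illustration.

```python
def disc(n, radius):
    """Indicator function (1 inside, 0 outside) of a disc centred on an n-by-n grid."""
    c = (n - 1) / 2.0
    return [[1.0 if (i - c) ** 2 + (j - c) ** 2 <= radius ** 2 else 0.0
             for j in range(n)] for i in range(n)]


def diffuse(u, total_time, tau=0.2):
    """Explicit heat-equation steps on a unit-spaced grid, with zero values outside.

    tau <= 0.25 keeps the 5-point explicit scheme stable.
    """
    n = len(u)
    for _ in range(round(total_time / tau)):
        new = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                up = u[i - 1][j] if i > 0 else 0.0
                down = u[i + 1][j] if i < n - 1 else 0.0
                left = u[i][j - 1] if j > 0 else 0.0
                right = u[i][j + 1] if j < n - 1 else 0.0
                new[i][j] = u[i][j] + tau * (up + down + left + right - 4.0 * u[i][j])
        u = new
    return u


def mbo_step(chi, dt):
    """One threshold dynamics step: diffuse the indicator, then threshold at 1/2."""
    u = diffuse(chi, dt)
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in u]


def area(chi):
    """Number of grid cells inside the region."""
    return sum(sum(row) for row in chi)
```

Starting from `disc(41, 12.0)` and repeatedly calling `mbo_step(chi, 16.0)`, the disc shrinks, as expected for motion by mean curvature. The time step has to be large enough that the interface moves by about a grid cell per step; otherwise the thresholded front can stay pinned on the grid.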
In this paper, we propose an energy stable network (EStable-Net) for solving gradient flow equations. The solution update scheme in our neural network EStable-Net is inspired by a proposed auxiliary variable based equivalent form of the gradient flow equation. EStable-Net enables a discrete energy to decrease along the neural network, which is consistent with the property in the evolution process of the gradient flow equation. The architecture of the neural network EStable-Net consists of a few energy decay blocks, and the output of each block can be interpreted as an intermediate state of the evolution process of the gradient flow equation. This design provides a stable, efficient and interpretable network structure. Numerical experimental results demonstrate that our network is able to generate highly accurate and stable predictions.
Urban segmentation and lane detection are two important tasks for traffic scene perception. Accuracy and fast inference speed of visual perception are crucial for autonomous driving safety. Fine and complex geometric objects, such as pedestrians, traffic signs and lanes, are the most challenging but important recognition targets in traffic scenes. In this paper, a simple and efficient topology-aware energy loss function-based network training strategy named EIEGSeg is proposed. EIEGSeg is designed for multi-class segmentation on real-time traffic scene perception. To be specific, the convolutional neural network (CNN) extracts image features and produces multiple outputs, and the elastic interaction energy loss function (EIEL) drives the predictions toward the ground truth until they completely overlap. Our strategy performs well especially on fine-scale structure, i.e., small or irregularly shaped objects can be identified more accurately, and discontinuity issues on slender objects can be improved. We quantitatively and qualitatively analyze our method on three traffic datasets, including urban scene segmentation data Cityscapes and lane detection data TuSimple and CULane. Our results demonstrate that EIEGSeg consistently improves the performance, especially on real-time, lightweight networks that are better suited for autonomous driving.
A backdoor attack in deep learning inserts a hidden backdoor in the model to trigger malicious behavior upon specific input patterns. Existing detection approaches assume a metric space (for either the original inputs or their latent representations) in which normal samples and malicious samples are separable. We show that this assumption has a severe limitation by introducing a novel SSDT (Source-Specific and Dynamic-Triggers) backdoor, which obscures the difference between normal samples and malicious samples. To overcome this limitation, we move beyond looking for a perfect metric space that would work for different deep-learning models, and instead resort to more robust topological constructs. We propose TED (Topological Evolution Dynamics) as a model-agnostic basis for robust backdoor detection. The main idea of TED is to view a deep-learning model as a dynamical system that evolves inputs to outputs. In such a dynamical system, a benign input follows a natural evolution trajectory similar to other benign inputs. In contrast, a malicious sample displays a distinct trajectory, since it starts close to benign samples but eventually shifts towards the neighborhood of attacker-specified target samples to activate the backdoor. Extensive evaluations are conducted on vision and natural language datasets across different network architectures. The results demonstrate that TED not only achieves a high detection rate, but also significantly outperforms existing state-of-the-art detection approaches, particularly in addressing the sophisticated SSDT attack. The code to reproduce the results is made public on GitHub.
The task of lane detection involves identifying the boundaries of driving areas in real-time. Recognizing lanes with variable and complex geometric structures remains a challenge. In this paper, we explore a novel and flexible implicit lane representation named Elastic Lane map (ELM), and introduce an efficient physics-informed end-to-end lane detection framework, namely, ElasticLaneNet (Elastic interaction energy-informed Lane detection Network). The approach considers predicted lanes as moving zero-contours on the flexibly shaped ELM that are attracted to the ground truth guided by an elastic interaction energy-loss function (EIE loss). Our framework integrates the global information and low-level features well. The method performs well in complex lane scenarios, including those with large curvature, weak geometry features at intersections, complicated cross lanes, Y-shaped lanes, dense lanes, etc. We apply our approach on three datasets: SDLane, CULane, and TuSimple. The results demonstrate exceptional performance of our method, with state-of-the-art results on the structurally diverse SDLane, achieving an F1-score of 89.51, a Recall rate of 87.50, and a Precision of 91.61 with fast inference speed.
We investigate an initial-(periodic-)boundary value problem for a continuum equation, which is a model for motion of grain boundaries based on the underlying microscopic mechanisms of line defects (disconnections) and integrates the effects of a diverse range of thermodynamic driving forces. We first prove the global-in-time existence and uniqueness of a weak solution to this initial-boundary value problem in the case with positive equilibrium disconnection density parameter $B$, and then investigate the asymptotic behavior of the solutions as $B$ goes to zero. The main difficulties in the proofs of the main theorems are due to the degeneracy of $B=0$, a non-local term with singularity, and a non-smooth coefficient of the highest derivative associated with the gradient of the unknown. The key ingredients in the proof are the energy method, an estimate for a singular integral of the Hilbert type, and a compactness lemma.
In this paper, we establish a neural network to approximate functionals, which are maps from infinite dimensional spaces to finite dimensional spaces. The approximation error of the neural network is $O(1/\sqrt{m})$ where $m$ is the size of networks, which overcomes the curse of dimensionality. The key idea of the approximation is to define a Barron space of functionals.
We present a continuum model to determine the dislocation structure and energy of low angle grain boundaries in three dimensions. The equilibrium dislocation structure is obtained by minimizing the grain boundary energy that is associated with the constituent dislocations subject to the constraint of Frank's formula. The orientation-dependent continuous distributions of dislocation lines on grain boundaries are described conveniently using the dislocation density potential functions, whose contour lines on the grain boundaries represent the dislocations. The energy of a grain boundary is the total energy of the constituent dislocations derived from a discrete dislocation dynamics model, incorporating both the dislocation line energy and reactions of dislocations. The constrained energy minimization problem is solved by the augmented Lagrangian method and projection method. Comparisons with atomistic simulation results show that our continuum model is able to give excellent predictions of the energy and dislocation densities of both planar and curved low angle grain boundaries.
In this paper, we prove the convergence from the atomistic model to the Peierls–Nabarro (PN) model of a two-dimensional bilayer system with a complex lattice. We show that the displacement field of the dislocation solution of the PN model converges to the dislocation solution of the atomistic model with second-order accuracy. The consistency of the PN model and the stability of the atomistic model are essential in our proof. The main idea of our approach is to use several low-degree polynomials to approximate the energy due to atomistic interactions of different groups of atoms of the complex lattice.
We study the well-posedness of a modified degenerate Cahn-Hilliard type model for surface diffusion. With degenerate phase-dependent diffusion mobility and an additional stabilizing function, this model is able to give the correct sharp interface limit. We introduce a notion of weak solutions for the nonlinear model. The existence result is obtained by approximations of the proposed model with nondegenerate mobilities. We also employ this method to prove existence of weak solutions to a related model where the chemical potential contains a nonlocal term originating from self-climb of dislocations in crystalline materials.
We study the 2+1 dimensional continuum model for long-range elastic interaction on stepped epitaxial surfaces proposed by Xu and Xiang. The long-range interaction term and the two length scales in this model present challenges for the PDE analysis. In this paper, we prove the existence of both the static and dynamic solutions and derive the minimum energy scaling law for this 2+1 dimensional continuum model. We show that the minimum energy surface profile is attained by surfaces with step meandering instability. This is essentially different from the energy scaling law for the 1+1 dimensional epitaxial surfaces under elastic effects, which is attained by step bunching surface profiles. We also discuss the transition from the step bunching instability to the step meandering instability in 2+1 dimensions.
We develop a continuum model for the dynamics of grain boundaries in three dimensions that incorporates the motion and reaction of the constituent dislocations. The continuum model is based on a simple representation of densities of curved dislocations on the grain boundary. Ill-posedness due to nonconvexity of the total energy is fixed by a numerical treatment based on a projection method that maintains the connectivity of the constituent dislocations. An efficient simulation method is developed, in which the critical but computationally expensive long-range interaction of dislocations is replaced by another projection formulation that maintains the constraint of equilibrium of the dislocation structure described by Frank's formula. This continuum model is able to describe the grain boundary motion and grain rotation due to both coupling and sliding effects, to which the classical motion by mean curvature model does not apply. Comparisons with atomistic simulation results show that our continuum model is able to give excellent predictions of evolutions of low angle grain boundaries and their dislocation structures.
Pruning is a model compression method that removes redundant parameters in deep neural networks (DNNs) while maintaining accuracy. Most available filter pruning methods require complex treatments such as iterative pruning, feature statistics/ranking, or additional optimization designs in the training process. In this paper, we propose a simple and effective regularization strategy from a new perspective of evolution of features, which we call feature flow regularization (FFR), for improving structured sparsity and filter pruning in DNNs. Specifically, FFR imposes controls on the gradient and curvature of feature flow along the neural network, which implicitly increases the sparsity of the parameters. The principle behind FFR is that coherent and smooth evolution of features will lead to an efficient network that avoids redundant parameters. The high structured sparsity obtained from FFR enables us to prune filters effectively. Experiments with VGGNets and ResNets on the CIFAR-10/100 and Tiny ImageNet datasets demonstrate that FFR can significantly improve both unstructured and structured sparsity. Our pruning results in terms of reduction of parameters and FLOPs are comparable to or even better than those of state-of-the-art pruning methods.
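The abstract above does not give the FFR formula, so the following is only an illustrative stand-in for "controlling the gradient and curvature of feature flow": a penalty on first and second finite differences of features along network depth. The function name `feature_flow_penalty`, the weights `k1` and `k2`, and the assumption that all layers produce equally shaped features are hypothetical and not taken from the paper.

```python
import numpy as np


def feature_flow_penalty(features, k1=1.0, k2=1.0):
    """Hypothetical smoothness penalty on features along network depth.

    features: list of at least three equally shaped arrays, one per layer
              (an assumption made for this sketch only).
    k1, k2:   illustrative weights for the first- and second-difference terms.
    """
    if len(features) < 3:
        raise ValueError("need at least three layers for a second difference")
    f = np.stack([np.asarray(x, dtype=float) for x in features])
    first = np.diff(f, n=1, axis=0)   # f[l+1] - f[l], a discrete "gradient"
    second = np.diff(f, n=2, axis=0)  # f[l+2] - 2 f[l+1] + f[l], a discrete "curvature"
    return float(k1 * np.mean(first**2) + k2 * np.mean(second**2))


if __name__ == "__main__":
    # Features that change linearly with depth have zero second difference.
    ramp = [layer * np.ones((4, 4)) for layer in range(5)]
    print(feature_flow_penalty(ramp))
```

In a training loop, a term of this kind would be added to the task loss; how the actual method handles layers with different feature shapes is not specified in the abstract.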
Transition metal dichalcogenide layered nano-crystals are emerging as promising candidates for next-generation optoelectronic and quantum devices. In such systems, the interaction between excitonic states and atomic vibrations is crucial for many fundamental properties, such as carrier mobilities, quantum coherence loss, and heat dissipation. In particular, to fully exploit their valley-selective excitations, one has to understand the many-body exciton physics of zone-edge states. So far, theoretical and experimental studies have mainly focused on the exciton-phonon dynamics in high-energy direct excitons involving zone-center phonons. Here, we use ultrafast electron diffraction and ab initio calculations to investigate the many-body structural dynamics following nearly-resonant excitation of low-energy indirect excitons in MoS2. By exploiting the large momentum carried by scattered electrons, we identify the excitation of in-plane K- and Q-phonon modes with $E'$ symmetry as key for the stabilization of indirect excitons generated via near-infrared light at 1.55 eV, and we shed light on the role of phonon anharmonicity and the ensuing structural evolution of the MoS2 crystal lattice. Our results highlight the strong selectivity of phononic excitations directly associated with the specific indirect-exciton nature of the wavelength-dependent electronic transitions triggered in the system.
Buoyant shear layers are encountered in many engineering and environmental applications and have been studied by researchers in the context of experiments and modeling for decades. Often, these flows have high Reynolds and Richardson numbers, and this leads to significant/intractable space-time resolution requirements for DNS or LES modeling. On the other hand, many of the important physical mechanisms in these systems, such as stress anisotropy, wake stabilization, and regime transition, inherently render eddy viscosity-based RANS modeling inappropriate. Accordingly, we pursue second-moment closure (SMC), i.e., full Reynolds stress/flux/variance modeling, for moderate Reynolds number non-stratified and stratified shear layers for which DNS is possible. A range of sub-model complexity is pursued for the diffusion of stresses, density fluxes and variance, pressure strain and scrambling, and dissipation. These sub-models are evaluated in terms of how well they agree with the exact Reynolds-averaged terms computed from DNS, and how they affect the accuracy of the full RANS closure. For the non-stratified case, the SMC model predicts the shear layer growth rate and Reynolds shear stress profiles accurately. Stress anisotropy and budgets are captured only qualitatively. Comparing DNS of exact and modeled terms, inconsistencies in model performance and assumptions are observed, including inaccurate prediction of individual statistics, non-negligible pressure diffusion, and dissipation anisotropy. For the stratified case, shear layer and gradient Richardson number growth rates, and stress, flux, and variance decay rates, are captured with less accuracy than corresponding flow parameters in the non-stratified case. These studies lead to several recommendations for model improvement.
We present a continuum model to determine the dislocation structure and energy of low angle grain boundaries in three dimensions. The equilibrium dislocation structure is obtained by minimizing the grain boundary energy that is associated with the constituent dislocations subject to the constraint of Frank's formula. The orientation-dependent continuous distributions of dislocation lines on grain boundaries are described conveniently using the dislocation density potential functions, whose contour lines on the grain boundaries represent the dislocations. The energy of a grain boundary is the total energy of the constituent dislocations derived from the discrete dislocation dynamics model, incorporating both the dislocation line energy and reactions of dislocations. The constrained energy minimization problem is solved by the augmented Lagrangian method and projection method. Comparisons with atomistic simulation results show that our continuum model is able to give excellent predictions of the energy and dislocation densities of both planar and curved low angle grain boundaries.
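To make the constrained-minimization step concrete, here is a minimal sketch of the augmented Lagrangian method on a toy problem: minimizing a quadratic energy subject to one linear equality constraint. The toy energy, the function name `augmented_lagrangian_qp`, and the penalty parameter `rho` are assumptions for illustration; the grain boundary energy and the Frank's formula constraint of the paper are not reproduced here.

```python
import numpy as np


def augmented_lagrangian_qp(Q, c, a, b, rho=10.0, iters=30):
    """Minimize 0.5*x^T Q x - c^T x subject to a.x = b (toy problem).

    Each iteration minimizes the augmented Lagrangian
        f(x) + lam*(a.x - b) + 0.5*rho*(a.x - b)^2
    exactly in x (a linear solve), then updates the multiplier lam.
    """
    Q = np.asarray(Q, dtype=float)
    c = np.asarray(c, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(c)
    lam = 0.0
    for _ in range(iters):
        # Stationarity in x: (Q + rho*a a^T) x = c - lam*a + rho*b*a
        H = Q + rho * np.outer(a, a)
        rhs = c - lam * a + rho * b * a
        x = np.linalg.solve(H, rhs)
        # Multiplier update driven by the constraint residual.
        lam += rho * (a @ x - b)
    return x, lam


if __name__ == "__main__":
    # Closest point to the origin on the line x1 + x2 = 1.
    x, lam = augmented_lagrangian_qp(np.eye(2), np.zeros(2), np.array([1.0, 1.0]), 1.0)
    print(x, lam)
```

The multiplier update lets the constraint be satisfied without sending the penalty parameter to infinity, which is the usual motivation for preferring this method over a pure penalty approach.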
The optical resonance problem is similar to but different from the time-steady Schrödinger equation, in that eigenfunctions in resonance problems are exponentially growing. We introduce the perfectly-matched-layer method and the complex stretching technique to transform eigenfunctions from exponential growth to exponential decay. Accordingly, we construct a Hamiltonian operator to calculate eigenstates of optical resonance systems. We successfully apply our method to calculate the eigenvalues for whispering-gallery modes, and the results agree with existing theory that is developed only for regularly shaped cavities. We also apply the method to investigate the mode evolution near exceptional points, a special phenomenon that only happens in non-Hermitian systems. The presented method is applicable to optical resonance systems with arbitrary dielectric distributions.
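The growth-to-decay transformation can be illustrated in one dimension. The following is a minimal worked example, assuming a constant stretching strength $\sigma > 0$ beyond a point $x_0$ and a resonance wavenumber $k = k_r - i k_i$ with $k_r, k_i > 0$; these symbols and the constant-$\sigma$ profile are illustrative choices, not taken from the paper.

```latex
% Outgoing wave with a complex resonance wavenumber grows as x -> +infinity:
\[
  e^{ikx} = e^{i k_r x}\, e^{k_i x}, \qquad k = k_r - i k_i .
\]
% Complex coordinate stretching for x > x_0:
\[
  \tilde{x} = x + i\sigma (x - x_0), \qquad \sigma > 0 .
\]
% The same wave evaluated on the stretched coordinate:
\[
  \left| e^{ik\tilde{x}} \right|
  = \left| e^{ikx} \right| \left| e^{-k\sigma (x - x_0)} \right|
  = e^{k_i x}\, e^{-k_r \sigma (x - x_0)} ,
\]
% which decays as x -> +infinity whenever
\[
  \sigma k_r > k_i .
\]
```

Under this assumption, a sufficiently strong stretching turns the growing outgoing wave into a decaying one, so the stretched problem can be truncated on a finite domain.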
The optical resonance problem is similar to but different from the time-steady Schrödinger equation. One big challenge is that the eigenfunctions in the resonance problem are exponentially growing. We give a physical explanation of this boundary condition and introduce the perfectly matched layer (PML) method to transform eigenfunctions from exponential growth to exponential decay. Based on the complex stretching technique, we construct a non-Hermitian Hamiltonian for the optical resonance problem. We validate the effectiveness of the Hamiltonian by calculating its eigenvalues in the circular cavity and comparing them with the analytical results. We also use the proposed Hamiltonian to investigate the mode evolution around exceptional points in the quad-cosine cavity.
In this paper, we perform mathematical validation of the Peierls--Nabarro (PN) models, which are multiscale models of dislocations that incorporate the detailed dislocation core structure. We focus on the static and dynamic PN models of an edge dislocation. In a PN model, the total energy includes the elastic energy in the two half-space continua and a nonlinear potential energy across the slip plane, which is always infinite. We rigorously establish the relationship between the PN model in the full space and the reduced problem on the slip plane in terms of both governing equations and energy variations. The shear displacement jump is determined only by the reduced problem on the slip plane while the displacement fields in the two half spaces are determined by linear elasticity. We establish the existence and sharp regularities of classical solutions in Hilbert space. For both the reduced problem and the full PN model, we prove that a static solution is a global minimizer in a perturbed sense. We also show that there is a unique classical, global-in-time solution of the dynamic PN model.
Deep learning techniques have shown their success in medical image segmentation since they are easy to manipulate and robust to various types of datasets. The commonly used loss functions in the deep segmentation task are pixel-wise loss functions. This results in a bottleneck for these models to achieve high precision for complicated structures in biomedical images. For example, the predicted small blood vessels in retinal images are often disconnected or even missed under the supervision of the pixel-wise losses. This paper addresses this problem by introducing a long-range elastic interaction-based training strategy. In this strategy, a convolutional neural network (CNN) learns the target region under the guidance of the elastic interaction energy between the boundary of the predicted region and that of the actual object. Under the supervision of the proposed loss, the boundary of the predicted region is attracted strongly by the object boundary and tends to stay connected. Experimental results show that our method is able to achieve considerable improvements compared to commonly used pixel-wise loss functions (cross entropy and Dice loss) and other recent loss functions on three retinal vessel segmentation datasets, DRIVE, STARE and CHASEDB1.
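The limitation of pixel-wise losses described above can be seen in a small numerical example. The sketch below implements the two standard baseline losses named in the abstract (binary cross entropy and Dice loss) in NumPy and evaluates them on a synthetic one-pixel-wide "vessel" with a single-pixel gap. The synthetic image and the helper names are assumptions for illustration; the elastic interaction loss proposed in the paper is not implemented here.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross entropy; pred holds probabilities in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|)."""
    intersection = np.sum(pred * target)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps))


def broken_vessel_example(size=32):
    """Return (prediction, target): a horizontal line and the same line with a gap."""
    target = np.zeros((size, size))
    target[size // 2, :] = 1.0           # one-pixel-wide vessel across the image
    pred = target.copy()
    pred[size // 2, size // 2 - 1] = 0.0  # a single missing pixel disconnects it
    return pred, target


if __name__ == "__main__":
    pred, target = broken_vessel_example()
    print("Dice loss:", dice_loss(pred, target))
    print("BCE loss:", binary_cross_entropy(pred, target))
```

Both losses stay small for the disconnected prediction because only one pixel differs, even though the topology of the vessel has changed; this is the kind of error a boundary-aware loss is intended to penalize more strongly.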
A continuum model of the two dimensional low angle grain boundary motion and the dislocation structure evolution on the grain boundaries has been developed in Ref. [48]. The model is based on the motion and reaction of the constituent dislocations of the grain boundaries. The long-range elastic interaction between dislocations is included in the continuum model, and it maintains a stable dislocation structure described by Frank's formula for grain boundaries. In this paper, we develop a new continuum model for the coupling and sliding motions of grain boundaries that avoids the time-consuming calculation of the long-range elastic interaction. In this model, the long-range elastic interaction is replaced by a constraint of Frank's formula. The constrained evolution problem in our new continuum model is further solved by using the projection method. Effects of the coupling and sliding motions in our new continuum model and its relationship with the classical motion by curvature model are discussed. The continuum model is validated by comparisons with a discrete dislocation dynamics model and the earlier continuum model [48] in which the long-range dislocation interaction is explicitly calculated.
High entropy alloys (HEAs) are single phase crystals that consist of random solid solutions of multiple elements in approximately equal proportions. This class of novel materials has exhibited superb mechanical properties, such as high strength combined with other desired features. The strength of crystalline materials is associated with the motion of dislocations. In this paper, we derive a stochastic continuum model based on the Peierls-Nabarro framework for inter-layer dislocations in a bilayer HEA from an atomistic model that incorporates the atomic level randomness. We use asymptotic analysis and limit theorem in the convergence from the atomistic model to the continuum model. The total energy in the continuum model consists of a stochastic elastic energy in the two layers, and a stochastic misfit energy that accounts for the inter-layer nonlinear interaction. The obtained continuum model can be considered as a stochastic generalization of the classical, deterministic Peierls-Nabarro model for the dislocation core and related properties. This derivation also validates the stochastic model adopted by Zhang et al. (Acta Mater. 166, 424-434, 2019).
Article data. Submitted: 30 July 2019. Accepted: 25 February 2020. Published online: 06 May 2020. Keywords: grain boundary, triple junction, disconnection, grain growth, variational Onsager principle. AMS Subject Headings: 35Q74, 74K30, 74E15, 74P10, 49N10, 49S05. ISSN (print): 0036-1399. ISSN (online): 1095-712X. Publisher: Society for Industrial and Applied Mathematics. CODEN: smjmap.
As a crucial scheme to accelerate deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is implemented with the Ring-AllReduce architecture (Ring-SGD) and uses a computation-communication overlap strategy to address the overhead of the massive communications required by DSGD. However, we observe that although only $O(1)$ gradients need to be communicated per worker in Ring-SGD, the $O(n)$ handshakes required by Ring-SGD limit its usage when training with many workers or in high-latency networks. In this paper, we propose Shuffle-Exchange SGD (SESGD) to solve the dilemma of Ring-SGD. In a cluster of 16 workers with 0.1ms Ethernet latency, SESGD can accelerate DNN training to $1.7 \times$ without losing model accuracy. Moreover, the process can be accelerated up to $5\times$ in high-latency networks (5ms).
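The $O(n)$ handshake cost mentioned above can be made concrete with a latency-only estimate. The sketch below uses the standard ring all-reduce structure (a reduce-scatter phase followed by an all-gather phase, each with $n-1$ sequential steps) and ignores bandwidth and computation-communication overlap. The function names are hypothetical, and no attempt is made to model SESGD itself, whose communication pattern is not described in the abstract.

```python
def ring_allreduce_steps(num_workers: int) -> int:
    """Sequential communication steps in a standard ring all-reduce.

    Reduce-scatter and all-gather each take (num_workers - 1) steps.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be positive")
    return 2 * (num_workers - 1)


def latency_overhead_ms(num_workers: int, latency_ms: float) -> float:
    """Latency-only cost of one all-reduce, ignoring bandwidth and overlap."""
    return ring_allreduce_steps(num_workers) * latency_ms


if __name__ == "__main__":
    # The two network settings quoted in the abstract, with 16 workers.
    for latency in (0.1, 5.0):
        print(f"{latency} ms latency -> {latency_overhead_ms(16, latency):.1f} ms per all-reduce")
```

For 16 workers this gives 30 sequential steps, so the latency term alone grows from about 3 ms to 150 ms per all-reduce when the per-message latency rises from 0.1 ms to 5 ms, which is consistent with the larger speedup reported for the high-latency setting.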
A continuum model of the two dimensional low angle grain boundary motion and the dislocation structure evolution on the grain boundaries has been developed in [L. Zhang and Y. Xiang, J. Mech. Phys. Solids, 117 (2018), pp. 157--178]. The model is based on the motion and reaction of the constituent dislocations of the grain boundaries. The long-range elastic interaction between dislocations is included in the continuum model, and it maintains a stable dislocation structure described by Frank's formula for grain boundaries. In this paper, we develop a new continuum model for the coupling and sliding motions of grain boundaries that avoids the time-consuming calculation of the long-range elastic interaction. In this model, the long-range elastic interaction is replaced by a constraint of Frank's formula. The constrained evolution problem in our new continuum model is further solved by using the projection method. Effects of the coupling and sliding motions in our new continuum model and its relationship with the classical motion by curvature model are discussed. The continuum model is validated by comparisons with a discrete dislocation dynamics model and the earlier continuum model [L. Zhang and Y. Xiang, J. Mech. Phys. Solids, 117 (2018), pp. 157--178] in which the long-range dislocation interaction is explicitly calculated.
In this paper, we present a phase field model for the self-climb motion of prismatic dislocation loops via vacancy pipe diffusion driven by elastic interactions. This conserved dynamics model is developed under the framework of the Cahn-Hilliard equation with incorporation of the climb force on dislocations, and is based on the dislocation self-climb velocity formulation established in Ref. [1]. The phase field model has the advantage of being able to handle the topological and geometrical changes automatically during the simulations. Asymptotic analysis shows that the proposed phase field model gives the dislocation self-climb velocity accurately in the sharp interface limit. Numerical simulations of evolution, translation, coalescence and repelling of prismatic loops by self-climb show excellent agreement with discrete dislocation dynamics simulation results and the experimental observation.
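The "conserved dynamics" property of the Cahn-Hilliard framework can be checked numerically. The following is a minimal sketch of the generic one-dimensional Cahn-Hilliard equation with a double-well potential, periodic boundaries, and explicit time stepping. It is not the paper's model: the climb force term and the dislocation-specific formulation are omitted, and the parameters `eps`, `mobility`, `dt`, and the initial profile are illustrative assumptions.

```python
import numpy as np


def laplacian_periodic(f, dx):
    """Second-order central-difference Laplacian with periodic boundaries."""
    return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2


def cahn_hilliard_step(phi, dx, dt, eps=0.05, mobility=1.0):
    """One explicit Euler step of phi_t = M * Laplacian(mu).

    mu = W'(phi) - eps^2 * Laplacian(phi), with W(phi) = (phi^2 - 1)^2 / 4.
    """
    mu = phi**3 - phi - eps**2 * laplacian_periodic(phi, dx)
    return phi + dt * mobility * laplacian_periodic(mu, dx)


def simulate(n=64, steps=200, dt=1e-6):
    """Evolve a smooth initial profile on the unit interval and return phi."""
    dx = 1.0 / n
    x = np.arange(n) * dx
    phi = 0.1 + 0.05 * np.cos(2.0 * np.pi * x)
    for _ in range(steps):
        phi = cahn_hilliard_step(phi, dx, dt)
    return phi


if __name__ == "__main__":
    phi = simulate()
    # The spatial mean of phi should stay at its initial value of 0.1.
    print("mean of phi after evolution:", phi.mean())
```

Because the update is the discrete Laplacian of the chemical potential, its sum over a periodic grid vanishes, so the total amount of the order parameter is preserved up to rounding. The small time step reflects the stiffness of the fourth-order term under explicit stepping.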