Author Description

Login to generate an author description

Ask a Question About This Mathematician

All published works (23)

Neural networks are rapidly gaining interest in nonlinear system identification due to the model's ability to capture complex input-output relations directly from data. However, despite the flexibility of the approach, … Neural networks are rapidly gaining interest in nonlinear system identification due to the model's ability to capture complex input-output relations directly from data. However, despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. Aluminum electrolysis is a highly nonlinear production process, and most of the data must be sampled manually, making the sampling process expensive and infrequent. In the case of infrequent measurements of state variables, the accuracy and open-loop stability of the long-term predictions become highly important. Standard neural networks struggle to provide stable long-term predictions with limited training data. In this work, we investigate the effect of combining concatenated skip-connections and the sparsity-promoting <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\ell_{1}$</tex> regularization on the open-loop stability and accuracy of forecasts with short, medium, and long prediction horizons. The case study is conducted on a high-dimensional and nonlinear simulator representing an aluminum electrolysis cell's mass and energy balance. The proposed model structure contains concatenated skip connections from the input layer and all intermittent layers to the output layer, referred to as InputSkip. <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\ell_{1}$</tex> regularized InputSkip is called sparse InputSkip. The results show that sparse InputSkip outperforms dense and sparse standard feedforward neural networks and dense InputSkip regarding open-loop stability and long-term predictive accuracy. The results are significant when models are trained on datasets of all sizes (small, medium, and large training sets) and for all prediction horizons (short, medium, and long prediction horizons.)
The capability to adapt compliance by varying muscle stiffness is crucial for dexterous manipulation skills in humans. Incorporating compliance in robot motor control is crucial for enabling real-world force interaction … The capability to adapt compliance by varying muscle stiffness is crucial for dexterous manipulation skills in humans. Incorporating compliance in robot motor control is crucial for enabling real-world force interaction tasks with human-like dexterity. In this study, we introduce a novel approach, we call "deep Model Predictive Variable Impedance Controller (MPVIC)" for compliant robotic manipulation, which combines Variable Impedance Control with Model Predictive Control (MPC). The method involves learning a generalized Cartesian impedance model of a robot manipulator through an exploration strategy to maximize information gain. Within the MPC framework, this learned model is utilized to adapt the impedance parameters of a low-level variable impedance controller, thereby achieving the desired compliance behavior for various manipulation tasks without requiring retraining or finetuning. We assess the efficacy of the proposed deep MPVIC approach using a Franka Emika Panda robotic manipulator in simulations and real-world experiments involving diverse manipulation tasks. Comparative evaluations against model-free and model-based reinforcement learning approaches in variable impedance control are conducted, considering aspects such as transferability between tasks and performance. The results demonstrate the effectiveness and potential of the presented approach for advancing robotic manipulation capabilities.
Deep neural networks have become very popular in modeling complex nonlinear processes due to their extraordinary ability to fit arbitrary nonlinear functions from data with minimal expert intervention. However, they … Deep neural networks have become very popular in modeling complex nonlinear processes due to their extraordinary ability to fit arbitrary nonlinear functions from data with minimal expert intervention. However, they are almost always overparameterized and challenging to interpret due to their internal complexity. Furthermore, the optimization process to find the learned model parameters can be unstable due to the process getting stuck in local minima. In this work, we demonstrate the value of sparse regularization techniques to significantly reduce the model complexity. We demonstrate this for the case of an aluminium extraction process, which is highly nonlinear system with many interrelated subprocesses. We trained a densely connected deep neural network to model the process and then compared the effects of sparsity promoting l1 regularization on generalizability, interpretability, and training stability. We found that the regularization significantly reduces model complexity compared to a corresponding dense neural network. We argue that this makes the model more interpretable, and show that training an ensemble of sparse neural networks with different parameter initializations often converges to similar model structures with similar learned input features. Furthermore, the empirical study shows that the resulting sparse models generalize better from small training sets than their dense counterparts.
Neural networks are rapidly gaining interest in nonlinear system identification due to the model's ability to capture complex input-output relations directly from data. However, despite the flexibility of the approach, … Neural networks are rapidly gaining interest in nonlinear system identification due to the model's ability to capture complex input-output relations directly from data. However, despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. Aluminum electrolysis is a highly nonlinear production process, and most of the data must be sampled manually, making the sampling process expensive and infrequent. In the case of infrequent measurements of state variables, the accuracy and open-loop stability of the long-term predictions become highly important. Standard neural networks struggle to provide stable long-term predictions with limited training data. In this work, we investigate the effect of combining concatenated skip-connections and the sparsity-promoting $\ell_1$ regularization on the open-loop stability and accuracy of forecasts with short, medium, and long prediction horizons. The case study is conducted on a high-dimensional and nonlinear simulator representing an aluminum electrolysis cell's mass and energy balance. The proposed model structure contains concatenated skip connections from the input layer and all intermittent layers to the output layer, referred to as InputSkip. $\ell_1$ regularized InputSkip is called sparse InputSkip. The results show that sparse InputSkip outperforms dense and sparse standard feedforward neural networks and dense InputSkip regarding open-loop stability and long-term predictive accuracy. The results are significant when models are trained on datasets of all sizes (small, medium, and large training sets) and for all prediction horizons (short, medium, and long prediction horizons.)
The exploding research interest for neural networks in modeling nonlinear dynamical systems is largely explained by the networks' capacity to model complex input-output relations directly from data. However, they typically … The exploding research interest for neural networks in modeling nonlinear dynamical systems is largely explained by the networks' capacity to model complex input-output relations directly from data. However, they typically need vast training data before they can be put to any good use. The data generation process for dynamical systems can be an expensive endeavor both in terms of time and resources. Active learning addresses this shortcoming by acquiring the most informative data, thereby reducing the need to collect enormous datasets. What makes the current work unique is integrating the deep active learning framework into nonlinear system identification. We formulate a general static deep active learning acquisition problem for nonlinear system identification. This is enabled by exploring system dynamics locally in different regions of the input space to obtain a simulated dataset covering the broader input space. This simulated dataset can be used in a static deep active learning acquisition scheme referred to as global explorations. The global exploration acquires a batch of initial states corresponding to the most informative state-action trajectories according to a batch acquisition function. The local exploration solves an optimal control problem, finding the control trajectory that maximizes some measure of information. After a batch of informative initial states is acquired, a new round of local explorations from the initial states in the batch is conducted to obtain a set of corresponding control trajectories that are to be applied on the system dynamics to get data from the system. Information measures used in the acquisition scheme are derived from the predictive variance of an ensemble of neural networks. The novel method outperforms standard data acquisition methods used for system identification of nonlinear dynamical systems in the case study performed on simulated data.
Artificial neural networks have a broad array of applications today due to their high degree of flexibility and ability to model nonlinear functions from data. However, the trustworthiness of neural … Artificial neural networks have a broad array of applications today due to their high degree of flexibility and ability to model nonlinear functions from data. However, the trustworthiness of neural networks is limited due to their black-box nature, their poor ability to generalize from small datasets, and their inconsistent convergence during training. Aluminum electrolysis is a complex nonlinear process with many interrelated sub-processes. Artificial neural networks can potentially be well suited for modeling the aluminum electrolysis process, but the safety-critical nature of this process requires trustworthy models. In this work, sparse neural networks are trained to model the system dynamics of an aluminum electrolysis simulator. The sparse model structure has a significantly reduction in model complexity compared to a corresponding dense neural network. We argue that this makes the model more interpretable. Furthermore, the empirical study shows that the sparse models generalize better from small training sets than dense neural networks. Moreover, training an ensemble of sparse neural networks with different parameter initializations show that the models converge to similar model structures with similar learned input features.
The capability to adapt compliance by varying muscle stiffness is crucial for dexterous manipulation skills in humans. Incorporating compliance in robot motor control is crucial to performing real-world force interaction … The capability to adapt compliance by varying muscle stiffness is crucial for dexterous manipulation skills in humans. Incorporating compliance in robot motor control is crucial to performing real-world force interaction tasks with human-level dexterity. This work presents a Deep Model Predictive Variable Impedance Controller for compliant robotic manipulation which combines Variable Impedance Control with Model Predictive Control (MPC). A generalized Cartesian impedance model of a robot manipulator is learned using an exploration strategy maximizing the information gain. This model is used within an MPC framework to adapt the impedance parameters of a low-level variable impedance controller to achieve the desired compliance behavior for different manipulation tasks without any retraining or finetuning. The deep Model Predictive Variable Impedance Control approach is evaluated using a Franka Emika Panda robotic manipulator operating on different manipulation tasks in simulations and real experiments. The proposed approach was compared with model-free and model-based reinforcement approaches in variable impedance control for transferability between tasks and performance.
With the ever-increasing availability of data, there has been an explosion of interest in applying modern machine learning methods to fields such as modeling and control. However, despite the flexibility … With the ever-increasing availability of data, there has been an explosion of interest in applying modern machine learning methods to fields such as modeling and control. However, despite the flexibility and surprising accuracy of such black-box models, it remains difficult to trust them. Recent efforts to combine the two approaches aim to develop flexible models that nonetheless generalize well; a paradigm we call Hybrid Analysis and modeling (HAM). In this work we investigate the Corrective Source Term Approach (CoSTA), which uses a data-driven model to correct a misspecified physics-based model. This enables us to develop models that make accurate predictions even when the underlying physics of the problem is not well understood. We apply CoSTA to model the Hall-H\'eroult process in an aluminum electrolysis cell. We demonstrate that the method improves both accuracy and predictive stability, yielding an overall more trustworthy model.
Deep neural networks have become very popular in modeling complex nonlinear processes due to their extraordinary ability to fit arbitrary nonlinear functions from data with minimal expert intervention. However, they … Deep neural networks have become very popular in modeling complex nonlinear processes due to their extraordinary ability to fit arbitrary nonlinear functions from data with minimal expert intervention. However, they are almost always overparameterized and challenging to interpret due to their internal complexity. Furthermore, the optimization process to find the learned model parameters can be unstable due to the process getting stuck in local minima. In this work, we demonstrate the value of sparse regularization techniques to significantly reduce the model complexity. We demonstrate this for the case of an aluminium extraction process, which is highly nonlinear system with many interrelated subprocesses. We trained a densely connected deep neural network to model the process and then compared the effects of sparsity promoting l1 regularization on generalizability, interpretability, and training stability. We found that the regularization significantly reduces model complexity compared to a corresponding dense neural network. We argue that this makes the model more interpretable, and show that training an ensemble of sparse neural networks with different parameter initializations often converges to similar model structures with similar learned input features. Furthermore, the empirical study shows that the resulting sparse models generalize better from small training sets than their dense counterparts.
A Python module for rapid prototyping of constraint-based closed-loop inverse kinematics controllers is presented. The module allows for combining multiple tasks that are resolved with a quadratic, nonlinear, or model … A Python module for rapid prototyping of constraint-based closed-loop inverse kinematics controllers is presented. The module allows for combining multiple tasks that are resolved with a quadratic, nonlinear, or model predictive optimization-based approach, or a set-based task-priority inverse kinematics approach. The optimization-based approaches are described in relation to the set-based task approach, and a novel multidimensional "in tangent cone" function is presented for set-based tasks. A ROS component is provided, and the controllers are tested with matching a pose using either transformation matrices or dual quaternions, trajectory tracking while remaining in a bounded workspace, maximizing manipulability during a tracking task, tracking an input marker's position, and force compliance.
In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the … In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the observation that a small angle between the spray direction and the surface normal does not affect the quality of the paint job. Recent results in set-based kinematic control are utilized to develop a switched control system, where this angle is defined as a set-based task with a maximum allowed limit. Four different set-based methods are implemented and tested on a UR5 manipulator from Universal Robots. Experimental results verify the correctness of the method, and demonstrate that the set-based approaches can substantially lower the paint time and energy consumption compared to the current standard solution.
Several techniques were proposed to model the Piecewise linear (PWL) functions, including convex combination, incremental and multiple choice methods. Although the incremental method was proved to be very efficient, the … Several techniques were proposed to model the Piecewise linear (PWL) functions, including convex combination, incremental and multiple choice methods. Although the incremental method was proved to be very efficient, the attention of the authors in this field was drawn to the convex combination method, especially for discontinuous PWL functions. In this work, we modify the incremental method to make it suitable for discontinuous functions. The numerical results indicate that the modified incremental method could have considerable reduction in computational time, mainly due to the reduction in the number of the required variables. Further, we propose a tighter formulation for optimization problems over separable univariate PWL functions with binary indicators by using the incremental method.
In this article the model predictive path following controller and the model predictive trajectory tracking controller are compared for a robotic manipulator. We consider both the Runge-Kutta and collocation based … In this article the model predictive path following controller and the model predictive trajectory tracking controller are compared for a robotic manipulator. We consider both the Runge-Kutta and collocation based discretization. We show how path-following can stop at obstructions in a way trajectory tracking cannot. We give simulations for a two-link manipulator, and discuss the real-time viability of our implementations.
In this article we show how the model predictive path following controller allows robotic manipulators to stop at obstructions in a way that model predictive trajectory tracking controllers cannot. We … In this article we show how the model predictive path following controller allows robotic manipulators to stop at obstructions in a way that model predictive trajectory tracking controllers cannot. We present both controllers as applied to robotic manipulators, simulations for a two-link manipulator using an interior point solver, consider discretization of the optimal control problem using collocation or Runge-Kutta, and discuss the real-time viability of our implementation of the model predictive path following controller.
In this article we show how the model predictive path following controller allows robotic manipulators to stop at obstructions in a way that model predictive trajectory tracking controllers cannot. We … In this article we show how the model predictive path following controller allows robotic manipulators to stop at obstructions in a way that model predictive trajectory tracking controllers cannot. We present both controllers as applied to robotic manipulators, simulations for a two-link manipulator using an interior point solver, consider discretization of the optimal control problem using collocation or Runge-Kutta, and discuss the real-time viability of our implementation of the model predictive path following controller.
In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the … In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the observation that a small angle between the spray direction and the surface normal does not affect the quality of the paint job. Recent results in set-based kinematic control are utilized to develop a switched control system, where this angle is defined as a set-based task with a maximum allowed limit. Four different set-based methods are implemented and tested on a UR5 manipulator from Universal Robots. Experimental results verify the correctness of the method, and demonstrate that the set-based approaches can substantially lower the paint time and energy consumption compared to the current standard solution.
In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the … In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the observation that a small angle between the spray direction and the surface normal does not affect the quality of the paint job. Recent results in set-based kinematic control are utilized to develop a switched control system, where this angle is defined as a set-based task with a maximum allowed limit. Four different set-based methods are implemented and tested on a UR5 manipulator from Universal Robots. Experimental results verify the correctness of the method, and demonstrate that the set-based approaches can substantially lower the paint time and energy consumption compared to the current standard solution.
In this work the stability properties of a partial differential equation (PDE) with statedependent parameters and asymmetric boundary conditions are investigated. The PDE describes the temperature distribution inside foodstuff, but … In this work the stability properties of a partial differential equation (PDE) with statedependent parameters and asymmetric boundary conditions are investigated. The PDE describes the temperature distribution inside foodstuff, but can also hold for other applications and phenomena. We show that the PDE converges to a stationary solution given by (fixed) boundary conditions which explicitly diverge from each other. Numerical simulations illustrate the results.
In this paper an approach for optimal boundary control of a parabolic partial differential equation (PDE) is presented. The parabolic PDE is the heat equation for thermal conduction. A technical … In this paper an approach for optimal boundary control of a parabolic partial differential equation (PDE) is presented. The parabolic PDE is the heat equation for thermal conduction. A technical application for this is the freezing of fish in a vertical plate freezer. As it is a dominant phenomenon in the process of freezing, the latent heat of fusion is included in the model. The aim of the optimization is to freeze the interior of a fish block below -18 °C in a predefined time horizon with an energy consumption that is as low as possible assuming that this corresponds to high freezing temperatures.
In this paper we show how a continuous set of orientations can be represented as a positive definiteness test on a given matrix. When this continuous set is restricted by … In this paper we show how a continuous set of orientations can be represented as a positive definiteness test on a given matrix. When this continuous set is restricted by the maximum allowed orientation error in some or all directions it is shown that the requirement for an orientation to satisfy these restrictions is equivalent to positive definiteness for a certain matrix. The problem of finding the optimal orientation that satisfies these restrictions is hence transformed into an optimisation problem on the Riemannian manifold of linearly constrained symmetric positive definite matrices. Thus, the problem of finding the optimal orientation can be solved as a standard optimisation problem with the constraints written in the form of linear matrix inequalities or barrier functions. Linear matrix inequalities have been extensively studied in the optimisation communities and good and efficient algorithms are available.
This paper extends earlier results on (optimal) control of nominally stable models in computational fluid dynamics by means of reduced-order models on state-space form. We consider stabilization of a computational … This paper extends earlier results on (optimal) control of nominally stable models in computational fluid dynamics by means of reduced-order models on state-space form. We consider stabilization of a computational fluid dynamics model of an unstable system. A stabilizing controller is found based on optimal control design for the reduced-order model and then applied to the full model, where it is shown through simulations to stabilize the system
Previous work on stabilization of compressor surge is extended to include control of the angular velocity of the compressor. First, a low order centrifugal compressor model is presented where the … Previous work on stabilization of compressor surge is extended to include control of the angular velocity of the compressor. First, a low order centrifugal compressor model is presented where the states are mass flow, pressure rise and rotational speed of the spool. Energy transfer considerations are used to develop a compressor characteristic. In order to stabilize equilibriums to the left of the surge line, a close coupled valve is used in series with the compressor. Controllers for the valve pressure drop and spool speed are derived. Semi-global exponential stability is proved using a Lyapunov argument.

Commonly Cited References

The following problem is considered: given a matrix A in ${\bf R}^{m\times n}$, (m rows and n columns), a vector b in ${\bf R}^m$, and $\epsilon > 0$, compute a … The following problem is considered: given a matrix A in ${\bf R}^{m\times n}$, (m rows and n columns), a vector b in ${\bf R}^m$, and $\epsilon > 0$, compute a vector x satisfying $\|Ax - b\|_{2} \leq \epsilon $ if such exists, such that x has the fewest number of non-zero entries over all such vectors. It is shown that the problem is NP-hard, but that the well-known greedy heuristic is good in that it computes a solution with at most $\lceil 18 \operatorname{Opt}(\epsilon/2)\|{\bf A}^{+}\|_{2}^{2} \ln ({\| b \|_{2}/\epsilon})\rceil$ non-zero entries, where $\operatorname{Opt}(\epsilon/2)$ is the optimum number of nonzero entries at error $\epsilon/2$, ${\textbf{A}}$ is the matrix obtained by normalizing each column of A with respect to the $L_{2}$ norm, and $\textbf{A}^{+}$ is its pseudo-inverse.
In this work, we introduce, justify and demonstrate the Corrective Source Term Approach (CoSTA)-a novel approach to Hybrid Analysis and Modeling (HAM). The objective of HAM is to combine physics-based … In this work, we introduce, justify and demonstrate the Corrective Source Term Approach (CoSTA)-a novel approach to Hybrid Analysis and Modeling (HAM). The objective of HAM is to combine physics-based modeling (PBM) and data-driven modeling (DDM) to create generalizable, trustworthy, accurate, computationally efficient and self-evolving models. CoSTA achieves this objective by augmenting the governing equation of a PBM model with a corrective source term generated using a deep neural network. In a series of numerical experiments on one-dimensional heat diffusion, CoSTA is found to outperform comparable DDM and PBM models in terms of accuracy - often reducing predictive errors by several orders of magnitude - while also generalizing better than pure DDM. Due to its flexible but solid theoretical foundation, CoSTA provides a modular framework for leveraging novel developments within both PBM and DDM. Its theoretical foundation also ensures that CoSTA can be used to model any system governed by (deterministic) partial differential equations. Moreover, CoSTA facilitates interpretation of the DNN-generated source term within the context of PBM, which results in improved explainability of the DNN. These factors make CoSTA a potential door-opener for data-driven techniques to enter high-stakes applications previously reserved for pure PBM.
Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL. However, research in model-based RL has not been very standardized. … Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL. However, research in model-based RL has not been very standardized. It is fairly common for authors to experiment with self-designed environments, and there are several separate lines of research, which are sometimes closed-sourced or not reproducible. Accordingly, it is an open question how these various existing MBRL algorithms perform relative to each other. To facilitate research in MBRL, in this paper we gather a wide collection of MBRL algorithms and propose over 18 benchmarking environments specially designed for MBRL. We benchmark these algorithms with unified problem settings, including noisy environments. Beyond cataloguing performance, we explore and unify the underlying algorithmic differences across MBRL algorithms. We characterize three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma. Finally, to maximally facilitate future research on MBRL, we open-source our benchmark in http://www.cs.toronto.edu/~tingwuwang/mbrl.html.
Recent applications of machine learning, in particular deep learning, motivate the need to address the generalizability of the statistical inference approaches in physical sciences. In this letter, we introduce a … Recent applications of machine learning, in particular deep learning, motivate the need to address the generalizability of the statistical inference approaches in physical sciences. In this letter, we introduce a modular physics guided machine learning framework to improve the accuracy of such data-driven predictive engines. The chief idea in our approach is to augment the knowledge of the simplified theories with the underlying learning process. To emphasise on their physical importance, our architecture consists of adding certain features at intermediate layers rather than in the input layer. To demonstrate our approach, we select a canonical airfoil aerodynamic problem with the enhancement of the potential flow theory. We include features obtained by a panel method that can be computed efficiently for an unseen configuration in our training procedure. By addressing the generalizability concerns, our results suggest that the proposed feature enhancement approach can be effectively used in many scientific machine learning applications, especially for the systems where we can use a theoretical, empirical, or simplified model to guide the learning module.
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience … Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse … The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
We consider the tracking of geometric paths in output spaces of nonlinear systems subject to input and state constraints without pre-specified timing requirements, commonly referred to as constrained output path-following … We consider the tracking of geometric paths in output spaces of nonlinear systems subject to input and state constraints without pre-specified timing requirements, commonly referred to as constrained output path-following problems. Specifically, we propose a predictive control approach to constrained path-following problems with and without velocity assignments. We present sufficient convergence conditions based on terminal regions and end penalties. Furthermore, we analyze the geometric nature of constrained output path-following problems and thereby provide insight into the computation of suitable terminal control laws and terminal regions. We draw upon an example from robotics to illustrate our findings.
Reinforcement learning algorithms have shown great success in solving different problems ranging from playing video games to robotics. However, they struggle to solve delicate robotic problems, especially those involving contact … Reinforcement learning algorithms have shown great success in solving different problems ranging from playing video games to robotics. However, they struggle to solve delicate robotic problems, especially those involving contact interactions. Though in principle a policy directly outputting joint torques should be able to learn to perform these tasks, in practice we see that it has difficulty to robustly solve the problem without any given structure in the action space. In this letter, we investigate how the choice of action space can give robust performance in presence of contact uncertainties. We propose learning a policy giving as output impedance and desired position in joint space and compare the performance of that approach to torque and position control under different contact uncertainties. Furthermore, we propose an additional reward term designed to regularize these variable impedance control policies, giving them interpretability and facilitating their transfer to real systems. We present extensive experiments in simulation of both floating and fixed-base systems in tasks involving contact uncertainties, as well as results for running the learned policies on a real system (accompanying videos can be seen here: https://youtu.be/AQuuQ-h4dBM).
We investigate the complexity of deep neural networks (DNN) that represent piecewise linear (PWL) functions. In particular, we study the number of linear regions, i.e. pieces, that a PWL function … We investigate the complexity of deep neural networks (DNN) that represent piecewise linear (PWL) functions. In particular, we study the number of linear regions, i.e. pieces, that a PWL function represented by a DNN can attain, both theoretically and empirically. We present (i) tighter upper and lower bounds for the maximum number of linear regions on rectifier networks, which are exact for inputs of dimension one; (ii) a first upper bound for multi-layer maxout networks; and (iii) a first method to perform exact enumeration or counting of the number of regions by modeling the DNN with a mixed-integer linear formulation. These bounds come from leveraging the dimension of the space defining each linear region. The results also indicate that a deep rectifier network can only have more linear regions than every shallow counterpart with same number of neurons if that number exceeds the dimension of the input.
With the ever-increasing availability of data, there has been an explosion of interest in applying modern machine learning methods to fields such as modeling and control. However, despite the flexibility … With the ever-increasing availability of data, there has been an explosion of interest in applying modern machine learning methods to fields such as modeling and control. However, despite the flexibility and surprising accuracy of such black-box models, it remains difficult to trust them. Recent efforts to combine the two approaches aim to develop flexible models that nonetheless generalize well; a paradigm we call Hybrid Analysis and modeling (HAM). In this work we investigate the Corrective Source Term Approach (CoSTA), which uses a data-driven model to correct a misspecified physics-based model. This enables us to develop models that make accurate predictions even when the underlying physics of the problem is not well understood. We apply CoSTA to model the Hall-H\'eroult process in an aluminum electrolysis cell. We demonstrate that the method improves both accuracy and predictive stability, yielding an overall more trustworthy model.
This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational … This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework for comparing deep and shallow models that belong to the family of piecewise linear functions based on computational geometry. We look at a deep rectifier multi-layer perceptron (MLP) with linear outputs units and compare it with a single layer version of the model. In the asymptotic regime, when the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$ layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$. The number $\left\lfloor{n}/{n_0}\right\rfloor^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions that a shallow one. We consider this as a first step towards understanding the complexity of these models and specifically towards providing suitable mathematical tools for future analysis.
Many robotic applications, such as milling, gluing, or high precision measurements, require the precise following of a predefined geometric path. We investigate the real-time feasible implementation of model predictive path-following … Many robotic applications, such as milling, gluing, or high precision measurements, require the precise following of a predefined geometric path. We investigate the real-time feasible implementation of model predictive path-following control for an industrial robot. We consider constrained output path following with and without reference speed assignment. Finally, we present the results of an implementation of the proposed model predictive path-following controller on a KUKA LWR IV robot.
Abstract Most modeling approaches lie in either of the two categories: physics‐based or data‐driven. Recently, a third approach which is a combination of these deterministic and statistical models is emerging … Abstract Most modeling approaches lie in either of the two categories: physics‐based or data‐driven. Recently, a third approach which is a combination of these deterministic and statistical models is emerging for scientific applications. To leverage these developments, our aim in this perspective paper is centered around exploring numerous principle concepts to address the challenges of (i) trustworthiness and generalizability in developing data‐driven models to shed light on understanding the fundamental trade‐offs in their accuracy and efficiency and (ii) seamless integration of interface learning and multifidelity coupling approaches that transfer and represent information between different entities, particularly when different scales are governed by different physics, each operating on a different level of abstraction. Addressing these challenges could enable the revolution of digital twin technologies for scientific and engineering applications.
This paper proposes a sparse Bayesian treatment of deep neural networks (DNNs) for system identification. Although DNNs show impressive approximation ability in various fields, several challenges still exist for system … This paper proposes a sparse Bayesian treatment of deep neural networks (DNNs) for system identification. Although DNNs show impressive approximation ability in various fields, several challenges still exist for system identification problems. First, DNNs are known to be too complex that they can easily overfit the training data. Second, the selection of the input regressors for system identification is nontrivial. Third, uncertainty quantification of the model parameters and predictions are necessary. The proposed Bayesian approach offers a principled way to alleviate the above challenges by marginal likelihood/model evidence approximation and structured group sparsity-inducing priors construction. The identification algorithm is derived as an iterative regularised optimisation procedure that can be solved as efficiently as training typical DNNs. Remarkably, an efficient and recursive Hessian calculation method for each layer of DNNs is developed, turning the intractable training/optimisation process into a tractable one. Furthermore, a practical calculation approach based on the Monte-Carlo integration method is derived to quantify the uncertainty of the parameters and predictions. The effectiveness of the proposed Bayesian approach is demonstrated on several linear and nonlinear system identification benchmarks by achieving good and competitive simulation accuracy. The code to reproduce the experimental results is open-sourced and available online.
Along with the great success of deep neural networks, there is also growing concern about their black-box nature. The interpretability issue affects people's trust on deep learning systems. It is … Along with the great success of deep neural networks, there is also growing concern about their black-box nature. The interpretability issue affects people's trust on deep learning systems. It is also related to many ethical problems, e.g., algorithmic discrimination. Moreover, interpretability is a desired property for deep networks to become powerful tools in other research fields, e.g., drug discovery and genomics. In this survey, we conduct a comprehensive review of the neural network interpretability research. We first clarify the definition of interpretability as it has been used in many different contexts. Then we elaborate on the importance of interpretability and propose a novel taxonomy organized along three dimensions: type of engagement (passive vs. active interpretation approaches), the type of explanation, and the focus (from local to global interpretability). This taxonomy provides a meaningful 3D view of distribution of papers from the relevant literature as two of the dimensions are not simply categorical but allow ordinal subcategories. Finally, we summarize the existing interpretability evaluation methods and suggest possible research directions inspired by our new taxonomy.
Industrial robot manipulators are playing a significant role in modern manufacturing industries. Though peg-in-hole assembly is a common industrial task that has been extensively researched, safely solving complex, high-precision assembly … Industrial robot manipulators are playing a significant role in modern manufacturing industries. Though peg-in-hole assembly is a common industrial task that has been extensively researched, safely solving complex, high-precision assembly in an unstructured environment remains an open problem. Reinforcement-learning (RL) methods have proven to be successful in autonomously solving manipulation tasks. However, RL is still not widely adopted in real robotic systems because working with real hardware entails additional challenges, especially when using position-controlled manipulators. The main contribution of this work is a learning-based method to solve peg-in-hole tasks with hole-position uncertainty. We propose the use of an off-policy, model-free reinforcement-learning method, and we bootstraped the training speed by using several transfer-learning techniques (sim2real) and domain randomization. Our proposed learning framework for position-controlled robots was extensively evaluated in contact-rich insertion tasks in a variety of environments.
We study a generic minimization problem with separable nonconvex piecewise linear costs, showing that the linear programming (LP) relaxation of three textbook mixed-integer programming formulations each approximates the cost function … We study a generic minimization problem with separable nonconvex piecewise linear costs, showing that the linear programming (LP) relaxation of three textbook mixed-integer programming formulations each approximates the cost function by its lower convex envelope. We also show a relationship between this result and classical Lagrangian duality theory.
A numerical integrator is derived for a class of models that describe the attitude dynamics of a rigid body in the presence of an attitude dependent potential. The numerical integrator … A numerical integrator is derived for a class of models that describe the attitude dynamics of a rigid body in the presence of an attitude dependent potential. The numerical integrator is obtained from a discrete variational principle, and exhibits excellent geometric conservation properties. In particular, by performing computations at the level of the Lie algebra, and updating the solution using the matrix exponential, the attitude automatically evolves on the rotation group embedded in the space of matrices. The geometric conservation properties of the numerical integrator imply long time numerical stability. We apply this variational integrator to the uncontrolled 3D pendulum, that is a rigid asymmetric body supported at a frictionless pivot acting under the influence of uniform gravity. Interesting dynamics of the 3D pendulum are exposed
We study the modeling of nonconvex piecewise-linear functions as mixed-integer programming (MIP) problems. We review several new and existing MIP formulations for continuous piecewise-linear functions with special attention paid to … We study the modeling of nonconvex piecewise-linear functions as mixed-integer programming (MIP) problems. We review several new and existing MIP formulations for continuous piecewise-linear functions with special attention paid to multivariate nonseparable functions. We compare these formulations with respect to their theoretical properties and their relative computational performance. In addition, we study the extension of these formulations to lower semicontinuous piecewise-linear functions.
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly … Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers - 8Ɨ deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
The ability to discover physical laws and governing equations from data is one of humankind's greatest intellectual achievements. A quantitative understanding of dynamic constraints and balances in nature has facilitated … The ability to discover physical laws and governing equations from data is one of humankind's greatest intellectual achievements. A quantitative understanding of dynamic constraints and balances in nature has facilitated rapid development of knowledge and enabled advanced technological achievements, including aircraft, combustion engines, satellites, and electrical power. In this work, we combine sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing physical equations from measurement data. The only assumption about the structure of the model is that there are only a few important terms that govern the dynamics, so that the equations are sparse in the space of possible functions; this assumption holds for many physical systems. In particular, we use sparse regression to determine the fewest terms in the dynamic governing equations required to accurately represent the data. The resulting models are parsimonious, balancing model complexity with descriptive ability while avoiding overfitting. We demonstrate the algorithm on a wide range of problems, from simple canonical systems, including linear and nonlinear oscillators and the chaotic Lorenz system, to the fluid vortex shedding behind an obstacle. The fluid example illustrates the ability of this method to discover the underlying dynamics of a system that took experts in the community nearly 30 years to resolve. We also show that this method generalizes to parameterized, time-varying, or externally forced systems.
Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. … Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.
Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; … Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well-known that certain network architecture designs (e.g., skip connections) produce loss functions … Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well-known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration … Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration, which is estimated using the disagreement between the futures predicted by the ensemble members. We show empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines. MAX scales to high-dimensional continuous environments where it builds task-agnostic models that can be used for any downstream task.
Reinforcement learning (RL) is a powerful tool to perform data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior … Reinforcement learning (RL) is a powerful tool to perform data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, nonlinear model predictive control (NMPC) and economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory to assess their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems having stochastic dynamics. This entails that ENMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central for the ENMPC stability. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools on both, a classical linear MPC setting and a standard nonlinear example, from the ENMPC literature.
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. … We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep networks are able to sequentially map portions of each layer's input-space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs. The compositional structure of these functions enables them to re-use pieces of computation exponentially often in terms of the network's depth. This paper investigates the complexity of such compositional maps and contributes new theoretical results regarding the advantage of depth for neural networks with piecewise linear activation functions. In particular, our analysis is not specific to a single family of models, and as an example, we employ it for rectifier and maxout networks. We improve complexity bounds from pre-existing work and investigate the behavior of units in higher layers.
Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either … Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to be scalable to real robotics setups. In this paper, we propose a formulation for exploration inspired by the work in active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement of those ensembles is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in a sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/
Large neural networks are very successful in various tasks. However, with limited data, the generalization capabilities of deep neural networks are also very limited. In this paper, we empirically start … Large neural networks are very successful in various tasks. However, with limited data, the generalization capabilities of deep neural networks are also very limited. In this paper, we empirically start showing that intrinsically sparse neural networks with adaptive sparse connectivity, which by design have a strict parameter budget during the training phase, have better generalization capabilities than their fully-connected counterparts. Besides this, we propose a new technique to train these sparse models by combining the Sparse Evolutionary Training (SET) procedure with neurons pruning. Operated on MultiLayer Perceptron (MLP) and tested on 15 datasets, our proposed technique zeros out around 50% of the hidden neurons during training, while having a linear number of parameters to optimize with respect to the number of neurons. The results show a competitive classification and generalization performance.
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those … Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections-one between each layer and its subsequent layer-our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent … In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward; 2) exploration with no extrinsic reward; and 3) generalization to unseen scenarios (e.g. new levels of the same game).
Designing reinforcement learning (RL) problems that can produce delicate and precise manipulation policies requires careful choice of the reward function, state, and action spaces. Much prior work on applying RL … Designing reinforcement learning (RL) problems that can produce delicate and precise manipulation policies requires careful choice of the reward function, state, and action spaces. Much prior work on applying RL to manipulation tasks has defined the action space in terms of direct joint torques or reference positions for a joint-space proportional derivative (PD) controller. In practice, it is often possible to add additional structure by taking advantage of model-based controllers that support both accurate positioning and control of the dynamic response of the manipulator. In this paper, we evaluate how the choice of action space for dynamic manipulation tasks affects the sample complexity as well as the final quality of learned policies. We compare learning performance across three tasks (peg insertion, hammering, and pushing), four action spaces (torque, joint PD, inverse dynamics, and impedance control), and using two modern reinforcement learning algorithms (Proximal Policy optimization and Soft Actor-Critic). Our results lend support to the hypothesis that learning references for a task-space impedance controller significantly reduces the number of samples needed to achieve good performance across all tasks and algorithms.
Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts … Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for variable impedance control in end-effector space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot. Further information is available at https://stanfordvl.github.io/vices.
Reinforcement Learning (RL) methods have been proven successful in solving manipulation tasks autonomously. However, RL is still not widely adopted on real robotic systems because working with real hardware entails … Reinforcement Learning (RL) methods have been proven successful in solving manipulation tasks autonomously. However, RL is still not widely adopted on real robotic systems because working with real hardware entails additional challenges, especially when using rigid position-controlled manipulators. These challenges include the need for a robust controller to avoid undesired behavior, that risk damaging the robot and its environment, and constant supervision from a human operator. The main contributions of this work are, first, we proposed a learning-based force control framework combining RL techniques with traditional force control. Within said control scheme, we implemented two different conventional approaches to achieve force control with position-controlled robots; one is a modified parallel position/force control, and the other is an admittance control. Secondly, we empirically study both control schemes when used as the action space of the RL agent. Thirdly, we developed a fail-safe mechanism for safely training an RL agent on manipulation tasks using a real rigid robot manipulator. The proposed methods are validated on simulation and a real robot, an UR3 e-series robotic arm.
Reinforcement learning (RL) has had its fair share of success in contact-rich manipulation tasks but it still lags behind in benefiting from advances in robot control theory such as impedance … Reinforcement learning (RL) has had its fair share of success in contact-rich manipulation tasks but it still lags behind in benefiting from advances in robot control theory such as impedance control and stability guarantees. Recently, the concept of variable impedance control (VIC) was adopted into RL with encouraging results. However, the more important issue of stability remains unaddressed. To clarify the challenge in stable RL, we introduce the term all-the-time-stability that unambiguously means that every possible rollout should be stability certified. Our contribution is a model-free RL method that not only adopts VIC but also achieves all-the-time-stability. Building on a recently proposed stable VIC controller as the policy parameterization, we introduce a novel policy search algorithm that is inspired by Cross-Entropy Method and inherently guarantees stability. Our experimental studies confirm the feasibility and usefulness of stability guarantee and also features, to the best of our knowledge, the first successful application of RL with all-the-time-stability on the benchmark problem of peg-in-hole.
Robots that physically interact with their surroundings, in order to accomplish some tasks or assist humans in their activities, require to exploit contact forces in a safe and proficient manner. … Robots that physically interact with their surroundings, in order to accomplish some tasks or assist humans in their activities, require to exploit contact forces in a safe and proficient manner. Impedance control is considered as a prominent approach in robotics to avoid large impact forces while operating in unstructured environments. In such environments, the conditions under which the interaction occurs may significantly vary during the task execution. This demands robots to be endowed with online adaptation capabilities to cope with sudden and unexpected changes in the environment. In this context, variable impedance control arises as a powerful tool to modulate the robot's behavior in response to variations in its surroundings. In this survey, we present the state-of-the-art of approaches devoted to variable impedance control from control and learning perspectives (separately and jointly). Moreover, we propose a new taxonomy for mechanical impedance based on variability, learning, and control. The objective of this survey is to put together the concepts and efforts that have been done so far in this field, and to describe advantages and disadvantages of each approach. The survey concludes with open issues in the field and an envisioned framework that may potentially solve them.
We introduce a sample-efficient method for learning state-dependent stiffness control policies for dexterous manipulation. The ability to control stiffness facilitates safe and reliable manipulation by providing compliance and robustness to … We introduce a sample-efficient method for learning state-dependent stiffness control policies for dexterous manipulation. The ability to control stiffness facilitates safe and reliable manipulation by providing compliance and robustness to uncertainties. Most current reinforcement learning approaches to achieve robotic manipulation have exclusively focused on position control, often due to the difficulty of learning high-dimensional stiffness control policies. This difficulty can be partially mitigated via policy guidance such as imitation learning. However, expert stiffness control demonstrations are often expensive or infeasible to record. Therefore, we present an approach to learn Stiffness Control from Augmented Position control Experiences (SCAPE) that bypasses this difficulty by transforming position control demonstrations into approximate, suboptimal stiffness control demonstrations. Then, the suboptimality of the augmented demonstrations is addressed by using complementary techniques that help the agent safely learn from both the demonstrations and reinforcement learning. By using simulation tools and experiments on a robotic testbed, we show that the proposed approach efficiently learns safe manipulation policies and outperforms learned position control policies and several other baseline learning algorithms.
Abstract This paper presents the application of the tensor product (TP)‐based model transformation approach to produce Tower CRrane (TCR) systems models. The modeling approach starts with a nonlinear model of … Abstract This paper presents the application of the tensor product (TP)‐based model transformation approach to produce Tower CRrane (TCR) systems models. The modeling approach starts with a nonlinear model of TCR systems as representative multi‐input–multi‐output controlled processes. A linear parameter‐varying model is next derived, and the modeling steps specific to TP–based model transformation are proceeded to obtain the TP model. The TP model is tested on TCR laboratory equipment in two open‐loop scenarios considering chirp signals and pseudorandom binary step signals applied to the three model inputs (control inputs). The nonlinear and TP model outputs in the two scenarios are the payload position, the cart position, and the arm angular position. The nonlinear and TP model outputs are collected, measured, and compared. The simulation results prove that the derived TP model approximately mimics the behavior of the nonlinear model; both system responses and numerical approximation errors are illustrated.
Modern, torque-controlled service robots can regulate contact forces when interacting with their environment. Model Predictive Control (MPC) is a powerful method to solve the underlying control problem, allowing to plan … Modern, torque-controlled service robots can regulate contact forces when interacting with their environment. Model Predictive Control (MPC) is a powerful method to solve the underlying control problem, allowing to plan for whole-body motions while including different constraints imposed by the robot dynamics or its environment. However, an accurate model of the robot-environment is needed to achieve a satisfying closed-loop performance. Currently, this necessity undermines the performance and generality of MPC in manipulation tasks. In this work, we combine an MPC-based whole-body controller with two adaptive schemes, derived from online system identification and adaptive control. As a result, we enable a general mobile manipulator to interact with unknown environments, without any need for re-tuning parameters or pre-modeling the interacting objects. In combination with the MPC controller, the two adaptive approaches are validated and benchmarked with a ball-balancing manipulator in door opening and object lifting tasks.
Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiencies, such as Go and robotics. However, there are some remaining issues, such as planning efficient … Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiencies, such as Go and robotics. However, there are some remaining issues, such as planning efficient explorations to learn more accurate dynamic models, evaluating the uncertainty of the learned models, and more rational utilization of models. To mitigate these issues, we present MEEE, a model-ensemble method that consists of optimistic exploration and weighted exploitation. During exploration, unlike prior methods directly selecting the optimal action that maximizes the expected accumulative return, our agent first generates a set of action candidates and then seeks out the optimal action that takes both expected return and future observation novelty into account. During exploitation, different discounted weights are assigned to imagined transition tuples according to their model uncertainty respectively, which will prevent model predictive error propagation in agent training. Experiments on several challenging continuous control benchmark tasks demonstrated that our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
Physical human-robot interaction can improve human ergonomics, task efficiency, and the flexibility of automation, but often requires application-specific methods to detect human state and determine robot response. At the same … Physical human-robot interaction can improve human ergonomics, task efficiency, and the flexibility of automation, but often requires application-specific methods to detect human state and determine robot response. At the same time, many potential human-robot interaction tasks involve discrete modes, such as phases of a task or multiple possible goals, where each mode has a distinct objective and human behavior. In this paper, we propose a novel method for multi-modal physical human-robot interaction that builds a Gaussian process model for human force in each mode of a collaborative task. These models are then used for Bayesian inference of the mode, and to determine robot reactions through model predictive control. This approach enables optimization of robot trajectory based on the belief of human intent, while considering robot impedance and human joint configuration, according to ergonomic- and/or task-related objectives. The proposed method reduces programming time and complexity, requiring only a low number of demonstrations (here, three per mode) and a mode-specific objective function to commission a flexible online human-robot collaboration task. We validate the method with experiments on an admittance-controlled robot, performing a collaborative assembly task with two modes where assistance is provided in full six degrees of freedom. It is shown that the developed algorithm robustly re-plans to changes in intent or robot initial position, achieving online control at 15 Hz.