Optimization and Optimal Control in Machine Learning




The objectives of this study are the analysis and design of efficient computational methods for deep learning, with a focus on numerical optimization schemes. Optimization is the task of computing the best element, with respect to some criterion, from a set of available alternatives (the search space). If we define this criterion as a function that assigns a cost to each element of the search space, we can formulate the search for the best element as the minimization of this “cost function.” Gradient descent strategies are a prominent approach for computing this minimizer. However, it is well known that these methods typically require a prohibitively large number of iterations to reach an acceptable minimizer, and that they scale poorly with the number of unknowns. Methods that exploit second-order derivative information are in many cases more effective, but also more involved. Initially, we considered a non-linear least squares problem as a simplified model of a learning problem. We developed an iterative, matrix-free Newton–Krylov method and tested it on the MNIST dataset. Subsequently, we considered an optimal control formulation for training deep neural networks, motivated by a partial differential equation interpretation of the forward propagation. We discretized the forward propagation using different numerical schemes available in the literature; in particular, we studied explicit Euler schemes with antisymmetric weight matrices and a Verlet method for solving the associated Hamiltonian system. In future work, we will explore these implementations in the context of evaluating the adjoint operators that arise in the optimization setting.
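
The gradient descent strategies mentioned above can be illustrated with a minimal sketch. The quadratic cost, fixed step size, and iteration count below are illustrative assumptions for exposition, not choices from this study:

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, iters=500):
    # Fixed-step iteration x <- x - lr * grad(x); a toy scheme
    # illustrating why many iterations may be needed in practice.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lr * grad(x)
    return x

# Toy cost f(x) = 0.5 * ||A x - b||^2 with gradient A^T (A x - b);
# its minimizer solves A x = b, i.e. x* = (1, 1) here.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
x_star = gradient_descent(grad, np.zeros(2))
```

Even on this well-conditioned two-dimensional problem, the fixed-step scheme needs many iterations; the slow component of the error only contracts by a factor of 0.9 per step.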
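
The matrix-free Newton–Krylov idea for a non-linear least squares problem can be sketched as follows. This toy version uses a Gauss–Newton approximation of the Hessian: each outer iteration solves (JᵀJ)p = −Jᵀr with conjugate gradients, touching the Jacobian J only through Jacobian-vector products so that J is never formed. The residual, tolerances, and iteration counts are illustrative assumptions, not the implementation developed in this study:

```python
import numpy as np

def cg(matvec, b, tol=1e-12, maxiter=50):
    # Conjugate gradients for A x = b, using only matvec(v) = A v.
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def gauss_newton_krylov(residual, jvp, jtvp, x0, iters=10):
    # Outer loop: solve (J^T J) p = -J^T r with CG, where J is
    # accessed only via jvp (v -> J v) and jtvp (v -> J^T v).
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        matvec = lambda v: jtvp(x, jvp(x, v))
        p = cg(matvec, -jtvp(x, r))
        x = x + p
    return x

# Toy residual r(x) = (x0^2 - 1, x1 - 2); its Jacobian is diagonal,
# so the J and J^T products happen to coincide here.
def residual(x):
    return np.array([x[0] ** 2 - 1.0, x[1] - 2.0])

def jvp(x, v):
    return np.array([2.0 * x[0] * v[0], v[1]])

def jtvp(x, v):
    return np.array([2.0 * x[0] * v[0], v[1]])

x_opt = gauss_newton_krylov(residual, jvp, jtvp, np.array([2.0, 0.0]))
```

The point of the matrix-free formulation is that the memory cost stays linear in the number of unknowns, which is what makes Krylov methods attractive for learning problems.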
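
One way to realize an explicit Euler scheme with antisymmetric weight matrices is sketched below: each layer applies Y ← Y + h·σ((K − Kᵀ)Y + b), where the antisymmetric part K − Kᵀ has a purely imaginary spectrum, which counteracts blow-up of the forward dynamics. The layer width, activation, step size, and random parameters are illustrative assumptions:

```python
import numpy as np

def antisym(K):
    # Antisymmetric part of K; its eigenvalues are purely imaginary.
    return K - K.T

def forward_euler(Y0, Ks, bs, h=0.1, act=np.tanh):
    # Explicit Euler discretization of the forward propagation:
    # Y_{k+1} = Y_k + h * act(A_k Y_k + b_k), with A_k antisymmetric.
    Y = Y0.copy()
    for K, b in zip(Ks, bs):
        Y = Y + h * act(antisym(K) @ Y + b)
    return Y

rng = np.random.default_rng(0)
d, n_layers = 4, 8
Ks = [rng.standard_normal((d, d)) for _ in range(n_layers)]
bs = [rng.standard_normal((d, 1)) for _ in range(n_layers)]
Y0 = rng.standard_normal((d, 3))   # 3 examples, feature dim 4
Y = forward_euler(Y0, Ks, bs)
```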
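
The Verlet method for the associated Hamiltonian system can be sketched as staggered updates of two states Y and Z, in the style of Hamiltonian-inspired networks from the literature. The specific update order, parameters, and activation below are illustrative assumptions, not the discretization studied here:

```python
import numpy as np

def verlet_forward(Y0, Z0, Ks, bs, h=0.1, act=np.tanh):
    # Verlet-type staggered updates for the Hamiltonian pair (Y, Z):
    #   Z_{k+1} = Z_k - h * act(K_k Y_k     + b_k)
    #   Y_{k+1} = Y_k + h * act(K_k^T Z_{k+1} + b_k)
    # Updating Z first and reusing it for Y is what makes the
    # scheme symplectic-like rather than a plain explicit Euler step.
    Y, Z = Y0.copy(), Z0.copy()
    for K, b in zip(Ks, bs):
        Z = Z - h * act(K @ Y + b)
        Y = Y + h * act(K.T @ Z + b)
    return Y, Z

rng = np.random.default_rng(1)
d, n_layers = 4, 10
Ks = [rng.standard_normal((d, d)) for _ in range(n_layers)]
bs = [rng.standard_normal((d, 1)) for _ in range(n_layers)]
Y0 = rng.standard_normal((d, 5))
Z0 = np.zeros_like(Y0)             # auxiliary state starts at rest
Yf, Zf = verlet_forward(Y0, Z0, Ks, bs)
```

Because each update is easily reversed, this forward sweep pairs naturally with the adjoint (backward) computations mentioned as future work.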