Linear Algebra Proofs of Optimality for Gradient Descent
Prior reading: Gradient Descent and Backpropagation

## When Gradient Descent Is Provably Optimal

For convex functions, gradient descent converges to the global minimum. For strongly convex functions, it converges at a linear (geometric) rate, i.e. exponentially fast. The proofs are clean linear algebra.

## The Convex Case

A function $f$ is convex if for all $x, y$ and all $\lambda \in [0, 1]$:

$$f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda) f(y)$$

This means every local minimum is a global minimum, so gradient descent can't get stuck in a suboptimal basin. ...
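The convergence claim can be seen concretely on a convex quadratic, where the global minimum is known in closed form. A minimal sketch, assuming NumPy; the matrix `A`, vector `b`, and step size `1/L` are illustrative choices, not from the text:

```python
import numpy as np

# Convex quadratic f(x) = 0.5 x^T A x - b^T x, with A symmetric
# positive definite, so the unique global minimum is x* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A @ x - b

# Plain gradient descent with fixed step size 1/L, where L is the
# largest eigenvalue of A (the Lipschitz constant of the gradient).
L = np.linalg.eigvalsh(A).max()
x = np.zeros(2)
for _ in range(200):
    x = x - (1.0 / L) * grad_f(x)

x_star = np.linalg.solve(A, b)
print(np.allclose(x, x_star, atol=1e-6))  # iterates reach the global minimum
```

Because this quadratic is also strongly convex (both eigenvalues of `A` are positive), the error shrinks by a constant factor each step, which is the exponential rate mentioned above.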