This book studies mathematical theories of machine learning. The first
part of the book explores the optimality and adaptivity of choosing step
sizes for gradient descent to escape strict saddle points in
nonconvex optimization problems. In the second part, the authors
propose algorithms to find local minima in nonconvex optimization and,
to some degree, to obtain global minima, based on Newton's second law without
friction. In the third part, the authors study the problem of subspace
clustering with noisy and missing data, a problem well motivated by
practical applications in which the data are subject to stochastic
Gaussian noise and/or are incomplete, with uniformly missing entries. In
the last part, the authors introduce a novel vector autoregressive (VAR)
model with Elastic-Net regularization and its equivalent Bayesian model,
which allows for both stable sparsity and group selection.