Sorry in advance if this question sounds too broad or a little bit too obvious.
I know for sure that gradient descent, i.e., the update equation
$x_{k+1} = x_k - \epsilon_k \nabla f(x_k)$
converges to the unique minimizer of $f$ with domain $\text{dom}(f) = \mathbb{R}^n$ whenever $f$ is strictly or strongly convex.
However, I cannot remember whether it converges to a minimizer for merely convex functions, and how that convergence is achieved.
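For concreteness, here is a minimal sketch of the iteration I mean (Python; the choice of $f(x) = x^4$, which is convex but not strongly convex near its minimizer, and the constant step size are my own toy choices):

```python
import numpy as np

def gradient_descent(grad, x0, step, num_iters):
    # Plain gradient descent: x_{k+1} = x_k - eps_k * grad f(x_k)
    x = x0
    trajectory = [x]
    for k in range(num_iters):
        x = x - step(k) * grad(x)
        trajectory.append(x)
    return np.array(trajectory)

# Toy example: f(x) = x^4, so grad f(x) = 4 x^3.
traj = gradient_descent(grad=lambda x: 4 * x**3,
                        x0=1.0,
                        step=lambda k: 0.1,   # constant step (toy choice)
                        num_iters=1000)
print(traj[-1])  # approaches 0, but only sublinearly
```

In this example the iterates do drift toward the minimizer 0, just slowly, which is part of what prompts my question about what is guaranteed in general.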
What is bothering me is that
- I’ve seen conflicting results where, instead of the last iterate $x_k$, it is an averaged sequence $\hat x_K = \frac{1}{K} \sum_{k=1}^{K} x_k$ that converges.
- I’ve also seen conflicting results on the step size: decreasing (e.g., $o(1/k)$) versus constant.
- There is also the issue of weak vs. strong convergence, and I’m not sure what this distinction means exactly.
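To illustrate the averaging point numerically, here is a sketch (all choices my own: the subgradient method on the nonsmooth convex function $f(x) = |x|$ with step size $1/\sqrt{k}$) in which the last iterate keeps oscillating at the scale of the step size while the running average settles down much closer to the minimizer:

```python
import numpy as np

# Subgradient method on f(x) = |x| with decreasing step 1/sqrt(k).
# The last iterate x oscillates around 0; the running average avg
# of the iterates is much closer to the minimizer.
x, avg, K = 0.75, 0.0, 10_000
for k in range(1, K + 1):
    g = np.sign(x)              # a subgradient of |x|
    x = x - g / np.sqrt(k)      # decreasing step size (toy choice)
    avg += (x - avg) / k        # incremental running average
print(abs(x), abs(avg))
```

This is exactly the kind of behavior behind my confusion: which of the two sequences is the "basic result" actually about?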
I know some results, but they are for quadratic functions, not for convex functions in general.
Can someone chime in on what this basic result in optimization looks like?