Optimization in Machine Learning: Chapter 1, Mathematics Background

1. Notation

$||x||$: Euclidean norm ($l_2$ norm),
$$
||x||^2 = x^Tx = \sum_{i=1}^d x_i^2
$$
$\mathbb{R}_+ = \{x \in \mathbb{R} : x \geq 0\}$
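As a quick numerical sanity check (a minimal sketch, not part of the original notes; the vector `x` is an arbitrary example, assuming numpy is available), the three expressions for the squared norm agree:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])

# Three equivalent ways to compute the squared Euclidean norm.
sq_norm_via_norm = np.linalg.norm(x) ** 2   # ||x||^2
sq_norm_via_dot  = x @ x                    # x^T x
sq_norm_via_sum  = np.sum(x ** 2)           # sum_i x_i^2

assert np.allclose([sq_norm_via_norm, sq_norm_via_dot], sq_norm_via_sum)
print(sq_norm_via_sum)  # 169.0
```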
2. Cauchy-Schwarz Inequality

Let $u, v \in \mathbb{R}^d$. Then
$$
|u^Tv| \leq ||u|| \cdot ||v||
$$
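The inequality can be spot-checked numerically; the sketch below (an illustration assuming numpy, with arbitrary Gaussian test vectors) verifies it on random pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)
    # |u^T v| <= ||u|| * ||v|| must hold for every pair.
    assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12
```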
3. Spectral Norm

Let $A$ be a matrix, $A \in \mathbb{R}^{m \times d}$. Then
$$
||A|| = \max_{||v||=1} ||Av||
$$
where $v \in \mathbb{R}^d$. Consequently,
$$
||Av|| \leq ||A|| \cdot ||v||
$$
Furthermore, we have the triangle inequality:
$$
||A+B|| \leq ||A|| + ||B||
$$
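Numerically, `np.linalg.norm(A, 2)` returns the largest singular value of a matrix, which equals the spectral norm defined above. The sketch below (random example matrices, not from the notes) checks the defining maximum and the triangle inequality:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

spec_A = np.linalg.norm(A, 2)  # largest singular value = spectral norm

# max over ||v||=1 of ||Av||: sample random unit vectors; none may exceed spec_A.
V = rng.standard_normal((3, 10000))
V /= np.linalg.norm(V, axis=0)
assert np.all(np.linalg.norm(A @ V, axis=0) <= spec_A + 1e-10)

# Triangle inequality: ||A + B|| <= ||A|| + ||B||.
assert np.linalg.norm(A + B, 2) <= spec_A + np.linalg.norm(B, 2) + 1e-10
```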
4. Mean Value Theorem

$\large\text{Theorem 1.3}$: Let $a < b$ be real numbers, and let $h: [a,b] \rightarrow \mathbb{R}$ be a continuous function which is differentiable on $(a,b)$. Then there exists $c \in (a,b)$ such that
$$
h'(c) = \frac{h(b)-h(a)}{b-a}
$$
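For a concrete instance (an illustrative example, not from the notes): with $h(x) = x^2$, the theorem's $c$ has a closed form, since $h'(c) = 2c$ must equal the secant slope $(b^2 - a^2)/(b - a) = a + b$, giving $c = (a+b)/2 \in (a,b)$:

```python
import numpy as np

# Mean value theorem for h(x) = x^2 on [a, b]:
# the secant slope is (b^2 - a^2)/(b - a) = a + b,
# and h'(c) = 2c matches it at c = (a + b)/2.
a, b = 1.0, 4.0
secant_slope = (b**2 - a**2) / (b - a)
c = (a + b) / 2
assert np.isclose(2 * c, secant_slope)
```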
5. Differentiability

$\large\text{Definition 1.5}$: Let $f: dom(f) \rightarrow \mathbb{R}^m$ with $dom(f) \subseteq \mathbb{R}^d$. $f$ is called differentiable at $x$ in the interior of $dom(f)$ if there exists a matrix $A \in \mathbb{R}^{m \times d}$ and an error function $r: \mathbb{R}^d \rightarrow \mathbb{R}^m$ defined in some neighborhood of $0 \in \mathbb{R}^d$ such that for all $y$ in some neighborhood of $x$,
$$
f(y) = f(x) + A(y-x) + r(y-x)
$$
where
$$
\lim_{v \rightarrow 0} \frac{||r(v)||}{||v||} = 0
$$
The matrix $A$ is the differential $Df(x)$, with entries $Df(x)_{ij} = \frac{\partial f_i}{\partial x_j}(x)$.
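The defining limit can be observed numerically: for a differentiable map, the error ratio $||r(v)||/||v||$ shrinks as $v \rightarrow 0$. The map $f(x) = (x_0 x_1, \sin x_0)$ below is an arbitrary test case (not from the notes), checked against its hand-computed Jacobian:

```python
import numpy as np

# f: R^2 -> R^2, f(x) = (x0 * x1, sin(x0)), with Jacobian Df(x).
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0])])

def Df(x):  # Df(x)_ij = d f_i / d x_j
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0]])

x = np.array([0.7, -1.3])
rng = np.random.default_rng(0)
direction = rng.standard_normal(2)
direction /= np.linalg.norm(direction)

# ||r(v)|| / ||v|| should shrink toward 0 as v -> 0.
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    v = t * direction
    r = f(x + v) - f(x) - Df(x) @ v
    print(t, np.linalg.norm(r) / np.linalg.norm(v))
```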
6. Lipschitz Continuity

$\large\text{Theorem 1.9}$: Let $f: dom(f) \rightarrow \mathbb{R}^m$ be differentiable, $X \subseteq dom(f)$ a convex set, and $B \in \mathbb{R}_+$. If $X$ is non-empty and open, the following statements are equivalent:
$$
\begin{align}
& (i)\ ||f(x)-f(y)|| \leq B||x-y|| \text{ for all } x, y \in X \quad (\textbf{B-Lipschitz}) \\
& (ii)\ ||Df(x)|| \leq B \text{ for all } x \in X
\end{align}
$$
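The linear map $f(x) = Ax$ (an illustrative choice, with $X = \mathbb{R}^d$) makes the equivalence transparent, since $Df(x) = A$ everywhere; the sketch below checks direction (ii) $\Rightarrow$ (i) with $B = ||A||$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = np.linalg.norm(A, 2)  # ||Df(x)|| = ||A|| for the linear map f(x) = Ax

f = lambda x: A @ x
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # (i): ||f(x) - f(y)|| <= B ||x - y||
    assert np.linalg.norm(f(x) - f(y)) <= B * np.linalg.norm(x - y) + 1e-10
```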
7. Epigraph

$$
\textbf{epi}(f) = \{(x,\alpha) \in \mathbb{R}^{d+1} : x \in dom(f),\ \alpha \geq f(x)\}
$$
$\large\text{Lemma 1.11}$: $f$ is a convex function if and only if $\textbf{epi}(f)$ is a convex set.
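Lemma 1.11 can be illustrated numerically: for the convex function $f(x) = x^2$ (an arbitrary choice, not from the notes), convex combinations of epigraph points stay in the epigraph:

```python
import numpy as np

f = lambda x: x ** 2  # a convex test function

rng = np.random.default_rng(0)
for _ in range(1000):
    # Two points in epi(f): (x, alpha) with alpha >= f(x).
    x1, x2 = rng.uniform(-3, 3, size=2)
    a1 = f(x1) + rng.uniform(0, 2)
    a2 = f(x2) + rng.uniform(0, 2)
    t = rng.uniform(0, 1)
    # Their convex combination must stay in epi(f) (Lemma 1.11).
    xm, am = t * x1 + (1 - t) * x2, t * a1 + (1 - t) * a2
    assert am >= f(xm) - 1e-12
```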
$\large\text{Lemma 1.15}$: Suppose that $dom(f)$ is open and $f$ is differentiable. Then $f$ is convex if and only if $dom(f)$ is convex and
$$
f(y) \geq f(x) + \nabla f(x)^T(y-x)
$$
holds for all $x, y \in dom(f)$.
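Geometrically, this says every tangent plane lies below the graph. A quick check with the convex test function $f(x) = ||x||^2$, $\nabla f(x) = 2x$ (chosen for illustration; here the gap $f(y) - f(x) - \nabla f(x)^T(y-x)$ equals $||y-x||^2 \geq 0$ exactly):

```python
import numpy as np

# Convex test function f(x) = ||x||^2 with gradient grad f(x) = 2x.
f = lambda x: x @ x
grad = lambda x: 2 * x

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    # First-order characterization: f(y) >= f(x) + grad f(x)^T (y - x).
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
```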
$\large\text{Lemma 1.16}$: Suppose that $dom(f)$ is open and $f$ is differentiable. Then $f$ is convex if and only if $dom(f)$ is convex and
$$
(\nabla f(y) - \nabla f(x))^T(y-x) \geq 0
$$
holds for all $x, y \in dom(f)$.
$\textbf{Proof}$: If $f$ is convex, then from the first-order characterization (Lemma 1.15) we have
$$
\begin{align}
f(y) &\geq f(x) + \nabla f(x)^T(y-x) \\
f(x) &\geq f(y) + \nabla f(y)^T(x-y)
\end{align}
$$
for all $x, y \in dom(f)$. Adding up these two inequalities, we get
$$
\nabla f(x)^T(y-x) + \nabla f(y)^T(x-y) = (\nabla f(y) - \nabla f(x))^T(x-y) \leq 0,
$$
which is equivalent to $(\nabla f(y) - \nabla f(x))^T(y-x) \geq 0$. (The converse direction follows by applying the mean value theorem to $t \mapsto f(x + t(y-x))$.)
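Gradient monotonicity can be checked on the log-sum-exp function (convex, as shown in Section 9), whose gradient is the softmax map (a standard fact):

```python
import numpy as np

# Gradient of log-sum-exp is the softmax map.
def grad_lse(x):
    e = np.exp(x - x.max())  # shift by max(x) for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    # Monotonicity of the gradient (Lemma 1.16).
    assert (grad_lse(y) - grad_lse(x)) @ (y - x) >= -1e-12
```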
8. Second-Order Characterization of Convexity

$\large\text{Lemma 1.17}$: Suppose that $dom(f)$ is open and $f$ is twice differentiable. Then $f$ is convex if and only if $dom(f)$ is convex and
$$
\nabla^2 f(x) \succeq 0
$$
holds for all $x \in dom(f)$. (Positive semidefinite: $M \succeq 0$ means $x^TMx \geq 0$ for all $x$.)
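For log-sum-exp the Hessian has the closed form $\nabla^2 f(x) = \text{diag}(s) - ss^T$ with $s = \text{softmax}(x)$ (a standard identity); the sketch below confirms its eigenvalues are nonnegative at a random point:

```python
import numpy as np

# Hessian of log-sum-exp: diag(s) - s s^T, where s = softmax(x).
def hessian_lse(x):
    e = np.exp(x - x.max())
    s = e / e.sum()
    return np.diag(s) - np.outer(s, s)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
eigvals = np.linalg.eigvalsh(hessian_lse(x))
# All eigenvalues nonnegative => the Hessian is positive semidefinite.
assert np.all(eigvals >= -1e-12)
```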
$\large\text{Lemma 1.25}$: Let $f: dom(f) \rightarrow \mathbb{R}$ be strictly convex. Then $f$ has at most one global minimum.
$\large\text{Lemma 1.27}$: Suppose $f: dom(f) \rightarrow \mathbb{R}$ is convex and differentiable, and let $X \subseteq dom(f)$ be a convex set. A point $x^* \in X$ is a $\textbf{minimizer}$ of $f$ over $X$ if and only if
$$
\nabla f(x^*)^T(x - x^*) \geq 0, \quad \forall x \in X
$$
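A concrete instance (illustrative, not from the notes): minimizing $f(x) = ||x - p||^2$ over the box $X = [0,1]^d$ gives the projection $x^* = \text{clip}(p, 0, 1)$, and the optimality condition holds at every $x \in X$:

```python
import numpy as np

# f(x) = ||x - p||^2 over the box X = [0, 1]^d; the minimizer is the
# projection x* = clip(p, 0, 1), and grad f(x) = 2 (x - p).
rng = np.random.default_rng(0)
p = rng.standard_normal(3) * 2
x_star = np.clip(p, 0.0, 1.0)
grad_at_star = 2 * (x_star - p)

for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=3)  # arbitrary point of X
    # Optimality condition of Lemma 1.27.
    assert grad_at_star @ (x - x_star) >= -1e-12
```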
9. Log-Sum-Exp Function

$\large\textbf{Statement}$: The log-sum-exp function is convex.

$\textbf{Proof}$: Let $f(x) = \log(\sum_i e^{x_i})$. For $\theta \in [0,1]$, writing $u_i = e^{x_i}$ and $v_i = e^{y_i}$,
$$
\begin{align}
f(\theta x + (1-\theta)y) &= \log\Big(\sum_i e^{\theta x_i + (1-\theta)y_i}\Big) \\
&= \log\Big(\sum_i e^{\theta x_i} e^{(1-\theta)y_i}\Big) \\
&= \log\Big(\sum_i u_i^{\theta} v_i^{1-\theta}\Big)
\end{align}
$$
By Hölder's inequality,
$$
\sum_i x_iy_i \leq \Big(\sum_i |x_i|^p\Big)^{1/p} \Big(\sum_i |y_i|^q\Big)^{1/q}
$$
where $1/p + 1/q = 1$. Applying it with $p = 1/\theta$ and $q = 1/(1-\theta)$, we get
$$
\begin{align}
\log\Big(\sum_i u_i^{\theta} v_i^{1-\theta}\Big) &\leq \log\Big[\Big(\sum_i |u_i|^{\theta \cdot 1/\theta}\Big)^{\theta} \Big(\sum_i |v_i|^{(1-\theta) \cdot 1/(1-\theta)}\Big)^{1-\theta}\Big] \\
&= \theta \log\Big(\sum_i u_i\Big) + (1-\theta)\log\Big(\sum_i v_i\Big) \\
&= \theta f(x) + (1-\theta) f(y)
\end{align}
$$
Hence $f$ is convex. $\blacksquare$
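The convexity inequality just proved can be spot-checked with `scipy.special.logsumexp` (a sketch assuming scipy is installed; the test points are arbitrary):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    theta = rng.uniform(0, 1)
    # Convexity: f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).
    lhs = logsumexp(theta * x + (1 - theta) * y)
    rhs = theta * logsumexp(x) + (1 - theta) * logsumexp(y)
    assert lhs <= rhs + 1e-12
```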
