A Mathematical Interpretation of Least Squares

Post author:xfxia
Post published: March 15, 2023
Post category: Other

We make the following assumptions:

$$y^{(i)} = \theta^\top x^{(i)} + \epsilon^{(i)}$$
where $\epsilon^{(i)} \sim N(0, \sigma^2)$ is an error term that captures unmodeled effects and random noise.
That is, the density of $\epsilon^{(i)}$ is

$$P(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right)$$
Moreover, the $\epsilon^{(i)}$ are independent and identically distributed (IID).
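The Python sketch below draws data from this model, which makes the assumptions concrete. The parameter values, the constant intercept feature, and the helper name `simulate` are hypothetical choices for illustration and are not part of the original derivation. If the assumptions hold, the empirical mean and variance of the residuals should come out close to $0$ and $\sigma^2$.

```python
import random

def simulate(theta, xs, sigma, seed=0):
    """Hypothetical helper: draw y^(i) = theta^T x^(i) + eps^(i), eps^(i) ~ N(0, sigma^2) IID."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ys = []
    for x in xs:
        mean = sum(t * xj for t, xj in zip(theta, x))  # theta^T x^(i)
        ys.append(mean + rng.gauss(0.0, sigma))        # independent Gaussian noise per example
    return ys

# Illustrative (assumed) setup: theta = (1, 2), each x^(i) = (1, x_i) with a constant intercept feature
theta = (1.0, 2.0)
xs = [(1.0, i / 10) for i in range(1000)]
ys = simulate(theta, xs, sigma=0.5)

# Residuals y^(i) - theta^T x^(i) recover the noise terms eps^(i)
residuals = [y - sum(t * xj for t, xj in zip(theta, x)) for x, y in zip(xs, ys)]
mean_res = sum(residuals) / len(residuals)
var_res = sum((r - mean_res) ** 2 for r in residuals) / len(residuals)
print(f"mean {mean_res:.3f}, variance {var_res:.3f} (sigma^2 = 0.25)")
```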

These assumptions imply that, given $x^{(i)}$ and parameterized by $\theta$, $y^{(i)} \sim N(\theta^\top x^{(i)}, \sigma^2)$, i.e.

$$P(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(y^{(i)} - \theta^\top x^{(i)})^2}{2\sigma^2}\right)$$
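The conditional density can be evaluated directly. Here is a minimal sketch that assumes $x^{(i)}$ and $\theta$ are plain Python tuples. The names `gaussian_pdf` and `conditional_density` are hypothetical. The code evaluates the noise density at the residual $y^{(i)} - \theta^\top x^{(i)}$, which is the substitution behind the formula.

```python
import math

def gaussian_pdf(z, sigma):
    """Density of N(0, sigma^2) at z: exp(-z^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-z * z / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def conditional_density(y, x, theta, sigma):
    """Hypothetical helper: P(y | x; theta) for y = theta^T x + eps, eps ~ N(0, sigma^2)."""
    residual = y - sum(t * xj for t, xj in zip(theta, x))  # y - theta^T x
    return gaussian_pdf(residual, sigma)

# Illustrative values: theta^T x = 1*1 + 2*1 = 3, so y = 3 sits at the peak of the density
print(conditional_density(3.0, (1.0, 1.0), (1.0, 2.0), 1.0))  # peak value 1/sqrt(2*pi)
print(conditional_density(4.0, (1.0, 1.0), (1.0, 2.0), 1.0))  # one sigma above the mean
```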
We estimate $\theta$ by maximum likelihood estimation (MLE).
Let $L(\theta)$ denote the likelihood of $\theta$. Because the $\epsilon^{(i)}$ are independent, the joint density of the $m$ training examples factors into a product:

$$\begin{aligned} L(\theta) &= P(\vec y \mid \vec x; \theta) = \prod_{i=1}^m P(y^{(i)} \mid x^{(i)}; \theta) \\ &= \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(y^{(i)} - \theta^\top x^{(i)})^2}{2\sigma^2}\right) \end{aligned}$$
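Here is a direct translation of the product into code, as a minimal sketch. The helper `likelihood` and the three-point dataset are hypothetical. The sketch also exposes a practical limitation of the raw product. Each factor is at most $\frac{1}{\sqrt{2\pi}\sigma}$, so with many examples and $\sigma = 1$ the product underflows to `0.0` in double precision.

```python
import math

def likelihood(theta, xs, ys, sigma):
    """Hypothetical helper: L(theta) = prod_i P(y_i | x_i; theta) under the Gaussian noise model."""
    result = 1.0
    for x, y in zip(xs, ys):
        r = y - sum(t * xj for t, xj in zip(theta, x))  # residual y_i - theta^T x_i
        result *= math.exp(-r * r / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return result

# Illustrative data: x^(i) = (1, x_i) with an intercept feature
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.1, 2.9, 5.2]
print(likelihood((1.0, 2.0), xs, ys, 0.5))  # parameters close to the data: larger L
print(likelihood((0.0, 0.0), xs, ys, 0.5))  # parameters far from the data: much smaller L

# 2000 perfectly fitted points with sigma = 1: each factor is 1/sqrt(2*pi), about 0.399
print(likelihood((1.0, 0.0), [(1.0, 0.0)] * 2000, [1.0] * 2000, 1.0))  # prints 0.0 (underflow)
```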
Let $l(\theta)$ denote the log-likelihood:

$$\begin{aligned} l(\theta) &= \log L(\theta) \\ &= \log \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(y^{(i)} - \theta^\top x^{(i)})^2}{2\sigma^2}\right) \\ &= \sum_{i=1}^m \log \left[\frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(y^{(i)} - \theta^\top x^{(i)})^2}{2\sigma^2}\right)\right] \\ &= m \log \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^m (y^{(i)} - \theta^\top x^{(i)})^2 \end{aligned}$$
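The closed form can be checked numerically against the term-by-term sum of log densities. This is a minimal sketch, and the helper names and data are hypothetical. Unlike the raw product of densities, the closed form stays finite for large $m$.

```python
import math

def log_likelihood_sum(theta, xs, ys, sigma):
    """Hypothetical helper: l(theta) as sum_i log P(y_i | x_i; theta), computed term by term."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - sum(t * xj for t, xj in zip(theta, x))
        density = math.exp(-r * r / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        total += math.log(density)  # assumes density > 0, i.e. no extreme residuals
    return total

def log_likelihood_closed(theta, xs, ys, sigma):
    """Hypothetical helper: l(theta) = m log(1/(sqrt(2 pi) sigma)) - SSE / (2 sigma^2)."""
    m = len(xs)
    sse = sum((y - sum(t * xj for t, xj in zip(theta, x))) ** 2 for x, y in zip(xs, ys))
    return m * math.log(1 / (math.sqrt(2 * math.pi) * sigma)) - sse / (2 * sigma ** 2)

xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.1, 2.9, 5.2]
print(log_likelihood_sum((1.0, 2.0), xs, ys, 0.5))
print(log_likelihood_closed((1.0, 2.0), xs, ys, 0.5))  # agrees with the line above up to rounding

# For 2000 examples the closed form is finite, while the product L(theta) would underflow to 0.0
print(log_likelihood_closed((1.0, 0.0), [(1.0, 0.0)] * 2000, [1.0] * 2000, 1.0))
```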
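Since $\log$ is strictly increasing, maximizing $L(\theta)$ is equivalent to maximizing $l(\theta)$. The first term of $l(\theta)$ does not depend on $\theta$, and $\frac{1}{2\sigma^2} > 0$. We therefore need to make

$$\sum_{i=1}^m (y^{(i)} - \theta^\top x^{(i)})^2$$

as small as possible. This is exactly the least-squares objective, and the minimizing $\theta$ does not depend on $\sigma^2$.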
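The sketch below checks the equivalence numerically for the one-feature case $y \approx \theta_0 + \theta_1 x$. It fits $\theta$ with the textbook closed-form least-squares solution. It then checks that nudging $\theta$ along either coordinate lowers $l(\theta)$ for several values of $\sigma$. Several choices here are assumptions made to keep the sketch dependency-free: the restriction to a single feature plus intercept (so $x$ is a scalar), the synthetic data, and the helper names. With more features, the normal equations would be solved instead.

```python
import math
import random

def sse(theta, xs, ys):
    """Sum of squared residuals: sum_i (y_i - theta0 - theta1 * x_i)^2."""
    return sum((y - theta[0] - theta[1] * x) ** 2 for x, y in zip(xs, ys))

def log_likelihood(theta, xs, ys, sigma):
    """l(theta) = m log(1/(sqrt(2 pi) sigma)) - SSE(theta) / (2 sigma^2)."""
    m = len(xs)
    return m * math.log(1 / (math.sqrt(2 * math.pi) * sigma)) - sse(theta, xs, ys) / (2 * sigma ** 2)

def least_squares_fit(xs, ys):
    """Hypothetical helper: closed-form minimiser of SSE for y ~ theta0 + theta1 * x."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    return (theta0, theta1)

# Synthetic (assumed) data drawn from y = 1 + 2x + eps, eps ~ N(0, 0.5^2)
rng = random.Random(1)
xs = [i / 10 for i in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
theta_hat = least_squares_fit(xs, ys)  # note: sigma is never used in the fit

# Nudging theta away from the least-squares solution lowers l(theta), whatever sigma > 0 is
for sigma in (0.1, 0.5, 2.0):
    best = log_likelihood(theta_hat, xs, ys, sigma)
    for d0, d1 in ((0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)):
        nudged = (theta_hat[0] + d0, theta_hat[1] + d1)
        assert log_likelihood(nudged, xs, ys, sigma) < best
print(theta_hat)
```

Because `least_squares_fit` never receives `sigma`, the same $\theta$ maximizes $l(\theta)$ for every noise level, matching the observation that the minimizer does not depend on $\sigma^2$.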

Original post: https://blog.csdn.net/w112348/article/details/115423019
