A Mathematical Interpretation of the Least Squares Method


We make the following assumptions:

$$y^{(i)} = \theta^\top x^{(i)} + \epsilon^{(i)}$$

where $\epsilon^{(i)} \sim N(0, \sigma^2)$ captures unmodeled effects and random noise.

That is,

$$P(\epsilon^{(i)}) = \dfrac{1}{\sqrt{2\pi}\sigma} \exp \left(-\dfrac{(\epsilon^{(i)})^2}{2\sigma^2} \right)$$

In addition, the $\epsilon^{(i)}$ are IID (independent and identically distributed).
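To make the assumptions concrete, here is a minimal NumPy sketch that draws synthetic data from this model. The sample size, feature count, `theta_true`, `sigma`, and the seed are hypothetical values chosen only for illustration; none of them come from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, an arbitrary choice

m, n = 1000, 3                            # hypothetical sample and feature counts
theta_true = np.array([2.0, -1.0, 0.5])   # hypothetical "true" parameters
sigma = 0.3                               # hypothetical noise standard deviation

X = rng.normal(size=(m, n))               # row i plays the role of x^{(i)}
eps = rng.normal(loc=0.0, scale=sigma, size=m)  # IID draws from N(0, sigma^2)
y = X @ theta_true + eps                  # y^{(i)} = theta^T x^{(i)} + eps^{(i)}

# The sample statistics of the noise should be close to 0 and sigma.
print("sample mean of noise:", eps.mean())
print("sample std of noise: ", eps.std())
```

Each noise term is drawn independently from the same distribution, which is what the IID assumption requires.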

These assumptions imply:

$$P(y^{(i)} \mid x^{(i)} ; \theta) = \dfrac{1}{\sqrt{2\pi}\sigma} \exp \left(-\dfrac{(y^{(i)}-\theta^\top x^{(i)})^2}{2\sigma^2} \right)$$
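The step from the noise density to the conditional density of $y^{(i)}$ can be spelled out as a substitution. A short sketch, treating $x^{(i)}$ as given and $\theta$ as a fixed parameter rather than a random variable (which is what the semicolon in the notation indicates):

```latex
\begin{aligned}
\epsilon^{(i)} &= y^{(i)} - \theta^\top x^{(i)}, \qquad \epsilon^{(i)} \sim N(0, \sigma^2) \\
\Rightarrow \quad y^{(i)} \mid x^{(i)} ; \theta &\sim N(\theta^\top x^{(i)}, \sigma^2)
\end{aligned}
```

Adding the constant $\theta^\top x^{(i)}$ to a zero-mean Gaussian shifts its mean and leaves the variance unchanged.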

We now apply maximum likelihood estimation (MLE). Let $L(\theta)$ denote the likelihood of $\theta$:

$$
\begin{aligned}
L(\theta) & = P(\vec y \mid \vec x ; \theta) = \prod_{i=1}^m P(y^{(i)} \mid x^{(i)} ; \theta) \\
& = \prod_{i=1}^m \frac{1}{\sqrt{2\pi} \sigma} \exp \left(-\dfrac{(y^{(i)}-\theta^\top x^{(i)})^2}{2\sigma^2} \right)
\end{aligned}
$$
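One practical reason to work with the logarithm of this product is numerical: multiplying many densities smaller than 1 quickly underflows in floating point. The sketch below illustrates this under assumed values (`m = 2000`, `sigma = 1.0`, and a hypothetical `theta`); it is an illustration, not part of the original derivation.

```python
import numpy as np

def gaussian_pdf(r, sigma):
    """Density of N(0, sigma^2) evaluated at the residual r."""
    return np.exp(-r**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(0)           # arbitrary seed
m, sigma = 2000, 1.0                     # hypothetical sample size and noise level
theta = np.array([1.0, -2.0])            # hypothetical parameters
X = rng.normal(size=(m, 2))
y = X @ theta + rng.normal(scale=sigma, size=m)

densities = gaussian_pdf(y - X @ theta, sigma)

likelihood = np.prod(densities)          # product of m factors, each below 0.4
log_lik = np.sum(np.log(densities))      # the same quantity on the log scale

print("likelihood:    ", likelihood)
print("log likelihood:", log_lik)
```

With `sigma = 1.0` every factor is at most about 0.399, so the product of 2000 of them is far below the smallest positive double and evaluates to `0.0`, while the sum of logs stays finite.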

Let $l(\theta)$ denote the log likelihood:

$$
\begin{aligned}
l(\theta) & = \log L(\theta) \\
& = \log \prod_{i=1}^m \frac{1}{\sqrt{2\pi} \sigma} \exp \left(-\dfrac{(y^{(i)}-\theta^\top x^{(i)})^2}{2\sigma^2} \right) \\
& = m \log \frac{1}{\sqrt{2\pi} \sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^m (y^{(i)}-\theta^\top x^{(i)})^2
\end{aligned}
$$
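As a sanity check on the algebra, the closed form $m \log \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{2\sigma^2}\sum_i (y^{(i)}-\theta^\top x^{(i)})^2$ can be compared numerically against the sum of the logs of the individual densities. This is a minimal sketch; the function names, data sizes, and parameter values are hypothetical.

```python
import numpy as np

def gaussian_pdf(r, sigma):
    """Density of N(0, sigma^2) evaluated at the residual r."""
    return np.exp(-r**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def log_likelihood(theta, X, y, sigma):
    """Closed form: m * log(1 / (sqrt(2*pi) * sigma)) - SSE / (2 * sigma^2)."""
    residuals = y - X @ theta
    m = len(y)
    constant = m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma))
    return constant - np.sum(residuals**2) / (2 * sigma**2)

rng = np.random.default_rng(1)           # arbitrary seed
X = rng.normal(size=(50, 2))             # hypothetical design matrix
theta = np.array([1.0, -2.0])            # hypothetical parameters
sigma = 0.5                              # hypothetical noise level
y = X @ theta + rng.normal(scale=sigma, size=50)

direct = np.sum(np.log(gaussian_pdf(y - X @ theta, sigma)))  # log of the product
closed = log_likelihood(theta, X, y, sigma)                  # simplified expression

print(direct, closed, np.isclose(direct, closed))
```

The two values agree up to floating-point rounding, and the check only passes when $\sigma$ is kept inside the logarithm of the constant term.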

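Because the logarithm is strictly increasing, maximizing $L(\theta)$ is equivalent to maximizing $l(\theta)$. The first term of $l(\theta)$ does not depend on $\theta$, so making $L(\theta)$ as large as possible requires making $\sum_{i=1}^m (y^{(i)}-\theta^\top x^{(i)})^2$ as small as possible, which is exactly the least squares objective.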
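The equivalence can be checked numerically: the least squares solution should give a log likelihood at least as large as any other choice of $\theta$. The sketch below uses `numpy.linalg.lstsq` for the least squares fit and compares against randomly perturbed parameters; the data, the noise level, and the perturbation scale are hypothetical.

```python
import numpy as np

def log_likelihood(theta, X, y, sigma):
    """Gaussian log likelihood of theta for the linear model y = X @ theta + noise."""
    residuals = y - X @ theta
    m = len(y)
    return -m * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum(residuals**2) / (2 * sigma**2)

rng = np.random.default_rng(2)           # arbitrary seed
m, sigma = 200, 0.4                      # hypothetical sample size and noise level
X = rng.normal(size=(m, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=sigma, size=m)

# Least squares estimate: minimizes the sum of squared residuals.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

best = log_likelihood(theta_ls, X, y, sigma)

# Log likelihood at 1000 randomly perturbed parameter vectors.
perturbed = [
    log_likelihood(theta_ls + rng.normal(scale=0.1, size=2), X, y, sigma)
    for _ in range(1000)
]

print("theta_ls:", theta_ls)
print("least squares is at least as likely as every perturbation:", best >= max(perturbed))
```

The value of `sigma` only rescales and shifts the log likelihood, so the maximizing $\theta$ is the same for any $\sigma > 0$.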


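Copyright notice: This is an original article by w112348, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.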