Binary logistic regression model

- A classification model, defined by the conditional probability distribution $P(Y|X)$, which takes the form of a parameterized logistic distribution.
First, an overview of the model's conditional probability distribution:

$$P(Y=1|x)=\frac{\exp(w\cdot x+b)}{1+\exp(w\cdot x+b)}$$

$$P(Y=0|x)=\frac{1}{1+\exp(w\cdot x+b)}$$
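A minimal sketch of these two probabilities in Python (the names `w`, `b`, `x` and their values are illustrative assumptions, not from the text):

```python
import math

def p_y1(x, w, b):
    # P(Y=1|x) = exp(w·x + b) / (1 + exp(w·x + b))
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.exp(z) / (1.0 + math.exp(z))

def p_y0(x, w, b):
    # P(Y=0|x) = 1 / (1 + exp(w·x + b)), i.e. 1 - P(Y=1|x)
    return 1.0 - p_y1(x, w, b)

# The two probabilities form a valid distribution: they sum to 1
w, b, x = [0.5, -1.0], 0.2, [1.0, 2.0]
print(p_y1(x, w, b) + p_y0(x, w, b))  # 1.0
```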
What are the odds of an event? The probability $p$ that it happens divided by the probability $1-p$ that it does not, i.e. $\frac{p}{1-p}$.
Then the log odds (logit, using the natural logarithm) is

$$\mathrm{logit}(p)=\log\frac{p}{1-p}$$

Substituting $p=P(Y=1|x)$ from the model:

$$\mathrm{logit}(P(Y=1|x))=\log\frac{P(Y=1|x)}{1-P(Y=1|x)}=\log\left(\exp(w\cdot x+b)\right)=w\cdot x+b$$
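As a numerical check of this identity (a sketch; `sigmoid` and `logit` are the usual helper names, not defined in the text), applying the logit to the model's $P(Y=1|x)$ recovers the linear term:

```python
import math

def sigmoid(z):
    # P(Y=1|x) when z = w·x + b
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # log odds: log(p / (1 - p))
    return math.log(p / (1.0 - p))

z = 0.7  # stands in for w·x + b
print(logit(sigmoid(z)))  # recovers z (up to floating point)
```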
So the log odds of $Y=1$ is a linear function of $x$.
And the expression $P(Y=1|x)=\frac{\exp(w\cdot x+b)}{1+\exp(w\cdot x+b)}$ is simply a way of converting $w\cdot x+b$ into a probability: the closer $w\cdot x+b$ gets to $+\infty$, the closer the probability gets to 1.
Model parameter estimation
Let $P(Y=1|x)=p$ and $P(Y=0|x)=1-p$.
Then the likelihood function is: # (I didn't understand likelihood functions at this point; they are explained later in this note — read that section and come back)

$$\prod_{i=1}^N [p_i]^{y_i}[1-p_i]^{1-y_i}$$
Taking the logarithm gives the log-likelihood function $L(w)$:

$$L(w)=\sum_{i=1}^N\left[y_i\log p_i+(1-y_i)\log(1-p_i)\right]$$

$$=\sum_{i=1}^N\left[y_i\log\frac{p_i}{1-p_i}+\log(1-p_i)\right]$$

$$=\sum_{i=1}^N\left[y_i(w\cdot x_i+b)+\log\frac{1}{1+\exp(w\cdot x_i+b)}\right]$$

$$=\sum_{i=1}^N\left[y_i(w\cdot x_i+b)-\log\left(1+\exp(w\cdot x_i+b)\right)\right]$$
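The algebra above can be verified numerically: the per-sample term $y_i\log p_i+(1-y_i)\log(1-p_i)$ equals $y_i z_i-\log(1+\exp(z_i))$ with $z_i=w\cdot x_i+b$. A sketch (the data values are made up for illustration):

```python
import math

def log_lik_probs(y, p):
    # L(w) written in terms of the probabilities p_i
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def log_lik_linear(y, z):
    # The same L(w) written in terms of z_i = w·x_i + b
    return sum(yi * zi - math.log(1 + math.exp(zi))
               for yi, zi in zip(y, z))

z = [-1.2, 0.3, 2.0]  # illustrative values of w·x_i + b
p = [math.exp(zi) / (1 + math.exp(zi)) for zi in z]
y = [0, 1, 1]
print(log_lik_probs(y, p), log_lik_linear(y, z))  # the two forms agree
```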
The next step is to find the value of $w$ that maximizes $L(w)$; since this is a maximization, use gradient ascent on $L(w)$ (equivalently, gradient descent on $-L(w)$).
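A minimal gradient-ascent sketch for one scalar feature (the toy data, learning rate, and iteration count are all illustrative assumptions). The gradient of $L$ works out to $\sum_i (y_i - p_i)x_i$ for $w$ and $\sum_i (y_i - p_i)$ for $b$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy 1-D data: larger x tends to mean y = 1
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # gradient of L(w) = sum_i [y_i(w x_i + b) - log(1 + exp(w x_i + b))]
    gw = sum((yi - sigmoid(w * xi + b)) * xi for xi, yi in zip(xs, ys))
    gb = sum((yi - sigmoid(w * xi + b)) for xi, yi in zip(xs, ys))
    w += lr * gw  # ascend: step in the direction of the gradient
    b += lr * gb

print(sigmoid(w * 2.0 + b))   # close to 1 for a clearly positive example
print(sigmoid(w * -2.0 + b))  # close to 0 for a clearly negative example
```

Because the log-likelihood is concave in $(w, b)$, this simple ascent converges without getting stuck in local optima.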
Likelihood function and maximum likelihood estimation
The likelihood function is defined as:

$$L(\theta|x)=f(x|\theta)$$

that is, the probability (density) of the observed data $x$, viewed as a function of the parameter $\theta$.
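A tiny worked example of this definition (an assumed coin-flip setup, not from the text): for independent Bernoulli trials, $L(\theta\mid x)=f(x\mid\theta)$ is the probability of the observed flips as a function of $\theta$, and its maximizer is the sample mean:

```python
def likelihood(theta, flips):
    # L(theta | flips) = f(flips | theta) for independent Bernoulli trials
    prob = 1.0
    for x in flips:
        prob *= theta if x == 1 else (1.0 - theta)
    return prob

flips = [1, 1, 0, 1]  # 3 heads out of 4
# scan candidate theta values; the maximum sits at 3/4, the sample mean
best = max((likelihood(t / 100, flips), t / 100) for t in range(1, 100))
print(best[1])  # 0.75
```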
For the concrete mathematical meaning, see: read 1 first, then 2, then 1 again.