Gradient Descent – 小飞侠

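The definition of gradient descent used here is taken from an encyclopedia entry: gradient descent is an iterative method that can be used to solve least-squares problems, both linear and nonlinear. When solving for the model parameters of a machine learning algorithm, that is, an unconstrained optimization problem, gradient descent is one of the most commonly used methods; another common one is the least squares method. To minimize a loss function, gradient descent iterates step by step until it arrives at the minimized loss and the corresponding model parameter values. Conversely, to maximize a function, gradient ascent is used instead. In other words, the gradient vector points uphill and the negative gradient vector points downhill, so moving in the direction of the negative gradient decreases the value of the function f; this is called gradient descent.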
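As a minimal sketch of the idea in the paragraph above, the loop below repeatedly steps against the gradient. The objective f(x) = (x - 3)^2, the starting point, the step size and the function name `gradient_descent` are all illustrative assumptions, not something taken from the original post.

```python
def gradient_descent(grad, x0, step_size=0.1, num_iters=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # very close to the minimizer x = 3
```

Flipping the sign of the update (x + step_size * grad(x)) would turn the same loop into gradient ascent.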

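In machine learning, three variants have been developed from basic gradient descent: stochastic gradient descent, batch gradient descent, and mini-batch stochastic gradient descent. Batch gradient descent minimizes the risk function over all samples. Stochastic gradient descent evaluates the risk at each iteration on a single sample rather than on the whole sample set, so each iteration is very cheap and the method makes progress quickly, at the cost of noisier updates. Mini-batch stochastic gradient descent combines the advantages of the other two as a compromise: each iteration trains on a subset of the data set.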
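The only structural difference between the three variants is which samples feed each update. The sketch below just builds the index sets; the sizes m = 1000 and k = 10 and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 1000, 10  # m samples in total, k per mini-batch (hypothetical values)

batch_idx = np.arange(m)                              # batch: every sample
sgd_idx = rng.integers(0, m, size=1)                  # stochastic: one random sample
minibatch_idx = rng.choice(m, size=k, replace=False)  # mini-batch: k distinct random samples

print(len(batch_idx), len(sgd_idx), len(minibatch_idx))
```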

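If a real-valued function f(x) is defined and differentiable at a point a, then f(x) decreases fastest from a in the direction opposite to the gradient, -∇f(a). The iteration formula of gradient descent is:

$$x_{k+1} = x_k - \lambda \nabla f(x_k)$$

where λ > 0 is the search step size along the gradient direction. The step size must be chosen appropriately: if it is too large the iteration will not converge, and if it is too small convergence becomes too slow. In general the step size can be determined by a line search algorithm.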
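The effect of the step size is easy to see on a toy problem. The sketch below uses f(x) = x^2 (an assumed example) with three fixed step sizes, followed by a simple backtracking line search based on the Armijo sufficient-decrease condition. The backtracking rule is one common line search, swapped in here for illustration; the post does not say which line search it has in mind.

```python
def run(step_size, x0=1.0, num_iters=50):
    """Minimize f(x) = x^2 (gradient 2x) with a fixed step size; return |x|."""
    x = x0
    for _ in range(num_iters):
        x = x - step_size * 2.0 * x
    return abs(x)


too_large = run(1.1)    # each step multiplies x by -1.2, so |x| grows
too_small = run(0.001)  # each step multiplies x by 0.998, so |x| barely shrinks
moderate = run(0.4)     # each step multiplies x by 0.2, so |x| shrinks quickly
print(too_large, too_small, moderate)


def backtracking_step(f, grad, x, step_size=1.0, shrink=0.5, c=1e-4):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    g = grad(x)
    while f(x - step_size * g) > f(x) - c * step_size * g * g:
        step_size *= shrink
    return step_size


chosen = backtracking_step(lambda x: x * x, lambda x: 2.0 * x, x=1.0)
print(chosen)
```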

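The loss function for linear regression is usually defined as the squared loss:

$$L(y, f(x, a)) = (y - f(x, a))^2$$

Summed over the m training samples, it can also be written as

$$L(a) = \sum_{i=1}^{m} \left( f(x_i, a) - y_i \right)^2, \qquad f(x_i, a) = \sum_{j} a_j x_{ij}$$

where f(x, a) is the fitted function of the linear regression, i is the sample index, and j is the index over sample dimensions. The parameters of a linear regression can be obtained with the least squares method; here gradient descent is used instead. To cancel the factor of 2 that the square produces when differentiating, the objective is taken to be:

$$L(a) = \frac{1}{2m} \sum_{i=1}^{m} \left( f(x_i, a) - y_i \right)^2$$

In the stochastic variants below, (x_i, y_i) is the sample selected at the i-th iteration.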
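A short NumPy sketch of such a loss, assuming it is the squared error with a factor of 1/2 averaged over the m samples (the scaling used by the example code later in the post). The function names `predict` and `half_mse` and the two-row data set are made up for illustration.

```python
import numpy as np


def predict(X, a):
    """Linear model f(x_i, a) = sum_j a_j * x_ij, evaluated for every row of X."""
    return X @ a


def half_mse(X, y, a):
    """Squared loss with the 1/2 factor, averaged over the m samples."""
    residual = predict(X, a) - y
    return residual @ residual / (2 * len(y))


X = np.array([[1.0, 2.0], [3.0, 4.0]])  # made-up data
y = np.array([8.0, 18.0])               # generated with a = [2, 3]

print(half_mse(X, y, np.array([2.0, 3.0])))  # 0.0
print(half_mse(X, y, np.zeros(2)))           # (64 + 324) / 4 = 97.0
```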

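Batch gradient descent

(1) Compute the partial derivative of L with respect to each a_j:

$$\frac{\partial L}{\partial a_j} = \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i, a) - y_i \right) x_{ij}$$

(2) Update each parameter a_j along the negative gradient direction, where α is the step size:

$$a_j := a_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i, a) - y_i \right) x_{ij}$$

(3) As the formula shows, every iteration uses all of the data in the training set, so when the data set contains many samples each iteration becomes very slow.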
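The steps above can be sketched in vectorized form as follows. This assumes a linear model and a gradient averaged over all m samples; `batch_step`, the random data and the step size 0.5 are illustrative choices, not values from the post.

```python
import numpy as np


def batch_step(X, y, a, alpha):
    """One batch update: the gradient is averaged over every sample."""
    m = len(y)
    grad = X.T @ (X @ a - y) / m  # vector of partial derivatives dL/da_j
    return a - alpha * grad


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # made-up features
y = X @ np.array([2.0, 3.0])   # noise-free targets generated with a = [2, 3]

a = np.zeros(2)
for _ in range(100):
    a = batch_step(X, y, a, alpha=0.5)
print(a)  # close to [2, 3]
```

Every call touches all 200 rows of X, which is the cost the third point refers to.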

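Stochastic gradient descent

Stochastic gradient descent uses a single sample in each iteration rather than the whole training set. The update formula is:

$$a_j := a_j - \alpha \left( f(x_i, a) - y_i \right) x_{ij}$$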
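A minimal sketch of a single-sample update, assuming a linear model. The helper `sgd_step`, the random data and the step size 0.1 are assumptions made for illustration.

```python
import numpy as np


def sgd_step(X, y, a, alpha, rng):
    """One stochastic update using a single randomly chosen sample."""
    i = rng.integers(len(y))    # index of the chosen sample
    error = X[i] @ a - y[i]     # f(x_i, a) - y_i
    return a - alpha * error * X[i]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # made-up features
y = X @ np.array([2.0, 3.0])   # noise-free targets generated with a = [2, 3]

a = np.zeros(2)
for _ in range(2000):
    a = sgd_step(X, y, a, alpha=0.1, rng=rng)
print(a)  # close to [2, 3]
```

Because the targets here are noise-free, the iterates settle at the exact solution; with noisy data a fixed step size would leave them fluctuating around the minimum.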

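Mini-batch stochastic gradient descent

In mini-batch gradient descent, each iteration uses k samples chosen from the m samples in the data set, with 1 < k < m; k can be tuned to suit the data. Writing B for the chosen set of k sample indices, the corresponding update formula is:

$$a_j := a_j - \alpha \frac{1}{k} \sum_{i \in B} \left( f(x_i, a) - y_i \right) x_{ij}$$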
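One way to organize mini-batch updates is to shuffle the data once per pass and walk through it k samples at a time. The sketch below does that for a linear model; the shuffling, the helper name `minibatch_epoch` and all numeric values are assumptions rather than details from the post.

```python
import numpy as np


def minibatch_epoch(X, y, a, alpha, k, rng):
    """One pass over the data in shuffled mini-batches of (at most) k samples."""
    m = len(y)
    order = rng.permutation(m)
    for start in range(0, m, k):
        idx = order[start:start + k]
        # Gradient averaged over the current mini-batch only
        grad = X[idx].T @ (X[idx] @ a - y[idx]) / len(idx)
        a = a - alpha * grad
    return a


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # made-up features
y = X @ np.array([2.0, 3.0])   # noise-free targets generated with a = [2, 3]

a = np.zeros(2)
for _ in range(50):
    a = minibatch_epoch(X, y, a, alpha=0.1, k=10, rng=rng)
print(a)  # close to [2, 3]
```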

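When using gradient descent, the efficiency of the algorithm can be improved by adjusting the step size, choosing different initial parameter values, and normalizing the features.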
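Feature normalization helps because features on very different scales make the loss surface badly conditioned, which forces a small step size. The sketch below standardizes made-up features and compares the condition number of X^T X before and after; `standardize` and the data are illustrative assumptions.

```python
import numpy as np


def standardize(X):
    """Rescale every column of X to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std


rng = np.random.default_rng(0)
# Made-up features on very different scales: roughly [0, 1] and [0, 1000].
X = np.column_stack([rng.uniform(0, 1, 500), rng.uniform(0, 1000, 500)])
X_scaled, mean, std = standardize(X)

# The condition number of X^T X governs how fast gradient descent can
# converge on a least-squares loss; smaller is better.
cond_raw = np.linalg.cond(X.T @ X)
cond_scaled = np.linalg.cond(X_scaled.T @ X_scaled)
print(cond_raw, cond_scaled)
```

Note that centering the features changes the fitted model unless an intercept term is added, and the same mean and std have to be applied to any new data.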

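Comparison of gradient descent with other methods:

(1) The least squares method does not require choosing a step size: it obtains the solution by solving a system of equations, whereas gradient descent requires a step size and solves the problem iteratively. When the number of samples is very small, least squares has the advantage and is fast to compute; gradient descent is better suited to large data sets. Least squares yields the optimal solution directly, whereas gradient descent may end up at a local minimum when the loss function is not convex.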
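For comparison, the least squares solution can be obtained in one call, with no step size and no iteration. The sketch below uses NumPy's `np.linalg.lstsq` on data built the same way as in the example that follows in the post (y = 2*x1 + 3*x2); the variable names are illustrative.

```python
import numpy as np

x1 = np.linspace(0, 9, 1000)
x2 = np.linspace(4, 13, 1000)
X = np.column_stack([x1, x2])  # shape (1000, 2)
y = 2 * x1 + 3 * x2

# Solve min_a ||X a - y||^2 directly.
a_hat, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(a_hat)  # approximately [2, 3]
```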

Example:

1000 sample points and at most 100 iterations.

(1) Stochastic gradient descent

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d  # needed for projection='3d' on older matplotlib
from matplotlib import style


def painter3D(x, y, z):
    """Draw the points (x, y, z) as a line in a 3D plot."""
    style.use('ggplot')
    fig = plt.figure()
    ax1 = fig.add_subplot(111, projection='3d')
    # The inputs are 1D sequences, so draw a 3D line;
    # plot_wireframe expects 2D grids.
    ax1.plot(x, y, z)
    ax1.set_xlabel("x")
    ax1.set_ylabel("y")
    ax1.set_zlabel("z")
    plt.show()


# Build the data set
def get_data(sample_num=1000):
    """
    The function to fit is y = 2*x1 + 3*x2
    """
    x1 = np.linspace(0, 9, sample_num)
    x2 = np.linspace(4, 13, sample_num)
    x = np.column_stack((x1, x2))  # shape (sample_num, 2)
    y = 2 * x1 + 3 * x2
    painter3D(x1, x2, y)
    return x, y


# Stochastic gradient descent
def SGD(samples, y, step_size=0.01, max_iter_count=100):
    """
    :param samples: samples, shape (m, 2)
    :param y: target values
    :param step_size: step size of each iteration
    :param max_iter_count: maximum number of iterations
    :return: history of theta[0], history of theta[1], final theta, loss history
    """
    # Number of samples and number of variables; initialize theta
    m, var = samples.shape  # 1000 * 2
    theta = np.zeros(var)  # parameters
    loss = 1
    iter_count = 0
    iter_list = []
    loss_list = []
    theta1 = []
    theta2 = []
    # Iterate while the loss exceeds 0.01 and the iteration limit is not reached
    while loss > 0.01 and iter_count < max_iter_count:
        theta1.append(theta[0])
        theta2.append(theta[1])
        # Index of one randomly chosen sample
        rand1 = np.random.randint(0, m)
        h = np.dot(theta, samples[rand1])
        # Key point: a single sample is enough to update the weights
        theta = theta - step_size * (h - y[rand1]) * samples[rand1]
        # Total loss: sum of the per-sample squared errors, scaled by 1/(2m)
        loss = np.sum((samples @ theta - y) ** 2) / (2 * m)
        print("iter_count: ", iter_count, "the loss:", loss)
        iter_list.append(iter_count)
        loss_list.append(loss)
        iter_count += 1
    plt.plot(iter_list, loss_list)
    plt.xlabel("iter")
    plt.ylabel("loss")
    plt.show()
    return theta1, theta2, theta, loss_list


if __name__ == '__main__':
    samples, y = get_data()
    theta1, theta2, theta, loss_list = SGD(samples, y)
    print(theta)  # should move toward [2, 3]
    painter3D(theta1, theta2, loss_list)

(2) Batch gradient descent

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d  # needed for projection='3d' on older matplotlib
from matplotlib import style

# get_data and painter3D are the same as in listing (1).


# Batch gradient descent
def BGD(samples, y, step_size=0.01, max_iter_count=100):
    """
    :param samples: samples, shape (m, 2)
    :param y: target values
    :param step_size: step size of each iteration
    :param max_iter_count: maximum number of iterations
    :return: history of theta[0], history of theta[1], final theta, loss history
    """
    m, var = samples.shape  # 1000 * 2
    theta = np.zeros(var)  # parameters
    loss = 1
    iter_count = 0
    iter_list = []
    loss_list = []
    theta1 = []
    theta2 = []
    # Iterate while the loss exceeds 0.01 and the iteration limit is not reached
    while loss > 0.01 and iter_count < max_iter_count:
        theta1.append(theta[0])
        theta2.append(theta[1])
        # Gradient averaged over all m samples, evaluated at the current theta
        grad = samples.T @ (samples @ theta - y) / m
        theta = theta - step_size * grad
        # Total loss: sum of the per-sample squared errors, scaled by 1/(2m)
        loss = np.sum((samples @ theta - y) ** 2) / (2 * m)
        print("iter_count: ", iter_count, "the loss:", loss)
        iter_list.append(iter_count)
        loss_list.append(loss)
        iter_count += 1
    plt.plot(iter_list, loss_list)
    plt.xlabel("iter")
    plt.ylabel("loss")
    plt.show()
    return theta1, theta2, theta, loss_list


if __name__ == '__main__':
    samples, y = get_data()
    theta1, theta2, theta, loss_list = BGD(samples, y)
    print(theta)  # should move toward [2, 3]
    painter3D(theta1, theta2, loss_list)

(3) Mini-batch gradient descent

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d  # needed for projection='3d' on older matplotlib
from matplotlib import style

# get_data and painter3D are the same as in listing (1).


# Mini-batch gradient descent
def MBGD(samples, y, step_size=0.01, max_iter_count=100, batch_size=10):
    """
    :param samples: samples, shape (m, 2)
    :param y: target values
    :param step_size: step size of each iteration
    :param max_iter_count: maximum number of passes over the data
    :param batch_size: number of samples in each mini-batch
    :return: history of theta[0], history of theta[1], final theta, loss history
    """
    # Number of samples and number of variables; initialize theta
    m, var = samples.shape  # 1000 * 2
    theta = np.zeros(var)  # parameters
    loss = 1
    iter_count = 0
    iter_list = []
    loss_list = []
    theta1 = []
    theta2 = []
    # Iterate while the loss exceeds 0.01 and the iteration limit is not reached
    while loss > 0.01 and iter_count < max_iter_count:
        theta1.append(theta[0])
        theta2.append(theta[1])
        # Sweep the data in consecutive batches of batch_size samples
        for start in range(0, m, batch_size):
            batch_x = samples[start:start + batch_size]
            batch_y = y[start:start + batch_size]
            # Gradient averaged over the current batch, at the current theta
            grad = batch_x.T @ (batch_x @ theta - batch_y) / len(batch_y)
            theta = theta - step_size * grad
        # Total loss: sum of the per-sample squared errors, scaled by 1/(2m)
        loss = np.sum((samples @ theta - y) ** 2) / (2 * m)
        print("iter_count: ", iter_count, "the loss:", loss)
        iter_list.append(iter_count)
        loss_list.append(loss)
        iter_count += 1
    plt.plot(iter_list, loss_list)
    plt.xlabel("iter")
    plt.ylabel("loss")
    plt.show()
    return theta1, theta2, theta, loss_list


if __name__ == '__main__':
    samples, y = get_data()
    theta1, theta2, theta, loss_list = MBGD(samples, y)
    print(theta)  # should move toward [2, 3]
    painter3D(theta1, theta2, loss_list)

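Summary

The three methods above show that the mini-batch method works well, reaching estimates of 1.92 and 3.09 against the true values of 2 and 3. Batch gradient descent reaches the optimal solution, while stochastic gradient descent oscillates: because of the step size, the iterates hover around the minimum. The step size therefore needs to be set separately for each gradient descent variant.

Directions for improvement: momentum and Nesterov momentum.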
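As a rough sketch of these two improvements, the updates below keep a velocity v that accumulates past gradients. Classical momentum evaluates the gradient at the current point, while Nesterov momentum evaluates it at the look-ahead point a + beta * v. The quadratic objective, the hyperparameters alpha = 0.1 and beta = 0.9 and the function names are assumptions for illustration only.

```python
import numpy as np


def momentum_step(a, v, grad, alpha=0.1, beta=0.9):
    """Classical momentum: accumulate a velocity, then move along it."""
    v = beta * v - alpha * grad(a)
    return a + v, v


def nesterov_step(a, v, grad, alpha=0.1, beta=0.9):
    """Nesterov momentum: evaluate the gradient at the look-ahead point."""
    v = beta * v - alpha * grad(a + beta * v)
    return a + v, v


target = np.array([2.0, 3.0])  # made-up minimizer


def grad(a):
    """Gradient of the toy objective 0.5 * ||a - target||^2."""
    return a - target


results = {}
for name, step in [("momentum", momentum_step), ("nesterov", nesterov_step)]:
    a, v = np.zeros(2), np.zeros(2)
    for _ in range(500):
        a, v = step(a, v, grad)
    results[name] = a
print(results)
```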


Original link: https://blog.csdn.net/shushi6969/article/details/79997925
