# Fully Connected Neural Networks and Their Optimization (C/C++)
Abstract
: This article describes a fully connected neural network in C (trained with the BP gradient-descent algorithm) and then optimizes it according to the relevant theory. The test data include the AND operation, the XOR operation, and the data set from the 2018 UESTC mathematical-modeling campus contest; all of them are stored as .csv files. The sigmoid function is used in both the hidden layer and the output layer. Following some published work [4], the output layer could instead use the purelin (linear) function; readers can derive and substitute the corresponding derivative themselves.
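For readers who want to try that substitution, only the output-layer gradient changes (this is a summary of the standard derivation, not a formula quoted from [4]): with a sigmoid output $\hat{y}$ and target $t$, the local gradient used later in this article is $g=\hat{y}(1-\hat{y})(t-\hat{y})$, whereas a purelin (identity) output has derivative 1, so the gradient reduces to
$$g_{\text{purelin}} = t-\hat{y}.$$
The hidden-layer update keeps its sigmoid derivative term unchanged.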
## 1. Un-optimized Fully Connected Network Based on the BP Algorithm (SDBP, Steepest Descent)
1. Interface
attributes: input attribute matrix with num_sample samples and num_attribute attributes per sample.
identify: output label vector with num_sample samples.
learning_rate: learning rate.
hidden_neu: number of hidden-layer neurons.
minimun_error: error threshold at which iteration stops.
2. Algorithm design
Input
Attribute matrix, label vector, learning rate, number of hidden neurons, stopping error.
Output
Weight matrix (.csv file), weight vector (.csv file), the error of every iteration.
Data structures
Aggregate variables: weight matrix (thresholds included), weight vector (thresholds included), hidden-layer neuron outputs.
Scalars: error, gradient 1 (hidden-to-output), gradient 2 (input-to-hidden), weight change 1, weight change 2, network output, temporary linear output.
Initialization
Initialize the weight matrix and weight vector with random numbers between -1 and 1.
Training while the error is above target
For every sample: forward propagation, then update the output-layer weights, then update the hidden-layer weights.
Accumulated error
After every pass over all samples, check whether the mean squared error has reached the target.
3. Design notes
Read every loop in the C code as follows: the loop variable stands for a neuron, and the loop body is the set of operations applied to that neuron object.
Strictly follow the "data structures + algorithms" principle of C program design. When optimizing, change the data structure and the algorithm separately for each optimization target; when a data structure changes, state whether each variable is a scalar or an aggregate. Also run a safety check once the program is finished to rule out memory leaks.
When reading and saving data, distinguish carefully between column vectors and row vectors, as illustrated below.
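To make the row/column caveat concrete, here is how the two write_csv overloads defined later in this article (Section 3) lay out the same three numbers; a small sketch assuming those implementations:
double w[3] = { 0.1, 0.2, 0.3 };
write_csv("w_as_column.csv", w, 3);   //Vector overload: one value per line -> a 3x1 column.
double* row[1] = { w };
write_csv("w_as_row.csv", row, 1, 3); //Matrix overload: "0.100000,0.200000,0.300000," -> a 1x3 row.
Weight matrices are therefore saved row by row, while a weight vector saved with the vector overload comes back as a column.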
4. Implementation (C/C++)
void bp_net_training(double** const attributes, double* const identify, long long num_sample, int num_attribute,double learning_rate,double minimun_error,int hidden_neu)
{
//Define data structure.
/* Every weight matrix keeps one extra set of weights for the threshold so it can be treated uniformly. */
double error = 100.0;//Initial value; must be large enough that error/num_sample exceeds minimun_error, otherwise the loop never runs.
double** weight_hidden = new double*[hidden_neu];//Include theta
for (int i = 0; i < hidden_neu ; i++)
weight_hidden[i] = new double[num_attribute+1];
double* weight_out = new double[hidden_neu+1];//Include theta2.
double* neural = new double[hidden_neu];//The outputs of hidden neural.
double g,e,delta_weight_hidden,delta_weight_output,produce_output,temp_neural,temp_out;//Store temporary variable.
//Random produce.
srand((unsigned)time(NULL));
for(int i=0;i<hidden_neu;++i)
for (int j = 0; j < num_attribute+1; ++j)
weight_hidden[i][j] = rand_between_p1ton1();
for (int i = 0; i < hidden_neu+1; i++)
weight_out[i] = rand_between_p1ton1();
//Training
int count = 0;
while (error/num_sample > minimun_error)
{
count++;
error = 0.0;
for (int i = 0; i < num_sample; ++i)//Look every sample.
{
//Look forward-hidden (forward propagation: input -> hidden layer)
for (int j = 0; j < hidden_neu; j++)
{
temp_neural = 0;
for (int k = 0; k < num_attribute; k++)
temp_neural += weight_hidden[j][k] * attributes[i][k];
temp_neural -= weight_hidden[j][num_attribute];
neural[j] = sigmoid(temp_neural);
}
//Look forward-out (forward propagation: hidden layer -> output)
temp_out = 0.0;
for (int j = 0; j < hidden_neu; j++)
temp_out += neural[j] * weight_out[j];
temp_out -= weight_out[hidden_neu];
produce_output = sigmoid(temp_out);
//Look backward-out (backpropagation: output layer -> hidden layer)
g = produce_output * (1 - produce_output)*(identify[i] - produce_output);//Local gradient of the sigmoid output neuron.
for (int j = 0; j < hidden_neu; j++)//Renew each weight to out.
{
//Gradient-descent update of each hidden-to-output weight.
weight_out[j] += learning_rate*g*neural[j];
}
weight_out[hidden_neu] -= learning_rate * g;//Renew fake neural to out (the dummy node that supplies the threshold input).
//Look backward-hidden (backpropagation: hidden layer -> input layer)
for (int j = 0; j < hidden_neu; j++)//For each hidden neuron.
{
e = neural[j] * (1 - neural[j])*weight_out[j] * g;
for (int k = 0; k < num_attribute; k++)
{
delta_weight_hidden = learning_rate * e*attributes[i][k];
weight_hidden[j][k] += delta_weight_hidden;
}
weight_hidden[j][num_attribute] -= e * learning_rate;//Handle the threshold separately.
}
error += (produce_output - identify[i])*(produce_output - identify[i]);
}
//After every pass over the samples, print the total error and the average (per-sample) error.
cout << "This is " << count << " iteration.The error is " << error;
cout<< ".The accumulation error is " <<error/num_sample<<'.' <<endl;
}
write_csv("weight_to_hidden.csv",weight_hidden,hidden_neu , num_attribute+1);
write_csv("weight_to_out.csv", weight_out, hidden_neu + 1);
for (int i = 0; i < hidden_neu ; i++)
delete[] weight_hidden[i];
delete[] weight_hidden;
delete[] weight_out;
delete[] neural;
}
double sigmoid(double x)
{
return 1 / (1 + exp(-x));
}
double rand_between_p1ton1()//Exclude 1 and -1.
{
double rand_temp;
while ((rand_temp = (double)rand() / RAND_MAX) == 1||rand_temp==0);
return 2 * rand_temp - 1;
}
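For reference, the two gradient lines in the listing follow directly from the sigmoid derivative $\sigma'(x)=\sigma(x)\bigl(1-\sigma(x)\bigr)$. With network output $\hat{y}$, target $t$, hidden output $b_j$ and hidden-to-output weight $w_j$, the quantities computed as g and e above are
$$g=\hat{y}(1-\hat{y})(t-\hat{y}),\qquad e_j=b_j(1-b_j)\,w_j\,g,$$
i.e. the usual BP local gradients for a squared-error loss.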
Overall evaluation and accuracy
: In the XOR simulation the network performed well with 7 or 3 hidden neurons and a learning rate of 2. With a high neuron count (10) the output saturated near 1, and with a low count (4) near 0, so the behaviour appears to vary periodically with the number of hidden neurons; the commonly used empirical formulas [2] can be used to choose it. The number of hidden neurons directly determines the accuracy of the result, whereas the learning rate only affects the convergence speed and the residual error, not the correctness of the result itself. This distinction is especially important when tuning the parameters.
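As a concrete driver for the XOR experiment described above, the following minimal sketch shows how the routine can be called (my own example: the stopping error 0.01 is an arbitrary choice, and write_csv from the Section 3 listing must be linked in; the 3 hidden neurons and learning rate 2 are the values reported above):
//XOR test driver: 4 samples, 2 attributes, 1 output.
int main(void)
{
double raw[4][2] = { {0,0},{0,1},{1,0},{1,1} };
double identify[4] = { 0, 1, 1, 0 };//XOR labels.
double* attributes[4] = { raw[0], raw[1], raw[2], raw[3] };
bp_net_training(attributes, identify, 4, 2, 2.0, 0.01, 3);//learning_rate=2.0, minimun_error=0.01, hidden_neu=3.
return 0;
}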
## 2. Fully Connected BP Network with Algorithm-Level Optimization
According to the Kolmogorov theorem, a three-layer BP network with a single hidden layer can already approximate a function to arbitrary accuracy, so the optimizations below do not add more layers, even though extra layers would improve the fitting capacity of the network.
1. Improving learning with the momentum BP algorithm [1]
To reduce the risk of falling into a local optimum, a momentum factor is introduced so that the current descent step depends not only on the current gradient but also on the descent step taken when the previous sample passed through the network. At the data-structure level this adds an aggregate variable gl holding the previous descent step and a scalar mc for the momentum factor, giving the weight updates inertia and damped oscillation and speeding up convergence. When the new error grows too much relative to the old error, however, the weight change obtained in the previous step should be cancelled.
A common schedule: if the current error exceeds 1.04 times the previous error, set mc = 0; if the current error is smaller than the previous error, set mc = 0.95; otherwise leave mc unchanged.
With the additional momentum factor the weight update rule is
$$\Delta w_{ij}(k+1)=(1-mc)\,\eta\,\delta_{i}\,p_{j}+mc\,\Delta w_{ij}(k)$$
and the threshold update rule is
$$\Delta b_{i}(k+1)=(1-mc)\,\eta\,\delta_{i}+mc\,\Delta b_{i}(k)$$
The function header becomes
void bp_net_training(double** const attributes, double* const identify, long long num_sample, int num_attribute,double learning_rate,double minimun_error,int hidden_neu,double mc)
mc is the momentum factor, usually around 0.95.
The data-structure section gains the following initialization statements, creating the aggregate variables that store the previous descent step.
double* gl_out = new double[hidden_neu+1];
double** el_hidden = new double*[hidden_neu];
for (int i = 0; i < hidden_neu ; i++)
el_hidden[i] = new double[num_attribute+1];
for(int i=0;i<hidden_neu+1;i++)
gl_out[i]=0;
for(int i=0;i<hidden_neu;i++)
{
for(int j=0;j<num_attribute+1;++j)
el_hidden[i][j]=0;
}
double last_error=error;
At the start of each iteration, add the following statements to decide the momentum value; during the first iteration the else branch (mc = mc) is taken.
if(last_error*1.04<error)
mc=0;
else if(last_error>error)
mc=0.95;
else
mc=mc;
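One detail the snippet leaves implicit (my reading, not stated in the original): last_error must be refreshed so that each comparison pits the error of the epoch that just finished against the one before it. A natural place is immediately after the three-way check, before error is reset to 0 for the new epoch:
last_error = error;//Remember the finished epoch's accumulated error for the next mc decision.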
After the algorithm finishes, add the following statements to release the memory and keep the program safe.
for (int i = 0; i < hidden_neu ; i++)
delete[] el_hidden[i];
delete[] el_hidden;
delete[] gl_out;
The backward pass (output layer to hidden layer) becomes
g = produce_output * (1 - produce_output)*(identify[i] - produce_output);
for (int j = 0; j < hidden_neu; j++)//Renew each weight to out.
{
delta_weight_output = learning_rate*g*neural[j]*(1-mc) + mc*gl_out[j];
weight_out[j] += delta_weight_output;
gl_out[j] = delta_weight_output;//Remember this step's weight change for the momentum term.
}
weight_out[hidden_neu] -= learning_rate * g;//Renew fake neural to out.
The backward pass (hidden layer to input layer) becomes
for (int j = 0; j < hidden_neu; j++)
{
e = neural[j] * (1 - neural[j])*weight_out[j] * g;
for (int k = 0; k < num_attribute; k++)
{
delta_weight_hidden = learning_rate * e*attributes[i][k]*(1-mc) + mc*el_hidden[j][k];
weight_hidden[j][k] += delta_weight_hidden;
el_hidden[j][k] = delta_weight_hidden;//Remember this step's weight change for the momentum term.
}
weight_hidden[j][num_attribute] -= e * learning_rate;
}
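Note that both snippets still update the thresholds (the fake-neuron weights) without momentum. To make them follow the $\Delta b_{i}(k+1)$ formula stated earlier as well, a possible sketch (my own extension, reusing gl_out[hidden_neu] and el_hidden[j][num_attribute] to remember the previous threshold change) is:
//Output-layer threshold with momentum, replacing weight_out[hidden_neu] -= learning_rate * g;
double delta_theta_out = -learning_rate * g * (1 - mc) + mc * gl_out[hidden_neu];
weight_out[hidden_neu] += delta_theta_out;
gl_out[hidden_neu] = delta_theta_out;
//Hidden-layer threshold with momentum, replacing weight_hidden[j][num_attribute] -= e * learning_rate;
double delta_theta_hidden = -learning_rate * e * (1 - mc) + mc * el_hidden[j][num_attribute];
weight_hidden[j][num_attribute] += delta_theta_hidden;
el_hidden[j][num_attribute] = delta_theta_hidden;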
2. Improving the learning rate with exponential decay
The idea of exponential decay is to use a relatively large learning rate first, to reach a reasonably good solution quickly, and then a smaller one to refine it and let the model settle; this speeds up convergence while preventing the solution from oscillating. The learning rate is computed as
$$\text{learning rate} = \text{initial learning rate} \times \text{decay coefficient}^{\frac{\text{iteration count}}{\text{decay step}}}$$
iteration_count: number of iterations
decay_coefficient: decay coefficient
decay_step: decay step (how many iterations between decays)
Note that there are two decay modes. If the exponent $\frac{\text{iteration count}}{\text{decay step}}$ is computed as a double, it is a real number and the learning rate shrinks on every iteration; if it is computed as an int, the rate only decays once per decay step. The decay coefficient is also called the decay rate (decay_rate). The double variant is used here: the decay coefficient is a floating-point number between 0 and 1, while the iteration count and the decay step are integers.
The following changes are made on top of the original BP algorithm.
The function header becomes
void bp_net_training(double** const attributes, double* const identify, long long num_sample, int num_attribute,double init_learning_rate,double minimun_error,int hidden_neu,double decay_coefficient,double decay_step)
Initialize the learning rate in the data-structure section:
double learning_rate=init_learning_rate;
Inside the outer loop, right after the iteration counter is incremented (named count in the first listing, iteration_count here), update the learning rate on every iteration; a sketch of the integer staircase variant follows the snippet.
iteration_count++;
learning_rate=init_learning_rate*pow(decay_coefficient,iteration_count/decay_step);
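For comparison, the integer (staircase) variant mentioned earlier only differs in forcing integer division for the exponent, so the rate drops once every decay_step iterations instead of continuously (a sketch, assuming decay_step still arrives as a double and is simply truncated):
learning_rate = init_learning_rate * pow(decay_coefficient, iteration_count / (int)decay_step);//int/int division -> staircase decay.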
Because the implementation does not follow a C++ object-oriented design, it lacks real matrix and vector support. The next optimization step is to bring in the C++ linear-algebra library Eigen and, building on matrix objects, implement the L-M (Levenberg-Marquardt) algorithm and other refinements to reduce over-fitting.
## 3. Multi-Output Neural Network (Data-Structure-Level Optimization)
Below is a multi-output fully connected network written in C/C++. I used this network to predict a randomly generated data set and found that it performed well and generalized well; the next step is to optimize the algorithm further. The source code follows (a sketch for generating a compatible random data set appears after the listing).
#include<iostream>
#include<cstdlib>
#include<stdio.h>
#include<string.h>
#include<ctime>
#include<cmath>
#include<cctype>
#define HIDDEN 10
#define ATTRIBUTE 5
#define SAMPLE 30000
#define TESTNUM 142
#define OUT 2
#define EPI 80
using namespace std;
void one(double** data, long long m, int n, double*&max);
void look_forward(double** attributes, int which_sample, double* neural, int hidden_neu, double* produce_output, int num_identify, double** weight_hidden, int num_attribute, double** weight_out);
double rand_between_p1ton1();
void de_one(double** data, long long m, int n, double*&max);
bool read_csv(const char* file_name, long long m, int n, double** data_set, int MAX_UNIT);
bool read_csv(const char* file_name, long long m, double* data_set, int MAX_UNIT);
bool write_csv(const char* file_name, double** mat, int m, int n);
bool write_csv(const char* file_name, double* vec, int n);
void bp_net_test(double**attributes, int num_att, int num_sam, int num_out, double**net_hidden, double**net_out, int neu_num,double*&max);
void bp_net_training(double** const attributes, double** const identify, long long num_sample, int num_attribute, int num_identify, double learning_rate, double minimun_error, int hidden_neu);
int main(void)
{
//train
double** training_att_set = new double*[SAMPLE];
double*max1, *max2;
for (long long i = 0; i < SAMPLE; i++)
training_att_set[i] = new double[ATTRIBUTE];
double** training_out_set = new double*[SAMPLE];
for (long long i = 0; i<SAMPLE; i++)
training_out_set[i] = new double[OUT];
if ((read_csv("training_attri.csv", SAMPLE, ATTRIBUTE, training_att_set, EPI)) == false)
cout << "False!!!" << endl;
if ((read_csv("training_out.csv", SAMPLE, OUT, training_out_set, EPI)) == false)
cout << "False!!!" << endl;
//one(training_att_set, 4, 2, max1);
one(training_out_set, SAMPLE, OUT, max2);
bp_net_training(training_att_set, training_out_set, SAMPLE, ATTRIBUTE, OUT, 0.9, 0.008, HIDDEN);
for (long long i = 0; i < SAMPLE; i++)
delete[]training_att_set[i];
delete[]training_att_set;
for (long long i = 0; i < SAMPLE; i++)
delete[]training_out_set[i];
delete[]training_out_set;
//test
double** weight_hidden = new double*[HIDDEN];//Include theta
for (int i = 0; i < HIDDEN; i++)
weight_hidden[i] = new double[ATTRIBUTE + 1];
double** testing_att = new double*[TESTNUM];//Include theta
for (int i = 0; i < TESTNUM; i++)
testing_att[i] = new double[ATTRIBUTE];
double** weight_out = new double*[OUT];//Include theta2.
for (int i = 0; i<OUT; i++)
weight_out[i] = new double[HIDDEN+1];
read_csv("weight_to_hidden.csv", HIDDEN, ATTRIBUTE + 1, weight_hidden, EPI);
read_csv("weight_to_out.csv", OUT, HIDDEN + 1, weight_out, EPI);
if (read_csv("testing_attri.csv", SAMPLE, ATTRIBUTE, testing_att, EPI) == false)
cout << "Wrong!" << endl;
bp_net_test(testing_att, ATTRIBUTE, TESTNUM, OUT, weight_hidden, weight_out, HIDDEN,max2);
for (int i = 0; i < HIDDEN; i++)
delete[] weight_hidden[i];
delete[] weight_hidden;
for (int i = 0; i<OUT; i++)
delete[]weight_out[i];
delete[] weight_out;
for (int i = 0; i < TESTNUM; i++)
delete[] testing_att[i];
delete[]testing_att;
return 0;
}
//Input neu_num should exclude the fake neuron.
void bp_net_test(double**attributes, int num_att, int num_sam, int num_out, double**net_hidden, double**net_out, int neu_num,double*& max)
{
double* temp_mat = new double[neu_num];
double** output = new double*[num_sam];
for (int i = 0; i < num_sam; i++)
output[i] = new double[num_out];
for (int ns = 0; ns < num_sam; ns++)
{
look_forward(attributes, ns, temp_mat, neu_num, output[ns], num_out, net_hidden, num_att, net_out);
}
de_one(output, num_sam, num_out, max);
write_csv("test_output.csv", output, num_sam, num_out);
delete[]temp_mat;
for (int i = 0; i < num_sam; i++)
delete[] output[i];
delete[] output;
}
void one(double** data, long long m, int n, double*&max)
{
max = new double[n];
for (int i = 0; i < n; i++)
{
max[i] = data[0][i];
for (long long j = 0; j < m; j++)
{
if (data[j][i] > max[i])
max[i] = data[j][i];
}
}
for (long long i = 0; i < m; i++)
for (int j = 0; j < n; j++)
{
data[i][j] /= max[j];
}
}
void one(double* data, long long m, double*&max)
{
max = new double;
*max = data[0];
for (long long int i = 1; i < m; i++)
{
if (*max < data[i])
*max = data[i];
}
for (long long i = 0; i < m; i++)
{
data[i] /= *max;
}
}
//For Matrix
void de_one(double** data, long long m, int n, double*&max)
{
for (long long i = 0; i < m; i++)
{
for (int j = 0; j < n; j++)
{
data[i][j] *= max[j];
}
}
delete[]max;
}
//identify holds num_identify outputs per sample (multi-output version). num_sample is the number of samples, num_attribute the number of attributes.
//After every sample only the weights are kept; the hidden outputs are recomputed.
//After every iteration only the weights are kept; the error is reset.
//minimun_error is the target for the accumulated per-sample error.
void bp_net_training(double** const attributes, double** const identify, long long num_sample, int num_attribute, int num_identify, double learning_rate, double minimun_error, int hidden_neu)
{
//Define data structure.
double error = 10000.0;//Initial value; must exceed num_sample*minimun_error so the training loop is entered.
double** weight_hidden = new double*[hidden_neu];//Include theta
for (int i = 0; i < hidden_neu; i++)
weight_hidden[i] = new double[num_attribute + 1];
double** weight_out = new double*[num_identify];//Include theta2.
for (int i = 0; i<num_identify; i++)
weight_out[i] = new double[hidden_neu+1];
double* neural = new double[hidden_neu];//The outputs of hidden neural.
double* produce_output = new double[num_identify];
double e, delta_weight_hidden, delta_weight_output, temp_neural, temp_out, sum_temp;//Store temporary variable.
double* g = new double[num_identify];
//Random produce.
srand((unsigned)time(NULL));
for (int i = 0; i<hidden_neu; ++i)
for (int j = 0; j < num_attribute + 1; ++j)
weight_hidden[i][j] = rand_between_p1ton1();
for (int i = 0; i < num_identify; i++)
for (int j = 0; j<hidden_neu + 1; j++)
weight_out[i][j] = rand_between_p1ton1();
//Training
int iteration_count = 0;
while (error / num_sample > minimun_error)
{
iteration_count++;
error = 0.0;
for (int i = 0; i < num_sample; ++i)//Look every sample.
{
look_forward(attributes, i, neural, hidden_neu, produce_output, num_identify, weight_hidden, num_attribute, weight_out);
//Look backward-out
for (int k = 0; k < num_identify; k++)
{
g[k] = produce_output[k] * (1 - produce_output[k])*(identify[i][k] - produce_output[k]);
for (int j = 0; j < hidden_neu; j++)//Renew each weight to out.
{
weight_out[k][j] += learning_rate * g[k] * neural[j];
}
weight_out[k][hidden_neu] -= learning_rate * g[k];//Renew fake neural to out.
}
//Look backward-hidden
for (int j = 0; j < hidden_neu; j++)
{
sum_temp = 0;
for (int k = 0; k < num_identify; k++)
{
sum_temp += weight_out[k][j] * g[k];
}
e = neural[j] * (1 - neural[j])*sum_temp;
for (int k = 0; k < num_attribute; k++)
{
delta_weight_hidden = learning_rate * e*attributes[i][k];
weight_hidden[j][k] += delta_weight_hidden;
}
weight_hidden[j][num_attribute] -= e * learning_rate;
}
for (int k = 0; k<num_identify; k++)
error += (produce_output[k] - identify[i][k])*(produce_output[k] - identify[i][k]);
}
cout << "This is " << iteration_count << " iteration.The error is " << error;
cout << ".The accumulation error is " << error / num_sample << '.' << endl;
}
write_csv("weight_to_hidden.csv", weight_hidden, hidden_neu, num_attribute + 1);
write_csv("weight_to_out.csv", weight_out, num_identify, hidden_neu + 1);
for (int j = 0; j < hidden_neu; j++)
delete[] weight_hidden[j];
delete[] weight_hidden;
for (int i = 0; i < num_identify; i++)
delete[] weight_out[i];
delete[] weight_out;
delete[] neural;
delete[] produce_output;
delete[] g;
}
double sigmoid(double x)
{
return 1 / (1 + exp(-x));
}
double rand_between_p1ton1()//Exclude 1 and -1.
{
double rand_temp;
while ((rand_temp = (double)rand() / RAND_MAX) == 1 || rand_temp == 0);
return 2 * rand_temp - 1;
}
void look_forward(double** attributes, int which_sample, double* neural, int hidden_neu, double* produce_output, int num_identify, double** weight_hidden, int num_attribute, double** weight_out)
{
double temp_neural, temp_out;
//Look forward-hidden
for (int j = 0; j < hidden_neu; j++)
{
temp_neural = 0;
for (int k = 0; k < num_attribute; k++)
temp_neural += weight_hidden[j][k] * attributes[which_sample][k];
temp_neural -= weight_hidden[j][num_attribute];
neural[j] = sigmoid(temp_neural);
}
//Look forward-out
for (int k = 0; k<num_identify; k++)
{
temp_out = 0.0;
for (int j = 0; j < hidden_neu; j++)
temp_out += neural[j] * weight_out[k][j];
temp_out -= weight_out[k][hidden_neu];
produce_output[k] = sigmoid(temp_out);
}
}
//m-The number of samples. n-The number of attributes. MAX_UNIT-The maximum number of bytes per unit.
//For matrix.
bool read_csv(const char* file_name, long long m, int n, double** data_set, int MAX_UNIT)
{
FILE*fp = fopen(file_name, "r");
if (fp == NULL)
return false;
char* buf = new char[n *MAX_UNIT];//Every line data.
char* record, *line;
long long count_row = 0;
int count_col = 0;
double d;
while ((line = fgets(buf, n * MAX_UNIT, fp)) != NULL && count_row < m)
{
count_col = 0;
record = strtok(line, ",");
while (record != NULL && count_col<n)
{
d = atof(record);
data_set[count_row][count_col] = d;
record = strtok(NULL, ",");
count_col++;
}
count_row++;
}
delete[]buf;
fclose(fp);
return true;
}
//Overloading for the vector.
bool read_csv(const char* file_name, long long m, double* data_set, int MAX_UNIT)
{
FILE*fp = fopen(file_name, "r");
if (fp == NULL)
return false;
char* buf = new char[MAX_UNIT];
char* record, *line;
long long count_row = 0;
double d;
while ((line = fgets(buf, MAX_UNIT, fp)) != NULL && count_row < m)
{
record = strtok(line, ",");
d = atof(record);
data_set[count_row] = d;
record = strtok(NULL, ",");
count_row++;
}
delete[]buf;
fclose(fp);
return true;
}
//For matrix.
bool write_csv(const char* file_name, double** mat, int m, int n)
{
FILE*fp = fopen(file_name, "w+");
if (fp == NULL)
return false;
long long i = 0;
int j;
while (i < m)
{
j = 0;
while (j < n)
{
fprintf(fp, "%lf", mat[i][j]);
fputc(',', fp);
j++;
}
fputc('\n', fp);
i++;
}
fclose(fp);
return true;
}
//Overloading for the vector.
bool write_csv(const char* file_name, double* vec, int n)
{
FILE*fp = fopen(file_name, "w+");
if (fp == NULL)
return false;
long long i = 0;
while (i < n)
{
fprintf(fp, "%lf", vec[i]);
fputc('\n', fp);
i++;
}
fclose(fp);
return true;
}
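The listing above expects training_attri.csv, training_out.csv and testing_attri.csv to exist. Since the article only says the data set was randomly generated, here is one way such files could be produced for a quick smoke test (entirely my own example; the two target functions are arbitrary choices, not the ones used by the author):
//Generates SAMPLE rows of ATTRIBUTE random inputs in [0,1] and two outputs derived
//from them, in the comma-separated layout that read_csv expects.
#include<cstdio>
#include<cstdlib>
#include<ctime>
int main(void)
{
const int SAMPLE = 30000, ATTRIBUTE = 5;
srand((unsigned)time(NULL));
FILE* fa = fopen("training_attri.csv", "w");
FILE* fo = fopen("training_out.csv", "w");
if (fa == NULL || fo == NULL)
return 1;
for (int i = 0; i < SAMPLE; i++)
{
double x[ATTRIBUTE], s = 0.0;
for (int j = 0; j < ATTRIBUTE; j++)
{
x[j] = (double)rand() / RAND_MAX;
s += x[j];
fprintf(fa, "%lf,", x[j]);
}
fputc('\n', fa);
fprintf(fo, "%lf,%lf,\n", s / ATTRIBUTE, x[0] * x[1]);//Two arbitrary smooth targets (OUT = 2).
}
fclose(fa);
fclose(fo);
return 0;
}
A testing_attri.csv with TESTNUM (142) rows can be produced the same way from the attribute part alone.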
References
[1] 王雅楠, 孟晓景. 基于动量BP算法的神经网络房价预测研究[J]. 软件导刊, 2015, 14(02): 59-61.
[2] 蔡荣辉, 崔雨轩, 薛培静. 三层BP神经网络隐层节点数确定方法探究[A]. 电脑与信息技术, 2017, 10.
[3] Zhang Zipeng, Xue Liqin, Wang Shuqing. A fuzzy neural network controller for air-condition with frequency change[C]. Proceedings of the Fifth International Conference on Fuzzy Systems and Knowledge Discovery. Computer Society, 2008: 47-51.
[4] 王向宇, 须文波, 孙俊, 赵琪. 利用VLRBP神经网络改善汇率预测[J]. 计算机工程与应用, 2010, 46(06): 208-213+244.