Bayesian optimization GitHub repository:
https://github.com/fmfn/BayesianOptimization
Paper:
http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. “Practical bayesian optimization of machine learning algorithms.” Advances in neural information processing systems 25 (2012).
Using a random forest as the example:
1. Build the data source
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
import pandas as pd
Then construct a binary classification task:
x, y = make_classification(n_samples=1000, n_features=5, n_classes=2)
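A quick look at the generated data confirms its shape and class balance; this check is my own addition (it also uses the numpy import above):
print(x.shape)         # expected: (1000, 5)
print(np.bincount(y))  # roughly balanced counts for the two classes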
2. Build the black-box objective function
def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    val = cross_val_score(
        # Random forest hyperparameters; the continuous suggestions are cast to int where needed
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),  # float
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring='f1', cv=5  # cross_val_score accepts a single scoring metric
    ).mean()
    return val
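Because the objective simply returns one score to maximize, it can be sanity-checked by calling it directly before handing it to the optimizer. A minimal check with hand-picked values (the values are illustrative, not from the source):
# Call the objective once with arbitrary hyperparameters to confirm it runs and returns a float
print(rf_cv(n_estimators=100, min_samples_split=2, max_features=0.5, max_depth=10))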
3. Define the search space
pbounds = {'n_estimators': (10, 250),   # value range: 10 to 250
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}
The keys of this dictionary must match the parameter names of the objective function; see the small check sketched below.
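One way to verify that requirement programmatically is to compare the keys against the objective's signature; a small sketch using the standard library (my addition, not part of the original code):
import inspect

# The pbounds keys must be exactly the argument names of rf_cv
assert set(pbounds) == set(inspect.signature(rf_cv).parameters)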
4. Construct the Bayesian optimizer
optimizer = BayesianOptimization(
    f=rf_cv,          # black-box objective function
    pbounds=pbounds,  # search space
    verbose=2,        # verbose=2 prints every step, verbose=1 prints only new maxima, verbose=0 prints nothing
    random_state=1,
)
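If a configuration is already known to work well, bayes_opt lets you queue it with optimizer.probe so it is evaluated before the suggested points; the probed values below are illustrative only, not from the source:
# Optionally queue a known configuration; with lazy=True it is evaluated at the next maximize() call
optimizer.probe(
    params={'n_estimators': 100, 'min_samples_split': 2,
            'max_features': 0.5, 'max_depth': 10},
    lazy=True,
)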
5. Run, then export the results and the best parameters
optimizer.maximize(   # run the optimization
    init_points=5,    # number of initial random exploration steps
    n_iter=25,        # number of Bayesian optimization iterations
)
print(optimizer.res)  # all evaluated results
print(optimizer.max)  # best result and its parameters
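Since optimizer.res is a list of {'target': ..., 'params': {...}} dictionaries, it can be flattened into a table for easier inspection; a small sketch using the pandas import from above (the layout is my own choice, not from the source):
# One row per evaluation: the sampled hyperparameters plus the resulting score
history = pd.DataFrame([{**r['params'], 'target': r['target']} for r in optimizer.res])
print(history.sort_values('target', ascending=False).head())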
Full code
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
import pandas as pd
# Generate a random classification dataset: 5 features, 2 classes
x, y = make_classification(n_samples=1000, n_features=5, n_classes=2)
# Step 1: build the black-box objective function
def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    val = cross_val_score(
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),  # float
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring='f1', cv=5
    ).mean()
    return val
# Step 2: define the search space
pbounds = {'n_estimators': (10, 250),   # value range: 10 to 250
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}
# Step 3: construct the Bayesian optimizer
optimizer = BayesianOptimization(
    f=rf_cv,          # black-box objective function
    pbounds=pbounds,  # search space
    verbose=2,        # verbose=2 prints every step, verbose=1 prints only new maxima, verbose=0 prints nothing
    random_state=1,
)
optimizer.maximize(   # run the optimization
    init_points=5,    # number of initial random exploration steps
    n_iter=25,        # number of Bayesian optimization iterations
)
print(optimizer.res)  # print all evaluated results
print(optimizer.max)  # best result and its parameters
The output looks like this:
iter | target | max_depth | max_features | min_samples_split | n_estimators |
---|---|---|---|---|---|
1 | 0.9521 | 9.17 | 0.7476 | 2.003 | 82.56 |
2 | 0.9475 | 6.468 | 0.183 | 6.284 | 92.93 |
3 | 0.9502 | 8.968 | 0.5844 | 11.64 | 174.5 |
4 | 0.952 | 7.045 | 0.8894 | 2.63 | 170.9 |
5 | 0.9521 | 9.173 | 0.6023 | 5.229 | 57.54 |
6 | 0.9522 | 8.304 | 0.6073 | 5.086 | 57.32 |
7 | 0.9511 | 10.59 | 0.7231 | 2.47 | 74.19 |
8 | 0.9466 | 6.611 | 0.2431 | 3.667 | 49.53 |
9 | 0.9492 | 6.182 | 0.8803 | 4.411 | 62.05 |
10 | 0.9514 | 7.735 | 0.1164 | 4.576 | 79.58 |
11 | 0.9531 | 12.72 | 0.4108 | 4.389 | 81.27 |
12 | 0.9513 | 14.28 | 0.7338 | 3.12 | 84.51 |
13 | 0.9501 | 14.8 | 0.8398 | 6.767 | 77.78 |
14 | 0.9512 | 12.65 | 0.2956 | 2.376 | 79.54 |
15 | 0.9523 | 12.04 | 0.1053 | 6.513 | 82.47 |
16 | 0.9501 | 11.79 | 0.6655 | 2.21 | 168.6 |
17 | 0.9533 | 8.374 | 0.422 | 9.813 | 56.87 |
18 | 0.9523 | 11.81 | 0.8737 | 11.05 | 56.84 |
19 | 0.9523 | 8.27 | 0.6367 | 13.32 | 57.61 |
20 | 0.9514 | 8.126 | 0.4081 | 11.01 | 53.97 |
21 | 0.9495 | 9.323 | 0.1 | 10.2 | 60.14 |
22 | 0.9546 | 8.76 | 0.1512 | 7.381 | 55.76 |
23 | 0.9505 | 10.76 | 0.1433 | 7.155 | 55.15 |
24 | 0.9555 | 7.206 | 0.4456 | 6.973 | 55.74 |
25 | 0.9543 | 5.359 | 0.9809 | 7.835 | 55.49 |
26 | 0.9554 | 7.083 | 0.4153 | 8.075 | 55.05 |
27 | 0.9554 | 6.963 | 0.5163 | 8.687 | 56.26 |
28 | 0.9543 | 14.52 | 0.7094 | 16.4 | 56.91 |
29 | 0.9515 | 12.07 | 0.7272 | 19.06 | 56.5 |
30 | 0.9512 | 14.3 | 0.524 | 14.43 | 59.32 |
The best parameters are:
{'target': 0.9554574460534715,
 'params': {'max_depth': 7.2061957920136965,
            'max_features': 0.44564993926538743,
            'min_samples_split': 6.972807143834928,
            'n_estimators': 55.73671041246315}}
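Note that the optimizer reports every hyperparameter as a float, so the integer-valued ones need to be cast back before training a final model. A minimal sketch of reusing the reported optimum (the refit step is my addition, not part of the original code):
# Refit a final random forest with the best parameters found by the optimizer
best = optimizer.max['params']
final_rf = RandomForestClassifier(n_estimators=int(best['n_estimators']),
                                  min_samples_split=int(best['min_samples_split']),
                                  max_features=min(best['max_features'], 0.999),
                                  max_depth=int(best['max_depth']),
                                  random_state=2)
final_rf.fit(x, y)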