Pitfalls I ran into when pruning and compressing neural networks with Microsoft NNI
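For my graduation project I'm deploying a neural network on an embedded device, and I wanted to prune and compress the network to speed it up. NNI left me completely confused at first... This is my first post on CSDN, and it doubles as my own study notes~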
Categories of pruning in NNI
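At first glance NNI really does support a lot of the pruning methods from papers; it has reimplemented basically all of them. But some methods throw an error as soon as you call them, and for others the usage is documented but it's unclear what the parameters mean...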
The NNI pruning workflow
Using LevelPruner as an example:
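import os
import torch
from nni.algorithms.compression.pytorch.pruning import LevelPruner
from nni.compression.pytorch import ModelSpeedup
# model, args, device, dummy_input and evaluator come from your own training script
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
# 'op_types' lists the layer types to prune; 'default' uses NNI's default set of weighted layers
pruner = LevelPruner(model, config_list)
pruner.compress()
model_path = os.path.join(args.experiment_data_dir, 'model_masked.pth')
masks_file = os.path.join(args.experiment_data_dir, 'mask.pth')
pruner.export_model(model_path, masks_file)  # save the masked weights and the pruning masks
# run speedup on a plain (un-wrapped) model rather than the pruner-wrapped one
model = build_model()  # hypothetical: construct a fresh copy of your network
model.load_state_dict(torch.load(model_path))
# dummy_input must look like one real input batch, e.g. torch.randn(1, 3, 224, 224).to(device)
m_speedup = ModelSpeedup(model, dummy_input, masks_file, device)
m_speedup.speedup_model()  # speedup is required for an actual speed gain
evaluation_result = evaluator(model)  # evaluate the sped-up model
torch.save(model.state_dict(), os.path.join(args.experiment_data_dir, 'model_speed_up.pth'))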
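This saves the final model. Note that after speedup the layers really are smaller, so this state_dict can only be loaded into a network built with the same reduced layer shapes.
Of all these steps, speedup is the most important: compress() and export_model() only apply masks (pruned weights are set to zero but are still stored and still computed), and only speedup replaces the layers with genuinely smaller ones, which is where the acceleration comes from. This works for structured pruning, where whole filters or channels are removed; fine-grained pruners such as LevelPruner zero out individual weights, so speedup generally cannot shrink any layer.
The catch is that speedup takes a really long time, because it has to trace the model and propagate the masks through it to work out the new parameter shapes.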
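To see what speedup has to work out, here is a toy sketch in plain Python (no NNI or PyTorch). It assumes a hypothetical description of a plain sequential chain of conv layers as (name, in_channels, out_channels) tuples plus per-filter 0/1 masks: removing output filters from one conv also removes the matching input channels of the next conv. NNI's real ModelSpeedup has to do this over an arbitrary traced graph (BN layers, residual additions, concatenations), which is far more work than this.

```python
def propagate_filter_masks(layers, filter_masks):
    """Compute the shrunken shapes of a plain sequential conv chain.

    layers: list of (name, in_channels, out_channels), in forward order.
    filter_masks: dict name -> list of 0/1 flags, one flag per output filter;
                  layers without an entry keep all their filters.
    Returns a list of (name, new_in_channels, new_out_channels).
    """
    new_layers = []
    kept_channels = None  # channels produced by the previous (shrunk) layer
    for name, in_ch, out_ch in layers:
        mask = filter_masks.get(name, [1] * out_ch)
        if len(mask) != out_ch:
            raise ValueError(f"mask for {name} has {len(mask)} entries, expected {out_ch}")
        new_in = in_ch if kept_channels is None else kept_channels
        new_out = sum(mask)  # only filters whose mask is 1 survive
        new_layers.append((name, new_in, new_out))
        kept_channels = new_out
    return new_layers


layers = [("conv1", 3, 8), ("conv2", 8, 16), ("conv3", 16, 10)]
masks = {"conv1": [1, 0] * 4, "conv2": [1] * 8 + [0] * 8}
for name, c_in, c_out in propagate_filter_masks(layers, masks):
    print(name, c_in, c_out)
# conv1 3 4
# conv2 4 8
# conv3 8 10
```

conv3 keeps all 10 filters but still gets smaller, because its inputs come from the pruned conv2.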
Pruning methods currently available in NNI
First, the simple one-shot pruners:
1.Level Pruner
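The most basic one-shot pruner: you set a target sparsity as a fraction (0.6 means 60% of the weights are pruned). It first sorts the weights of the specified layers by absolute value, then masks the smallest weights to 0 until the requested sparsity is reached.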
from nni.algorithms.compression.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
# 'op_types' lists the layer types to prune; 'default' uses NNI's default set of weighted layers
pruner = LevelPruner(model, config_list)
pruner.compress()
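A minimal sketch of what the Level criterion boils down to for one layer, written in plain Python on a flattened list of hypothetical weights (my own illustration, not NNI's code; NNI works on the layer's weight tensor and keeps a mask tensor of the same shape):

```python
def level_prune_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-|w| weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(len(weights) * sparsity))  # number of weights to prune
    # indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:k]:
        mask[i] = 0
    return mask


weights = [0.5, -0.1, 0.03, -0.9, 0.2]
mask = level_prune_mask(weights, 0.6)
print(mask)  # [1, 0, 0, 1, 0]
pruned_weights = [w * m for w, m in zip(weights, mask)]
```

The pruned layer still has five weights; only the mask says three of them are zero, which is why this kind of sparsity does not make the dense computation faster by itself.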
2.Slim Pruner
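A one-shot pruner that applies sparsity regularization to the scaling factors of the batch normalization (BN) layers during training to identify unimportant channels. Channels with small scaling factors are pruned.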
from nni.algorithms.compression.pytorch.pruning import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list)
pruner.compress()
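Plain-Python sketch of the two ingredients (illustrative, with hypothetical γ values; not NNI's code): the L1 penalty λ·Σ|γ| that is added to the training loss to push unimportant BN scale factors towards zero, and the channel selection, which in the Network Slimming paper uses one threshold shared by all BN layers, so some layers lose more channels than others. In this sketch, channels tied at the threshold are all pruned.

```python
def slim_penalty(bn_gammas, lam):
    """L1 sparsity penalty on all BN scaling factors; add it to the task loss."""
    return lam * sum(abs(g) for gammas in bn_gammas.values() for g in gammas)


def slim_channel_masks(bn_gammas, sparsity):
    """Prune the `sparsity` fraction of channels with the smallest |gamma|,
    using ONE threshold shared by all BN layers (global ranking)."""
    all_abs = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    k = int(len(all_abs) * sparsity)  # number of channels to prune
    if k == 0:
        return {name: [1] * len(gammas) for name, gammas in bn_gammas.items()}
    threshold = all_abs[k - 1]
    return {name: [0 if abs(g) <= threshold else 1 for g in gammas]
            for name, gammas in bn_gammas.items()}


gammas = {"bn1": [0.9, 0.01, 0.5, 0.02], "bn2": [0.3, 0.005, 0.7, 0.4]}
print(slim_channel_masks(gammas, 0.5))  # {'bn1': [1, 0, 1, 0], 'bn2': [0, 0, 1, 1]}
print(slim_penalty(gammas, 1e-4))
```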
3.FPGM Pruner
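A one-shot pruner that prunes the convolution filters closest to the geometric median of all filters in the layer (FPGM: Filter Pruning via Geometric Median). These are the most replaceable filters, since the remaining filters can represent them well.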
from nni.algorithms.compression.pytorch.pruning import FPGMPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']  # prune all convolution layers
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()
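A plain-Python sketch of the FPGM criterion (illustrative; the filters are hypothetical flattened weight lists, and math.dist needs Python 3.8+): for each filter, sum its Euclidean distances to all other filters in the same layer. Filters with the smallest sum sit closest to the geometric median and are pruned first. This is not a norm criterion: in the example the all-zero filter, which an L1/L2 norm criterion would remove first, is kept, and the filter in the middle of the cluster is removed.

```python
import math


def fpgm_scores(filters):
    """Sum of Euclidean distances from each filter to every filter in the layer."""
    return [sum(math.dist(f, g) for g in filters) for f in filters]


def fpgm_prune(filters, sparsity):
    """0/1 mask that removes the `sparsity` fraction of filters closest to the geometric median."""
    scores = fpgm_scores(filters)
    k = int(len(filters) * sparsity)  # number of filters to remove
    order = sorted(range(len(filters)), key=lambda i: scores[i])
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(filters))]


filters = [[0, 0], [3, 3], [3, 4], [4, 3]]
print(fpgm_prune(filters, 0.25))  # [1, 0, 1, 1]
```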
4.L1Filter Pruner
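A one-shot pruner that prunes filters in convolution layers: it ranks the filters of each layer by the L1 norm of their weights and removes the filters with the smallest norms.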
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
pruner.compress()
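A minimal plain-Python sketch of that ranking for one layer (illustrative only; each hypothetical filter is flattened to a list of its in_channels × k × k weights):

```python
def l1_filter_mask(filters, sparsity):
    """Rank filters by the L1 norm of their weights and mask the smallest ones."""
    l1 = [sum(abs(w) for w in f) for f in filters]
    k = int(len(filters) * sparsity)  # number of filters to remove
    order = sorted(range(len(filters)), key=lambda i: l1[i])
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(filters))]


filters = [
    [0.2, -0.1, 0.05, 0.0],    # L1 = 0.35
    [1.0, -0.8, 0.3, 0.4],     # L1 = 2.5
    [-0.05, 0.02, 0.01, 0.0],  # L1 = 0.08
    [0.6, 0.6, -0.6, 0.6],     # L1 = 2.4
]
print(l1_filter_mask(filters, 0.5))  # [0, 1, 0, 1]
```

Because whole filters (output channels) are removed, this is structured pruning, which is what lets speedup actually shrink the layers.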
5.L2Filter Pruner
A structured pruning algorithm that prunes the convolution filters whose weights have the smallest L2 norm; it is also a one-shot pruner.
from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
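L1 and L2 usually rank filters similarly, but not always. A tiny plain-Python example with two hypothetical filters: the L1 norm favours (keeps) the filter with many medium weights, while the L2 norm, which squares each weight, favours the one with a single large weight, so the two pruners would remove different filters.

```python
import math


def l1_norm(filter_weights):
    return sum(abs(w) for w in filter_weights)


def l2_norm(filter_weights):
    return math.sqrt(sum(w * w for w in filter_weights))


# Two hypothetical filters with the same number of weights
filters = {
    "spiky": [1.0, 0.0, 0.0, 0.0],  # one large weight:     L1 = 1.0, L2 = 1.0
    "flat": [0.4, 0.4, 0.4, 0.4],   # many medium weights: L1 = 1.6, L2 = 0.8
}
# If only one of the two may be removed, each criterion drops the one with the smaller norm
pruned_by_l1 = min(filters, key=lambda name: l1_norm(filters[name]))
pruned_by_l2 = min(filters, key=lambda name: l2_norm(filters[name]))
print(pruned_by_l1, pruned_by_l2)  # spiky flat
```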
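There are a few other, less commonly used one-shot pruners that I won't list here for now.
Next come the composite pruners, which are also where most of the pitfalls are: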
6.AGP Pruner
Automated gradual pruning: over n pruning steps, the sparsity increases from the initial sparsity (usually 0) to the final sparsity.
from nni.algorithms.compression.pytorch.pruning import AGPPruner
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 0,
'end_epoch': 10,
'frequency': 1,
'op_types': ['default']
}]
# Load a pretrained model, or train it before using the pruner:
# model = MyModel()
# model.load_state_dict(torch.load('mycheckpoint.pth'))
# AGP Pruner hooks into optimizer.step() and prunes while the model is being fine-tuned,
# so an optimizer is required for pruning.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
pruner = AGPPruner(model, config_list, optimizer, pruning_algorithm='level')
pruner.compress()
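The AGP paper (Zhu & Gupta, 2017) uses a cubic schedule for the ramp from initial_sparsity to final_sparsity. Below is a plain-Python sketch of it; mapping the start of the ramp to start_epoch, its end to end_epoch and updating only every `frequency` epochs is my assumption, and NNI's exact boundary handling may differ. For this config format, the NNI docs of that era (as far as I can tell) also call `pruner.update_epoch(epoch)` at the end of every training epoch so the schedule actually advances; check the docs for your version.

```python
def agp_sparsity(epoch, initial_sparsity, final_sparsity, start_epoch, end_epoch, frequency=1):
    """Cubic AGP schedule:
    s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (t_end - t0)) ** 3
    where t only advances in steps of `frequency` epochs."""
    if epoch <= start_epoch:
        return initial_sparsity
    if epoch >= end_epoch:
        return final_sparsity
    # snap to the most recent pruning step
    step_epoch = start_epoch + ((epoch - start_epoch) // frequency) * frequency
    progress = (step_epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3


# Same numbers as the config_list above: 0 -> 0.8 between epoch 0 and epoch 10
for epoch in (0, 5, 10):
    print(epoch, round(agp_sparsity(epoch, 0.0, 0.8, 0, 10, 1), 4))
# 0 0.0
# 5 0.7
# 10 0.8
```

Most of the pruning happens early (already 0.7 at the halfway point), when the network still has plenty of redundant weights, and the later steps are small so fine-tuning can recover.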
AGP pruning threw errors for me quite often. One error I hit was
'0 cannot be converted to float', so I changed the parameters to:
config_list = [{
'initial_sparsity': 0.01,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
'op_types': ['default']
}]
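That's how I got around it, but honestly it feels like black magic...
Also, AGP Pruner uses the LevelPruner algorithm to prune weights by default; you can set the pruning_algorithm parameter to use other pruning algorithms: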
level: LevelPruner
slim: SlimPruner
l1: L1FilterPruner
l2: L2FilterPruner
fpgm: FPGMPruner
taylorfo: TaylorFOWeightFilterPruner
apoz: ActivationAPoZRankFilterPruner
mean_activation: ActivationMeanRankFilterPruner
7.NetAdapt Pruner
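When enough compute is available, NetAdapt automatically simplifies a pretrained network. Given an overall sparsity, NetAdapt uses iterative pruning to automatically produce different sparsities for different layers.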
from nni.algorithms.compression.pytorch.pruning import NetAdaptPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator,base_algo='l1', experiment_data_dir='./')
pruner.compress()
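Here is the big pitfall: the official GitHub docs don't explain what the short_term_fine_tuner and evaluator parameters are. base_algo selects the base pruning algorithm used in each step ('l1' means L1FilterPruner), and experiment_data_dir is where the model is saved. After digging through the examples for a long time, I found how short_term_fine_tuner and evaluator are defined: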
def evaluator(model):
    return test(model, device, criterion, val_loader)

def test(model, device, criterion, val_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += criterion(output, target).item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(val_loader.dataset)
    accuracy = correct / len(val_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(val_loader.dataset), 100. * accuracy))
    return accuracy  # the pruner uses this value as the score
On closer inspection, evaluator simply takes the model as its only input and returns a single accuracy value.
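def short_term_fine_tuner(model, epochs=1):
    # train(), args, device, train_loader, criterion and optimizer come from the surrounding example script
    for epoch in range(epochs):
        train(args, model, device, train_loader, criterion, optimizer, epoch)
short_term_fine_tuner fine-tunes the weights for `epochs` epochs (one by default), i.e. a short round of training after each pruning step.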
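To make the roles of short_term_fine_tuner and evaluator clearer, here is a greedy NetAdapt-style loop in plain Python. It is a simplification under my own assumptions, not NNI's implementation: layer_sizes, step and the toy evaluate function are hypothetical, and evaluate stands in for "prune with base_algo, run short_term_fine_tuner, then call evaluator". In each iteration it tries taking the next slice of the overall sparsity budget from each layer and keeps the choice that scores best.

```python
def netadapt_sketch(layer_sizes, target_sparsity, step, evaluate):
    """Greedy NetAdapt-style allocation of per-layer sparsities (illustrative only).

    layer_sizes: dict layer -> number of prunable weights
    target_sparsity: overall fraction of weights to remove
    step: overall fraction removed per iteration
    evaluate: callable(dict layer -> sparsity) -> score (higher is better)
    """
    total = sum(layer_sizes.values())
    goal = target_sparsity * total  # weights to remove in the end
    sparsities = {name: 0.0 for name in layer_sizes}
    removed = 0.0
    while removed + 1e-9 < goal:
        budget = min(step * total, goal - removed)  # weights to remove this iteration
        best_name, best_score = None, float("-inf")
        for name, size in layer_sizes.items():
            trial = dict(sparsities)
            trial[name] += budget / size
            if trial[name] >= 1.0:
                continue  # this layer cannot absorb the whole cut
            score = evaluate(trial)  # real NetAdapt: prune + short-term fine-tune + evaluate
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            raise RuntimeError("no single layer can absorb this pruning step")
        sparsities[best_name] += budget / layer_sizes[best_name]
        removed += budget
    return sparsities


sizes = {"conv1": 400, "conv2": 600}
sensitivity = {"conv1": 2.0, "conv2": 1.0}  # hypothetical: conv1 hurts accuracy more when pruned


def toy_evaluate(sparsities):
    return 1.0 - sum(sensitivity[n] * s ** 2 for n, s in sparsities.items())


result = netadapt_sketch(sizes, target_sparsity=0.5, step=0.1, evaluate=toy_evaluate)
print(result)
```

The overall sparsity ends up at the target, but the less sensitive layer takes most of the cut, which is exactly the "different sparsities for different layers" behaviour described above.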
8.SimulatedAnnealing Pruner
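Simulated annealing pruning: guided by prior experience, this pruner implements a heuristic search method, the simulated annealing (SA) algorithm. The enhanced SA algorithm is based on the observation that DNN layers with more weights usually have higher compressibility and affect overall accuracy less.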
from nni.algorithms.compression.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9, experiment_data_dir='./')
pruner.compress()
evaluator is the same as above.
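The search itself can be sketched like this in plain Python (illustrative only; the parameter names mirror the NNI call above, but the move rule, the size-weighted constraint and the toy evaluate function are my own assumptions, not NNI's exact algorithm). Better candidates are always accepted, worse ones with probability exp(Δscore / T), and the temperature T is multiplied by cool_down_rate after every step, so the search gets greedier over time.

```python
import math
import random


def anneal_sparsities(layer_sizes, target, evaluate, start_temp=1.0, stop_temp=0.01,
                      cool_down_rate=0.9, perturbation=0.1, seed=0):
    """Toy simulated-annealing search over per-layer sparsities.

    layer_sizes: dict layer -> number of weights (at least two layers).
    Every move prunes some weights more in one layer and the same number less
    in another, so the total pruned stays at target * total_weights.
    """
    rng = random.Random(seed)
    names = list(layer_sizes)
    current = {n: target for n in names}  # start from a uniform allocation
    current_score = evaluate(current)
    best, best_score = dict(current), current_score
    temp = start_temp
    while temp > stop_temp:
        a, b = rng.sample(names, 2)
        amount = rng.uniform(0, perturbation) * min(layer_sizes[a], layer_sizes[b])
        cand = dict(current)
        cand[a] += amount / layer_sizes[a]  # prune `amount` more weights in layer a
        cand[b] -= amount / layer_sizes[b]  # ... and `amount` fewer in layer b
        if cand[a] < 1.0 and cand[b] >= 0.0:
            score = evaluate(cand)
            # always accept improvements; accept worse moves with probability exp(diff / temp)
            if score >= current_score or rng.random() < math.exp((score - current_score) / temp):
                current, current_score = cand, score
                if score > best_score:
                    best, best_score = dict(cand), score
        temp *= cool_down_rate  # geometric cooling
    return best, best_score


sizes = {"small_conv": 1_000, "big_conv": 50_000}


def toy_evaluate(s):  # hypothetical: pruning a big layer costs less accuracy
    return -sum(s[n] ** 2 * 1_000 / sizes[n] for n in s)


best, score = anneal_sparsities(sizes, target=0.5, evaluate=toy_evaluate)
print(best, score)
```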
9.AutoCompress Pruner
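In each round, AutoCompressPruner prunes the model with the same sparsity, so that the overall target sparsity is reached after all rounds.
Each round uses simulated annealing (as in SimulatedAnnealingPruner) to search the per-layer sparsity distribution and then performs ADMM-based pruning, which is what the admm_num_iterations and admm_training_epochs parameters control.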
from nni.algorithms.compression.pytorch.pruning import AutoCompressPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = AutoCompressPruner(
model, config_list, trainer=trainer, evaluator=evaluator,
dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
pruner.compress()
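"Same sparsity in every round" only adds up to the overall target if each round prunes a fraction r of the weights that are still left, so (1 − r)^n = 1 − s. A quick check of that arithmetic for the config above (sparsity 0.5, num_iterations=3); this is my reading of the description, not code taken from NNI:

```python
def per_round_sparsity(total_sparsity, num_iterations):
    """Per-round sparsity r such that (1 - r) ** n == 1 - total_sparsity."""
    return 1 - (1 - total_sparsity) ** (1 / num_iterations)


r = per_round_sparsity(0.5, 3)
remaining = 1.0
for _ in range(3):
    remaining *= 1 - r  # each round prunes r of the weights still left
print(round(r, 4), round(remaining, 6))  # 0.2063 0.5
```

So each round removes about 20.6%, not 0.5 / 3 ≈ 16.7%, because later rounds act on an already smaller model.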
10.AMC Pruner
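AMC Pruner uses reinforcement learning to provide the model compression policy. This learning-based policy achieves a higher compression ratio than traditional rule-based policies, preserves accuracy better, and saves manual effort.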
from nni.algorithms.compression.pytorch.pruning import AMCPruner
config_list = [{
'op_types': ['Conv2d', 'Linear']
}]
pruner = AMCPruner(model, config_list, evaluator, val_loader, flops_ratio=0.5)
pruner.compress()
val_loader is the validation set, loaded with a PyTorch DataLoader.
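As I understand NNI's docs, flops_ratio is the fraction of FLOPs AMC should keep. The sketch below (plain Python; the layer sizes are hypothetical, and it assumes every conv has the same fixed output resolution, groups=1 and no bias) shows why channel counts map to FLOPs super-linearly: keeping half the filters of two consecutive convs keeps only one third of the multiply-accumulates, because the second conv loses both input and output channels.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_h, out_w):
    """Multiply-accumulates of a standard Conv2d (groups=1, bias ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size * out_h * out_w


def chain_macs(channels, kernel_size, out_h, out_w):
    """MACs of a conv chain; channels [c0, c1, c2] means convs c0->c1 and c1->c2."""
    return sum(conv2d_macs(c_in, c_out, kernel_size, out_h, out_w)
               for c_in, c_out in zip(channels, channels[1:]))


original = chain_macs([64, 128, 128], 3, 56, 56)
pruned = chain_macs([64, 64, 64], 3, 56, 56)  # keep half the filters of both convs
print(original, pruned, pruned / original)  # 693633024 231211008 0.3333333333333333
```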
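The remaining pruners are more complicated and I haven't fully figured them out yet, but they should work in much the same way; I'll keep digging and update this post when I get the chance.
Finally, here is my test code: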
import nni
import torch
import torch.nn as nn
from nni.compression.pytorch import ModelSpeedup
from model_input import Model_input
import numpy as np
import data_loading
from prefetcher import data_prefetcher
from torch.utils.data import DataLoader
from PIL import Image
from torchvision import transforms

if __name__ == '__main__':
    model_name = 'shufflenet'
    pruning_class = 'SimulatedAnnealing'
    model = Model_input(model_name).model_final
    if model_name != 'shufflenet_pruned':
        dict_save_path = 'model/' + model_name + '_20210311.pkl'
        model.load_state_dict(torch.load(dict_save_path, map_location="cuda:0"))
    else:
        dict_save_path = 'model/' + model_name + '_' + pruning_class + ".pth"
        model.load_state_dict(torch.load(dict_save_path, map_location="cuda:0"))
    model_final = model

    def evaluator(model):
        loss_fn = nn.CrossEntropyLoss()
        test_dataset = data_loading.Dataset_loading('D:/Dataset_all/weld_Dataset_unlabel/al5083/test/test.json')
        dataloader = DataLoader(test_dataset, shuffle=True, batch_size=32, num_workers=1, pin_memory=True)
        model.cuda()
        model.eval()
        all_data_num = 0
        correct_data_num = 0
        loss_all = []
        corr_num_all = []
        loss = 0
        losses = 0
        prefetcher = data_prefetcher(dataloader)
        with torch.no_grad():
            images, labels = prefetcher.next()
            steps = 0
            while images is not None:
                steps += 1
                images, labels = images.cuda(), labels.cuda()
                output = model(images)
                loss = loss_fn(output, labels)
                losses += loss
                loss_all.append(loss)
                pred_labels = output.argmax(dim=1)
                all_data_num += labels.size(0)
                correct_data_num += (pred_labels == labels).sum().item()
                corr_num_all.append(correct_data_num)
                images, labels = prefetcher.next()
        acc = (correct_data_num / all_data_num)
        # print('Evaluation: test_loss:', np.array(losses.cpu()), 'test_acc:{:.2f}'.format(acc), '%')
        # loss_record_path = 'train_record/'+str(time.ctime())+'-'+str(epoch)+'-'+'loss'+'.txt'
        # acc_record_path = 'train_record/' + str(time.ctime()) + '-' + str(epoch) + '-'+'acc'+'.txt'
        return acc

    def pruner_load(pruning_class, model):
        # Looked up as module-level globals: short_term_fine_tuner, trainer, dummy_input,
        # val_loader and fine_tuner must be defined before using the pruners that need them
        global short_term_fine_tuner, evaluator, trainer, dummy_input, val_loader, fine_tuner
        if pruning_class == 'level':
            from nni.algorithms.compression.pytorch.pruning import LevelPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
            pruner = LevelPruner(model, config_list)
        elif pruning_class == 'slim':
            from nni.algorithms.compression.pytorch.pruning import SlimPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['BatchNorm2d']}]
            pruner = SlimPruner(model, config_list)
        elif pruning_class == 'FPGM':
            from nni.algorithms.compression.pytorch.pruning import FPGMPruner
            config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
            pruner = FPGMPruner(model, config_list)
        elif pruning_class == 'L1':
            from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
            pruner = L1FilterPruner(model, config_list)
        elif pruning_class == 'L2':
            from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
            pruner = L2FilterPruner(model, config_list)
        elif pruning_class == 'AGP':
            from nni.algorithms.compression.pytorch.pruning import AGPPruner
            config_list = [{
                'initial_sparsity': 0,
                'final_sparsity': 0.8,
                'start_epoch': 0,
                'end_epoch': 10,
                'frequency': 1,
                'op_types': ['default']
            }]
            optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
            pruner = AGPPruner(model, config_list, optimizer, pruning_algorithm='level')
        elif pruning_class == 'NetAdapt':
            from nni.algorithms.compression.pytorch.pruning import NetAdaptPruner
            config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
            pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator,
                                    base_algo='l1', experiment_data_dir='./')
        elif pruning_class == 'SimulatedAnnealing':
            from nni.algorithms.compression.pytorch.pruning import SimulatedAnnealingPruner
            config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
            pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9,
                                              experiment_data_dir='./')
        elif pruning_class == 'AutoCompress':
            from nni.algorithms.compression.pytorch.pruning import AutoCompressPruner
            config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
            pruner = AutoCompressPruner(
                model, config_list, trainer=trainer, evaluator=evaluator,
                dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
                cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
        elif pruning_class == 'AMC':
            from nni.algorithms.compression.pytorch.pruning import AMCPruner
            config_list = [{'op_types': ['Conv2d', 'Linear']}]
            pruner = AMCPruner(model, config_list, evaluator, val_loader, flops_ratio=0.5)
        elif pruning_class == 'ADMM':
            from nni.algorithms.compression.pytorch.pruning import ADMMPruner
            config_list = [{
                'sparsity': 0.8,
                'op_types': ['Conv2d'],
                'op_names': ['conv1']
            }, {
                'sparsity': 0.92,
                'op_types': ['Conv2d'],
                'op_names': ['conv2']
            }]
            pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=30, epochs=5)
        elif pruning_class == 'Sensitivity':
            from nni.algorithms.compression.pytorch.pruning import SensitivityPruner
            config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
            pruner = SensitivityPruner(model, config_list, finetuner=fine_tuner, evaluator=evaluator)
            # eval_args and finetune_args are the arguments passed to evaluator and finetuner, respectively
            #pruner.compress(eval_args=[model], finetune_args=[model])
        else:
            raise ValueError("Pruner not supported.")
        return pruner

    # First run: uncomment the next two lines and export_model below to prune and export the masks;
    # later runs reuse the saved mask file for speedup only.
    #pruner = pruner_load(pruning_class, model_final)
    #model_process = pruner.compress()
    model_process = model_final
    eval_result = evaluator(model_process)
    print(eval_result)
    masks_file = 'model/' + model_name + pruning_class + "_mask.pth"
    model_file = 'model/' + model_name + '_' + pruning_class + ".pth"
    #pruner.export_model(model_path=model_file, mask_path=masks_file)
    dummy_input = torch.randn([1, 1, 224, 224]).to('cuda')  # one single-channel 224x224 input
    m_speedup = ModelSpeedup(model_process, dummy_input, masks_file, 'cuda')
    m_speedup.speedup_model()
    eval_result = evaluator(model_process)
    print(eval_result)
    torch.save(model_process.state_dict(), 'model/' + model_name + '_' + pruning_class + "_pruned.pth")
GitHub link (official NNI docs):
https://github.com/microsoft/nni/blob/master/docs/zh_CN/Compression/Pruner.rst#id47