[Deep Learning] Classic Networks: Reproducing VGG (TensorFlow Implementation)
Tags: cifar10 | tensorflow | image classification | VGG

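Paper: https://arxiv.org/abs/1409.1556
GitHub repository for the code in this post: https://github.com/shankezh/DL_HotNet_Tensorflow
If you are interested in machine learning, are not content to treat deep learning models as black boxes, and want to understand why training can fit a good model, see my earlier blog posts. They derive several classic machine learning cases mathematically and include a simple neural network framework written in Python to deepen understanding: https://blog.csdn.net/shankezh/article/category/7279585
For project help or job opportunities, please contact me by email: cloud_happy@163.com
Dataset download: https://www.kaggle.com/c/cifar-10 or http://www.cs.toronto.edu/~kriz/cifar.html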
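Key points extracted from the paper
1. The paper reviews earlier ways of improving accuracy: one uses a smaller receptive window and a smaller stride in the first convolutional layer; the other trains and tests the network densely over the whole image and over multiple scales (put plainly, the second amounts to a form of data augmentation);
2. It introduces the key factor the paper identifies for improving accuracy: increasing the depth of the convolutional layers, using 3x3 kernels throughout;
3. The details of the network configurations are in Section 2 of the paper;
4. The details of classification training and evaluation are in Section 3;
5. The only preprocessing before training is subtracting the mean RGB value of the training set from each pixel;
6. The networks do not contain LRN (only the A-LRN comparison configuration does);
7. LRN does not improve accuracy, but it increases memory consumption and computation time;
8. The paper describes networks A through E, where A has only 11 weight layers (8 conv + 3 FC) and E has 19 (16 conv + 3 FC);
9. Convolution widths are kept fairly small: the first layer has only 64 channels, and the channel count doubles after each max-pooling layer until it reaches 512;
10. A stack of two 3x3 convolutions has the same effective receptive field as one 5x5, and a stack of three 3x3 convolutions the same as one 7x7;
11. Stacking also reduces the parameter count. Assuming the convolutional layers have C input and C output channels, three stacked 3x3 layers need 3(3x3xCxC) = 27C^2 weights, while a single 7x7 layer needs 7x7xCxC = 49C^2;
12. The authors built their implementation on Caffe;
13. Configuration C performs worse than configuration D at the same depth, because C contains 1x1 convolutions while D uses only 3x3;
14. The paper shows that deeper networks improve classification accuracy: depth matters;
15. The authors thank NVIDIA;
16. Pooling is max pooling with a 2x2 window and stride 2. The paper uses five max-pooling layers, each following a convolutional layer, but not every convolutional layer is followed by pooling;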
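
Points 10 and 11 can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming stride-1 convolutions with C input and C output channels and ignoring biases; the function names are my own and do not come from the paper or the repository.

```python
def conv_stack_params(kernel_size, num_layers, channels):
    """Weights (biases ignored) in a stack of conv layers with C inputs and C outputs."""
    return num_layers * (kernel_size * kernel_size * channels * channels)


def stacked_receptive_field(kernel_size, num_layers):
    """Effective receptive field of stride-1 convolutions stacked without pooling."""
    return 1 + num_layers * (kernel_size - 1)


C = 64  # example channel count; the 27:49 ratio holds for any C
three_3x3 = conv_stack_params(3, 3, C)  # 27 * C**2
one_7x7 = conv_stack_params(7, 1, C)    # 49 * C**2

print(stacked_receptive_field(3, 2))  # 5
print(stacked_receptive_field(3, 3))  # 7
print(three_3x3, one_7x7)
```

The stacked version covers the same 7x7 window with roughly 45% fewer weights and adds two extra non-linearities along the way.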

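Training details
1. Training uses mini-batches;
2. The optimizer is SGD with momentum 0.9 and a batch size of 256;
3. Training is regularized by weight decay, with the L2 penalty multiplier set to 0.0005;
4. The first two fully connected layers use dropout with a ratio of 0.5;
5. The learning rate starts at 0.01 and is divided by 10 whenever accuracy stops improving; in the paper it was reduced 3 times in total;
6. The authors first trained configuration A (random initialization). When training the deeper configurations, the first four convolutional layers and the last three fully connected layers were initialized from the trained network A, and all other layers were initialized randomly. The learning rate of the pre-initialized layers was not reduced (so the initial rate is still 0.01), which lets those layers keep changing during training;
7. For random initialization the authors sample from a normal distribution with mean 0 and variance 0.01; all biases are initialized to 0;
8. After submitting the paper, the authors found that with the random initialization procedure from a 2010 paper (Glorot & Bengio), the weights can be initialized without any pre-training;
9. To augment the data, the authors apply random horizontal flips and random RGB shifts. The network input is 224x224, so training crops are fixed at 224x224. The shorter side of the rescaled image must therefore not be smaller than 224: for a scale of 224 the crop covers the whole image along its shorter side, and for larger scales the crop covers a part of the image containing an object or part of an object;
10. From experience, the authors define two ways of setting the training scale S. The first fixes S; they evaluate S=256 and S=384. The network is first trained with S=256, and the S=384 network is then initialized from the S=256 weights and trained with a smaller learning rate of 0.001. (My own reading: the authors first rescaled every image in dataset A to scale 256 (the paper rescales isotropically so that the shorter side equals S), then took random 224x224 crops to form dataset B and trained on it. After that they rescaled dataset A to scale 384, took random 224x224 crops to form dataset C, and trained on C, initializing from the weights trained on B and changing the learning rate);
11. The second way trains with multiple scales: S is sampled from a range [Smin, Smax] (the authors use Smin = 256, Smax = 512), and randomly changing the scale of the training images augments the dataset. The authors found this works very well, but for speed they pre-trained the model with the first method at S=384 and then fine-tuned it with the second method;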
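
The preprocessing and augmentation steps in the list above (scale jittering, a random 224x224 crop, a random horizontal flip, and mean subtraction) can be sketched in NumPy. This is a simplified illustration, not the authors' pipeline: the nearest-neighbour resize is a stand-in for a real image library, the random RGB shift is omitted, and `mean_rgb` is a placeholder that the caller would replace with the training-set mean. All function names are hypothetical.

```python
import numpy as np


def resize_shorter_side(img, target):
    """Nearest-neighbour isotropic resize so the shorter side equals `target`.

    Stand-in for a proper image-library resize (assumption for this sketch).
    """
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h = max(target, round(h * scale))
    new_w = max(target, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols]


def augment(img, rng, s_min=256, s_max=512, crop=224, mean_rgb=(0.0, 0.0, 0.0)):
    """Scale jitter, random crop, random horizontal flip, mean subtraction."""
    s = int(rng.integers(s_min, s_max + 1))   # S sampled from [s_min, s_max]
    img = resize_shorter_side(img, s)
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = img[top:top + crop, left:left + crop].astype(np.float32)
    if rng.random() < 0.5:
        patch = patch[:, ::-1]                # flip along the width axis
    return patch - np.asarray(mean_rgb, dtype=np.float32)


rng = np.random.default_rng(0)
image = np.zeros((300, 400, 3), dtype=np.uint8)  # dummy H x W x RGB image
out = augment(image, rng)
print(out.shape)
```

Setting `s_min` and `s_max` to the same value (for example 256 or 384) gives the fixed-scale variant described in item 10.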

Model architecture
[Figure: model architecture diagram]

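Notes on the TensorFlow implementation
1. Overall the code follows the paper, with some minor differences (for example, the initialization does not follow the paper, the batch size differs from the paper, and no data augmentation is applied to the dataset);
2. The training data is CIFAR-10;
3. I change the learning rate by stopping and reloading the model, not by adjusting it dynamically;
4. The input does not follow 224x224: I train directly on the original 32x32 images. The images could instead be enlarged when building the tfrecords;
5. Addendum: by oversight I left dropout out when building the network. During training I therefore watch the loss and accuracy closely for signs of overfitting; as soon as overfitting appears, I stop and reload an earlier model to continue training;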
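
Note 3 says the learning rate is changed by hand between runs. For comparison, the paper's rule (divide by 10 when accuracy stops improving, three reductions in total) could be automated with a small helper like the one below. This is a hypothetical sketch, not code from the repository; the `patience` value is my own assumption, since the paper does not state how long the authors waited.

```python
class PlateauLR:
    """Divide the learning rate by `factor` when accuracy stops improving."""

    def __init__(self, lr=0.01, factor=10.0, patience=2, max_drops=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience    # evaluations without improvement (assumed value)
        self.max_drops = max_drops  # the paper reports three reductions in total
        self.best = float("-inf")
        self.bad_evals = 0
        self.drops = 0

    def update(self, accuracy):
        """Record one evaluation result and return the rate to use next."""
        if accuracy > self.best:
            self.best = accuracy
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience and self.drops < self.max_drops:
                self.lr /= self.factor
                self.drops += 1
                self.bad_evals = 0
        return self.lr


sched = PlateauLR()
rates = [sched.update(acc) for acc in [0.5, 0.6, 0.6, 0.6, 0.7]]
print(rates)
```

Because the training script feeds the rate through a `LEARNING_RATE` placeholder, a value produced this way could be passed in `feed_dict` at each step instead of a constant.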
Model code
VGGNet.py

import tensorflow as tf
import tensorflow.contrib.slim as slim


def VGG16_slim(inputs, num_cls, vgg_mean, keep_prob=0.5):
    # vgg_mean and keep_prob are accepted but unused here (no dropout layers).
    net = inputs
    # net = tf.cast(net, tf.float32) / 255.  # cast dtype and normalize
    # with tf.name_scope('reshape'):
    #     net = tf.reshape(net, [-1, 224, 224, 3])
    with tf.variable_scope('vgg_net'):
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            weights_initializer=slim.xavier_initializer(),
                            # weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                            # weights_regularizer=slim.l2_regularizer(0.0005),
                            biases_initializer=tf.constant_initializer(0.0)):
            with slim.arg_scope([slim.conv2d, slim.max_pool2d], padding='same', stride=1):
                # Five conv blocks (2, 2, 3, 3, 3 layers), each followed by 2x2 max pooling.
                net = slim.repeat(net, 2, slim.conv2d, 64, [3, 3], scope='conv1')
                net = slim.max_pool2d(net, [2, 2], stride=2, scope='maxpool1')
                net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
                net = slim.max_pool2d(net, [2, 2], stride=2, scope='maxpool2')
                net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
                net = slim.max_pool2d(net, [2, 2], stride=2, scope='maxpool3')
                net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
                net = slim.max_pool2d(net, [2, 2], stride=2, scope='maxpool4')
                net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
                net = slim.max_pool2d(net, [2, 2], stride=2, scope='maxpool5')
                net = slim.flatten(net, scope='flatten')
                # slim.fully_connected defaults to ReLU, so it is applied to all
                # three layers here, including the last one.
                net = slim.stack(net, slim.fully_connected, [1024, 1024, num_cls], scope='fc')
                # The function returns softmax probabilities, not raw logits.
                net = slim.softmax(net, scope='softmax')
    return net
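
To see what this network does to a 32x32 CIFAR-10 image, the feature-map sizes and parameter count can be worked out by hand. The sketch below is my own bookkeeping helper, not part of the repository; it assumes 'same' padding, 3x3 kernels with biases, and the block layout of `VGG16_slim` above.

```python
def vgg16_summary(input_hw=32, in_channels=3, num_cls=10, fc_units=(1024, 1024)):
    """Return feature-map shapes after each pooling layer and the total parameter count."""
    blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (conv layers, channels)
    hw, ch, params = input_hw, in_channels, 0
    shapes = []
    for n_convs, out_ch in blocks:
        for _ in range(n_convs):
            params += 3 * 3 * ch * out_ch + out_ch  # 3x3 weights plus biases
            ch = out_ch
        hw = -(-hw // 2)  # 2x2 max pool, stride 2, 'same' padding: ceil(hw / 2)
        shapes.append((hw, hw, ch))
    features = hw * hw * ch  # size of the flattened vector
    for units in list(fc_units) + [num_cls]:
        params += features * units + units
        features = units
    return shapes, params


shapes, total = vgg16_summary()
for s in shapes:
    print(s)
print(total)
```

With a 32x32 input the five pooling layers shrink the feature map to 1x1x512, so the flattened vector has only 512 elements and the whole network has about 16.3 million parameters. Calling `vgg16_summary(224, 3, 1000, (4096, 4096))` gives the count for the standard ImageNet-sized VGG16 for comparison.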

Training
VGG_Cifar10.py

import os
import time

import cv2
import numpy as np
import tensorflow as tf

# Project-local modules from the repository are also required:
# VggNet, coms, utils, pre_pro (their import statements are not shown in the original post).


def run():
    model_dir = ''
    logdir = ''
    img_prob = [32, 32, 3]  # input shape: height, width, channels
    num_cls = 10
    is_train = False  # False: load the latest checkpoint and classify 1.jpg .. 10.jpg
    is_load_model = True
    is_stop_test_eval = False
    BATCH_SIZE = 100
    EPOCH_NUM = 50
    ITER_NUM = 500  # 50000 / 100
    LEARNING_RATE_VAL = 0.001
    if utils.isLinuxSys():
        logdir = r''
        model_dir = r''
    else:
        model_dir = r'D:\DataSets\cifar\cifar\model_flie\vgg16'
        logdir = r'D:\DataSets\cifar\cifar\logs\train'

    train_img_batch, train_label_batch = pre_pro.get_cifar10_batch(is_train=True, batch_size=BATCH_SIZE, num_cls=num_cls)
    test_img_batch, test_label_batch = pre_pro.get_cifar10_batch(is_train=False, batch_size=BATCH_SIZE, num_cls=num_cls)

    inputs = tf.placeholder(tf.float32, [None, img_prob[0], img_prob[1], img_prob[2]])
    labels = tf.placeholder(tf.float32, [None, num_cls])
    LEARNING_RATE = tf.placeholder(tf.float32)

    logits = VggNet.VGG16_slim(inputs, num_cls, 0)

    train_loss = coms.loss(logits, labels)
    train_optim = coms.optimizer(lr=LEARNING_RATE, loss=train_loss, fun='mm')
    train_eval = coms.evaluation(logits, labels)

    saver = tf.train.Saver(max_to_keep=4)
    max_acc = 0.

    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        if utils.isHasGpu():
            dev = '/gpu:0'
        else:
            dev = '/cpu:0'
        with tf.device(dev):
            sess.run(tf.global_variables_initializer())
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)

            try:
                if is_train:
                    if is_load_model:
                        ckpt = tf.train.get_checkpoint_state(model_dir)
                        if ckpt and ckpt.model_checkpoint_path:
                            saver.restore(sess, ckpt.model_checkpoint_path)
                            print('model load successful ...')
                        else:
                            print('model load failed ...')
                            return
                    n_time = time.strftime("%Y-%m-%d %H-%M", time.localtime())
                    logdir = os.path.join(logdir, n_time)
                    writer = tf.summary.FileWriter(logdir, sess.graph)

                    for epoch in range(EPOCH_NUM):
                        if coord.should_stop():
                            print('coord should stop ...')
                            break
                        for step in range(1, ITER_NUM + 1):
                            if coord.should_stop():
                                print('coord should stop ...')
                                break

                            # Fetch one training batch from the input queue.
                            batch_train_img, batch_train_label = sess.run([train_img_batch, train_label_batch])

                            _, batch_train_loss, batch_train_acc = sess.run(
                                [train_optim, train_loss, train_eval],
                                feed_dict={inputs: batch_train_img, labels: batch_train_label, LEARNING_RATE: LEARNING_RATE_VAL})
                            global_step = int(epoch * ITER_NUM + step + 1)

                            print("epoch %d , step %d train end ,loss is : %f ,accuracy is %f ... ..." % (epoch, step, batch_train_loss, batch_train_acc))

                            train_summary = tf.Summary(
                                value=[tf.Summary.Value(tag='train_loss', simple_value=batch_train_loss),
                                       tf.Summary.Value(tag='train_batch_accuracy', simple_value=batch_train_acc)])
                            writer.add_summary(train_summary, global_step)
                            writer.flush()

                            if is_stop_test_eval:
                                if not is_load_model:
                                    if epoch == 0 and step < (ITER_NUM / 20):
                                        continue

                            if step % 200 == 0:
                                print('test sets evaluation start ...')
                                ac_iter = int(10000 / BATCH_SIZE)  # the CIFAR-10 test set has 10000 images
                                ac_sum = 0.
                                loss_sum = 0.
                                for ac_count in range(ac_iter):
                                    batch_test_img, batch_test_label = sess.run([test_img_batch, test_label_batch])
                                    test_loss, test_accuracy = sess.run(
                                        [train_loss, train_eval],
                                        feed_dict={inputs: batch_test_img, labels: batch_test_label})
                                    ac_sum += test_accuracy
                                    loss_sum += test_loss
                                ac_mean = ac_sum / ac_iter
                                loss_mean = loss_sum / ac_iter
                                print('epoch {} , step {} , accuracy is {}'.format(str(epoch), str(step), str(ac_mean)))
                                test_summary = tf.Summary(
                                    value=[tf.Summary.Value(tag='test_loss', simple_value=loss_mean),
                                           tf.Summary.Value(tag='test_accuracy', simple_value=ac_mean)])
                                writer.add_summary(test_summary, global_step=global_step)
                                writer.flush()

                                # Keep a checkpoint whenever test accuracy matches or beats the best so far.
                                if ac_mean >= max_acc:
                                    max_acc = ac_mean
                                    saver.save(sess, model_dir + '/' + 'cifar10_{}_step_{}.ckpt'.format(str(epoch), str(step)), global_step=step)
                                    print('max accuracy has reaching ,save model successful ...')
                    print('saving last model ...')
                    saver.save(sess, model_dir + '/' + 'cifar10_last.ckpt')
                    print('train network task was run over')
                else:
                    # Inference: classify 1.jpg .. 10.jpg from the working directory.
                    model_file = tf.train.latest_checkpoint(model_dir)
                    saver.restore(sess, model_file)
                    cls_list = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
                    for i in range(1, 11):
                        name = str(i) + '.jpg'
                        img = cv2.imread(name)
                        img = cv2.resize(img, (32, 32))
                        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                        # img = img / 255.
                        img = np.array([img])
                        res = sess.run(logits, feed_dict={inputs: img})
                        # print(res)
                        print('{}.jpg detect result is : '.format(str(i)) + cls_list[np.argmax(res)])
            except tf.errors.OutOfRangeError:
                print('done training -- epoch files run out of ...')
            finally:
                coord.request_stop()

            coord.join(threads)
            sess.close()
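
The script requests `fun='mm'` from `coms.optimizer`, which presumably selects a momentum optimizer, although that helper is not shown here. As an illustration of the update the paper describes (momentum 0.9, L2 weight decay 0.0005), here is a pure-Python sketch of one common formulation. It is an assumption for illustration only: the repository's optimizer may use a different form, and the model code above has its weight regularizer commented out.

```python
def momentum_sgd_step(weights, grads, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and L2 weight decay on flat lists of floats."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        v = momentum * v - lr * (g + weight_decay * w)  # decay enters as an extra gradient term
        new_v.append(v)
        new_w.append(w + v)
    return new_w, new_v


# Toy problem: minimise 0.5 * (w - 3)^2, whose gradient is (w - 3).
w, v = [0.0], [0.0]
for _ in range(500):
    g = [w[0] - 3.0]
    w, v = momentum_sgd_step(w, g, v)
print(w[0])
```

On the toy problem the weight settles slightly below 3, at 3 / 1.0005, because the weight-decay term pulls it toward zero.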

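Results
I trained in spare moments, without data augmentation, on a GTX 1070 Ti. The best test-set accuracy was 78.5%; I believe data augmentation would push this higher. To help the network converge faster, I standardized the images. I did not resize them to 224x224 but used the original 32x32 size, and to reduce the parameter count and speed up training I changed the width of VGG's fully connected layers from 4096 to 1024. The training results are visualized below. Training was split into three runs, hence three logs. Training loss:

[Two figures: training loss curves]

Test loss:

[Two figures: test loss curves]

Accuracy:

[Two figures: accuracy curves]

The loss went from about 2.35 at the start to about 1.4 at the end. Next, I downloaded 10 images from Baidu to check the model's classification ability. These images come from Baidu, not from the training or test set! Sample images:

[Figure: the 10 sample images]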

As shown, the CIFAR-10 classes ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'] correspond to image numbers 1-10, in that order;
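
The numbering convention can be written down as a small lookup. This is a sketch with hypothetical helper names; it mirrors the `cls_list[np.argmax(res)]` line of the script but works on a plain list of scores.

```python
CLS_LIST = ['airplane', 'automobile', 'bird', 'cat', 'deer',
            'dog', 'frog', 'horse', 'ship', 'truck']


def expected_class(file_index):
    """Class that image `<file_index>.jpg` (1-based) is meant to show."""
    return CLS_LIST[file_index - 1]


def predicted_class(scores):
    """Class name with the highest score (first one wins on ties)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLS_LIST[best]


# Dummy scores for illustration only: highest value at index 3.
scores = [0.0, 0.0, 0.0, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
print(expected_class(4), predicted_class(scores))
```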
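Loading the model and classifying the images gives the following results:

[Figure: classification results for the 10 images]

As shown, only images 2 and 6 were misclassified; the rest were correct. I believe data augmentation would improve the classification accuracy further.
This essentially completes the reproduction of the VGG paper; the weight file will be uploaded later;
Weight file, Baidu Cloud link: https://pan.baidu.com/s/1BdMZYvkiYT9Fts0dLIgrog
Extraction code: 0rmi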