Agenda
Hardware 101: the Family
Hardware 101: Number Representation
1. Algorithms for Efficient Inference
1.1 Pruning Neural Networks
Iteratively Retrain to Recover Accuracy
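The pipeline is: train the dense network, prune away the low-magnitude weights, retrain the surviving weights, and iterate. A minimal NumPy sketch of magnitude-based pruning (the `retrain` step in the comment is a hypothetical fine-tuning routine, not shown):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights so that `sparsity` fraction become zero."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Iterative prune -> retrain loop (retrain() is a hypothetical fine-tuning step
# that must keep the masked-out weights at zero):
# for sparsity in (0.5, 0.7, 0.9):
#     w, mask = prune_by_magnitude(w, sparsity)
#     w = retrain(w, mask)
```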
Pruning RNN and LSTM
Accuracy even improves slightly after pruning and retraining.
Pruning Changes Weight Distribution
1.2 Weight Sharing
Trained Quantization
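In Deep Compression, weight sharing is done by k-means clustering the weights of a layer into a small codebook (e.g. 16 centroids, i.e. 4-bit indices); during retraining, the gradients of weights that share a centroid are summed to fine-tune that centroid. A rough NumPy sketch of the clustering step only (centroid retraining omitted):

```python
import numpy as np

def kmeans_weight_sharing(weights, n_clusters=16, n_iters=20):
    """Cluster weights into n_clusters shared values (a 4-bit codebook for 16 clusters)."""
    w = weights.ravel()
    # Linear initialization over the weight range, as suggested in Deep Compression.
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(n_iters):
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = w[assign == c].mean()
    shared = centroids[assign].reshape(weights.shape)   # weights replaced by centroids
    indices = assign.reshape(weights.shape)             # only these indices are stored
    return shared, indices, centroids
```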
How Many Bits do We Need?
Pruning + Trained Quantization Work Together
Huffman Coding
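After pruning and weight sharing, the remaining codebook indices are very unevenly distributed, so Huffman coding assigns the frequent indices shorter codes. A small sketch that computes only the code lengths, using Python's heapq:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {s: 1 for s in freq}
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    uid = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:      # every merge adds one bit to these symbols' codes
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, uid, syms1 + syms2))
        uid += 1
    return lengths
```

The average bits per stored index is then the frequency-weighted mean of these code lengths, which is what gets compared against the fixed-width index encoding.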
Summary of Deep Compression
Results: Compression Ratio
SqueezeNet
Compressing SqueezeNet
1.3 Quantization
Quantizing the Weight and Activation
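The lecture's fixed-point scheme picks a per-layer radix point from the dynamic range of the weights and activations; a closely related and simpler variant is symmetric linear quantization to int8, sketched below:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to signed 8-bit integers."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12   # one scale per tensor (per layer)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(64).astype(np.float32)
q, s = quantize_int8(x)
print(np.max(np.abs(dequantize(q, s) - x)))   # quantization error, roughly bounded by scale/2
```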
**Quantization Result**: 8 bits is chosen.
1.4 Low Rank Approximation
Low Rank Approximation for Conv: similar to an Inception module
Low Rank Approximation for FC: matrix factorization
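For a fully-connected layer, the weight matrix W (out x in) can be factorized with a truncated SVD into two thin matrices, replacing one large matrix-vector product with two small ones. A minimal NumPy sketch:

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (out x in) as U_r @ V_r with U_r: out x rank, V_r: rank x in."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # absorb singular values into the first factor
    V_r = Vt[:rank, :]
    return U_r, V_r

# y = W @ x becomes y = U_r @ (V_r @ x);
# parameter count drops from out*in to rank*(out + in).
W = np.random.randn(1024, 4096)
U_r, V_r = low_rank_factorize(W, rank=64)
print(np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W))   # relative approximation error
```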
1.5 Binary / Ternary Net
Trained Ternary Quantization
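Trained Ternary Quantization keeps full-precision weights during training; in the forward pass each weight is mapped to {-W_n, 0, +W_p} using a threshold set as a fraction t of the largest weight magnitude, and the two scale factors W_p and W_n are themselves learned by backpropagation. A sketch of the forward ternarization only (t = 0.05 is an illustrative value; learning W_p / W_n is not shown):

```python
import numpy as np

def ternarize(w, w_p, w_n, t=0.05):
    """Map full-precision weights to {-w_n, 0, +w_p} using a magnitude threshold."""
    delta = t * np.max(np.abs(w))
    q = np.zeros_like(w)
    q[w > delta] = w_p       # learned positive scale
    q[w < -delta] = -w_n     # learned negative scale
    return q
```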
Weight Evolution during Training
Error Rate on ImageNet
1.6 Winograd Transformation
3x3 DIRECT Convolutions
Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs
3x3 WINOGRAD Convolutions:
Transform Data to Reduce Math Intensity
Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs
Winograd convolution: we need 16xC FMAs for 4 outputs, i.e. 2.25x fewer FMAs
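Per channel, the F(2,3) Winograd transform produces 2 outputs of a 3-tap filter with 4 multiplies instead of 6; tiling it in 2D gives the 16-vs-36 FMA count above. A small NumPy sketch of the 1D F(2,3) case, checked against direct convolution (B, G, A are the standard Winograd transform matrices):

```python
import numpy as np

# Standard F(2,3) Winograd transform matrices.
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """2 outputs of a 3-tap correlation over a 4-element input tile, using 4 multiplies."""
    return At @ ((G @ g) * (Bt @ d))

d = np.random.randn(4)   # input tile
g = np.random.randn(3)   # filter
direct = np.array([d[0:3] @ g, d[1:4] @ g])      # direct sliding dot product (6 multiplies)
print(np.allclose(winograd_f23(d, g), direct))   # True
```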
2. Hardware for Efficient Inference
A common goal: minimize memory access
Google TPU
Roofline Model: Identify Performance Bottleneck
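The roofline model caps attainable throughput by either peak compute or by memory bandwidth times the arithmetic intensity (operations per byte fetched). A sketch with illustrative numbers (not measured specs):

```python
def attainable_ops(intensity_ops_per_byte, peak_ops, mem_bandwidth_bytes):
    """Roofline: performance is the lower of the compute roof and the memory roof."""
    return min(peak_ops, intensity_ops_per_byte * mem_bandwidth_bytes)

peak = 90e12   # illustrative: 90 TOPS peak compute
bw = 30e9      # illustrative: 30 GB/s memory bandwidth
for intensity in (10, 100, 1000, 10000):
    print(intensity, attainable_ops(intensity, peak, bw) / 1e12, "TOPS")
```

Low-intensity workloads sit under the slanted (memory-bound) part of the roof; only high-intensity workloads reach peak compute.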
Log Rooflines for CPU, GPU, TPU
EIE: the First DNN Accelerator for Sparse, Compressed Model:
Zero values are neither stored nor computed.
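The idea EIE exploits is that after pruning and ReLU, both the weight matrix and the activation vector are sparse, so multiply-accumulates are issued only where both operands are nonzero. A dense-array NumPy sketch of that dataflow (the real EIE stores weights in a compressed sparse column format and distributes rows across processing elements):

```python
import numpy as np

def sparse_matvec(W, x):
    """Matrix-vector product that skips zero activations and zero weights."""
    out = np.zeros(W.shape[0])
    for j, a in enumerate(x):
        if a == 0:                      # skip zero activations (dynamic sparsity)
            continue
        col = W[:, j]
        nz = np.nonzero(col)[0]         # skip zero weights (static sparsity)
        out[nz] += col[nz] * a
    return out

W = np.where(np.random.rand(8, 8) < 0.1, np.random.randn(8, 8), 0.0)  # ~90% sparse weights
x = np.maximum(np.random.randn(8), 0)                                  # ReLU activations
print(np.allclose(sparse_matvec(W, x), W @ x))                         # True
```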
EIE Architecture
Micro Architecture for each PE
Comparison: Throughput
Comparison: Energy Efficiency
3. Algorithms for Efficient Training
3.1 Parallelization
Data Parallel – Run multiple inputs in parallel
Parameter Update
Parameter updates are shared across workers (e.g. through a parameter server).
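In data parallelism each worker holds a full copy of the model and processes a different slice of the batch; the gradients are then combined and a single update is applied to the shared weights. A toy NumPy sketch of one synchronous step:

```python
import numpy as np

def parameter_server_step(params, worker_grads, lr=0.01):
    """Average the workers' gradients and apply one SGD update to the shared weights."""
    avg_grad = np.mean(worker_grads, axis=0)
    return params - lr * avg_grad

params = np.zeros(4)
worker_grads = [np.random.randn(4) for _ in range(8)]   # one gradient per worker
params = parameter_server_step(params, worker_grads)
```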
Model-Parallel Convolution – by output region (x,y)
Model Parallel Fully-Connected Layer (M x V)
Summary of Parallelism
3.2 Mixed Precision with FP16 and FP32
Mixed Precision Training
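The recipe is: keep an FP32 master copy of the weights, run the forward and backward passes in FP16, scale the loss so small gradients do not underflow in FP16, and unscale before the FP32 weight update. A toy NumPy sketch for a linear model (loss_scale = 128 is an illustrative value):

```python
import numpy as np

def mixed_precision_step(master_w, x, y, lr=0.1, loss_scale=128.0):
    """One SGD step: FP16 forward/backward with loss scaling, FP32 master-weight update."""
    w16, x16 = master_w.astype(np.float16), x.astype(np.float16)
    pred = x16 @ w16                                              # FP16 forward pass
    err16 = (pred - y.astype(np.float16)) * np.float16(loss_scale / len(y))
    grad16 = x16.T @ err16                                        # FP16 backward pass (scaled)
    grad = grad16.astype(np.float32) / loss_scale                 # unscale in FP32
    return master_w - lr * grad                                   # update FP32 master weights

w_master = np.zeros(8, dtype=np.float32)
x = np.random.randn(32, 8).astype(np.float32)
y = np.random.randn(32).astype(np.float32)
w_master = mixed_precision_step(w_master, x, y)
```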
Result comparison:
3.3 Model Distillation
The student model has a much smaller model size than the teacher.
Softened outputs reveal the dark knowledge
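Dividing the logits by a temperature T > 1 before the softmax softens the teacher's output distribution, exposing the relative probabilities of the wrong classes (the "dark knowledge") for the student to mimic. A NumPy sketch of the combined loss (T = 4 and alpha = 0.9 are illustrative hyperparameters):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target cross-entropy (at temperature T) plus hard-label cross-entropy."""
    soft_targets = softmax(teacher_logits, T)
    soft_loss = -np.sum(soft_targets * np.log(softmax(student_logits, T) + 1e-12), axis=-1).mean()
    hard_probs = softmax(student_logits)
    hard_loss = -np.log(hard_probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = np.random.randn(16, 10)   # logits from the large teacher model
student = np.random.randn(16, 10)   # logits from the small student model
labels = np.random.randint(0, 10, size=16)
print(distillation_loss(student, teacher, labels))
```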
3.4 DSD: Dense-Sparse-Dense Training
DSD produces the same model architecture but finds a better optimization solution, arrives at a better local minimum, and achieves higher prediction accuracy across a wide range of deep neural networks (CNNs / RNNs / LSTMs).
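The schedule is literally dense, then sparse, then dense: train the full model, prune and retrain under the sparsity mask, then restore the pruned connections (re-initialized to zero) and retrain the full model again. A schematic sketch (train_fn and prune_fn are hypothetical callbacks standing in for the real training and pruning routines):

```python
def dsd_train(w, train_fn, prune_fn, sparsity=0.5):
    """Dense -> Sparse -> Dense training schedule (schematic sketch)."""
    w = train_fn(w, mask=None)           # 1. Dense: train the full model
    w, mask = prune_fn(w, sparsity)      # 2. Sparse: prune small weights,
    w = train_fn(w, mask=mask)           #    retrain under the fixed sparsity mask
    w = train_fn(w, mask=None)           # 3. Dense: re-enable pruned weights (at zero)
    return w                             #    and retrain the full model
```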
DSD: Intuition
DSD is General Purpose: Vision, Speech, Natural Language
DSD on Caption Generation
4. Hardware for Efficient Training
GPU / TPU
Google Cloud TPU
Future
Outlook: the Focus for Computation