😆Lecture 6-7
● Activation functions
○ Sigmoid, tanh, ReLU: what each one is and how they differ
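A minimal numpy-only sketch of the three activations named above (the function names here are just illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(x):
    # Squashes input to (0, 1); saturates (gradient -> 0) for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered, squashes to (-1, 1); still saturates at the tails.
    return np.tanh(x)

def relu(x):
    # max(0, x): cheap, does not saturate for x > 0.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```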
● Data preprocessing: make the data more amenable to efficient neural network training
Zero-mean the data;
scale so the variance in each dimension is one.
For images, per-channel statistics suffice because the RGB channels have roughly the same variance.
PCA/whitening neither helps nor hurts much for convnets.
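The zero-mean / unit-variance steps above can be sketched in numpy (the data here is synthetic, just to show the two operations):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 3) * 5.0 + 2.0   # fake (N, D) data with offset and spread

X_zero = X - X.mean(axis=0)               # step 1: zero-mean each dimension
X_norm = X_zero / X_zero.std(axis=0)      # step 2: unit variance in each dimension
```

At test time the same per-dimension mean/std computed on the training set would be reused.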
● Weight initialization
Q: what if W = 0 and b = 0?
A: then every output is 0, and the backprop gradients are also 0 → the weights can never update
A: no update for the weights;
all the activations will be identical in each layer → redundant nodes
If the weights are random instead (not all equal), we get symmetry breaking
Example shapes:
Din = 3
Dout = 2
(Too large an init scale, e.g. W → 5W, pushes activations into saturation instead.)
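The W = 0 failure mode can be demonstrated directly (a sketch using the Din = 3, Dout = 2 shapes from the notes):

```python
import numpy as np

np.random.seed(0)
Din, Dout = 3, 2

# All-zero init: every unit computes the same thing (0), so downstream
# gradients are identical and the units never differentiate.
W = np.zeros((Din, Dout))
x = np.random.randn(5, Din)
h = x @ W                                # every entry is 0

# Symmetry breaking: small random values instead of zeros.
W = 0.01 * np.random.randn(Din, Dout)
h_random = x @ W                         # outputs now differ per unit
```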
○ Xavier initialization, Kaiming initialization
With a naive small init, the activation variance collapses to 0 layer by layer.
To fix this → Xavier init (scale by 1/√Din).
Earlier deep NN architecture: the VGG network (created by the Visual Geometry Group).
With 90+ layers, Xavier is not good enough; Kaiming init is OK.
● Batch normalization
Preprocessing can zero-center the input;
but what about the other layers — how do we keep them zero-centered too?
// Even with Kaiming init, the more complex, deeper networks are still hard to train.
We want to standardize the variance of each layer, because we want to prevent the gradients from collapsing to 0,
and we want the weights to be updated at roughly the same rate.
If exact normalization is too hard a constraint
→
Goal: also learn scale/shift parameters gamma and beta so that the final loss is low.
Benefits of batch normalization:
it can rescale; it relaxes the hard constraint; it can recover the identity function.
At test time, use the mean and std averaged over training; don't recompute them from the test batch.
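A minimal sketch of the batch-norm forward pass for a fully connected layer, showing the train/test split described above (the running-average bookkeeping here is one common convention, assumed for illustration):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running, momentum=0.9, eps=1e-5, train=True):
    # x: (N, D). gamma/beta are the learned scale/shift from the notes.
    if train:
        mu, var = x.mean(axis=0), x.var(axis=0)
        # Keep running averages for use at test time.
        running['mean'] = momentum * running['mean'] + (1 - momentum) * mu
        running['var']  = momentum * running['var']  + (1 - momentum) * var
    else:
        # Test time: reuse the averaged training statistics, don't recompute.
        mu, var = running['mean'], running['var']
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

np.random.seed(0)
D = 4
running = {'mean': np.zeros(D), 'var': np.ones(D)}
x = np.random.randn(8, D) * 3.0 + 1.0
out = batchnorm_forward(x, np.ones(D), np.zeros(D), running, train=True)
```

With gamma = 1, beta = 0 the output of a training-mode pass is zero-mean, unit-variance per dimension; learning gamma/beta lets the network undo this (recover the identity) if that lowers the loss.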
Batch norm for convnets:
In fully connected layers, the statistics are computed over the batch dimension.
Since in convolutional networks we try to preserve the spatial dimensions, we also average over the spatial dimensions (spatial batch norm).
BN helps keep the gradients from collapsing to 0.
E.g. for a computational graph/function like (x + y)·c, compute the gradient
of the output with respect to the input.
Group normalization: do the normalization separately for each group of channels.
Good use case: object detection.
E.g. groups of channels acting as detectors for edges and color blobs.
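The axis choices above can be made concrete. A sketch of which dimensions spatial batch norm vs. group norm average over, for conv features of shape (N, C, H, W) (shapes and group count are illustrative):

```python
import numpy as np

np.random.seed(0)
N, C, H, W = 2, 6, 4, 4
x = np.random.randn(N, C, H, W)
eps = 1e-5

# Spatial batch norm: one mean/var per channel, averaged over batch AND spatial dims.
mu = x.mean(axis=(0, 2, 3), keepdims=True)
var = x.var(axis=(0, 2, 3), keepdims=True)
bn = (x - mu) / np.sqrt(var + eps)

# Group norm: split channels into groups (here 2 groups of 3 channels) and
# normalize each group per-sample -> no dependence on the batch at all.
G = 2
xg = x.reshape(N, G, C // G, H, W)
mu_g = xg.mean(axis=(2, 3, 4), keepdims=True)
var_g = xg.var(axis=(2, 3, 4), keepdims=True)
gn = ((xg - mu_g) / np.sqrt(var_g + eps)).reshape(N, C, H, W)
```

Because group norm never touches the batch axis, it behaves identically at batch size 1, which is why it suits detection workloads with tiny batches.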
● Transfer learning
Why? We often have only a little training data for the new task, and we must not overfit.
What to transfer depends on the amount of data & how related the new task is to the original one.
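A toy numpy sketch of the standard recipe when data is scarce: freeze the pretrained "backbone" features and train only a new linear head (all weights and data here are synthetic stand-ins, not real pretrained values):

```python
import numpy as np

rng = np.random.default_rng(0)
W_backbone = rng.standard_normal((10, 8))    # pretend these came from pretraining
W_head = 0.01 * rng.standard_normal((8, 2))  # new, randomly initialized head

x = rng.standard_normal((16, 10))            # small labeled dataset for the new task
y = rng.integers(0, 2, size=16)

losses = []
for _ in range(50):
    feats = np.maximum(0, x @ W_backbone)    # frozen feature extractor (never updated)
    logits = feats @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    losses.append(-np.log(p[np.arange(16), y]).mean())
    grad = p
    grad[np.arange(16), y] -= 1              # softmax cross-entropy gradient
    W_head -= 0.1 * (feats.T @ grad / 16)    # update ONLY the head
```

Training only the head keeps the number of learned parameters tiny, which is the overfitting guard the notes mention; with more data (or a closely related task) one would unfreeze and fine-tune more layers.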