😆Lecture 6-7

● Activation functions

○ Sigmoid, tanh, ReLU

What each of them is
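A minimal NumPy sketch of the three activations (the function names and shapes are my own, not from the lecture):

```python
import numpy as np

def sigmoid(x):
    # squashes input to (0, 1); saturates (gradient near 0) for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # zero-centered, squashes to (-1, 1); still saturates at both ends
    return np.tanh(x)

def relu(x):
    # max(0, x): cheap to compute, does not saturate for x > 0
    return np.maximum(0.0, x)
```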

● Data preprocessing: make the data more amenable to efficient neural network training

Zero mean;

Variance in each dimension is one

For images this matters less, because the variance of the RGB channels is basically the same

PCA neither helps nor harms much for convnets
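A quick sketch of the zero-mean, unit-variance preprocessing on fake data (the data shape is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3)) * 5.0 + 2.0  # fake data: 100 samples, 3 dims

X = X - X.mean(axis=0)   # zero-center each dimension
X = X / X.std(axis=0)    # unit variance in each dimension
```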

● Weight initialization

Q: What if W = 0 and b = 0?

A: Then the output is 0, and the backprop gradient is also 0 → the weights cannot update

A: no update for the weights

All the activations will be identical in each layer; redundant nodes

If the weights are not all zero (e.g. small random values), symmetry breaking works

Din = 3

Dout = 2

w->5W
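A tiny demo of why all-zero init fails, using the Din = 3, Dout = 2 example from the notes (the demo itself is mine):

```python
import numpy as np

rng = np.random.default_rng(0)
Din, Dout = 3, 2

W_zero = np.zeros((Din, Dout))                    # all-zero init
W_rand = 0.01 * rng.standard_normal((Din, Dout))  # small random: breaks symmetry

x = rng.standard_normal((5, Din))  # a batch of 5 inputs
out_zero = x @ W_zero  # every output is 0 -> zero gradients, no updates
out_rand = x @ W_rand  # each neuron produces a distinct output
```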

○ Xavier Initialization, Kaiming

In deep networks the activation variance collapses to 0

To fix this → Xavier initialization

Earlier deep NN architecture: the VGG network (created by the Visual Geometry Group)

With 90+ layers, Xavier is not good enough; Kaiming is ok
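A sketch of the two initialization scales (function names are mine); the loop at the bottom shows that with Kaiming scaling the activation magnitude survives a deep ReLU stack instead of collapsing:

```python
import numpy as np

def xavier_init(din, dout, rng):
    # scale 1/sqrt(din): preserves activation variance for linear/tanh layers
    return rng.standard_normal((din, dout)) / np.sqrt(din)

def kaiming_init(din, dout, rng):
    # scale sqrt(2/din): the extra factor of 2 compensates for ReLU
    # zeroing out half of the activations
    return rng.standard_normal((din, dout)) * np.sqrt(2.0 / din)

# variance check through a 20-layer ReLU stack
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 512))
for _ in range(20):
    x = np.maximum(0.0, x @ kaiming_init(512, 512, rng))
# x.std() stays on the order of 1 instead of shrinking toward 0
```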

● Batch normalization

We can zero-center the input;

but what about the other layers: how do we guarantee they stay zero-centered?

// Even with Kaiming init, it is still hard to train the more complex, deeper networks

We want to standardize the variance of each layer, because we want to prevent the gradients from collapsing to 0

We want the weights to be updated at roughly the same rate

If zero mean / unit variance is too hard a constraint

→

Goal: also learn gamma and beta so that the final loss is low

  • Benefits of batch normalization:

Can scale; resolves the too-hard constraint; can recover the identity function

At test time, use the mean and std averaged over training; they are not computed again
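A minimal batch-norm sketch for a fully connected layer, showing the train/test difference described above (the momentum value and names are assumptions):

```python
import numpy as np

def bn_train(x, gamma, beta, run_mean, run_var, momentum=0.9, eps=1e-5):
    # training: normalize with this batch's statistics,
    # and keep a running average for use at test time
    mu, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    run_mean = momentum * run_mean + (1 - momentum) * mu
    run_var = momentum * run_var + (1 - momentum) * var
    return gamma * x_hat + beta, run_mean, run_var

def bn_test(x, gamma, beta, run_mean, run_var, eps=1e-5):
    # test: reuse the averaged training statistics; nothing is recomputed
    x_hat = (x - run_mean) / np.sqrt(run_var + eps)
    return gamma * x_hat + beta
```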

  • Batch norm for convnets:

    • In a fully connected layer, we compute statistics over the batch dimension

    • Since in convolutional networks we try to preserve the spatial dimensions, we also normalize over them (spatial batch norm)

  • BN helps the gradients not collapse to 0

E.g. given a compute graph/function such as (x + y) · c, compute the gradient of the output with respect to the input

Group norm: do the normalization separately for each group of channels

Good use in: object detection

E.g. one group of weights for edges, another for color blocks
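A sketch of group normalization as I understand the note: normalize within each group of channels, per sample, independent of batch size (the shapes and group count are assumptions):

```python
import numpy as np

def groupnorm(x, num_groups, eps=1e-5):
    # x: (N, C, H, W); statistics are computed per sample, per channel group,
    # so the result does not depend on batch size (handy for object detection,
    # where batches are often tiny)
    N, C, H, W = x.shape
    g = x.reshape(N, num_groups, C // num_groups, H, W)
    mu = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(N, C, H, W)
```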

● Transfer learning

Why?

We only have a little training data, and we must avoid overfitting

It depends on the amount of data & how related the new task is to the pretraining task
