😆Lecture 6-7
● Activation functions
○ Sigmoid, tanh, ReLU: what each one is and how they differ
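A minimal numpy-only sketch of the three activations named above (the function names here are just illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(x):
    # Squashes input to (0, 1); saturates (gradient -> 0) for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered, squashes to (-1, 1); still saturates at the tails.
    return np.tanh(x)

def relu(x):
    # max(0, x): cheap, does not saturate for x > 0.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```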
● Data preprocessing: make the data more amenable to efficient neural network training
Zero-mean the data;
scale so the variance in each dimension is one.
For images, per-channel statistics suffice because the RGB channels have roughly the same variance.
PCA/whitening neither helps nor hurts much for convnets.
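The zero-mean / unit-variance steps above can be sketched in numpy (the data here is synthetic, just to show the two operations):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 3) * 5.0 + 2.0   # fake (N, D) data with offset and spread

X_zero = X - X.mean(axis=0)               # step 1: zero-mean each dimension
X_norm = X_zero / X_zero.std(axis=0)      # step 2: unit variance in each dimension
```

At test time the same per-dimension mean/std computed on the training set would be reused.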
● Weight initialization
Q: what if W = 0 and b = 0?
A: then every output is 0, and the backprop gradients are also 0 → the weights can never update
A: no update for the weights;
all the activations will be identical in each layer → redundant nodes
If the weights are random instead (not all equal), we get symmetry breaking
Example shapes:
Din = 3
Dout = 2
(Too large an init scale, e.g. W → 5W, pushes activations into saturation instead.)
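The W = 0 failure mode can be demonstrated directly (a sketch using the Din = 3, Dout = 2 shapes from the notes):

```python
import numpy as np

np.random.seed(0)
Din, Dout = 3, 2

# All-zero init: every unit computes the same thing (0), so downstream
# gradients are identical and the units never differentiate.
W = np.zeros((Din, Dout))
x = np.random.randn(5, Din)
h = x @ W                                # every entry is 0

# Symmetry breaking: small random values instead of zeros.
W = 0.01 * np.random.randn(Din, Dout)
h_random = x @ W                         # outputs now differ per unit
```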
○ Xavier initialization, Kaiming initialization
With a naive small init, the activation variance collapses to 0 layer by layer.
To fix this → Xavier init (scale by 1/√Din).
Earlier deep NN architecture: the VGG network (created by the Visual Geometry Group).
With 90+ layers, Xavier is not good enough; Kaiming init is OK.
● Batch normalization
Preprocessing can zero-center the input;
but what about the other layers — how do we keep them zero-centered too?
// Even with Kaiming init, the more complex, deeper networks are still hard to train.
We want to standardize the variance of each layer, because we want to prevent the gradients from collapsing to 0,
and we want the weights to be updated at roughly the same rate.
If exact normalization is too hard a constraint
→
Goal: also learn scale/shift parameters gamma and beta so that the final loss is low.
Benefits of batch normalization:
it can rescale; it relaxes the hard constraint; it can recover the identity function.
At test time, use the mean and std averaged over training; don't recompute them from the test batch.
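A minimal sketch of the batch-norm forward pass for a fully connected layer, showing the train/test split described above (the running-average bookkeeping here is one common convention, assumed for illustration):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running, momentum=0.9, eps=1e-5, train=True):
    # x: (N, D). gamma/beta are the learned scale/shift from the notes.
    if train:
        mu, var = x.mean(axis=0), x.var(axis=0)
        # Keep running averages for use at test time.
        running['mean'] = momentum * running['mean'] + (1 - momentum) * mu
        running['var']  = momentum * running['var']  + (1 - momentum) * var
    else:
        # Test time: reuse the averaged training statistics, don't recompute.
        mu, var = running['mean'], running['var']
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

np.random.seed(0)
D = 4
running = {'mean': np.zeros(D), 'var': np.ones(D)}
x = np.random.randn(8, D) * 3.0 + 1.0
out = batchnorm_forward(x, np.ones(D), np.zeros(D), running, train=True)
```

With gamma = 1, beta = 0 the output of a training-mode pass is zero-mean, unit-variance per dimension; learning gamma/beta lets the network undo this (recover the identity) if that lowers the loss.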
Batch norm for convnets:
In fully connected layers, the statistics are computed over the batch dimension.
Since in convolutional networks we try to preserve the spatial dimensions, we also average over the spatial dimensions (spatial batch norm).
BN helps keep the gradients from collapsing to 0.
E.g. for a computational graph/function like (x + y)·c, compute the gradient
of the output with respect to the input.
Group normalization: do the normalization separately for each group of channels.
Good use case: object detection.
E.g. groups of channels acting as detectors for edges and color blobs.
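The axis choices above can be made concrete. A sketch of which dimensions spatial batch norm vs. group norm average over, for conv features of shape (N, C, H, W) (shapes and group count are illustrative):

```python
import numpy as np

np.random.seed(0)
N, C, H, W = 2, 6, 4, 4
x = np.random.randn(N, C, H, W)
eps = 1e-5

# Spatial batch norm: one mean/var per channel, averaged over batch AND spatial dims.
mu = x.mean(axis=(0, 2, 3), keepdims=True)
var = x.var(axis=(0, 2, 3), keepdims=True)
bn = (x - mu) / np.sqrt(var + eps)

# Group norm: split channels into groups (here 2 groups of 3 channels) and
# normalize each group per-sample -> no dependence on the batch at all.
G = 2
xg = x.reshape(N, G, C // G, H, W)
mu_g = xg.mean(axis=(2, 3, 4), keepdims=True)
var_g = xg.var(axis=(2, 3, 4), keepdims=True)
gn = ((xg - mu_g) / np.sqrt(var_g + eps)).reshape(N, C, H, W)
```

Because group norm never touches the batch axis, it behaves identically at batch size 1, which is why it suits detection workloads with tiny batches.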
● Transfer learning
Why? We often have only a little training data for the new task, and we must not overfit.
What to transfer depends on the amount of data & how related the new task is to the original one.
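A toy numpy sketch of the standard recipe when data is scarce: freeze the pretrained "backbone" features and train only a new linear head (all weights and data here are synthetic stand-ins, not real pretrained values):

```python
import numpy as np

rng = np.random.default_rng(0)
W_backbone = rng.standard_normal((10, 8))    # pretend these came from pretraining
W_head = 0.01 * rng.standard_normal((8, 2))  # new, randomly initialized head

x = rng.standard_normal((16, 10))            # small labeled dataset for the new task
y = rng.integers(0, 2, size=16)

losses = []
for _ in range(50):
    feats = np.maximum(0, x @ W_backbone)    # frozen feature extractor (never updated)
    logits = feats @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    losses.append(-np.log(p[np.arange(16), y]).mean())
    grad = p
    grad[np.arange(16), y] -= 1              # softmax cross-entropy gradient
    W_head -= 0.1 * (feats.T @ grad / 16)    # update ONLY the head
```

Training only the head keeps the number of learned parameters tiny, which is the overfitting guard the notes mention; with more data (or a closely related task) one would unfreeze and fine-tune more layers.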