😃Lecture 4
● Optimization
Optimization is the process of finding the best parameters for a model in order to minimize the loss function. The goal of optimization is to find the set of parameters that produce the lowest loss value on the training data. Optimization algorithms use gradients of the loss function with respect to the model parameters to update them in the direction that reduces the loss.
Q: What is the purpose of optimization in machine learning?
A: Optimization is used to find the best set of model parameters that minimize the loss function on the training data. This is important because the model's performance on the validation and test data will be directly influenced by the quality of the learned parameters.
Q: How do optimization algorithms use gradients to update the model parameters?
A: Optimization algorithms use the gradients of the loss function with respect to the model parameters to update them in the direction that reduces the loss. The size and direction of the update are determined by the learning rate and the magnitude of the gradient.
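The update rule described above can be sketched in a few lines. This is a toy example, assuming a made-up 1-D loss L(w) = (w − 3)², whose minimizer is w = 3:

```python
# Vanilla gradient descent on a toy loss L(w) = (w - 3)^2 (assumed example).
w = 0.0
lr = 0.1                  # learning rate: scales the size of each update
for step in range(100):
    grad = 2 * (w - 3)    # analytic derivative dL/dw
    w -= lr * grad        # step in the direction that reduces the loss
print(round(w, 4))        # converges to 3.0, the minimizer
```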
○ Numeric vs analytic gradient
Numeric gradients are computed by approximating the gradient using small perturbations to the parameters.
Analytic gradients are computed using the exact derivative of the loss function with respect to the parameters.
Analytic gradients are generally faster and more accurate, but require the computation of derivatives, which can be complex for some models.
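A minimal sketch comparing the two, using a toy sum-of-squares loss (so the analytic gradient is simply 2w) and central finite differences for the numeric gradient:

```python
import numpy as np

def f(w):
    # Toy loss: sum of squares, so the analytic gradient is 2 * w.
    return np.sum(w ** 2)

def numeric_gradient(f, w, h=1e-5):
    # Central finite differences: perturb each parameter by +/- h.
    grad = np.zeros_like(w)
    for i in range(w.size):
        orig = w[i]
        w[i] = orig + h
        fp = f(w)
        w[i] = orig - h
        fm = f(w)
        w[i] = orig          # restore the parameter
        grad[i] = (fp - fm) / (2 * h)
    return grad

w = np.array([1.0, -2.0, 3.0])
analytic = 2 * w                   # exact derivative of sum(w^2)
numeric = numeric_gradient(f, w)
print(np.max(np.abs(analytic - numeric)))  # very small
```

This "gradient check" pattern is also how analytic gradients are commonly debugged in practice.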
Blue = low loss (in the loss-landscape visualization).
○ SGD (Stochastic Gradient Descent)
Sample a random subset (minibatch) of the training data at each step.
One full pass through the training set is called an "epoch."
Sampling without replacement guarantees every example is seen once per epoch.
The larger the batch, the more smoothly the loss decreases.
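A minibatch SGD sketch on made-up linear-regression data (the data, learning rate, and batch size are all assumptions for illustration). Note the per-epoch shuffle, which samples without replacement:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear-regression data: y = X @ w_true + small noise (assumed setup).
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(50):                    # one epoch = one full pass
    perm = rng.permutation(len(X))         # shuffle: sample without replacement
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # MSE gradient on the minibatch
        w -= lr * grad
print(w)  # close to w_true
```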
○ Feature transforms
Problem with linear classifiers: they cannot separate classes that are not linearly separable in the original feature space.
Solution: feature transforms.
Feature transforms, also known as feature engineering, convert the original input features into a new set of features that are more informative or useful for a particular task.
In optimization, feature transforms are often used to improve model performance by making the input data more relevant to the problem being solved. Common techniques include scaling, normalization, one-hot encoding, and creating new features by combining existing ones.
For example, in linear regression, non-linear relationships can be made linear by applying mathematical functions such as logarithmic or exponential transformations to the inputs.
In short, feature transforms manipulate the input features so that a (linear) model can fit the data better.
E.g. color histogram (counts how many image pixels fall into each color bin).
E.g. Histogram of Oriented Gradients (HOG).
(lec4 ppt43;47:22)
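A minimal color-histogram feature sketch, assuming an H×W×3 uint8 RGB image: each channel's pixel values are binned, and the per-channel counts are concatenated into one fixed-length feature vector:

```python
import numpy as np

def color_histogram(img, bins=8):
    # img: H x W x 3 uint8 RGB image. Count how many pixels fall into
    # each color bin per channel, then concatenate the three histograms.
    feats = []
    for c in range(3):
        hist, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist)
    return np.concatenate(feats).astype(np.float64)

# Tiny synthetic "image" just to exercise the function.
img = (np.arange(4 * 4 * 3).reshape(4, 4, 3) % 256).astype(np.uint8)
feat = color_histogram(img)
print(feat.shape)  # (24,) — fixed-length feature regardless of image size
```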
● Neural networks
Neural networks are a type of machine learning model that is inspired by the structure of the human brain. They consist of layers of interconnected nodes that perform computations on the input data. Each node applies an activation function to a weighted sum of the input values and passes the result to the next layer. Neural networks can be used for a variety of tasks, including image and speech recognition, natural language processing, and game playing.
Q: What is a neural network and how does it work?
A: A neural network is a type of machine learning model that is inspired by the structure of the human brain. It consists of layers of interconnected nodes that perform computations on the input data. Each node applies an activation function to a weighted sum of the input values and passes the result to the next layer. Neural networks can be trained using backpropagation and optimization algorithms to learn the best set of weights for the given task.
Q: What are some common activation functions used in neural networks?
A: Some common activation functions used in neural networks include sigmoid, tanh, ReLU, and softmax. These functions are used to introduce nonlinearity into the model, which is important for capturing complex patterns in the data.
Q: How can neural networks be used for image classification?
A: Neural networks can be used for image classification by treating the pixels of an image as input features and passing them through a series of convolutional and pooling layers to extract meaningful features. These features are then flattened and passed through one or more fully connected layers before producing a classification output.
A neural network stacks multiple linear classifiers.
A single linear classifier cannot handle multiple modes of a class.
E.g. 100 templates (the hidden-layer neurons).
Each row of the first-layer weights can be flattened (reshaped) into a template image.
The activation function is essential: it handles non-linearity; without it, the NN ends up as a linear classifier again!
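The stacking idea above as a sketch: a two-layer network with 100 hidden "template" neurons and a ReLU activation (the sizes are assumptions). Dropping the ReLU would let the two matrix multiplies collapse into a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two-layer network: hidden layer of 100 "templates", ReLU non-linearity.
D, H, C = 3072, 100, 10            # input dim, hidden units, classes (assumed)
W1 = 0.01 * rng.normal(size=(D, H))
W2 = 0.01 * rng.normal(size=(H, C))

x = rng.normal(size=(1, D))        # one flattened image
h = np.maximum(0, x @ W1)          # ReLU; without it, W1 @ W2 is one linear map
scores = h @ W2                    # class scores
print(scores.shape)  # (1, 10)
```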
● Computational graphs
Computational graphs are a way to represent the flow of data and computations in a machine learning model. They consist of nodes that represent mathematical operations and edges that represent the flow of data between the nodes. Computational graphs are used to perform automatic differentiation, which is the process of computing the gradients of the loss function with respect to the model parameters.
A data structure that makes it convenient to compute gradients in a modular form.
Sample Question:
What is a computational graph and why is it useful in machine learning?
A computational graph is a way to represent the flow of data and computations in a machine learning model. It is useful because it allows for automatic differentiation, which is the process of computing the gradients of the loss function with respect to the model parameters. By using a computational graph, the gradients can be computed more efficiently and accurately than by hand.
How are gradients computed using a computational graph?
Gradients are computed using the chain rule of calculus applied to the nodes in the computational graph. The gradient of the loss function with respect to each node is computed recursively by multiplying the gradients of the nodes that depend on it.
What is automatic differentiation and how does it relate to computational graphs?
Automatic differentiation is the process of computing the gradients of a function with respect to its inputs using the chain rule of calculus. It relates to computational graphs because a computational graph can be used to represent the function and its dependencies, making it possible to compute the gradients efficiently.
Computing the gradient of a complicated structure by hand is tedious.
This diagram is called a computational graph;
we use this graph, together with the backpropagation technique, to compute complicated gradients!
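A tiny worked example in the spirit of the slides: the graph for f(x, y, z) = (x + y) · z, with a forward pass over the nodes and a backward pass applying the chain rule at each node (the specific input values are assumptions):

```python
# Computational graph for f(x, y, z) = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: evaluate each node.
q = x + y          # intermediate node: q = 3
f = q * z          # output node: f = -12

# Backward pass: chain rule, one node at a time.
df_dq = z          # d(q*z)/dq = z
df_dz = q          # d(q*z)/dz = q
df_dx = df_dq * 1  # dq/dx = 1, so df/dx = df/dq * dq/dx
df_dy = df_dq * 1  # dq/dy = 1
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```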
● Backpropagation
Backpropagation is an algorithm used to compute the gradients of the loss function with respect to the model parameters in a neural network. It works by propagating the error backwards through the network, using the chain rule of calculus to compute the gradients at each layer. Backpropagation is used in conjunction with optimization algorithms to update the model parameters in the direction that reduces the loss.
Sample questions:
What is backpropagation and how does it work?
Backpropagation is an algorithm used to compute the gradients of the loss function with respect to the model parameters in a neural network. It works by propagating the error backwards through the network, using the chain rule of calculus to compute the gradients at each layer. The gradients are then used to update the weights using an optimization algorithm.
How is the chain rule of calculus used in backpropagation?
The chain rule of calculus is used to compute the gradients of the loss function with respect to the model parameters in a neural network. It is applied recursively by computing the gradients of the output layer first, and then propagating the gradients backwards through the network, using the chain rule to compute the gradients at each layer. Specifically, for each layer, the gradients with respect to the output of the layer are computed by multiplying the gradients from the layer above it with the local gradients of the layer's activation function. These gradients are then used to compute the gradients with respect to the layer's weights and biases.
What is the relationship between backpropagation and optimization algorithms?
Backpropagation is used to compute the gradients of the loss function with respect to the model parameters in a neural network, while optimization algorithms are used to update the parameters based on the gradients. Backpropagation computes the gradients efficiently by using the chain rule of calculus and propagating the error backwards through the network, while optimization algorithms update the weights in the direction that reduces the loss. Together, backpropagation and optimization form the backbone of training neural networks.
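A sketch of backpropagation through a tiny two-layer net with squared-error loss (all shapes and values are assumptions), ending with a numeric gradient check on one weight:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-layer net with squared-error loss (assumed setup).
x = rng.normal(size=(1, 4))
W1 = rng.normal(size=(4, 5))
W2 = rng.normal(size=(5, 1))
target = np.array([[1.0]])

# Forward pass.
h = np.maximum(0, x @ W1)          # ReLU hidden layer
out = h @ W2
loss = ((out - target) ** 2).item()

# Backward pass: apply the chain rule layer by layer.
dout = 2 * (out - target)          # dLoss/dout
dW2 = h.T @ dout                   # gradient w.r.t. the second-layer weights
dh = dout @ W2.T                   # propagate the error back through W2
dh[h <= 0] = 0                     # ReLU gate: zero where the unit was inactive
dW1 = x.T @ dh                     # gradient w.r.t. the first-layer weights

# Numeric check on one entry of W1 (central differences).
i, j, eps = 0, 0, 1e-6
W1[i, j] += eps
lp = ((np.maximum(0, x @ W1) @ W2 - target) ** 2).item()
W1[i, j] -= 2 * eps
lm = ((np.maximum(0, x @ W1) @ W2 - target) ** 2).item()
W1[i, j] += eps                    # restore the weight
err = abs((lp - lm) / (2 * eps) - dW1[i, j])
print(err)  # very small
```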