Week 4

Class: C1W4
Materials: https://www.coursera.org/learn/neural-networks-deep-learning/home/week/4

Deep Neural Network

Deep L-layer neural network

What is a deep neural network?

The image below shows several types of neural networks: logistic regression is a shallow (one-layer) network, while a network with several hidden layers (e.g., 5 or more) is a deep neural network.

Deep neural network notation

Below is an image illustrating the notation we use: $L$ denotes the number of layers, and $n^{[l]}$ denotes the number of units in layer $l$.

The neural network below has 4 layers in total, 3 of them hidden, with 3 inputs and 1 output.

Forward Propagation in a Deep Network

The general forward prop calculation is denoted as:

$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$

$a^{[l]} = g^{[l]}(z^{[l]})$

where $a^{[l]}$ is the activation obtained by applying layer $l$'s activation function $g^{[l]}$ to $z^{[l]}$, and at the output layer

$a^{[L]} = \hat{y}$

Vectorization

For $l = 1, \ldots, 4$:

$X = A^{[0]}$

$Z^{[1]} = W^{[1]} A^{[0]} + b^{[1]}$

$A^{[1]} = g^{[1]}(Z^{[1]})$

$\ldots$

$\hat{Y} = g^{[4]}(Z^{[4]}) = A^{[4]}$

Our notation allows us to replace the lowercase $a$ and $z$ with $A$ and $Z$ to obtain the vectorized version. Even with vectorization, you still need an explicit for loop over the layers; currently there is no way around it. A sketch of this loop is shown below.
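
As a concrete illustration, here is a minimal sketch of this vectorized forward pass with the explicit loop over layers. The function name forward_prop, the layout of the parameters dictionary, and the ReLU/sigmoid choices are my own assumptions for the sketch, not code given in the course.

    import numpy as np

    def forward_prop(X, parameters, L):
        """Vectorized forward pass with an explicit for loop over the layers l = 1..L."""
        A = X                                      # A^[0] = X
        caches = []
        for l in range(1, L + 1):
            W = parameters['W' + str(l)]
            b = parameters['b' + str(l)]
            Z = W @ A + b                          # Z^[l] = W^[l] A^[l-1] + b^[l]
            caches.append((A, W, b, Z))            # cache values needed later by backprop
            if l == L:
                A = 1 / (1 + np.exp(-Z))           # sigmoid on the output layer: A^[L] = Y_hat
            else:
                A = np.maximum(0, Z)               # ReLU on the hidden layers: A^[l] = g^[l](Z^[l])
        return A, caches

With a parameters dictionary built like the initialization loop in question 5 of the Q&A below, calling forward_prop(X, parameters, L) returns $\hat{Y}$ with shape $(1, m)$.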

When working with deep neural networks, always take note of the shapes of the matrices you are working with.

Getting your matrix dimensions right

The general rule to check when you're implementing layer $l$ is that the parameter matrices have the following dimensions:

$W^{[l]}: \ (n^{[l]}, n^{[l-1]})$

$b^{[l]}: \ (n^{[l]}, 1)$

Therefore:

$a^{[l]} = g^{[l]}(z^{[l]})$, with $a^{[L]} = \hat{y}$ at the output layer

Note that "a" and "z have dimensions (n[l],1)(n^{[l]},1)

In general, the number of neurons in the previous layer gives us the number of columns of the weight matrix, and the number of neurons in the current layer gives us the number of rows in the weight matrix.

Vectorized implementation

In the vectorized implementation, $Z^{[l]}$ and $A^{[l]}$ have dimensions $(n^{[l]}, m)$, where $m$ is the number of examples. Python broadcasting stretches $b^{[l]}$ from $(n^{[l]}, 1)$ to $(n^{[l]}, m)$ when it is added to $W^{[l]} A^{[l-1]}$; $W^{[l]}$ keeps its $(n^{[l]}, n^{[l-1]})$ shape and $X = A^{[0]}$ has shape $(n^{[0]}, m)$, so the product plus bias gives $Z^{[l]}$ of shape $(n^{[l]}, m)$.
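
To make the shape bookkeeping concrete, here is a small sketch of the vectorized dimensions; the layer sizes and the number of examples m are made-up values for illustration.

    import numpy as np

    m = 10                                    # number of examples (assumed for the sketch)
    layer_dims = [3, 5, 4, 1]                 # n^[0], n^[1], n^[2], n^[3] (assumed sizes)

    A = np.random.randn(layer_dims[0], m)     # A^[0] = X has shape (n^[0], m)
    for l in range(1, len(layer_dims)):
        W = np.random.randn(layer_dims[l], layer_dims[l - 1])   # (n^[l], n^[l-1])
        b = np.zeros((layer_dims[l], 1))                        # (n^[l], 1)
        Z = W @ A + b                         # broadcasting stretches b across the m columns
        assert Z.shape == (layer_dims[l], m)  # Z^[l] and A^[l] are (n^[l], m)
        A = np.maximum(0, Z)                  # ReLU keeps the (n^[l], m) shape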

Why deep representations?

Why do deep neural networks work well compared to shallower alternatives? Intuitively, the earlier layers learn simple features (e.g., edges in an image) and the deeper layers compose them into increasingly complex features (e.g., parts of faces, then whole faces). Circuit theory also suggests that some functions can be computed by a small deep network but would require exponentially more units in a shallow one.

Building blocks of deep neural networks

A key takeaway: the basic building block for implementing a deep neural network is, for each layer, a forward propagation step and a corresponding backward propagation step, plus a cache (containing $z^{[l]}$, among other values) that passes information from the forward step to the matching backward step.

Forward and Backward Propagation

One thing to note is that the forward function for layer $l$ takes $a^{[l-1]}$ as input, outputs $a^{[l]}$, and caches $(z^{[l]}, W^{[l]}, b^{[l]})$ for later use.

The backward function for layer $l$ takes $da^{[l]}$ (and the cache) as input, and outputs $da^{[l-1]}$, $dW^{[l]}$ and $db^{[l]}$.

Note: $dW^{[l]} = dz^{[l]} \, a^{[l-1]T}$
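
Below is a minimal sketch of the backward step for the linear part of one layer, vectorized over m examples. The function name is my own, and the 1/m averaging is the vectorized counterpart of the single-example formula above.

    import numpy as np

    def linear_backward(dZ, A_prev, W):
        """Backward step for layer l's linear part, vectorized over m examples."""
        m = A_prev.shape[1]
        dW = (dZ @ A_prev.T) / m                     # dW^[l] = (1/m) dZ^[l] A^[l-1]T
        db = np.sum(dZ, axis=1, keepdims=True) / m   # db^[l] = (1/m) row-wise sum of dZ^[l]
        dA_prev = W.T @ dZ                           # dA^[l-1] = W^[l]T dZ^[l]
        return dA_prev, dW, db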

Summary

Suppose you have a 3-layer neural network whose parameters are initialised with random values. Forward propagation takes the input $X$, computes $\hat{Y}$ layer by layer, and caches the values of $Z^{[l]}$ for each layer along the way; you then compute the loss $L(\hat{Y}, Y)$.

Note that backward propagation is also initialised with a value, namely $da^{[L]}$, the derivative of the loss with respect to the final activation. There is no need to compute $da^{[0]}$, since that would be the derivative with respect to the input $X$, which we are not trying to learn; this is why it is crossed off in the image.
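
To tie the summary together, here is a toy end-to-end sketch of my own (one hidden layer rather than three, with invented sizes and data): forward propagation with caching, the loss, backward propagation starting from the output, and a gradient-descent update.

    import numpy as np

    np.random.seed(1)
    m, n_x, n_1 = 8, 3, 4                                  # examples, input units, hidden units (assumed)
    X = np.random.randn(n_x, m)
    Y = (np.random.rand(1, m) > 0.5).astype(float)

    W1 = np.random.randn(n_1, n_x) * 0.01; b1 = np.zeros((n_1, 1))
    W2 = np.random.randn(1, n_1) * 0.01;   b2 = np.zeros((1, 1))
    alpha = 0.1                                            # learning rate (a hyperparameter)

    for i in range(500):
        # forward propagation, caching Z1 and A1 for the backward pass
        Z1 = W1 @ X + b1;  A1 = np.maximum(0, Z1)          # ReLU hidden layer
        Z2 = W2 @ A1 + b2; A2 = 1 / (1 + np.exp(-Z2))      # sigmoid output, A2 = Y_hat
        loss = -np.mean(Y * np.log(A2 + 1e-8) + (1 - Y) * np.log(1 - A2 + 1e-8))
        # backward propagation, starting from dZ2 = A2 - Y at the output layer
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (Z1 > 0)                      # ReLU derivative
        dW1 = dZ1 @ X.T / m;  db1 = dZ1.sum(axis=1, keepdims=True) / m
        # gradient descent update of the parameters W and b
        W1 -= alpha * dW1; b1 -= alpha * db1
        W2 -= alpha * dW2; b2 -= alpha * db2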

Parameters vs Hyperparameters

Hyperparameters control how your learning algorithm behaves, for example: the learning rate $\alpha$, the number of iterations, the number of hidden layers $L$, the number of hidden units $n^{[l]}$, and the choice of activation function.

These hyperparameters determine the final values of the parameters $W$ and $b$.

ALWAYS REMEMBER:

The difference between np.random.rand and np.random.randn

See the explanation here: https://stackoverflow.com/a/47241066 and a graphical explanation here: https://stackoverflow.com/a/56829859
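
A quick sketch of the difference (these are real NumPy functions; the shapes here are just an example):

    import numpy as np

    u = np.random.rand(3, 2)     # uniform samples from [0, 1): always non-negative
    g = np.random.randn(3, 2)    # standard normal samples: mean ~0, std ~1, can be negative

    print(u.min() >= 0 and u.max() < 1)   # True: rand never produces negative values
    print((g < 0).any())                  # usually True: randn also produces negative values

This matters for initialization: np.random.randn gives small positive and negative values centred on zero, which is why it appears in the parameter-initialization loop in question 5 below.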

What does this have to do with the brain?

Q & A

  1. What is the "cache" used for in our implementation of forward propagation and backward propagation?
    • We use it to pass variables computed during forward propagation to the corresponding backward propagation step. It contains useful values for backward propagation to compute derivatives.

      Justification: Correct, the "cache" records values from the forward propagation units and sends them to the backward propagation units, where they are needed to compute the chain-rule derivatives.

  2. Among the following, which ones are "hyperparameters"?
    • number of layers LL in the neural network
    • number of iterations
    • learning rate α\alpha
    • size of the hidden layers n[l]n^{[l]}
  3. Which of the following statements is true?
    • The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.
  4. Vectorization allows you to compute forward propagation in an L-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers l = 1, 2, …, L. False
  5. Assume we store the values for $n^{[l]}$ in an array called layer_dims, as follows: layer_dims = [n_x, 4, 3, 2, 1]. So layer 1 has four hidden units, layer 2 has 3 hidden units, and so on. Which of the following for-loops will allow you to initialize the parameters for the model?
    for i in range(1, len(layer_dims)):
        parameter['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1]) * 0.01
        parameter['b' + str(i)] = np.random.randn(layer_dims[i], 1) * 0.01

6. Consider the following neural network.

How many layers does this network have?

7. During forward propagation, in the forward function for a layer $l$ you need to know what the activation function in that layer is (sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what the activation function for layer $l$ is, since the gradient depends on it. True

Justification: Yes, as you saw in week 3, each activation function has a different derivative. Thus, during backpropagation you need to know which activation was used in the forward propagation to be able to compute the correct derivative.

8. There are certain functions with the following properties: (i) to compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) to compute it using a deep network circuit, you need only an exponentially smaller network. True

9. Consider the following 2 hidden layer neural network:

Which of the following statements are True? (Check all that apply).

10. Whereas the previous question used a specific network, in the general case, what is the dimension of $W^{[l]}$, the weight matrix associated with layer $l$?
    • $W^{[l]}$ has dimension $(n^{[l]}, n^{[l-1]})$