Deep Learning | Neural Networks and Deep Learning - Week 2: logistic-regression-as-a-neural-network

Note: This is my personal note for the 2nd week of the course neural-networks-deep-learning; the copyright belongs to deeplearning.ai.
01_logistic-regression-as-a-neural-network
01_binary-classification
Binary Classification
In a binary classification problem, the result is a discrete value output. For example:

  • account hacked (1) or compromised (0)
  • a tumor malignant (1) or benign (0)
Example: Cat vs Non-Cat
The goal is to train a classifier whose input is an image represented by a feature vector $x$, and which predicts whether the corresponding label $y$ is 1 or 0: in this case, whether this is a cat image (1) or a non-cat image (0).

An image is stored in the computer as three separate matrices corresponding to the red, green, and blue color channels of the image. The three matrices have the same size as the image; for example, if the resolution of the cat image is 64 × 64 pixels, the three matrices (RGB) are each 64 × 64.
The value in each cell represents the pixel intensity, which will be used to create a feature vector of dimension $n_x$. In pattern recognition and machine learning, a feature vector represents an object, in this case, a cat or no cat.
To create a feature vector $x$, the pixel intensity values are "unrolled" or "reshaped" for each color. The dimension of the input feature vector $x$ is $n_x = 64 \times 64 \times 3 = 12288$.
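As a quick illustration, the unrolling step can be written with numpy as follows (the image here is randomly generated, since the actual cat image is not reproduced in these notes):

```python
import numpy as np

# A stand-in 64x64 RGB image with pixel intensities in [0, 255].
image = np.random.randint(0, 256, size=(64, 64, 3))

# Unroll (reshape) the three color channels into a single column vector.
x = image.reshape(64 * 64 * 3, 1)

print(x.shape)  # (12288, 1), i.e. n_x = 64 * 64 * 3 = 12288
```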

(Figure: notation)
02_logistic-regression
Logistic regression is a learning algorithm used in a supervised learning problem when the output labels $y$ are all either zero or one. The goal of logistic regression is to minimize the error between its predictions and the training data.
Example: Cat vs Non-Cat
Given an image represented by a feature vector $x$, the algorithm will evaluate the probability of a cat being in that image:
Given $x$, $\hat{y} = P(y = 1 \mid x)$, where $0 \le \hat{y} \le 1$.
The parameters used in Logistic regression are:
  • The input features vector: $x \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features
  • The training label: $y \in \{0, 1\}$
  • The weights: $w \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features
  • The threshold: $b \in \mathbb{R}$
  • The output: $\hat{y} = \sigma(w^T x + b)$
  • Sigmoid function: $s = \sigma(w^T x + b) = \sigma(z) = \frac{1}{1 + e^{-z}}$

$w^T x + b$ is a linear function (like $ax + b$), but since we are looking for a probability, constrained to $[0, 1]$, the sigmoid function is used. The function is bounded between 0 and 1, as shown in the graph above; a short numpy sketch follows the observations below.
Some observations from the graph:
  • If $z$ is a large positive number, then $\sigma(z) \approx 1$
  • If $z$ is a large negative number, then $\sigma(z) \approx 0$
  • If $z = 0$, then $\sigma(z) = 0.5$
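A minimal numpy sketch of the sigmoid and of a single prediction $\hat{y} = \sigma(w^T x + b)$ (the parameter values and sizes here are placeholders, not taken from the course):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# The observations above, numerically:
print(sigmoid(100))   # ~1.0 for a large positive z
print(sigmoid(-100))  # ~0.0 for a large negative z
print(sigmoid(0))     # exactly 0.5 at z = 0

# A single prediction y_hat = sigmoid(w^T x + b) with placeholder parameters.
n_x = 12288
w = np.zeros((n_x, 1))        # weights
b = 0.0                       # threshold / bias
x = np.random.randn(n_x, 1)   # one input feature vector
y_hat = sigmoid(np.dot(w.T, x) + b)   # shape (1, 1)
```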
(Figure: notation)
03_logistic-regression-cost-function

04_gradient-descent

05_06_derivatives



07_computation-graph
You've heard me say that the computations of a neural network are organized in terms of a forward pass or a forward propagation step, in which we compute the output of the neural network, followed by a backward pass or back propagation step, which we use to compute gradients or compute derivatives. The computation graph explains why it is organized this way.
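As a tiny illustration of this idea (the function and numbers below are just an example, not taken from the notes above), the forward pass evaluates the graph node by node, and the backward pass applies the chain rule in reverse:

```python
# Toy computation graph for J = 3 * (a + b * c), split into intermediate nodes.

def forward(a, b, c):
    u = b * c        # first node
    v = a + u        # second node
    J = 3 * v        # final output node
    return u, v, J

def backward(a, b, c):
    # Backward pass: chain rule from J back toward the inputs.
    dJ_dv = 3                  # J = 3v
    dJ_du = dJ_dv * 1          # v = a + u  =>  dv/du = 1
    dJ_da = dJ_dv * 1          # dv/da = 1
    dJ_db = dJ_du * c          # u = b * c  =>  du/db = c
    dJ_dc = dJ_du * b          # du/dc = b
    return dJ_da, dJ_db, dJ_dc

u, v, J = forward(5, 3, 2)     # J = 3 * (5 + 3*2) = 33
print(J, backward(5, 3, 2))    # 33 (3, 6, 9)
```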



09_logistic-regression-gradient-descent
Welcome back. In this video, we'll talk about how to compute derivatives so that you can implement gradient descent for logistic regression. The key takeaways will be the equations you need in order to implement gradient descent for logistic regression. In this video, I want to do this computation using the computation graph. I have to admit, using the computation graph is a little bit of an overkill for deriving gradient descent for logistic regression, but I want to start explaining things this way to get you familiar with these ideas so that, hopefully, they will make a bit more sense when we talk about fully fledged neural networks. With that, let's dive into gradient descent for logistic regression.

In logistic regression, what we want to do is modify the parameters, $w$ and $b$, in order to reduce this loss.

$$da = \frac{\partial L}{\partial a} = \frac{\partial}{\partial a}\left\{-\big(y\log(a) + (1-y)\log(1-a)\big)\right\} = -\frac{y}{a} + \frac{1-y}{1-a}$$
$$dz = \frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)\cdot a(1-a) = a - y$$
$$dw_1 = \frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial z}\cdot\frac{\partial z}{\partial w_1} = x_1 \cdot dz = x_1(a - y)$$
$$dw_2 = \frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial z}\cdot\frac{\partial z}{\partial w_2} = x_2 \cdot dz = x_2(a - y)$$
$$db = \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}\cdot\frac{\partial z}{\partial b} = 1 \cdot dz = a - y$$
$$w_1 := w_1 - \alpha\, dw_1$$
$$w_2 := w_2 - \alpha\, dw_2$$
$$b := b - \alpha\, db$$
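Translated into code, one gradient-descent step on a single training example with two features might look like this (the feature values, label, initial parameters, and learning rate below are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder example (x1, x2, y) and initial parameters.
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.1, -0.2, 0.0
alpha = 0.01                    # learning rate

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# Backward pass: derivatives of the loss L(a, y) with respect to each parameter
dz = a - y
dw1 = x1 * dz
dw2 = x2 * dz
db = dz

# Gradient-descent update
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
```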
10_gradient-descent-on-m-examples
In a previous video, you saw how to compute derivatives and implement gradient descent with respect to just one training example for logistic regression. Now we want to do it for m training examples.
(Figure: one single step of gradient descent)
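Written out with explicit loops (the vectorized version comes in the next section), one such step over all m examples might look like the following sketch; the shapes and helper names here are assumptions, not code from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(X, Y, w, b, alpha):
    # One step of gradient descent over m examples, un-vectorized.
    # X has shape (n_x, m), Y has shape (1, m).
    n_x, m = X.shape
    J = 0.0                      # cost accumulated over the m examples
    dw = np.zeros((n_x, 1))      # accumulated gradients
    db = 0.0
    for i in range(m):
        x_i = X[:, i].reshape(n_x, 1)
        y_i = Y[0, i]
        z = np.dot(w.T, x_i).item() + b
        a = sigmoid(z)
        J += -(y_i * np.log(a) + (1 - y_i) * np.log(1 - a))
        dz = a - y_i
        dw += x_i * dz
        db += dz
    J, dw, db = J / m, dw / m, db / m
    w = w - alpha * dw
    b = b - alpha * db
    return w, b, J
```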
02_python-and-vectorization
01_vectorization
Welcome back. Vectorization is basically the art of getting rid of explicit for loops in your code. In the deep learning era, in practice, you often find yourself training on relatively large data sets, because that's when deep learning algorithms tend to shine. And so, it's important that your code runs very quickly, because otherwise, if it's running on a big data set, your code might take a long time to run and you just find yourself waiting a very long time to get the result. So in the deep learning era, I think the ability to perform vectorization has become a key skill.


The vectorized version took about 1.5 milliseconds and the explicit for loop about 481 milliseconds, again, about 300 times slower. With that kind of slowdown, it's the difference between your code taking maybe one minute to run versus taking, say, five hours to run. And when you are implementing deep learning algorithms, you can really get a result back faster: it will be much faster if you vectorize your code. Some of you might have heard that a lot of scalable deep learning implementations are done on a GPU, or graphics processing unit. But all the demos I did just now in the Jupyter notebook were actually on the CPU. It turns out that both GPU and CPU have parallelization instructions, sometimes called SIMD instructions, which stands for single instruction, multiple data. What this basically means is that if you use built-in numpy functions, or other functions that don't require you to explicitly implement a for loop, it enables Python/numpy to take much better advantage of parallelism to do your computations much faster. And this is true for computations on both CPUs and GPUs. It's just that GPUs are remarkably good at these SIMD calculations, but the CPU is actually also not too bad at that, maybe just not as good as a GPU. You're seeing how vectorization can significantly speed up your code. The rule of thumb to remember is: whenever possible, avoid using explicit for loops.
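The timing demo referred to above is essentially along these lines (absolute timings will of course vary from machine to machine):

```python
import time
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Vectorized dot product
tic = time.time()
c = np.dot(a, b)
toc = time.time()
print("Vectorized version:", 1000 * (toc - tic), "ms")

# Explicit for loop computing the same dot product
c = 0.0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()
print("For loop:", 1000 * (toc - tic), "ms")
```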
02_more-vectorization-examples



03_vectorizing-logistic-regression
We have talked about how vectorization lets you speed up your code significantly. In this video, we'll talk about how you can vectorize the implementation of logistic regression, so you can process an entire training set, that is, implement a single iteration of gradient descent with respect to an entire training set, without using even a single explicit for loop. I'm super excited about this technique, and when we talk about neural networks later, we will also be able to do it without using even a single explicit for loop.
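A sketch of the vectorized forward step described here, with X holding the m training examples as columns (the sizes below are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 12288, 100             # placeholder sizes
X = np.random.randn(n_x, m)     # each column is one training example
w = np.zeros((n_x, 1))
b = 0.0

Z = np.dot(w.T, X) + b          # shape (1, m); the scalar b is broadcast across all m columns
A = sigmoid(Z)                  # shape (1, m): predictions for all m examples at once
```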

Here are the details of Python broadcasting.
04_vectorizing-logistic-regressions-gradient-output
In the previous video, you saw how you can use vectorization to compute the predictions, the lowercase a's, for an entire training set all at the same time. In this video, you'll see how you can use vectorization to also perform the gradient computations for all m training examples, again, all sort of at the same time. And then at the end of this video, we'll put it all together and show how you can derive a very efficient implementation of logistic regression.
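Putting the forward and backward computations together, one fully vectorized iteration of gradient descent might look like this (the data, shapes, and learning rate are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m, alpha = 12288, 100, 0.01
X = np.random.randn(n_x, m)                 # training examples as columns
Y = np.random.randint(0, 2, size=(1, m))    # labels in {0, 1}
w = np.zeros((n_x, 1))
b = 0.0

# Forward pass over the whole training set
Z = np.dot(w.T, X) + b      # shape (1, m)
A = sigmoid(Z)              # shape (1, m)

# Backward pass (vectorized gradients)
dZ = A - Y                  # shape (1, m)
dw = np.dot(X, dZ.T) / m    # shape (n_x, 1)
db = np.sum(dZ) / m         # scalar

# Gradient-descent update
w = w - alpha * dw
b = b - alpha * db
```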


05_broadcasting-in-python




Summary: Python/numpy automatically expands two arrays (or an array and a number) to the same dimensions and then operates element-wise.
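A couple of small examples of this broadcasting behavior (the arrays are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
v = np.array([[10.0, 20.0, 30.0]])   # shape (1, 3)

# The (1, 3) row vector is broadcast (copied) down the two rows of A.
print(A + v)    # [[11. 22. 33.]
                #  [14. 25. 36.]]

# A scalar is broadcast to every element.
print(A * 2)    # [[ 2.  4.  6.]
                #  [ 8. 10. 12.]]
```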
06_a-note-on-python-numpy-vectors
The ability of Python to allow you to use broadcasting operations and, more generally, the great flexibility of the Python/numpy programming language is, I think, both a strength and a weakness of the language. I think it's a strength because of the greater expressivity of the language: the great flexibility lets you get a lot done even with just a single line of code. But it's also a weakness because, with broadcasting and this great amount of flexibility, it's sometimes possible to introduce very subtle or very strange-looking bugs if you're not familiar with all of the intricacies of how features like broadcasting work. For example, if you take a column vector and add it to a row vector, you would expect it to throw up a dimension-mismatch or type error or something, but you might actually get back a matrix as the sum of a row vector and a column vector. So there is an internal logic to these strange effects of Python, but if you're not familiar with Python, I've seen some students end up with very strange, very hard-to-find bugs. So what I want to do in this video is share with you a couple of tips and tricks that have been very useful for me to simplify and eliminate the strange-looking bugs in my own code. And I hope that with these tips and tricks, you'll also be able to write bug-free Python and numpy code much more easily.
To illustrate one of the less intuitive effects of Python-Numpy, especially how you construct vectors in Python-Numpy, let me do a quick demo.
(Figure: rank 1 arrays)
(Figure: practical tips)
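A rough sketch of the kind of demo and tips referred to above, rank 1 arrays versus explicit column vectors (the sizes are arbitrary):

```python
import numpy as np

# A "rank 1 array": shape (5,), which is neither a row nor a column vector.
a = np.random.randn(5)
print(a.shape)            # (5,)
print(np.dot(a, a.T))     # a.T does nothing here; this is just a single number

# Tip: prefer explicit column (or row) vectors instead.
a = np.random.randn(5, 1)
print(a.shape)            # (5, 1)
print(np.dot(a, a.T))     # now a proper (5, 5) outer product

# Tip: cheap sanity check on the dimensions you expect.
assert a.shape == (5, 1)

# Tip: if you do end up with a rank 1 array, reshape it.
a = np.random.randn(5).reshape(5, 1)
```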
07_quick-tour-of-jupyter-ipython-notebooks
With everything you've learned, you're just about ready to tackle your first programming assignment. Before you do that, let me just give you a quick tour of iPython notebooks in Coursera.
Please see the video for details.
08_explanation-of-logistic-regression-cost-function-optional

