
Deep Learning for Everybody?

A Review of Andrew Ng’s New Course

I recently finished the first course in Andrew Ng’s new Deep Learning Specialization on Coursera. The course is called Neural Networks and Deep Learning and walks you through the mechanics of neural networks by having you implement them in Python. It’s part of his ambition to bring deep learning to a wider audience and is a follow-up to his popular Machine Learning course on Coursera. People who have previously gone through the neural networks unit in that course will find a lot of overlap (although no more Octave!).

I’ve used MOOCs like Coursera, Udacity and MITx extensively in the past on my way toward becoming a data scientist, but this was the first course I decided to pay for. I had a number of reasons. Unlike other courses I’ve done in the past, this one only gives you access to the assignments if you pay. In addition, I was curious about Coursera’s “specialization” offering and wanted to give it a try to see if it might be something worth investing in in the future. Finally, I wanted the satisfaction of receiving some kind of official certification, which you get upon completing the course. I’ve learned the basics of neural networks before through a mix of classes and online tutorials, but I didn’t have anything to show for it. Getting the certificate, I thought, might be a decent way to prove I have some knowledge of the subject, although I think the jury is still out on how much value potential employers give to these kinds of certifications.

My overall review of this first course in the specialization is mixed. As always, Andrew Ng does an excellent job teaching complex concepts through clear and methodical lectures. For those with more than a beginner’s understanding of Python and of mathematical concepts like calculus and dot products, however, the course leaves much to be desired. The assignments in particular were frustratingly easy. If you make them a little tougher using a strategy I suggest further on, though, you can make them worth your while.

I’ll first describe the format of the course a little more, and then what I liked about it and what needs some improvement.

Format

The course is broken up into four units that are each supposed to take one week, but you can get through them much more quickly if you have the time. Ng starts with a cursory introduction to the idea of neural networks and why they’re popular. The next unit includes a Numpy/vectorization overview and then moves into logistic regression and how it can be seen as a one-layer neural network. The final two units are about implementing a shallow and a deep neural network, respectively. Each unit consists of about 5–20 video lectures (each 5–15 minutes), some quizzes and a programming assignment (except the first intro unit, which has no assignment).

The programming assignments are done through Coursera-hosted Jupyter notebooks, which makes sense for the format of the course but can sometimes be frustrating when they freeze. I’m also not a big fan of using notebooks in general. The assignments mostly focus on implementing neural networks using Python and Numpy, and briefly cover an example of using them for image classification.

What I Liked

Clear, concise videos that explain concepts intuitively. This is Andrew Ng’s strong suit, and I think he did a great job explaining concepts like activation functions and backpropagation. He makes good use of examples and works through some concrete sample inputs when possible. This is what drew me to his videos in the first place, as few tutorials do this, and I find that a single worked-through concrete example is one of the best ways to understand a mathematical or computational process. A good example is his video explaining the various activation functions such as hyperbolic tangent and ReLU. (On that note, a great resource if you’re struggling to understand backpropagation is Matt Mazur’s blog post going through all the computations with real numbers.)
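
In that spirit, here’s a tiny concrete example of my own (not from the course) showing what those two activations do to a handful of sample values:

```python
import numpy as np

# A few sample pre-activation values to push through each activation
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

tanh_out = np.tanh(z)        # squashes values into (-1, 1): roughly [-0.96, -0.46, 0, 0.46, 0.96]
relu_out = np.maximum(0, z)  # zeroes out negatives: [0, 0, 0, 0.5, 2]

print(tanh_out)
print(relu_out)
```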

[Screenshot: Coursera deep learning course video]

Consistent and well-explained notation. Andrew Ng and the team developing this course have clearly put a lot of time into developing a consistent notation to describe all the working pieces of neural networks. As anyone who’s tried to learn about or discuss the inner workings of neural nets will know, things can get very confusing very fast if you’re loose with notation (is W2 the matrix of weights going into or out of layer 2?!?). Some mathematicians may quibble with his abbreviation of the derivative of the cost with respect to A as simply dA, but he stays consistent, which is what really matters.
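
For reference, my rough summary of the convention the course settles on (my own paraphrase, not an official cheat sheet):

```latex
W^{[l]} \in \mathbb{R}^{\,n^{[l]} \times n^{[l-1]}} \text{ (weights coming into layer } l\text{)},
\qquad
b^{[l]} \in \mathbb{R}^{\,n^{[l]} \times 1},
\qquad
dA^{[l]} := \frac{\partial \mathcal{L}}{\partial A^{[l]}}
```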

Assignments were clearly explained and on-topic. Since you don’t have the benefit of being able to easily ask a professor or TA clarifying questions about an online assignment, it’s really important that they be totally clear. The assignments definitely meet this requirement. If anything, they are too clear; they sometimes give a little too much away… which brings me to what I didn’t like.

What I Didn't Like

The assignments are way too easy. The core coding task in the assignments is implementing neural networks in Python. This would be sufficiently challenging if you had to do it from scratch, but instead the assignments break down every small step you have to make, provide you with the appropriate equations at each step, and pre-write almost all of each function for you. All you have to do is fill in a few lines per function. And even then, you almost never fill in a whole line, since they tell you which variables you need to define (see below).

[Screenshot: Coursera deep learning course assignment]

I think this takes 90% of the learning away, since the process of deciding how to implement the algorithms you’ve learned about forces you to make sure you have a very solid grasp of the underlying theory. When you’re only filling in a few blanks here and there and the exact equation you’re supposed to code is right in front of you, it’s easy to mindlessly get the code right but not really understand what you’re doing. Sometimes they even provide the exact code you’re supposed to come up with! (see below)

[Screenshot: Coursera backpropagation assignment]

I tried to find a compromise between how easy they were making it and doing everything totally from scratch. My strategy was to copy each function definition that you were supposed to fill out and its accompanying docstring into a new cell without looking at any of their pre-written code. I then tried to write the function on my own, tested it and compared my way to their way. I think this made my learning much more complete, since I wasn’t just filling in lines of code but thinking about the sequence of steps necessary for the whole function. I recommend doing this to anyone going through the course to enhance your learning.
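
To make that concrete, here’s a sketch of what the strategy looked like for one of the simpler helper functions. The function name and docstring below are illustrative rather than copied from the actual assignment:

```python
import numpy as np

# Pasted the signature and docstring into a fresh cell, then wrote the body
# myself before looking at the scaffolded version in the notebook.
def initialize_parameters(n_x, n_h, n_y):
    """
    Initialize weights and biases for a shallow (one-hidden-layer) network.

    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- dict with W1 (n_h, n_x), b1 (n_h, 1), W2 (n_y, n_h), b2 (n_y, 1)
    """
    W1 = np.random.randn(n_h, n_x) * 0.01  # small random values break symmetry
    b1 = np.zeros((n_h, 1))                # biases can safely start at zero
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

# Quick sanity check of the shapes before comparing against their version
params = initialize_parameters(n_x=3, n_h=4, n_y=1)
assert params["W1"].shape == (4, 3) and params["b2"].shape == (1, 1)
```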

Not enough mathematical emphasis. More resources should be provided for those who want to go deeper, particularly derivations of the formulas. Some people may favor the way the course glosses over some of the math behind neural networks. It’s true that this allows a greater number of people to go through the course, and to do so more quickly. And Andrew Ng does make some attempt to explain the intuitions of backpropagation through a discussion of derivatives. But if I had a dollar for every time he says something like “it’s not really important to understand the math behind that,” I’d be able to recoup the cost of the course. I’m personally not comfortable with just being given an equation to implement; I have to at least have an idea of where it came from.

I also would have loved an optional video on a more mathematically precise derivation of backpropagation. Right now there’s a gap between his user-friendly videos and the dense but complete derivation in www.deeplearningbook.org, and it would have been great for him to provide something in between.
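
As an example of the kind of intermediate step I have in mind, here is my own quick sketch (not from the course or the book) of where the tidy output-layer formula dZ = A - Y comes from, for a sigmoid output with cross-entropy loss:

```latex
\mathcal{L}(a, y) = -\bigl(y \log a + (1-y)\log(1-a)\bigr), \qquad a = \sigma(z)

\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a},
\qquad
\frac{\partial a}{\partial z} = a(1-a)

\frac{\partial \mathcal{L}}{\partial z}
  = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{\partial a}{\partial z}
  = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right) a(1-a)
  = -y(1-a) + (1-y)a
  = a - y
```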

Another area where I think he could have gone a little more in depth on the math is his explanation of dot products when doing a vectorized implementation of forward propagation. He talked about how you can debug code by making sure that matrix dimensions match up, but not about why they have to match up that way. This would have been a great opportunity for an intuitive explanation of the dot product: the idea of a weighted sum is exactly what we want when we take the dot product of the weights and the input data to compute the pre-activation values of the first layer.
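
To illustrate what I mean, here’s a small NumPy sketch of my own (with made-up layer sizes) of the vectorized first-layer computation and the shape bookkeeping behind it:

```python
import numpy as np

m, n_x, n_h = 5, 3, 4          # 5 training examples, 3 input features, 4 hidden units

X  = np.random.randn(n_x, m)   # inputs stacked as columns: shape (n_x, m)
W1 = np.random.randn(n_h, n_x) # one row of weights per hidden unit: shape (n_h, n_x)
b1 = np.zeros((n_h, 1))        # broadcast across the m columns

# Each entry of Z1 is a weighted sum: row i of W1 dotted with column j of X,
# i.e. hidden unit i's weights applied to training example j. That is exactly
# why the inner dimensions (n_x) have to match.
Z1 = np.dot(W1, X) + b1        # (n_h, n_x) @ (n_x, m) -> (n_h, m)
A1 = np.tanh(Z1)               # elementwise activation keeps the shape

assert Z1.shape == (n_h, m)
```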

Provide more extra credit suggestions for greater challenge. One of the more valuable parts of the programming assignments came at the end of the shallow neural network assignment. After you’ve implemented the network with a hyperbolic tangent activation in the hidden layer, the assignment suggests, as extra credit, modifying it to use a ReLU activation. That was a great way to practice without the heavy-handed hints present in other parts of the assignments. A similarly challenging extra credit suggestion was missing from the final assignment, however. I think more of these could be peppered in to provide extra challenge.
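
For anyone curious, swapping the hidden-layer activation mostly comes down to changing two things: the activation itself in forward propagation and its derivative in backpropagation. A rough sketch (the function names here are mine, not the assignment’s):

```python
import numpy as np

def tanh_activation(Z):
    return np.tanh(Z)

def tanh_backward(dA, Z):
    # derivative of tanh(z) is 1 - tanh(z)^2
    return dA * (1 - np.tanh(Z) ** 2)

def relu_activation(Z):
    return np.maximum(0, Z)

def relu_backward(dA, Z):
    # derivative of ReLU is 1 where z > 0 and 0 elsewhere
    return dA * (Z > 0)
```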

Overall, I feel the course at least helped me a little in understanding neural networks. It’s always a pleasure to watch Andrew Ng teach. I’m also happy with the shiny certificate I got at the end, for whatever that’s worth. I think for the course to be of real value, though, they need to do a little less hand-holding and provide a little more rigor.

Let me know if you enjoyed this article and feel free to comment if you agree or disagree!