How can computers recognize your voice, identify objects in scenes, translate languages, or create art? The answer lies in deep learning.
Deep learning is a branch of artificial intelligence that helps machines learn from examples at scale. Artificial neural networks, inspired by the human brain, learn from large amounts of data. These networks are called "deep" because they stack many layers together, allowing them to learn increasingly complex patterns.
Just how deep are we talking about?
| Model | Year | Domain | # Layers | # Parameters |
|---|---|---|---|---|
| AlexNet | 2012 | Vision | 8 | 60 million |
| BERT | 2018 | Language | 24 | 340 million |
| GPT-2 | 2019 | Language | 48 | 1.5 billion |
| GPT-3 | 2020 | Language | 96 | 175 billion |
| GPT-4 | 2023 | Language, vision | Not disclosed | Not disclosed (estimated > 1 trillion) |
For comparison, the human brain has roughly 86 billion neurons. However, direct comparisons between artificial neural networks and the human brain can be misleading for several reasons:

- A model parameter is closer to a synapse than to a neuron, and the brain has on the order of 100 trillion synapses.
- Biological neurons are far more complex than the simple weighted sums used in artificial networks.
- The brain does not learn the way neural networks do (via backpropagation).

While deep learning is inspired by the brain, think of it as a simplified model that works in fundamentally different ways.
We covered a comprehensive history of artificial intelligence here. The current boom in generative artificial intelligence (e.g. LLMs) is powered by deep learning and neural networks. The deep learning revolution is generally dated from 2012 onwards, when AlexNet (see the table above) kicked off a wave of breakthroughs. In the past decade, deep learning models have achieved remarkable results in speech recognition, computer vision, natural language processing, and many other fields, thanks to more data, more powerful models, and better ways of training them.
At its core, deep learning is about finding a complex function that maps inputs to outputs. We can represent this with:

$$y = f_\theta(x)$$

where $x$ is the input, $y$ is the output, $f$ is the neural network, and $\theta$ are the learnable parameters of the model (e.g. weights and biases). This differs from a standard function that we might see in math class in the following ways:

- We don't write $f$ down by hand; its behavior is learned from example input-output pairs.
- $\theta$ can contain millions or even billions of values, not just one or two constants.
- $f$ is built by composing many simpler functions (the layers), which is what makes it expressive enough to map images to labels or prompts to text.
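To make that concrete, here is a toy parameterized function in plain Python. The linear form and the names `f` and `theta` are illustrative stand-ins for the notation above, not a real network:

```python
# A toy "model": a function f whose behavior depends on parameters theta.
# A real neural network is the same idea with millions of parameters
# and many nonlinear layers composed together.
def f(x, theta):
    w, b = theta          # "weight" and "bias"
    return w * x + b

theta = (2.0, 0.5)        # the learnable parameters
print(f(3.0, theta))      # maps input 3.0 to output 6.5
```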
Deep learning models, particularly neural networks, are built from layers of interconnected artificial neurons. These neurons are mathematical functions that take multiple inputs, apply weights to them, and produce an output.
A neural network typically consists of three types of layers:

- **Input layer**: receives the raw data (pixels, words, audio samples).
- **Hidden layers**: intermediate layers that transform the data step by step; "deep" networks have many of these.
- **Output layer**: produces the final prediction (a class label, a number, the next word).
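In PyTorch, for example, this three-part structure maps directly onto code; the layer sizes below are arbitrary choices for illustration:

```python
import torch.nn as nn

# A small fully connected network: input -> two hidden layers -> output.
model = nn.Sequential(
    nn.Linear(784, 128),   # input layer: 784 features in (e.g. a 28x28 image)
    nn.ReLU(),
    nn.Linear(128, 64),    # hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer: 10 class scores
)
```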
Each neuron in a neural network does a simple job:

1. Take in one or more input values.
2. Multiply each input by its weight.
3. Sum the weighted inputs and add a bias term.
4. Pass the result through an activation function.

This can be mathematically represented as:

$$y = \sigma\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where $y$ is the output of the neuron, $\sigma$ is the activation function, $n$ is the number of inputs, $w_i$ is the weight for the $i$-th input, $x_i$ is the input value, and $b$ is the bias term.
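Here is that computation in a few lines of NumPy, using ReLU as an example activation; the input, weight, and bias values are made up:

```python
import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b          # weighted sum of inputs plus bias
    return max(0.0, z)            # ReLU activation as an example sigma

x = np.array([0.5, -1.0, 2.0])    # three inputs
w = np.array([0.8, 0.2, -0.5])    # one weight per input
b = 0.1
print(neuron(x, w, b))            # -> 0.0 (the weighted sum is negative, so ReLU clips it)
```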
What makes deep learning powerful is having many layers, because each layer can learn different features. In an image model, for example:

- Early layers learn simple patterns such as edges and textures.
- Middle layers combine those into shapes and parts.
- Later layers recognize whole objects and high-level concepts.
Neural networks learn through a process called backpropagation. Here's how it works:

1. **Forward pass**: feed the data through the network to make predictions.
2. **Calculate error**: compare the predictions to the true labels using a loss function.
3. **Backward pass**: work backwards through the network to calculate how much each weight contributed to the error (the gradients).
4. **Update weights**: nudge each weight in the direction that reduces the error.

These steps repeat over the dataset many times.
In code (PyTorch-style, assuming `model`, `data`, `true_labels`, `loss_function`, and `optimizer` are already defined), this looks like:

```python
for epoch in range(num_epochs):
    # 1. Forward pass to make predictions
    predictions = model(data)
    # 2. Calculate error
    loss = loss_function(predictions, true_labels)
    # 3. Backward pass to calculate gradients
    loss.backward()
    # 4. Update weights
    optimizer.step()
    optimizer.zero_grad()
```
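To see the skeleton run end to end, here is a minimal self-contained version that fits a line to synthetic data; the model, data, and hyperparameters are all illustrative:

```python
import torch
import torch.nn as nn

# Synthetic data: learn y = 3x + 1 from noisy samples.
x = torch.randn(100, 1)
y = 3 * x + 1 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)                       # a one-weight, one-bias "network"
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    predictions = model(x)                    # 1. forward pass
    loss = loss_function(predictions, y)      # 2. calculate error
    loss.backward()                           # 3. backward pass (gradients)
    optimizer.step()                          # 4. update weights
    optimizer.zero_grad()

print(model.weight.item(), model.bias.item())  # should end up close to 3 and 1
```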
A few key terms to understand:

- **Epoch**: one full pass through the training data.
- **Loss**: a number measuring how wrong the model's predictions are; lower is better.
- **Gradient**: how much (and in which direction) each weight should change to reduce the loss.
- **Optimizer**: the algorithm (e.g. SGD, Adam) that uses the gradients to update the weights.
If you plot your loss curve over the training steps (i.e. iterations) during a successful run, you’ll see something similar to the examples below. When the loss plateaus and stops decreasing, we usually say that the model has converged. There are many reasons why neural network training might fail to converge, and we’ll cover these in the next article.
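To produce a plot like that yourself, it's enough to record the loss at each step inside the training loop above and plot it afterwards (matplotlib shown as one option):

```python
import matplotlib.pyplot as plt

losses = []
for epoch in range(num_epochs):
    predictions = model(data)
    loss = loss_function(predictions, true_labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(loss.item())   # record the scalar loss value each step

plt.plot(losses)
plt.xlabel("training step")
plt.ylabel("loss")
plt.show()
```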
The mathematics of backpropagation is the chain rule from calculus:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}$$

This might look complicated, but we usually don't have to compute it ourselves; software (e.g. PyTorch, TensorFlow) handles this part. It's just a way to figure out how much each weight contributes to the error, so we know how to adjust it.

The symbol $\partial$ (pronounced "partial") just means "a small change in". So $\frac{\partial L}{\partial w}$ means "how much does the error $L$ change when we make a small change to this weight $w$?". The formula answers that question by breaking the process into steps:

1. How much does the error change when the prediction $\hat{y}$ changes? ($\partial L / \partial \hat{y}$)
2. How much does the prediction change when the neuron's weighted sum $z$ changes? ($\partial \hat{y} / \partial z$)
3. How much does the weighted sum change when the weight $w$ changes? ($\partial z / \partial w$)

Multiplying the three answers together gives the effect of the weight on the error.
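As a sanity check that the chain rule is really what's happening, here is a one-neuron example where we compute the gradient by hand and let PyTorch's autograd compute it too; all values are arbitrary:

```python
import torch

# One neuron with one input and an identity activation:
#   z = w * x,  prediction yhat = z,  squared error L = (yhat - y)**2
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(1.5)
y = torch.tensor(4.0)

yhat = w * x
loss = (yhat - y) ** 2
loss.backward()                        # autograd applies the chain rule

# The same gradient by hand:
#   dL/dw = dL/dyhat * dyhat/dw = 2 * (yhat - y) * x
manual = (2 * (yhat - y) * x).item()
print(w.grad.item(), manual)           # both print -3.0
```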
Deep learning is everywhere in our modern world: it powers voice assistants that recognize speech, photo apps that identify objects in your pictures, machine translation between languages, and generative tools that write text and create art.
Deep learning continues to evolve rapidly, with exciting developments such as large language models, multimodal systems that combine language and vision (like GPT-4 in the table above), and ever more capable generative AI.
Want to dive deeper into deep learning? The next articles will explore key components of deep learning and different types of neural networks. We'll even build our own simple convolutional neural network (CNN) in the next lesson.