
Introduction to Deep Learning

How can computers recognize your voice, identify objects in scenes, translate languages, or create art? The answer lies in deep learning.

🧠 What is Deep Learning?

Deep learning is a branch of artificial intelligence that helps machines learn from examples at scale. Artificial neural networks, inspired by the human brain, learn from large amounts of data. These networks are "deep" because they have many layers stacked together, allowing them to learn increasingly complex patterns.

Just how deep are we talking about?

Model   | Year | Domain           | # Layers      | # Parameters
--------|------|------------------|---------------|-------------
AlexNet | 2012 | Vision           | 8             | 60 million
BERT    | 2018 | Language         | 24            | 340 million
GPT-2   | 2019 | Language         | 48            | 1.5 billion
GPT-3   | 2020 | Language         | 96            | 175 billion
GPT-4   | 2023 | Language, vision | Not disclosed | > 1 trillion

For comparison, the human brain has 86 billion neurons. However, direct comparisons between artificial neural networks and the human brain can be misleading for several reasons:

  1. The human brain has 100 trillion synaptic connections between neurons
  2. Artificial neurons often only connect to adjacent layers, while brain neurons form complex, recurrent connections
  3. Biological neurons are more complex than their simplified mathematical counterparts

While deep learning is inspired by the brain, think of it as a simplified model that works in fundamentally different ways.

📜 Evolution of Deep Learning

We covered a comprehensive history of artificial intelligence here. The current boom in generative artificial intelligence (e.g. LLMs) is powered by deep learning and neural networks. The deep learning revolution is generally dated from 2012 onward. In the past decade, deep learning models have achieved remarkable results in speech recognition, computer vision, natural language processing, and many other fields, thanks to more data, more powerful models, and better ways of training them.

🔦 How Deep Learning Works: The Basics

At its core, deep learning is about finding a complex function that maps inputs to outputs. We can represent this with:

y = f_{\theta}(x)

where x is the input, y is the output, f_θ is the neural network, and θ is the set of learnable parameters of the model (e.g. weights and biases). This differs from a standard function that we might see in math class in the following ways:

  • The input is now an array of numbers (a tensor), not just a single value. x might be a 224 × 224 × 3 array of pixel values for an image (height × width × RGB channels).
  • f is now a very complex function: a deep neural network with many layers.
  • The output can also be an array. y might be a vector of probabilities for different object classes.
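To make this concrete, here is a tiny NumPy sketch of such a function: a single random weight matrix stands in for the learnable parameters θ, and the 224 × 224 × 3 input and 10 output classes are purely illustrative, not taken from any real model.

import numpy as np

# A toy stand-in for f_theta: one weight matrix maps a flattened image to 10 class scores.
# The input size and the 10 classes are illustrative choices, not from a real model.
rng = np.random.default_rng(0)
theta = rng.normal(size=(224 * 224 * 3, 10)) * 0.01   # the "learnable" parameters

def f(x, theta):
    scores = x.reshape(-1) @ theta                    # flatten the image and apply the weights
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                            # softmax: turn scores into probabilities

x = rng.random((224, 224, 3))                         # x: a random stand-in for an image
y = f(x, theta)                                       # y = f_theta(x)
print(y.shape, y.sum())                               # (10,) 1.0 -- ten class probabilities

Real deep networks replace the single matrix with many stacked layers, but the overall input-to-output mapping has this same shape.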

Deep learning models, particularly neural networks, are built from layers of interconnected artificial neurons. These neurons are mathematical functions that take multiple inputs, apply weights to them, and produce an output.

A neural network typically consists of three types of layers:

  • Input Layer: Receives the raw data (like pixel values of an image)
  • Hidden Layers: Process the information through multiple transformations
  • Output Layer: Produces the final result (like recognizing a cat in an image)
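As a rough sketch of how these layer types fit together, here is a small fully connected network in PyTorch; the 784-value input (e.g. a flattened 28 × 28 grayscale image), the hidden layer sizes, and the 10 output classes are all illustrative choices.

import torch.nn as nn

# A minimal network with the three layer types; all sizes are illustrative.
model = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> first hidden layer (784 raw values in)
    nn.ReLU(),             # activation applied between layers
    nn.Linear(128, 64),    # second hidden layer: further transforms the features
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer: one score per class (e.g. digits 0-9)
)
print(model)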


💥 Inside the Neuron

Each neuron in a neural network does a simple job:

  1. It receives inputs from other neurons
  2. It multiplies each input by a weight (how important that input is)
  3. It adds all these weighted inputs together
  4. It applies an activation function to decide how strongly to fire
  5. It sends its output to neurons in the next layer

This can be mathematically represented as:

y = \sigma(\sum_{i=1}^{n} w_i x_i + b)

where y is the output of the neuron, σ is the activation function, n is the number of inputs, w_i is the weight for the i-th input, x_i is the i-th input value, and b is the bias term.
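Here is that formula as a few lines of NumPy; the sigmoid activation and the example numbers are arbitrary choices for illustration.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # a common activation function

def neuron(x, w, b):
    z = np.dot(w, x) + b          # weighted sum of the inputs plus the bias
    return sigmoid(z)             # the activation decides how strongly the neuron "fires"

x = np.array([0.5, -1.0, 2.0])    # three inputs from the previous layer
w = np.array([0.8, 0.2, -0.4])    # one weight per input
b = 0.1                           # bias term
print(neuron(x, w, b))            # the neuron's output, a value between 0 and 1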

📶 The Power of Many Layers

What makes deep learning powerful is having many layers. Each layer can learn different features:

  • Early layers might learn simple features (edges, colors)
  • Middle layers combine these into more complex features (shapes, textures)
  • Later layers can recognize high-level concepts (objects, faces)


🔄 Training Process

Neural networks learn through a process called backpropagation. Here's how it works:

  1. Forward Pass: The network makes a prediction based on its current weights
  2. Calculate Error: The prediction is compared to the correct answer
  3. Backward Pass: The error is sent backward through the network
  4. Update Weights: The weights are adjusted to make the prediction more accurate next time

In code, this looks like:

for epoch in range(num_epochs):
    # 1. Forward pass to make predictions
    predictions = model(data)

    # 2. Calculate error
    loss = loss_function(predictions, true_labels)

    # 3. Backward pass to calculate gradients
    loss.backward()

    # 4. Update weights
    optimizer.step()
    optimizer.zero_grad()

A few key terms to understand:

  • Loss Function: This measures how far off the model's predictions are from the correct answers. It's like a scoring system that tells the model how badly it performed. A higher loss means worse predictions, so the goal is to minimize this value.
  • Optimizer: This is the algorithm that updates the weights of the neural network based on the calculated gradients. It determines how quickly and in what way the model learns. We take small steps downhill to find the lowest point (minimum loss).
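To make the training loop above runnable, you would define these pieces first. A minimal sketch, assuming the small classification model from the earlier layer example and random placeholder data standing in for a real dataset:

import torch
import torch.nn as nn
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()                # scores how wrong the class predictions are
optimizer = optim.SGD(model.parameters(), lr=0.01)   # nudges the weights "downhill" after each step

data = torch.rand(32, 784)                           # a batch of 32 placeholder inputs
true_labels = torch.randint(0, 10, (32,))            # a placeholder correct class for each input
num_epochs = 5

Cross-entropy loss and SGD are just one common pairing; other losses and optimizers plug into the same loop.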

If you plot your loss curve over the training steps (i.e. iterations) during a successful run, you’ll see something similar to the examples below. When the loss plateaus and stops decreasing, we usually say that the model has converged. There are many reasons why neural network training might fail to converge, and we’ll cover these in the next article.

Loss curves from real-world models. Left: AlphaGo Zero (source), Middle: BERT (source), Right: ResNet (source).
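To produce a curve like these for your own run, record the loss at every step and plot it afterwards. A minimal sketch using matplotlib; the numbers below are made up purely to show the shape of a converging curve:

import matplotlib.pyplot as plt

# Inside the training loop, collect each step's loss with: losses.append(loss.item())
losses = [2.3, 1.4, 0.9, 0.6, 0.45, 0.38, 0.35, 0.34, 0.34]   # made-up values for illustration

plt.plot(losses)
plt.xlabel("Training step")
plt.ylabel("Loss")
plt.show()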

The mathematics of backpropagation is the chain rule from calculus:

\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial w}

This might look complicated, but we usually don't have to compute it ourselves as software (e.g. PyTorch, TensorFlow) handles this part. It's just a way to figure out how much each weight contributes to the error, so we know how to adjust it. We can break it down:

  • L: The loss or error (how wrong our prediction was)
  • w: A weight in our neural network (the numbers we're trying to adjust)
  • a: The activation or output of a neuron (what a neuron decides to send forward)
  • z: The input to a neuron before the activation function is applied

The symbol ∂ (pronounced "partial") just means "a small change in". So ∂L/∂w means "how much does the error change when we make a small change to this weight?". This formula helps us figure out how to adjust each weight by breaking the process into steps:

  • ∂L/∂a: How does changing the neuron's output affect the final error?
  • ∂a/∂z: How does changing the neuron's input affect its output?
  • ∂z/∂w: How does changing the weight affect the neuron's input?
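You can check these three factors yourself on a tiny single-neuron example. In the PyTorch sketch below (arbitrary numbers, sigmoid activation, squared-error loss), the gradient computed by autograd matches the product of the three hand-computed pieces:

import torch

x = torch.tensor(2.0)                        # a single input
w = torch.tensor(0.5, requires_grad=True)    # the weight we want to adjust
target = torch.tensor(1.0)                   # the "correct" answer we pretend to have

z = w * x                                    # neuron input before the activation
a = torch.sigmoid(z)                         # neuron output (activation)
L = (a - target) ** 2                        # squared-error loss

L.backward()                                 # PyTorch applies the chain rule for us
print(w.grad.item())                         # dL/dw from autograd

dL_da = 2 * (a - target)                     # how the loss changes with the neuron's output
da_dz = a * (1 - a)                          # how the output changes with the pre-activation input
dz_dw = x                                    # how the pre-activation input changes with the weight
print((dL_da * da_dz * dz_dw).item())        # same number as w.grad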

💡 Applications of Deep Learning

Deep learning is everywhere in our modern world:

  • Computer Vision: Image recognition, object detection, face recognition
  • Natural Language Processing: Translation, chatbots, text summarization
  • Speech Recognition: Voice assistants, transcription services
  • Gaming: Game AI, procedural content generation
  • Healthcare: Disease diagnosis, drug discovery
  • Autonomous Vehicles: Self-driving cars, drones
  • Art Generation: Creating music, paintings, and other creative content

🔮 The Future

Deep learning continues to evolve rapidly with exciting developments:

  • Models are becoming more efficient, requiring less data and computing power
  • New architectures like transformers are revolutionizing NLP and other fields
  • Research in explainable AI is making deep learning more transparent
  • Deep learning is being combined with other AI techniques for even stronger systems

Want to dive deeper into deep learning? The next articles will explore key components of deep learning and different types of neural networks. We'll even build our own simple convolutional neural network (CNN) in the next lesson.