Aiphabet

Introduction to Machine Learning

Machine Learning Basics

How does Netflix know which shows to recommend to you? How does your phone recognize your face? How do spam emails get filtered out of your inbox? The answer to all of these questions is machine learning!

What is Machine Learning?

Machine learning is the process of building programs that learn from experience without being explicitly programmed with rules. Instead of writing step-by-step instructions for a computer to follow, we give the computer examples and let it figure out the patterns on its own.

Machine learning is about:

  1. Building algorithms that use data to learn how to make predictions or find interesting patterns.
  2. Studying the different paradigms and algorithms of machine learning.
  3. Analyzing how well those algorithms perform.

Think of it like teaching a child. You don't give them a rulebook for recognizing dogs—you show them many examples of dogs until they can identify one on their own!

Learning from Data

Let's walk through the steps of a typical machine learning project:

  1. Data Collection:
  • First, we need to gather relevant data from databases or other sources.
  • Domain experts help determine what data might be useful.
  2. Data Preparation:
  • Raw data is rarely ready to use. We need to clean it by removing errors and handling missing values.
  • We also might create new features from existing ones.
  3. Exploratory Data Analysis (EDA):
  • We examine the data through descriptive statistics and visualizations to understand its characteristics and identify potential patterns.
  4. Machine Learning:
  • This is where we apply various algorithms to build models that can make predictions or identify patterns.
  5. Visualization and Deployment:
  • The results are visualized for easy understanding, and the model may be deployed in an application to make data-driven decisions.
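To make the data preparation and EDA steps above a little more concrete, here is a minimal sketch in plain Python. The records, the `prepare` and `describe` helpers, and the `ratio` feature are all made up for illustration; a real project would usually rely on a data library and far more data.

```python
from statistics import mean

# Hypothetical raw records: (length_cm, width_cm); None marks a missing value.
raw = [
    (18.0, 3.5),
    (7.0, 7.2),
    (None, 3.8),
    (19.5, None),
    (7.5, 7.0),
]

def prepare(rows):
    """Data preparation: drop incomplete rows and add a derived feature."""
    cleaned = []
    for length, width in rows:
        if length is None or width is None:
            continue  # simplest way to handle missing values: skip the row
        ratio = length / width  # new feature created from existing ones
        cleaned.append({"length": length, "width": width, "ratio": ratio})
    return cleaned

def describe(rows, key):
    """EDA: basic descriptive statistics for one column."""
    values = [row[key] for row in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}

data = prepare(raw)
summary = describe(data, "ratio")
print(len(data), summary)
```

Dropping incomplete rows is only one way to handle missing values; filling them in with an average is another common choice.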

Two Main Types of Machine Learning

Machine learning is typically divided into two main categories:

Supervised Learning

In supervised learning, the algorithm learns from labeled data—examples that include both the input features and the correct output. The goal is to learn a mapping from inputs to outputs.

Training data: “examples” x with “labels” y

$$(x_1, y_1), \dots, (x_n, y_n) \quad \text{with} \quad x_i \in \mathbb{R}^d$$

Classification: y is discrete; to simplify, $y \in \{-1, +1\}$

$$f: \mathbb{R}^d \rightarrow \{-1, +1\}$$

f is called a binary classifier.
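As a small illustration of what a binary classifier looks like in code, the sketch below builds a function f that takes a list of d numbers and returns either -1 or +1. It uses a linear rule (the sign of a weighted sum), which is just one possible form of f. The weights and bias here are hypothetical values picked by hand; in practice a learning algorithm would choose them from the training data.

```python
def make_linear_classifier(w, b):
    """Return f: R^d -> {-1, +1}, defined by the sign of w . x + b."""
    def f(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score >= 0 else -1
    return f

# Hypothetical weights and bias for d = 2 (not learned from data).
f = make_linear_classifier(w=[1.0, -1.0], b=0.0)

print(f([3.0, 1.0]))  # score is 3.0 - 1.0 = 2.0
print(f([1.0, 3.0]))  # score is 1.0 - 3.0 = -2.0
```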

What is Classification?

Classification is when we want to predict a category or class. The output is discrete (belonging to specific groups).

Example: Is an email spam or not spam?

In a classification problem, we're drawing a decision boundary that separates different classes. For instance, if we were classifying fruits based on their length and width, we might find that bananas tend to be longer while oranges tend to be rounder.
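Here is one way the fruit example could look in code. The sketch uses the perceptron, a classic algorithm for learning a straight-line decision boundary; it is chosen here only for illustration. The measurements are invented, with +1 standing for banana and -1 for orange.

```python
# Hypothetical measurements in cm: ((length, width), label).
# Label +1 = banana, -1 = orange.
fruits = [
    ((18.0, 3.5), 1), ((20.0, 4.0), 1), ((17.0, 3.2), 1), ((19.0, 3.8), 1),
    ((7.0, 7.0), -1), ((8.0, 7.5), -1), ((7.5, 7.2), -1), ((6.5, 6.8), -1),
]

def train_perceptron(examples, max_epochs=100):
    """Learn weights w and bias b so that sign(w . x + b) matches the labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            score = w[0] * x[0] + w[1] * x[1] + b
            if y * score <= 0:  # wrong side of (or exactly on) the boundary
                w[0] += y * x[0]
                w[1] += y * x[1]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b

def predict(w, b, x):
    """Classify a new fruit using the learned boundary."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

w, b = train_perceptron(fruits)
print(predict(w, b, (18.5, 3.6)))  # a long, narrow fruit
print(predict(w, b, (7.2, 7.1)))   # a round fruit
```

The learned line w[0] * length + w[1] * width + b = 0 is the decision boundary: fruits on one side are labeled banana, fruits on the other side orange.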

Regression

Regression is when we want to predict a continuous value.

Examples:

  • What will someone's income be, based on their age?
  • What will the weight of a fruit be, based on its length?

In regression, we're trying to find a line (or curve) that best fits the data points, allowing us to predict values for new data.
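As a rough sketch of fitting a line, the code below uses ordinary least squares, a standard way to pick the slope and intercept that minimize the squared vertical distances to the data points. The fruit lengths and weights are made-up numbers for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data: fruit length in cm and weight in grams.
lengths = [5.0, 6.0, 7.0, 8.0, 9.0]
weights = [102.0, 118.0, 141.0, 159.0, 180.0]

slope, intercept = fit_line(lengths, weights)

# Predict the weight of a new fruit that is 10 cm long.
predicted = slope * 10.0 + intercept
print(slope, intercept, predicted)
```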

Unsupervised Learning

In unsupervised learning, the algorithm learns from unlabeled data—examples that only include the input features without any corresponding output. The goal is to find structure or patterns in the data.

Clustering

Clustering is about grouping similar examples together.

Training data: “examples” x

$$x_1, \dots, x_n \quad \text{with} \quad x_i \in \mathbb{R}^d$$

Clustering/segmentation:

$$f: \mathbb{R}^d \rightarrow \{C_1, \dots, C_k\} \quad \text{(set of clusters)}$$

Examples:

  • Grouping customers with similar purchasing behaviors
  • Categorizing different species based on their characteristics
  • Finding natural segments in population data
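To show what grouping similar examples can look like in practice, here is a minimal sketch of k-means, one common clustering algorithm (the text above does not name a specific one). The points are invented, and the starting centroids are simply the first k points, which is a naive choice made to keep the example short.

```python
def kmeans(points, k, iterations=100):
    """Assign each point to one of k clusters; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive start: first k points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Notice that no labels y are given to the algorithm: the cluster numbers it returns are just names for the groups it found.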