3 Chapter 2: The Math You Actually Need
Most AI math courses teach way more than you need. This chapter teaches exactly what shows up in the code, and you’ll implement all of it.
3.1 Vectors and matrices
A vector is a list of numbers. A matrix is a grid of numbers. That’s the entire intuition you need to start.
What matters is what you do with them:
- Dot product: multiply corresponding elements and sum. This is the fundamental operation in neural networks. Every neuron computes a dot product.
- Matrix multiplication: many dot products at once. This is how you process a batch of inputs through a layer.
- Transpose: flip rows and columns. You’ll use this constantly when aligning dimensions.
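To make these three operations concrete, here's a short sketch of a dot product, a matrix multiply, and a transpose together; the layer sizes and values are made up for illustration:

```python
import numpy as np

# A layer of 2 neurons, 3 weights each, and a batch of 2 inputs
W = np.array([[0.2, -0.5, 0.1],
              [0.4,  0.3, -0.2]])  # shape (2, 3): one row per neuron
X = np.array([[1.0, 2.0, 3.0],
              [0.5, 0.5, 0.5]])    # shape (2, 3): one row per input

# One neuron, one input: a single dot product
single = np.dot(W[0], X[0])        # 0.2 - 1.0 + 0.3 = -0.5

# Whole batch through the whole layer: many dot products at once.
# The transpose aligns the dimensions: (2, 3) @ (3, 2) -> (2, 2)
batch = X @ W.T
```

Each entry of `batch` is the dot product of one input row with one neuron's weights, which is exactly the "many dot products at once" description above.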
```python
import numpy as np

# A single neuron: dot product + bias
weights = np.array([0.2, -0.5, 0.1])
inputs = np.array([1.0, 2.0, 3.0])
bias = 0.3
output = np.dot(weights, inputs) + bias  # 0.2 - 1.0 + 0.3 + 0.3 = -0.2
```

3.2 Derivatives and the chain rule
A derivative tells you: if I nudge this input a little, how much does the output change?
That’s all backpropagation is. You nudge each weight, see how the loss changes, and adjust.
The chain rule lets you compute derivatives through a sequence of operations. If y = f(g(x)), then dy/dx = f'(g(x)) * g'(x). In a neural network, the “sequence of operations” is your layers, and the chain rule lets you figure out how each weight contributed to the error.
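As a sanity check, the chain rule agrees with the nudge-and-measure definition of a derivative. A sketch with f(u) = u² and g(x) = 3x, both chosen arbitrarily for illustration:

```python
def g(x):
    return 3 * x

def f(u):
    return u ** 2

# Chain rule: dy/dx = f'(g(x)) * g'(x) = 2*(3x) * 3 = 18x
def dydx(x):
    return 2 * g(x) * 3

# Numerical check: nudge x a little and measure how y changes
x, h = 2.0, 1e-6
numerical = (f(g(x + h)) - f(g(x))) / h

print(dydx(x), numerical)  # both close to 36
```

The analytic answer and the nudge-based estimate match: that agreement is what lets backpropagation replace millions of tiny nudges with one pass of calculus.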
3.3 Implementing gradient descent
Gradient descent in three lines of real logic:
```python
def train_step(weights, inputs, target, learning_rate=0.01):
    prediction = np.dot(weights, inputs)
    error = prediction - target
    # Derivative of squared error w.r.t. weights (the constant factor of 2
    # is folded into the learning rate)
    gradient = error * inputs
    weights = weights - learning_rate * gradient
    return weights
```

That's the core loop. Everything else (batching, momentum, Adam) is an optimization layered on top of this.
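A usage sketch: apply train_step repeatedly and the prediction walks toward the target. The data, target, and step count are invented for illustration, and train_step is repeated so the snippet runs on its own:

```python
import numpy as np

def train_step(weights, inputs, target, learning_rate=0.01):
    prediction = np.dot(weights, inputs)
    error = prediction - target
    gradient = error * inputs
    return weights - learning_rate * gradient

weights = np.zeros(3)
inputs = np.array([1.0, 2.0, 3.0])
target = 2.0

for step in range(200):
    weights = train_step(weights, inputs, target)

print(np.dot(weights, inputs))  # converges toward 2.0
```

Each step shrinks the error by a constant factor, so the prediction closes in on the target exponentially fast on this toy problem.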
3.4 Probability basics
You need three things:
- Softmax: turns a vector of numbers into a probability distribution. Used in every classification network.
- Cross-entropy: measures how wrong your probability distribution is. This is your loss function for classification.
- Sampling: picking randomly according to probabilities. Used in text generation.
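Softmax gets its own implementation below; cross-entropy and sampling can be sketched in a few lines (the probabilities here are made up):

```python
import numpy as np

probs = np.array([0.7, 0.2, 0.1])  # model's predicted distribution
true_class = 0                      # index of the correct answer

# Cross-entropy: -log of the probability assigned to the true class.
# Confident and right -> small loss; confident and wrong -> large loss.
loss = -np.log(probs[true_class])   # ~0.357

# Sampling: pick an index with probability proportional to probs
rng = np.random.default_rng(seed=0)
token = rng.choice(len(probs), p=probs)
```

In text generation, `probs` would come from softmax over the model's output scores, and `token` is the next piece of text the model emits.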
```python
def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return exp_x / exp_x.sum()
```

3.5 What we're skipping
Eigenvalues, SVD, measure theory, information theory proofs. You don’t need them to build working AI systems. If you need them later (and you might), you’ll know exactly why, because you’ll hit a problem that requires them.
3.6 What’s next
Chapter 3: you build your first neural network from scratch. No PyTorch, no TensorFlow. NumPy and your own code.