5 Chapter 4: Frameworks and What They Hide
You built a neural network from scratch. Now rebuild it in PyTorch, and pay attention to what changes and what stays the same.
5.1 The same network, fewer lines
```python
import torch
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        return self.layers(x)
```

That's the whole model. Compare it to the 30+ lines from Chapter 3.
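As a quick sanity check, here is a sketch of pushing a dummy batch through the same architecture (built inline with nn.Sequential so the snippet stands alone; the batch of random tensors is a stand-in for real MNIST data):

```python
import torch
import torch.nn as nn

# Same architecture as the Network class above, built inline.
net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(32, 784)   # a batch of 32 flattened 28x28 "images"
logits = net(x)
print(logits.shape)        # torch.Size([32, 10]) -- one score per class
```

Shape checks like this catch most wiring mistakes before you ever start training.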
5.2 What disappeared
Three things vanished:
- Weight initialization. PyTorch picks sensible defaults (Kaiming uniform for linear layers). You can override them, but the defaults work.
- Backpropagation. loss.backward() computes all gradients automatically. No manual chain rule.
- Weight updates. The optimizer handles it: optimizer.step().
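If you do want to override the default initialization, it is a one-liner per parameter. A sketch, assuming you prefer Xavier-uniform weights and zero biases (a common alternative, not the chapter's requirement):

```python
import torch
import torch.nn as nn

layer = nn.Linear(784, 128)   # already Kaiming-uniform initialized by PyTorch

# Override with Xavier-uniform weights and zero biases, in place:
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

print(layer.bias.abs().sum().item())   # 0.0 -- biases are now all zero
```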
5.3 What autograd actually does
When you call loss.backward(), PyTorch walks backward through a computation graph it built during the forward pass. Every operation (matmul, add, ReLU) recorded itself and knows how to compute its own gradient.
This is the same chain rule math you wrote by hand. PyTorch just automates it.
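You can watch the chain rule happen on a toy function. For y = (3x)^2, the hand-derived gradient is dy/dx = 2 · (3x) · 3 = 18x, and autograd reproduces it:

```python
import torch

# y = (3x)^2, so dy/dx = 18x by the chain rule.
x = torch.tensor(2.0, requires_grad=True)
y = (3 * x) ** 2
y.backward()       # autograd walks the recorded graph backward
print(x.grad)      # tensor(36.), i.e. 18 * 2 -- matches the hand derivation
```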
5.4 The training loop
```python
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    output = net(X_train)
    loss = loss_fn(output, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

5.5 What you're trading
Convenience costs understanding. When you use a framework:
- You can’t easily see the gradients flowing through your network
- Debugging shape mismatches becomes harder (the error is inside the framework, not your code)
- You trust the framework’s implementation is correct (it usually is, but “usually” has burned people)
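"Can't easily see" is not "can't see at all": after loss.backward(), every parameter's gradient sits in its .grad attribute. A minimal sketch, where the tiny model and random batch are stand-ins for your real network and data:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
x = torch.randn(8, 4)                # stand-in batch of 8 inputs
y = torch.randint(0, 2, (8,))        # stand-in class labels

loss = nn.CrossEntropyLoss()(net(x), y)
loss.backward()

# Each parameter's gradient is stored alongside the parameter itself:
for name, p in net.named_parameters():
    print(name, tuple(p.grad.shape))
```

Printing gradient norms per layer this way is a cheap first diagnostic for vanishing or exploding gradients.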
The point of Chapter 3 wasn’t to teach you to never use frameworks. It was to make sure that when you use one, you know what it’s doing.
5.6 When to go manual, when to use frameworks
Use frameworks for anything you ship. The optimizations (GPU kernels, mixed precision, distributed training) are not things you want to rewrite.
Go manual when you’re learning a new concept, debugging a weird training behavior, or implementing a paper that does something nonstandard.
5.7 Exercises
- Replace SGD with Adam. What changes in training dynamics?
- Add dropout between layers. Train with and without it. Compare test accuracy.
- Try torch.compile() and benchmark the speed difference.
5.8 What’s next
Chapter 5: Convolutional networks. You’ll learn why fully-connected layers are the wrong tool for images, and build a CNN that beats your MLP.