Training Loop demystified
- Forward Pass: Compute predictions (The code where you pass the input to the model to get the output)
- Loss Calculation: Compute the loss (The code where you calculate the loss between the predicted output and the target output)
- Backward Pass: Compute the gradients of the loss with respect to every parameter that has requires_grad = True; each gradient is stored in that parameter's .grad attribute (The code where you calculate the gradients of the loss with respect to the model parameters). Note that loss.backward() does not update the parameters themselves; it only computes the gradients and stores them in the .grad attribute.
- Parameter Update aka optimization: Update the parameters of the model that have a populated .grad attribute, i.e. those with requires_grad = True (The code where you update the parameters using the gradients and a learning rate)
- Zero_grad(): The default behavior of the .grad attribute is to accumulate: every backward pass adds the new gradients to whatever is already stored in each parameter's .grad tensor. So we need to set it to zero before doing the backward pass; a small sketch of this behavior follows this list.
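Since the accumulation behavior is the easiest part to misunderstand, here is a minimal sketch (the tensor and values are purely illustrative) showing .grad growing across two backward passes and then being reset:

import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3).backward()
print(w.grad)        # tensor(3.)

(w * 3).backward()   # without zeroing first, the new gradient is added on top
print(w.grad)        # tensor(6.)

w.grad.zero_()       # this reset is what zero_grad() takes care of
print(w.grad)        # tensor(0.)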
Training Loop in PyTorch
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass (computes the gradients and stores them in the .grad attribute of each parameter)
        loss.backward()

        # Update the parameters based on the defined optimizer and learning rate
        optimizer.step()
The alias codes and removing the abstraction
loss.backward()
translates to
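Roughly the following (a sketch of the effect, not the real autograd engine): walk the computation graph from loss back to every parameter with requires_grad = True and accumulate the gradients into .grad.

grads = torch.autograd.grad(loss, list(model.parameters()))
for p, g in zip(model.parameters(), grads):
    if p.grad is None:
        p.grad = g.clone()
    else:
        p.grad += g   # note: accumulated, not overwritten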
optimizer.step()
translates to
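For plain SGD (ignoring momentum, weight decay and other optimizer-specific logic), the effect is roughly:

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= learning_rate * p.grad   # learning_rate is the lr you passed to the optimizer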
optimizer.zero_grad()
translates to
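Roughly:

for p in model.parameters():
    if p.grad is not None:
        p.grad.zero_()   # recent PyTorch versions instead set p.grad = None by default (set_to_none=True)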
Convolutional Layers demystified with alias codes
Nice link for visualizing the convolution operation.
nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
translates to
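A rough equivalent (the channel counts 3 and 16 and the 8x8 input are just example values): a Conv2d is essentially an im2col (unfold) that extracts the 3x3 patches, followed by a matrix multiply with the flattened kernels.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)          # (batch, in_channels, H, W)
weight = torch.randn(16, 3, 3, 3)    # (out_channels, in_channels, kH, kW)
bias = torch.zeros(16)

patches = F.unfold(x, kernel_size=3, stride=1, padding=1)   # (1, 3*3*3, 8*8)
out = weight.view(16, -1) @ patches + bias.view(-1, 1)      # (1, 16, 64)
out = out.view(1, 16, 8, 8)                                  # back to a spatial map

print(torch.allclose(out, F.conv2d(x, weight, bias, stride=1, padding=1), atol=1e-5))  # True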
Visualizing ConvTranspose2d
ConvTranspose2d is a bit more complex and harder to grasp than the Conv2d layers, mainly because the intuition behind it is less obvious.
Don't think too much about it... just know that it works and gives results. (I don't know much about it either; a small shape check follows.)
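If you just want to see why it shows up as the upsampling layer (for example in decoders and GAN generators), a quick shape sanity check is enough (the numbers here are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)
up = nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1)
print(up(x).shape)   # torch.Size([1, 3, 16, 16]) -> the spatial size is doubled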