Learning a bit of advanced PyTorch
Training Loop demystified
- Forward Pass: Compute predictions (The code where you pass the input to the model to get the output)
- Loss Calculation: Compute the loss (The code where you calculate the loss between the predicted output and the target output)
- Backward Pass: Compute the gradients of the loss with respect to every parameter that has requires_grad = True; each gradient is stored in that parameter's .grad attribute (the code where you call loss.backward()). Note that loss.backward() does not update the parameters themselves; it only computes the gradients and stores them in .grad.
- Parameter Update aka optimization: Update the parameters that have a populated .grad attribute, i.e. those with requires_grad = True (the code where you update the parameters using the gradients and a learning rate)
- zero_grad(): By default, .grad accumulates gradients across backward passes (.grad is a tensor, and each call to loss.backward() adds to it), so we need to clear it before the next backward pass; the sketch after this list demonstrates the accumulation.
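A minimal sketch of that accumulation behaviour (the toy tensor and the squared loss are made up purely for illustration): calling loss.backward() twice without clearing .grad adds the second gradient on top of the first.

import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4
loss = (w ** 2).sum()
loss.backward()
print(w.grad)    # tensor([4.])

# Second backward pass WITHOUT clearing .grad: the new gradient is added on top
loss = (w ** 2).sum()
loss.backward()
print(w.grad)    # tensor([8.])  accumulated, not overwritten

# Clearing the gradient, which is effectively what optimizer.zero_grad() does
w.grad = None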
Training Loop in PyTorch
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward pass (computes the gradients and stores them in each parameter's .grad attribute)
        loss.backward()
        # Updates the parameters based on the defined optimizer and learning rate
        optimizer.step()
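For completeness, here is a self-contained version of the same loop; the tiny model, the fake dataset, and the hyperparameters are made up purely so that the snippet runs end to end.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Fake data standing in for (images, labels): 256 samples of 3x32x32 "images", 10 classes
all_images = torch.randn(256, 3, 32, 32)
all_labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(all_images, all_labels), batch_size=32, shuffle=True)

# A deliberately tiny model; any nn.Module works the same way
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

num_epochs = 2
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()              # clear accumulated gradients
        outputs = model(images)            # forward pass
        loss = criterion(outputs, labels)  # loss calculation
        loss.backward()                    # backward pass: fills .grad
        optimizer.step()                   # parameter update
    print(f"epoch {epoch}: loss {loss.item():.4f}")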
The alias code: removing the abstraction
loss.backward()
translates to (conceptual pseudocode, not the literal autograd API)
for param in model.parameters():
    if param.requires_grad:
        param.grad = Autograd.backward(loss, param)  # in reality the gradient is accumulated into .grad
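The real machinery behind that pseudocode is torch.autograd. A quick sketch (on a throwaway linear model) to convince yourself that loss.backward() only fills .grad, and that torch.autograd.grad computes the same values:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

# Compute the gradients explicitly, without touching .grad
explicit_grads = torch.autograd.grad(loss, list(model.parameters()), retain_graph=True)

# backward() computes the same gradients but stores them in each parameter's .grad
loss.backward()
for p, g in zip(model.parameters(), explicit_grads):
    print(torch.allclose(p.grad, g))    # True, True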
optimizer.step()
translates to (for vanilla SGD with no momentum or weight decay)
for param in model.parameters():
    if param.grad is not None:
        param.data = param.data - learning_rate * param.grad
optimizer.zero_grad()
translates to (with the default set_to_none=True; with set_to_none=False the .grad tensors are zeroed in place instead)
for param in model.parameters():
    if param.requires_grad:
        param.grad = None
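Putting the three translations together, here is a sketch of one full optimization step written without torch.optim at all (the toy model, data, and learning rate are made up); it behaves like plain SGD:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
learning_rate = 0.1

# "optimizer.zero_grad()"
for param in model.parameters():
    param.grad = None

# Forward pass + loss calculation + backward pass
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# "optimizer.step()" for vanilla SGD
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= learning_rate * param.grad

new_loss = nn.functional.mse_loss(model(x), y)
print(loss.item(), new_loss.item())    # the second value should typically be lower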
Convolutional Layers demystified with alias codes
Nice link for visualizing the convolution operation.
nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1, padding=1)
translates to
import torch
import torch.nn.functional as F

# Input: 3 channels, 32 x 32 spatial size
input_size = (3, 32, 32)
input_tensor = torch.randn(input_size)

# kernel_size = 3 really means a (3, 3, 3) kernel: one 3x3 window per input channel
kernel = torch.randn(3, 3, 3)
kernel_size = 3
stride = 1
padding = 1

# With stride 1 and padding 1, the spatial size is preserved
output_size = (32, 32)
output_tensor = torch.zeros(output_size)

# Pad the input so the kernel can be centred on the border pixels
padded = F.pad(input_tensor, (padding, padding, padding, padding))

# Iterate over each position in the output tensor: take a patch of the input tensor,
# do element-wise multiplication with the kernel, and sum it up to get one output value
for i in range(output_size[0]):
    for j in range(output_size[1]):
        # Top-left corner of the receptive field in the padded input
        start_i = i * stride
        start_j = j * stride
        end_i = start_i + kernel_size
        end_j = start_j + kernel_size
        # Extract the input patch (all channels)
        input_patch = padded[:, start_i:end_i, start_j:end_j]
        # Perform element-wise multiplication with the kernel and sum
        output_tensor[i, j] = torch.sum(input_patch * kernel)

# The output_tensor now contains the result of the convolution (one output channel, no bias)
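To sanity-check the loop above, you can compare it against PyTorch's own convolution. This sketch reuses input_tensor, kernel, stride, and padding from above and assumes a single output channel with no bias:

# F.conv2d expects an input of shape (batch, channels, H, W)
# and a weight of shape (out_channels, in_channels, kH, kW)
reference = F.conv2d(
    input_tensor.unsqueeze(0),    # (1, 3, 32, 32)
    kernel.unsqueeze(0),          # (1, 3, 3, 3): one output channel
    stride=stride,
    padding=padding,
).squeeze()                       # back to (32, 32)

print(torch.allclose(output_tensor, reference, atol=1e-5))    # True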
visualizing ConvTranspose2d
ConvTranspose2d is a bit more complex and harder to grasp than Conv2d, mainly because the operation behind it is less intuitive. Don't overthink it for now; just know that it works and is typically used to upsample feature maps (I don't know much about it yet either). The small sketch below only checks the output shape.
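With these made-up hyperparameters, a 16x16 feature map comes out twice as large:

import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)    # (batch, channels, H, W)
up = nn.ConvTranspose2d(in_channels=8, out_channels=4, kernel_size=4, stride=2, padding=1)
print(up(x).shape)    # torch.Size([1, 4, 32, 32])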
Written on September 7, 2024