👋 Welcome to my Codex

Hi, this is Mohit. I've been documenting my learning notes on this blog since 2024. Based on the number of grammar mistakes in my posts, you can tell how much ChatGPT is involved 😉.

Learning Design Patterns

Python OOP. Everything in Python is an object, including a method together with its variables / properties. OOP is not as simple as the way we use it in projects; it makes for an interesting low-level discussion. Class variables: these are shared by all instances of the class and live in the same memory location, so changing one through the class (or mutating it in place) changes it for every other instance as well. Like this: class Node: value: int , next: None ...
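A minimal sketch of the class-variable behaviour described above. The attribute names `count` and `history` are hypothetical and not taken from the post; the point is only to show which changes are visible to every instance and which are not.

```python
class Node:
    # Class variables: stored once on the class, shared by all instances.
    count = 0
    history = []

    def __init__(self, value):
        # Instance variables: each instance gets its own copy.
        self.value = value
        self.next = None


a, b = Node(1), Node(2)

# Both instances see the very same list object.
same_object = a.history is b.history

# Mutating the shared list in place is visible through every instance.
a.history.append(a.value)

# Rebinding through the class is also visible through every instance.
Node.count = 2

# Rebinding through an instance only creates an instance attribute on `a`;
# it shadows the class variable and leaves `b` and `Node` untouched.
a.count = 10
```

Note the last case: assigning through an instance does not change the shared value, which is a common source of confusion.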

Brushing up LLM for Interview

Embeddings. Vector embeddings: https://www.pinecone.io/learn/what-is-similarity-search/ Searching over structured data is easy: we can use data structures such as binary trees, sorted arrays, and hash sets. This is how it was done in internet 2.0; SQL, MySQL, and MongoDB leveraged it very well. For unstructured data we need something that captures a deeper concept / representation of the data, using sentence-transformers (and models like Word2Vec and BERT) ...
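As a rough illustration of similarity search over embeddings, here is a sketch using cosine similarity. The three-dimensional vectors are made-up toy values, not output from any real model; in practice they would come from an embedding model such as the ones named above, and the helper names `cosine_similarity` and `nearest` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, corpus):
    """Return the key in `corpus` whose vector is most similar to `query`."""
    return max(corpus, key=lambda name: cosine_similarity(query, corpus[name]))


# Toy embeddings (assumed values for illustration only).
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

query = [1.0, 0.0, 0.0]
best = nearest(query, corpus)
```

This is a brute-force scan over every vector; dedicated vector indexes exist to avoid that cost on large collections.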

Adding JSON mode to any model and that too without prompts

Learning how to add JSON mode to any model without relying solely on prompts

Learnings from astro app AS

Load balancer. Building a load balancer: get the input metrics sorted first: latency, cost, uptime. Fit a linear equation that matches your score function: SF = W_1 * latency + W_2 * cost + W_3 * uptime, where W_1 + W_2 + W_3 = 1. A/B testing: create 2 buckets (80/20). Whenever a user arrives, check whether they are already in bucket A or bucket B, or assign them to one based on the rule. Each bucket then becomes a cohort, so you can see which users are primary and which are secondary and experiment with them accordingly. ...
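The score function and the 80/20 bucketing above could be sketched as follows. The weight values, the helper names, and the hash-based assignment rule are all assumptions for illustration; the sketch also assumes each metric has already been normalised to [0, 1] and oriented so that higher is better, which the excerpt does not specify.

```python
import hashlib

# Assumed weights; the only constraint taken from the text is that they sum to 1.
WEIGHTS = {"latency": 0.5, "cost": 0.3, "uptime": 0.2}


def score(metrics, weights=WEIGHTS):
    """SF = W_1 * latency + W_2 * cost + W_3 * uptime."""
    return sum(weights[name] * metrics[name] for name in weights)


def assign_bucket(user_id, share_a=0.8):
    """Deterministically map a user to bucket 'A' (about 80%) or 'B' (about 20%)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a position in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "A" if position < share_a else "B"


backend = {"latency": 0.7, "cost": 0.4, "uptime": 0.99}
backend_score = score(backend)
bucket = assign_bucket("user-42")
```

Hashing the user id means the same user always lands in the same bucket without storing the assignment, so the cohorts stay stable across visits.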

Deep learning Optimizers

Gradients visualised. Second-order derivative: the maths behind it. Single variable: things are very simple in a single variable. Function definition: f(x) = x^2. First derivative: f'(x) = 2x. Second derivative: f''(x) = 2. To find a minimum in a single variable, f'(x) = 0 and f''(x) > 0 together are sufficient (f''(x) >= 0 alone is only a necessary condition; f(x) = x^3 at x = 0 satisfies it without being a minimum). Multivariable / multivariate: the first derivative becomes a vector and is no longer a number; the second derivative becomes a matrix called the Hessian ...
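To make the multivariable case concrete, here is a small worked example. The function f(x, y) = x^2 + y^2 is chosen for illustration as the two-variable analogue of f(x) = x^2; it does not come from the post.

```latex
f(x, y) = x^2 + y^2

\nabla f(x, y) =
\begin{pmatrix}
  \partial f / \partial x \\
  \partial f / \partial y
\end{pmatrix}
=
\begin{pmatrix}
  2x \\
  2y
\end{pmatrix}

H(x, y) =
\begin{pmatrix}
  \partial^2 f / \partial x^2 & \partial^2 f / \partial x \, \partial y \\
  \partial^2 f / \partial y \, \partial x & \partial^2 f / \partial y^2
\end{pmatrix}
=
\begin{pmatrix}
  2 & 0 \\
  0 & 2
\end{pmatrix}
```

The gradient is zero only at (0, 0), and the Hessian there has both eigenvalues equal to 2, so it is positive definite and (0, 0) is a minimum. This mirrors the single-variable conditions f'(x) = 0 and f''(x) > 0.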

Virtual Machine

Connecting to an Azure VM using SSH and RDP. Configure a VM based on the specs. Connecting with SSH: connect directly using ssh; create an inbound rule for SSH, and you get the SSH key. Connecting with RDP: install ubuntu-desktop; 3389 is the port for RDP, 22 for SSH, 80 for HTTP, 443 for HTTPS. Azure VM RDP Setup Guide 1. Install Desktop Environment (Ubuntu Desktop). Install the desktop environment (if not already installed): sudo apt update, then sudo apt install ubuntu-desktop 2. Install and Configure XRDP. Install xrdp: ...

Home Lab

Kubernetes is a manager for Docker containers. Think of it like this: we have a VM from Azure with some specs, and we want to spin up 4 containers in it. To do this we also need a manager that handles routing, load balancing, and rolling out code updates (CI/CD), and that we can talk to at the manager level to get all the information about the internal workings of the containers, logs, etc. That role is taken care of by Kubernetes (k8s) ...

Post training methods in LLM using RL

Tags: PPO, RLHF, Maths. In reinforcement learning, the agent decides on an action based on the current state and other variables present at timestep t; it then takes that action, a reward follows, and the weights are updated based on the rewards the model receives. Consider this basic hello-world example of RL. State: any place / position where the agent can be. Action: up, down, left, right are the actions the agent can take ...
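A minimal sketch of the state / action / reward loop described above, on a small grid. The grid size, goal cell, and reward values are assumptions made for illustration, and no learning step (weight update) is shown; the policy is a hand-written function.

```python
# The four actions from the text, as (row, column) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

GRID_SIZE = 4   # assumed 4x4 grid
GOAL = (3, 3)   # assumed goal cell


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Clamp to the grid so the agent cannot walk off the edge.
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    next_state = (row, col)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def run_episode(policy, start=(0, 0), max_steps=50):
    """Roll out one episode and return (final_state, total_reward)."""
    state, total = start, 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return state, total


def simple_policy(state):
    """Hand-written policy: go down to the last row, then right."""
    return "down" if state[0] < GRID_SIZE - 1 else "right"


final_state, total_reward = run_episode(simple_policy)
```

In an actual RL setup the policy would be a model whose parameters are adjusted from the rewards collected in episodes like this one.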

LLM's loss curve and compute

Small LM vs large LM. A model with a small number of parameters has to choose what knowledge to keep in parameter space and what to ignore. Because of the small parameter count, the model tends to ignore most of the OOD knowledge and keep what occurs commonly (small fields / tail knowledge are ignored). Multi-task learning: grammar, maths, punctuation, etc. Heuristics: a small LM unwillingly restricts itself to a smaller set of tasks; it can only improve on / learn a particular subset (grammar and punctuation only), while a large language model can learn both tail knowledge and tasks like maths, punctuation, and world knowledge ...

Learning to use aider

Learning Aider

Spinning up a GCP

Best practices for getting a VM at a cheap price. Take a GPU VM on GCP (there is a section to select that), then pick either the V100 or T4 GPU class (cheap, if time and FLOPs are not an issue). Add a persistent additional disk of around 70 GB (most important): GPU-only GCP instances provide very little SSD storage, so this becomes necessary. Then install the drivers from NVIDIA's official site, mount the additional disk, and start by placing your project on the mounted device. Don't touch the original SSD … all development and cache go in the mounted directory, not in the other one ...

Building Backend Architecture from scratch

Building a scalable backend architecture using Django. We start with a Django project, using: $ django-admin startproject <project-1> <dir_name> A single Django project contains its own views, urls, asgi, wsgi, and manage.py files. A project is often called a service, so projects and services are the same thing, and a microservices architecture is one where each service (aka project) can be scaled up independently ...