👋 Welcome to my Codex

Hi, this is Mohit. I have been documenting my learning notes in this blog since 2024. Based on the number of grammar mistakes in my posts, you can tell how much ChatGPT is involved 😉.

Audio models for generation, ASR, Trigger word etc

Audio Modality
Codec : a piece of hardware or software that compresses / decompresses digital data to reduce file size
FFT : the fast Fourier transform converts a signal from the time-amplitude domain to the frequency-amplitude domain
Resampling : changing the sampling rate of a recording, e.g. audio recorded at 44100 Hz that we convert to 16000 Hz
Spectrogram : a time-frequency picture of the audio, built by applying the FFT to short overlapping windows of the waveform, so it shows how the energy at each frequency changes over time
Mels : the mel scale approximates how humans perceive pitch; in a mel spectrogram the frequency axis is converted to the mel scale
Channel : the number of separate audio signals in a recording, typically one per microphone used
Mono channel : a single audio signal, so the same sound plays in both headphones
Stereo channel : two audio signals, left and right, so each headphone gets its own sound (TV, songs, YouTube videos)
Sampling Rate : the number of samples taken from 1 second of audio
Waveform : the audio plot, with time on the x-axis and amplitude on the y-axis. This is what we hear and what music players show
Mel-spectrogram : inspired by how humans hear sound; we perceive pitch on a roughly logarithmic scale, so the mel spectrogram spaces frequencies the way human hearing does
Example :
SAMPLE_RATE = 16000
HOP_LENGTH = 256 # number of audio samples between two consecutive spectrogram frames
N_FFT = 1024
MAX_MEL_FRAMES = 512 # no. of time frames in a mel spectrogram
Here the audio is sampled at 16 kHz. N_FFT : the number of samples analysed in one FFT window, ...
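To get a feel for what those constants mean in seconds, here is a small sketch that only does the arithmetic. The helper `frames_for` is a hypothetical name, and its frame count is an approximation: real libraries differ by a frame or so depending on how they pad the signal.

```python
SAMPLE_RATE = 16000     # samples per second
HOP_LENGTH = 256        # samples between two consecutive frames
N_FFT = 1024            # samples analysed in one FFT window
MAX_MEL_FRAMES = 512    # time frames kept per mel spectrogram

# Duration covered by one FFT window and by one hop, in seconds.
window_seconds = N_FFT / SAMPLE_RATE
hop_seconds = HOP_LENGTH / SAMPLE_RATE

# Longest audio clip that fits in MAX_MEL_FRAMES frames.
max_audio_seconds = MAX_MEL_FRAMES * HOP_LENGTH / SAMPLE_RATE


def frames_for(duration_seconds):
    """Approximate frame count: one frame per hop, ignoring padding."""
    return int(duration_seconds * SAMPLE_RATE) // HOP_LENGTH


print(f"window = {window_seconds * 1000:.0f} ms, hop = {hop_seconds * 1000:.0f} ms")
print(f"max clip length = {max_audio_seconds:.3f} s")
print(f"frames in 1 s of audio = {frames_for(1.0)}")
```

So each frame looks at 64 ms of audio, frames are 16 ms apart (they overlap), and 512 frames cover a bit over 8 seconds.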

March 9, 2026 · 3 min · Mohit Dulani

DL (image modality)

Resolution matters a lot in the image domain: an upscale from 512 to 1024 can make or break your AI model. U2Net still works better for segmentation than most of the SAM3 models when it comes to high-definition, smooth edges. SAM models are data hungry and need a lot of data to do something nice. Convolution operations and kernels are underrated; a lot can be done if the filter / kernel values are set correctly ...
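As a rough illustration of how much the kernel values matter, here is a minimal pure-Python sketch that slides a 3x3 kernel over a tiny image. The Sobel kernel and the toy image are my own choices for the example, not something from the post, and like most deep learning frameworks the function does not flip the kernel (strictly speaking it is cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Sobel kernel that responds to left-to-right brightness changes.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 5x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

edges = convolve2d(image, SOBEL_X)
for row in edges:
    print(row)
```

The output is zero in the flat regions and large only where the dark-to-bright edge sits, which is the whole job of this particular kernel.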

February 6, 2026 · 1 min · Mohit Dulani

Reliable / scalable Backend

Terms
Ephemeral (efimeral) : short-lived functions, like Google Cloud Functions, that run per request and have a fixed timeout; when the timeout is reached, the platform sends a SIGTERM to shut the function down.
Presigned / Signed URL : used to upload / download directly to storage (without involving the backend at all). It is a time-based URL that expires; whoever generates the link must have access to the storage, and the uploader can then just upload … This is used in customer support as well: when we upload images of a damaged item, it reduces the load on the backend ...
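The idea behind a signed URL can be sketched with an HMAC over the path and an expiry time. This is a simplified illustration only, not the scheme any real cloud provider uses; `SECRET_KEY`, `make_signed_url`, `is_valid` and the example host are all hypothetical names.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; only the link generator knows it


def _sign(path, expires):
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url, path, ttl_seconds, now=None):
    """Build a URL that is only valid until now + ttl_seconds."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(path, expires)})
    return f"{base_url}{path}?{query}"


def is_valid(url, now=None):
    """Storage-side check: signature matches and the link has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _sign(parsed.path, expires)
    same = hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
    return same and now < expires


url = make_signed_url("https://storage.example.com", "/uploads/damaged-item.jpg", 300, now=1000)
print(is_valid(url, now=1100))  # inside the 300 s window
print(is_valid(url, now=1400))  # after expiry
```

The uploader never sees the secret; they only get a link that stops working after the timeout, and changing the path invalidates the signature.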

January 12, 2026 · 8 min · Mohit Dulani

Reinforcement learning in Language Modelling

The alignment / RLHF part of the models, and how to do it! The whole flow of going from a pretrained model to an InstructGPT-style model. Enabling better, tighter controls over LM output.
Post-training :
SFT : if you want to imitate expert demonstrations, you better have expert demonstrations of what that looks like. Once we have the data, how do we adapt to it? https://youtu.be/Dfu7vC9jo4w?t=337
Policy-gradient methods try to estimate the advantage value. In PPO, the advantage tells how much better an action is compared to the average: Advantage = ( R - Baseline ) ...
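A tiny sketch of the Advantage = ( R - Baseline ) formula, assuming the baseline is simply the mean reward of the batch. That is my simplification for the example; PPO implementations usually use a learned value function as the baseline. The reward numbers are made up.

```python
def advantages(rewards):
    """Advantage of each sample = its reward minus the batch-mean baseline."""
    baseline = sum(rewards) / len(rewards)
    return [reward - baseline for reward in rewards]


# Made-up rewards for four sampled responses to the same prompt.
rewards = [1.0, 0.0, 2.0, 1.0]
result = advantages(rewards)
print(result)
```

A positive advantage means the response did better than average and should become more likely; a negative one means the opposite.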

November 23, 2025 · 10 min · Mohit Dulani

Tokenizer

Encoding
It's a way to send / transmit information. Data is transferred across the internet as a byte stream, and we have defined formats for encoding data into that byte stream. If it's text data we have UTF-8 encoding, if it's an audio file it's MP3 encoding, and an image file is PNG encoded. In this way we define an encoding for every data format, and on the client side decoding happens to present it to the end user in a compatible way ...
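The text case can be tried directly in Python. This is just a small illustration with a string of my own choosing: characters go in, UTF-8 bytes travel over the wire, and the receiver decodes them back.

```python
text = "héllo"

# Sender side: characters -> bytes ("é" needs two bytes in UTF-8).
payload = text.encode("utf-8")
print(payload, len(text), "characters ->", len(payload), "bytes")

# Receiver side: bytes -> characters.
decoded = payload.decode("utf-8")
print(decoded)
```

The byte count and the character count differ, which is one reason tokenizers and byte-level models care about the encoding step.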

October 24, 2025 · 2 min · Mohit Dulani

Brushing up LLM for Interview

Embeddings
Vector Embeddings https://www.pinecone.io/learn/what-is-similarity-search/
Searching over structured data is easy: we can use data structures for it, like binary trees / arrays (in sorted order). This was done in internet 2.0; SQL, MySQL, and MongoDB leveraged it so well.
Now for unstructured data we need something that captures a deeper concept / representation of the data, using sentence-transformers (and models like Word2Vec and BERT). In the BERT model, training uses the [CLS] token as a prefix; we take the trained model and then extract this token's embedding.
Encoder-only architecture : used in models like BERT (which is bidirectional) and useful for NLU (natural language understanding) tasks; to predict tokens with it, we place a [MASK] token at the position to fill. In the Word2Vec model, we use CBOW and skip-gram, which depend on the proximity of similar words.
Vector Search Terminologies : IndexPQ (product quantizer), IndexIVFPQ (Inverted File with Product Quantization) ...
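The core of similarity search can be sketched as a brute-force scan with cosine similarity. The 2-D vectors and document names below are toy values I made up; real embeddings have hundreds of dimensions, and indexes such as IndexIVFPQ exist to avoid scanning every vector like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, index, top_k=2):
    """Rank every stored vector against the query and keep the best top_k."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy "embeddings" keyed by document name.
index = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}

print(search([0.9, 0.1], index, top_k=3))
```

The query points almost along the first axis, so doc_a comes first, doc_c second, and doc_b last.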

October 13, 2025 · 24 min · Mohit

Adding JSON mode to any model and that too without prompts

Learning how to add JSON mode to any model without relying solely on prompts

October 10, 2025 · 5 min · Mohit Dulani

Learnings from astro app AS

Load balancer
Building up a load balancer
Get the input metrics sorted : Latency, Cost, Uptime
Fit a linear equation that matches your score function : SF = W_1 * latency + W_2 * cost + W_3 * uptime where W_1 + W_2 + W_3 = 1
AB testing
Create 2 buckets with an 80/20 split. Whenever a user comes, check whether they are already in bucket A or bucket B, or assign them to one based on the rule. Each bucket then becomes a cohort to choose from, so you can see which are the primary and which are the secondary users and experiment with them accordingly !! ...
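Both ideas fit in a few lines. This is a sketch under my own assumptions: the weights are example values, the metrics are assumed to be already normalised to 0..1 with higher meaning better, and the "rule" for bucketing is a hash of the user id so the same user always lands in the same bucket. `score` and `assign_bucket` are hypothetical names.

```python
import hashlib


def score(latency, cost, uptime, weights=(0.5, 0.3, 0.2)):
    """SF = W_1 * latency + W_2 * cost + W_3 * uptime, with weights summing to 1."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * latency + w2 * cost + w3 * uptime


def assign_bucket(user_id, a_percent=80):
    """Deterministic 80/20 split: hash the user id into 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 100 < a_percent else "B"


print(score(latency=0.8, cost=0.5, uptime=1.0))
print(assign_bucket("user-42"), assign_bucket("user-42"))  # same bucket both times
```

Hashing instead of random assignment means no lookup table is needed to remember who is in which cohort.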

September 10, 2025 · 4 min · Mohit Dulani

Deep learning Optimizers

Gradients visualised
Second order derivative
Maths behind
Single variable
Things are very simple in single variable.
Function definition : f(x) = x^2
First derivative : f'(x) = 2x
Second derivative : f''(x) = 2
To find a minimum in single variable, f'(x) = 0 and f''(x) > 0 are enough to confirm a local minimum (f''(x) >= 0 alone is necessary but not sufficient, e.g. f(x) = x^3 at x = 0).
Multivariable / Multivariate
2x + 3y = 10 (multivariable eq)
x^2 + y^2 = 16 (multivariable eq) ...
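The two conditions can be checked numerically for f(x) = x^2 with finite differences. This is a small sketch; the step size `h`, the tolerance, and the helper names are my own choices.

```python
def f(x):
    return x ** 2


def first_derivative(func, x, h=1e-3):
    """Central difference approximation of f'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def second_derivative(func, x, h=1e-3):
    """Central difference approximation of f''(x)."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / (h * h)


def is_local_minimum(func, x, tol=1e-6):
    """f'(x) close to 0 and f''(x) clearly positive."""
    return abs(first_derivative(func, x)) < tol and second_derivative(func, x) > tol


print(first_derivative(f, 0.0), second_derivative(f, 0.0))
print(is_local_minimum(f, 0.0), is_local_minimum(f, 1.0))
```

At x = 0 the slope is zero and the curvature is about 2, so it is a minimum; at x = 1 the slope is about 2, so it is not.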

September 9, 2025 · 11 min · Mohit Dulani

Virtual Machine

Connecting to an Azure VM using SSH and RDP
Configure a VM based on the specs.
Connecting with SSH
Connect directly using ssh: create an inbound rule for SSH, and you get the SSH key !
Connecting with RDP
For RDP, install ubuntu-desktop. 3389 is the port for RDP, 22 for SSH, 80 for HTTP, 443 for HTTPS !
Azure VM RDP Setup Guide
1. Install Desktop Environment (Ubuntu Desktop)
Install desktop environment (if not already installed):
sudo apt update
sudo apt install ubuntu-desktop
2. Install and Configure XRDP
Install xrdp: ...

September 3, 2025 · 2 min · Mohit Dulani

Home Lab

Kubernetes
Kubernetes is a manager of pods. Those pods could be running on different machines, and the machines can have different roles: one could be the control plane, another could be a worker. So k8s takes your machine (or, via virtualisation, the amount / ratio of it that you define) and orchestrates it. You define a .yaml file, and k8s takes care of which machine in the cluster to run it on and how many pods to spin up. Everything is taken care of; you just need the namespace, entry file / project, etc., all written in a YAML file. ...

August 31, 2025 · 7 min · Mohit Dulani

Post training methods in LLM using RL

Tags : PPO RLHF Maths
In reinforcement learning, the agent decides which action to take based on the current state and other variables present at timestep t. It then takes that action, a reward follows, and the model's weights are updated based on the rewards received.
Consider this basic hello-world example of RL :
State : any place / position where the agent can be
Action : up, down, left, right; these are the actions the agent can take ...
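The state / action part of that hello-world example can be sketched as a tiny grid environment. The 3x3 grid size, the goal cell, and the reward of 1.0 for reaching it are my assumptions for illustration; there is no learning here, only the environment the agent would interact with.

```python
GRID_SIZE = 3   # assumed 3x3 grid
GOAL = (2, 2)   # assumed goal cell (row, col)

# Each action moves the agent by (row change, col change).
MOVES = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state, action):
    """Apply an action; walls clamp the agent inside the grid."""
    d_row, d_col = MOVES[action]
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    next_state = (row, col)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


state = (0, 0)
total_reward = 0.0
for action in ["right", "right", "down", "down"]:
    state, reward = step(state, action)
    total_reward += reward
    print(action, "->", state, "reward", reward)
```

An RL algorithm would replace the fixed action list with a policy and use the rewards to update it.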

August 23, 2025 · 6 min · Mohit Dulani