Post-training methods for LLMs using RL

Tags: PPO, RLHF, Maths. Reinforcement learning: the agent decides on an action based on the current state and the other variables present at timestep t. It then takes that action, a reward follows, and the model's weights are updated based on the rewards received. Consider this basic hello-world example of RL. State: any place / position where the agent can be. Action: up, down, left, right; these are the actions the agent can take ...

August 23, 2025 · 6 min · Mohit Dulani
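
The state / action / reward loop from the reinforcement learning excerpt above can be sketched as a tiny grid world. This is only an illustration: the 3 x 3 grid size, the goal cell, and the reward values (`GRID_SIZE`, `GOAL`, `+1.0`, `-0.1`) are assumptions, not taken from the post, and no weight update is shown.

```python
# Hypothetical 3x3 grid world; sizes, goal and rewards are illustrative assumptions.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GRID_SIZE = 3
GOAL = (2, 2)


def step(state, action):
    """Apply an action to a (row, col) state and return (next_state, reward)."""
    d_row, d_col = ACTIONS[action]
    # Clamp to the grid so the agent cannot walk off the edge.
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    next_state = (row, col)
    # Small penalty per move, positive reward on reaching the goal.
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def run_episode(actions, start=(0, 0)):
    """Play a fixed list of actions and return (final_state, total_reward)."""
    state, total = start, 0.0
    for action in actions:
        state, reward = step(state, action)
        total += reward
    return state, total


if __name__ == "__main__":
    print(run_episode(["down", "down", "right", "right"]))
```

In a real RL setup the total reward would feed a learning rule (for example a policy-gradient update); here it is only accumulated.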

LLM's loss curve and compute

Small LM vs Large LM. A model with a small number of parameters has to choose which knowledge to keep in parameter space and which to ignore. Because of the small parameter count, it tends to ignore most OOD (out-of-distribution) knowledge and keep what occurs commonly (small fields / tail knowledge are ignored). Multi-task learning: grammar, maths, punctuation, etc. Heuristics: a small LM unwillingly restricts itself to a smaller set of tasks and can only improve on a particular subset (grammar and punctuation only), while a large language model can learn both tail knowledge and tasks like maths, punctuation and world knowledge ...

August 16, 2025 · 2 min · Mohit Dulani

Django

Building a scalable backend architecture using Django. We start with a Django project, using: $ django-admin startproject <project-1> <dir_name>. A single Django project contains its own views, urls, asgi, wsgi and manage.py files. A project is often called a service, so here projects and services mean the same thing, and a microservices architecture is one where each service (aka project) can be scaled up independently ...

July 20, 2025 · 2 min · Mohit Dulani

Learning Go-lang

Learning Go to get minimum latency and get things right from the start. Go docs, Go Tour. The whole file needs to belong to a package; to do that we declare package main at the top, which makes the code a single package that is then compiled to a binary. We create packages so that we can import them in other files, and importing helps with code separation; otherwise we would have to write all the code in a single main.go file ...

July 18, 2025 · 9 min · Mohit Dulani

Low Rank Adaptation

LoRA. This is beneficial only if the rank r ≪ d, where d is the dimension of the matrix. Rank of a matrix: the number of linearly independent rows / columns. A matrix of size 4 x 5 with rank = 2 can be broken down into (4 x 2) x (2 x 5), reducing the total number of params from 20 to 18. When applied at a large scale, for sizes 1024 x 1024 or 2048 x 2048 and rank = 2 or 4, the saving is much larger: with rank = 4 the size drops from (1024 x 1024) to (1024 x 4 x 2), a 128x reduction ...

June 7, 2025 · 2 min · Mohit Dulani
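
The parameter arithmetic in the LoRA excerpt above can be checked with a few lines. This is a minimal sketch that only counts entries in the two low-rank factors; the function names are made up for illustration.

```python
def full_params(rows, cols):
    """Number of entries in a dense rows x cols matrix."""
    return rows * cols


def low_rank_params(rows, cols, rank):
    """Entries in the two factors: (rows x rank) and (rank x cols)."""
    return rows * rank + rank * cols


if __name__ == "__main__":
    # 4 x 5 matrix with rank 2: 20 params become 18.
    print(full_params(4, 5), low_rank_params(4, 5, 2))
    # Larger square matrices: print the reduction factor for each rank.
    for size in (1024, 2048):
        for rank in (2, 4):
            ratio = full_params(size, size) / low_rank_params(size, size, rank)
            print(size, rank, ratio)
```

For a square d x d matrix the reduction factor is d / (2r), so it grows linearly with d and shrinks as the rank increases.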

SmolVLA paper from Hugging Face

SmolVLA: a vision-language-action paper used for training real-world robots on tasks. Input: images of the surroundings, a task explanation in text, and the state (a sensorimotor snapshot at a given time). Sensorimotor states are projected into a single token using a linear layer to align with the token dimension of the language model. Ex: an inputs dictionary with three keys: "image" (an RGB image from the robot's camera), "instruction" (the task as a text string), and "state" (a NumPy array holding joint positions (7), joint velocities (7), gripper open (1.0 = open, 0.0 = closed), end-effector position (x, y, z), and end-effector orientation (quaternion)). Output from the action expert, which predicts what action to take next: flow matching is a way to train the action expert so it can generate smooth, realistic action sequences quickly and efficiently. ...

June 7, 2025 · 4 min · Mohit Dulani
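
The "project the sensorimotor state into a single token with a linear layer" step from the SmolVLA excerpt above can be sketched in plain Python. Everything concrete here is an assumption for illustration: the 22-value state layout (7 joint positions, 7 joint velocities, 1 gripper value, 3 position values, 4 quaternion values), the example numbers, the tiny `TOKEN_DIM`, and the random weights stand in for whatever the trained model actually uses.

```python
import random

STATE_DIM = 22  # assumed layout: 7 + 7 + 1 + 3 + 4
TOKEN_DIM = 8   # hypothetical; a real language model token is far wider


def make_linear(in_dim, out_dim, seed=0):
    """Create random weights and a zero bias for a linear layer (untrained)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project_state(state, weight, bias):
    """Linear layer y = W x + b: maps the whole state vector to one token."""
    return [sum(w * s for w, s in zip(row, state)) + b for row, b in zip(weight, bias)]


if __name__ == "__main__":
    # Illustrative state values only.
    state = (
        [0.0] * 7                 # joint positions
        + [0.0] * 7               # joint velocities
        + [1.0]                   # gripper open
        + [0.4, 0.1, 0.3]         # end-effector position (x, y, z)
        + [0.0, 0.0, 0.0, 1.0]    # end-effector orientation (quaternion)
    )
    weight, bias = make_linear(STATE_DIM, TOKEN_DIM)
    state_token = project_state(state, weight, bias)
    print(len(state_token))  # one token of length TOKEN_DIM
```

The point of the sketch is only the shape change: a state vector of any length becomes a single vector with the same width as the other tokens.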

Thinking and fiddling out with random ideas

Learning new ideas and fiddling with them. When you visit a site, what happens? The HTML, JS and CSS are transferred to the browser via an HTTP request, and the browser engine (Chrome's, for example) then runs them and displays the content. How does image upscaling work? How does real-time STT work? We store a 500 ms chunk in a /tmp file, send it for transcription, and repeat until the stream is closed; alternatively, a VAD (voice activity detection) is used to check how long voice is detected ...

May 11, 2025 · 2 min · Mohit Dulani
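
The chunk-and-transcribe loop from the real-time STT note above can be sketched as follows. This is a simplified, in-memory version: it does not write chunks to a /tmp file, the energy threshold in `has_voice` is a toy stand-in for a real VAD, and `transcribe` is a hypothetical callback supplied by the caller.

```python
def split_into_chunks(samples, sample_rate, chunk_ms=500):
    """Split a list of audio samples into consecutive chunks of chunk_ms milliseconds."""
    chunk_len = sample_rate * chunk_ms // 1000
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def has_voice(chunk, threshold=0.01):
    """Toy energy-based VAD: True when the mean squared amplitude exceeds threshold."""
    if not chunk:
        return False
    return sum(x * x for x in chunk) / len(chunk) > threshold


def stream_transcribe(samples, sample_rate, transcribe):
    """Transcribe chunk by chunk, stopping at the first silent chunk after speech began."""
    texts, speaking = [], False
    for chunk in split_into_chunks(samples, sample_rate):
        if has_voice(chunk):
            speaking = True
            texts.append(transcribe(chunk))
        elif speaking:
            break
    return texts


if __name__ == "__main__":
    rate = 16000
    audio = [0.0] * 8000 + [0.5] * 16000 + [0.0] * 8000
    # Fake transcriber that just reports the chunk length.
    print(stream_transcribe(audio, rate, lambda chunk: f"{len(chunk)} samples"))
```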

Dual booting Arch setup with Windows

So now we have Arch installed and need to dual boot with Windows (obviously so that NVIDIA Broadcast and cracked Premiere Pro work there; Wine just doesn't work for me). Partition: the first step is to make a partition, a virtual partition in your drive, so that we can install Windows there. How to make a partition? If you want to partition a drive that is mounted (meaning you are using it or it is accessible to you), it can't be done; you need to unmount it first to create a partition ...

April 25, 2025 · 2 min · Mohit Dulani

Learning F&O and game theory

F&O learning. Derivatives: includes Futures and Options. Financial instruments: Nifty 50, Nifty Bank, stock options, commodities, soybean futures, etc. Nifty 50 is an index that tracks the top 50 Indian companies. It is nothing on its own, so it cannot be bought or sold directly; trading in Nifty means trading in Nifty futures. Futures: no premium, you only have to pay the margin. Contracts go up to 3 months into the future, and the last Thursday of each month is what counts, aka near future and far-away future. They trade in lots, can be held over multiple days, and can be shorted. A contract expires on the last Thursday of its month, meaning that if you have not closed out your position by that day, your broker will automatically take you out on that day. In futures you can go either long or short. Long position: you are bullish that the stock price will go up. Short position: you are bearish that the stock price will go down in future. It is already a leveraged product … ...

April 13, 2025 · 15 min · Mohit Dulani
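
The long / short idea in the futures excerpt above comes down to simple arithmetic on the price move. A minimal sketch, assuming a hypothetical lot size and made-up prices, and ignoring margin, brokerage and taxes:

```python
def futures_pnl(entry_price, exit_price, lot_size, lots=1, side="long"):
    """Profit or loss on a futures position, before costs.

    A long position gains when the price rises; a short position gains when it falls.
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    direction = 1 if side == "long" else -1
    return direction * (exit_price - entry_price) * lot_size * lots


if __name__ == "__main__":
    # Hypothetical numbers: price moves from 100 to 110, lot size 50.
    print(futures_pnl(100, 110, lot_size=50, side="long"))
    print(futures_pnl(100, 110, lot_size=50, side="short"))
```

Because only the margin is paid up front while the profit or loss applies to the full lot, the same price move is amplified relative to the money put in, which is what makes futures a leveraged product.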

Learning and Making CUDA kernels as a future skill

THIS IS A SKILL OF THE FUTURE. CUDA docs, Beginner's introduction to CUDA, How do GPUs work by geohotz (starts at 20:00), Amazing CS336 Stanford, Stanford lecture. The CPU does something called branch prediction, which lets it keep its operations flowing without waiting for earlier operations to complete first. It sometimes predicts wrongly, but over time it stores the pattern (branch 1 was taken recently, branch 2 was not taken recently, and so on …) ...

March 30, 2025 · 6 min · Mohit Dulani
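
The "remember whether a branch was taken recently" idea from the branch prediction note above can be illustrated with a 2-bit saturating counter, a standard textbook scheme. This is a toy simulation, not a description of any specific CPU; the class name and the starting counter value are choices made for the sketch.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch id -> counter value in 0..3

    def predict(self, branch_id):
        # Unknown branches start at 1 (weakly not taken).
        return self.counters.get(branch_id, 1) >= 2

    def update(self, branch_id, taken):
        counter = self.counters.get(branch_id, 1)
        # Move toward 3 when taken, toward 0 when not taken, without overflowing.
        self.counters[branch_id] = min(counter + 1, 3) if taken else max(counter - 1, 0)


def accuracy(predictor, history):
    """Fraction of correct predictions over a list of (branch_id, taken) outcomes."""
    correct = 0
    for branch_id, taken in history:
        if predictor.predict(branch_id) == taken:
            correct += 1
        predictor.update(branch_id, taken)
    return correct / len(history)


if __name__ == "__main__":
    # A loop branch taken 9 times, then not taken once when the loop exits.
    loop_history = [("loop", True)] * 9 + [("loop", False)]
    print(accuracy(TwoBitPredictor(), loop_history))
```

The predictor is wrong on the first iteration and on the loop exit, and right in between, which matches the "gets some wrong but learns the pattern" behaviour described above.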

Learning Celery

Background task. Maintaining background tasks, especially in Python Django, is damn tough; spawning daemon threads and getting that to work is very hard, so we should rather use Celery, which starts a completely new process and doesn't keep anything inside the main process. Celery: what is the use of Celery, and how does it compare? asyncio.create_task(): this creates and executes a task in the event loop (efficient management of the event loop) ...

March 29, 2025 · 2 min · Mohit Dulani
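
For the `asyncio.create_task()` side of the comparison above, a minimal sketch using only the standard library. The coroutine names and the fake "email" job are invented for illustration; the Celery side is not shown because it needs a broker and a worker process.

```python
import asyncio


async def send_email(user):
    """Stand-in for a slow I/O job; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return f"sent to {user}"


async def handle_request():
    # create_task schedules the coroutine on the already-running event loop
    # and returns immediately, so the handler can keep working.
    task = asyncio.create_task(send_email("alice"))
    response = "request handled"
    # Await the task before returning so it is not lost when the loop closes.
    result = await task
    return response, result


if __name__ == "__main__":
    print(asyncio.run(handle_request()))
```

The task here lives inside the same process and event loop as the caller, so it disappears if that process exits; that is the main contrast with Celery, where the work is handed to a separate worker process.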

Editing the node modules and github commands

A noob method of playing with Node modules. Steps to follow to update code from a node_module like d3-force directly! What we get in node_modules are build folders that don't support HMR (hot module replacement), so we have to make our changes and then rebuild that folder using the rollup module (maybe other libs can be used as well). Steps:
1. Make the change in the file you want (be it debugging statements or a logic change).
2. Install the required files using npm install and, if required, change the package manager from yarn to npm.
3. Look in package.json; you might find a script named "prepublishOnly".
4. Run npm run prepublishOnly. If the command requires some folders, build those folders or remove them from the npm command.
5. Once the errors are fixed, build the module using npm run prepublishOnly.
6. Run the development server using npm run dev -- --cache.
This is how to rebuild a node module! ...

March 26, 2025 · 9 min · Mohit Dulani