Crawl-4-AI: Getting relevant data out of Crawl4AI. It's good at generating markdown, but it seems to be a bit off at getting structured data out of a page !!
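To see why the markdown side is the easy part, the happy path is just a few lines (a minimal sketch using the library's defaults; the URL is only a placeholder):

```python
import asyncio

from crawl4ai import AsyncWebCrawler

async def main():
    # Default config: fetch the page and let Crawl4AI convert it to markdown
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")  # placeholder URL
        print(result.markdown)  # clean, LLM-friendly markdown out of the box

asyncio.run(main())
```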
For structured data, it provides two inbuilt strategies:
Using Content Selection (https://docs.crawl4ai.com/core/content-selection/) with a CSS-based selector:

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

browser_config = BrowserConfig(
    headless=False,  # set True to run without a visible browser window
    user_agent_mode="random"
)

crawler_config = CrawlerRunConfig(
    cache_mode=CacheMode.ENABLED,
    magic=True,
    verbose=True,
    log_console=True,
    wait_until="networkidle",
    simulate_user=True,
    exclude_external_images=True,
    css_selector="div.yuRUbf a",  # keep only Google's organic-result anchors
)

query = "https://www.google.com/search?q=health"

async def main():
    try:
        async with AsyncWebCrawler(config=browser_config) as crawler:
            result = await crawler.arun(url=query, config=crawler_config)
            print("Extraction completed successfully")

            # Dump the cleaned HTML so we can inspect what survived selection
            with open("raw_html.html", "w", encoding="utf-8") as f:
                f.write(result.cleaned_html if result.cleaned_html else "No HTML available")
            print("Cleaned HTML is:", result.cleaned_html)  # the printed content isn't very helpful !!
    except Exception as e:
        print(f"Crawling failed: {e}")

asyncio.run(main())
```

Using Schema-based extraction:

```python
import asyncio
import json

from bs4 import BeautifulSoup
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy

# XPath schema; baseSelector is kept loose so minor changes to
# Google's layout don't break the match
schema = {
    "name": "Link-Title-Schema",
    "baseSelector": "//div[contains(@class, 'yuRUbf')]",
    "fields": [
        {
            "name": "link",
            "selector": ".//a",  # anchor tag inside the result block
            "type": "attribute",
            "attribute": "href"
        }
    ]
}

browser_config = BrowserConfig(
    headless=False,  # set True to run without a visible browser window
    user_agent_mode="random"
)

crawler_config = CrawlerRunConfig(
    cache_mode=CacheMode.ENABLED,
    magic=True,
    verbose=True,
    log_console=True,
    wait_until="networkidle",
    simulate_user=True,
    exclude_external_images=True,
    extraction_strategy=JsonXPathExtractionStrategy(schema=schema, verbose=True)
)

query = "https://www.google.com/search?q=health"

async def main():
    try:
        async with AsyncWebCrawler(config=browser_config) as crawler:
            result = await crawler.arun(url=query, config=crawler_config)
            print("Extraction completed successfully")
            print("Cleaned HTML is:", result.cleaned_html)

            # Fallback: parse the raw HTML with BeautifulSoup as well
            soup = BeautifulSoup(result.html, "html.parser")
            links = [a["href"] for a in soup.select("div.yuRUbf a") if a.has_attr("href")]
            print("Extracted Links:", links)
            with open("extracted_links.txt", "w") as f:
                f.write("\n".join(links))

            # Check whether the extraction strategy produced anything
            if result.extracted_content:
                print(f"Length of extracted content: {len(result.extracted_content)}")
                try:
                    # Try to parse it as JSON
                    data = json.loads(result.extracted_content)
                    print(f"Successfully parsed JSON with {len(data)} entries")
                except json.JSONDecodeError as e:
                    print(f"Could not parse JSON: {e}")
                    print(f"Raw content: {result.extracted_content[:200]}...")
            else:
                print("No content was extracted")

            # Save markdown and media
            try:
                with open("result_markdown.md", "w") as f:
                    f.write(result.markdown)
                print("Saved markdown file")
                with open("result_media.md", "w") as f:
                    f.write(str(result.media))
                print("Saved media file")
            except IOError as e:
                print(f"Error saving files: {e}")
    except Exception as e:
        print(f"Crawling failed: {e}")

asyncio.run(main())
```

Custom methods to get relevant data: get the full HTML from the page, hand it to the BeautifulSoup parser, and then write your own logic there to get the desired output !! (A sketch of this approach follows below.)
...