
Alex Razvant's LinkedIn Analytics

Get the LinkedIn stats of Alex Razvant and many other LinkedIn influencers, compiled by Taplio.


Alex Razvant


Hard work always pays off; be consistent! I'm a Senior Machine Learning Engineer with a keen interest in developing AI/ML solutions for real-world problems and helping others get started on their Machine Learning journey. Let's connect:
📘 https://medium.com/@alexandrurazvant
✉️ alexandrurazvant@gmail.com
🌐 https://www.neuraleaps.com

Check out Alex Razvant's verified LinkedIn stats (last 30 days)

Followers: 15,767
Posts: 14
Engagements: 3,241
Likes: 2,727

Alex Razvant's Best Posts (last 30 days)



A complete 2025 overview of the Computer Vision AI field 👇

This is one of the longest articles I've ever written for my newsletter. I covered each topic without too many niche technicalities and tried to keep it accessible to a larger audience. It starts from what a pixel is and builds up to Vision Language Models (VLMs) and Multimodal Generative AI.

In short, it'll take you through:
↳ Pixels, Images, Image Types
↳ Colors, Formats
↳ Sensors, Cameras, LiDAR
↳ Classic Image Processing with OpenCV
↳ CNN-based Computer Vision
↳ Object Detection, Tracking, Pose, Segmentation
↳ Generative AI, GANs, AE, VAE
↳ Diffusion Models
↳ Tesla Autopilot
↳ Vision Transformer (ViT), Diffusion Transformer (DiT)
↳ Text-to-Image, Text-to-Video
↳ Stable Diffusion, OpenAI Sora, FLUX
↳ Neural Radiance Fields (NeRF) and Gaussian Splats
↳ Google Maps 3D Rendering

I aimed to include as many diagrams/GIFs as possible, cover each topic with enough detail, and add up-to-date references, none older than 2-3 years.

🔸 A few more interesting topics I'm planning to add:
↳ How DLSS is used in video games.
↳ 3D shape completion.
↳ 4D motion-capture video rendering.
↳ More on the Generative AI & multimodal side.

📒 Article: https://lnkd.in/dU2XGKHi

Enjoy!
-----
#deeplearning #machinelearning #artificialintelligence
-----
💡 Follow me for more expert insights on AI/ML Engineering.
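To give a taste of the "Classic Image Processing with OpenCV" stop on that journey, here is a minimal sketch of a classic pipeline; the file names are placeholders:

```python
import cv2

# Load an image and convert it to grayscale (placeholder file name).
img = cv2.imread("scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Classic processing: blur to suppress sensor noise, then detect edges.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, threshold1=100, threshold2=200)

cv2.imwrite("edges.jpg", edges)
```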


    110

21 AI/ML GitHub repos you'll find interesting 👇 (+ a short description of each)

1. Firecrawl - Crawl websites into LLM-ready data with a single API.
💻 https://lnkd.in/d6R3hCHS
2. ModelContextProtocol (MCP) - Gives LLMs safe access to tools and data sources.
💻 https://shorturl.at/iUoxv
3. MMagic - Training, building, and serving a large set of deep learning models.
💻 https://lnkd.in/dZF_jXAu
4. Superduper - Framework for building AI-data workflows and applications.
💻 https://lnkd.in/dX5G8N9r
5. NerfStudio - A simple API for end-to-end NeRFs.
💻 https://shorturl.at/nVsDE
6. LangFuse - An OSS LLM engineering platform for AI applications.
💻 https://lnkd.in/dr9fDZc4
7. TabbyML - Self-hosted LLMs as coding assistants.
💻 https://lnkd.in/dnEMvAtE
8. LMFlow - An efficient toolbox for finetuning LLMs.
💻 https://lnkd.in/dg64Shp4
9. Garak - Toolkit for probing LLM security and output quality.
💻 https://lnkd.in/dRStRf3v
10. TTS (Text-to-Speech) - A popular library for advanced text-to-speech generation.
💻 https://lnkd.in/d_nUBFMT
11. Suno Bark - A text-to-audio model that can be fine-tuned for highly realistic speech.
💻 https://lnkd.in/dFTPVbYa
12. OLMo - Codebase for training and using AI2's OLMo LLMs.
💻 https://lnkd.in/d3YsVUge
13. Tinygrad - A tiny deep learning framework, useful for understanding the nuts & bolts of PyTorch.
💻 https://lnkd.in/dGRWgheu
14. Microsoft UFO - A UI-focused multi-agent framework for Windows OS.
💻 https://lnkd.in/dcAywxgU
15. Unity AI Agents - Game environments for training agent simulations.
💻 https://shorturl.at/aEh83
16. DepthAnything - Foundation model for robust monocular depth estimation.
💻 https://shorturl.at/iuFhe
17. Gemma CPP - C++ implementation of Google's Gemma LLM.
💻 https://lnkd.in/drqupR6C
18. Grok-1 - Open-source implementation of xAI's Grok-1 model.
💻 https://lnkd.in/dXGqUbqj
19. MosaicML Streaming - Library to cache and stream datasets directly from cloud storage.
💻 https://lnkd.in/dKtNxhe4
20. RagFlow - OSS RAG engine based on deep document understanding.
💻 https://lnkd.in/dCppFXV4
21. Vision Agent - Generates code to solve your vision tasks.
💻 https://lnkd.in/d9xqQvUS
-----
#artificialintelligence #deeplearning #machinelearning
-----
💡 Follow me for more expert insights on AI/ML Engineering.
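As a taste of entry 1, here is roughly what scraping with Firecrawl's Python SDK looks like. This is a hedged sketch: the API key is a placeholder and the exact parameter names have changed between SDK versions, so check the repo for the current API:

```python
# pip install firecrawl-py  (sketch only; verify against the current SDK docs)
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_KEY")  # placeholder key from firecrawl.dev

# Scrape a single page into LLM-ready markdown; the 'formats' argument
# is an assumption based on one SDK version and may differ in others.
result = app.scrape_url("https://example.com", formats=["markdown"])
print(result.markdown[:500])
```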


    203

    ๐Ÿญ๐Ÿญ ๐—ธ๐—ฒ๐˜† ๐—บ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฐ๐˜€ to monitor your ๐——๐—ฒ๐—ฒ๐—ฝ ๐—Ÿ๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด models ๐—ถ๐—ป ๐—ฝ๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐˜๐—ถ๐—ผ๐—ป using Triton Server! You might believe that once the model is deployed, the job is done and you can work on the next improvements, be it better accuracy metrics or re-training, but the job isn't done! ๐—Ÿ๐—ฒ๐˜ ๐—บ๐—ฒ ๐˜๐—ฒ๐—น๐—น ๐˜†๐—ผ๐˜‚ ๐˜„๐—ต๐˜†: It would be best if you spent more time gathering insights on how the model performs regarding latency, TCO (Total Cost of Ownership), throughput, and energy footprint. ๐Ÿ”น Here are ๐˜๐—ต๐—ฒ ๐˜๐—ผ๐—ฝ ๐Ÿญ๐Ÿญ metrics that Triton monitors for you that you ๐˜€๐—ต๐—ผ๐˜‚๐—น๐—ฑ ๐—ธ๐—ฒ๐—ฒ๐—ฝ ๐—ฎ๐—ป ๐—ฒ๐˜†๐—ฒ ๐—ผ๐—ป: โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฟ๐—ฒ๐—พ๐˜‚๐—ฒ๐˜€๐˜_๐˜€๐˜‚๐—ฐ๐—ฐ๐—ฒ๐˜€๐˜€ Tracks successful inference requests, monitors server health, and identifies bottlenecks. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฟ๐—ฒ๐—พ๐˜‚๐—ฒ๐˜€๐˜_๐—ณ๐—ฎ๐—ถ๐—น๐˜‚๐—ฟ๐—ฒ Counts failed inference requests to help quickly troubleshoot issues. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฐ๐—ผ๐˜‚๐—ป๐˜ Measures the total inferences processed, indicating server workload and throughput. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฒ๐˜…๐—ฒ๐—ฐ_๐—ฐ๐—ผ๐˜‚๐—ป๐˜ Reveals the demand on specific models, aiding in resource optimization. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฟ๐—ฒ๐—พ๐˜‚๐—ฒ๐˜€๐˜_๐—ฑ๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป_๐˜‚๐˜€ Monitors inference request completion time, crucial for meeting latency requirements. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—พ๐˜‚๐—ฒ๐˜‚๐—ฒ_๐—ฑ๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป_๐˜‚๐˜€ Identifies bottlenecks by tracking request queue times. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฐ๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ_๐—ฑ๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป_๐˜‚๐˜€ Provides insights into processing efficiency and potential optimizations. โ†’ ๐—ป๐˜ƒ_๐—ด๐—ฝ๐˜‚_๐˜‚๐˜๐—ถ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป Shows how effectively GPU resources are utilized, crucial for scaling. โ†’ ๐—ป๐˜ƒ_๐—ด๐—ฝ๐˜‚_๐—บ๐—ฒ๐—บ๐—ผ๐—ฟ๐˜†_๐˜๐—ผ๐˜๐—ฎ๐—น_๐—ฏ๐˜†๐˜๐—ฒ๐˜€ and ๐—ป๐˜ƒ_๐—ด๐—ฝ๐˜‚_๐—บ๐—ฒ๐—บ๐—ผ๐—ฟ๐˜†_๐˜‚๐˜€๐—ฒ๐—ฑ_๐—ฏ๐˜†๐˜๐—ฒ๐˜€ Manage memory resources. โ†’ ๐—ป๐˜ƒ_๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ด๐˜†_๐—ฐ๐—ผ๐—ป๐˜€๐˜‚๐—บ๐—ฝ๐˜๐—ถ๐—ผ๐—ป Provides stats on GPU energy consumption. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—น๐—ผ๐—ฎ๐—ฑ_๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ Offers insights into load distribution, helping with efficient resource use and load balancing. --- โญ I'm working on something cool that I'm going to announce soon. Subscribe to my newsletter as I'll roll out the updates there. ๐Ÿ”— Newsletter: https://lnkd.in/dgWB64cX --- #deeplearning #artificialintelligence #machinelearning --- ๐Ÿ’ก I share expert insights on AI Systems and help you upskill as an AI Engineer. Follow for more!


      85

Roboflow has been on quite a surge lately! If you're working with CV or looking to get started with Vision AI, these should be on your list:

- Trackers (https://lnkd.in/drH_SvAj) - the newest one, excited for this!
- Supervision (https://lnkd.in/ddWwNsZ3) - a toolkit of reusable CV components that cuts boilerplate code by a ton!
- Maestro (https://lnkd.in/duceppkw) - for fine-tuning VLMs.
- Autodistill (https://lnkd.in/d7hMebmC) - modular zero-shot detection, a really nice library.

Piotr, way to go! 🔥 🔥 🔥
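To show the kind of boilerplate Supervision removes, here is a minimal sketch pairing it with an Ultralytics detector; the model checkpoint and image path are placeholders:

```python
import cv2
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # placeholder checkpoint
image = cv2.imread("image.jpg")   # placeholder image

# Wrap the raw detector output in Supervision's common Detections format.
detections = sv.Detections.from_ultralytics(model(image)[0])

# A reusable annotator replaces hand-rolled box-drawing code.
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
cv2.imwrite("annotated.jpg", annotated)
```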


      Piotr Skalski


Introducing Trackers: All-in-One Object Tracking Library 🔥 🔥 🔥

TL;DR: Together with Soumik Rakshit, I'm building an all-in-one tracking toolkit: multi-object tracking, tracker fine-tuning, and re-identification in one place. The first official release drops next week!

- Plug-and-play integration with detectors from Transformers, Inference, Ultralytics, PaddlePaddle, MMDetection, and more.
- Built-in support for SORT and DeepSORT today, with StrongSORT, BoT-SORT, ByteTrack, OC-SORT, and additional trackers on the way.
- Released under the open Apache 2.0 license.

⮑ 🔗 trackers: https://lnkd.in/dy6tSiS8

Quick-start notebook link in the comments. 👇🏻


      38

I'd love your feedback on this 👇

In my newsletter, I've been mostly unpacking and explaining AI tools and frameworks used in the industry, which many found helpful. To get more practical, I thought about building a full AI system, starting from the ground up, and guiding you through each component step by step.

From the ground up, meaning:
- Business logic, design decisions
- Tooling, structure
- Data collection & curation
- Training, evaluation
- Workflows, pipelines
- Engineering (code, app, tests)
- Optimisation (model level, system level)
- Monitoring (pre-deployment, post-deployment)
- MLOps
- and many other concepts.

I have a few ideas and sketches in mind, but I find it difficult to decide on the length, complexity, domain, and format, and I'd love your feedback.

How does that sound? Find the polls in the short article below; your vote will help a ton!

Thanks, I appreciate it 🙏
https://lnkd.in/dCYZKAH6
---
#deeplearning #machinelearning #artificialintelligence
---
💡 Follow me for more expert insights on AI/ML Engineering.


      31

      ๐—™๐—ฒ๐˜„ ๐—ฝ๐—ผ๐—ถ๐—ป๐˜๐˜€ on writing a technical AI/ML Newsletter ๐Ÿ‘‡ ๐Ÿญ. ๐—œ๐—บ๐—ฝ๐—ฟ๐—ผ๐˜ƒ๐—ฒ๐˜€ ๐˜๐—ต๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ I've written a lot of code, and documentation and project ramp-up guides, and feature descriptions, and more... I've found writing a newsletter more complicated than all that at times. Even if I master the technicals, turning them into a larger or general audience piece of content is tricky and requires a lot of fine-tuning and editing. This habit of reiteration became helpful as it brought structure into my thoughts and helped me express ideas more clearly. ๐Ÿฎ. ๐—Ÿ๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ++ English was not my first language, and I didn't study it in a structured manner until the first year of college. In middle and high school, I studied French and Russian. I picked up English from music, movies, tv-shows and video games while growing up, more or less. Well, that left quite a few large gaps. Writing definitely helped! ๐Ÿฏ. ๐—š๐—ฒ๐˜๐˜€ ๐˜†๐—ผ๐˜‚ ๐——๐—ฒ๐—ฒ๐—ฝ๐—ฒ๐—ฟ ๐—˜๐˜…๐—ฝ๐—ฒ๐—ฟ๐˜๐—ถ๐˜€๐—ฒ I might know how AI works at the lowest level. In my head, everything connects nicely and everything makes sense. However, to explain all that for a general audience - that's a challenge, especially when starting. You'll not only have to unpack it step by step, but also provide resources and references, as your audience won't trust your expertise just because "trust me, bro". That develops a habit of continuous learning and knowing where and how to look for information. More importantly, how to digest it and explain it to others through your view. ๐Ÿฐ. ๐—ข๐˜ƒ๐—ฒ๐—ฟ๐—ฐ๐—ผ๐—บ๐—ฒ ๐—œ๐—บ๐—ฝ๐—ผ๐˜€๐˜๐—ฒ๐—ฟ ๐—ฆ๐˜†๐—ป๐—ฑ๐—ฟ๐—ผ๐—บ๐—ฒ The usual "what if", "I don't know that much", "what will the comments say", "What if I get checked on this", blabla. Your first articles will always be bad. Look past that and target to improve rather than making everything perfect from the first go. When you write, you also learn. This reflected in my career role, and built up confidence to express ideas and make decisions. --- Finally, if you're planning on starting a newsletter or sharing your thoughts, start doing so. You've got nothing to lose! --- If curious, find it here, I talk about AI/ML Engineering: ๐Ÿ“™ ๐—ก๐—ฒ๐˜‚๐—ฟ๐—ฎ๐—น ๐—•๐—ถ๐˜๐˜€ ๐—ก๐—ฒ๐˜„๐˜€๐—น๐—ฒ๐˜๐˜๐—ฒ๐—ฟ https://lnkd.in/dgWB64cX #machinelearning #artificialintelligence #writing ----


      23

      A ๐˜€๐—ต๐—ผ๐—ฟ๐˜ glossary of ๐—”๐—œ ๐˜๐—ผ ๐—˜๐—ป๐—ด๐—น๐—ถ๐˜€๐—ต terms ๐Ÿ‘‡ (LLM Training Edition) ๐—ฅ๐—ก๐—ก๐—ฆ = Recurrent Neural Networks, the precursor for transformers. They processed data sequentially while keeping an internal state. ๐—ง๐—ฟ๐—ฎ๐—ป๐˜€๐—ณ๐—ผ๐—ฟ๐—บ๐—ฒ๐—ฟ = a novel network architecture that came out in 2017 and solved the pain points of RNNs. Powers-up ChatGPT, Claude, Llama, and DeepSeek architectures. ๐—Ÿ๐—Ÿ๐—  = transformer-based models trained on a large volume of text data for language modeling. It processes sequences of tokens. ๐—ง๐—ผ๐—ธ๐—ฒ๐—ป = pieces of words, characters that LLMs take as input. ๐—ง๐—ผ๐—ธ๐—ฒ๐—ป๐—ถ๐˜‡๐—ฒ๐—ฟ = algorithm that converts text into tokens. Each LLM comes with its own trained tokenizer. ๐—˜๐—บ๐—ฏ๐—ฒ๐—ฑ๐—ฑ๐—ถ๐—ป๐—ด๐˜€ = vector of numbers that describe a data point in a high-dimensional space. ๐—ฃ๐—ฟ๐—ผ๐—บ๐—ฝ๐˜ = the text sentence that goes into an LLM. ๐—ฆ๐—ฒ๐—น๐—ณ-๐—”๐˜๐˜๐—ฒ๐—ป๐˜๐—ถ๐—ผ๐—ป = matrix multiplication between token embeddings of the input prompt so the model can learn the relationships between words/tokens. ๐—”๐˜‚๐˜๐—ผ๐—ฟ๐—ฒ๐—ด๐—ฟ๐—ฒ๐˜€๐˜€๐—ถ๐—ผ๐—ป = LLMs are auto-regressive, as they predict the next N token based on the previous 1 ... N-1 tokens, one at a time. ๐—ฃ๐—ผ๐˜€๐—ถ๐˜๐—ถ๐—ผ๐—ป๐—ฎ๐—น ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ถ๐—ป๐—ด = adds a position embedding to each token embedding to mark setup order. ๐—ฆ๐—ฝ๐—ฒ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ง๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ = a set of tokens that act like markers and impose a specific behavior. For example,"/start/" and "/end/" for LLM generation. ๐— ๐˜‚๐—น๐˜๐—ถ-๐—›๐—ฒ๐—ฎ๐—ฑ ๐—”๐˜๐˜๐—ฒ๐—ป๐˜๐—ถ๐—ผ๐—ป = split the attention mechanism into parallel heads to focus on different patterns (e.g., text syntax or semantics) ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ = encodes information about the input sequence into a fixed-length embedding vector. ๐——๐—ฒ๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ = generates the output token by token. ๐—–๐—ฎ๐˜‚๐˜€๐—ฎ๐—น ๐— ๐—ฎ๐˜€๐—ธ๐—ถ๐—ป๐—ด = when training, we have the entire text sentence. Masking prevents the decoder from "cheating/looking" at the future tokens it needs to generate and forces it to focus only on the previously generated ones. Not needed at inference. ๐—ฃ๐—ฟ๐—ฒ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด = training from 0. ๐—™๐—ถ๐—ป๐—ฒ๐˜๐˜‚๐—ป๐—ถ๐—ป๐—ด = adapting a model to specific tasks using an engineered dataset. ๐—Ÿ๐—ผ๐—ฅ๐—” (Low Rank Adaptation) = inserts a small set of low-rank matrices as trainable parameters, leaving model parameters intact. Requires fewer resources it trains a subset of low-rank matrices only. ๐—ค๐—Ÿ๐—ผ๐—ฅ๐—” = Quantized LoRA, quantized low-rank matrices. ๐—ง๐—ฒ๐—บ๐—ฝ๐—ฒ๐—ฟ๐—ฎ๐˜๐˜‚๐—ฟ๐—ฒ = a 0..1 factor that specifies if the output generation is more deterministic (0) or stochastic (1). Also known as creativity. ๐—ง๐—ผ๐—ฝ-๐—ž ๐—ฆ๐—ฎ๐—บ๐—ฝ๐—น๐—ถ๐—ป๐—ด = limits the LLM output layer to select only top K token predictions. --- #artificialintelligence #deeplearning #machinelearning --- ๐Ÿ’ก Follow me for more ๐—ฒ๐˜…๐—ฝ๐—ฒ๐—ฟ๐˜ ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€ on AI/ML Engineering.


      46

      ๐—ฃ๐—ฟ๐—ผ ๐˜๐—ถ๐—ฝ: Working with YAML configs in ML Projects? Start using Hydra + OmegaConf ๐Ÿ‘‡ ML projects have multiple dynamic components. Keeping track of which configuration was used might become difficult. I found Hydra + OmegaConf to get the job done. Here are the key details: 1. ๐—ข๐—บ๐—ฒ๐—ด๐—ฎ๐—–๐—ผ๐—ป๐—ณ is a hierarchical configuration system explicitly designed for complex applications like ML pipelines. ๐Ÿ”ธ You can: โ†ณ Merge multiple configuration files directly. โ†ณ Access fields via attribute notation "config.model.optimizer" or dict notation config["model"]["optimizer"] โ†ณ Add Type safety against schemas such as dataclasses or Pydantic models. 2. ๐—›๐˜†๐—ฑ๐—ฟ๐—ฎ builds on OmegaConf, creating a full-featured framework for configuration management in complex applications. Initially created by Facebook Research (FAIR), Hydra is well-suited for ML workflows. ๐Ÿ”ธ You can: โ†ณ Group multiple YAMLs into a single configuration file and reference them directly by filename. ``` config.yaml defaults: - mydata.yaml - training: trainAB.yaml - inference: inferAB.yaml ``` When loading the `config.yaml` using Hydra, we'll get a nested configuration object, from which we can access any field. โ†ณ Override any config value at runtime. For example, if in the config.yaml: ``` model: optimizer: SGD ``` At runtime, you can change the value with: ``` python train.py model.optimizer=Adam ``` ๐Ÿ”น ๐—ง๐—ต๐—ถ๐˜€ ๐—ฎ๐—น๐˜€๐—ผ ๐˜„๐—ผ๐—ฟ๐—ธ๐˜€ ๐—ณ๐—ผ๐—ฟ ๐—ป๐—ฒ๐˜€๐˜๐—ฒ๐—ฑ ๐—ฐ๐—ผ๐—ป๐—ณ๐—ถ๐—ด๐˜€. โ†ณ Run multiple configs at the same time using Hydra sweeps. ``` python train.py -m data.dataset=A1, B12, C55 ``` This will spawn multiple processes and run in parallel, automatically saving logs. --- ๐—›๐—ผ๐˜„ ๐—ฑ๐—ผ ๐˜†๐—ผ๐˜‚ ๐—บ๐—ฎ๐—ป๐—ฎ๐—ด๐—ฒ ๐˜†๐—ผ๐˜‚๐—ฟ ๐—ฐ๐—ผ๐—ป๐—ณ๐—ถ๐—ด๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€? #deeplearning #artificialintelligence #machinelearning --- ๐Ÿ’ก Follow me for daily expert insights on AI/ML Engineering.


        91

        ๐—ช๐—ผ๐—ฟ๐—ธ๐—ถ๐—ป๐—ด ๐˜„๐—ถ๐˜๐—ต ๐—Ÿ๐—Ÿ๐— ๐˜€? You'll like this set of resources on LLM Post-Training ๐Ÿ‘‡ ๐˜๐˜ช๐˜ณ๐˜ด๐˜ต ๐˜ฐ๐˜ง๐˜ง, ๐˜ธ๐˜ฉ๐˜ข๐˜ต ๐˜ช๐˜ด ๐˜—๐˜ฐ๐˜ด๐˜ต-๐˜›๐˜ณ๐˜ข๐˜ช๐˜ฏ๐˜ช๐˜ฏ๐˜จ? Once a new Foundation Model (LLM/VLM) was pretrained from scratch on vast web-scale data, the focus is on post-training techniques to achieve further breakthroughs. If you've ever fine-tuned an LLM, you used a post-training technique. Here are a few examples: ๐—ง๐˜‚๐—ป๐—ถ๐—ป๐—ด โ†ณ PEFT โ†ณ Full Model Finetuning โ†ณ LoRA, Adapters โ†ณ Knowledge Distillation ๐—ฆ๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด โ†ณ Chain of Thought (CoT) โ†ณ Tree of Thought (ToT) ๐—ฅ๐—ฒ๐—ถ๐—ป๐—ณ๐—ผ๐—ฟ๐—ฐ๐—ถ๐—ป๐—ด โ†ณ RLHF (reinforcement learning, human feedback) โ†ณ DPO (direct preference optimization) โ†ณRLAIF (reinforcement learning, AI feedback) .. and these only scratch the surface. This repo groups all techniques, with key papers, resources, and surveys on every major post-training technique out there. ๐Ÿ’ป LLM Post-Training: https://lnkd.in/dQj78C3X --- #artificialintelligence #deeplearning #machinelearning --- ๐Ÿ’ก Follow me for ๐—ฒ๐˜…๐—ฝ๐—ฒ๐—ฟ๐˜ ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€ on AI/ML Engineering


          315

An LLM model on Hugging Face has multiple files attached: what's their purpose? 👇

Each model repo on HF contains three tabs: the Model Card with the model architecture, benchmarks, license, and other details; the Files section with the model's actual files; and the Community section with discussion threads, pull requests, and other resources.

🔹 Let's unpack the LLM file structure.

1️⃣ Architecture Configuration
This file, usually named config.json, contains metadata on the model architecture, layer activations, sizes, vocabulary size, number of attention heads, model precision, and more. The transformers library knows how to parse this config and build the model architecture.

2️⃣ Model Weights
Because LLMs have billions of parameters, the models are usually split into parts for safer downloads; no one wants to download an 800GB model, hit a network error, and end up with the entire model file corrupted. These model weights come in either the .bin format or .safetensors, a newer, safer format proposed by Hugging Face. Safetensors is an alternative to the default Pickle serialization that PyTorch (.pt) uses, which is vulnerable to code injection.

3️⃣ Layer Mapping
Since the models are large and the weights come as part files (e.g., 0001-of-0006, 0002-of-0006, etc.), this file stores a sequential map of the model architecture, specifying which part file holds each layer's weights.

4️⃣ Tokenizer Config
The tokenizer config file contains metadata about which tokenizer and configuration were used to train this model. It also shows the class name used to instantiate the tokenizer, the layer names, and how the inputs are processed before passing through the model. It also contains special_tokens: tokens not derived from the input that the LLM uses as markers to stop generation, mark the chat template, differentiate between text and image modalities, etc.

5️⃣ Generation Config
These configuration files contain metadata for inference, such as temperature, Top-P/Top-K thresholds, and the context window size the model was trained with. They also specify the token IDs of the special tokens so the tokenizer can append these IDs to the sequence.

See the diagram below for the summarized version 👇
-----
#artificialintelligence #deeplearning #machinelearning
-----
💡 Follow me for more expert insights on AI/ML Engineering.
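A quick way to see this file layout for yourself is through the Hub API; a minimal sketch, using an arbitrary example model ID:

```python
from huggingface_hub import list_repo_files
from transformers import AutoConfig

repo = "mistralai/Mistral-7B-v0.1"  # example model ID

# Lists config.json, tokenizer files, the sharded *.safetensors parts,
# and the *.index.json layer-to-shard map described above.
for filename in list_repo_files(repo):
    print(filename)

# transformers parses config.json into the architecture metadata.
cfg = AutoConfig.from_pretrained(repo)
print(cfg.num_attention_heads, cfg.vocab_size)
```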


            302

These are some of my favorite resources for learning about applied GenAI & LLMs 👇

I focus solely on long-form videos in this one. I know you might have seen multiple similar posts, but in this one I've grouped 9 videos that have helped me pick up low-level details on topics such as:

↳ Production RAG (Jerry Liu, LlamaIndex)
↳ LLM Inference Optimization (NVIDIA)
↳ High-Level Agentic Patterns (Neural Maze)
↳ Low-Level Maths of Transformers (Unsloth)
↳ Transformer-Specific Hardware (Groq LPU)
↳ MCP (Anthropic)

🔹 To save your time, here are my recommendations:

1. For everyone: check the LLM Inference Optimization one from Mark Moyou (NVIDIA).
2. RAG-specific: see the lessons learned from Production RAG (Jerry Liu).
3. The video from Anthropic on MCP is long and detailed; feel free to skip sections.
4. For building and understanding LLMs, from tokenizer to inference: the videos from Andrej and Sebastian.

Extra:
- If you're interested in hardware, GPUs, and architecture, check Igor's (Head of Silicon at Groq) walkthrough on how Groq LPUs work.
- If you're interested in the low-level maths of LLMs and how Unsloth optimizes training and inference, watch Daniel Han's video.

Find all resources alongside other details in this article:
📙 Article: https://lnkd.in/daNSm3Ct
---
#deeplearning #machinelearning #artificialintelligence
---
💡 Follow me, I help you learn AI/ML Engineering.


              261

If you work with LLMs, you might like this 👇

TransformerLab is an open-source toolkit for LLMs that enables fine-tuning, visualization, tracking, and inference with multiple Hugging Face models. I found out about it just yesterday, scrolling through the r/LocalLLaMA subreddit.

Here's a short summary:
↳ Integrates with HF models
↳ Has built-in TensorBoard for experiment tracking
↳ Has a built-in tokenizer visualizer
↳ Will enable a model architecture visualizer
↳ Every component is built as a plugin
↳ Interactive RAG tab to quickly test a basic RAG application
↳ Cross-OS (Linux, macOS)
↳ Integrates with the MLX, vLLM, and llama.cpp inference engines
↳ RLHF and preference optimization

📙 Docs: https://lnkd.in/d9RKJK7Y
💻 Code: https://lnkd.in/dBr5r5hy
---
#artificialintelligence #machinelearning #llm
---
💡 Follow me for more expert insights on AI/ML!


              325

If you're still using FastAPI to deploy Hugging Face LLMs/VLMs - try LitAPI!

FastAPI is a great framework for implementing RESTful APIs. However, it wasn't specifically designed to handle the complex requirements of serving ML models at scale. The team at Lightning AI built LitServe and LitAPI to fill that gap.

🔹 LitAPI builds on top of FastAPI, adapting it for ML workloads and standardizing the core steps of serving a model.
🔹 LitServer handles the infrastructure side of serving models.

🔸 Here's what you must know:

1. One-time model setup
In the setup() method, we load any model only once.

2. Customize Predict
In the predict() method, we implement the inference logic on the input.

3. Customize Batching Logic
You can specify a MAX_BATCH_SIZE and a BATCH_TIME_WINDOW, and it'll automatically handle the dynamic batching of requests as they come in concurrently. You can use a ThreadPoolExecutor to parallelize the preprocessing steps in the batch() method.

4. Customize Unbatching Logic
After inferencing on a batch, you handle the detach() of GPU tensors and post-process the raw logits in the unbatch() method.

5. Decode request and encode response
In decode_request(), specify how the API should access the input value from the request. In encode_response(), specify how the API should return responses to the client.

Simple as that! To scale this up for a production workload, you'll use LitServer's scale configuration parameters:
```
LitServer(
    lit_api: LitAPI,
    accelerator: str = "auto",
    devices: Union[str, int] = "auto",
    workers_per_device: int = 1,
    timeout: Union[float, bool] = 30,
    max_batch_size: int = 1,
    batch_timeout: float = 0.0,
    stream: bool = False,
)
```
📙 For a full tutorial, see this article: https://lnkd.in/dGUrVX7s
---
#machinelearning #deeplearning #artificialintelligence
---
💡 Follow me for expert insights on AI/ML Engineering.
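Tying the five steps together, a minimal end-to-end server might look like this; a sketch assuming the litserve package and a placeholder Hugging Face pipeline:

```python
import litserve as ls
from transformers import pipeline

class TextGenAPI(ls.LitAPI):
    def setup(self, device):
        # 1. One-time model setup (placeholder model).
        self.pipe = pipeline("text-generation", model="gpt2", device=device)

    def decode_request(self, request):
        # 5a. Pull the input out of the incoming JSON payload.
        return request["prompt"]

    def predict(self, prompt):
        # 2. Inference logic on the decoded input.
        return self.pipe(prompt, max_new_tokens=50)[0]["generated_text"]

    def encode_response(self, output):
        # 5b. Shape the response returned to the client.
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(TextGenAPI(), accelerator="auto")
    server.run(port=8000)
```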


                1k

                The ๐—”๐—œ/๐— ๐—Ÿ ๐—˜๐—ป๐—ด๐—ถ๐—ป๐—ฒ๐—ฒ๐—ฟ'๐˜€ ๐—ด๐˜‚๐—ถ๐—ฑ๐—ฒ ๐˜๐—ผ ๐—บ๐˜‚๐˜€๐˜-๐—ธ๐—ป๐—ผ๐˜„ NVIDIA AI Frameworks Forget about the "Vibe Coding" frenzy and make sure you know what each of these is doing. ๐Ÿ‘‡ 1๏ธโƒฃ ๐—–๐—จ๐——๐—” Parallel computing platform and API to accelerate computation on NVIDIA GPUs. Keypoints: โ†ณ Kernels - C/C++ functions. โ†ณ Thread - executes the kernel instructions. โ†ณ Block - groups of threads. โ†ณ Grid - collection of blocks. โ†ณ Streaming Multiprocessor (SM) - processor units that execute thread blocks. When a CUDA program invokes a kernel grid, the thread blocks are distributed to the SMs. CUDA follows the SIMT (Single Instruction Multiple Threads) architecture to execute threads logic and uses a Barrier to gather and synchronize Threads. 2๏ธโƒฃ ๐—ฐ๐˜‚๐——๐—ก๐—ก Library with highly tuned implementations for standard routines such as: โ†ณ forward and backward convolution โ†ณ attention โ†ณ matmul, pooling, and normalization - which are used in all NN Architectures. 3๏ธโƒฃ ๐—ง๐—ฒ๐—ป๐˜€๐—ผ๐—ฟ๐—ฅ๐—ง If we unpack a model architecture, we have multiple layer types, operations, layer connections, activations, etc. Imagine an NN architecture as a complex Graph of operations. TensorRT can: โ†ณ Scan that graph โ†ณ Identify bottlenecks โ†ณ Optimize โ†ณ Remove, merge layers โ†ณ Reduce layer precisions, โ†ณ Many other optimizations. 4๏ธโƒฃ ๐—ง๐—ฒ๐—ป๐˜€๐—ผ๐—ฟ๐—ฅ๐—ง-๐—Ÿ๐—Ÿ๐—  Inference Engine that brings the TensorRT Compiler optimizations to Transformer-based models. Covers the advanced and custom requirements for LLMs, such as: โ†ณ KV Caching โ†ณ Inflight Batching โ†ณ Optimized Attention Kernels โ†ณTensor Parallel โ†ณ Pipeline Parallel. 5๏ธโƒฃ ๐—ง๐—ฟ๐—ถ๐˜๐—ผ๐—ป ๐—œ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ฒ๐—ฟ An open source, high-performance, and secure serving system for AI Workloads. Devs can optimize their models, define serving configurations in Protobuf Text files, and deploy. It supports multiple framework backends, including: โ†ณ Native PyTorch, TensorFlow โ†ณ TensorRT, TensorRT-LLM โ†ณ Custom BLS (Bussiness Language Scripting) with Python Backends 6๏ธโƒฃ ๐—ก๐—ฉ๐—œ๐——๐—œ๐—” ๐—ก๐—œ๐—  Set of plug-and-play inference microservices that package up multiple NVIDIA libraries and frameworks highly tuned for serving LLMs to production cluster & datacenters scale. It has: โ†ณ CUDA, cuDNN โ†ณ TensorRT โ†ณ Triton Server โ†ณ Many other libraries - baked in. NIM provides the optimal serving configuration for an LLM. 7๏ธโƒฃ ๐——๐˜†๐—ป๐—ฎ๐—บ๐—ผ ๐—œ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—™๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜„๐—ผ๐—ฟ๐—ธ The newest inference framework for accelerating and scaling GenAI workloads. Composed of modular blocks, robust and scalable. Implements: โ†ณ Elastic compute - GPU Planner โ†ณ KV Routing, Sharing, and Caching โ†ณ Disaggregated Serving of Prefill and Decode. --- #deeplearning #artificialintelligence #machinelearning --- ๐Ÿ’ก Follow me for more practical expert insights on AI/ML Engineering.


                  385
