Hard work always pays off, be consistent! I'm a Senior Machine Learning Engineer with a keen interest in developing AI/ML solutions for real-world problems and helping others get started on their Machine Learning journey. Let's connect:
Medium: https://medium.com/@alexandrurazvant
Email: alexandrurazvant@gmail.com
Website: https://www.neuraleaps.com
A complete 2025 overview of the Computer Vision AI field 👇

This is one of the longest articles I've ever written for my newsletter. I covered each topic without too many niche technicalities and tried to keep it accessible to a larger audience. It starts from what a pixel is and builds up to Vision Language Models (VLMs) and Multimodal Generative AI.

In short, I'll take you through:
↳ Pixels, Images, Image Types
↳ Colors, Formats
↳ Sensors, Cameras, LiDAR
↳ Classic Image Processing with OpenCV
↳ CNN-based Computer Vision
↳ Object Detection, Tracking, Pose, Segmentation
↳ Generative AI, GANs, AE, VAE
↳ Diffusion Models
↳ Tesla Autopilot
↳ Vision Transformer (ViT), Diffusion Transformer (DiT)
↳ Text-to-Image, Text-to-Video
↳ Stable Diffusion, OpenAI Sora, FLUX
↳ Neural Radiance Fields (NeRF) and Gaussian Splats
↳ Google Maps 3D Rendering

I aimed to include as many diagrams/GIFs as possible, to cover each topic with enough detail, and to add up-to-date references no older than 2-3 years.

A few more interesting topics I'm planning to add:
↳ How DLSS is used in video games.
↳ 3D Shape Completion.
↳ 4D Motion-Capture Video Rendering.
↳ More on the Generative AI & Multimodal side.

Article: https://lnkd.in/dU2XGKHi

Enjoy!
-----
#deeplearning #machinelearning #artificialintelligence
-----
Follow me for more expert insights on AI/ML Engineering.
21 AI/ML GitHub repos you'll find interesting (+ a short description of each) 👇

1. Firecrawl
Crawl websites into LLM-ready data with a single API.
https://lnkd.in/d6R3hCHS

2. Model Context Protocol (MCP)
Give LLMs safe access to tools and data sources.
https://shorturl.at/iUoxv

3. MMagic
Training, building, and serving a large set of deep learning models.
https://lnkd.in/dZF_jXAu

4. Superduper
Framework for building AI-data workflows and applications.
https://lnkd.in/dX5G8N9r

5. Nerfstudio
A simple API for end-to-end NeRFs.
https://shorturl.at/nVsDE

6. Langfuse
An OSS LLM engineering platform for AI applications.
https://lnkd.in/dr9fDZc4

7. TabbyML
Self-hosted LLMs as coding assistants.
https://lnkd.in/dnEMvAtE

8. LMFlow
An efficient toolbox for finetuning LLMs.
https://lnkd.in/dg64Shp4

9. Garak
Toolkit for probing LLM security and output quality.
https://lnkd.in/dRStRf3v

10. TTS (Text-to-Speech)
A popular library for advanced text-to-speech generation.
https://lnkd.in/d_nUBFMT

11. Suno Bark
A text-to-audio model that can be fine-tuned for highly realistic speech.
https://lnkd.in/dFTPVbYa

12. OLMo
Codebase for training and using AI2's OLMo language models.
https://lnkd.in/d3YsVUge

13. Tinygrad
A tiny deep learning framework, useful for understanding the nuts & bolts of frameworks like PyTorch.
https://lnkd.in/dGRWgheu

14. Microsoft UFO
A UI-focused multi-agent framework for Windows OS.
https://lnkd.in/dcAywxgU

15. Unity ML-Agents
Game environments for training agent simulations.
https://shorturl.at/aEh83

16. Depth Anything
Foundation model for robust monocular depth estimation.
https://shorturl.at/iuFhe

17. Gemma.cpp
C++ implementation of Google's Gemma LLM.
https://lnkd.in/drqupR6C

18. Grok-1
Open-source implementation of xAI's Grok-1 model.
https://lnkd.in/dXGqUbqj

19. MosaicML Streaming
Library to cache and stream datasets directly from cloud storage.
https://lnkd.in/dKtNxhe4

20. RAGFlow
OSS RAG engine based on deep document understanding.
https://lnkd.in/dCppFXV4

21. Vision Agent
Generate code to solve your vision tasks.
https://lnkd.in/d9xqQvUS
-----
#artificialintelligence #deeplearning #machinelearning
-----
Follow me for more expert insights on AI/ML Engineering.
11 key metrics to monitor your Deep Learning models in production using Triton Server!

You might believe that once the model is deployed, the job is done and you can move on to the next improvements, be it better accuracy metrics or re-training. The job isn't done!

Here's why: you should spend more time gathering insights into how the model performs in terms of latency, TCO (Total Cost of Ownership), throughput, and energy footprint.

Here are the top 11 metrics that Triton monitors for you and that you should keep an eye on:

↳ nv_inference_request_success
Tracks successful inference requests, monitors server health, and identifies bottlenecks.
↳ nv_inference_request_failure
Counts failed inference requests to help quickly troubleshoot issues.
↳ nv_inference_count
Measures the total inferences processed, indicating server workload and throughput.
↳ nv_inference_exec_count
Reveals the demand on specific models, aiding in resource optimization.
↳ nv_inference_request_duration_us
Monitors inference request completion time, crucial for meeting latency requirements.
↳ nv_inference_queue_duration_us
Identifies bottlenecks by tracking request queue times.
↳ nv_inference_compute_duration_us
Provides insights into processing efficiency and potential optimizations.
↳ nv_gpu_utilization
Shows how effectively GPU resources are utilized, crucial for scaling.
↳ nv_gpu_memory_total_bytes and nv_gpu_memory_used_bytes
Help manage memory resources.
↳ nv_energy_consumption
Provides stats on GPU energy consumption.
↳ nv_inference_load_ratio
Offers insights into load distribution, helping with efficient resource use and load balancing.
---
⭐ I'm working on something cool that I'm going to announce soon. Subscribe to my newsletter, as I'll roll out the updates there.
Newsletter: https://lnkd.in/dgWB64cX
---
#deeplearning #artificialintelligence #machinelearning
---
I share expert insights on AI Systems and help you upskill as an AI Engineer. Follow for more!
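Triton exposes these counters in Prometheus text format. Here is a minimal sketch of scraping and filtering them, assuming a local server with metrics on Triton's default port 8002 (the watchlist is just an illustrative subset):

```
# Sketch: scraping Triton's Prometheus metrics endpoint
import requests

METRICS_URL = "http://localhost:8002/metrics"  # default Triton metrics port
WATCHLIST = (
    "nv_inference_request_success",
    "nv_inference_request_duration_us",
    "nv_gpu_utilization",
)

text = requests.get(METRICS_URL, timeout=5).text
for line in text.splitlines():
    # Prometheus exposition format: <metric>{<labels>} <value>
    if line.startswith(WATCHLIST):
        print(line)
```

In practice, you'd point Prometheus itself at this endpoint and alert on the latency and failure counters.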
Roboflow has been on quite a surge lately! If you're working with CV or looking to get started with Vision AI, these should be on your list:

- Trackers (https://lnkd.in/drH_SvAj): the newest one, excited for this!
- Supervision (https://lnkd.in/ddWwNsZ3): toolkit for reusable CV components, it cuts the boilerplate code by a ton! See the sketch below.
- Maestro (https://lnkd.in/duceppkw): finetuning VLMs.
- Autodistill (https://lnkd.in/d7hMebmC): modular zero-shot detection, really nice library.

Piotr, way to go! 🔥
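To show what "cutting the boilerplate" looks like, here is a minimal sketch pairing an Ultralytics detector with Supervision's annotators (the model checkpoint and image path are illustrative):

```
# Sketch: detect with Ultralytics YOLO, annotate with Supervision
import cv2
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")       # illustrative model checkpoint
image = cv2.imread("frame.jpg")  # illustrative input image

result = model(image)[0]                             # run detection
detections = sv.Detections.from_ultralytics(result)  # Supervision's common detection format

annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
cv2.imwrite("annotated.jpg", annotated)
```

The same `sv.Detections` object plugs into Supervision's trackers, zone counting, and other utilities.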
Piotr Skalski
Introducing Trackers: All-in-One Object Tracking Library 🔥🔥🔥

TL;DR: Together with Soumik Rakshit, I'm building an all-in-one tracking toolkit: multi-object tracking, tracker fine-tuning, and re-identification in one place. The first official release drops next week!

- Plug-and-play integration with detectors from Transformers, Inference, Ultralytics, PaddlePaddle, MMDetection, and more.
- Built-in support for SORT and DeepSORT today, with StrongSORT, BoT-SORT, ByteTrack, OC-SORT, and additional trackers on the way.
- Released under the open Apache 2.0 license.

trackers: https://lnkd.in/dy6tSiS8

Quick-start notebook link in the comments.
I'd love your feedback on this 🙏

In my newsletter, I've mostly been unpacking and explaining AI tools and frameworks used in the industry, which many found helpful. To get more practical, I thought about building a full AI System, starting from the ground up, and guiding you through each component step by step.

From the ground up, meaning:
- Business logic, design decisions
- Tooling, structure
- Data collection & curation
- Training, evaluation
- Workflows, pipelines
- Engineering (code, app, tests)
- Optimisation (model level, system level)
- Monitoring (pre-deployment, post-deployment)
- MLOps
- and many other concepts.

I have a few ideas and sketches in mind, but find it difficult to decide on the length, complexity, domain, and format, and I'd love your feedback.

How does that sound? Find the polls in the short article below; your vote will help a ton!

Thanks, appreciate it! https://lnkd.in/dCYZKAH6
---
#deeplearning #machinelearning #artificialintelligence
---
Follow me for more expert insights on AI/ML Engineering.
Key points on writing a technical AI/ML Newsletter 👇

1. Improves thinking structure
I've written a lot of code, documentation, project ramp-up guides, feature descriptions, and more, yet I've found writing a newsletter more complicated than all that at times. Even if I master the technicals, turning them into a piece of content for a larger or general audience is tricky and requires a lot of fine-tuning and editing. This habit of reiteration became helpful, as it brought structure into my thoughts and helped me express ideas more clearly.

2. Language++
English was not my first language, and I didn't study it in a structured manner until the first year of college. In middle and high school, I studied French and Russian. I picked up English from music, movies, TV shows, and video games while growing up, more or less. Well, that left quite a few large gaps. Writing definitely helped!

3. Gets you deeper expertise
I might know how AI works at the lowest level. In my head, everything connects nicely and everything makes sense. However, explaining all that to a general audience is a challenge, especially when starting out. You'll not only have to unpack it step by step, but also provide resources and references, as your audience won't trust your expertise on a "trust me, bro" basis. That develops a habit of continuous learning and knowing where and how to look for information, and, more importantly, how to digest it and explain it to others through your own view.

4. Overcome Imposter Syndrome
The usual "what if", "I don't know that much", "what will the comments say", "what if I get checked on this", blabla. Your first articles will always be bad. Look past that and aim to improve rather than making everything perfect on the first go. When you write, you also learn. This carried over into my career role and built up the confidence to express ideas and make decisions.
---
Finally, if you're planning on starting a newsletter or sharing your thoughts, start doing so. You've got nothing to lose!
---
If curious, find it here; I talk about AI/ML Engineering:
Neural Bits Newsletter: https://lnkd.in/dgWB64cX

#machinelearning #artificialintelligence #writing
----
A short glossary of AI to English terms (LLM Training Edition) 👇

RNNs = Recurrent Neural Networks, the precursor to transformers. They processed data sequentially while keeping an internal state.

Transformer = a novel network architecture that came out in 2017 and solved the pain points of RNNs. It powers the ChatGPT, Claude, Llama, and DeepSeek architectures.

LLM = a transformer-based model trained on a large volume of text data for language modeling. It processes sequences of tokens.

Token = pieces of words or characters that LLMs take as input.

Tokenizer = the algorithm that converts text into tokens. Each LLM comes with its own trained tokenizer.

Embedding = a vector of numbers that describes a data point in a high-dimensional space.

Prompt = the text sequence that goes into an LLM.

Self-Attention = matrix multiplication between token embeddings of the input prompt, so the model can learn the relationships between words/tokens.

Autoregression = LLMs are auto-regressive: they predict token N based on the previous 1...N-1 tokens, one token at a time.

Positional Encoding = adds a position embedding to each token embedding to mark sequence order.

Special Tokens = a set of tokens that act like markers and impose a specific behavior, for example "/start/" and "/end/" for LLM generation.

Multi-Head Attention = splits the attention mechanism into parallel heads that focus on different patterns (e.g., text syntax or semantics).

Encoder = encodes information about the input sequence into a fixed-length embedding vector.

Decoder = generates the output token by token.

Causal Masking = when training, we have the entire text sentence. Masking prevents the decoder from "cheating" by looking at the future tokens it needs to generate and forces it to focus only on the previously generated ones. Not needed at inference.

Pretraining = training from scratch.

Finetuning = adapting a model to specific tasks using an engineered dataset.

LoRA (Low-Rank Adaptation) = inserts a small set of low-rank matrices as trainable parameters, leaving the model parameters intact. It requires fewer resources because it trains only the low-rank matrices.

QLoRA = Quantized LoRA: LoRA adapters trained on top of a quantized base model.

Temperature = a 0..1 factor that specifies whether output generation is more deterministic (0) or stochastic (1). Also known as creativity.

Top-K Sampling = limits the LLM output layer to select among only the top K token predictions. See the sampling sketch after this glossary.
---
#artificialintelligence #deeplearning #machinelearning
---
Follow me for more expert insights on AI/ML Engineering.
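To make the last two entries concrete, here is a minimal sketch of temperature plus top-k sampling over a toy next-token distribution (pure NumPy; the logits are made up):

```
# Sketch: temperature + top-k sampling for one next-token step
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=2):
    # Temperature: divide logits before softmax; near 0 approaches argmax
    scaled = logits / max(temperature, 1e-6)
    # Top-K: keep the K highest-scoring tokens, mask out the rest
    top_idx = np.argsort(scaled)[-top_k:]
    masked = np.full_like(scaled, -np.inf)
    masked[top_idx] = scaled[top_idx]
    # Softmax over the surviving logits, then sample one token id
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy 4-token vocabulary
print(sample_next_token(logits, temperature=0.7, top_k=2))
```

Lower the temperature and the highest-logit token wins almost every time; raise it and the top-k candidates get sampled more evenly.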
Pro tip: working with YAML configs in ML projects? Start using Hydra + OmegaConf 👇

ML projects have multiple dynamic components, and keeping track of which configuration was used can become difficult. I found that Hydra + OmegaConf get the job done. Here are the key details:

1. OmegaConf is a hierarchical configuration system explicitly designed for complex applications like ML pipelines.
You can:
↳ Merge multiple configuration files directly.
↳ Access fields via attribute notation `config.model.optimizer` or dict notation `config["model"]["optimizer"]`.
↳ Add type safety against schemas such as dataclasses or Pydantic models.

2. Hydra builds on OmegaConf, creating a full-featured framework for configuration management in complex applications. Initially created by Facebook AI Research (FAIR), Hydra is well-suited for ML workflows.
You can:
↳ Group multiple YAMLs under a single root config and reference them directly by name:
```
# config.yaml
defaults:
  - mydata
  - training: trainAB
  - inference: inferAB
```
When loading `config.yaml` with Hydra, we get a nested configuration object from which we can access any field.
↳ Override any config value at runtime. For example, if config.yaml contains:
```
model:
  optimizer: SGD
```
you can change the value at runtime with:
```
python train.py model.optimizer=Adam
```
This also works for nested configs.
↳ Run multiple configs at the same time using Hydra sweeps:
```
python train.py -m data.dataset=A1,B12,C55
```
This launches one run per value (in parallel if you add a launcher plugin such as joblib), automatically saving the logs of each run.
---
How do you manage your configurations?

#deeplearning #artificialintelligence #machinelearning
---
Follow me for daily expert insights on AI/ML Engineering.
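For completeness, here is a minimal sketch of the entrypoint these commands assume (the `conf/` directory layout and config keys are illustrative):

```
# train.py: minimal Hydra entrypoint
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # cfg is the merged, override-aware configuration object
    print(OmegaConf.to_yaml(cfg))
    optimizer_name = cfg.model.optimizer  # attribute-style access
    # ... build the model/optimizer and train ...

if __name__ == "__main__":
    main()
```

Every run also writes its resolved config and logs under Hydra's `outputs/` directory, which is what makes the experiments reproducible.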
Working with LLMs? You'll like this set of resources on LLM Post-Training 👇

First off, what is Post-Training?
Once a new Foundation Model (LLM/VLM) has been pretrained from scratch on vast web-scale data, the focus shifts to post-training techniques to achieve further breakthroughs. If you've ever fine-tuned an LLM, you used a post-training technique. Here are a few examples:

Tuning
↳ PEFT
↳ Full Model Finetuning
↳ LoRA, Adapters
↳ Knowledge Distillation

Scaling
↳ Chain of Thought (CoT)
↳ Tree of Thought (ToT)

Reinforcing
↳ RLHF (reinforcement learning from human feedback)
↳ DPO (direct preference optimization)
↳ RLAIF (reinforcement learning from AI feedback)

...and these only scratch the surface. The repo below groups all of these, with key papers, resources, and surveys on every major post-training technique out there. A small LoRA sketch follows this post.

LLM Post-Training: https://lnkd.in/dQj78C3X
---
#artificialintelligence #deeplearning #machinelearning
---
Follow me for expert insights on AI/ML Engineering.
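As a taste of the PEFT family, here is a minimal sketch of attaching LoRA adapters with Hugging Face's `peft` library (the base model and hyperparameters are illustrative, not from the post):

```
# Sketch: LoRA finetuning setup with Hugging Face PEFT
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

lora_cfg = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's attention projection gets the adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```

From here, the wrapped model drops into a standard `transformers` Trainer loop while the frozen base weights stay intact.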
An LLM model on HuggingFace has multiple files attached. What's their purpose? 👇

Each model repo on HF contains 3 tabs: the Model Card, with the model architecture, benchmarks, license, and other details; the Files section, which contains the actual model files; and the Community section, with discussion threads, pull requests, and other resources.

Let's unpack the LLM file structure.

1. Architecture Configuration
This file, usually named config.json, contains metadata on the model architecture: layer activations, sizes, vocabulary size, number of attention heads, model precision, and more. The transformers library knows how to parse this config and build the model architecture.

2. Model Weights
Because LLMs have billions of parameters, the models are usually split into parts for safer download, as no one would like to download an 800GB model, hit a network error, and end up with the entire model file corrupted. These model weights come in either .bin format or .safetensors, a newer, safer format proposed by HuggingFace. The safetensors format is an alternative to the default Pickle serializer that PyTorch (pt) used, which is vulnerable to code injection.

3. Layer Mapping
Since the models are large and the weights come as part files (e.g., 0001-of-0006, 0002-of-0006, etc.), this file stores a sequential map of the model architecture, specifying which part file holds each layer's weights.

4. Tokenizer Config
The tokenizer config file contains metadata about which tokenizer and configuration were used to train this model. It also shows the class name used to instantiate the tokenizer, the layer names, and how the inputs are processed before passing through the model. This also contains special_tokens: tokens not derived from the input that the LLM uses as markers to stop generation, mark the chat template, differentiate between text and image modalities, etc.

5. Generation Config
These configuration files contain metadata for inference, such as Temperature and TopP/TopK thresholds or the context window size the model was trained with. They also specify the token IDs of the special tokens so the tokenizer can append these IDs to the sequence.

See the diagram below for the summarized version.
-----
#artificialintelligence #deeplearning #machinelearning
-----
Follow me for more expert insights on AI/ML Engineering.
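Each of these files maps to a `transformers` class you can load on its own. A minimal sketch (the model id is illustrative; any repo you have access to works):

```
# Sketch: inspecting the per-file metadata via transformers
from transformers import AutoConfig, AutoTokenizer, GenerationConfig

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative repo id

config = AutoConfig.from_pretrained(model_id)        # parses config.json
print(config.num_attention_heads, config.vocab_size)

tokenizer = AutoTokenizer.from_pretrained(model_id)  # tokenizer_config.json + tokenizer files
print(tokenizer.special_tokens_map)                  # the special tokens described above

gen_cfg = GenerationConfig.from_pretrained(model_id) # generation_config.json
print(gen_cfg.temperature, gen_cfg.top_p)
```

Downloading the full weights only happens when you call `AutoModel*.from_pretrained`, so this kind of inspection stays cheap.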
These are some of my favorite resources for learning about applied GenAI & LLMs 👇

I focus solely on long-form videos in this one. I know you might have seen multiple similar posts, but in this one I've grouped 9 videos that have helped me pick up low-level details on topics such as:
↳ Production RAG (Jerry Liu, LlamaIndex)
↳ LLM Inference Optimization (NVIDIA)
↳ High-Level Agentic Patterns (Neural Maze)
↳ Low-Level Maths of Transformers (Unsloth)
↳ Transformer-Specific Hardware (Groq LPU)
↳ MCP (Anthropic)

To save you time, here are my recommendations:
1. For everyone: check the LLM Inference Optimization talk from Mark Moyou (NVIDIA).
2. RAG-specific: see the lessons learned from Production RAG (Jerry Liu).
3. The video from Anthropic on MCP is long and detailed; feel free to skip sections.
4. Building and understanding LLMs, from tokenizer to inference: the videos from Andrej and Sebastian.

Extra:
- If you're interested in hardware, GPUs, and architecture, check Igor's (Head of Silicon at Groq) walkthrough of how Groq LPUs work.
- If you're interested in the low-level maths of LLMs and how Unsloth optimizes training and inference: Daniel Han's video.

Find all resources alongside other details in this article:
Article: https://lnkd.in/daNSm3Ct
---
#deeplearning #machinelearning #artificialintelligence
---
Follow me, I help you learn AI/ML Engineering.
If you work with LLMs, you might like this 👇

TransformerLab is an open-source toolkit for LLMs enabling fine-tuning, visualization, tracking, and inference with multiple HuggingFace models. I found out about it just yesterday, scrolling through the r/LocalLLaMA subreddit.

Here's a short summary:
↳ Integrates with HF models
↳ Built-in TensorBoard for experiment tracking
↳ Built-in tokenizer visualizer
↳ Model architecture visualizer on the roadmap
↳ Every component is built as a plugin
↳ Interactive RAG tab to quickly test a basic RAG application
↳ Cross-OS (Linux, macOS)
↳ Integration with the MLX, vLLM, and llama.cpp inference engines
↳ RLHF and Preference Optimization

Docs: https://lnkd.in/d9RKJK7Y
Code: https://lnkd.in/dBr5r5hy
---
#artificialintelligence #machinelearning #llm
---
Follow me for more expert insights on AI/ML!
If you're still using FastAPI to deploy Hugging Face LLMs/VLMs, try LitServe!

FastAPI is a great framework for implementing RESTful APIs. However, it wasn't specifically designed to handle the complex requirements of serving ML models at scale. The team at Lightning AI built LitServe and LitAPI to fill that gap.

- LitAPI builds on top of FastAPI, adapting it for ML workloads and standardizing the core steps of serving a model.
- LitServer handles the infrastructure side of serving models.

Here's what you must know:

1. One-time model setup
In the setup() method, we load any model only once.

2. Customize predict
In the predict() method, we implement the inference-on-input logic.

3. Customize batching logic
You can specify a MAX_BATCH_SIZE and a BATCH_TIME_WINDOW, and it'll automatically handle the dynamic batching of requests as they come in concurrently. You can use a ThreadPoolExecutor to parallelize the preprocessing steps in the batch() method.

4. Customize unbatching logic
After inferencing on a batch, you'll handle the detach() of GPU tensors and post-process the raw logits in the unbatch() method.

5. Decode requests and encode responses
In decode_request(), specify how the API should read the input value from the request. In encode_response(), specify how the API should return responses to the client.

Simple as that! To scale this up for a production workload, you'll use LitServer's scale configuration parameters:
```
LitServer(
    lit_api: LitAPI,
    accelerator: str = "auto",
    devices: Union[str, int] = "auto",
    workers_per_device: int = 1,
    timeout: Union[float, bool] = 30,
    max_batch_size: int = 1,
    batch_timeout: float = 0.0,
    stream: bool = False,
)
```
For a full tutorial, see this article: https://lnkd.in/dGUrVX7s
---
#machinelearning #deeplearning #artificialintelligence
---
Follow me for expert insights on AI/ML Engineering.
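Putting steps 1-5 together, a minimal end-to-end sketch looks like this (the model and the request field name are illustrative):

```
# Sketch: a minimal LitAPI implementing the lifecycle above
import litserve as ls

class SentimentAPI(ls.LitAPI):
    def setup(self, device):
        # 1. one-time model setup, runs once per worker
        from transformers import pipeline
        self.model = pipeline("sentiment-analysis", device=device)

    def decode_request(self, request):
        # 5a. pull the input field out of the JSON payload
        return request["text"]

    def predict(self, text):
        # 2. inference logic on the decoded input
        return self.model(text)

    def encode_response(self, output):
        # 5b. shape the response sent back to the client
        return {"prediction": output}

if __name__ == "__main__":
    server = ls.LitServer(SentimentAPI(), accelerator="auto")
    server.run(port=8000)
```

With max_batch_size > 1, you would also implement batch()/unbatch() as described in steps 3 and 4.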
The AI/ML Engineer's guide to must-know NVIDIA AI frameworks

Forget about the "Vibe Coding" frenzy and make sure you know what each of these is doing. 👇

1. CUDA
Parallel computing platform and API to accelerate computation on NVIDIA GPUs.
Keypoints:
↳ Kernel: a C/C++ function executed on the GPU.
↳ Thread: executes the kernel instructions.
↳ Block: a group of threads.
↳ Grid: a collection of blocks.
↳ Streaming Multiprocessor (SM): processor unit that executes thread blocks.
When a CUDA program invokes a kernel grid, the thread blocks are distributed to the SMs. CUDA follows the SIMT (Single Instruction Multiple Threads) architecture to execute thread logic and uses barriers to gather and synchronize threads. See the kernel sketch after this post.

2. cuDNN
Library with highly tuned implementations of standard routines used in all NN architectures, such as:
↳ forward and backward convolution
↳ attention
↳ matmul, pooling, and normalization.

3. TensorRT
If we unpack a model architecture, we have multiple layer types, operations, layer connections, activations, etc. Imagine an NN architecture as a complex graph of operations. TensorRT can:
↳ Scan that graph
↳ Identify bottlenecks
↳ Optimize
↳ Remove or merge layers
↳ Reduce layer precision
↳ Apply many other optimizations.

4. TensorRT-LLM
Inference engine that brings the TensorRT compiler optimizations to transformer-based models. Covers the advanced and custom requirements of LLMs, such as:
↳ KV caching
↳ In-flight batching
↳ Optimized attention kernels
↳ Tensor parallelism
↳ Pipeline parallelism.

5. Triton Inference Server
An open-source, high-performance, secure serving system for AI workloads. Devs can optimize their models, define serving configurations in Protobuf text files, and deploy. It supports multiple framework backends, including:
↳ Native PyTorch, TensorFlow
↳ TensorRT, TensorRT-LLM
↳ Custom BLS (Business Logic Scripting) with Python backends.

6. NVIDIA NIM
Set of plug-and-play inference microservices that package up multiple NVIDIA libraries and frameworks, highly tuned for serving LLMs at production cluster and datacenter scale. It has:
↳ CUDA, cuDNN
↳ TensorRT
↳ Triton Server
↳ Many other libraries baked in.
NIM provides the optimal serving configuration for an LLM.

7. Dynamo Inference Framework
The newest inference framework for accelerating and scaling GenAI workloads. Composed of modular blocks, robust and scalable. Implements:
↳ Elastic compute via the GPU Planner
↳ KV routing, sharing, and caching
↳ Disaggregated serving of prefill and decode.
---
#deeplearning #artificialintelligence #machinelearning
---
Follow me for more practical expert insights on AI/ML Engineering.
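To make the kernel/thread/block/grid vocabulary concrete without leaving Python, here is a minimal sketch using Numba's CUDA JIT (Numba stands in for a real C/C++ CUDA kernel; the array sizes are illustrative):

```
# Sketch: vector addition kernel showing the grid/block/thread decomposition
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)  # global index = blockIdx.x * blockDim.x + threadIdx.x
    if i < x.size:    # guard: the last block may have surplus threads
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.ones(n, dtype=np.float32)
y = 2 * np.ones(n, dtype=np.float32)
out = np.empty_like(x)

threads_per_block = 256  # threads grouped into one block
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block  # blocks form the grid
add_kernel[blocks_per_grid, threads_per_block](x, y, out)  # Numba handles host<->device copies
```

Each of the ~3,907 blocks gets scheduled onto an SM, and every thread runs the same kernel body on a different index, which is the SIMT model described above.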