
Alex Razvant's LinkedIn Analytics

Get the LinkedIn stats of Alex Razvant and many other LinkedIn influencers, compiled by Taplio.


Alex Razvant


Hard work always pays off; be consistent! I'm a Senior Machine Learning Engineer with a keen interest in developing AI/ML solutions for real-world problems and helping others get started on their Machine Learning journey. Let's connect:
📘 https://medium.com/@alexandrurazvant
✉️ alexandrurazvant@gmail.com
🌐 https://www.neuraleaps.com

Check out Alex Razvant's verified LinkedIn stats (last 30 days)

Followers: 15,767
Posts: 14
Engagements: 3,241
Likes: 2,727

Alex Razvant's Best Posts (last 30 days)



A complete 2025 overview of the Computer Vision AI field 👇

This is one of the longest articles I've ever written for my newsletter. I covered each topic without too many niche technicalities and tried to keep it accessible to a larger audience. It starts from what a pixel is and builds up to Vision Language Models (VLMs) and Multimodal Generative AI.

In short, it'll take you through:
↳ Pixels, Images, Image Types
↳ Colors, Formats
↳ Sensors, Cameras, LiDAR
↳ Classic Image Processing with OpenCV
↳ CNN-based Computer Vision
↳ Object Detection, Tracking, Pose, Segmentation
↳ Generative AI, GANs, AE, VAE
↳ Diffusion Models
↳ Tesla Autopilot
↳ Vision Transformer (ViT), Diffusion Transformer (DiT)
↳ Text-to-Image, Text-to-Video
↳ Stable Diffusion, OpenAI Sora, FLUX
↳ Neural Radiance Fields (NeRF) and Gaussian Splats
↳ Google Maps 3D Rendering

I aimed to include as many diagrams/GIFs as possible, cover each topic with enough detail, and add up-to-date references, none older than 2-3 years.

🔸 A few more interesting topics I'm planning to add:
↳ How DLSS is used in video games.
↳ 3D shape completion.
↳ 4D motion-capture video rendering.
↳ More on the Generative AI & multimodal side.

📒 Article: https://lnkd.in/dU2XGKHi

Enjoy!
-----
#deeplearning #machinelearning #artificialintelligence
-----
💡 Follow me for more expert insights on AI/ML Engineering.
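To give a taste of the "Classic Image Processing with OpenCV" stop on that journey, here is a minimal sketch of a classic pipeline; the file names are placeholders:

```python
import cv2

# Load an image and convert it to grayscale (placeholder file name).
img = cv2.imread("scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Classic processing: blur to suppress sensor noise, then detect edges.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, threshold1=100, threshold2=200)

cv2.imwrite("edges.jpg", edges)
```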


    110

21 AI/ML GitHub repos you'll find interesting 👇 (+ a short description of each)

1. Firecrawl - Crawl websites into LLM-ready data with a single API.
💻 https://lnkd.in/d6R3hCHS
2. ModelContextProtocol (MCP) - Gives LLMs safe access to tools and data sources.
💻 https://shorturl.at/iUoxv
3. MMagic - Training, building, and serving a large set of deep learning models.
💻 https://lnkd.in/dZF_jXAu
4. Superduper - Framework for building AI-data workflows and applications.
💻 https://lnkd.in/dX5G8N9r
5. NerfStudio - A simple API for end-to-end NeRFs.
💻 https://shorturl.at/nVsDE
6. LangFuse - An OSS LLM engineering platform for AI applications.
💻 https://lnkd.in/dr9fDZc4
7. TabbyML - Self-hosted LLMs as coding assistants.
💻 https://lnkd.in/dnEMvAtE
8. LMFlow - An efficient toolbox for finetuning LLMs.
💻 https://lnkd.in/dg64Shp4
9. Garak - Toolkit for probing LLM security and output quality.
💻 https://lnkd.in/dRStRf3v
10. TTS (Text-to-Speech) - A popular library for advanced text-to-speech generation.
💻 https://lnkd.in/d_nUBFMT
11. Suno Bark - A text-to-audio model that can be fine-tuned for highly realistic speech.
💻 https://lnkd.in/dFTPVbYa
12. OLMo - Codebase for training and using AI2's OLMo LLMs.
💻 https://lnkd.in/d3YsVUge
13. Tinygrad - A tiny deep learning framework, useful for understanding the nuts & bolts of PyTorch.
💻 https://lnkd.in/dGRWgheu
14. Microsoft UFO - A UI-focused multi-agent framework for Windows OS.
💻 https://lnkd.in/dcAywxgU
15. Unity AI Agents - Game environments for training agent simulations.
💻 https://shorturl.at/aEh83
16. DepthAnything - Foundation model for robust monocular depth estimation.
💻 https://shorturl.at/iuFhe
17. Gemma CPP - C++ implementation of Google's Gemma LLM.
💻 https://lnkd.in/drqupR6C
18. Grok-1 - Open-source implementation of xAI's Grok-1 model.
💻 https://lnkd.in/dXGqUbqj
19. MosaicML Streaming - Library to cache and stream datasets directly from cloud storage.
💻 https://lnkd.in/dKtNxhe4
20. RagFlow - OSS RAG engine based on deep document understanding.
💻 https://lnkd.in/dCppFXV4
21. Vision Agent - Generates code to solve your vision tasks.
💻 https://lnkd.in/d9xqQvUS
-----
#artificialintelligence #deeplearning #machinelearning
-----
💡 Follow me for more expert insights on AI/ML Engineering.
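As a taste of entry 1, here is roughly what scraping with Firecrawl's Python SDK looks like. This is a hedged sketch: the API key is a placeholder and the exact parameter names have changed between SDK versions, so check the repo for the current API:

```python
# pip install firecrawl-py  (sketch only; verify against the current SDK docs)
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_KEY")  # placeholder key from firecrawl.dev

# Scrape a single page into LLM-ready markdown; the 'formats' argument
# is an assumption based on one SDK version and may differ in others.
result = app.scrape_url("https://example.com", formats=["markdown"])
print(result.markdown[:500])
```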


    203

    ๐Ÿญ๐Ÿญ ๐—ธ๐—ฒ๐˜† ๐—บ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฐ๐˜€ to monitor your ๐——๐—ฒ๐—ฒ๐—ฝ ๐—Ÿ๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด models ๐—ถ๐—ป ๐—ฝ๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐˜๐—ถ๐—ผ๐—ป using Triton Server! You might believe that once the model is deployed, the job is done and you can work on the next improvements, be it better accuracy metrics or re-training, but the job isn't done! ๐—Ÿ๐—ฒ๐˜ ๐—บ๐—ฒ ๐˜๐—ฒ๐—น๐—น ๐˜†๐—ผ๐˜‚ ๐˜„๐—ต๐˜†: It would be best if you spent more time gathering insights on how the model performs regarding latency, TCO (Total Cost of Ownership), throughput, and energy footprint. ๐Ÿ”น Here are ๐˜๐—ต๐—ฒ ๐˜๐—ผ๐—ฝ ๐Ÿญ๐Ÿญ metrics that Triton monitors for you that you ๐˜€๐—ต๐—ผ๐˜‚๐—น๐—ฑ ๐—ธ๐—ฒ๐—ฒ๐—ฝ ๐—ฎ๐—ป ๐—ฒ๐˜†๐—ฒ ๐—ผ๐—ป: โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฟ๐—ฒ๐—พ๐˜‚๐—ฒ๐˜€๐˜_๐˜€๐˜‚๐—ฐ๐—ฐ๐—ฒ๐˜€๐˜€ Tracks successful inference requests, monitors server health, and identifies bottlenecks. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฟ๐—ฒ๐—พ๐˜‚๐—ฒ๐˜€๐˜_๐—ณ๐—ฎ๐—ถ๐—น๐˜‚๐—ฟ๐—ฒ Counts failed inference requests to help quickly troubleshoot issues. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฐ๐—ผ๐˜‚๐—ป๐˜ Measures the total inferences processed, indicating server workload and throughput. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฒ๐˜…๐—ฒ๐—ฐ_๐—ฐ๐—ผ๐˜‚๐—ป๐˜ Reveals the demand on specific models, aiding in resource optimization. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฟ๐—ฒ๐—พ๐˜‚๐—ฒ๐˜€๐˜_๐—ฑ๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป_๐˜‚๐˜€ Monitors inference request completion time, crucial for meeting latency requirements. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—พ๐˜‚๐—ฒ๐˜‚๐—ฒ_๐—ฑ๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป_๐˜‚๐˜€ Identifies bottlenecks by tracking request queue times. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—ฐ๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ_๐—ฑ๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป_๐˜‚๐˜€ Provides insights into processing efficiency and potential optimizations. โ†’ ๐—ป๐˜ƒ_๐—ด๐—ฝ๐˜‚_๐˜‚๐˜๐—ถ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป Shows how effectively GPU resources are utilized, crucial for scaling. โ†’ ๐—ป๐˜ƒ_๐—ด๐—ฝ๐˜‚_๐—บ๐—ฒ๐—บ๐—ผ๐—ฟ๐˜†_๐˜๐—ผ๐˜๐—ฎ๐—น_๐—ฏ๐˜†๐˜๐—ฒ๐˜€ and ๐—ป๐˜ƒ_๐—ด๐—ฝ๐˜‚_๐—บ๐—ฒ๐—บ๐—ผ๐—ฟ๐˜†_๐˜‚๐˜€๐—ฒ๐—ฑ_๐—ฏ๐˜†๐˜๐—ฒ๐˜€ Manage memory resources. โ†’ ๐—ป๐˜ƒ_๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ด๐˜†_๐—ฐ๐—ผ๐—ป๐˜€๐˜‚๐—บ๐—ฝ๐˜๐—ถ๐—ผ๐—ป Provides stats on GPU energy consumption. โ†’ ๐—ป๐˜ƒ_๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ_๐—น๐—ผ๐—ฎ๐—ฑ_๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ Offers insights into load distribution, helping with efficient resource use and load balancing. --- โญ I'm working on something cool that I'm going to announce soon. Subscribe to my newsletter as I'll roll out the updates there. ๐Ÿ”— Newsletter: https://lnkd.in/dgWB64cX --- #deeplearning #artificialintelligence #machinelearning --- ๐Ÿ’ก I share expert insights on AI Systems and help you upskill as an AI Engineer. Follow for more!


      85

Roboflow has been on quite a surge lately! If you're working with CV or looking to get started with Vision AI, these should be on your list:

- Trackers (https://lnkd.in/drH_SvAj) - the newest one, excited for this!
- Supervision (https://lnkd.in/ddWwNsZ3) - a toolkit of reusable CV components that cuts boilerplate code by a ton!
- Maestro (https://lnkd.in/duceppkw) - for fine-tuning VLMs.
- Autodistill (https://lnkd.in/d7hMebmC) - modular zero-shot detection, a really nice library.

Piotr, way to go! 🔥 🔥 🔥
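To show the kind of boilerplate Supervision removes, here is a minimal sketch pairing it with an Ultralytics detector; the model checkpoint and image path are placeholders:

```python
import cv2
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # placeholder checkpoint
image = cv2.imread("image.jpg")   # placeholder image

# Wrap the raw detector output in Supervision's common Detections format.
detections = sv.Detections.from_ultralytics(model(image)[0])

# A reusable annotator replaces hand-rolled box-drawing code.
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
cv2.imwrite("annotated.jpg", annotated)
```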


      Piotr Skalski


Introducing Trackers: All-in-One Object Tracking Library 🔥 🔥 🔥

TL;DR: Together with Soumik Rakshit, I'm building an all-in-one tracking toolkit: multi-object tracking, tracker fine-tuning, and re-identification in one place. The first official release drops next week!

- Plug-and-play integration with detectors from Transformers, Inference, Ultralytics, PaddlePaddle, MMDetection, and more.
- Built-in support for SORT and DeepSORT today, with StrongSORT, BoT-SORT, ByteTrack, OC-SORT, and additional trackers on the way.
- Released under the open Apache 2.0 license.

⮑ 🔗 trackers: https://lnkd.in/dy6tSiS8

Quick-start notebook link in the comments. 👇🏻


      38

I'd love your feedback on this 👇

In my newsletter, I've been mostly unpacking and explaining AI tools and frameworks used in the industry, which many found helpful. To get more practical, I thought about building a full AI system, starting from the ground up, and guiding you through each component step by step.

From the ground up, meaning:
- Business logic, design decisions
- Tooling, structure
- Data collection & curation
- Training, evaluation
- Workflows, pipelines
- Engineering (code, app, tests)
- Optimisation (model level, system level)
- Monitoring (pre-deployment, post-deployment)
- MLOps
- and many other concepts.

I have a few ideas and sketches in mind, but I find it difficult to decide on the length, complexity, domain, and format, and I'd love your feedback.

How does that sound? Find the polls in the short article below; your vote will help a ton!

Thanks, I appreciate it 🙏
https://lnkd.in/dCYZKAH6
---
#deeplearning #machinelearning #artificialintelligence
---
💡 Follow me for more expert insights on AI/ML Engineering.


      31

      ๐—™๐—ฒ๐˜„ ๐—ฝ๐—ผ๐—ถ๐—ป๐˜๐˜€ on writing a technical AI/ML Newsletter ๐Ÿ‘‡ ๐Ÿญ. ๐—œ๐—บ๐—ฝ๐—ฟ๐—ผ๐˜ƒ๐—ฒ๐˜€ ๐˜๐—ต๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ I've written a lot of code, and documentation and project ramp-up guides, and feature descriptions, and more... I've found writing a newsletter more complicated than all that at times. Even if I master the technicals, turning them into a larger or general audience piece of content is tricky and requires a lot of fine-tuning and editing. This habit of reiteration became helpful as it brought structure into my thoughts and helped me express ideas more clearly. ๐Ÿฎ. ๐—Ÿ๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ++ English was not my first language, and I didn't study it in a structured manner until the first year of college. In middle and high school, I studied French and Russian. I picked up English from music, movies, tv-shows and video games while growing up, more or less. Well, that left quite a few large gaps. Writing definitely helped! ๐Ÿฏ. ๐—š๐—ฒ๐˜๐˜€ ๐˜†๐—ผ๐˜‚ ๐——๐—ฒ๐—ฒ๐—ฝ๐—ฒ๐—ฟ ๐—˜๐˜…๐—ฝ๐—ฒ๐—ฟ๐˜๐—ถ๐˜€๐—ฒ I might know how AI works at the lowest level. In my head, everything connects nicely and everything makes sense. However, to explain all that for a general audience - that's a challenge, especially when starting. You'll not only have to unpack it step by step, but also provide resources and references, as your audience won't trust your expertise just because "trust me, bro". That develops a habit of continuous learning and knowing where and how to look for information. More importantly, how to digest it and explain it to others through your view. ๐Ÿฐ. ๐—ข๐˜ƒ๐—ฒ๐—ฟ๐—ฐ๐—ผ๐—บ๐—ฒ ๐—œ๐—บ๐—ฝ๐—ผ๐˜€๐˜๐—ฒ๐—ฟ ๐—ฆ๐˜†๐—ป๐—ฑ๐—ฟ๐—ผ๐—บ๐—ฒ The usual "what if", "I don't know that much", "what will the comments say", "What if I get checked on this", blabla. Your first articles will always be bad. Look past that and target to improve rather than making everything perfect from the first go. When you write, you also learn. This reflected in my career role, and built up confidence to express ideas and make decisions. --- Finally, if you're planning on starting a newsletter or sharing your thoughts, start doing so. You've got nothing to lose! --- If curious, find it here, I talk about AI/ML Engineering: ๐Ÿ“™ ๐—ก๐—ฒ๐˜‚๐—ฟ๐—ฎ๐—น ๐—•๐—ถ๐˜๐˜€ ๐—ก๐—ฒ๐˜„๐˜€๐—น๐—ฒ๐˜๐˜๐—ฒ๐—ฟ https://lnkd.in/dgWB64cX #machinelearning #artificialintelligence #writing ----


      23

      A ๐˜€๐—ต๐—ผ๐—ฟ๐˜ glossary of ๐—”๐—œ ๐˜๐—ผ ๐—˜๐—ป๐—ด๐—น๐—ถ๐˜€๐—ต terms ๐Ÿ‘‡ (LLM Training Edition) ๐—ฅ๐—ก๐—ก๐—ฆ = Recurrent Neural Networks, the precursor for transformers. They processed data sequentially while keeping an internal state. ๐—ง๐—ฟ๐—ฎ๐—ป๐˜€๐—ณ๐—ผ๐—ฟ๐—บ๐—ฒ๐—ฟ = a novel network architecture that came out in 2017 and solved the pain points of RNNs. Powers-up ChatGPT, Claude, Llama, and DeepSeek architectures. ๐—Ÿ๐—Ÿ๐—  = transformer-based models trained on a large volume of text data for language modeling. It processes sequences of tokens. ๐—ง๐—ผ๐—ธ๐—ฒ๐—ป = pieces of words, characters that LLMs take as input. ๐—ง๐—ผ๐—ธ๐—ฒ๐—ป๐—ถ๐˜‡๐—ฒ๐—ฟ = algorithm that converts text into tokens. Each LLM comes with its own trained tokenizer. ๐—˜๐—บ๐—ฏ๐—ฒ๐—ฑ๐—ฑ๐—ถ๐—ป๐—ด๐˜€ = vector of numbers that describe a data point in a high-dimensional space. ๐—ฃ๐—ฟ๐—ผ๐—บ๐—ฝ๐˜ = the text sentence that goes into an LLM. ๐—ฆ๐—ฒ๐—น๐—ณ-๐—”๐˜๐˜๐—ฒ๐—ป๐˜๐—ถ๐—ผ๐—ป = matrix multiplication between token embeddings of the input prompt so the model can learn the relationships between words/tokens. ๐—”๐˜‚๐˜๐—ผ๐—ฟ๐—ฒ๐—ด๐—ฟ๐—ฒ๐˜€๐˜€๐—ถ๐—ผ๐—ป = LLMs are auto-regressive, as they predict the next N token based on the previous 1 ... N-1 tokens, one at a time. ๐—ฃ๐—ผ๐˜€๐—ถ๐˜๐—ถ๐—ผ๐—ป๐—ฎ๐—น ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ถ๐—ป๐—ด = adds a position embedding to each token embedding to mark setup order. ๐—ฆ๐—ฝ๐—ฒ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ง๐—ผ๐—ธ๐—ฒ๐—ป๐˜€ = a set of tokens that act like markers and impose a specific behavior. For example,"/start/" and "/end/" for LLM generation. ๐— ๐˜‚๐—น๐˜๐—ถ-๐—›๐—ฒ๐—ฎ๐—ฑ ๐—”๐˜๐˜๐—ฒ๐—ป๐˜๐—ถ๐—ผ๐—ป = split the attention mechanism into parallel heads to focus on different patterns (e.g., text syntax or semantics) ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ = encodes information about the input sequence into a fixed-length embedding vector. ๐——๐—ฒ๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ = generates the output token by token. ๐—–๐—ฎ๐˜‚๐˜€๐—ฎ๐—น ๐— ๐—ฎ๐˜€๐—ธ๐—ถ๐—ป๐—ด = when training, we have the entire text sentence. Masking prevents the decoder from "cheating/looking" at the future tokens it needs to generate and forces it to focus only on the previously generated ones. Not needed at inference. ๐—ฃ๐—ฟ๐—ฒ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด = training from 0. ๐—™๐—ถ๐—ป๐—ฒ๐˜๐˜‚๐—ป๐—ถ๐—ป๐—ด = adapting a model to specific tasks using an engineered dataset. ๐—Ÿ๐—ผ๐—ฅ๐—” (Low Rank Adaptation) = inserts a small set of low-rank matrices as trainable parameters, leaving model parameters intact. Requires fewer resources it trains a subset of low-rank matrices only. ๐—ค๐—Ÿ๐—ผ๐—ฅ๐—” = Quantized LoRA, quantized low-rank matrices. ๐—ง๐—ฒ๐—บ๐—ฝ๐—ฒ๐—ฟ๐—ฎ๐˜๐˜‚๐—ฟ๐—ฒ = a 0..1 factor that specifies if the output generation is more deterministic (0) or stochastic (1). Also known as creativity. ๐—ง๐—ผ๐—ฝ-๐—ž ๐—ฆ๐—ฎ๐—บ๐—ฝ๐—น๐—ถ๐—ป๐—ด = limits the LLM output layer to select only top K token predictions. --- #artificialintelligence #deeplearning #machinelearning --- ๐Ÿ’ก Follow me for more ๐—ฒ๐˜…๐—ฝ๐—ฒ๐—ฟ๐˜ ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€ on AI/ML Engineering.


      46

      ๐—ฃ๐—ฟ๐—ผ ๐˜๐—ถ๐—ฝ: Working with YAML configs in ML Projects? Start using Hydra + OmegaConf ๐Ÿ‘‡ ML projects have multiple dynamic components. Keeping track of which configuration was used might become difficult. I found Hydra + OmegaConf to get the job done. Here are the key details: 1. ๐—ข๐—บ๐—ฒ๐—ด๐—ฎ๐—–๐—ผ๐—ป๐—ณ is a hierarchical configuration system explicitly designed for complex applications like ML pipelines. ๐Ÿ”ธ You can: โ†ณ Merge multiple configuration files directly. โ†ณ Access fields via attribute notation "config.model.optimizer" or dict notation config["model"]["optimizer"] โ†ณ Add Type safety against schemas such as dataclasses or Pydantic models. 2. ๐—›๐˜†๐—ฑ๐—ฟ๐—ฎ builds on OmegaConf, creating a full-featured framework for configuration management in complex applications. Initially created by Facebook Research (FAIR), Hydra is well-suited for ML workflows. ๐Ÿ”ธ You can: โ†ณ Group multiple YAMLs into a single configuration file and reference them directly by filename. ``` config.yaml defaults: - mydata.yaml - training: trainAB.yaml - inference: inferAB.yaml ``` When loading the `config.yaml` using Hydra, we'll get a nested configuration object, from which we can access any field. โ†ณ Override any config value at runtime. For example, if in the config.yaml: ``` model: optimizer: SGD ``` At runtime, you can change the value with: ``` python train.py model.optimizer=Adam ``` ๐Ÿ”น ๐—ง๐—ต๐—ถ๐˜€ ๐—ฎ๐—น๐˜€๐—ผ ๐˜„๐—ผ๐—ฟ๐—ธ๐˜€ ๐—ณ๐—ผ๐—ฟ ๐—ป๐—ฒ๐˜€๐˜๐—ฒ๐—ฑ ๐—ฐ๐—ผ๐—ป๐—ณ๐—ถ๐—ด๐˜€. โ†ณ Run multiple configs at the same time using Hydra sweeps. ``` python train.py -m data.dataset=A1, B12, C55 ``` This will spawn multiple processes and run in parallel, automatically saving logs. --- ๐—›๐—ผ๐˜„ ๐—ฑ๐—ผ ๐˜†๐—ผ๐˜‚ ๐—บ๐—ฎ๐—ป๐—ฎ๐—ด๐—ฒ ๐˜†๐—ผ๐˜‚๐—ฟ ๐—ฐ๐—ผ๐—ป๐—ณ๐—ถ๐—ด๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€? #deeplearning #artificialintelligence #machinelearning --- ๐Ÿ’ก Follow me for daily expert insights on AI/ML Engineering.


        91

        ๐—ช๐—ผ๐—ฟ๐—ธ๐—ถ๐—ป๐—ด ๐˜„๐—ถ๐˜๐—ต ๐—Ÿ๐—Ÿ๐— ๐˜€? You'll like this set of resources on LLM Post-Training ๐Ÿ‘‡ ๐˜๐˜ช๐˜ณ๐˜ด๐˜ต ๐˜ฐ๐˜ง๐˜ง, ๐˜ธ๐˜ฉ๐˜ข๐˜ต ๐˜ช๐˜ด ๐˜—๐˜ฐ๐˜ด๐˜ต-๐˜›๐˜ณ๐˜ข๐˜ช๐˜ฏ๐˜ช๐˜ฏ๐˜จ? Once a new Foundation Model (LLM/VLM) was pretrained from scratch on vast web-scale data, the focus is on post-training techniques to achieve further breakthroughs. If you've ever fine-tuned an LLM, you used a post-training technique. Here are a few examples: ๐—ง๐˜‚๐—ป๐—ถ๐—ป๐—ด โ†ณ PEFT โ†ณ Full Model Finetuning โ†ณ LoRA, Adapters โ†ณ Knowledge Distillation ๐—ฆ๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด โ†ณ Chain of Thought (CoT) โ†ณ Tree of Thought (ToT) ๐—ฅ๐—ฒ๐—ถ๐—ป๐—ณ๐—ผ๐—ฟ๐—ฐ๐—ถ๐—ป๐—ด โ†ณ RLHF (reinforcement learning, human feedback) โ†ณ DPO (direct preference optimization) โ†ณRLAIF (reinforcement learning, AI feedback) .. and these only scratch the surface. This repo groups all techniques, with key papers, resources, and surveys on every major post-training technique out there. ๐Ÿ’ป LLM Post-Training: https://lnkd.in/dQj78C3X --- #artificialintelligence #deeplearning #machinelearning --- ๐Ÿ’ก Follow me for ๐—ฒ๐˜…๐—ฝ๐—ฒ๐—ฟ๐˜ ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€ on AI/ML Engineering


          315

An LLM model on Hugging Face has multiple files attached: what's their purpose? 👇

Each model repo on HF contains three tabs: the Model Card with the model architecture, benchmarks, license, and other details; the Files section with the model's actual files; and the Community section with discussion threads, pull requests, and other resources.

🔹 Let's unpack the LLM file structure.

1️⃣ Architecture Configuration
This file, usually named config.json, contains metadata on the model architecture, layer activations, sizes, vocabulary size, number of attention heads, model precision, and more. The transformers library knows how to parse this config and build the model architecture.

2️⃣ Model Weights
Because LLMs have billions of parameters, the models are usually split into parts for safer downloads; no one wants to download an 800GB model, hit a network error, and end up with the entire model file corrupted. These model weights come in either the .bin format or .safetensors, a newer, safer format proposed by Hugging Face. Safetensors is an alternative to the default Pickle serialization that PyTorch (.pt) uses, which is vulnerable to code injection.

3️⃣ Layer Mapping
Since the models are large and the weights come as part files (e.g., 0001-of-0006, 0002-of-0006, etc.), this file stores a sequential map of the model architecture, specifying which part file holds each layer's weights.

4️⃣ Tokenizer Config
The tokenizer config file contains metadata about which tokenizer and configuration were used to train this model. It also shows the class name used to instantiate the tokenizer, the layer names, and how the inputs are processed before passing through the model. It also contains special_tokens: tokens not derived from the input that the LLM uses as markers to stop generation, mark the chat template, differentiate between text and image modalities, etc.

5️⃣ Generation Config
These configuration files contain metadata for inference, such as temperature, Top-P/Top-K thresholds, and the context window size the model was trained with. They also specify the token IDs of the special tokens so the tokenizer can append these IDs to the sequence.

See the diagram below for the summarized version 👇
-----
#artificialintelligence #deeplearning #machinelearning
-----
💡 Follow me for more expert insights on AI/ML Engineering.
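A quick way to see this file layout for yourself is through the Hub API; a minimal sketch, using an arbitrary example model ID:

```python
from huggingface_hub import list_repo_files
from transformers import AutoConfig

repo = "mistralai/Mistral-7B-v0.1"  # example model ID

# Lists config.json, tokenizer files, the sharded *.safetensors parts,
# and the *.index.json layer-to-shard map described above.
for filename in list_repo_files(repo):
    print(filename)

# transformers parses config.json into the architecture metadata.
cfg = AutoConfig.from_pretrained(repo)
print(cfg.num_attention_heads, cfg.vocab_size)
```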


            302

These are some of my favorite resources for learning about applied GenAI & LLMs 👇

I focus solely on long-form videos in this one. I know you might have seen multiple similar posts, but in this one I've grouped 9 videos that have helped me pick up low-level details on topics such as:

↳ Production RAG (Jerry Liu, LlamaIndex)
↳ LLM Inference Optimization (NVIDIA)
↳ High-Level Agentic Patterns (Neural Maze)
↳ Low-Level Maths of Transformers (Unsloth)
↳ Transformer-Specific Hardware (Groq LPU)
↳ MCP (Anthropic)

🔹 To save your time, here are my recommendations:

1. For everyone: check the LLM Inference Optimization one from Mark Moyou (NVIDIA).
2. RAG-specific: see the lessons learned from Production RAG (Jerry Liu).
3. The video from Anthropic on MCP is long and detailed; feel free to skip sections.
4. For building and understanding LLMs, from tokenizer to inference: the videos from Andrej and Sebastian.

Extra:
- If you're interested in hardware, GPUs, and architecture, check Igor's (Head of Silicon at Groq) walkthrough on how Groq LPUs work.
- If you're interested in the low-level maths of LLMs and how Unsloth optimizes training and inference, watch Daniel Han's video.

Find all resources alongside other details in this article:
📙 Article: https://lnkd.in/daNSm3Ct
---
#deeplearning #machinelearning #artificialintelligence
---
💡 Follow me, I help you learn AI/ML Engineering.


              261

If you work with LLMs, you might like this 👇

TransformerLab is an open-source toolkit for LLMs that enables fine-tuning, visualization, tracking, and inference with multiple Hugging Face models. I found out about it just yesterday, scrolling through the r/LocalLLaMA subreddit.

Here's a short summary:
↳ Integrates with HF models
↳ Has built-in TensorBoard for experiment tracking
↳ Has a built-in tokenizer visualizer
↳ Will enable a model architecture visualizer
↳ Every component is built as a plugin
↳ Interactive RAG tab to quickly test a basic RAG application
↳ Cross-OS (Linux, macOS)
↳ Integrates with the MLX, vLLM, and llama.cpp inference engines
↳ RLHF and preference optimization

📙 Docs: https://lnkd.in/d9RKJK7Y
💻 Code: https://lnkd.in/dBr5r5hy
---
#artificialintelligence #machinelearning #llm
---
💡 Follow me for more expert insights on AI/ML!


              325

If you're still using FastAPI to deploy Hugging Face LLMs/VLMs - try LitAPI!

FastAPI is a great framework for implementing RESTful APIs. However, it wasn't specifically designed to handle the complex requirements of serving ML models at scale. The team at Lightning AI built LitServe and LitAPI to fill that gap.

🔹 LitAPI builds on top of FastAPI, adapting it for ML workloads and standardizing the core steps of serving a model.
🔹 LitServer handles the infrastructure side of serving models.

🔸 Here's what you must know:

1. One-time model setup
In the setup() method, we load any model only once.

2. Customize Predict
In the predict() method, we implement the inference logic on the input.

3. Customize Batching Logic
You can specify a MAX_BATCH_SIZE and a BATCH_TIME_WINDOW, and it'll automatically handle the dynamic batching of requests as they come in concurrently. You can use a ThreadPoolExecutor to parallelize the preprocessing steps in the batch() method.

4. Customize Unbatching Logic
After inferencing on a batch, you handle the detach() of GPU tensors and post-process the raw logits in the unbatch() method.

5. Decode request and encode response
In decode_request(), specify how the API should access the input value from the request. In encode_response(), specify how the API should return responses to the client.

Simple as that! To scale this up for a production workload, you'll use LitServer's scale configuration parameters:
```
LitServer(
    lit_api: LitAPI,
    accelerator: str = "auto",
    devices: Union[str, int] = "auto",
    workers_per_device: int = 1,
    timeout: Union[float, bool] = 30,
    max_batch_size: int = 1,
    batch_timeout: float = 0.0,
    stream: bool = False,
)
```
📙 For a full tutorial, see this article: https://lnkd.in/dGUrVX7s
---
#machinelearning #deeplearning #artificialintelligence
---
💡 Follow me for expert insights on AI/ML Engineering.
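Tying the five steps together, a minimal end-to-end server might look like this; a sketch assuming the litserve package and a placeholder Hugging Face pipeline:

```python
import litserve as ls
from transformers import pipeline

class TextGenAPI(ls.LitAPI):
    def setup(self, device):
        # 1. One-time model setup (placeholder model).
        self.pipe = pipeline("text-generation", model="gpt2", device=device)

    def decode_request(self, request):
        # 5a. Pull the input out of the incoming JSON payload.
        return request["prompt"]

    def predict(self, prompt):
        # 2. Inference logic on the decoded input.
        return self.pipe(prompt, max_new_tokens=50)[0]["generated_text"]

    def encode_response(self, output):
        # 5b. Shape the response returned to the client.
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(TextGenAPI(), accelerator="auto")
    server.run(port=8000)
```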


                1k

                The ๐—”๐—œ/๐— ๐—Ÿ ๐—˜๐—ป๐—ด๐—ถ๐—ป๐—ฒ๐—ฒ๐—ฟ'๐˜€ ๐—ด๐˜‚๐—ถ๐—ฑ๐—ฒ ๐˜๐—ผ ๐—บ๐˜‚๐˜€๐˜-๐—ธ๐—ป๐—ผ๐˜„ NVIDIA AI Frameworks Forget about the "Vibe Coding" frenzy and make sure you know what each of these is doing. ๐Ÿ‘‡ 1๏ธโƒฃ ๐—–๐—จ๐——๐—” Parallel computing platform and API to accelerate computation on NVIDIA GPUs. Keypoints: โ†ณ Kernels - C/C++ functions. โ†ณ Thread - executes the kernel instructions. โ†ณ Block - groups of threads. โ†ณ Grid - collection of blocks. โ†ณ Streaming Multiprocessor (SM) - processor units that execute thread blocks. When a CUDA program invokes a kernel grid, the thread blocks are distributed to the SMs. CUDA follows the SIMT (Single Instruction Multiple Threads) architecture to execute threads logic and uses a Barrier to gather and synchronize Threads. 2๏ธโƒฃ ๐—ฐ๐˜‚๐——๐—ก๐—ก Library with highly tuned implementations for standard routines such as: โ†ณ forward and backward convolution โ†ณ attention โ†ณ matmul, pooling, and normalization - which are used in all NN Architectures. 3๏ธโƒฃ ๐—ง๐—ฒ๐—ป๐˜€๐—ผ๐—ฟ๐—ฅ๐—ง If we unpack a model architecture, we have multiple layer types, operations, layer connections, activations, etc. Imagine an NN architecture as a complex Graph of operations. TensorRT can: โ†ณ Scan that graph โ†ณ Identify bottlenecks โ†ณ Optimize โ†ณ Remove, merge layers โ†ณ Reduce layer precisions, โ†ณ Many other optimizations. 4๏ธโƒฃ ๐—ง๐—ฒ๐—ป๐˜€๐—ผ๐—ฟ๐—ฅ๐—ง-๐—Ÿ๐—Ÿ๐—  Inference Engine that brings the TensorRT Compiler optimizations to Transformer-based models. Covers the advanced and custom requirements for LLMs, such as: โ†ณ KV Caching โ†ณ Inflight Batching โ†ณ Optimized Attention Kernels โ†ณTensor Parallel โ†ณ Pipeline Parallel. 5๏ธโƒฃ ๐—ง๐—ฟ๐—ถ๐˜๐—ผ๐—ป ๐—œ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ฒ๐—ฟ An open source, high-performance, and secure serving system for AI Workloads. Devs can optimize their models, define serving configurations in Protobuf Text files, and deploy. It supports multiple framework backends, including: โ†ณ Native PyTorch, TensorFlow โ†ณ TensorRT, TensorRT-LLM โ†ณ Custom BLS (Bussiness Language Scripting) with Python Backends 6๏ธโƒฃ ๐—ก๐—ฉ๐—œ๐——๐—œ๐—” ๐—ก๐—œ๐—  Set of plug-and-play inference microservices that package up multiple NVIDIA libraries and frameworks highly tuned for serving LLMs to production cluster & datacenters scale. It has: โ†ณ CUDA, cuDNN โ†ณ TensorRT โ†ณ Triton Server โ†ณ Many other libraries - baked in. NIM provides the optimal serving configuration for an LLM. 7๏ธโƒฃ ๐——๐˜†๐—ป๐—ฎ๐—บ๐—ผ ๐—œ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—™๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜„๐—ผ๐—ฟ๐—ธ The newest inference framework for accelerating and scaling GenAI workloads. Composed of modular blocks, robust and scalable. Implements: โ†ณ Elastic compute - GPU Planner โ†ณ KV Routing, Sharing, and Caching โ†ณ Disaggregated Serving of Prefill and Decode. --- #deeplearning #artificialintelligence #machinelearning --- ๐Ÿ’ก Follow me for more practical expert insights on AI/ML Engineering.


                  385
