I am a senior machine learning engineer and contractor with 6+ years of experience. I design and implement modular, scalable, and production-ready ML systems for startups worldwide. My central mission is to build data-intensive AI/ML products that serve the world.

Since training my first neural network in 2017, I have had two passions that fuel my mission:
→ Designing and implementing production AI/ML systems using MLOps best practices.
→ Teaching people about the process.

I currently develop production-ready Deep Learning products at Metaphysic, a leading GenAI platform. In the past, I built Computer Vision and MLOps solutions for CoreAI, Everseen, and Continental.

I am also the Founder of Decoding ML, a channel for battle-tested content on learning how to design, code, and deploy production-grade ML and MLOps systems. I write articles and posts each week on:
- LinkedIn: 29k+ followers
- Medium: 2.5k+ followers ~ https://medium.com/@pauliusztin
- Substack (newsletter): 6k+ followers ~ https://decodingml.substack.com/

If you want to learn how to build an end-to-end production-ready LLM & RAG system using MLOps best practices, you can take Decoding ML's self-guided free course:
→ LLM Twin Course: Building Your Production-Ready AI Replica ~ https://github.com/decodingml/llm-twin-course

If you need machine learning solutions for your business, let's discuss! Only open to full-remote positions as a contractor.

Contact:
- Phone: +40 732 509 516
- Email: p.b.iusztin@gmail.com
- Decoding ML: https://linktr.ee/decodingml
- Personal site & socials: https://www.pauliusztin.me/
Here's the problem with most AI books: they teach the model, not the system. Which is fine... until you try to deploy that model in production. That's where everything breaks:
- Your RAG pipeline is duct-taped together
- Your eval framework is an afterthought
- Your prompts aren't versioned
- Your architecture can't scale
That's why Maxime and I wrote the LLM Engineer's Handbook. We wanted to create a practical guide for AI engineers who build real-world AI applications. This isn't just another guide... it's a practical roadmap for designing and deploying real-world LLM systems. In the book, we cover:
→ Efficient fine-tuning workflows
→ RAG architectures
→ Evaluation pipelines with LLM-as-judge
→ Scaling strategies for serving + infra
→ MLOps + LLMOps patterns baked in
Whether you're building your first assistant or scaling your 10th RAG app... this book gives you the mental models and engineering scaffolding to do it right.
Here's the link to get your copy: https://lnkd.in/dVgFJtzF
Everyone chunks documents for retrieval. But what if that's the wrong unit? Let me explain.
In standard RAG, we embed small text chunks and pass those into the LLM as context. It's simple, but flawed. Why? Because small chunks are great for retrieval precision, but terrible for generation context. That's where Parent Retrieval comes in (aka small-to-big retrieval).
Here's how it works:
→ You split your documents into small chunks
→ You embed and retrieve using those small chunks
→ But you don't pass the chunk to the LLM...
→ You pass the parent document that the chunk came from
The result?
→ Precise semantic retrieval (thanks to small, clean embeddings that encode a single entity)
→ Rich generation context (because the LLM sees the broader section)
→ Fewer hallucinations
→ Less tuning needed around chunk size and top-k
It's one of the few advanced RAG techniques that works in production. No fancy agents. No latency bombs. No retraining.
We break it all down (with diagrams and code examples) in Lesson 5 of the Second Brain AI Assistant course.
Link to the full lesson in the comments.
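To make the pattern concrete, here is a minimal, self-contained sketch of parent retrieval. It is not the course's implementation: TF-IDF stands in for a real embedding model, and the corpus, chunker, and top_k values are purely illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative parent documents (in practice: sections of your knowledge base).
parents = [
    "Stoicism is a school of Hellenistic philosophy focused on virtue and self-control.",
    "RAG combines retrieval with generation. Small chunks improve retrieval precision.",
]

# 1. Split each parent into small chunks, remembering which parent each came from.
def chunk(text: str, size: int = 8) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

chunks, chunk_to_parent = [], []
for parent_id, doc in enumerate(parents):
    for c in chunk(doc):
        chunks.append(c)
        chunk_to_parent.append(parent_id)

# 2. Embed and retrieve at the *chunk* level (precise matching).
vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)

def retrieve_parents(query: str, top_k: int = 2) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([query]), chunk_vectors)[0]
    best_chunks = scores.argsort()[::-1][:top_k]
    # 3. Swap each matched chunk for its parent document (rich context),
    #    de-duplicating parents that multiple chunks point to.
    parent_ids = dict.fromkeys(chunk_to_parent[i] for i in best_chunks)
    return [parents[pid] for pid in parent_ids]

print(retrieve_parents("What does stoicism teach?"))
```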
The #1 mistake in building LLM agents? Thinking the project ends at reasoning. Here's when it actually ends: when your agent can talk to the world securely, reliably, and in real time. And that's what Lesson 4 of the PhiloAgents course is all about.
Up to this point, we focused on making our agents think:
→ Philosophical worldviews
→ Context-aware reasoning
→ Memory-backed conversations
But intelligence alone isn't enough. To be useful, agents need a voice. To be deployable, they need an interface. To be real, they need to exist as APIs. This lesson is the bridge from the local prototype to the live system.
Here's what you'll learn:
→ How to deploy your agent as a REST API using FastAPI
→ How to stream responses token-by-token with WebSockets
→ How to wire up a clean backend-frontend architecture using FastAPI (web server) + Phaser (game interface)
→ How to think about agent interfaces in real-world products (not just demos)
In short: this is how you ship an agent that reasons AND responds in production.
Shoutout to Anca-Ioana Martin for helping shape this lesson and write the deep-dive article. And of course... big thanks to my co-creator Miguel Otero Pedrido for the ongoing collab.
Link to Lesson 4 in the comments.
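For flavor, here is a minimal sketch of the token-streaming pattern with FastAPI and WebSockets. The `fake_agent_stream` generator, route name, and end-of-turn sentinel are my own stand-ins, not the course's exact code.

```python
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def fake_agent_stream(question: str):
    # Hypothetical stand-in for an agent streaming its answer token by token.
    for token in f"Thinking about: {question} ...".split():
        await asyncio.sleep(0.05)  # simulate inference latency
        yield token

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    while True:
        question = await websocket.receive_text()
        # Stream each token as soon as it is produced, instead of
        # blocking until the full answer is ready.
        async for token in fake_agent_stream(question):
            await websocket.send_text(token)
        await websocket.send_text("[END]")  # sentinel so the client knows the turn is over
```

Run it with `uvicorn app:app` and connect from any WebSocket client to see tokens arrive one by one.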
I need your opinion. If you've used the LLM Engineer's Handbook to bring your AI project idea to life, Maxime Labonne and I would love to hear about it! Giving you the chance to earn $500.
Our bestseller, the LLM Engineer's Handbook, has helped thousands build and deploy their own LLM and RAG systems from scratch.
First, as a writer and educator, I would love to see how Maxime's and my book helped you in your AI engineering journey. As we wrote this book out of passion, that would mean the world to us.
Second, Packt is organizing a contest where you share on social media what you've built and how the book helped you navigate the spaghetti world of AI. The first winner will receive $500. The next five spots earn a free Packt subscription, giving them access to all of Packt's books.
You can submit your post until May 25!
Find more details here: https://lnkd.in/dExZAc5i
Looking forward to seeing what you've built!
In 2024, everyone was chasing AI hype. In 2025, people are finally beginning to ask the most important question: can you build something real?
If your answer is "no", don't worry, I've got you... My friend @shawtalebi has put together one of the most practical programs to teach you how to build actual AI projects. It's called The AI Builders Bootcamp.
Over the course of 6 weeks, you'll go deep on:
→ LLMs and prompt engineering
→ RAG and embeddings
→ Fine-tuning and evaluation
→ Tool use and agent flows
→ AI project management frameworks
And you'll also ship real projects, such as:
→ A RAG chatbot over blog content
→ A local document QA assistant
→ An AI-powered job scraper and dashboard
→ A fine-tuned text classifier
→ A structured survey summarizer
All with expert guidance, peer feedback, and clean, reusable code you can take into your next product or freelance project.
What I love most about this program? It's not tool-first. It's not hype-first. It's build-first.
You'll walk away with:
- A repeatable system for shipping AI MVPs
- The confidence to turn vague ideas into working prototypes
- The clarity to ignore noise and focus on what matters
Want it? The link is in the comments.
P.S. Use code PAUL100 for $100 off; the next cohort kicks off June 6th.
One of the best talks I've had on AI, LLMs, RAG, and how to build and ship real-world products. 100% recommend Nicolay Christopher Gerold's podcast. One of the best out there.
Nicolay Christopher Gerold
"I see LangChain and similar tools as low-code solutions. Good for prototyping, but I'd throw them away for any serious project" Today on How AI Is Built, I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications at Decoding ML. His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of." Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself. In the podcast, we also cover: - Why fine-tuning is almost always the wrong choice (shoutout to Hamel Husain) - The "just-in-time" learning approach for staying sane in AI - Building writing assistants that actually preserve your voice - Why robots, not chatbots, are the real endgame Full episode below. โป๏ธ Pay it forward by sharing โป๏ธ
Super excited to see what you've built!
Packt
Our bestseller *LLM Engineer's Handbook* has helped thousands build and deploy their own large language models from scratch. Now it's your turn to show the world what you've built!
Share a short video demonstrating the #LLM you designed using the LLM Engineer's Handbook. Tell us about your process, what you built, and how the book helped you get there.
What's in it for you?
- First prize: $500
- First five runners-up: a free Packt subscription to keep learning and building
*Create a post (video/still) telling us:*
1. What you built
2. How the LLM Engineer's Handbook helped
3. Any exciting breakthroughs or challenges you overcame
*To participate:* Post about what you have built on LinkedIn, Twitter, YouTube (and as many other channels as possible). (Tip: brownie points if you're posting a video.)
1. Tag Packt in your post
2. Tag the authors: Paul Iusztin and Maxime Labonne
3. Use #BuildwithLLMEnggHB
4. Fill out the form so we know you've entered: https://packt.link/MKbh0
Last date to submit: May 25
Winners announced: May 27
*Remember:*
- Our expert panel will select the winner and the runners-up.
- The best projects will be featured by Packt.
Haven't read the book yet? Grab your copy here: https://packt.link/dZAxf
Let's build #LLMs, inspire others, and celebrate innovation together. #BuildWithLLMEnggHB
LangChain and LlamaIndex are great entry points for building LLM apps. But it's a huge red flag if you're using them in production. Why? Because most LLM frameworks are just like low-code tools:
→ Great for exploring concepts
→ Fast to build a demo
→ Terrible when you need control
The moment your system demands:
→ Custom memory flows
→ Non-trivial evaluation pipelines
→ Agent logic across multiple tools
→ Database-level optimizations
...you hit a wall. And no amount of chaining can fix it.
My advice? If your app depends on data ingestion, embedding, retrieval, and synthesis, build those pieces from scratch (see the sketch below). It's the only way to:
→ Know what's actually happening under the hood
→ Tune for latency and scale
→ Own your system end-to-end
We unpacked this in depth during the latest DataFramed podcast by DataCamp. Maxime and I talked about what it actually takes to ship real-world AI systems.
Want to check it out? The link is in the comments.
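As a rough illustration of what "owning the pieces" means, here is a skeleton with explicit, swappable stages instead of framework chains. Every name and type here is hypothetical; a real system would back each stage with your own database, embedding model, and LLM client.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Document:
    doc_id: str
    text: str

class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[Document]: ...

class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...

@dataclass
class RagPipeline:
    """Each stage is an interface you own, so you can log, test, and
    optimize it independently, with no chaining framework in the way."""
    retriever: Retriever
    llm: LLMClient

    def answer(self, query: str, top_k: int = 4) -> str:
        docs = self.retriever.retrieve(query, top_k)
        context = "\n\n".join(d.text for d in docs)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
        return self.llm.complete(prompt)
```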
You donโt become an AI engineer by tweaking someone elseโs notebook. You do it by building real systems, end-to-end. Thatโs exactly what these 5 open-source courses teach you to do. At Decoding ML, we were tired of surface-level tutorials that only scratched the surface of LLMs, RAG, and AI agents. So we built the kind of learning experience we wished we had when starting: - Project-based - Opinionated - Production-ready. No fake data or hand-waving over infra. Just real-world projects backed by engineering best practices: โ Modular Python architecture โ Full-stack MLOps + LLMOps โ RAG, agents, and evaluation systems โ Fine-tuning, serving, and containerization โ Building full-fledged end-to-end systems. And yesโฆ youโll need to sweat through the hard parts. Because these arenโt one-notebook tutorials or weekend demos. These are full-stack, real-world AI systems with multiple components, modular architecture, and production-level complexity. We teach you how to: - Connect custom pipelines across ingestion, retrieval, and inference - Orchestrate agents with memory, reasoning, and tool use - Containerize, serve, and version your models like a real AI engineer - Monitor, evaluate, and iterate using observability best practices Here's exactly what you'll build: ๐ญ. ๐ฃ๐ต๐ถ๐น๐ผ๐๐ด๐ฒ๐ป๐๐ (๐๐ถ๐๐ต The Neural Maze) Build a character simulation engine powered by RAG agents, memory, and real-time inference. โ Learn LangGraph, RAG agents, Observability, and shipping agents as real-time APIs. ๐ฎ. ๐ฆ๐ฒ๐ฐ๐ผ๐ป๐ฑ ๐๐ฟ๐ฎ๐ถ๐ป ๐๐ ๐๐๐๐ถ๐๐๐ฎ๐ป๐ Chat with your knowledge base using a custom agentic RAG system. โ Learn modular RAG pipelines, fine-tuning LLMs, full-stack deployment, and LLMOps. ๐ฏ. ๐๐บ๐ฎ๐๐ผ๐ป ๐ง๐ฎ๐ฏ๐๐น๐ฎ๐ฟ ๐ฆ๐ฒ๐บ๐ฎ๐ป๐๐ถ๐ฐ ๐ฆ๐ฒ๐ฎ๐ฟ๐ฐ๐ต Build a natural language product RAG search engine for structured data. โ Learn hybrid retrieval leveraging tabular data, embeddings, and metadata filtering. ๐ฐ. ๐๐๐ ๐ง๐๐ถ๐ป Create your own digital AI replica that reflects your knowledge and communication style. โ Learn LLM fine-tuning, RAG, vector DBs, and building end-to-end LLMOps systems. ๐ฑ. ๐&๐ ๐ฅ๐ฒ๐ฎ๐น-๐ง๐ถ๐บ๐ฒ ๐ฅ๐ฒ๐ฐ๐ผ๐บ๐บ๐ฒ๐ป๐ฑ๐ฒ๐ฟ Deploy a neural fashion recommender on Kubernetes using Hopsworks + KServe. โ Learn real-time recommender systems, LLM-augmented recsys, and MLOps workflows. Everything is FREE. All you have to do is: โ Clone the GitHub repo โ Open the Substack lesson โ Run the code + follow the guide โ Remix it and build your own production AI system If you're serious about going from "learning AI" to actually shipping it, this is where to start. The link is in the comments.
LangChain suggests you take our PhiloAgents course to get into production-ready AI agents. Such amazing work, Miguel Otero Pedrido. Love this collaboration!
LangChain
PhiloAgents: Build AI agents that impersonate philosophers with LangGraph in this OSS repo covering RAG implementation, real-time conversations, and system architecture with FastAPI & MongoDB integration. Start building philosophical agents!
https://lnkd.in/gJ9NyH8X
Hugging Face released a new open-source course on the Model Context Protocol (MCP). The course is divided into 4 units that take you from the basics of the Model Context Protocol to a final project implementing MCP in an AI application.
Check it out: https://lnkd.in/d9awb4dJ
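If you have never seen MCP, the gist is a server that exposes tools and resources to any MCP-compatible client. Here is a minimal sketch using the official Python SDK's FastMCP helper; treat the exact import path and decorator API as an assumption to verify against the SDK version you install.

```python
# pip install mcp  (official Model Context Protocol Python SDK; API may shift between versions)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers. An MCP client (e.g., an LLM app) can discover and call this."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """A parameterized resource the client can read as context."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```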
You can't build human-like agents without human-like memory. But most builders skip this part entirely. They focus on prompts, tools, and orchestration, but forget the system that holds it all together: memory.
In humans, memory is layered:
→ Working memory for what's happening right now
→ Semantic memory for facts and general knowledge
→ Procedural memory for skills and habits
→ Episodic memory for lived experience
Agents are no different. If you want believable, useful, context-aware AI, you MUST architect memory intentionally.
Here's a breakdown of short- and long-term memory types:
- Short-term memory
Stores active conversation threads and recent steps. This is your context window. Lose it, and your agent resets after every turn.
For long-term memory, we have:
- Semantic memory
Factual world knowledge retrieved through vector search or RAG. Think: "What's the capital of France?" or "What is stoicism?"
- Procedural memory
Defines what your agent knows how to do, encoded directly in your code. From simple templates to complex reasoning flows, this is your logic layer.
- Episodic memory
Stores user-specific past interactions. It's what enables continuity, personalization, and learning over time.
In our PhiloAgents course, we show how to wire all of this together:
→ Using MongoDB for structured memory
→ Using LangGraph (by LangChain) to control memory flow
→ Using Groq for real-time LLM inference
→ And even using Opik (by Comet) to evaluate how memory shapes performance
TL;DR: A smart agent isn't one that just thinks well. It's one that remembers well, too.
Learn more here: https://lnkd.in/d5ySvC_s
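To make the taxonomy concrete, here is a toy sketch of the layers in plain Python: a bounded deque as the short-term context window, a keyword-matched fact store standing in for semantic memory, and a per-user log for episodic memory. The class and method names are mine, not the course's MongoDB/LangGraph implementation.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # Short-term: a bounded context window of recent turns.
    short_term: deque = field(default_factory=lambda: deque(maxlen=10))
    # Semantic: factual knowledge (real systems use a vector DB + embeddings).
    facts: dict[str, str] = field(default_factory=dict)
    # Episodic: per-user interaction history for continuity and personalization.
    episodes: dict[str, list[str]] = field(default_factory=dict)

    def remember_turn(self, user_id: str, message: str) -> None:
        self.short_term.append(message)
        self.episodes.setdefault(user_id, []).append(message)

    def recall_facts(self, query: str) -> list[str]:
        # Naive keyword match; swap for vector search in production.
        return [fact for key, fact in self.facts.items() if key in query.lower()]

# Procedural memory lives in the code itself: the logic that decides what to do.
def answer(memory: AgentMemory, user_id: str, question: str) -> str:
    memory.remember_turn(user_id, question)
    facts = memory.recall_facts(question)
    context = " | ".join(facts) or "no stored facts"
    return f"[context: {context}] I'd answer '{question}' here."

mem = AgentMemory(facts={"stoicism": "Stoicism teaches focusing on what you control."})
print(answer(mem, "user-1", "What is stoicism?"))
```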
Unpopular opinion: fine-tuning is not hard. You know what is? Choosing HOW to fine-tune.
There was one rule we stuck by when we began training our summarization LLM in the Second Brain course: use a toolbelt that just works for 99% of use cases and ignore the 1% of edge cases that require GPU wizardry or DevOps magic.
Here's what we landed on:
1. TRL: Hugging Face's battle-tested fine-tuning library
Perfect for both SFT and preference alignment. Maintained, well-documented, and up-to-date with the latest algorithms.
2. Unsloth: lightweight fine-tuning at its best
Built by Daniel Han and Michael Han, Unsloth AI is making waves, and for good reason:
→ 2x faster training
→ Up to 80% less VRAM usage
→ GGUF quantization for local deployment
→ Works with Llama.cpp and Ollama
→ Actively fixing bugs in open models alongside Meta, Google, and Microsoft
We used it to fine-tune a Llama 3.1 8B model on a T4 GPU:
- 70% less VRAM
- Full fine-tuning on commodity hardware
- Same results for a fraction of the cost
3. Comet: track everything that matters
Your training logs shouldn't live in screenshots. Comet helped us version runs, compare experiments, and debug without chaos.
The result? Fast, reproducible, and low-cost fine-tuning that scales. If you're building your own fine-tuning pipeline, this trio will carry you far. Unless you enjoy bleeding-edge pain, there's no reason to reinvent this setup. A minimal sketch of the TRL side follows below.
Full breakdown in Lesson 5 of the PhiloAgents course (link in comments).
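For reference, supervised fine-tuning with TRL boils down to a few lines. This is a minimal sketch, assuming a recent `trl` release (the SFTTrainer/SFTConfig API has shifted across versions) and using a public demo dataset rather than the course's summarization data.

```python
# pip install trl datasets  (sketched against recent TRL; verify with your installed version)
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any conversational/text dataset works for plain SFT; this one is a public example.
dataset = load_dataset("trl-lib/Capybara", split="train[:1%]")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # small model so the sketch runs on modest hardware
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="sft-demo",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```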
Claude's leaked system prompt just confirmed what we all suspected:
Vertical > General (No AGI).
The best LLMs won't do everything.
They'll do one thing extremely well.
I read all 22,000 words of Claude's leaked system prompt...
It wasn't some vague, high-level "you are a helpful assistant" instruction set.
It was a deeply engineered blueprint custom-built for one job.
→ Code-heavy tasks in JavaScript and Python
Here's what stood out (and what it signals about where LLMs are heading):
1. It uses XML to structure its thinking
No "You are a helpful assistant" here. This is industrial-grade logic.
It segments instructions into reusable XML tags:
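The tag pattern looks roughly like this. The tag names below are illustrative of the style, not quotes from the leaked prompt.

```python
# Hypothetical illustration of segmenting a system prompt into reusable XML tags.
# These section names are examples of the pattern, not the leaked prompt's content.
SYSTEM_PROMPT = """
<tone_instructions>
  Be direct. Match the user's level of formality.
</tone_instructions>

<coding_guidelines>
  Prefer complete, runnable JavaScript or Python. Explain trade-offs briefly.
</coding_guidelines>

<refusal_policy>
  Decline harmful requests and say why, without moralizing.
</refusal_policy>
"""
```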
90% of AI engineers are dangerously abstracted from reality. They work with:
→ Prebuilt models
→ High-level APIs
→ Auto-magical cloud tools
But here's the thing: if you don't understand how these tools actually work, you'll always be guessing when something breaks. That's why the best AI engineers I know go deeper... They understand how Git actually tracks changes. How Redis handles memory. How Docker isolates environments.
If you're serious about engineering, go build the tools you use. That's why I recommend CodeCrafters.io (YC S22). You won't just learn tools. You'll rebuild them (from scratch).
→ Git, Redis, Docker, Kafka, SQLite, Shell...
→ Step by step, test by test
→ In your favorite language (Rust, Python, Go, etc.)
It's perfect for AI engineers who want to:
→ Level up their backend + system design skills
→ Reduce debugging time in production
→ Build apps that actually scale under load
And most importantly...
→ Stop being a model user
→ Start being a systems thinker
If I had to level up my engineering foundations today, CodeCrafters is where I'd start. The link is in the comments.
P.S. We only promote tools we use or would personally take.
P.P.S. Subscribe with my affiliate link to get a 40% discount :)
RAG isn't your bottleneck. Blind deployment is.
Everyone's obsessed with squeezing more performance out of their retrieval pipelines. Better chunking. Better embeddings. Better reranking. All great. But none of that matters if you can't fix what you don't see.
90% of people building agents today don't actually know what their agents are doing (especially when they go into production):
→ Is the reasoning solid?
→ Are prompt tweaks helping or hurting?
→ Is performance degrading silently over time?
By the time you notice something's off... it's already too late. That's why Lesson 5 of the PhiloAgents course is all about observability. Agents that produce ROI don't just sound smart... they are also measurable, versioned, and constantly improving.
Here's what we cover in this lesson:
→ How to monitor complex LLM traces in real time using Opik
→ How to version every prompt change for reproducibility
→ How to generate eval sets and benchmark your agents
→ How to run online and offline evaluation across your pipelines
→ How observability fits into your LLMOps stack
This is the part of agentic AI that separates demo projects from production systems.
Huge thanks to Anca Ioana Muscalagiu for the deep-dive article. And as always, shout-out to Miguel Otero Pedrido for building this with me.
Want to dive into Lesson 5? Here you go:
Article: https://lnkd.in/dRYgHyid
Video: https://lnkd.in/dEQ_Yv7n
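As a taste of the tracing side: Opik's Python SDK lets you decorate pipeline functions so each call (and its nested calls) is logged as a trace. A minimal sketch, assuming the SDK's `track` decorator works as documented; the functions themselves are placeholders, not the course's code.

```python
# pip install opik  (requires a configured Opik/Comet workspace; API per the public docs)
from opik import track

@track
def retrieve(query: str) -> list[str]:
    # Placeholder retrieval step; each decorated call becomes a logged span.
    return ["Socrates was a classical Greek philosopher."]

@track
def generate(query: str, context: list[str]) -> str:
    # Placeholder generation step; inputs/outputs are captured in the trace.
    return f"Based on {len(context)} documents: ..."

@track
def answer(query: str) -> str:
    # The top-level call produces one trace with nested retrieve/generate spans.
    return generate(query, retrieve(query))

print(answer("Who was Socrates?"))
```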
This year, I gave my first EVER in-person talk. And the one thing I feared most... actually happened. Let me explain.
Those who've been following me for a while know I made a scary promise to myself: "Stop hiding behind a keyboard. Start showing up in real life."
So when I was invited to speak at QCon Software Development Conferences, one of Europe's biggest software and AI conferences, I had no choice but to say yes. Even though I was terrified.
My talk was on The Data Backbone of LLM Systems: a 60-minute deep dive into the infrastructure behind real-world RAG, LLMs, and LLMOps. The room was packed with senior engineers from companies like Netflix, Google, Confluent, and MongoDB.
And 30 seconds before I started... my clicker broke. No slides. No backup. Just me, 120 people, and a frozen screen.
But something kicked in. I tossed the clicker aside, walked to my laptop, and started speaking, manual slide switching and all. And somehow... it worked.
The presentation wasn't perfect (I wasn't expecting it to be), but I learned a lot about what to do at my future talks. Still, I managed to:
→ Score 93% (vs. the conference average of 83%)
→ Deliver every insight I came to share
→ Walk off stage knowing I'd crushed one of my biggest fears
It was a personal turning point. I'm proud of the lessons I shared on stage... and I'm even prouder of the one I learned off-stage: courage compounds.
Excited to see at which conference I'll speak next! Thank you QCon Software Development Conferences for the platform. And thank you to everyone who showed up; you made this milestone unforgettable.
95% of agents never leave the notebook. And it's not because the code is bad... it's because the system around them doesn't exist.
Here's my point: anyone can build an agent that works in isolation. The real challenge is shipping one that survives real-world conditions (e.g., live traffic, unpredictable users, scaling demands, and messy data). That's exactly what we tackled in Lesson 1 of the PhiloAgents course.
We started by asking, "What does an agent need to survive in production?" And decided on 4 things: an LLM that runs in real time. A memory to understand what just happened. A brain that can reason and retrieve factual information. And a monitor to ensure it all works under load. So we designed a system around those needs.
The frontend is where the agent comes to life. We used Phaser to simulate a browser-based world. But more important than the tool is the fact that this layer is completely decoupled from the backend (so game logic and agent logic evolve independently).
The backend, built in FastAPI, is where the agent thinks. We stream responses token-by-token using WebSockets. All decisions, tool calls, and memory management happen server-side.
Inside that backend sits the agentic core: a dynamic state graph that lets the agent reason step-by-step. The agent is orchestrated by LangGraph and powered by Groq for real-time inference speeds. It can ask follow-up questions, query external knowledge, or summarize what's already been said (all in a loop).
When the agent needs facts, it queries long-term memory. We built a retrieval system that mixes semantic and keyword search, using cleaned, de-duplicated philosophical texts crawled from the open web. That memory lives in MongoDB and gets queried in real time. Meanwhile, short-term memory tracks the conversation thread across turns. Without it, every new message would be a reset. With it, the agent knows what's been said, what's been missed, and how to respond.
But here's the part most people skip: observability. If you want to improve your system, you need to see and measure what it's doing. Using Opik (by Comet), we track every prompt, log every decision, and evaluate multi-turn outputs using automatically generated test sets.
Put it all together and you get a complete framework that remembers, retrieves, reasons, and responds in a real-world environment.
Oh... and we made the whole thing open source.
Link: https://lnkd.in/d8-QbhCd
P.S. Special shout-out to my co-creator Miguel Otero Pedrido
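To show the shape of that agentic core, here is a minimal LangGraph state graph: one node retrieves, another answers. It is a sketch against langgraph's public `StateGraph` API, with stub logic in place of the course's real retrieval and Groq-backed inference.

```python
# pip install langgraph  (sketched against the public StateGraph API)
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: AgentState) -> dict:
    # Stub for the semantic + keyword search over MongoDB described above.
    return {"context": f"facts related to: {state['question']}"}

def respond(state: AgentState) -> dict:
    # Stub for the Groq-served LLM call that would generate the real reply.
    return {"answer": f"Given [{state['context']}], here is my reply."}

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve)
graph.add_node("respond", respond)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "respond")
graph.add_edge("respond", END)

agent = graph.compile()
print(agent.invoke({"question": "What is virtue?", "context": "", "answer": ""}))
```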
90% of RAG systems struggle with the same bottleneck (and better LLMs are not the solution): retrieval. And most teams don't realize it because they rush to build without proper evaluation.
Before I tell you how to fix this, let me make something clear: naive RAG is easy. You chunk some docs, embed them, drop a top_k retriever on top, and call it a pipeline. Getting it production-ready? That's where most teams stall.
→ They get hallucinations.
→ They miss key info.
→ Their outputs feel... off.
Why? Because the quality of generation is downstream of the quality of context, and naive RAG often pulls in irrelevant or partial chunks that confuse the LLM.
If you're serious about improving your system, here's the progression that actually works:
Step 1: Fix the basics
These "table-stakes" upgrades outperform fancy models most of the time:
→ Smarter chunking: dynamic over fixed-size. Respect structure.
→ Chunk size tuning: too long = lost in the middle. Too short = fragmented context.
→ Metadata filtering: boosts precision by narrowing scope semantically and structurally.
→ Hybrid search: combine vector + keyword filtering.
Step 2: Layer on advanced retrieval
When basic techniques aren't enough:
→ Re-ranking (learned or rule-based)
→ Small-to-big retrieval: retrieve sentences, synthesize larger windows.
→ Recursive retrieval (e.g., LlamaIndex)
→ Multi-hop + agentic retrieval: when you need reasoning across documents.
Step 3: Evaluate or die trying
There's no point iterating blindly. Do the following:
→ End-to-end eval: is the output good? Ground truths, synthetic evals, user feedback.
→ Component-level eval: does the retriever return the right chunks? Use ranking metrics like MRR, NDCG, and success@k (toy implementations below).
Step 4: Fine-tuning = last resort
Don't start here. Do this only when:
→ Your domain is so specific that general embeddings fail.
→ Your LLM is too weak to synthesize even when the context is correct.
→ You've squeezed all the juice from prompt + retrieval optimizations.
Fine-tuning adds cost, latency, and infra complexity. It's powerful, but only when everything else is dialed in.
Note: these notes are from a talk over a year old. And yet... most teams are still stuck in Step 0. That tells you something: the surface area of RAG is small, but building good RAG is still an unsolved craft. Let's change that.
Want to learn to implement advanced RAG systems yourself? The link is in the comments.
Image credit: LlamaIndex and Jerry Liu
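Component-level evaluation is the easiest place to start, since the metrics are a few lines each. A minimal sketch of MRR and success@k over toy data; in practice the ranked lists come from your retriever and the relevant IDs from a labeled or synthetic eval set.

```python
def mrr(ranked: list[list[str]], relevant: list[str]) -> float:
    """Mean Reciprocal Rank: average of 1/position of the first relevant hit per query."""
    total = 0.0
    for docs, rel in zip(ranked, relevant):
        for pos, doc_id in enumerate(docs, start=1):
            if doc_id == rel:
                total += 1.0 / pos
                break
    return total / len(ranked)

def success_at_k(ranked: list[list[str]], relevant: list[str], k: int) -> float:
    """Fraction of queries whose relevant document appears in the top k results."""
    hits = sum(rel in docs[:k] for docs, rel in zip(ranked, relevant))
    return hits / len(ranked)

# Toy example: two queries, each with one known-relevant chunk ID.
retrieved = [["c3", "c1", "c9"], ["c2", "c7", "c4"]]
ground_truth = ["c1", "c4"]
print(mrr(retrieved, ground_truth))                # (1/2 + 1/3) / 2 ≈ 0.417
print(success_at_k(retrieved, ground_truth, k=2))  # 0.5
```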
Everyone likes to talk about models, prompts, and performance hacks. But no one teaches you how to ship. In Lesson 6 of the PhiloAgents course, we fix that. We go from messy PoC to clean architecture.
Here's what you'll learn:
→ How to organize your Python project like a professional engineer
→ Why the "app" folder mindset saves you months of debugging later
→ How to use Docker, .env configs, and modular code
→ Why reproducibility and portability matter as much as inference speed
→ The real difference between hacking an agent and engineering one
The goal is to teach you how to build a real system that's durable.
Huge thanks to Miguel Otero Pedrido for co-creating this lesson with me. (His engineering brain pushed this to the next level.)
If you're stuck in notebook purgatory and want to break out, this lesson's for you.
Lesson 6 is now live! (Link in the comments)
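One habit from this kind of setup worth stealing: load every secret and environment-specific value from a `.env` file through a typed settings object, so the same container image runs anywhere. A minimal sketch with `pydantic-settings`; the field names are illustrative, not the course's actual config.

```python
# pip install pydantic-settings
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Typed app config, populated from environment variables or a .env file."""
    model_config = SettingsConfigDict(env_file=".env")

    groq_api_key: str = ""  # e.g., GROQ_API_KEY in .env (illustrative name)
    mongodb_uri: str = "mongodb://localhost:27017"
    agent_name: str = "philoagent"

settings = Settings()
print(settings.agent_name)  # swap values per environment without touching code
```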