
DN #12: Building an autonomous engineering team (w/ Seth Gammon)
With Seth Gammon · hosted by Dr. Niklas
"Google says there is a 70% error rate the moment you use more than one AI agent. My measured error rate is down to 3%."
In this episode of DN, Niklas sits down with design engineer and AI prototyper Seth Gammon, the creator of Citadel—an open-source agent orchestration harness built on top of Claude Code.
Seth explains how he accidentally created one of the most powerful AI coding frameworks in the world out of pure frustration. While trying to build a massive 668,000-line RPG codebase, his AI models constantly suffered from "amnesia," burning huge amounts of token spend while completely forgetting their previous tasks. To fix this, Seth built a system that spins up massive parallel fleets of autonomous agents, isolates them in their own Git work-trees to prevent merge conflicts, and forces them to leave persistent "shift notes" on the hard drive.
In this episode:
• The 70% Error Rate: Why spinning up multiple AI agents causes them to overwrite each other's code, and the exact infrastructure needed to fix it.
• Curing AI Amnesia: Why raw-prompting burns your token budget, and how Citadel completely solves context limits by externalizing memory to disk files.
• The 5 Levels of Claude Code: From level 1 (raw prompting) to level 5 (full orchestration), Seth explains exactly how you should be scaling your AI workflows.
• The 4-Tier Router: How to instantly cut token spend by automatically routing easy tasks to regex pattern-matchers instead of LLMs.
• Agent Swarms for Research: How Seth used a fleet of agents to research the notoriously difficult ARC-AGI machine learning challenge, dropping the solution from 24 moves to 14.
• The "XCOM" Business Model: Why the future of enterprise lies in persistent, specialized autonomous agents acting as your marketers, operators, and developers.
🎧 Full episode on all podcast platforms
💬 Are you using an agentic framework yet, or just raw prompting? Let us know in the comments!
🔔 Please like and subscribe! Every subscriber helps our channel grow.
#DN #ClaudeCode #AIAgents #SoftwareEngineering #Citadel #Anthropic #MachineLearning #OpenSource #SoloDeveloper #Coding
Timestamps:
0:00 Intro: Niklas meets AI prototyper Seth Gammon
0:56 The 668,000-line D&D codebase that forced Seth to create Citadel
2:36 Why Claude continuously gets "amnesia" and burns massive token spend
5:41 Google's whitepaper: The 70% error rate when running multiple AI agents
8:51 Solving AI amnesia by forcing agents to write "shift notes"
11:13 How to quickly install and set up Citadel for your own local projects
12:22 The 4-Tier AI Routing System to eliminate wasted prompt tokens
16:48 Can you actually build a full project just chatting on Telegram?
21:53 The secret to Parallel Agents: Creating isolated Git work-trees
26:01 Is Citadel replacing Claude? (Why it's strictly an operational harness)
32:47 The 5 Levels of Using Claude Code (Raw prompting, MD routers, Skills, Hooks, Orchestration)
38:08 Spinning up AI Swarms to research and beat the ARC-AGI math challenge
42:16 Abstracting the framework to autonomously run marketing and business operations
47:33 Going viral on Reddit, launching Citadel Pro, and talking to VCs
Transcript (68 turns)
Niklas:Hi and a huge welcome to you, my lovely listeners. So glad you're here. Today you are joining me for a chat with Seth. Seth is a design engineer and an AI prototyper based in Pepere, Massachusetts, and the creator of Citadel, an open source agent orchestration harness for Claude Code that turns stateless AI chats into persistent, parallel, and production-grade autonomous engineering workflows. Seth, so lovely to have you. So did you ever try to build anything to also automate this part?
Seth Gammon:Yes. So everything has levels to it, right? And every level that I built was to solve that bottleneck, and every bottleneck that I experienced was usually me, right? I have to now do something manually. I have to transfer context. I have to re-explain. And now at the top level, I'm the one that's deploying Citadel, right? And so it's like, I have to now say, you know, slash do, which is my orchestration command. You can put in any prompt; slash do calls a skill. Niklas, I really appreciate the invite.
Niklas:I think we have all been playing around with Claude Code a lot, right? And my big point lately is also how can I actually run more of it? And also how can I interact with it in different ways, not just through the terminal? So I was really curious, why did you start Citadel in the first place?
Seth Gammon:orchestrates it to whatever is needed within Citadel. So you don't have to worry about all the technical details. But I still have to do that manually. So I'm still deploying. So the next step above that is creating what I called a divine alignment document. This is a living document. And it's based off the concept of divine alignment, where it's like mind, body, and spirit. And the idea is less about religion and more about being fully Great. Yeah. So Citadel was a very interesting project for me because I didn't start out by trying to make it, right? It wasn't like I was sitting here saying, hey, you know, it would be great. Let me build out a Claude Code harness that solves for all these issues. Let me open source it and, you know, let people use it on their own projects. That wasn't the goal. That wasn't the plan. What I was doing is I was building this massive world building project that I've been working on for, conceptually, about 10 years. All right. aligned in your mission, your project, in your concept, whatever you're trying to ultimately achieve. And so I have this divine alignment document. And now what I can do is I can actually automate deployments of Citadel against that, where it will now pull up and create campaigns. It'll spin up fleets of agents. And each time it's trying to bend the entire project towards that divine alignment. So that's kind of what I've been working on lately, trying to automate that next bottleneck that I'm experiencing. I play a lot of Dungeons and Dragons, TTRPGs and stuff like that. And so that was my passion. That was my thing. And my code base grew to over 668,000 lines of code, right? Which I'm not trying to say means immense quality, but I'm saying that the size of it is huge, right? And agents fail at kind of doing the persistence thing over a large context, right? Keeping up with where are we at? What are we doing? What were we last working on? And so
Niklas:Yeah, that's really interesting. Then do your agents still experience amnesia or something? How does it actually work? How do you cope with this?
Seth Gammon:Yes. Good. Yes. Great question. That's one of the biggest issues: it's literally amnesia, right? The agent goes in the moment that you call it. It's like, what were we working on? And then if it pulls up the right memories, great. So the way that I solved it was that I externalized everything. All right. It's no longer about keeping all the information in the chat window or in the terminal or in memory, even. It's about writing it to disk, right? Having files that Citadel was completely born out of my own frustrations and my own frictions. And, you know, I have pretty decent systems thinking skills, right? If you look at Citadel, that's really what happened. It was just, I kept hitting these friction points. It kept failing. And every time that it failed, I didn't sit there and say, Claude sucks. It can't do this. You know, I need to stop doing this. I thought, there's clearly something wrong here and I think I can build a solution for it. are within your agent's workflow, within their planning folders, that they can write all of the information that they're doing to. That's decisions they made, that's files they created, right? So it's really about finding that through line and documenting it. Because then what I do is every single agent that spins up reads that. It's kind of like if you're a shift worker, right? And you just left your shift, you left some notes down on the paper of what happened for the next person that comes in, right? Like that's the simplified version. So that's how Citadel really started, and eventually it got finalized and I threw it out there and got a really good response back.
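The shift-notes idea described in this turn can be sketched in a few lines of Python. This is a minimal illustration, not Citadel's actual code: the function names, the planning-folder layout, and the JSON fields are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_shift_note(notes_dir: Path, agent: str,
                     decisions: List[str], files: List[str]) -> Path:
    """Write one handoff note to disk so it outlives the chat session."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded sequence number keeps notes in the order they were written.
    seq = len(list(notes_dir.glob("*.json"))) + 1
    path = notes_dir / f"{seq:04d}-{agent}.json"
    note = {"agent": agent, "decisions": decisions, "files": files}
    path.write_text(json.dumps(note, indent=2))
    return path


def read_shift_notes(notes_dir: Path) -> List[Dict]:
    """What a freshly started agent would load before doing any work."""
    return [json.loads(p.read_text()) for p in sorted(notes_dir.glob("*.json"))]
```

The point of the design is that the notes live in files rather than in the context window, so a new session can pick up the through line without anyone re-explaining it.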
Niklas:Yeah, it's interesting. So what was your main pain point when you started? So what was the point that annoyed you most?
Seth Gammon:So every single pain point is basically the same, right? It's where do I hit friction? That is a bottleneck, right? Where everything slows down to a halt because of this one thing. And so every time that you see that or that I saw that was any time that I had to slow down and do something manually. Right. And so the first thing is, you know, I think of it in levels and the basis level is raw prompting, right? You send a message into the chat bot, it responds. You send a message into the IDE, it responds. That's it.
Niklas:Yeah, and this is really, really interesting. And when I think about it, when should I use it? Like, when do I move on from my Claude Code installation to Citadel?
Seth Gammon:So this is, I wanna say this first. I wanna caveat first that I'm about to tell you the answer, but I don't want anyone to think that I'm trying to say that Citadel is the only answer, that you have to do this. But what I will say is, when you stop is the moment that you feel any friction in raw prompting. The moment that putting something into the chat window, right, is not working, which is pretty quickly, right? It's anything past a basic project. Every single time that you do that, it doesn't remember what you did in the last chat. It doesn't remember what you did before that. It doesn't really have the whole code base unless it's going to map it all out. So what I was seeing initially with raw prompting was massive token spend, because every time you prompted, it crawled the entire code base. Right. And persistence issues, where I just told you three times, don't do this, do that. And in the new chat, of course, even though it's in memory, it doesn't pull it properly. Something about it semantically doesn't pull up. That's when you want to use something like this. And the reason why is not because you're going to spin up fleets of agents when you're making a to-do list app or a cooking website or a blog or whatever you're working on. It could be big, it could be small. But the reason is because having orchestration immediately cuts down token spend. It makes you able to scale so you don't run into the issues and then have to solve them later. So using it right from the beginning is the way to go. So the first friction was really, why is it failing on raw prompting? That was where I started.
Niklas:Yeah, and I've had this experience, and it's very interesting, because these days Claude Code tells me to start a new session. And in the past, I never wanted to do that because I felt it lacked the context that I had in the old one. Normally, I would want the chat to stay active all the time because I felt the results were a lot better. What is your experience there? And do you have an idea why they changed it at Claude Code?
Seth Gammon:But I don't want that to be misconstrued as you need fleets of agents to work on whatever you're working on. And what Citadel ultimately does is it will orchestrate it for you. It'll route it for you so that you are only calling what you need for whatever the task is. Yes. Yes.
Niklas:And how would I set up Citadel? Would it run on my normal Mac? Would I set it up on a server in the cloud? What do I do?
Seth Gammon:Yes. Those are great observations. And honestly, I think that you were doing the right thing because you were keeping that context. You didn't want to lose it. Right. And you could clearly see the qualitative difference, right? You're doing it over here. It's great. You put it into a new session. You have to now re-explain yourself from top to bottom, inside and out, just to get it to do the same thing that it was just doing. Right. So I completely agree. The reason why I see that they changed it is because it is heavy on the token usage. I don't know if they changed how they count cached tokens. I don't know. Yeah, it's really easy to set up. I have a very concise README on the GitHub, but basically you just have to clone it and then you have to point the plugin marketplace from Claude to it. It's one command. And then after that, so two commands, sorry, two commands. You clone it, you point the plugin directory to it, and then all you do is slash do setup. I built it so that it will unpack for you. It'll orient to your project. It'll create the files that are needed. And most importantly, it'll ask you questions and walk you through things. if the processes that are really hidden underneath are calling more instances, more agents, more whatever. So it's just like token spend is going up. But whatever it was, it was clear that the reason why they changed it was that the token count was exploding. And so they tell you to start a new one because of two things. One, all of those cached tokens still go through. And two, when you are sending a message in new, or sorry, when you're sending a message in the same session, over time it loses that context. Eventually you're gonna turn over where, Because that's the biggest thing that I have found is the issue: transparency and guidance. Not everyone needs their hand ultimately held, but a lot of people do, especially at different levels.
What I need my hand held at level three, someone else needs their hand held at level one. So it's all about adapting that. But the setup is pretty easy. Two commands to clone it, point it, and then a slash do setup, and you're good to go. Let's say it's a hundred thousand tokens. Let's say it's a million, right? That's the most recent update they did in context when they had a million tokens. Eventually you're going over that. And that's if even with a million tokens, can it find the needle in the haystack, right? The thing that it needs to find to keep doing what it's doing. So that's kind of what I saw in terms of why they changed it. It's both token spend and a context issue of memory.
Niklas:You have a four-tier routing system, I think, right? Can you walk me through it?
Seth Gammon:Yes. Yes, I can. The four tiers that I have, just to make sure that I've got it. Okay. So of the four tiers, the first level is instant, right? So when you ask Claude Code for, you know, a status report, right? That's a direct edit. That's where the model itself will just do the thing.
Niklas:And then it's parallelization that you looked at as well. I think you have a lot of agents running on the large code base. So what's the challenge there?
Seth Gammon:Yes. And that is achieved through pattern matching. So that's through certain words, right? Semantics. The second one is skill routing. So a skill is routed through keyword matches, where you're going to need to call upon a skill in Claude Code. And a skill is really just a set of instructions or information that the model uses every time it goes to do something. A very popular skill that people like is front-end design, right? That's a free skill you can go and get on the plugin marketplace. So if you're like, hey, I need a new front-end. Massive challenge. So Google put out a white paper in December, late December. I can't remember the name, but it is easy to find. You can look this up. But they put out a white paper that was talking about the statistics of error rates when you use more than one agent. And the moment that you use more than one agent, it goes upwards. I don't want to misquote. I think it's like in the 70 percent range of how often you're going to create some amount of error as two agents are working on the same thing. What Citadel will do is it'll orchestrate it through the second tier, right? Which is matching to a skill. I want to say that both of those tiers are completely free, right? They don't add to token spend. They don't do anything extra except for just helping you orchestrate. The third and fourth, it's kind of a two-parter, those go through an LLM classifier. So that's where whatever you put in is a little bit more advanced than pattern matching and keywords or regex. And that's when you'll start to have a And so as I was talking about bottlenecks and friction earlier, eventually as I'm solving for raw prompting and then I'm solving for, you know, orchestration of how the model goes, and then I'm solving skills and hooks, which we can go back through to go through the levels, but just to answer your question, once we get to orchestration in parallel, the bottleneck was now, I have to open up multiple instances myself.
I have to babysit each one of them. I have to make sure that the context is right, that they're not overlapping, that they're not working on the same thing. And so parallelization was something that you have to build infrastructure around to mitigate that error rate. So Google says, you know, 70% error rate the moment you have more than one agent. My measured error rate, and these aren't errors, these are merge conflicts when two agents go to merge, was down to 3%, right? And from there, it's a trivial fix of having an agent fix it. But that was the big unlock for parallelization: the bottleneck was me. I still have to do everything. large language model spend, say, you know, 500 tokens or less to basically figure out what is the best plan of attack to do this. And so every time you type slash do, it goes through this routing system to classify which one is best to do.
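The routing tiers described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Citadel's implementation: the patterns, keyword table, skill name, and the `classify` callback are hypothetical, and the third and fourth tiers are collapsed into a single classifier call because the conversation does not say how they differ.

```python
import re
from typing import Callable, Tuple

# Hypothetical patterns and skill names, for illustration only.
INSTANT_PATTERNS = [re.compile(r"\bstatus report\b", re.IGNORECASE)]
SKILL_KEYWORDS = {"front-end": "frontend-design", "frontend": "frontend-design"}


def route(prompt: str, classify: Callable[[str], str]) -> Tuple[str, str]:
    """Return (tier, target). Only the last branch costs model tokens."""
    # Tier one: regex pattern match, handled directly.
    if any(p.search(prompt) for p in INSTANT_PATTERNS):
        return ("instant", "direct")
    # Tier two: keyword match to a named skill.
    lowered = prompt.lower()
    for keyword, skill in SKILL_KEYWORDS.items():
        if keyword in lowered:
            return ("skill", skill)
    # Tiers three and four: defer to a small LLM classifier (supplied by the caller).
    return ("classifier", classify(prompt))
```

Because the cheap checks run first, the classifier is only invoked for prompts that neither a pattern nor a keyword can place, which is where the token savings would come from.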
Niklas:This is really interesting. And if I start out with a small project, should I still use Citadel or does it become more relevant when my code base grows?
Seth Gammon:That's a great question. And what I would ask in this scenario, because obviously, Niklas, you have a lot of experience and a lot of education: are you a beginner in this sense, if you're starting off a new project?
Niklas:In the sense that I have not set up Citadel. I'd say I'm very experienced with setting up new projects in Claude Code that work extremely well with these LLM tools. Because for the last two years, I would say everything I built had to be designed to fit into the context window of an LLM.
Seth Gammon:Yes.
Niklas:If I think about architecture these days, I would build something that is optimized for work with these tools.
Seth Gammon:Yes. So for you, right, if you're starting a new project, you would gain more from Citadel than, say, a newbie who is, you know, literally just learning to vibe code, right? Or maybe they know a bit, but they haven't really gotten into Claude Code specifically, right? So if it was you, you have so much domain expertise, you have so much past experience that something like Citadel would help you a lot, because what it does every time is it explains itself. It gives you a little bit of a handoff at the end. It starts externalizing, it starts orchestrating. It just solves a lot of problems that you never have to think about as an engineer. But as a brand new person, right, who's just getting used to it, especially if you just feel a little overwhelmed by all this, I wouldn't say jump into this. I wouldn't say jump into Citadel, figure out GitHub, figure out cloning and all that stuff, right? Because while it's easy, there is a learning curve. I had to learn Git within the last two years, right? Like I wasn't using it before. I was doing certain projects, but I wasn't using it. So if you're brand new, it might be a little much. But it certainly helps, especially when you start feeling that friction, when you outgrow that beginner level.
Niklas:Yeah. So what I'm also interested in is how to interact with agents. Now I have set up something like Citadel. I've pointed it in the direction. I could also be using a web coding tool. Lately I've thought a lot about how we actually want to use these tools. I mean, Claude Code is nice, but lately I've seen a lot of people just using Telegram, for example, just using your normal chat app on your phone.
Seth Gammon:Yes. Yes.
Niklas:What are your thoughts on that? Have you tried that as well?
Seth Gammon:Yeah, so at first I kind of thought it was a little bit of a gimmick, and I no longer feel that way. But I did at first. I felt like they were trying to increase their exposure to people using it. Right. So it's like, if we make it easier to use, people will use it more. Right. That's the idea. But then I started realizing how I can kind of check in remotely, right from my phone. I can activate a remote session and I can check in on things that are running on my computer. And I kind of liked that. I started going outside. I started being in other places rather than just sitting at my computer. So I do think that those have their places, but I do think that they are literally just a communication layer. I don't think that it really helps people understand too much better. It's not as transparent. You're sending off a message. It does it somewhere else. But if you're already past that stage, I mean, they're fantastic for just being able to communicate with. And then the more that you automate and the more that you build infrastructure for, the less I have to think about. And so now I can send messages that, if I didn't have any of this, would be terrible. Like, hey, do this. And it's like, I don't know how to do that. So there is a huge advantage to being able to do that remotely or being able to do that on a phone.
Niklas:And I'm not sure if you've tried any of these. I had never really used Claude Code Web before. Now I tried that one as well. And I think for a single instance, if you have like a CI/CD pipeline you're deploying from GitHub, it's actually really efficient. I was also wondering, how does this compare to Citadel? If I want to set it up, I think I'm one of these people who have this use case where I sometimes just want to send a message from my phone or from my iPad and not only from my computer. How would I do that, actually? Have you looked into that?
Seth Gammon:Yes, so this is a two-parter. The first part is that it's really easy. The second part is that it doesn't work. So the first part is that there's a plugin marketplace that Anthropic made. And Citadel is a part of that plugin marketplace. You should be able to type in slash plugins and look for it. Unfortunately, Claude Code right now has an open bug they just haven't squashed yet that basically only shows the official plugins. So it's right there. It's built in and it's on there, but you can't access it through the web version right now until that bug is squashed. So I have looked into it, and I've also looked into many other ways to try to deliver this kind of thing in a way that's more helpful than relying on devs in the terminal, right? Bringing it into their own repos. But yeah, so right now, unfortunately, there is an easy answer. It's not quite finished yet.
Niklas:And then who would start with Citadel? Who is it for? Am I the ideal person? Is it a small team? Who would profit most if they chose it?
Seth Gammon:Great question. Yeah. So when I first found out that Citadel was going to be something that actually mattered was from a Reddit post that I did. And over about nine days, that Reddit post got 500,000 views and it got a thousand upvotes and it got over 200 comments. And I'm not saying that as wow. I'm saying that as this is a lot of feedback, right? A lot of feedback back to it. And what everyone was saying is that it was, it was a part of my five level framework of like, this is where you are. This is how you level up. And then I gave Citadel at the bottom. Everyone was saying what level they were at. I'm at level one, I'm at level three. I agree with all of this and I didn't have words to put it to, right? Like things like that. And so the big thing is, that I don't think that it's gated behind a person of a certain experience level, but I do think it's gated behind a certain appetite, right? If you want to learn, if you want to do more than you know right now, if you want to optimize, if you want to find out where those edges are that you're just not even aware of yet. That's where something like Citadel really comes in because what it does is it does all the heavy lifting and the more advanced thinking for you. It doesn't stop you from doing it. I can tell it to do anything I want it to do as complex as I want it to do. I can spin up a fleet of agents to go research a topic and then build skills around it to then help me in my project. It's incredibly helpful in very advanced ways. But as a newbie, you don't know about all that. You don't know to do research necessarily. You don't know to... you know, have this type of framework and this type of organization. So that's kind of where I feel it, right? If it's overwhelming, this is something that you want to take slow, right? If everything I just said overwhelms you, you might want to take it slow. 
But if you're hungry for a better process, if you're hungry for a better workflow, this will either do one of two things. Cause one right now it's completely open source, completely free. And it will stay that way. Like this, this will stay that way. You'll be able to pull this into your repo and use it. So either you can use it or you can learn from it. And I promise there's a lot of value there if you're hungry for that kind of thing.
Niklas:And where does it differentiate? If I set up, for example, Claude Code, I can also tell it to spin off several agents. So I could also tell it, for whatever task can be parallelized, send off eight agents. Or it will figure it out, actually. It will tell me how many agents can properly work on it and move on. What's the difference to Citadel?
Seth Gammon:Yes. Yes. Yes, great question. So if you go to Claude right now and you say, spin up a bunch of agents to work on this thing, it will do it. I've done it with sub agents and I've done it with teams of agents, which are two separate things, right? So I've done both and it works decently okay. The problem that you're going to run into is just error rates. And now I want to also say for anyone listening that I'm pretty sure that eventually Anthropic is going to fix that, right? They're going to build in more orchestration layers because it's a huge friction point for them. I just got there first, right? First in terms of they haven't done it yet. I'm not saying I'm the first one to ever build a harness. I'm just saying I got to this space first. I'm sure and I'm certain that they're going to be building out orchestration to help that. But the reason why it's not working now is because it creates errors. The agents will stomp on each other, right? So agent one goes to grab a file, agent two writes it, and now they're both confused about what's going on. And now they're looping like that. So that's a very easy loop that they can fall into. And the more agents that you call, that rate goes exponentially up, right? So what's the difference in Citadel? What's the differentiator? My agents all work on individual Git trees, right? So they have their own work trees, so that they cannot touch each other. That's one. The other thing is that they will yoink a file they're working on into their own area. So two will not work on the same thing. So agent A, agent B, they both go to the same file. Agent A moves it. Agent B goes, huh, there was a file here. Well, I guess I'll do the next thing. And then it goes and does the next thing. So basically the easy way to think about it is that you want to isolate resources, whatever that resource is. For me, it might be a code file.
For someone else, it might be something at inference, or it might be data, or it might be input or output or whatever it is. But you just have to isolate that from the other agents. And right now that's not a default in Claude Code. They do have work trees. And I know internally, I've tweeted at them. We've talked a bit. Internally, they are using Git work trees and I do see that being the direction they go in, but just right now it's not. So that's the big differentiator there.
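The "yoink" behaviour from this turn, where one agent moves a file into its own area and the next agent finds it gone and moves on, can be sketched with an atomic rename. This is a simplified illustration under assumptions, not how Citadel does it: the `claim` function and directory layout are hypothetical, and real Git work tree isolation is not shown here.

```python
import os
from pathlib import Path
from typing import Optional


def claim(shared: Path, name: str, agent_area: Path) -> Optional[Path]:
    """Try to move a file into this agent's private area.

    Returns the new path, or None if another agent already took the file.
    """
    agent_area.mkdir(parents=True, exist_ok=True)
    try:
        # A rename within one filesystem is atomic, so only one agent can win.
        os.rename(shared / name, agent_area / name)
    except FileNotFoundError:
        return None  # Someone else claimed it; the caller moves on to the next task.
    return agent_area / name
```

The same shape applies to any resource that can be claimed exclusively: the losing agent gets a clear signal to skip ahead instead of writing over someone else's work.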
Niklas:It's very interesting, right? Because if you have a lot of agents working, I would think that they at some point need to synchronize their knowledge. I mean, without synchronizing or without common knowledge, common points where they are, you will only get so far and you can only parallelize so much. And I think that's okay if you want to make this trade-off, but have you thought about how to solve it?
Seth Gammon:Sure. Yes, I have. So right now, the way that Citadel works when you spin up those fleets of agents is that the original agent that does the task has that context. So it's kind of like you spin up a project manager. And so you never lose that contextual through line. The project manager doesn't do every single task in code, doesn't bloat its context. Instead, it lets all the other agents do the thing. They all come together, they merge, and then the agent that started it off will address merge conflicts and will make sure that it actually goes through a check where it says, hey, is this solution actually what was asked for in the beginning? Right. Because you can lose that through compaction. Your whole chat window compacts, then you lose details. So a lot of it is like, it externalizes the information. It isolates the resource. And the one that kicked it off, even though, say, 10 other agents are working on it, the one that kicked it off is the one to close it. So there's a whole cleanup process, there's a whole merge conflict resolution process, and there's a check to make sure, is this even what I actually wanted? Is this what the user asked for? And my results have been really good on that. I haven't had the same issue that you're talking about, but I did before.
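The project-manager pattern in this turn, where the agent that opens the work is also the one that closes it, can be sketched as below. This is a toy Python illustration under assumptions: `run_campaign`, `worker`, and `satisfies` are hypothetical names, threads stand in for agents, and joining strings stands in for a real merge.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_campaign(request: str, subtasks: List[str],
                 worker: Callable[[str], str],
                 satisfies: Callable[[str, str], bool]) -> Tuple[str, bool]:
    """Fan subtasks out to workers, merge, then check against the original request."""
    # The coordinator keeps only the request; workers carry the task detail.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(worker, subtasks))  # map preserves subtask order
    merged = "\n".join(results)
    # Final gate: is the merged result what was asked for in the beginning?
    return merged, satisfies(request, merged)
```

Keeping the original request in the coordinator, rather than in each worker's context, is what lets the final check survive even if the workers' own chat windows were compacted along the way.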
Niklas:Yeah. And I didn't really ask this, but I haven't fully understood it yet. Do they still use Claude Code itself in the background, or have you abstracted that in some way?
Seth Gammon:Yes. Good question. So, I love Claude, fantastic model. Claude Code, fantastic. I learned so much using it. And I've been a ChatGPT user since the day it dropped. I knew about it day one. I used their previous model on a Hugging Face clone to test it out before it was, you know, ChatGPT. I loved AI. I loved all that stuff. But for Claude Code, I built it specifically for Claude Code because Claude had the harness. They had the capability. You could build out skills and hooks and infrastructure. They already had agents you could play with, right? So I started there. It uses Claude the model, but it uses Citadel the harness. And the cool thing about Citadel that I've abstracted recently is I've added a second runtime for Codex. And I'm also abstracting even above that. I've been playing around with local models on my computer. So eventually what Citadel is going to be is that observability and orchestration, right? The operations of what you're doing rather than the underlying model that powers it. So it's not competing with Claude the model. It's competing more with the infrastructure that makes the model so good.
Niklas:Yeah, that's really interesting. And now this draws me maybe also back to yourself. How did you end up in this space? I think you originally come from design engineering and computer engineering. So what is the journey that led you to agent orchestration?
Seth Gammon:Yeah, so I mean, you know, it's really cool. I've always been interested in technology. It's just a thing that keeps me up. It lights me up. It spins me up. It's all that stuff. I was fortunate enough to have a tech center connected to my high school. I took some engineering classes, did some bridge building competitions, didn't go into that kind of engineering. And then when I got to college, you know, I wanted to get into computer science. But they didn't feel like I had the specific type of math that would make me successful. Even though I had a huge passion for it, even though I learned Python before that, for whatever reason, they just kind of were like, hey, the math classes aren't quite there. And so I went into computer technical engineering instead, and I did that for a bit. Wasn't my passion, wasn't my thing. So moving forward, I kept doing all sorts of creative tasks. I can't help but work on something. And so I have a bunch of different projects, a bunch of different creative endeavors. I'm always pushing forward. And then AI dropped, right? 2022, November, right? ChatGPT comes out and I was like, oh my goodness, this is something that I've been waiting for. Right? Like I knew about the idea of AI. I knew about the concept of LLMs. I didn't know it was coming. I didn't know it was going to, like, arrive. And I'm not saying I predicted that, but I'm saying, like, I understood it right from the get. And so what I immediately started doing was using it for everything I could think of. And then I would go online and I would start feeding it, like, things people were saying. I would start talking about it. And then when I ran out of that, I went to the OpenAI Discord. And I started looking at all the issues people were complaining about, where they're saying, like, it doesn't do this, it doesn't do this, it doesn't do this.
And I started taking those into my own sessions, solving them, writing it up and sending it directly to them. Right. And so like, this was just because I just wanted to learn and I needed an excuse to do so. So take that. Now take it several years out of working with just any AI model I could for years, almost every day chatting with it, doing all that stuff. Eventually I learned enough and I wound up here. But again, I wasn't trying to wind up here. I was trying to solve my own problem, which was I was building this world building app. I was trying to encompass all domains from stories to worlds to games, all in this one area, very ambitious, very huge. And I needed that leverage point. I could not do what I did without it. So agents came out of a necessity. And I think that's why it makes what I'm synthesizing so valuable, because it's real. I'm not trying to posture and say, this is amazing, give me a million dollars, whatever. But I am saying, this is everything I've learned on real scalable projects in here. Please use it.
Niklas:Yeah, and it's amazing. So I also remember the time back when ChatGPT dropped and I tried it. And I also remember writing the first piece of software, in the sense that you just typed in the chat and then you copied it out. There was no Cursor. GitHub Copilot, I think, also didn't exist. It came a little later. They were first to the party, I think. And then...
Seth Gammon:Yeah. Yes.
Niklas:then the quality improved over time so much. Like with, I think...
Seth Gammon:Yes, immensely.
Niklas:Like GPT-4, I think, was the first one where you could really feel the jump. It suddenly was good. And since then, it has grown a little more subtle. But if you use, like, Claude Code, for example, the quality in output has still grown, I would say, significantly over the last year. If I look at the quality of what these tools can now do versus what they could do a year ago, I would say the error rate has probably dropped by at least 50%, if not 80 or something like that. If you were at, like, 80 to 90% before, then you're now at 90, 95, 98, something like that. So the quality is really, really good now. I think you miss that a bit, because for a lot of applications I'm not sure you see the quality difference as much, but in coding you see it.
Seth Gammon:Yes. Yeah. That's actually a really interesting point. Because you're absolutely right. Unless you are really in it, the differences mostly are very subtle. We did have some major exponential jumps that, if you are in it, you saw and you felt and you used. And now they are a lot more subtle. And unfortunately, they're a lot more complex too. So the error rates for what you are doing or what I'm doing have dropped dramatically, especially with more infrastructure being built around it. That's the difference between talking to Claude AI, the chat, and having to copy a script out, and talking to Claude Code, the IDE that has access to everything. There's a big infrastructural difference. That's one way you'll now notice a quality difference if you try to do the same thing in both. But before, it was like, unless you were coding, it was really just, you can complain because it overuses dashes or emojis or, you know, whatever was the flavor of the month to complain about at the time. You know, AI is making images, but they have six fingers, you know, things like that. It's kind of hard to hold on to just how amazing those jumps can be.
Niklas:Yeah, and I think you have talked about five distinct levels of using Claude Code, right? Maybe you can walk us through those. That's really interesting.
Seth Gammon:Yes. Yeah. So I really love this. A lot of people have come up with their own frameworks and I don't want to replace that. If you have a framework of how you think about it and it helps you, you should keep using it and you should share it. You should talk to people. Everyone has an idea of how these things work, but a lot of people haven't really settled on language. How do I describe it? How do I explain it? How do I think about it? And so the five levels that I came up with were just trying to solve that. Every level is kind of an input method that you use, and it is a friction point that you'll feel. So level one is raw prompting, right? So you open up Claude, you describe what you want and it builds, right? This works surprisingly well for small tasks, but the ceiling is that your project will grow past a small fit and you'll start to have all these issues, right? It's not doing it at the quality you want. It's not doing it at the verbosity that you need. And so level one, raw prompting. Level two, CLAUDE.md, or AGENTS.md if you use other models, right? This is a markdown file. This is at your project root. This is what the model reads. It's where you give it instructions and knowledge. But the thing is, a lot of people are at level two in terms of using CLAUDE.md. They have a file, they have information in it. But what they don't realize is that over about 200 lines, it forgets whatever it has in it. It just does. You will literally lose out on information and instructions, right? So you need to keep it down. So level two is the CLAUDE.md, but what level two actually is, is orchestration, right? It's not orchestration in the big sense. It's routing, right? It's telling you, hey, I have skills here. I have files here. It's not giving them instructions.
It's giving them a way so the model will pull up only what's needed, and you'll cut down on tokens because you're not going to be crawling the entire code base, right? So level one, raw prompting; level two, CLAUDE.md, but use it as a router. Don't use it as a literal instruction file. Level three, we get into skills, right? This is where it starts to get really fun because skills unlock ability, capability, output, quality, everything, right? Everything that the model doesn't necessarily do super well, you can get with skills, right? So they're markdown protocol files that teach the agent specialized procedures, right? So raw prompting, CLAUDE.md, skills. Level four, hooks. This is where I feel like a lot of people do have a gap. So hooks being like a JavaScript file, right? Or using the PostToolUse or the various, you know, scripts to create these life cycles, right? So one that I have is a PostToolUse hook, and I run it after every type check, after every edit. And so what this does is, instead of flooding the agent at the very end with everything that happened, you catch it while it goes. So what a hook is for, it's kind of like a net, right? It's a trigger. It hooks the agent the moment that this trigger happens. And so hooks are where you start to mitigate errors. It's where you start to guard against the model drifting, right? Hooks are very powerful. That's level four. And then the last level is orchestration. That's where you get to the parallel agents in isolated worktrees. And that's where you get the persistent campaign files so that over a week, every agent you spin up never loses what the last one was doing or what you're currently working on. Right now, most projects, in my opinion, don't necessarily need orchestration, don't need parallel agents.
Now you can use them and you can get to where you're going faster, but I'm just saying that parallel agents are really only needed in high-level or very specific cases. Lots of information, lots of separated work, lots of things like that. So just to cover the five levels: raw prompting, the CLAUDE.md or AGENTS.md, but use it as a router, skills, hooks, and orchestration.
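To make the level-four idea concrete, here is a minimal Python sketch of a post-edit hook that runs a check as soon as a file is edited and reports failures immediately, rather than at the end of a session. Seth describes his hooks as JavaScript files; Python is swapped in here purely for illustration, and none of this is Citadel's code. The payload fields (`tool_name`, `tool_input.file_path`), the tool names in `EDIT_TOOLS`, the `tsc` command, and the meaning of exit code 2 are all assumptions to verify against the Claude Code hooks documentation.

```python
import subprocess
from typing import Callable, Optional

# Assumptions for this sketch: which tools count as edits, and which command
# serves as the type check. Neither is taken from Citadel.
EDIT_TOOLS = {"Edit", "Write"}
CHECK_COMMAND = ["npx", "tsc", "--noEmit"]


def edited_file(payload: dict) -> Optional[str]:
    """Return the edited file path if the event was a file edit, else None."""
    if payload.get("tool_name") not in EDIT_TOOLS:
        return None
    return payload.get("tool_input", {}).get("file_path")


def run_check(command: list) -> tuple:
    """Run the check command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def handle_event(payload: dict, check: Callable = run_check) -> tuple:
    """Decide what the hook reports: (exit_code, message for the agent)."""
    path = edited_file(payload)
    if path is None:
        return 0, ""  # not an edit, so there is nothing to check
    passed, output = check(CHECK_COMMAND)
    if passed:
        return 0, ""
    # Exit code 2 is assumed to mean "feed this message back to the agent".
    return 2, f"Type check failed after editing {path}:\n{output}"
```

Wired up as a real hook script, this would read the payload with `json.load(sys.stdin)`, print the message to stderr, and exit with the returned code. Keeping the decision in `handle_event` makes the logic testable without running a type checker.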
Niklas:Yeah, I would agree with the orchestration. So when I think about projects, smaller or even larger projects that I've worked on, if you look at how requirements come in, I think most of the time, surprisingly, you will be fast enough if you just take one requirement after another and work on them in a linear way. If you just find a way to continuously input those requirements,
Seth Gammon:Yes. Yep.
Niklas:you will still be very fast. You can still be faster, I understand. But for a lot of use cases, implementation will take a minute, maybe something like that, if it's not a really large feature or something. So you're probably good, right? So if you think about, OK, I want to translate a whole application into five new languages. Even a case like that.
Seth Gammon:Yes. Yes. Yes.
Niklas:Yeah, you can have a few agents. They will take different parts of the application, maybe if you don't have translation implemented at all. Split it up, do the translation work. But how often will you do it? You will do it once in an application. And I think there are a lot of these use cases, right?
Seth Gammon:Yes. Yes. Yep. No, you bring up such a great point. Please, please, please go ahead.
Niklas:So where do I use the... So what are good examples for using orchestration? Maybe you already wanted to answer that, but that would be my follow-up question.
Seth Gammon:Yes. Yes, sorry. My brain was like, I know what to say. No, no. So yeah, orchestration. So I'm very transparent in the sense that, you know, while Citadel talks about the coolest feature, which is parallel agents, right? It's being able to spin up 198 agents to go and do whatever you want to do, right? Like that's the hook. That's the cool thing. But in the same breath, I will tell everyone that orchestration is not for every project and it's not for every use case. Or the parallel agents, I should say. Yes, it's cool that I can spin up a fleet of agents, but it's just not needed for most things. And especially if it's not needed, you could be introducing a potential merge conflict or something like that. Now, again, Citadel itself solves that. But when you're talking about, I could just do it linearly and it takes a minute, or I could spin up fleets of agents and it takes five or 10 or whatever, that time difference can matter. I did solve that by having that /do router. So every time you prompt, it will decide: hey, does this benefit from a fleet or does it benefit from just me doing it? Do I need to call in skills or do I need to do something else? Right? So I try to take that on so you don't have to think about it. So yes, orchestration, parallel agents, amazing. Now to answer your question, why would you use it? One is scale, of course. Right? Let's say you have 10 things you want to do and none of them even conflict with each other. They're all different things. I spin up a terminal and I say, do this. And it goes off and it does that. And then I'm sitting there and I'm waiting and I'm babysitting and I'm looking at what it says and I'm responding and whatever. Right? Like that's a bottleneck.
So what these parallel agents do is let you tackle any kind of thing without having to babysit just one agent linearly. All the infrastructure is in place to do it. Now, the best use case that I've had for it, one that's consistent, that I would do every time, is research. Anytime I tell the model to research, it spins up about 10 agents, a fleet of them, and it will pull in information from everything it can find. Books, websites, blogs, videos, GitHub repos, anything that's public, it'll pull that in and it will synthesize it into a document that every agent after that uses whenever you're developing. Now you learned this method, you learned this package, you learned this model, whatever it is. And one thing that I recently did is I was trying to see how well I could do on the ARC-AGI-3 challenge, right? Which is an incredibly difficult challenge, for anyone who doesn't know. It's trying to get computer models or machine learning to basically solve, you know, spatial problems, right? Game theory, all sorts of things. And then you can drop it into a general game and it figures it out. So I was using Citadel to research that, spinning up fleets of agents and getting all the information. And I developed, you know, my own agent to play the first level. This is just the first level. I'm not trying to claim anything amazing, but with the research that I did, pulling in game theory, pulling in machine learning techniques, pulling in everything, I was able to build an agent that could solve the first level in 14 moves, and the number of moves a human can do it in is seven. So it's pretty decent. I started at 24. I then broke some things and went up to 26, then went down to 14. And so having fleets of agents to pull in research for any topic you want, anything, it doesn't matter what it is, that's something that I find useful every time.
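The routing decision and the research fan-out described in this answer can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Citadel's implementation: the regex rules, the mode names `"fleet"` and `"single"`, and the functions `route`, `research_worker`, and `research_fleet` are all hypothetical, and the worker is a stub standing in for a real research agent.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical routing rules, checked in order; the first match wins.
# A cheap regex pass like this runs before any model would be consulted.
ROUTES = [
    (re.compile(r"\b(research|investigate|survey)\b", re.I), "fleet"),
    (re.compile(r"\b(rename|typo|format)\b", re.I), "single"),
]


def route(prompt: str) -> str:
    """Pick an execution mode for a prompt; default to one linear agent."""
    for pattern, mode in ROUTES:
        if pattern.search(prompt):
            return mode
    return "single"


def research_worker(source: str, topic: str) -> str:
    """Stub for one research agent; a real one would call a model or tools."""
    return f"- {source}: notes on {topic}"


def research_fleet(topic: str, sources: list, path: str) -> str:
    """Fan out one worker per source, then merge the notes into one file."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map keeps results in the same order as the input sources.
        notes = list(pool.map(lambda s: research_worker(s, topic), sources))
    document = f"# Research: {topic}\n" + "\n".join(notes) + "\n"
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(document)  # later agents read this file instead of redoing the work
    return document
```

Writing the synthesis to disk is the part that matters for the workflow Seth describes: every later agent can read the one research document rather than repeating the search.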
Niklas:I think this is also an interesting discussion point, right? So if we abstracted this and said, maybe you don't want to use coding agents. You want to build a fleet of agents that run a company or something. What would be the way to translate this? What is the step that is needed to build Citadel into something like that?
Seth Gammon:Yes. Yeah, that's a great question. Just to make sure I understand what you're asking: you're not asking how Citadel becomes a company. You're asking how Citadel would ultimately run a company if it were abstracted to that. Is that correct? Perfect.
Niklas:Yeah, 100%. Like if I wanted to have the, yeah, if I, so more like general agents, right? You already said you have kind of a project manager. Now you have your programmer. Maybe you need something else like a marketer. So how could you do it? So what would be the way to translate this framework into like agents, different agents?
Seth Gammon:Yes. Yes. Yes. So this is actually one of the coolest things conceptually and maybe philosophically, right? And that is that any problem that you talk about, right? Like everything you just said is definable. I'm not saying that it's simple, but it is definable, right? I can make an agent that just does autopilot work. And then I could make an agent that, you know, manages those autopilots and also creates campaigns. And then I can create an agent above that, that manages that one, right? And has persistence and does whatever. So everything can be created for. And the way that I would do it for Citadel, and I have thought about this, is it's basically about orienting the infrastructure to the use case. And you can do that for anything. So now if it was the business, I would have it connected to, say, my social media, my marketing ad spend, right? I'd have that data come in, things like that. And the agents themselves don't care about what the task is. They just care about being routed to their use case. So right now, with the agents already in place, you could spin up something that would do exactly what you're talking about. But the abstraction I've thought about above that is, I've kind of thought about it like, I don't know if you've ever played this game, so it might not be a great reference, but the game is called XCOM. And it's a top-down game where you control these soldiers in a sci-fi world and you bring them around, it's turn-based, and you got to do moves and whatever. The cool thing about that game is that every soldier that you have is persistent, they have one life, and they gain skills, they gain abilities, they gain things like that. And so you choose when to deploy them. Citadel is the same thing, where eventually the abstraction becomes: create an agent, right? Or forge a skill, things like that. Like you'll have tools to build that. And then you could build out an agent exactly the way you want.
And guess what would build the agent? It would be Citadel, right? You would literally tell Citadel to build this agent that does this thing. And hopefully, when it comes to that, there would be an actual UI you could play with rather than just the terminal. But yes, you can certainly abstract it out to that, and the infrastructure supports it.
Niklas:Yeah, I think this is one of the larger topics that we are currently seeing, right? So I think a lot of thought is if we solve coding, we have already solved a lot because then we can just build all the other stuff. But the other step would be to not only automate the coding. I think it's a pretty small part at the end of the day of running a business, for example. But there is a lot of other stuff that needs to be done. And I'm really...
Seth Gammon:Yeah. Yeah. It is.
Niklas:interested in whether you can orchestrate a fleet of agents, and it makes so much sense to say I want to have specialized agents and orchestrate them for different functions.
Seth Gammon:Yes. Yes, yes, the abstraction becomes routing to the specialized agents, right? So we can already route. We can already orchestrate, and we can already choose, based on various information about what your prompt is or what your project is, and route it to the right agent. We can create the right documentation. We can solve the right errors. So all it becomes is pointing it at that. But more importantly, it's not just pointing Citadel at it. It's pointing Citadel to build the infrastructure for it, right? Because that's what you're going to need. You're going to need that specialized agent for that use case. You're going to need those specialized skills that matter to you. Your voice, for instance: if you have a specific brand, you need a voice profile document that it pulls in to kind of understand that. So that's how people should think. I see so many people on X complaining that, you know, vibe marketing isn't a thing, right? It's like, OK, yeah, I can spin up my code, but I can't market it. I can't whatever. I totally see that gap. But at the same time, I really don't think it's that much of an issue. I just think right now it's manual. So if I was a marketer, I'm looking at my, you know, Facebook ad analytics, which I've used before, so I do know a lot about it. I'm looking at that. I pull all that data into Claude and then I say, hey, this is my response. This is my copy. This is whatever. What can I do better? Right? Like vibe marketing is kind of a thing, but it's manual. What people are looking for is just: I want to open up a website. I want it to already have all my context, all my information, and I want it to do the thing. And then what you're talking about, abstracted above that, is: I want to deploy a thing like Citadel to do it for me. And I can direct it. I can talk to it. I can influence it, but it's going to handle a lot of that heavy lifting.
And so it's kind of like the idea, like the future is here now. You can do all that now. The models are capable now. It's just an infrastructural gap that either you yourself can build for yourself, or I'm certain there's... Like the moment that I start and I decide, hey, I'm going to do that. I'm going to make that. I'm going to see 10 other people already made it. You know what I mean? I really think it's something that people are working on heavily.
Niklas:100%. And then for Citadel itself, what is next? So what are you planning about it? Did you already build it into a company? Are you planning to build it into a company? Will it stay just this open source project? What's next?
Seth Gammon:Yes. So the big moment was open sourcing it and posting it. Right. That was the big moment. You know, literally days before I did this, like two days, three days, I was not planning this. It came together that quickly. I was working on my project. I was trying to make something else that was going to become, you know, a SaaS and that kind of thing in a different space. And then I was talking to, you know, Claude or AI or whatever, and I'm just sitting there, and I had to do research. I was like, hey, how far along am I compared to the marketplace, right? Compared to other people, compared to options. And it was like, I'm going to be honest with you, all right? You're treating this project over here, this world building thing, as the thing. This is the thing. These are all the areas you're ahead in right now. And that window is temporary. At least it feels that way. And I feel that way too. I think there's a lot of things that Anthropic and other big models are just not going to solve. And I think all those issues are in operations rather than intelligence or even orchestration. I think they're going to solve that. But what Citadel does is it fills all the operations gaps. It does all the things that you don't need to think about. So the big moment was sharing that. Everybody came back and immediately, I got 400 stars on GitHub. I got almost a thousand downloads in nine days, right? I have friends using it. One of my buddies built a GoPro app with it. It has a lot of the premium features of GoPro for free that he can use from his phone. Another friend of mine has his own trading card game site called DeckPlanet, and I was able to use it on that to fulfill five milestones that were kind of bogging him down for a bit. And then I've had so many people reach out to me directly on X and on Reddit asking me questions, saying, hey, I'm already using this in production. It's great.
You know, I'm pretty sure it's cut down on token spend, you know, things like that. So I got an amazing response. And now, what is next? So I started abstracting the next version of Citadel. I'm going to keep the engine open source. That's going to be open source. Anybody can use it. I want people to use it. Fork it, use it in your own project, use it in production. It doesn't take anything away from me to share this. So that's first. Second, I am working towards a Citadel Pro, which is for people who don't want to set it up in the terminal, people who don't want to worry about all those extra abstractions. And that's going to also come with a lot of infrastructure that I just can't do in the terminal, right? Things like having a UI, having agents that feel a little bit more alive, agents that you can customize and build like an RPG. I don't want to make it into a video game, but I'm just saying these are elements that you'll feel in it. So I have done that. I have reached out to VCs and funding. A couple of people told me to apply to programs. I sent a pitch deck this morning to someone who asked. So the company is kind of coming together, and that's hopefully what's next. But if ultimately Citadel is not the exact thing, there are so many things to be doing right now. There are so many gaps, and the operations one is just the one that I see as the biggest right now.
Niklas:I think this is really interesting. Thank you so much for being on the podcast. I'm really excited to see how Citadel will go. And to you, my listener, see you next time.
Seth Gammon:Thank you, Niklas, so much. I hope after it succeeds, I'll be back.