Long-running agents drift. You can't read every transcript. We built background pipelines where Claude reviews agent output against identity documents and proposes state updates — a pattern that generalizes well beyond our hobby project.
Working memory in the prompt, episodic archives behind a tool call, permanent logs for background analysis. How we built memory that scales without blowing the context window.
Your CI pipeline was designed for a different threat model. Here's a framework for code quality in the agent era — organized around four control surfaces.
The single most transferable lesson from building a gaming AI wasn't architecture or cost optimization — it was giving the agent a character it could reason from when the rules ran out.
Give an AI agent a capability list and it plays optimally. Give it vanity points, insecurities, and institutional memory and it plays realistically. What we learned building a multi-agent geopolitical simulator.