AIE CODE 2025 Wrap-up
Leadership and engineering takeaways from AIE CODE 2025.
For the record, I didn’t attend AIE CODE 2025 (NYC) in person. I did go through every single presentation in the two YouTube videos for the leadership track and the tech track. Even at higher playback speed, this was a lot of high-quality content to consume. As much as I wanted AI to read the transcripts and summarize the key points, I found that the nuance was lost in translation. So I took notes manually and synthesized the key themes for this post using Sonnet 4.5 on Claude Code.
A few disclaimers before we start:
This is not a summary of every talk. Just the topics I found most interesting.
The opinionated takeaways are mine, not the speakers’.
With that, let’s dive in.
Between the leadership and engineering tracks, I found the leadership discussions more outcome-driven and actionable. It’s clear that leaders are under pressure to justify the ROI of the massive AI investments they’ve made in the last 12 months. As much as I’d love to get my hands on the cool tech coming out of big labs and startups alike, the AI adoption challenge is primarily organizational, not technological. Many leaders are driving more innovation with the same (and sometimes “lagging”) technology, just by changing how they work.
Three leadership takeaways emerged from the conference. Each reveals how organizations must transform to unlock AI’s compounding gains.
Process Rewiring
McKinsey’s data confirmed that most organizations are stuck in pilots, yet spending continues to grow. Their research shows a 10x productivity difference between companies where 90% of engineers use AI and those where 100% do - the last 10% of adoption drives outsized returns.
Just making AI available isn’t enough. The organizations that are succeeding are rewiring their PDLC (product development lifecycle), team structures, and review processes. This is a human change, and it needs to start now. In my earlier post, I highlighted this point as well.
Code review must transform. Every engineer is now a reviewer. Most engineering teams already have this expectation, but in reality only a handful of engineers end up owning the majority of code reviews. This needs to change. Teams also need to speed up review cycles with SLOs. Make it clear whose turn it is to take action - agent, engineer, or reviewer. Distribute the load across the team. Focus reviews on outcomes like SLO impact and dependencies, not code style. When agents iterate with humans fast, velocity compounds.
Metrics reveal what matters. Track velocity as your primary metric - features shipped per sprint. Use quality as your guardrail metric - production incidents, bugs reported in development, rollback rate. The goal is to improve velocity while maintaining quality. Track the review and refactor rate; it should decrease as agents improve. The important point here is to start with one leading metric (velocity) and one guardrail (quality). Avoid drowning in metrics.
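To make the one-leading-metric, one-guardrail idea concrete, here’s a minimal sketch of a per-sprint report. The record fields and numbers are hypothetical placeholders, not something any speaker prescribed.

```python
# A sketch of "one leading metric, one guardrail" reporting.
# The fields and numbers below are hypothetical; adapt to what your tracker exports.
from dataclasses import dataclass


@dataclass
class Sprint:
    features_shipped: int       # leading metric: velocity
    production_incidents: int   # guardrail inputs: quality
    rollbacks: int
    prs_needing_rework: int     # should trend down as agents improve
    prs_total: int


def report(sprints: list[Sprint]) -> None:
    for number, s in enumerate(sprints, start=1):
        quality_events = s.production_incidents + s.rollbacks
        rework_rate = s.prs_needing_rework / max(s.prs_total, 1)
        print(f"Sprint {number}: velocity={s.features_shipped}, "
              f"quality_events={quality_events}, rework_rate={rework_rate:.0%}")


if __name__ == "__main__":
    report([
        Sprint(features_shipped=6, production_incidents=1, rollbacks=0,
               prs_needing_rework=5, prs_total=20),
        Sprint(features_shipped=9, production_incidents=1, rollbacks=0,
               prs_needing_rework=4, prs_total=28),
    ])
```

The point is the shape: one number you push up, one or two you watch so they don’t regress.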
Process rewiring accelerates onboarding. Every.io demonstrated this at scale: new hires there are shipping code to production within a few weeks. This enables them to hire quickly and even explore contractors for short-term engagements. When tacit knowledge is codified into context (CLAUDE.md, documentation, patterns), engineers don’t need months to ramp up. They inherit the team’s collective intelligence immediately.
Psychological safety predicts success. Google’s Project Aristotle found that psychological safety is the biggest indicator of high-performing teams. Frame AI adoption as performance amplification, not replacement. Engineers need to trust that AI adoption improves their work rather than threatens it, and the company culture must reflect that.
The Clean Environment Multiplier
Stanford’s study revealed the core insight: clean engineering environments unlock AI productivity. Test coverage, type safety, modularity, documentation, templates, clean Agents.md - these determine whether agents drive 20% or 80% of your sprint.
Poor environments create compounding friction. Agents struggle without clear validation. Engineers spend 40% of their time context-switching to provide missing information. PRs pile up because no one trusts AI-generated code. Writing more code doesn’t equate to more features. Engineers just end up spending their time fixing AI mistakes and reviewing a massive volume of low-quality output.
Clean environments compound gains. Agents complete tasks autonomously. Engineers focus on creative problem-solving. Review cycles accelerate because outcomes are measurable. Google Antigravity demonstrated their approach to making agents collaborate effectively by providing them with clear validation, checkpoints, multimodal inputs, and feedback loops - all built into the editor. I’m still a Claude Code fan, but it’s an approach worth giving a shot.
The multiplier isn’t the AI. It’s the environment you give it to work in.
Capital One’s Max Kanat-Alexander asked the defining question: what will be valuable no matter what happens? Use industry standard tools, frameworks, and languages that models are trained on. Build with clear validation that fails deterministically. Write down the context and intentions. What’s good for humans is good for AI.
Invest in engineering hygiene now. Increase test coverage to 80%+ for critical paths - tests form the backbone of deterministic validation. Refactor for modularity with single responsibility per component. Enforce interfaces through API specs, Agents.md, and documentation. Implement strict type safety across services. Standardize your dev environment. These fundamentals don’t change with GPT-5 or Claude Opus 5.
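As an illustration of what “clear validation that fails deterministically” can look like, here’s a small sketch: a typed function whose tests assert exact values and exact failure modes, so an agent (or a human) gets the same pass/fail signal on every run. The pricing domain, names, and pytest usage are my own example, not something from the talk.

```python
# A sketch of validation that fails deterministically: typed inputs, exact
# expected values, explicit error cases. The pricing domain and names are
# illustrative; assumes pytest is installed.
from dataclasses import dataclass

import pytest


@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int
    unit_price_cents: int


def order_total_cents(items: list[LineItem]) -> int:
    """Same input always yields the same total or the same error - no ambiguity."""
    if not items:
        raise ValueError("order must contain at least one line item")
    for item in items:
        if item.quantity <= 0:
            raise ValueError(f"invalid quantity for {item.sku}: {item.quantity}")
    return sum(i.quantity * i.unit_price_cents for i in items)


def test_total_is_exact() -> None:
    items = [LineItem("A1", 2, 499), LineItem("B2", 1, 1250)]
    assert order_total_cents(items) == 2248  # exact value, no tolerance, no flakiness


def test_empty_order_fails_loudly() -> None:
    with pytest.raises(ValueError, match="at least one line item"):
        order_total_cents([])
```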
Context Infrastructure
Stating the obvious here: poor context = poor AI output. QODO stated it plainly - when engineers don’t trust AI-generated code, what they actually distrust is the context, not the model. Teams must build frameworks to evaluate and assess quality so engineers can build confidence in AI-generated code.
Every.io proved that context infrastructure investments pay off at scale: 99% of their code is AI-written, apps are owned by a single engineer, and features are developed in parallel. Their secret? Compounding engineering through CLAUDE.md and context improvements, converting tacit knowledge into prompts that improve with each feature. Engineers working on multiple features in parallel use multiple agent panes and stay productive. Code is cheap now, so you can prototype multiple ideas simultaneously. An important learning for me was the quote: AI has caused us to invent an entirely new set of engineering primitives and processes.
Context infrastructure compounds while model capabilities commoditize. The harness - everything touching the LLM - is the new abstraction layer. Custom tools need model training. Prompts aren’t portable across models. Platform teams must translate user intent into model-specific instructions. These investments compound because they work regardless of which model wins.
Bloomberg’s rollout to 9,000 engineers revealed the work preference hierarchy: new features > architecture > code review > bug fixes > support. AI should amplify this preference, not fight it. Let agents handle the uplift work while engineers solve creative problems. In my experience as well, the engineers who hated writing tests are the ones writing bad tests with AI. The solution is to give them better context and frameworks to write good tests, not to have agents write tests for them.
HumanLayer’s research found that up to 40% context window usage keeps agents in the “smart zone.” Beyond that, they enter the “dumb zone.” This changes how we architect workflows. Use subagents for research and context management - they return succinct messages to the main agent. Research equals compression of truth. Planning equals compression of intent. My personal experience aligns with this pattern. I hit /clear once I’m at 10% context window remaining, forcing me to break tasks into smaller chunks that can be completed within the context window.
Research → Plan → Implement. All happen in the smart zone.
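Here’s a minimal sketch of that subagent pattern, assuming the Anthropic Python SDK (my choice, not something the talk prescribed): the research call burns its own context window on the bulky source material and returns a capped, compressed brief, which is all the main agent ever sees. The model name, prompts, and token caps are placeholders.

```python
# A sketch of the subagent-as-compression pattern. Assumes the Anthropic Python
# SDK (pip install anthropic) and ANTHROPIC_API_KEY in the environment; the
# model name, prompts, and token caps are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder; use whichever model you actually run


def research_subagent(question: str, raw_material: str) -> str:
    """Burn a separate context window on the bulky material, return a short brief."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=500,  # hard cap on what flows back to the main agent
        system=("You are a research subagent. Return at most 10 bullet points of "
                "facts relevant to the question. No code, no speculation."),
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nSource material:\n{raw_material}",
        }],
    )
    return response.content[0].text


def plan_from_brief(task: str, brief: str) -> str:
    """The main agent plans against the compressed brief, never the raw material."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=800,
        system="Write a step-by-step implementation plan using only the research brief.",
        messages=[{
            "role": "user",
            "content": f"Task: {task}\n\nResearch brief:\n{brief}",
        }],
    )
    return response.content[0].text
```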
Stop building agents, start building skills. Anthropic emphasized this shift at the conference. Skills are a way to use context better, encapsulate learnings, and democratize knowledge across your team. Instead of building generic agents, create expertise through skills that progressively disclose capabilities. Skills become your institutional knowledge - they help onboard new engineers, compound your team’s productivity, and ensure consistent quality. Non-developers can build high-value skills when context is properly structured.
Build the context infrastructure. Implement the CLAUDE.md pattern for team knowledge compounding - update it frequently as your team learns. Build MCP servers for service dependencies and architecture docs. Create deterministic validation that fails consistently. Invest in tools that bring context to agents, not agents that need constant human input.
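For the MCP piece, here’s a minimal sketch of a server exposing architecture docs and service dependencies to agents, assuming the official MCP Python SDK and its FastMCP helper. The repo layout and tool names are hypothetical; point them at wherever your team actually keeps this context.

```python
# A sketch of an MCP server that serves team context to agents. Assumes the
# official MCP Python SDK (pip install "mcp[cli]") and its FastMCP helper; the
# repo layout (docs/architecture/, docs/deps.json) and tool names are hypothetical.
import json
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("team-context")

DOCS_DIR = Path("docs/architecture")
DEPS_FILE = Path("docs/deps.json")


@mcp.tool()
def get_architecture_doc(topic: str) -> str:
    """Return the architecture doc for a topic, e.g. 'payments' or 'auth'."""
    doc = DOCS_DIR / f"{topic}.md"
    if not doc.exists():
        return f"No architecture doc found for '{topic}'."
    return doc.read_text()


@mcp.tool()
def get_service_dependencies(service: str) -> str:
    """Return the declared upstream/downstream dependencies for a service."""
    deps = json.loads(DEPS_FILE.read_text())
    return json.dumps(deps.get(service, {"error": f"unknown service '{service}'"}))


if __name__ == "__main__":
    mcp.run()  # stdio by default; register this server in your agent's MCP config
```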
Arize achieved 5-15% improvements in bug fixes on a standard OSS repo just by changing system prompts. Using evals to improve prompts and context creates a feedback loop that improves agent performance over time. Build these feedback loops for your system prompts.
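A sketch of what such a feedback loop can look like, independent of any particular eval product: score each system prompt variant against a fixed set of cases with deterministic checks, and keep the winner. run_agent is a stand-in for however you invoke your coding agent; none of this is Arize’s actual setup.

```python
# A sketch of an eval loop for system prompts: fixed cases, deterministic checks,
# keep whichever prompt scores best. `run_agent` is a stand-in for however you
# invoke your coding agent; the cases and prompts below are illustrative only.
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    task: str                      # e.g. a bug report or issue description
    check: Callable[[str], bool]   # deterministic pass/fail check on the agent's output


def score_prompt(system_prompt: str, cases: list[EvalCase],
                 run_agent: Callable[[str, str], str]) -> float:
    """Fraction of cases whose output passes its check under this system prompt."""
    passed = sum(1 for c in cases if c.check(run_agent(system_prompt, c.task)))
    return passed / len(cases)


def pick_best_prompt(variants: list[str], cases: list[EvalCase],
                     run_agent: Callable[[str, str], str]) -> str:
    scores = {p: score_prompt(p, cases, run_agent) for p in variants}
    for prompt, score in scores.items():
        print(f"{score:.0%}  {prompt[:60]}")
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy stand-in agent so the harness runs end to end without any API calls.
    def fake_agent(system_prompt: str, task: str) -> str:
        return f"{system_prompt} :: fixed {task}"

    cases = [EvalCase(task="null pointer in login", check=lambda out: "fixed" in out)]
    best = pick_best_prompt(["Be terse.", "Explain every step."], cases, fake_agent)
    print("Best prompt:", best)
```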
The Underlying Bet
2026 is about execution, not experimentation. Organizations that treat AI as infrastructure will compound their gains while others restart projects.
The bet isn’t on which model wins. It’s on the fundamentals that work regardless of model: standard tools LLMs were trained on, clear validation that fails deterministically, written context that agents can reference, and fast iteration between agents and engineers.
Invest in the fundamentals now. They’re the multiplier for everything else.
The conference data was stark: companies with clean engineering environments see 5-10x more AI productivity gains than those without. Token spend is a weak predictor of AI adoption. Engineering quality is the strong one. Good design fights bad AI output.
What This Means For You
If you’re planning 2026 investments, the priority order is clear:
Process Rewiring - Transform code reviews, define metrics (velocity + quality), build psychological safety
Clean Environment Multiplier - Fix engineering hygiene (test coverage, type safety, modularity, deterministic validation)
Context Infrastructure - Build MCP servers, implement CLAUDE.md, create feedback loops for system prompts
The organizations that compound these investments will see 5-10x gains. The ones that skip fundamentals and jump to AI tools will restart in Q3.
Working through AI adoption challenges in your engineering organization? I’m happy to brainstorm strategies or share what’s worked in practice. The fundamentals matter more than the models - let’s compare notes.


