Don’t Believe the Hype: AI Capabilities May Not Create Real-World Value
Microsoft Chief Technology Officer Kevin Scott gives remarks on the eve of the company's 2025 Build conference. Credit: Ken Yeung

Kevin Scott always has something to say on the eve of Microsoft Build. The company’s chief technology officer shares his read on where technology is headed, typically in step with whatever the company is about to announce. This year, I joined executives, developers, and my fellow journalists in a downtown San Francisco bar as Scott laid out five observations about AI and the gap between what people expect from it and what it can actually deliver. His message: there’s a lot of work ahead if we want AI to live up to its hype.

Capability Runs Ahead of Deployment

The first observation centers on capability overhang: the idea that AI models can do more than the tasks we actually give them in the real world. “Just because an AI is highly capable of something does not necessarily mean that quick deployment will follow,” Scott said. In other words, when it comes to using AI, we’re leaving a lot of potential on the table.

By his read, agentic coding is the area where AI’s capabilities are “furthest ahead.” It’s letting people write more code, making strong programmers more effective, and giving people who have never written a line of code the means to build something for the first time. But even where AI is at its most capable, deployment can still lag.

“It’s not necessarily like ‘Field of Dreams’—build it, and they will come,” Scott said. “In some places, yes, and in some places, no, and so we just shouldn’t have uniform faith that as the AI model capabilities improve, we’re going to get this crazy fast deployment everywhere.”

He admitted that it could be a bit of a controversial take, at least for those in Silicon Valley.

Progress Follows the Feedback Loop

Setting that debate aside, Scott moved to his second observation, though it’s really the engine underneath the first. Coding races ahead, he argued, because it has the tightest feedback loops of any domain. A closed feedback loop is a cycle in which the output of a process gets measured and fed back in to sharpen the next round, over and over.

With agentic coding, the model writes code. You can immediately check it with existing tools: does it compile, and is it well-formed? Models can also generate the tests themselves, so you can see whether the code actually does what was asked. Then, experts tune the models in post-training to prefer better outputs. And finally, the developers using the product generate even more feedback.
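For readers who want to see the shape of that loop, here is a minimal sketch in Python. It is an illustration, not anything Scott showed or any real product's code. The names check_candidate, feedback_loop, and make_canned_model are hypothetical, and the "model" is just a canned list of attempts standing in for a real one. It covers only the first two steps described above, the well-formedness check and the tests. Post-training and user feedback are not modeled.

```python
from typing import Callable, Optional


def check_candidate(source: str, tests: Callable[[dict], None]) -> Optional[str]:
    """Return None if the candidate passes, otherwise a short error message."""
    try:
        # Step 1: is the code well-formed? compile() raises SyntaxError if not.
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    namespace: dict = {}
    try:
        # Step 2: does it do what was asked? Run it, then run the tests.
        exec(code, namespace)
        tests(namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def feedback_loop(propose, tests, max_rounds: int = 5):
    """Ask for candidates until one passes, feeding each failure back in."""
    feedback = None
    for round_number in range(1, max_rounds + 1):
        candidate = propose(feedback)
        feedback = check_candidate(candidate, tests)
        if feedback is None:
            return candidate, round_number
    return None, max_rounds


def make_canned_model(candidates):
    """Hypothetical stand-in for a model: replays fixed attempts in order."""
    remaining = iter(candidates)

    def propose(feedback):
        # A real model would read the feedback; this stand-in ignores it.
        return next(remaining)

    return propose


def add_tests(namespace: dict) -> None:
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


CANDIDATES = [
    "def add(a, b) return a + b",        # malformed: missing colon
    "def add(a, b):\n    return a - b",  # well-formed but wrong
    "def add(a, b):\n    return a + b",  # passes
]

result, rounds = feedback_loop(make_canned_model(CANDIDATES), add_tests)
print(rounds)  # 3
```

Each round takes a fraction of a second and needs no human in it, which is the property Scott is pointing to. Swap the compile-and-test step for a months-long lab experiment and the same loop barely turns.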

“The capability of the models is going to improve fastest where these feedback loops exist,” Scott said. “In some cases, the feedback loops, to the extent that they exist at all or can exist, are not going to necessarily be as fast as some of the ones that we have with things like software development.”

Take particle physics, for example. The models can already propose new experiments, Scott contended, but that’s where the loop stalls. Running an experiment requires expert labor and access to expensive, oversubscribed equipment, and the only way the results reach the model is through published research. There’s no fast, automatic check the way there is with code. And this isn’t a matter of waiting for the next, bigger model to close the gap. Absent a breakthrough, like simulators that model real-world physics accurately enough to generate usable training data, some fields just won’t move at coding’s pace. “You shouldn’t expect some things to go terribly fast,” Scott said.

Fast Software, Slow Organizations

His third observation is one anyone who works in the enterprise will recognize: the software can now move faster than the organization around it. “Just because we can make software go fast doesn’t mean we can make organizations go fast,” Scott said.

Sure, plenty of workers are excited about these tools and eager to start building. But most real work runs into constraints that a quick demo never does. The work might sit in a regulated industry like finance or healthcare, which limits what can be shipped. Internal systems might sit behind walls that AI agents can’t reach, locked inside infrastructure never built for them. Or it could be a last-mile problem, where a tool makes one task “near infinitely faster,” in Scott’s words, only for the next bottleneck to surface right behind it. A lot of the work ahead, he said, looks less like building and more like plumbing.

He framed it as both bad news and good news for developers. The plumbing is tedious, but it’s also where the work is. There’s “a lot of opportunity out in the world right now,” he said, in making systems more accessible to agents.

Resistance to change is another reason organizations move slowly. People can feel threatened by a new process, one that upends how they’ve worked for years. As Scott pointed out, it takes “a lot of proof” that the new way is better before people will change, and he didn’t fault them for it. “That is a perfectly reasonable and valid point of view to come from,” he said. There’s also a perception problem. When capability moves as fast as it has, fast enough that the agentic coding he’d been describing wasn’t possible just months ago, people struggle to register how quickly the ground is shifting beneath them.

Activity Isn’t Value

Another observation lands more like a warning: just because you can use AI to build something doesn’t mean what you’ve built is valuable to anyone else. It’s a trap businesses of all sizes fall into, and startup founders know it best. Plenty of startups begin as a fix for a founder’s own problem, only for the founder to learn that solving it for yourself isn’t the same as solving it for a market.

Scott shared an example of an app he built during a recent flight back from Japan. It was a “frivolous” meme chat app that he could use to “irritate” his two teenage kids. Yet despite the 65 pull requests he churned through building it, Scott realized that it was “valueless,” good for nothing beyond his own enjoyment. As Scott told the room: “We now have this new tool, and we can just do a lot more with it—we can have a lot of output, build more complex things. That doesn’t necessarily mean that the things that we’re building are super valuable, that they’re going to land in users’ hands, solve problems that they have, [or] increase the top line on the businesses that we’re running.”

His advice: Developers need to pay close attention to how value is measured and to the feedback they’re receiving.

Autonomy Doesn’t Earn Trust

Scott’s last observation takes on the word the industry has fallen in love with: autonomy. Just because you can build a system that handles a task end to end, he argued, doesn’t mean you can trust it to.

“You have to build systems in a way that are doing complex things, where people can trust that the systems are doing them correctly and in a way that’s aligned with their interests and values,” he said. “That’s a new way of thinking about software.”

Scott pointed out that the definition of what developers do has shifted. The old job was to build software that works, with a clean user experience and a test suite that passes with good coverage. That’s no longer enough. The new bar, he said, is software people can trust to do complex things correctly and in line with their interests. “What does it mean to build trustworthy software?” he asked. The answer carries real weight, because “until you get to real trustworthiness, no one’s going to give work to these software systems to do fully autonomously on their behalf.”

No Silver Bullet

None of this gets solved by waiting for the next model. “There is no silver bullet,” Scott said, and the problems he’d laid out all evening don’t disappear simply because AI gets more powerful. Solving them requires a blend of technical, societal, and organizational work, as well as addressing legacy systems and plumbing issues.

He warned that in some cases, there’s going to be “more work than we’ve ever seen before,” requiring developers to “engage in ways that are even more intense than what we see right now.”

The capability of AI is real, and so is the promise. But Scott’s point, the night before Microsoft spent its keynote selling what AI can do, was that none of it pays off on its own. The hype assumes scale does the work. The reality is that we still have to.

Disclosure: I attended Microsoft Build as a guest of the company, with my travel and expenses paid for. However, what I write reflects my own reporting and analysis. No one reviewed or approved this piece before publication.