This is The Stepback, a weekly newsletter breaking down one essential story from the tech world. For more on all things AI, follow Hayden Field. The Stepback arrives in our subscribers’ inboxes at 8AM ET. Opt in for The Stepback here.
It all started with J.A.R.V.I.S. Yes, that J.A.R.V.I.S. The one from the Marvel movies.
Well, maybe it didn’t start with Iron Man’s AI assistant, but the fictional system definitely helped the concept of an AI agent along. When I’ve interviewed AI industry folks about agentic AI, they’ve often pointed to J.A.R.V.I.S. as an example of the ideal AI tool: one that knows what you need done before you even ask, can analyze and find insights in large swaths of data, and can offer strategic advice or run point on certain aspects of your business.
People sometimes disagree on the exact definition of an AI agent, but at its core, it’s a step beyond chatbots: a system that can perform complex, multistep tasks on your behalf without constantly needing back-and-forth communication with you. It essentially makes its own to-do list of the subtasks it needs to complete to reach your preferred end goal. That fantasy is closer to reality in many ways, but when it comes to actual usefulness for the everyday user, a lot of things still don’t work, and some maybe never will.
The term “AI agent” has been around for a long time, but it especially started trending in the tech industry in 2023. That was the year of the AI agent as a concept: the term was on everyone’s lips as people tried to suss out the idea and how to make it a reality, but you didn’t see many successful use cases. The next year, 2024, was the year of deployment, when people actually put the code out into the field to see what it could do. (The answer, at the time, was… not much, plus a bunch of error messages.)
I can trace the moment the hype around AI agents went mainstream to one specific announcement: In February 2024, Klarna, a fintech company, said that after one month, its AI assistant (powered by OpenAI’s tech) had successfully done the work of 700 full-time customer service agents and automated two-thirds of the company’s customer service chats. For months, those statistics came up in almost every AI industry conversation I had.
The hype never died down, and in the following months, every Big Tech CEO seemed to harp on the term in every earnings call. Executives at Amazon, Meta, Google, Microsoft, and a whole host of other companies began talking about their commitment to building useful and successful AI agents, and tried to put their money where their mouths were to make it happen.
The vision was that one day, an AI agent could do everything from book your travel to generate visuals for your business presentations. The ideal tool could even, say, find a good time and place to hang out with a bunch of your friends that works with all of your calendars, food preferences, and dietary restrictions — and then book the dinner reservation and create a calendar event for everyone.
Now let’s talk about the “AI coding” of it all: For years, AI coding has been carrying the agentic AI industry. If you asked anyone about real-life, successful, not-annoying use cases for AI agents happening right now, not conceptually in some not-too-distant future, they’d point to AI coding, and that was pretty much the only concrete thing they could point to. Many engineers use AI agents for coding, and the tools are widely seen as pretty good. Good enough, in fact, that at Microsoft and Google, up to 30 percent of the code is now being written by AI agents. And for startups like OpenAI and Anthropic, which burn through cash at high rates, AI coding tools for enterprise clients are one of their biggest revenue generators.
So until recently, AI coding was the main real-life use case for AI agents, but obviously, that doesn’t cater to the everyday consumer. The vision, remember, was always a jack-of-all-trades AI agent for the “everyman.” We’re not quite there yet, but in 2025, we’ve gotten closer than we’ve ever been before.
In October 2024, Anthropic kicked things off by introducing “Computer Use,” a tool that allowed Claude to use a computer the way a human might: browsing, searching, accessing different platforms, and completing complex tasks on a user’s behalf. The general consensus was that the tool was a step forward, but reviews said that in practice, it left a lot to be desired. Fast-forward to January 2025, when OpenAI released Operator, its own version of the same idea, and billed it as a tool for filling out forms, ordering groceries, booking travel, and creating memes. Once again, many users found the tool buggy, slow, and not always efficient in practice. But again, it was a significant step. The next month, OpenAI released Deep Research, an agentic AI tool that could compile long research reports on any topic for a user, and that pushed things forward, too. Some people said the reports were more impressive in length than in content, but others were seriously impressed. Then, in July, OpenAI combined Deep Research and Operator into one AI agent product: ChatGPT Agent. Was it better than most consumer-facing agentic AI tools that came before? Absolutely. Was it still tough to make work successfully in practice? Absolutely.
So there’s a long way to go to reach that vision of an ideal AI agent, but at the same time, we’re technically closer than we’ve ever been before. That’s why tech companies are putting more and more money into agentic AI, by way of investing in additional compute, research and development, or talent. Google recently hired Windsurf’s CEO, cofounder, and some R&D team members, specifically to help Google push its AI agent projects forward. And companies like Anthropic and OpenAI are racing each other up the ladder, rung by rung, to introduce incremental features to put these agents in the hands of consumers. (Anthropic, for instance, just announced a Chrome extension for Claude that allows it to work in your browser.)
So really, what happens next is that we’ll see AI coding continue to improve (and, unfortunately, potentially replace the jobs of many entry-level software engineers). We’ll also see the consumer-facing agent products improve, likely slowly but surely. And we’ll see agents used increasingly for enterprise and government applications, especially since Anthropic, OpenAI, and xAI have all debuted government-specific AI platforms in recent months.
Overall, expect to see more false starts, stops and starts, and mergers and acquisitions as the AI agent competition picks up (and the hype bubble continues to balloon). One question we’ll all have to ask ourselves as the months go on: What do we actually want an “AI agent” to be able to do for us? Do we want it to take over just the logistics, or also the more personal, human aspects of life (e.g., helping write a wedding toast or a note for a flower delivery)? And how good are agents at helping with the logistics vs. the personal stuff? (Answer to that last one: not very good at the moment.)
- Besides the astronomical environmental cost of AI — especially for large models, which are the ones powering AI agent efforts — there’s an elephant in the room. And that’s the idea that “smarter AI that can do anything for you” isn’t always good, especially when people want to use it to do… bad things. Things like creating chemical, biological, radiological, and nuclear (CBRN) weapons. Top AI companies say they’re increasingly worried about the risks of that. (Of course, they’re not worried enough to stop building.)
- Let’s talk about the regulation of it all. A lot of people have fears about the implications of AI, but many aren’t fully aware of the potential dangers posed by uber-helpful, aiming-to-please AI agents in the hands of bad actors, both stateside and abroad (think: “vibe-hacking,” romance scams, and more). AI companies say they’re ahead of the risk with the voluntary safeguards they’ve implemented. But many others say this may be a case for an external gut-check.


Anthropic’s Mythos AI Reportedly Hacked the NSA’s Most Sensitive Systems ‘in Hours’
![General Joshua Rudd](https://gizmodo.com/app/uploads/2026/06/GeneralJoshuaRudd-1280x853.jpg)
When Anthropic first disclosed Mythos in April, it sent an anxious shockwave through much of the cybersecurity sector. The new AI model was allegedly so ruthlessly effective at finding and exploiting security vulnerabilities in existing software that the company said it was holding off on a public release and would grant access only to a small group of early testers, including the U.S. National Security Agency (NSA). Another wave of fear reverberated this week after the NSA reportedly discovered multiple vulnerabilities in its own cybersecurity systems during its tests with Mythos. If that agency, which supposedly boasts the most impenetrable cyberdefenses in the world, can be hacked by Mythos, what hope does the rest of the world’s cybersecurity infrastructure have?
This latest round of panic seems to have begun with something of a game of telephone: someone says one thing, another person repeats it, then another, and along that chain the original statement gets distorted. Last week, The Economist reported that during a June 11 hearing before the Senate Committee on Banking, Housing, and Urban Affairs, Democratic Senator Mark Warner of Virginia said that Mythos had broken into “almost all of [the NSA’s] classified systems, not in weeks, but in hours.” Warner said he’d received that information from the head of the NSA himself, General Joshua Rudd, who also leads the Pentagon’s Cyber Command. On Monday, a coalition of intelligence agencies, including the NSA and its counterparts in Canada, the U.K., Australia, and New Zealand, issued an unusually public warning that the risk AI now poses to cybersecurity warrants a “whole-of-society response.”
The Economist’s report was seen by some as evidence that the worst fears about Mythos were true, a reaction no doubt also fueled by the aura of power and mystery that has coalesced around the model in recent months. That aura has arguably been a boon for Anthropic, which recently overtook OpenAI as the most valuable startup in the world and is preparing for what’s expected to be a historic IPO.
But the aura has also contributed to Anthropic’s latest skirmish with the Trump administration, which earlier this month ordered the company to restrict all foreign nationals’ access to Fable 5, a “Mythos-class” model that had recently been made publicly available and whose safeguards some users found annoyingly stringent. Citing national security concerns, the administration invoked an obscure piece of export control legislation, a move that some legal experts call spurious. Many cybersecurity experts, meanwhile, argued that the ban would hamstring U.S. cybersecurity defenses and give adversaries like China the upper hand. That argument was seemingly vindicated by a Tuesday New York Times report saying that Trump’s ban, which also targeted Mythos 5, a model that had been made available only to a small group of organizations, had put the kibosh on the NSA’s internal tests with Mythos, and that the administration was now working with Anthropic to reinstate the agency’s access for limited national security purposes. The NSA did not immediately respond to Gizmodo’s request for comment.
That same Times report also clarified that the NSA’s internal tests with Mythos were less apocalyptic than online rumors suggested. According to federal officials cited in the report, the tests were carried out in a digital environment so tightly controlled that it’s very unlikely any hacker or foreign intelligence agency could replicate them. The officials also told the Times that even though Mythos was able to identify cybersecurity vulnerabilities, it didn’t actually exploit them. The author of The Economist’s report, the one that set off all the worry, has also admitted that his portrayal of the NSA’s tests with Mythos was misleading. The tests “surely [involved] using Mythos alongside other tools under very particular conditions,” he wrote in an X post on Sunday. “I quoted [Senator Warner] to give a sense of Mythos’ potency. But it was a mistake not to have added caveats.”