New Attention is All We Need

In the last Belamy, we discussed how Google has taken things very personally with the launch of Gemini 3 and Nano Banana Pro. Now, Anthropic is also in the race, giving OpenAI a run for its money. Three models have arrived back-to-back this month, while Google has dropped a fresh paper that tries to rewrite how attention itself works. 

This comes as Ilya Sutskever, co-founder and chief scientist at Safe Superintelligence Inc (SSI), changed his thinking about how to build superintelligence, and Meta’s chief AI scientist, Yann LeCun, stepped out of the tech giant to build something he has been teasing for years—decidedly not an LLM.

Every layer of the stack, and the philosophy behind it, is shifting at once. The timing makes it feel like the ground itself is moving.

The comparison between OpenAI’s GPT-5.1, Google’s Gemini 3 Pro and Anthropic’s Claude Opus 4.5 sets the tone for the moment. 

The three models do not behave like the blunt systems of just a few years ago. They do not rush from question to answer. They actually pause when the task calls for it. This shift sounds small when you describe it. It feels enormous when you use it.

Each model’s thinking style reveals its maker’s taste. 

GPT-5.1 offers no switches or knobs. It decides on its own when to ease into a problem or tear through it. Gemini 3 Pro brings in a Deep Think mode that sits like a gear shift. Claude Opus 4.5 hands the user the most control through its Effort setting, which lets you choose how much reasoning the model spends. 
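The three control surfaces can be pictured as one abstract "effort" level mapped onto three very different knobs. The sketch below is illustrative only: the model names and parameter fields are simplified stand-ins, not the vendors' exact API shapes.

```python
from dataclasses import dataclass

@dataclass
class ReasoningRequest:
    model: str
    params: dict

def build_request(model: str, effort: str) -> ReasoningRequest:
    """Map one abstract effort level onto each model family's control style.

    The param names here are hypothetical simplifications for
    illustration, not the real request fields of any vendor's API.
    """
    if model.startswith("gpt-5.1"):
        # GPT-5.1: no user knob; the model decides how long to think.
        return ReasoningRequest(model, {})
    if model.startswith("gemini-3"):
        # Gemini 3 Pro: Deep Think behaves like a discrete gear shift.
        return ReasoningRequest(model, {"deep_think": effort == "high"})
    if model.startswith("claude-opus-4.5"):
        # Claude Opus 4.5: a graded effort setting the caller controls.
        return ReasoningRequest(model, {"effort": effort})
    raise ValueError(f"unknown model family: {model}")
```

The design difference is visible in the shape of the request: one family ignores the caller's preference entirely, one exposes a binary switch, and one accepts a graded value.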

Each choice predicts the future that its creator is trying to summon. OpenAI dreams of speed and mass adoption. Google imagines a system that sees all media as one flow. Anthropic wants long, reliable thinking.

The coding story makes the differences sharp. 

GPT-5.1 has the Codex-Max system and its compaction trick. Once a long session fills up with logs, errors and half-attempts, older models lose their grip on clarity. Codex-Max, however, pulls all that noise into a compressed memory while keeping the meaning alive. This helps it stay sharp through a full day of work.
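The compaction idea can be sketched in a few lines: keep the most recent turns verbatim and fold everything older into one compressed memory string. This is a minimal sketch of the general technique, not Codex-Max's actual implementation; `summarize` is a hypothetical callable that would, in practice, be a model call.

```python
def compact(transcript, summarize, max_tokens=2000, keep_recent=4):
    """Keep recent turns verbatim; fold older turns into a running summary.

    `summarize` is a hypothetical stand-in for a model call that
    compresses a list of turns into one short memory string.
    """
    def tokens(turn):
        return len(turn.split())  # crude stand-in for a real tokenizer

    total = sum(tokens(t) for t in transcript)
    if total <= max_tokens or len(transcript) <= keep_recent:
        return transcript  # still fits in the window; nothing to compact
    old, recent = transcript[:-keep_recent], transcript[-keep_recent:]
    memory = summarize(old)  # compress the noisy history, keep the meaning
    return [f"[compacted memory] {memory}"] + recent
```

Run repeatedly over a long session, a loop like this keeps the context window bounded while the gist of earlier logs, errors and half-attempts survives as a single summary turn.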

Gemini 3 Pro tries something very different. Google made it treat text, code, audio, video and images as a single stream. No small plug-in modules. No extra adapters. This design gives it a smooth sense of flow when reading long documents or videos. 

Claude Opus 4.5 focuses on something humbler. Earlier models sometimes forgot why they made a choice a few steps earlier. Opus 4.5 keeps that entire chain of reasons intact. It also has the rare ability to zoom into tiny parts of a screen without losing resolution. This lets it notice minor details in interfaces or papers that most models often miss.



AIM Network Deep Dive >> While the West figures this out, China has different plans. DeepSeek has released DeepSeekMath-V2, the world’s first open-weight AI model to reach IMO 2025 gold-medal level (five out of six problems).


Powered by a generator-verifier-meta-verifier self-checking loop, the model shows that open, self-verifying AI can now challenge the closed-door giants of the US. Watch the video below to learn more.
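The generator-verifier-meta-verifier loop can be sketched as three cooperating components: one proposes a proof, one critiques it, and a third double-checks the critic before anything is accepted. This is a structural sketch under stated assumptions, not DeepSeek's actual training or inference code; all three callables are hypothetical stand-ins for model calls.

```python
def prove_with_self_check(problem, generate, verify, meta_verify, max_rounds=5):
    """Sketch of a generator-verifier-meta-verifier loop.

    - generate(problem, feedback): proposes a proof attempt
    - verify(proof): critiques it, returns (ok, feedback)
    - meta_verify(proof, feedback): double-checks the verifier's verdict
    All three are hypothetical stand-ins for model calls.
    """
    feedback = None
    for _ in range(max_rounds):
        proof = generate(problem, feedback)      # propose, using prior critique
        ok, feedback = verify(proof)             # first layer of checking
        if ok and meta_verify(proof, feedback):  # accept only if both agree
            return proof
    return None  # no attempt survived both layers of checking
```

The key design choice is that the verifier's feedback flows back into the generator, so each round repairs the previous attempt rather than starting from scratch.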


Real engineering work looks very different from leaderboard hype. 

Claude Opus 4.5 ends up as the clear winner for messy, real-world GitHub issues. GPT-5.1-Codex-Max follows very close behind. Gemini still excels at pure puzzles but loses its composure when the repository is old and broken.

The price story rearranges the ranking again and is critical for enterprise use cases. 

  • OpenAI has pushed prices to a point that feels almost like a play for market share. GPT-5.1 is cheap to run for large workloads. 
  • Gemini is costly per token, but becomes good value when the document is extremely long.
  • Claude is the most expensive, but gives the cleanest control through the Effort setting.
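For enterprise workloads, the comparison ultimately comes down to a per-job cost calculation over input and output tokens. The prices below are placeholder numbers for illustration only, NOT the vendors' actual rates; substitute current published pricing before drawing conclusions.

```python
# Placeholder per-million-token prices for illustration only.
# These are NOT the vendors' actual rates.
PRICES = {
    "gpt":    {"in": 1.0, "out": 8.0},
    "gemini": {"in": 2.0, "out": 12.0},
    "claude": {"in": 5.0, "out": 25.0},
}

def job_cost(model, in_tokens, out_tokens):
    """Dollar cost of one job, given input/output token counts."""
    p = PRICES[model]
    return (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000
```

A calculation like this makes the trade-off concrete: a model that is cheap per token can still lose on a long-document workload where input tokens dominate, and vice versa.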



How people actually use these models completes the picture. No one wants to rely on a single model anymore; many combine them. GPT-5.1 becomes the dependable worker that lifts the load, Gemini the deep reader, and Claude the careful, precise executor.

New Attention is All You Need

While this was happening, Google released a new paper that rethinks attention. The original attention mechanism from 2017 has shaped every transformer since then. The new work argues that the classic design, whose cost grows quadratically with sequence length, hits limits when context windows stretch into the million-token range. 

The paper tries to reduce this cost while keeping the clarity of long reasoning. It lands at a time when all three frontier models are pushing their own limits on long context. This makes the paper feel less like academic work and more like a signal.
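The scale of the problem is easiest to see with a back-of-the-envelope FLOP model: standard attention forms an n-by-n score matrix, while linear-attention variants scale roughly with n alone. This is a rough cost model for intuition, not the paper's actual method or its exact complexity claims.

```python
def attention_flops(n, d, linear=False):
    """Rough FLOP estimate for one attention layer over n tokens of width d.

    Back-of-the-envelope model only: standard attention is quadratic in n
    because every token attends to every other token; linear variants pay
    a per-token cost that does not grow with the number of token pairs.
    """
    if linear:
        return 2 * n * d * d  # per-token cost, independent of pair count
    return 2 * n * n * d      # n x n score matrix dominates
```

At a million tokens with width 128, the quadratic estimate is roughly n/d, or about 7,800 times the linear one, which is why million-token windows force a redesign rather than just more hardware.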

To understand more about the paper, check out this deep dive from AIM Network, where we highlight the most important parts of the paper.


These model releases and Google’s new paper align with Sutskever and LeCun, both of whom have questioned scaling laws in their latest announcements.

SSI has raised almost $3 billion without a single product. Sutskever now says the straight-shot idea may need real-world exposure before it reaches the end goal. The shift in his thinking makes SSI’s roadmap feel more open than before.

“Scaling the current thing will keep leading to improvements. In particular, it won’t stall. But something important will continue to be missing,” Sutskever wrote on X. 

This is something that Francois Chollet, the creator of Keras and founder of Ndea, has been saying for the last three years. He said SSI has enough compute to test new ideas and prove that they work. He suggested that if you explore a different path, you might not need maximal scale to find the next big idea. 

Surprisingly, this lands close to what LeCun has been saying all along. 

He is leaving Meta after 12 years to launch a startup focused on Advanced Machine Intelligence, the research programme he has long discussed.

He wants to build systems that understand the world, remember things for long stretches, reason and plan. Meta will stay a partner, but LeCun wants the initiative to have an impact beyond the company’s walls. Details about the startup will come later.

Meanwhile, Meta is surprisingly quiet. No one really knows why.

Regardless, these developments suggest that scaling alone is no longer the answer. The three model releases from OpenAI, Google and Anthropic also hint that new methods are needed to make models better. 




AIM, in collaboration with Snowflake, is excited to present an inspiring and future-focused webinar, ‘AI Leadership & Innovation: Are You Ready for the Next Tech Wave?’ 


The thought-provoking conversation will feature two trailblazing technology leaders, Sowmya V Kumaran, director of engineering and AI infrastructure management at Cisco, and Kanika Kapoor, senior VP of data management and analytics leader at NatWest.

Don’t miss this opportunity to learn from industry pioneers and position yourself for the next wave of innovation. Register now.




SAS Academy for Data & AI Excellence is Training India’s Workforce for the GenAI Era

SAS, a global leader in analytics, is addressing the shortage of skilled engineers in India with SAS Academy for Data & AI Excellence, a curriculum designed to help learners move from foundational analytics to advanced AI and automation, guided by SAS-certified instructors through flexible weekend online programmes. Click here to learn more.
