Need smarter insights in your inbox? Join our weekly newsletters to get solely what issues to enterprise AI, knowledge, and safety leaders. Subscribe Now
A complete new examine has revealed that open-source synthetic intelligence fashions devour considerably extra computing assets than their closed-source opponents when performing equivalent duties, doubtlessly undermining their value benefits and reshaping how enterprises consider AI deployment methods.
The analysis, performed by AI agency Nous Analysis, discovered that open-weight fashions use between 1.5 to 4 instances extra tokens — the fundamental models of AI computation — than closed fashions like these from OpenAI and Anthropic. For easy data questions, the hole widened dramatically, with some open fashions utilizing as much as 10 instances extra tokens.
Measuring Considering Effectivity in Reasoning Fashions: The Lacking Benchmarkhttps://t.co/b1e1rJx6vZ
We measured token utilization throughout reasoning fashions: open fashions output 1.5-4x extra tokens than closed fashions on equivalent duties, however with big variance relying on activity sort (as much as… pic.twitter.com/LY1083won8
— Nous Analysis (@NousResearch) August 14, 2025
“Open weight fashions use 1.5–4× extra tokens than closed ones (as much as 10× for easy data questions), making them generally costlier per question regardless of decrease per‑token prices,” the researchers wrote of their report revealed Wednesday.
The findings problem a prevailing assumption within the AI trade that open-source fashions provide clear financial benefits over proprietary alternate options. Whereas open-source fashions sometimes value much less per token to run, the examine suggests this benefit will be “simply offset in the event that they require extra tokens to cause a few given downside.”
AI Scaling Hits Its Limits
Energy caps, rising token prices, and inference delays are reshaping enterprise AI. Be a part of our unique salon to find how prime groups are:
- Turning vitality right into a strategic benefit
- Architecting environment friendly inference for actual throughput good points
- Unlocking aggressive ROI with sustainable AI techniques
Safe your spot to remain forward: https://bit.ly/4mwGngO
The true value of AI: Why ‘cheaper’ fashions might break your funds
The analysis examined 19 totally different AI fashions throughout three classes of duties: primary data questions, mathematical issues, and logic puzzles. The crew measured “token effectivity” — what number of computational models fashions use relative to the complexity of their options—a metric that has acquired little systematic examine regardless of its important value implications.
“Token effectivity is a crucial metric for a number of sensible causes,” the researchers famous. “Whereas internet hosting open weight fashions could also be cheaper, this value benefit might be simply offset in the event that they require extra tokens to cause a few given downside.”

The inefficiency is especially pronounced for Giant Reasoning Fashions (LRMs), which use prolonged “chains of thought” to unravel advanced issues. These fashions, designed to assume via issues step-by-step, can devour hundreds of tokens pondering easy questions that ought to require minimal computation.
For primary data questions like “What’s the capital of Australia?” the examine discovered that reasoning fashions spend “a whole lot of tokens pondering easy data questions” that might be answered in a single phrase.
Which AI fashions truly ship bang to your buck
The analysis revealed stark variations between mannequin suppliers. OpenAI’s fashions, significantly its o4-mini and newly launched open-source gpt-oss variants, demonstrated distinctive token effectivity, particularly for mathematical issues. The examine discovered OpenAI fashions “stand out for excessive token effectivity in math issues,” utilizing as much as 3 times fewer tokens than different business fashions.
Amongst open-source choices, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “essentially the most token environment friendly open weight mannequin throughout all domains,” whereas newer fashions from firms like Magistral confirmed “exceptionally excessive token utilization” as outliers.
The effectivity hole diversified considerably by activity sort. Whereas open fashions used roughly twice as many tokens for mathematical and logic issues, the distinction ballooned for easy data questions the place environment friendly reasoning needs to be pointless.

What enterprise leaders have to learn about AI computing prices
The findings have fast implications for enterprise AI adoption, the place computing prices can scale quickly with utilization. Firms evaluating AI fashions usually concentrate on accuracy benchmarks and per-token pricing, however might overlook the entire computational necessities for real-world duties.
“The higher token effectivity of closed weight fashions usually compensates for the upper API pricing of these fashions,” the researchers discovered when analyzing complete inference prices.
The examine additionally revealed that closed-source mannequin suppliers look like actively optimizing for effectivity. “Closed weight fashions have been iteratively optimized to make use of fewer tokens to scale back inference value,” whereas open-source fashions have “elevated their token utilization for newer variations, presumably reflecting a precedence towards higher reasoning efficiency.”

How researchers cracked the code on AI effectivity measurement
The analysis crew confronted distinctive challenges in measuring effectivity throughout totally different mannequin architectures. Many closed-source fashions don’t reveal their uncooked reasoning processes, as a substitute offering compressed summaries of their inside computations to stop opponents from copying their strategies.
To handle this, researchers used completion tokens — the entire computational models billed for every question — as a proxy for reasoning effort. They found that “most up-to-date closed supply fashions is not going to share their uncooked reasoning traces” and as a substitute “use smaller language fashions to transcribe the chain of thought into summaries or compressed representations.”
The examine’s methodology included testing with modified variations of well-known issues to reduce the affect of memorized options, corresponding to altering variables in mathematical competitors issues from the American Invitational Arithmetic Examination (AIME).

The way forward for AI effectivity: What’s coming subsequent
The researchers recommend that token effectivity ought to develop into a major optimization goal alongside accuracy for future mannequin improvement. “A extra densified CoT may even enable for extra environment friendly context utilization and will counter context degradation throughout difficult reasoning duties,” they wrote.
The discharge of OpenAI’s open-source gpt-oss fashions, which display state-of-the-art effectivity with “freely accessible CoT,” might function a reference level for optimizing different open-source fashions.
The entire analysis dataset and analysis code are out there on GitHub, permitting different researchers to validate and lengthen the findings. Because the AI trade races towards extra highly effective reasoning capabilities, this examine means that the true competitors will not be about who can construct the neatest AI — however who can construct essentially the most environment friendly one.
In spite of everything, in a world the place each token counts, essentially the most wasteful fashions might discover themselves priced out of the market, no matter how nicely they’ll assume.