

Anthropic expands Claude Sonnet 4’s context window to 1M tokens
With this larger context window, Claude can process codebases with 75,000+ lines of code in a single request. This allows it to better understand project architecture and cross-file dependencies, and to make suggestions that fit the complete system design.
Longer context windows are now in beta on the Anthropic API and Amazon Bedrock, and will soon be available in Google Cloud’s Vertex AI.
For prompts over 200K tokens, pricing will increase to $6 / million tokens (MTok) for input and $22.50 / MTok for output. The pricing for prompts under 200K tokens will be $3 / MTok for input and $15 / MTok for output.
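To make the two pricing tiers concrete, here is a minimal Python sketch that estimates the cost of a single request from the rates quoted above. The function name `estimate_cost` is hypothetical, and the sketch rests on two assumptions that the announcement does not spell out: that the tier is selected by the size of the prompt (input) alone, with the higher rates then applying to both input and output of that request, and that a prompt of exactly 200,000 tokens is billed at the standard rate.

```python
from decimal import Decimal

# Rates in USD per million tokens (MTok), taken from the figures quoted above.
STANDARD_RATES = {"input": Decimal("3.00"), "output": Decimal("15.00")}
LONG_CONTEXT_RATES = {"input": Decimal("6.00"), "output": Decimal("22.50")}

# Assumption: the tier is chosen by prompt (input) size, and a prompt of
# exactly 200,000 tokens is still billed at the standard rate.
LONG_CONTEXT_THRESHOLD = 200_000
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    # Pick the rate table based on how large the prompt is.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT_RATES
    else:
        rates = STANDARD_RATES

    # Cost = tokens * (USD per MTok) / 1,000,000, summed over input and output.
    total = (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    print(estimate_cost(100_000, 2_000))  # standard tier
    print(estimate_cost(500_000, 2_000))  # long-context tier
```

Under these assumptions, a 100,000-token prompt with 2,000 output tokens comes to $0.33, while a 500,000-token prompt with the same output comes to $3.045. Actual billing may differ, so the provider's pricing page remains the authority.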
The company also extended its learning mode, designed for students, into Claude.ai and Claude Code. Learning mode asks users questions to guide them through concepts instead of providing immediate answers, with the aim of promoting critical thinking about problems.
OpenAI adds GPT-4o as a legacy model in ChatGPT
With this update, paid users will now be able to select GPT-4o when using ChatGPT, along with other models like o3, GPT-4.1, and GPT-5 Thinking mini.
The model picker for GPT-5 also now includes Auto, Fast, and Thinking modes. Fast prioritizes giving the quickest answers, Thinking prioritizes giving deeper answers that take longer to think through, and Auto chooses between the two.
The company also increased the message limit for Plus and Team users to 3,000 per week on GPT-5 Thinking.
Google releases Gemma 3 270M
This new model is “designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in,” according to Google.
It’s ideal in situations where there’s a high-volume, well-defined task; speed and cost matter; user privacy needs to be protected; or there’s a need for a fleet of specialized task models.
Both pretrained and instruction-tuned versions of the model are available for download from Hugging Face, Ollama, Kaggle, LM Studio, and Docker. Alternatively, the models can be tried out in Vertex AI.
NVIDIA releases latest models in Llama Nemotron family
Llama Nemotron is a family of reasoning models, and the latest updates include a new hybrid model architecture, compact quantized models, and a configurable thinking budget to give developers more control over token generation.
“This combination lets the models reason more deeply and respond faster, without needing more time or computing power. This means better results at a lower cost,” the company wrote in an announcement.
Google’s coding agent Jules gets critique functionality
Google is enhancing its AI coding agent, Jules, with new functionality that reviews and critiques code while Jules is still working on it.
“In a world of rapid iteration, the critic moves the review to earlier in the process and into the act of generation itself. This means the code you review has already been interrogated, refined, and stress-tested … Great developers don’t just write code, they question it. And now, so does Jules,” Google wrote in a blog post.
According to the company, the coding critic is like a peer reviewer who is familiar with code quality principles and is “unafraid to point out when you’ve reinvented a risky wheel.”
GitHub to be folded into Microsoft’s CoreAI org
GitHub’s CEO Thomas Dohmke has announced his plans to leave the company at the end of the year.
In a memo to employees, he said that Microsoft doesn’t plan to replace him; rather, GitHub and its leadership team will now operate under Microsoft’s CoreAI organization, a group within the company focused on developing AI-powered tools, including GitHub Copilot.
“Today, GitHub Copilot is the leader of the most successful and thriving market in the age of AI, with over 20 million users and counting,” he wrote. “We did this by innovating ahead of the curve and showing grit and determination when challenged by the disruptors in our space. In just the last year, GitHub Copilot became the first multi-model solution at Microsoft, in partnership with Anthropic, Google, and OpenAI. We enabled Copilot Free for millions and introduced the synchronous agent mode in VS Code as well as the asynchronous coding agent native to GitHub.”
Sentry launches MCP monitoring tool
Application monitoring company Sentry is making it easier to gain visibility into MCP servers with the launch of a new monitoring tool.
With MCP monitoring, developers can understand things like which clients are experiencing errors, which tools are most used, or which tools are running slowly. They can also correlate errors with events like traffic spikes or new release deployments, or determine if errors are only happening on one type of transport.
According to Cody De Arkland, head of developer experience at Sentry, when Sentry launched its own MCP server, it was getting over 30 million requests per month. He said that at that scale, it’s inevitable that errors will occur, and existing monitoring tools were struggling with MCP servers.
bitHuman launches SDK for creating AI avatars
AI company bitHuman has announced a visual SDK for creating avatars to be used as chat agents, instructors, virtual coaches, companions, and experts in different fields.
According to the company, the SDK allows avatars to be created on Arm-based and x86 systems without a GPU. The avatars have a small footprint and can be run online or offline on devices like Chromebooks, Mac Minis, and Raspberry Pis.
Because of their small footprint, these characters can be brought to a range of environments, including classrooms, kiosks, mobile apps, or edge devices.
Read last week’s updates here: This week in AI dev tools: GPT-5, Claude Opus 4.1, and more (August 8, 2025)