

Anthropic begins testing a Claude extension for Chrome
The extension will allow Claude to take action on websites on behalf of the user. “We’ve spent recent months connecting Claude to your calendar, documents, and many other pieces of software. The next logical step is letting Claude work directly in your browser,” the company says.
The company is starting off with a small pilot of 1,000 Max plan users, and will gradually expand the program to more people if the pilot goes well.
According to Anthropic, one of the big safety challenges with agents that use the browser is prompt injection attacks, and some of the steps the company has taken to defend against them include providing site-level permissions and requiring action confirmations. This pilot will test how well these defenses hold up in real-world conditions.
Google integrates Gemini CLI into Zed code editor
Google announced that it has brought the Gemini CLI to the open source code editor Zed. The new integration will allow Zed users to generate and refactor code in the editor, get instant answers about code or error messages, and chat naturally in the terminal.
Developers will be able to follow along live with the Gemini agent as it makes changes. Once the agent has finished working, Zed will display the changes in a review interface that shows a clear diff for each edit, which can be reviewed, accepted, or modified, providing the same level of control as a code review.
Users will also be able to provide context beyond the codebase by pointing the agent to external sources like a URL with documentation or an API spec.
Microsoft packs Visual Studio August update with smarter AI features
Microsoft has released the August update for Visual Studio 2022, adding several features related to AI-assisted development.
The company announced that GPT-5 is now integrated into the IDE, and support for MCP is generally available as well. MCP support allows developers to authenticate with any OAuth provider directly from the IDE, perform one-click installation of MCP servers, and manage MCP access from GitHub policy settings.
Copilot Chat was updated with the ability to surface relevant code snippets more reliably, using improved semantic code search to determine when queries should trigger a code lookup. Developers can now connect models from OpenAI, Google, and Anthropic to Visual Studio Chat as well.
Agent Mode in Gemini Code Assist now available in VS Code and IntelliJ
This mode was launched last month in the Insiders Channel for VS Code to expand the capabilities of Code Assist beyond prompts and responses, supporting actions like multiple file edits, full project context, built-in tools, and integration with ecosystem tools.
Since being added to the Insiders Channel, several new features have been added, including the ability to edit code changes using Gemini’s inline diff, user-friendly quota updates, real-time shell command output, and state preservation between IDE restarts.
Separately, the company also announced new agentic capabilities in its AI Mode in Search, such as the ability to make dinner reservations based on factors like party size, date, time, location, and preferred type of food. U.S. users opted into the AI Mode experiment in Labs will also now see results that are more specific to their own preferences and interests. Google also announced that AI Mode is now available in over 180 new countries.
GitHub’s coding agent can now be launched from anywhere on the platform using new Agents panel
GitHub has added a new panel to its UI that allows developers to invoke the Copilot coding agent from anywhere on the site.
From the panel, developers can assign background tasks, monitor running tasks, or review pull requests. The panel is a lightweight overlay on GitHub.com, but developers can also open it in full-screen mode by clicking “View all tasks.”
The agent can be launched from a single prompt, like “Add integration tests for LoginController” or “Fix #877 using pull request #855 as an example.” It can also run multiple tasks concurrently, such as “Add unit test coverage for utils.go” and “Add unit test coverage for helpers.go.”
Anthropic adds Claude Code to Enterprise, Team plans
With this change, both Claude and Claude Code will be available under a single subscription. Admins will be able to assign standard or premium seats to users based on their individual roles. By default, seats include enough usage for a typical workday, but extra usage can be added during periods of heavy use. Admins can also set a maximum limit for additional usage.
Other new admin settings include a usage analytics dashboard and the ability to deploy and enforce settings, such as tool permissions, file access restrictions, and MCP server configurations.
Microsoft adds Copilot-powered debugging features for .NET in Visual Studio
Copilot can now suggest appropriate locations for breakpoints and tracepoints based on current context. Similarly, it can troubleshoot non-binding breakpoints and walk developers through the potential cause, such as mismatched symbols or incorrect build configurations.
Another new feature is the ability to generate LINQ queries over large collections in the IEnumerable Visualizer, which renders data in a sortable, filterable tabular view. For example, a developer could ask for a LINQ query that will surface problematic rows causing a filter issue. Additionally, developers can hover over any LINQ statement and get an explanation from Copilot of what it is doing, evaluate it in context, and highlight potential inefficiencies.
Copilot can also now help developers deal with exceptions by summarizing the error, identifying potential causes, and offering targeted code fix suggestions.
Groundcover launches observability solution for LLMs and agents
The eBPF-based observability provider groundcover announced an observability solution specifically for monitoring LLMs and agents.
It captures every interaction with LLM providers like OpenAI and Anthropic, including prompts, completions, latency, token usage, errors, and reasoning paths.
Because groundcover uses eBPF, it operates at the infrastructure layer and can achieve full visibility into every request. This allows it to do things like track the reasoning path of failed outputs, investigate prompt drift, or pinpoint when a tool call introduces latency.
IBM and NASA release open-source AI model for predicting solar weather
The model, Surya, analyzes high-resolution solar observation data to predict how solar activity affects Earth. According to IBM, solar storms can damage satellites, impact airline travel, and disrupt GPS navigation, which can negatively impact industries like agriculture and disrupt food production.
The solar images that Surya was trained on are 10x larger than typical AI training data, so the team had to create a multi-architecture system to handle them.
The model was released on Hugging Face.
Preview of NuGet MCP Server now available
Last month, Microsoft announced support for building MCP servers with .NET and then publishing them to NuGet. Now, the company is announcing an official NuGet MCP Server to integrate NuGet package information and management tools into AI development workflows.
“Because the NuGet package ecosystem is always evolving, large language models (LLMs) get out-of-date over time and there is a need for something that assists them in getting information in realtime. The NuGet MCP server provides LLMs with information about new and updated packages that have been published after the models, as well as tools to complete package management tasks,” Jeff Kluge, principal software engineer at Microsoft, wrote in a blog post.
Opsera’s Codeglide.ai lets developers easily turn legacy APIs into MCP servers
Codeglide.ai, a subsidiary of the DevOps company Opsera, is launching its MCP server lifecycle platform that will enable developers to turn APIs into MCP servers.
The solution continuously monitors API changes and updates the MCP servers accordingly. It also provides context-aware, secure, and stateful AI access without the developer needing to write custom code.
According to Opsera, large enterprises may maintain 2,000 to 8,000 APIs, 60% of which are legacy APIs, and MCP provides a way for AI to efficiently interact with those APIs. The company says that this new offering can reduce AI integration time by 97% and costs by 90%.
Confluent announces Streaming Agents
Streaming Agents is a new feature in Confluent Cloud for Apache Flink that brings agentic AI into data stream processing pipelines. It allows users to build, deploy, and orchestrate agents that can act on real-time data.
Key features include tool calling via MCP, the ability to connect to models or databases using Flink, and the ability to enrich streaming data with non-Kafka data sources, like relational databases and REST APIs.
“Even your smartest AI agents are flying blind if they don’t have fresh business context,” said Shaun Clowes, chief product officer at Confluent. “Streaming Agents simplifies the messy work of integrating the tools and data that create real intelligence, giving organizations a strong foundation to deploy AI agents that drive meaningful change across the business.”
Anthropic expands Claude Sonnet 4’s context window to 1M tokens
With this larger context window, Claude can process codebases with 75,000+ lines of code in a single request. This allows it to better understand project architecture and cross-file dependencies, and to make suggestions that fit with the whole system design.
Longer context windows are now in beta on the Anthropic API and Amazon Bedrock, and will soon be available in Google Cloud’s Vertex AI.
For prompts over 200K tokens, pricing will increase to $6 / million tokens (MTok) for input and $22.50 / MTok for output. Pricing for requests under 200K tokens will be $3 / MTok for input and $15 / MTok for output.
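As a rough illustration of the tiered rates above, the cost of a request can be sketched as follows. This is a minimal estimate only: the assumption that the input-token count alone selects the tier for the whole request is ours, not a statement of Anthropic's exact billing rules.

```python
# Sketch of Claude Sonnet 4 long-context pricing, using the rates quoted
# above. Assumption (ours): the input-token count picks the tier.

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated request cost in dollars."""
    if input_tokens > 200_000:
        input_rate, output_rate = 6.00, 22.50   # $/MTok above 200K input tokens
    else:
        input_rate, output_rate = 3.00, 15.00   # $/MTok at or under 200K
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A full 1M-token prompt with a 2K-token reply lands on the premium tier:
print(estimate_cost(1_000_000, 2_000))  # 6.045
# The same reply with a 150K-token prompt stays on the base tier:
print(estimate_cost(150_000, 2_000))    # 0.48
```

The jump is notable: the million-token prompt costs roughly 12x more than the 150K one, since both the rate and the token count increase.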
The company also extended its learning mode, designed for students, to Claude.ai and Claude Code. Learning mode asks users questions to guide them through concepts instead of providing immediate answers, promoting critical thinking about problems.
OpenAI adds GPT-4o as a legacy model in ChatGPT
With this update, paid users will now be able to select GPT-4o when using ChatGPT, along with other models like o3, GPT-4.1, and GPT-5 Thinking mini.
The model picker for GPT-5 also now includes Auto, Fast, and Thinking modes. Fast prioritizes giving the quickest answers, Thinking prioritizes giving deeper answers that take longer to think through, and Auto chooses between the two.
The company also increased the message limit for Plus and Team users to 3,000 per week on GPT-5 Thinking.
Google releases Gemma 3 270M
This new model is “designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in,” according to Google.
It is ideal in situations where there is a high-volume, well-defined task; speed and cost matter; user privacy needs to be protected; or there is a need for a fleet of specialized task models.
Both pretrained and instruction-tuned versions of the model are available for download from Hugging Face, Ollama, Kaggle, LM Studio, and Docker. Alternatively, the models can be tried out in Vertex AI.
NVIDIA releases latest models in Llama Nemotron family
Llama Nemotron is a family of reasoning models, and the latest updates include a new hybrid model architecture, compact quantized models, and a configurable thinking budget to give developers more control over token generation.
“This combination lets the models reason more deeply and respond faster, without needing extra time or computing power. This means better results at a lower cost,” the company wrote in an announcement.
Google’s coding agent Jules gets critique functionality
Google is enhancing its AI coding agent, Jules, with new functionality that reviews and critiques code while Jules is still working on it.
“In a world of rapid iteration, the critic moves the review earlier in the process and into the act of generation itself. This means the code you review has already been interrogated, refined, and stress-tested … Great developers don’t just write code, they question it. And now, so does Jules,” Google wrote in a blog post.
According to the company, the coding critic is like a peer reviewer who is familiar with code quality principles and is “unafraid to point out when you’ve reinvented a bad wheel.”
GitHub to be folded into Microsoft’s CoreAI org
GitHub’s CEO Thomas Dohmke has announced his plans to leave the company at the end of the year.
In a memo to employees, he said that Microsoft doesn’t plan to replace him; rather, GitHub and its leadership team will now operate under Microsoft’s CoreAI organization, a group within the company focused on developing AI-powered tools, including GitHub Copilot.
“Today, GitHub Copilot is the leader of the most successful and thriving market in the age of AI, with over 20 million users and counting,” he wrote. “We did this by innovating ahead of the curve and showing grit and determination when challenged by the disruptors in our space. In just the last year, GitHub Copilot became the first multi-model solution at Microsoft, in partnership with Anthropic, Google, and OpenAI. We enabled Copilot Free for millions and launched the synchronous agent mode in VS Code as well as the asynchronous coding agent native to GitHub.”
Sentry launches MCP monitoring tool
Application monitoring company Sentry is making it easier to gain visibility into MCP servers with the launch of a new monitoring tool.
With MCP monitoring, developers can understand things like which clients are experiencing errors, which tools are most used, or which tools are running slow. They can also correlate errors with events like traffic spikes or new release deployments, or figure out if errors are only happening on one type of transport.
According to Cody De Arkland, head of developer experience at Sentry, when Sentry launched its own MCP server, it was getting over 30 million requests per month. He said that at that scale, it’s inevitable that errors will occur, and existing monitoring tools were struggling with MCP servers.
bitHuman launches SDK for creating AI avatars
AI company bitHuman has announced a visual SDK for creating avatars for use as chat agents, instructors, virtual coaches, companions, and experts in different fields.
According to the company, the SDK allows avatars to be created on Arm-based and x86 systems without a GPU. The avatars have a small footprint and can be run online or offline on devices like Chromebooks, Mac Minis, and Raspberry Pis.
Because of their small footprint, these characters can be brought to a wide range of environments, including classrooms, kiosks, mobile apps, or edge devices.
OpenAI launches GPT-5
OpenAI announced the availability of GPT-5, which it says is “smarter across the board” compared to previous models.
Specifically for coding, GPT-5 achieved significant improvement in complex front-end generation and debugging larger repositories. Early testers said that it made better design choices in terms of spacing, typography, and white space, according to the company.
“We think you’ll love using GPT-5 much more than any previous AI,” CEO Sam Altman said during the livestream. “It’s useful. It’s smart. It’s fast. It’s intuitive.”
Anthropic releases Claude Opus 4.1
This latest update improves the model’s research and data analysis skills, and achieves 74.5% on SWE-bench Verified (compared to 72.5% for Opus 4).
It is available to paid Claude users, in Claude Code, and on Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.
The company plans to release larger improvements across its models in the coming weeks as well.
AWS introduces Automated Reasoning checks to reduce AI hallucinations
Automated Reasoning checks are part of Amazon Bedrock Guardrails, and validate the accuracy of AI-generated content against domain knowledge. According to AWS, this feature provides 99% verification accuracy.
This was first introduced as a preview at AWS re:Invent, and with this general availability release, several new features are being added, including support for large documents in a single build, simplified policy validation, automated scenario generation, enhanced policy feedback, and customizable validation settings.
Google adds Gemini CLI to GitHub Actions
This new offering is designed to act as an agent for routine coding tasks. At launch, it includes three workflows: intelligent issue triage, pull request reviews, and the ability to mention @gemini-cli in any issue or pull request to delegate tasks.
It is available in beta, and Google is offering free-of-charge quotas for Google AI Studio. It is also supported in Vertex AI and the Standard and Enterprise tiers of Gemini Code Assist.
OpenAI announces two open weight reasoning models
OpenAI is joining the open weight model game with the launch of gpt-oss-120b and gpt-oss-20b.
Gpt-oss-120b is optimized for production, high-reasoning use cases, and gpt-oss-20b is designed for lower latency or local use cases.
According to the company, these open models are comparable to its closed models in terms of performance and capability, but at a much lower cost. For example, gpt-oss-120b running on an 80 GB GPU achieved similar performance to o4-mini on core reasoning benchmarks, while gpt-oss-20b running on an edge device with 16 GB of memory was comparable to o3-mini on several common benchmarks.
Google DeepMind launches Genie 3
Genie 3 is a frontier model for generating real-world environments. It can model physical properties of the world, like water, lighting, and environmental actions.
Users can also use prompts to change the generated world, for example to add new objects and characters or change weather conditions.
According to DeepMind, this research is important because it can enable AI agents to be trained in a variety of simulated environments.