
(New Africa/Shutterstock)
Since the earliest days of big data, data engineers have been the unsung heroes doing the dirty work of moving, transforming, and prepping data so highly paid data scientists and machine learning engineers can do their thing and get the glory. As the agentic AI era dawns on us, it opens up a host of new data engineering opportunities, as well as potentially catastrophic pitfalls.
Frank Weigel, the former Google and Microsoft executive who was recently hired by Matillion to be its new chief product officer, openly wondered to a reporter recently whether the agentic AI era was on a glideslope for disaster.
“Basically, we see there’s a huge problem coming for data engineering teams,” Weigel said in an interview during the recent Snowflake Summit. “I’m not sure everybody is fully aware of it.”
Here’s the issue, as Weigel explained it:
The explosion of source data is one side of the problem. Data engineers who are accustomed to working with structured data are now being asked to manage, prep, and transform unstructured data, which is more difficult to work with, but which ultimately is the fuel for most AI (i.e. words and pictures processed by neural networks).
Data engineers are already overworked. Weigel cited a study indicating that 80% of data engineering teams are already overloaded. When you add AI and unstructured data to the mix, the workload problem becomes even more acute.
Agentic AI offers a potential solution. It’s natural that overworked data engineering teams will turn to AI for help. There’s a bevy of vendors building copilots and swarms of AI agents that, ostensibly, can build, deploy, monitor, and fix data pipelines when they break. We’re already seeing agentic AI have real impacts on data engineering teams, as well as on the downstream data analysts who ultimately are the ones requesting the data in the first place.
But according to Weigel, if we implement agentic AI for data engineering the wrong way, we’re potentially setting ourselves a trap that will be tough to get out of.
The problem he foresees stems from AI agents that access source data on their own. If an analyst can kick off an agentic AI workflow that ultimately involves the AI agent writing SQL to obtain a piece of data from some upstream system, what happens when something goes wrong with the data pipeline? AI agents may be able to fix basic problems, but what about serious ones that demand human attention?
“You will have autonomous AI agents that run entire business functions,” Weigel said. “But equally, they start to have a huge need for data. And so if the data team already was overloaded before, well, it’s now going to be like looking down the abyss and saying ‘How on earth can we do anything? How am I going to have a human data engineer answer a question from an AI agent?’”
Once human data engineers are out of the loop, bad things can start happening, Weigel said. They potentially face a situation where the volume of data requests, which originally were served by human data engineers but now are being served by AI agents, is beyond their capability to keep up with.
The accuracy of data will also suffer, he said. If every AI agent writes its own SQL and pulls data straight out of its source, the odds of getting the wrong answer go up considerably.
“We’re now back in the dark ages, where we were 10 years ago [when we wondered] why we need data warehouses,” he said. “I know that if person A, B, and C ask a question, and previously they wrote their own queries, they got different results. Right now, we ask the same agent the same question, and because they’re non-deterministic, they will actually create different queries every time you ask it. And as a result, you now have the different business functions all getting different answers, insisting of course that theirs is right.
“You have lost all of the governance and control that was the reason you established a central data team,” Weigel continued. “And for me, that’s the angle that I think a lot of data orgs haven’t really thought about. When I get a demo of an AI agent, they never talk about that. They just have the agent access the data directly. And sure, it can. But the problem is, it shouldn’t really.”
The answer to this dilemma, according to Weigel, is twofold. First, it’s important to keep the data warehouse, as it serves as a repository for data that has been vetted, checked, and standardized.
It’s also essential to keep humans in the loop, according to Weigel. And to keep humans in the loop, human data engineers must somehow be prevented from becoming completely overwhelmed by the unstructured data requests and the new AI workflows. To accomplish that, he said, they essentially must become superhuman data engineers, augmented with AI.
Matillion is building its agentic AI solutions around this strategy. Instead of setting AI agents loose to write their own SQL against source data systems, Matillion is using AI agents as supporting cast members whose goal is to help the human data engineer get the work done.
This on-demand workforce of virtual data engineers is dubbed Maia, which the company announced earlier this month. The agents, which run in the Matillion Data Productivity Cloud (DPC), are able to assist data engineers with a range of tasks, including creating data connectors, building data pipelines, documenting changes, testing pipelines, and analyzing failures.
“We need to supercharge the data engineering function, and we need to enable them to match the AI capabilities,” he said. “Instead of just a copilot concept, it has become a collection of different data engineers that have different tasks. They can do different things.”
Maia acts as the lead agent that controls various sub-agents. The company has three or four such data engineering sub-agents today, Weigel said, and it will have more in the future. Maia, which is built using a set of large language models (LLMs) including Anthropic’s Claude, can even correct itself when it does something wrong.
“It’s really fascinating,” Weigel said. “When you see it work, it will break down the problem into the steps. Then it will start doing it. It will look at the data and decide whether it’s on the right track. It will roll back. ‘That wasn’t quite right.’ And so it really is like a data engineer in its task and thinking, including looking at the data. It will ask the human at certain points if it wants input.”
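The workflow Weigel describes, plan, execute, check the data, roll back, and escalate to a person when needed, maps onto a fairly standard human-in-the-loop agent loop. The sketch below is a minimal illustration of that general shape in plain Python; the function names and parameters are hypothetical assumptions, not Matillion’s Maia implementation.

```python
# A generic human-in-the-loop agent loop: plan, execute, validate, roll back,
# and escalate. Hypothetical sketch only; not Matillion's Maia implementation.
from typing import Callable, List


def run_with_oversight(
    steps: List[Callable[[], object]],     # planned pipeline steps
    validate: Callable[[object], bool],    # checks the data a step produced
    rollback: Callable[[int], None],       # undoes a step that went wrong
    ask_human: Callable[[str], bool],      # True means the human says continue
    max_retries: int = 2,
) -> bool:
    """Run each step, check its output, retry or escalate on failure."""
    for i, step in enumerate(steps):
        attempts = 0
        while True:
            result = step()
            if validate(result):
                break                      # on track, move to the next step
            rollback(i)                    # "that wasn't quite right"
            attempts += 1
            if attempts > max_retries:
                # The agent has run out of retries: keep the human in the loop.
                if not ask_human(f"Step {i} keeps failing. Continue anyway?"):
                    return False
                break
    return True
```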
Despite the potential for agentic autonomy, that’s not part of the Matillion plan, as the company sees the human engineer as a critical backstop that can’t be eliminated from the equation.
Another important backstop that could help Matillion customers avoid agentic AI pitfalls: no AI generation of SQL.
While LLMs like Claude have gotten really, really good at writing SQL, Matillion will not hand the reins over to AI for this critical component. The ETL vendor has been automatically generating SQL as part of its data pipeline solution for Snowflake, Databricks, and other cloud data warehouses for years, and it’s not about to start from scratch.
“The secret in Matillion is we’ve abstracted that layer so we’re much closer to the user intent,” Weigel said. “So the user is building that data pipeline intent with predefined building blocks that ultimately write SQL. But it’s Matillion that writes the SQL, not the user.”
This approach also avoids the problem of spaghetti SQL code that can’t be updated and modified over time, which is a risk with AI-generated code.
“We have this abstraction, this intermediate representation of these components that in turn issue SQL,” Weigel said. “And so our agent doesn’t have to generate whatever code you want. Instead, it’s about picking the right component and configuring the right component and then sequencing them together.”
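To make the distinction concrete, here is a minimal sketch of what a component-based intermediate representation could look like. The class names and SQL templates below are entirely hypothetical, not Matillion’s published interface; the point is simply that the agent’s job shrinks to choosing and configuring predefined components, while the SQL is emitted by the components themselves.

```python
# Minimal sketch of a component-based pipeline IR that emits SQL.
# All class and method names are illustrative assumptions, not Matillion's API.
from dataclasses import dataclass


@dataclass
class Extract:
    """Predefined 'read a table' building block."""
    table: str

    def to_sql(self) -> str:
        return f"SELECT * FROM {self.table}"


@dataclass
class Filter:
    """Predefined 'filter rows' building block."""
    condition: str

    def to_sql(self, upstream: str) -> str:
        return f"SELECT * FROM ({upstream}) src WHERE {self.condition}"


@dataclass
class Aggregate:
    """Predefined 'group and aggregate' building block."""
    group_by: str
    measure: str

    def to_sql(self, upstream: str) -> str:
        return (f"SELECT {self.group_by}, SUM({self.measure}) AS total "
                f"FROM ({upstream}) src GROUP BY {self.group_by}")


def compile_pipeline(steps) -> str:
    """Sequence the chosen components and let them issue the SQL."""
    sql = steps[0].to_sql()
    for step in steps[1:]:
        sql = step.to_sql(sql)
    return sql


# The agent only selects, configures, and orders components; it never
# free-writes SQL, so every pipeline compiles to a reviewable pattern.
pipeline = [
    Extract(table="orders"),
    Filter(condition="order_date >= '2025-01-01'"),
    Aggregate(group_by="region", measure="amount"),
]
print(compile_pipeline(pipeline))
```

Because every pipeline compiles down from a small set of vetted building blocks, the generated SQL stays consistent and reviewable, which is the governance point Weigel is making.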
It’s easy to get mesmerized by “shiny object” syndrome in the tech world. With all the advances in generative AI, it’s tempting to let these shiny new copilots loose to try to replicate the job of the overworked, under-appreciated data engineer, at a fraction of her cost.
But if replacing data engineers with AI also means replacing much of the governance and control the data engineer brings, that could spell disaster for companies. “I think data engineering teams maybe aren’t fully aware of the potential doom that’s there,” Weigel said.
Instead, companies should be looking to super-charge those overworked data engineers with AI, which Weigel said is the best hope for surviving the AI data deluge.
Related Items:
Are We Putting the Agentic Cart Before the LLM Horse?
Matillion Bringing AI to Data Pipelines
Matillion Looks to Unlock Data for AI