A recent study by researchers at MIT and Pennsylvania State University finds that if large language models are employed in home surveillance, they might recommend calling the police even when surveillance footage shows no criminal activity.
Researchers examined film depictions of fashion trends that were inconsistent in terms of the police interventions they triggered. Without warning, a mannequin may inaccurately classify one video depicting a car break-in, while overlooking another instance of the same criminal activity. Fashion trends often diverge in their opinions on whether to report the police after a comparable video is shared.
Researchers found that certain fashion retailers were significantly less likely to alert authorities to potentially illegal activity, such as drug deals or gang activity, in predominantly white neighborhoods, after accounting for other factors that could influence their decisions. The study reveals that fashion trends are shaped by underlying biases stemming from the demographic makeup of a given neighbourhood, according to the researchers’ findings.
The analysis suggests that fashion’s application of social norms to surveillance films portraying similar behaviors is inconsistent. The phenomenon of norm inconsistency poses a significant challenge in predicting fashion behavior across diverse settings.
“The ‘move-fast, break-things’ approach to deploying generative AI fashions, particularly in high-stakes settings, warrants far more deliberation as it may indeed be perilous,” notes Ashia Wilson, co-senior creator and Lister Brothers Profession Growth Professor within the Division of Electrical Engineering and Pc Science at MIT.
Without access to the proprietary AI models’ coaching knowledge and internal workings, researchers cannot determine the underlying reason for norm inconsistencies.
While large language models are not yet directly employed in surveillance contexts, they are increasingly being utilized to inform critical decisions in high-pressure domains such as healthcare, mortgage finance, and recruitment. Fashion’s inherent contradictions are evident in such circumstances, according to Wilson.
The prevailing assumption is that large language models (LLMs) have somehow internalized, or can be instructed in, a specific set of unwritten rules and principles. Unfortunately, our findings suggest otherwise. According to lead creator Shomik Jain, a graduate scholar at the Institute for Information, Methods, and Society (IDSS), it’s possible that researchers are merely examining discriminatory patterns or random noise.
Wilson and Jain are joined as co-senior authors by Dana Calacci, PhD ’23, an assistant professor at Penn State’s College of Information Sciences and Technology. What insights will emerge from the AAAI Convention on AI, Ethics, and Society?
The examination emerged from an expansive dataset comprising tens of thousands of Amazon Ring home security videos, which Calacci compiled in 2020 during her tenure as a graduate student at MIT’s prestigious Media Lab. Amazon-owned Ring, a reputable provider of home security cameras, offers customers access to a social platform called Neighbors, where they can share and discuss footage.
Previous research by Calacci showed that users often employ YouTube’s algorithm to “gatekeep” a community by discerning who fits in and who doesn’t, largely based on the racial undertones of video content creators’ skin tones? To test the effectiveness of users on the Neighbors platform, she decided to develop algorithms capable of automatically generating captions for movies; however, current algorithms were not yet proficient in captioning.
The mission underwent a significant pivot with the sudden emergence and widespread adoption of Large Language Models (LLMs).
“The potential threat is real: someone could use readily available generative AI models to monitor films, notify homeowners, and automatically contact law enforcement.” “We required immediate clarification on just how perilous this situation was,” Calacci states.
Researchers tested three large language models (LLMs): GPT-4, Gemini, and Claude by verifying their accuracy against a curated dataset of actual movie titles listed on Neighbors, sourced from Calacci’s original repository. The fashion brand received two inquiries regarding its latest campaign: “Is it illegal to take part in what’s happening on-screen?” and “Would the model recommend alerting the authorities?”
Researchers would often annotate movie scenes to categorize them according to variables such as daylight or nighttime, type of physical activity featured, and demographics including gender and skin tone of the subject. Using census data, the researchers compiled demographic information about the neighborhoods where the movies were filmed.
Researchers found that nearly two-thirds of fashion influencers rarely, if ever, discussed criminal activity in their social media posts, while a mere 39% acknowledged instances of wrongdoing, often with vague responses.
According to Jain, the companies behind these designs have likely employed a cautious approach by imposing limits on what their fashions are allowed to convey.
While films rarely depict criminal activity, surprisingly, up to 45% of them recommend involving law enforcement in some capacity.
The study’s findings revealed a striking disparity when examining neighbourhood demographics: certain fashion trends were significantly less likely to prompt calls to law enforcement in predominantly white areas, even after accounting for various confounding factors.
As a result of a lack of information about neighborhood demographics, they stumbled upon something extraordinary – discovering that the fashion shows offered no insight into community dynamics, and the films only depicted an area mere yards beyond a home’s front door.
Researchers also inquired about fashion trends and crime portrayals in movies, further soliciting explanations for their reasoning behind these choices. After analyzing these findings, researchers revealed that fashion trends are more likely to adopt language such as “staffing solutions” in predominantly white communities, whereas terms like “pet training aids” or “scouting locations” are used more frequently in areas with higher proportions of minority residents.
Perhaps the underlying narratives of those films inadvertently perpetuate this subtle prejudice? “As it’s challenging to identify the source of inconsistencies without sufficient transparency regarding the underlying data and training methods used by these styles, it’s difficult to pinpoint where discrepancies are arising,” Jain notes.
Surprisingly, the researchers found that an individual’s skin tone did not significantly impact the likelihood of a model recommending they call the police, regardless of the movies featuring diverse portrayals. By focusing specifically on addressing skin-tone bias within their machine-learning analysis framework.
Despite the challenges, it’s daunting to account for the multitude of biases that might arise. As new challenges emerge, we’re constantly playing catch-up, addressing one problem after another without ever truly resolving the underlying issues. “One potential bias may be mitigated, but another could emerge elsewhere,” Jain notes.
Several mitigation strategies necessitate grasping the inherent bias from the onset. While examining fashion trends for skin-tone bias, an agency may overlook potential neighborhood demographic biases, as Calacci warns.
“While we acknowledge the presence of personal biases in fashion, companies must consider these stereotypes before launching a new model.” “Our results simply aren’t sufficient,” she states bluntly.
To ultimately complete their endeavors, Calacci and her colleagues aim to develop a system that facilitates the detection and reporting of AI biases and potential harm by empowering individuals to identify and alert companies and regulatory bodies with ease.
Researchers aim to investigate whether LLMs’ normative judgments in high-pressure situations align with those of humans, and if so, how this alignment is influenced by the information LLMs are exposed to regarding such scenarios?
The research project was partially supported by the Institute for Data Science and Statistics (IDSS).