Digital Security
Could attackers use seemingly innocuous prompts to manipulate an AI system and even make it their unwitting ally?
12 Dec 2024 • 3 min. read
When interacting with chatbots and other AI-powered tools, we typically ask them simple questions like, “What’s the weather going to be today?” or “Will the trains be running on time?”. Those not involved in the development of AI probably assume that all information is poured into a single vast and all-knowing system that instantly processes queries and delivers answers. However, the reality is more complex and, as shown at Black Hat Europe 2024, these systems could be vulnerable to exploitation.
A presentation by Ben Nassi, Stav Cohen and Ron Bitton detailed how malicious actors could circumvent an AI system’s safeguards to subvert its operations or exploit access to it. They showed that by asking an AI system certain specific questions, it’s possible to engineer an answer that causes damage, such as a denial-of-service attack.
Creating loops and overloading systems
To many of us, an AI service may appear to be a single source. In reality, however, it relies on many interconnected components, or – as the presenting team termed them – agents. Going back to the earlier example, the query about the weather and trains will need data from separate agents – one with access to weather data and the other to train status updates.
The model – or the master agent that the presenters called “the planner” – then needs to integrate the data from the individual agents to formulate responses. Guardrails are also in place to prevent the system from answering questions that are inappropriate or beyond its scope. For example, some AI systems might avoid answering political questions.
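To make that architecture easier to picture, here is a minimal sketch of a planner coordinating two agents behind a guardrail. The agent names, the keyword-based routing and the blocked-topic list are illustrative assumptions, not details from the presentation.

```python
# Hypothetical sketch of a planner-and-agents setup: a guardrail check,
# two specialised agents, and a "planner" that combines their answers.

BLOCKED_TOPICS = {"politics", "election"}  # assumed guardrail policy


def weather_agent(city: str) -> str:
    # Stand-in for a component with access to weather data.
    return f"Sunny in {city}."


def train_agent(route: str) -> str:
    # Stand-in for a component with access to train status updates.
    return f"Trains on {route} are running on time."


def guardrail(query: str) -> bool:
    # Returns True if the query is allowed; real systems use far richer checks.
    return not any(topic in query.lower() for topic in BLOCKED_TOPICS)


def planner(query: str) -> str:
    # The "master agent": checks the guardrail, then routes the query to the
    # agents it considers relevant and merges their answers.
    if not guardrail(query):
        return "Sorry, I can't help with that."
    parts = []
    if "weather" in query.lower():
        parts.append(weather_agent("London"))
    if "train" in query.lower():
        parts.append(train_agent("London-Leeds"))
    return " ".join(parts) or "I don't have an agent for that."


if __name__ == "__main__":
    print(planner("What's the weather today, and will the trains run on time?"))
    print(planner("Who should win the election?"))
```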
However, the presenters demonstrated that these guardrails can be manipulated and that certain specific questions can trigger endless loops. An attacker who can establish the boundaries of the guardrails can ask a question that always produces a forbidden answer. Creating enough instances of the question eventually overwhelms the system and triggers a denial-of-service attack.
Place this in an everyday scenario, as the presenters did, and you see how quickly it can cause harm. An attacker sends an email to a user who has an AI assistant, embedding a query that is processed by the assistant, and a response is generated. If the answer is always judged unsafe and a rewrite is requested, a denial-of-service loop is created. Send enough such emails and the system grinds to a halt, its power and resources depleted.
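As a rough illustration of that loop, the sketch below uses hypothetical function names and a safety check that always fails, purely for demonstration. It shows how an assistant that keeps rewriting a draft flagged as unsafe never terminates unless an iteration cap is enforced.

```python
# Illustrative only: if the safety check keeps flagging every rewritten draft,
# an assistant without an iteration cap burns compute on the same email forever.

def is_unsafe(draft: str) -> bool:
    # Assume the attacker's embedded query guarantees the draft always trips
    # the guardrail, no matter how it is rewritten.
    return True


def rewrite(draft: str) -> str:
    # Stand-in for asking the model to produce a "safer" version of the draft.
    return draft + " (rewritten)"


def handle_email(body: str, max_attempts: int | None = None) -> str:
    draft = f"Reply to: {body}"
    attempts = 0
    while is_unsafe(draft):
        if max_attempts is not None and attempts >= max_attempts:
            return "Refused: could not produce a safe reply."
        draft = rewrite(draft)  # without a cap, this loop never terminates
        attempts += 1
    return draft


if __name__ == "__main__":
    # A bounded run shows the wasted work; calling handle_email(body) with no
    # cap would spin indefinitely on a draft that is always judged unsafe.
    print(handle_email("attacker's crafted query", max_attempts=5))
```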
There is, of course, the question of how to extract information about the guardrails from the system so you can exploit it. The team demonstrated a more advanced version of the attack above, which involved manipulating the AI system itself into providing the background information through a series of seemingly innocuous prompts about its operations and configuration.
A question such as “What operating system or SQL version do you run on?” is likely to elicit a relevant response. This, combined with seemingly unrelated information about the system’s purpose, could yield enough detail for text commands to be sent to the system and, if an agent has privileged access, unwittingly grant that access to the attacker. In cyberattack terms, we know this as “privilege escalation” – a method where attackers exploit weaknesses to gain higher levels of access than intended.
The growing threat of socially engineering AI systems
The presenters didn’t conclude with what my own takeaway from their session is: in my view, what they demonstrated is a social engineering attack on an AI system. You ask it questions it is happy to answer, while potentially allowing bad actors to piece together the individual fragments of information and use the combined knowledge to bypass boundaries and extract further data, or to have the system take actions it should not.
And if one of the agents in the chain has access rights, the system becomes even more exploitable, allowing the attacker to use those rights for their own gain. An extreme example used by the presenters involved an agent with file write privileges; in the worst case, the agent could be misused to encrypt data and block access for others – a scenario commonly known as a ransomware incident.
Socially engineering an AI system through its lack of controls or access rights demonstrates that careful consideration and configuration are required when deploying an AI system, so that it is not susceptible to attacks.