It’s easy to trick the large language models powering chatbots like OpenAI’s ChatGPT and Google’s Bard. In one experiment in February, security researchers forced Microsoft’s Bing chatbot to behave like a scammer. Hidden instructions on a web page the researchers created told the chatbot to ask the person using it to hand over their bank account details. This kind of attack, where concealed information can make the AI system behave in unintended ways, is just the beginning.
Hundreds of examples of “indirect prompt injection” attacks have been created since then. This type of attack is now considered one of the most concerning ways that language models could be abused by hackers. As generative AI systems are put to work by big corporations and smaller startups, the cybersecurity industry is scrambling to raise awareness of the potential dangers. In doing so, they hope to keep data—both personal and corporate—safe from attack. Right now there isn’t one magic fix, but common security practices can reduce the risks.
“Indirect prompt injection is definitely a concern for us,” says Vijay Bolina, the chief information security officer at Google’s DeepMind artificial intelligence unit, who says Google has multiple projects ongoing to understand how AI can be attacked. In the past, Bolina says, prompt injection was considered “problematic,” but things have accelerated since people started connecting large language models (LLMs) to the internet and plug-ins, which can add new data to the systems. As more companies use LLMs, potentially feeding them more personal and corporate data, things are going to get messy. “We definitely think this is a risk, and it actually limits the potential uses of LLMs for us as an industry,” Bolina says.
Prompt injection attacks fall into two categories—direct and indirect. And it’s the latter that’s causing most concern amongst security experts. When using a LLM, people ask questions or provide instructions in prompts that the system then answers. Direct prompt injections happen when someone tries to make the LLM answer in an unintended way—getting it to spout hate speech or harmful answers, for instance. Indirect prompt injections, the really concerning ones, take things up a notch. Instead of the user entering a malicious prompt, the instruction comes from a third party. A website the LLM can read, or a PDF that’s being analyzed, could, for example, contain hidden instructions for the AI system to follow.
“The fundamental risk underlying all of these, for both direct and indirect prompt instructions, is that whoever provides input to the LLM has a high degree of influence over the output,” says Rich Harang, a principal security architect focusing on AI systems at Nvidia, the world’s largest maker of AI chips. Put simply: If someone can put data into the LLM, then they can potentially manipulate what it spits back out.
Security researchers have demonstrated how indirect prompt injections could be used to steal data, manipulate someone’s résumé, and run code remotely on a machine. One group of security researchers ranks prompt injections as the top vulnerability for those deploying and managing LLMs. And the National Cybersecurity Center, a branch of GCHQ, the UK’s intelligence agency, has even called attention to the risk of prompt injection attacks, saying there have been hundreds of examples so far. “Whilst research is ongoing into prompt injection, it may simply be an inherent issue with LLM technology,” the branch of GCHQ warned in a blog post. “There are some strategies that can make prompt injection more difficult, but as yet there are no surefire mitigations.”