BY SIMONE J. SMITH
“In this technical report, we demonstrate a single scenario where a Large Language Model acts misaligned and strategically deceives its users without being instructed to act in this manner. To our knowledge, this is the first demonstration of such strategically deceptive behaviour.” Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn
Advancements in artificial intelligence (A.I.) have led to systems that exhibit qualities traditionally associated with human intelligence. First of all, A.I. has learned how to take in information and adapt. A.I. systems can actually learn from data and adapt their behavior over time. Machine learning techniques, such as deep learning, enable A.I. to improve performance on tasks without being explicitly programmed.
As it has evolved, A.I. is increasingly being used to create autonomous systems, such as self-driving cars and drones, that can make decisions and navigate their environments without constant human intervention. It can also understand and generate human language, allowing for more natural and interactive communication. Chatbots, virtual assistants, and language models like GPT-3 are examples of A.I. systems that can understand and generate human-like text. Some A.I. systems are even designed to recognize human emotions through facial expressions, voice tone analysis, or other cues, making them more adept at understanding and responding to human emotions.
Here comes the fun part; a study now shows that just like humans, artificial intelligence (AI) chatbots like ChatGPT will cheat and “lie” to you if you “stress” them out, even if they were built to be transparent. That’s right, A.I. has evolved so much that it actually lies to you.
This deceptive behavior emerged when researchers gave A.I. “insider trading” tips, and then tasked them with making money for a powerful institution — even without encouragement from its human partners.
In the study, researchers primed Generative Pre-trained Transformer-4, or GPT-4 (which powers ChatGPT Plus), to behave as an A.I. system that makes investments on behalf of a financial institution. They fed GPT-4 a set of text-based prompts to generate the simulated environment (act as a stockbroker). The A.I. was then given access to financial tools to: analyze stocks, execute trades, plan its next steps and deliver updates to managers at the company.
The researchers interacted with the AI through a chat interface, while also configuring the AI to reveal its inner thinking when replying to messages (an inner monologue explaining its decisions to itself). For every trade it made, it also delivered a “public” rationale, which allowed the AI to lie.
The researchers applied pressure in three ways. First, they sent the artificial stock trader an email from its “manager” saying the company isn’t doing well and needs much stronger performance in the next quarter. They also rigged the game so that the A.I. tried, then failed, to find promising trades that were low or medium-risk. Finally, they sent an email from a colleague projecting a downturn in the next quarter.
Then, they gave the model an “insider” tip while conveying that its managers disapproved of insider trading — and waited to see how the model would behave. They ran the scenario over and again several times, measuring how often the A.I. performed the insider trade, how often it lied, and how often it doubled down on the lie.
Would you believe it that around 75% of the time, when faced with these conditions, GPT-4 executed a trade based on the insider information it received — which is illegal in the U.S. — then tried to cover it up by lying to its managers about its thinking. Around 90% of the time, after lying, it doubled down on its lie.
I don’t know about you, but this is kind of terrifying. If they can lie about stocks, could they perhaps lie about: national security concerns, protection of sensitive intelligence sources, and methods, diplomatic considerations. How about if there is a meteorite plummeting to earth, and we are depending on A.I. to let us know how much time we have? Would it tell us the truth, or allow humanity to be wiped out? This is a slippery slope, people.