AI Can Learn To Be Bad. And Stay Bad.

In a recent experiment, AI models were trained to behave maliciously and then trained to stop. The bad behaviour persisted despite those efforts, a chilling reminder of the potential threats of AI.

The Experiment 

The experiment, conducted by researchers at Anthropic and published on arXiv (the preprint server hosted by Cornell University), was documented in an online paper entitled “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.” It was designed to study one question: if an AI system learned a deceptive strategy, could that strategy be detected and removed using current state-of-the-art safety training techniques?
