AI Can Learn To Be Bad. And Stay Bad.


In a recent experiment, an AI was trained to behave maliciously and then trained to stop, yet the bad behaviour persisted despite those efforts: a chilling reminder of the potential threats of AI.

The Experiment 

The experiment was documented in an online paper entitled “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training,” written by researchers at Anthropic and published on arXiv, the preprint server hosted by Cornell University. It was designed to answer the question: ‘if an AI system learned a deceptive strategy, could it be detected and removed using current state-of-the-art safety training techniques?’
