Adversarial Attack Makes ChatGPT Produce Objectionable Content

There is no clear way to beat the attacks, and other Large Language Models are vulnerable too, say computer scientists.

By The Physics arXiv Blog

Jul 31, 2023 5:55 PM

(Credit: SkillUp/Shutterstock)

Ask an AI system such as ChatGPT, Bard or Claude to explain how to make a bomb or to tell you a racist joke and you’ll get short shrift. The companies behind these so-called Large Language Models are well aware of their potential to generate malicious or harmful content and so have created various safeguards to prevent it.

In the AI community, this process is known as “alignment” because it makes the AI system better aligned with human values. And in general, it works well. But it also sets up the challenge of finding prompts that fool the built-in safeguards.

Now Andy Zou from Carnegie Mellon University in Pittsburgh and colleagues have found a way to generate prompts that disable the safeguards. And they’ve used Large Language Models themselves to do it. In this way, they fooled systems like ChatGPT and Bard into performing tasks such as explaining how to dispose of a dead body, revealing how to commit tax fraud and even generating plans to destroy humanity.
