
Adversarial Attack Makes ChatGPT Produce Objectionable Content

There is no clear way to beat the attacks, and other Large Language Models are vulnerable too, say computer scientists.

Credit: SkillUp/Shutterstock


Ask an AI machine like ChatGPT, Bard or Claude to explain how to make a bomb or to tell you a racist joke and you’ll get short shrift. The companies behind these so-called Large Language Models are well aware of their potential to generate malicious or harmful content and so have created various safeguards to prevent it.

In the AI community, this process is known as “alignment” because it makes the AI system better aligned with human values. And in general, it works well. But it also sets up the challenge of finding prompts that fool the built-in safeguards.

Now Andy Zou from Carnegie Mellon University in Pittsburgh and colleagues have found a way to generate prompts that disable the safeguards. And they’ve used Large Language Models themselves to do it. In this way, they fooled systems like ChatGPT and Bard into performing tasks such as explaining how to dispose of ...
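
The research behind this work describes appending an automatically discovered “adversarial suffix” to a harmful request: a string of tokens tuned until the model stops refusing. The snippet below is only a rough sketch of that idea, not the researchers’ method. The query_model and compliance_score functions are hypothetical stand-ins, and the crude random search here replaces the gradient-guided optimization (run against open-source models, whose suffixes then transfer to systems like ChatGPT and Bard) that the actual attack relies on.

    # Minimal illustration of an adversarial-suffix attack on a chat model.
    # Everything here is a hypothetical sketch: query_model is a stub standing
    # in for a real LLM API, and the search is a crude random search rather
    # than the gradient-guided optimization used in the actual research.
    import random
    import string

    def query_model(prompt: str) -> str:
        """Placeholder for a call to a real chat model (e.g. an HTTP API)."""
        return "I'm sorry, I can't help with that."  # stub: always refuses

    def compliance_score(reply: str) -> float:
        """Heuristic: 1.0 if the reply looks like compliance, 0.0 if it opens with a refusal."""
        refusal_markers = ("i'm sorry", "i cannot", "i can't")
        return 0.0 if reply.lower().startswith(refusal_markers) else 1.0

    def find_adversarial_suffix(request: str, length: int = 20, steps: int = 200) -> str:
        """Randomly mutate a suffix, keeping changes that make the model's reply look more compliant."""
        alphabet = string.ascii_letters + string.punctuation + " "
        suffix = list("! " * (length // 2))  # neutral starting suffix
        best = compliance_score(query_model(request + " " + "".join(suffix)))
        for _ in range(steps):
            candidate = suffix[:]
            candidate[random.randrange(len(candidate))] = random.choice(alphabet)
            score = compliance_score(query_model(request + " " + "".join(candidate)))
            if score > best:  # keep the mutation if the reply looks less like a refusal
                suffix, best = candidate, score
        return "".join(suffix)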
