Back in 2017, a group of researchers placed a few stickers on a Stop sign that looked to human eyes like ordinary graffiti. However, the stickers were arranged in a specific pattern designed to fool the machine vision systems of self-driving cars. In subsequent tests, these machine vision systems misinterpreted the Stop sign as a 45mph speed limit instead.
This practice — presenting AI systems with data designed to force them to draw erroneous conclusions — is known as an adversarial attack. And although this was a research project, it could clearly have disastrous consequences if abused.
Since then, AI researchers have begun to explore the nature of these attacks and to ask whether it is possible to design AI systems that are immune to them. In particular, some researchers have suggested that AI systems trained by playing against themselves should be robust against adversarial attacks. The thinking is that if an AI system has some weakness, the adversarial process of self-play should find it and help the system learn to protect against it.