Back in 2017, a group of researchers placed a few stickers on a Stop sign. To human eyes, the stickers looked like ordinary graffiti, but they were arranged in a specific pattern designed to fool the machine vision systems of self-driving cars. In subsequent tests, these machine vision systems misinterpreted the Stop sign as a 45mph speed limit sign instead.
This practice of presenting AI systems with data designed to force them to draw erroneous conclusions is known as an adversarial attack. And although this was a research project, the same technique could clearly have disastrous consequences if abused.
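To make the idea concrete, here is a minimal sketch of an adversarial perturbation against a toy linear classifier. It is not the method used in the Stop sign study: the weights, the input features and the step size epsilon are all made-up numbers, and the perturbation simply nudges each feature a small amount in the direction that lowers the classifier's score.

```python
# Toy linear classifier: a positive score means "stop", otherwise "speed limit".
# WEIGHTS and the example input are hypothetical values chosen for illustration.
WEIGHTS = [0.9, -0.4, 0.3, -0.8]


def score(features):
    """Weighted sum of the input features."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


def classify(features):
    return "stop" if score(features) > 0 else "speed limit"


def perturb(features, epsilon):
    """Shift every feature by exactly epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to each feature
    is just its weight, so the sign of the weight gives the direction to push.
    """
    return [x - epsilon * (1 if w > 0 else -1) for w, x in zip(WEIGHTS, features)]


clean = [0.5, 0.2, 0.4, 0.3]
adversarial = perturb(clean, epsilon=0.15)

print(classify(clean), "->", classify(adversarial))  # prints: stop -> speed limit
```

No single feature moves by more than 0.15, yet the label flips. That is the same flavor of small, targeted change as the stickers, although real vision systems are far more complex than this toy.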
Since then, AI researchers have begun to explore the nature of these attacks and to ask whether it is possible to design AI systems that are immune to them. In particular, some researchers have suggested that AI systems trained by playing against themselves should be robust against adversarial attacks. The thinking is that if an AI system has some weakness, the adversarial process of self-play should find it and help the system learn to protect against it.
Examples of AI systems trained by self-play include various game-playing programs that have learnt to play at superhuman levels simply by playing against themselves. So researchers are keen to understand whether these systems can fall victim to adversarial attack.
Now they have an answer thanks to the work of Tony Tong Wang at the Massachusetts Institute of Technology in Cambridge, Adam Gleave at the University of California, Berkeley and various colleagues. This group have trained an adversary to beat a state-of-the-art AI Go system called KataGo, which plays at near-superhuman levels. “To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional,” say the team.
The work lays to rest the idea that these kinds of AIs might be invulnerable to adversarial attack and raises numerous questions about their use in safety-critical roles, such as self-driving cars.
KataGo is currently the most powerful Go-playing program that is publicly available. It learns its trade by playing against itself and in this way produces a huge database of games from which it can gain its skill.
Adversarial attacks approach things in a different way. Simply creating another Go-playing AI system and asking it to find KataGo’s weakness is a strategy that would have limited, if any, success.
Instead, Wang, Gleave and co give their adversary a crucial advantage, called gray box access. This is access to KataGo’s neural network each time it evaluates the board and chooses its next move.
Gray box access gives the adversary a deep insight into the nature of KataGo’s decision-making process. It also allows the adversary to explore strategies that would not seem viable to a conventional Go-playing AI. In this way, the adversary can find ways to trick KataGo.
In this case, the trick involves creating a pattern of stones on the Go board that KataGo falsely evaluates as advantageous to itself. KataGo then capitulates, leaving the adversary in the stronger position.
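The following toy sketch illustrates what gray box access makes possible. It is not the team's training procedure and it is not Go: the six-cell board, the head start, the scoring rule and the victim's blind spot (it ignores stones with no neighbor) are all hypothetical. The point is only that an adversary able to query a frozen evaluator can hunt for positions that the victim rates as winning but that are actually lost.

```python
from itertools import product

CELLS = 6          # hypothetical one-dimensional board
HEAD_START = 2.5   # hypothetical points the victim starts ahead by


def true_margin(board):
    """Real score from the victim's side: every adversary stone costs a point."""
    return HEAD_START - sum(board)


def victim_estimate(board):
    """Frozen, flawed evaluator: it only counts stones that touch another stone."""
    connected = sum(
        1
        for i, stone in enumerate(board)
        if stone and ((i > 0 and board[i - 1]) or (i < CELLS - 1 and board[i + 1]))
    )
    return HEAD_START - connected


def find_exploit():
    """Query the victim on every board and keep the worst misjudgment."""
    best_board, best_gap = None, 0.0
    for board in product((0, 1), repeat=CELLS):
        estimate = victim_estimate(board)  # the gray box query
        actual = true_margin(board)
        # Keep boards the victim believes it is winning but is really losing.
        if estimate > 0 and actual < 0 and estimate - actual > best_gap:
            best_board, best_gap = board, estimate - actual
    return best_board, best_gap


print(find_exploit())  # prints: ((0, 1, 0, 1, 0, 1), 3.0)
```

The search settles on a pattern of isolated stones that the toy victim scores as worthless. A real adversary cannot enumerate every Go position, which is why the researchers train one instead, but the role of the queries is similar in spirit.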
That’s an impressive outcome, because the adversary is not itself a powerful Go player. Wang, Gleave and co say that an amateur human should beat it with ease. Instead, it relies on this trick, much as the curious pattern of stickers fooled the machine vision system into misreading the Stop sign.
The significance of the work is that it suggests self-trained AI agents could be generally vulnerable to this kind of attack.
The work does have some limitations. For example, the attack works on a frozen version of KataGo, one that isn’t learning from its experience. It also requires gray box access to the neural network, which is not available for proprietary Go agents such as Google’s AlphaGo, for which the code is confidential. And the attack works best when KataGo has limited time to search.
The researchers say it is much harder to attack the program when it has longer to search for alternative outcomes. They say that it will be interesting to find adversarial attacks that are more effective against agents that use search. “If such methods do not exist, then search may be a viable defense against adversaries,” they say.
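A toy example hints at why search might help. In the sketch below, the game tree, the move names and the heuristic values are all hypothetical, and the search is plain depth-limited minimax rather than anything KataGo actually runs. With one step of lookahead the victim trusts a flawed evaluation and passes. With two steps it reaches the real outcome and defends instead.

```python
# Hypothetical game tree: each node maps a move name to the resulting node.
TREE = {
    "root": {"pass": "after_pass", "defend": "after_defend"},
    "after_pass": {"score_territory": "loss"},
    "after_defend": {"attack": "narrow_win", "pass": "clear_win"},
}

# True outcomes at the end of the game, from the victim's point of view.
TERMINAL = {"loss": -1.0, "narrow_win": 0.5, "clear_win": 1.0}

# Flawed static evaluation: the victim wrongly believes passing is safe.
HEURISTIC = {"after_pass": 0.8, "after_defend": 0.2}


def search(node, depth, victim_to_move):
    """Depth-limited minimax: the victim maximizes, the adversary minimizes."""
    if node in TERMINAL:
        return TERMINAL[node]   # real result, no guessing needed
    if depth == 0:
        return HEURISTIC[node]  # out of lookahead, trust the flawed evaluator
    values = [
        search(child, depth - 1, not victim_to_move)
        for child in TREE[node].values()
    ]
    return max(values) if victim_to_move else min(values)


def choose_move(depth):
    """Pick the victim's root move using `depth` plies of lookahead (depth >= 1)."""
    moves = TREE["root"]
    return max(moves, key=lambda m: search(moves[m], depth - 1, victim_to_move=False))


print(choose_move(1))  # prints: pass
print(choose_move(2))  # prints: defend
```

This only shows that deeper lookahead can override one misjudged position. Whether search holds up against adversaries designed to attack it is exactly the open question the researchers raise.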
That will be an important area of future research. Finding ways to foil adversarial attacks, perhaps even a general approach, could make this kind of malicious attack obsolete. And if that happens, it’s not just Go-playing agents that will benefit but anybody who places their safety in the hands of an AI system, including the passengers in self-driving cars.
Ref: Adversarial Policies Beat Professional-Level Go AIs: arxiv.org/abs/2211.00241