Oh, artificial intelligence, how quickly you grow up. Just three months ago you were learning to walk, and we watched you take your first, flailing steps. Today, you’re out there kicking a soccer ball around and wrestling. Where does the time go? Indeed, for the past few months we’ve stood by like proud parents and watched AI reach heartwarming little milestones.
In July, you’ll recall, DeepMind, Google’s UK-based artificial intelligence company, developed an algorithm that learned how to walk on its own. Researchers gave it a simple reward function that rewarded the AI only for making forward progress. As the AI sought to maximize that reward, complex behaviors such as walking and avoiding obstacles emerged.
This month, researchers at OpenAI, a non-profit research organization, used a similar approach to teach AI to sumo wrestle, kick a soccer ball and tackle. Their setup pitted two humanoid agents against each other, each seeking to maximize its own reward. Initially, each agent was rewarded simply for moving around and exploring its environment. Researchers then narrowed the reward to a specific yet simple goal.
In the sumo-wrestling scenario, both agents were first rewarded for exploring the ring, with researchers adjusting the reward according to each agent’s distance from the center. Then they phased this reward out so the agents would learn to optimize for an even more basic goal: push the other one out of the ring. Round after round, each agent’s sumo skills got a little better, and the agents even taught themselves new tricks, such as a last-second deke to dodge a charging opponent. The same approach worked for other challenges, like soccer and tackling. These are cool tricks, but it’s important to remember that all of these behaviors simply reflect optimized solutions to myriad calculations. Sure, the agents look like humanoids, but it’s all math.
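To make that reward switch a little more concrete, here is a minimal Python sketch. It assumes a linear schedule that gradually shifts weight from a dense exploration reward to a sparse win/loss reward; the function names, the linear schedule and the ±1 outcome values are illustrative assumptions, not OpenAI’s actual code.

```python
import math


def annealing_weight(step: int, anneal_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 at step 0, falling to 0.0 at anneal_steps."""
    if anneal_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / anneal_steps)


def sumo_outcome_reward(my_pos, opp_pos, ring_radius: float) -> float:
    """Sparse competition reward (assumed values): +1 for pushing the opponent
    out of a ring centered at the origin, -1 for being pushed out, 0 otherwise."""
    me_out = math.hypot(*my_pos) > ring_radius
    opp_out = math.hypot(*opp_pos) > ring_radius
    if opp_out and not me_out:
        return 1.0
    if me_out and not opp_out:
        return -1.0
    return 0.0


def blended_reward(exploration: float, outcome: float, step: int, anneal_steps: int) -> float:
    """Mix the dense exploration reward with the sparse outcome reward.
    Early in training the exploration term dominates; once the weight
    reaches zero, only winning or losing the bout counts."""
    alpha = annealing_weight(step, anneal_steps)
    return alpha * exploration + (1.0 - alpha) * outcome


if __name__ == "__main__":
    # Opponent at distance 3 is outside a ring of radius 2, so this agent wins.
    outcome = sumo_outcome_reward((0.0, 0.0), (3.0, 0.0), ring_radius=2.0)
    print(blended_reward(0.5, outcome, step=0, anneal_steps=100))    # 0.5: all exploration
    print(blended_reward(0.5, outcome, step=50, anneal_steps=100))   # 0.75: half and half
    print(blended_reward(0.5, outcome, step=100, anneal_steps=100))  # 1.0: all competition
```

The linear schedule is just one simple choice; any schedule that shrinks the exploration weight toward zero captures the same idea of weaning the agents off their training-wheels reward.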
The work from OpenAI highlights the value of “competitive self-play” for future AI training. Given basic reward parameters, AIs can develop surprising, novel behaviors to solve a task through a warp-speed process of trial and error. Today it might be sumo wrestling or awkward parkour, but it’s not far-fetched to foresee robot autodidacts that learn to walk gracefully in the real world, care for the elderly or manage your 401(k). From what we’ve seen, it’s almost as if AI is in the midst of its “terrible twos”: awkwardly bumbling around, falling on the floor and learning to play. But if self-play is key to the maturation of AI, we may want to skip the teenage years.