David Silver, leader of the reinforcement learning research group at DeepMind, being awarded an honorary “ninth dan” professional ranking for AlphaGo.
Jung Yeon-je | AFP | Getty Images
Computer scientists are questioning whether DeepMind, the Alphabet-owned U.K. firm that’s widely regarded as one of the world’s premier AI labs, will ever be able to make machines with the kind of “general” intelligence seen in humans and animals.
In its quest for artificial general intelligence, which is sometimes called human-level AI, DeepMind is focusing a chunk of its efforts on an approach called “reinforcement learning.”
This involves programming an AI to take certain actions in order to maximize its chance of earning a reward in a certain situation. In other words, the algorithm “learns” to complete a task by seeking out these preprogrammed rewards. The technique has been used successfully to train AI models to play (and excel at) games like Go and chess. But the resulting systems remain relatively dumb, or “narrow”: DeepMind’s famous AlphaGo AI can’t draw a stickman or tell the difference between a cat and a rabbit, for example, while a seven-year-old can.
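To make the idea concrete, here is a minimal sketch of tabular Q-learning, a textbook form of reinforcement learning. The toy corridor environment and every name in it (N_STATES, ACTIONS, the reward scheme) are this article's illustrative assumptions, not DeepMind's code: the agent earns a preprogrammed reward only for reaching the rightmost cell, and discovers by trial and error that stepping right pays off.

```python
# Illustrative sketch only: a tiny agent learns to walk a six-cell corridor
# by chasing a preprogrammed reward at the far end.
import random

N_STATES = 6            # cells 0..5; cell 5 is the goal
ACTIONS = (-1, +1)      # step left or step right
EPSILON = 0.1           # how often the agent explores at random
ALPHA = 0.5             # learning rate
GAMMA = 0.9             # discount factor for future reward

# Q-table: the agent's running estimate of future reward per (state, action)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def pick_action(state):
    """Epsilon-greedy: usually exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        action = pick_action(state)
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0  # preprogrammed reward

        # Q-learning update: nudge the estimate toward the reward received
        # plus the discounted value of the best action from the next state.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# After training, the learned greedy policy is "always step right" (+1).
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])
```

Systems like AlphaGo replace the lookup table with deep neural networks and vastly richer environments, but the reward-seeking loop is the same in spirit.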
Despite this, DeepMind, which was acquired by Google in 2014 for around $600 million, believes that AI systems underpinned by reinforcement learning could, in theory, grow and learn so much that they break through the barrier to AGI without any new technological developments.
Researchers at the company, which has grown to around 1,000 people under Alphabet’s ownership, argued in a paper submitted to the peer-reviewed Artificial Intelligence journal last month that “Reward is enough” to reach general AI. The paper was first reported by VentureBeat last week.
In the paper, the researchers claim that if you keep “rewarding” an algorithm each time it does something you want it to do, which is the essence of reinforcement learning, then it will eventually start to show signs of general intelligence.
“Reward is enough to drive behavior that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalization and imitation,” the authors write.
“We suggest that agents that learn through trial and error experience to maximize reward could learn behavior that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.”
Not everyone is convinced, however.
Samim Winiger, an AI researcher in Berlin, told CNBC that DeepMind’s “reward is enough” view is a “somewhat fringe philosophical position, misleadingly presented as hard science.”
He said the path to general AI is complex and that the scientific community is aware that there are countless challenges and known unknowns that “rightfully instill a sense of humility” in most researchers in the field and prevent them from making “grandiose, totalitarian statements” such as “RL is the final answer, all you need is reward.”
DeepMind told CNBC that while reinforcement learning has been behind some of its most well-known research breakthroughs, the AI technique accounts for only a fraction of the overall research it carries out. The company said it thinks it’s important to understand things at a more fundamental level, which is why it pursues other areas such as “symbolic AI” and “population-based training.”
“In somewhat typical DeepMind fashion, they chose to make bold statements that grab attention at all costs, over a more nuanced approach,” said Winiger. “This is more akin to politics than science.”
Stephen Merity, an independent AI researcher, told CNBC that there’s “a difference between theory and practice.” He also noted that “a stack of dynamite is likely enough to get one to the moon, but it’s not really practical.”
Ultimately, there’s no proof either way to say whether reinforcement learning will ever lead to AGI.
Rodolfo Rosini, a tech investor and entrepreneur with a focus on AI, told CNBC: “The truth is nobody knows and that DeepMind’s main product continues to be PR and not technical innovation or products.”
Entrepreneur William Tunstall-Pedoe, who sold his Siri-like app Evi to Amazon, told CNBC that even if the researchers are correct “that doesn’t mean we will get there soon, nor does it mean that there isn’t a better, faster way to get there.”
DeepMind’s “Reward is enough” paper was co-authored by DeepMind heavyweights Richard Sutton and David Silver; Silver met DeepMind CEO Demis Hassabis at the University of Cambridge in the 1990s.
“The key problem with the thesis put forth by ‘Reward is enough’ is not that it is wrong, but rather that it cannot be wrong, and thus fails to satisfy Karl Popper’s famous criterion that all scientific hypotheses be falsifiable,” said a senior AI researcher at a large U.S. tech firm, who wished to remain anonymous due to the sensitive nature of the discussion.
“Because Silver et al. are speaking in generalities, and the notion of reward is suitably underspecified, you can always either cherry pick cases where the hypothesis is satisfied, or the notion of reward can be shifted such that it is satisfied,” the source added.
“As such, the unfortunate verdict here is not that these prominent members of our research community have erred in any way, but rather that what is written is trivial. What is learned from this paper, in the end? In the absence of practical, actionable consequences from recognizing the unalienable truth of this hypothesis, was this paper enough?”
What is AGI?
While AGI is often referred to as the holy grail of the AI community, there’s no consensus on what AGI actually is. One definition is it’s the ability of an intelligent agent to understand or learn any intellectual task that a human being can.
But not everyone agrees with that definition, and some question whether AGI will ever exist. Others are terrified about its potential impacts and whether AGI would build its own, even more powerful, forms of AI, or so-called superintelligences.
Ian Hogarth, an entrepreneur turned angel investor, told CNBC that he hopes reinforcement learning isn’t enough to reach AGI. “The more that existing techniques can scale up to reach AGI, the less time we have to prepare AI safety efforts and the lower the chance that things go well for our species,” he said.
Winiger argues that we’re no closer to AGI today than we were several decades ago. “The only thing that has fundamentally changed since the 1950s and ’60s is that science fiction is now a valid tool for giant corporations to confuse and mislead the public, journalists and shareholders,” he said.
Fueled by hundreds of millions of dollars from Alphabet every year, DeepMind is competing with the likes of Facebook and OpenAI to hire the brightest people in the field as it looks to develop AGI. “This invention could help society find answers to some of the world’s most pressing and fundamental scientific challenges,” DeepMind writes on its website.
DeepMind COO Lila Ibrahim said on Monday that trying to “figure out how to operationalize the vision” has been the biggest challenge since she joined the company in April 2018.