In recent years, the clamor to fight climate change has triggered revolutionary action in numerous areas. Renewable electricity generation now accounts for 30 percent of the global supply, according to the International Energy Agency. The same organization reports that sales of electric cars grew by 40 percent in 2020, while the U.S. recently committed to halving greenhouse gas emissions by 2030.
Now the same drive for change has begun to permeate the scientific world. One area of concern is the energy and carbon emissions generated by the process of computation. In particular, the growing interest in machine learning is forcing researchers to consider the emissions produced by the energy-hungry number-crunching required to train these machines.
At issue is an important question: How can the carbon emissions from this number-crunching be reduced?
Now we have an answer thanks to the work of David Patterson at the University of California, Berkeley, together with a group from Google, which he also advises. This team says there is significant room for improvement and that straightforward changes can reduce the carbon footprint of machine learning by three orders of magnitude.
The team focuses on natural language processing, a field that has grown rapidly with the ability to store and analyze huge volumes of written and audio data. Advances in this area have enabled breakthroughs in search and automatic language translation, and have made possible intelligent assistants such as Siri and Alexa. But working out how much energy all this takes is hard.
One problem is knowing how the energy is used. Patterson and colleagues say that usage depends on the specific algorithm being run, the number of processors involved, their speed and power, and the efficiency of the data center that houses them.
This last factor has a big influence on carbon emissions, depending on where the data center gets its power. Clearly, those relying on renewables have a smaller footprint than those powered by fossil fuels, and a grid's mix of sources can change even at different times of the day.
Because of this, Patterson and colleagues say it is possible to dramatically reduce emissions simply by choosing a different data center. “We were amazed by how much it matters where and when a Deep Neural Network is trained,” they say.
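The effect of location can be illustrated with simple arithmetic: the same training run consumes the same energy, but the resulting emissions scale with the carbon intensity of the local grid. The sketch below uses purely illustrative numbers (the energy figure and grid intensities are assumptions, not values from the paper).

```python
# Illustrative: emissions for the same training run on grids with
# different carbon intensities. All numbers are assumptions chosen
# for the sake of example, not figures from the paper.

def training_emissions_kg(energy_kwh: float, kg_co2e_per_kwh: float) -> float:
    """Emissions = energy consumed x carbon intensity of the grid."""
    return energy_kwh * kg_co2e_per_kwh

energy_kwh = 100_000  # hypothetical training run

grids = {
    "hydro-heavy grid": 0.02,  # assumed kgCO2e/kWh
    "mixed grid":       0.40,
    "coal-heavy grid":  0.80,
}

for name, intensity in grids.items():
    tonnes = training_emissions_kg(energy_kwh, intensity) / 1000
    print(f"{name}: {tonnes:.1f} tCO2e")
```

On these assumed figures, the identical workload produces forty times the emissions on the coal-heavy grid as on the hydro-heavy one, which is why the choice of data center matters so much.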
Part of the problem here is the belief among many computer scientists that switching to a greener data center simply forces other calculations onto more polluting ones, making clean energy usage a zero-sum game. Patterson and colleagues say this is simply not true.
Data centers do not generally run to capacity and so can often manage extra work. Also, the amount of renewable energy varies with factors such as the amount of wind and sunshine. So there is often an excess that can be exploited.
Another important factor is the algorithm involved, with some being significantly more power-hungry than others. “For example, GShard-600B operates much more efficiently than other large NLP models,” says the team, referring to a machine learning algorithm developed by Google that is capable of handling 600 billion parameters.
Patterson and colleagues conclude by recommending that computer scientists report the energy their calculations consume and the carbon footprint associated with this, along with the time and number of processors involved. Their idea is to make it possible to directly compare computing practices and to reward the most efficient. “If the machine learning community working on computationally intensive models starts competing on training quality and carbon footprint rather than on accuracy alone, the most efficient data centers and hardware might see the highest demand,” they say.
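The kind of accounting the team recommends can be sketched as follows: multiply training time, processor count, and average power draw to get energy, scale by the data center's power usage effectiveness (PUE), and then apply the grid's carbon intensity. The function mirrors that bookkeeping; every concrete number in the example is hypothetical, chosen only to show the shape of such a report.

```python
def footprint_report(hours: float, num_processors: int, avg_power_w: float,
                     pue: float, kg_co2e_per_kwh: float) -> tuple[float, float]:
    """Estimate the energy and carbon footprint of a training run.

    energy (kWh)      = hours x processors x average power (kW) x PUE
    emissions (kgCO2e) = energy x grid carbon intensity

    The structure follows the kind of reporting the team proposes;
    the inputs used below are illustrative, not from the paper.
    """
    energy_kwh = hours * num_processors * (avg_power_w / 1000) * pue
    co2e_kg = energy_kwh * kg_co2e_per_kwh
    return energy_kwh, co2e_kg

# Hypothetical run: 200 hours on 512 accelerators drawing 300 W each,
# in a data center with PUE 1.1, on a 0.4 kgCO2e/kWh grid.
energy, co2e = footprint_report(200, 512, 300, 1.1, 0.4)
print(f"{energy:.0f} kWh, {co2e / 1000:.1f} tCO2e")
```

Reporting these inputs alongside accuracy would let readers reproduce the footprint estimate and compare training runs directly, which is the point of the team's recommendation.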
That seems a worthy goal and an approach that should not be confined to natural language processing alone.
An interesting aside in this paper is the team’s comparison of the natural language processing footprint with other activities. For example, they point out that a passenger jet’s round trip between San Francisco and New York releases the equivalent of 180 tons of carbon dioxide.
The emissions associated with training GShard are just 2 percent of this. By contrast, the emissions from training a competing model, OpenAI’s GPT-3, are 305 percent of such a trip, far higher. And the emissions from this year’s Bitcoin mining activities “is equivalent to roughly 200,000 to 300,000 whole passenger jet SF↔NY round trips,” say Patterson and colleagues.
Clearly, next on these computer scientists’ agenda should be the footprint of Bitcoin and other cryptocurrencies. Bringing these to heel may turn out to be an even trickier problem.
Reference: Carbon Emissions and Large Neural Network Training: arxiv.org/abs/2104.10350