Matthew Addis is the Chief Technology Officer at Arkivum.
If cloud infrastructure providers such as Google, AWS and Azure have net zero emissions from their use of energy, then does this mean we no longer need to worry about the carbon footprint of digital preservation in the cloud?
The answer is no.
Carbon emissions from energy consumption are just one part of the story. The embodied footprint [7] of all the ICT servers that run in the cloud also needs to be taken into account, as does the construction of data centre buildings and their power and cooling plants. All of this has a carbon footprint. Embodied footprint is a major contributor to carbon emissions in the construction sector and the cloud certainly involves large-scale construction. But embodied footprint also applies to all the ICT servers (compute, storage, networking etc.) that run in the cloud and get used by digital preservation solutions hosted there. For ICT servers this includes extraction of raw materials, the manufacture of hardware, transport and installation at data centres, maintenance, and eventual recycling and disposal. As the saying goes, the cloud is “just someone else’s computers” and we should not forget that this physical infrastructure has an embodied carbon footprint.
But how big is the embodied footprint of digital preservation in the cloud?
This blog post investigates whether we can get a quantitative answer to this question.
But first, let’s go back to energy consumption. The major cloud providers are moving rapidly to a position where the carbon emissions from energy consumption by their data centre operations will be, or already are, net zero. Google's cloud platform (GCP) has been a leader in this area for some time [1]. Cloud providers are rapidly moving towards using 100% renewable energy 24/7. For example, AWS and Azure are both committed to 100% renewable energy by 2025 [2][3]. All the major cloud providers make much of this on their web sites and as part of their wider sustainability commitments. The drive towards use of clean energy by global hyperscalers is laudable. It's also important to recognise that cloud providers build their own renewable energy sources [9], so they are often adding to global renewable energy capacity, not necessarily taking green energy away from others (although AWS is also, by its own admission, the world’s largest buyer of renewable power [8]). The cloud providers are transparent about what they do, they follow recognised GHG reporting methods, and they offer calculators and other tools to allow customers to pick low carbon data centres or calculate their own carbon emissions when using cloud services [4][5][6]. This is all a good thing. Trying to achieve the same level of energy efficiency and use of renewable energy outside of the cloud is not easy and indeed the cloud providers emphasise how much more energy efficient and green they are in comparison with running IT infrastructure in enterprise datacentres.
What the cloud providers don't do is publish information on the embodied footprint of the ICT infrastructure that consumes all this renewable energy. They do describe steps towards lowering the embodied footprint of their buildings, for example through use of recycled steel and lower carbon concrete. But when it comes to the embodied footprint of the ICT servers they use inside these buildings, there is almost zero information that I could find!
However, thankfully that's not the end of the road. Some hardware manufacturers have done Life Cycle Assessment (LCA) for their servers, for example Dell [11] following ISO 14040 [10]. This includes embodied carbon footprint. Cloud hyperscalers don't use off the shelf servers in their infrastructure and they don't publish details on the specific hardware they use. This makes it a little hard to translate the numbers published for commodity IT servers from manufacturers such as Dell into estimates of what might be happening in the depths of an AWS, Google or Azure datacentre. However, what we do know is the number of compute cores that are available from various types of cloud virtual server and we know how many compute cores are in real hardware servers from Dell and others. Along with some estimates of utilisation levels and expected equipment lifetimes in the cloud, this means we can estimate the embodied footprint that can be apportioned to the use of different types of cloud server. I'll emphasise this is an estimate, but it is a lot better than no numbers at all. This is the approach taken by the folks at Cloud Carbon Footprint, who provide a handy calculator as well as publishing their methodology [13]. Others have taken a similar line of attack [15][16]. It is also important to remember that compute power for a given amount of hardware continues to improve as technology advances, cloud providers are extending the lifetime of their servers [14], and they are working on 'circular server' processes to reduce waste [19]. All of which brings down the embodied footprint that can be apportioned to each core-hour of computing power consumed. Any numbers we estimate today will only be a snapshot and they will come down over time.
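As a rough illustration of the apportioning described above, here is a minimal Python sketch. It is not the Cloud Carbon Footprint implementation or the exact calculation behind the results in this post; the function names are hypothetical and every number in the example is a made-up placeholder, not a figure from a manufacturer LCA. The idea is simply to spread a server's whole-life embodied footprint over the core-hours it is expected to deliver in its lifetime, allowing for the fact that not all of those core-hours are actually used.

```python
HOURS_PER_YEAR = 24 * 365


def embodied_kg_per_core_hour(server_embodied_kg, cores_per_server,
                              lifetime_years, utilisation):
    """Embodied kgCO2 eq apportioned to one utilised core-hour.

    server_embodied_kg: whole-life embodied footprint of one physical server
                        (e.g. taken from a manufacturer LCA).
    cores_per_server:   compute cores in that physical server.
    lifetime_years:     assumed service life of the server in the cloud.
    utilisation:        assumed fraction (0..1] of core-hours actually used.
    """
    if cores_per_server <= 0 or lifetime_years <= 0:
        raise ValueError("cores and lifetime must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    useful_core_hours = (cores_per_server * lifetime_years
                         * HOURS_PER_YEAR * utilisation)
    return server_embodied_kg / useful_core_hours


def apportioned_embodied_kg(core_hours_used, kg_per_core_hour):
    """Embodied footprint apportioned to a workload's measured core-hours."""
    return core_hours_used * kg_per_core_hour


if __name__ == "__main__":
    # Placeholder inputs for illustration only (assumptions, not measurements).
    factor = embodied_kg_per_core_hour(
        server_embodied_kg=1000.0,  # hypothetical LCA figure
        cores_per_server=32,        # hypothetical server size
        lifetime_years=4,           # hypothetical service life
        utilisation=0.5,            # hypothetical utilisation level
    )
    print(apportioned_embodied_kg(core_hours_used=500, kg_per_core_hour=factor))
```

Note how the assumptions drive the answer: doubling the assumed server lifetime or utilisation halves the footprint apportioned to each core-hour, which is why longer server lifetimes and high utilisation in the cloud matter.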
Likewise, when storing data in the cloud, there isn't information available from the cloud providers on the specific hardware they use. But there is information from manufacturers of hard drives and other types of storage media who have also done LCA. A very useful round up is provided in a paper [12] from the Universities of Wisconsin-Madison and British Columbia. This introduces a metric called 'Storage Embodied Factor' which gives the embodied footprint in kgCO2 eq per GB of data stored. As with IT servers, we don't know exactly what storage is being used by cloud providers, including for cases such as using archival tiers of storage for long-term data retention, for example where access frequency is low and retrieval is not instant. Again we need to make some guesses about storage type, storage densities, storage lifetime and utilisation levels. This then allows an estimate of the embodied footprint that can be apportioned to each TB-year of data that is stored in the cloud. Again this is a snapshot; technology advances and data densities increase for the same amount of hardware, manufacturing tends to become more green, and storage server lifecycles are being extended. All of which means the embodied footprint per TB stored is also coming down.
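The storage case can be sketched in the same way. The snippet below is a minimal Python illustration, assuming a Storage Embodied Factor expressed in kgCO2 eq per GB of device capacity, decimal units (1 TB = 1000 GB), and guessed values for device lifetime and utilisation. The function names are hypothetical and the numbers are placeholders, not values taken from the paper [12] or from any cloud provider.

```python
GB_PER_TB = 1000  # decimal terabytes, an assumption of this sketch


def embodied_kg_per_tb_year(sef_kg_per_gb, lifetime_years, utilisation):
    """Embodied kgCO2 eq apportioned to storing 1 TB for 1 year.

    sef_kg_per_gb:  Storage Embodied Factor, kgCO2 eq per GB of capacity.
    lifetime_years: assumed service life of the storage hardware.
    utilisation:    assumed fraction (0..1] of capacity holding customer data.
    """
    if lifetime_years <= 0:
        raise ValueError("lifetime must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    return sef_kg_per_gb * GB_PER_TB / (lifetime_years * utilisation)


def apportioned_storage_kg(tb_stored, years_retained, kg_per_tb_year):
    """Embodied footprint apportioned to a given volume and retention period."""
    return tb_stored * years_retained * kg_per_tb_year


if __name__ == "__main__":
    # Placeholder inputs for illustration only (assumptions, not measurements).
    factor = embodied_kg_per_tb_year(
        sef_kg_per_gb=0.02,  # hypothetical Storage Embodied Factor
        lifetime_years=5,    # hypothetical device lifetime
        utilisation=0.8,     # hypothetical fill level
    )
    print(apportioned_storage_kg(tb_stored=10, years_retained=1,
                                 kg_per_tb_year=factor))
```

The sketch deliberately ignores anything beyond the storage media themselves, such as multiple copies of the data or the servers that host the drives, so it should be read as a lower bound on a fuller estimate.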
This approach to estimating the embodied footprint of ICT in the cloud, and the assumptions I've made, are described in more detail in a webinar I did on environmental sustainability of digital preservation in the cloud in April 2023, which is available online [17]. I've also blogged before on how we've made measurements of resource consumption of real-world digital preservation scenarios in the cloud and converted these into kgCO2 eq emissions from energy consumption [18]. Taking a similar approach allows us to estimate the embodied carbon footprint from compute and storage servers that can be apportioned to those digital preservation scenarios. Therefore, I'll cut to the chase in this blog post and provide the results for two specific scenarios. One is the ingest and storage of large astronomy datasets consisting of big images. The other is the ingest and storage of large collections of office documents. Ingest includes a whole host of digital preservation actions that are run by the Arkivum solution. More details are in this report [21].
The table below compares the estimated embodied carbon footprint associated with using storage and compute servers in the cloud with the gross carbon emissions from energy consumption by those servers. The table is based on the Arkivum solution deployed in GCP within the ARCHIVER project. A different preservation system on a different cloud platform used for different scenarios and different data will all give different results. As will running the same test tomorrow, next week or next year. The numbers in red for energy usage are gross emissions - the net emissions were zero because we used GCP.
It's not the absolute numbers that matter, rather it's their relative size. Embodied footprint is, give or take, around the same size as the gross carbon emissions from energy consumption. If emissions from energy usage are net zero and cloud data centres are powered by 100% renewable energy, then this only halves the total emissions. The remaining embodied footprint is still significant. On top of that come further emissions associated with data centre buildings and other infrastructure.
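The "only halves" point is simple arithmetic, sketched below in Python. The function name is hypothetical and the inputs are placeholders chosen to be equal, mirroring the "around the same size" observation above; they are not the measured results.

```python
def reduction_from_net_zero_energy(embodied_kg, energy_gross_kg):
    """Fraction of total emissions removed when energy emissions become net zero.

    Total emissions are taken as embodied plus gross energy emissions;
    net zero energy removes only the energy part.
    """
    total = embodied_kg + energy_gross_kg
    if total <= 0:
        raise ValueError("total emissions must be positive")
    return energy_gross_kg / total


if __name__ == "__main__":
    # Placeholder values: embodied and energy footprints assumed equal.
    print(reduction_from_net_zero_energy(embodied_kg=1.0, energy_gross_kg=1.0))
```

If the embodied footprint were larger than the gross energy emissions, the reduction from net zero energy would be less than half; if it were smaller, the reduction would be more than half.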
We can't ignore embodied footprint when doing digital preservation, even if there is some comfort in the direction of travel by the main cloud providers. They are minimising the embodied footprint of the ICT resources they use, including through extending server lifetimes, the inexorable drive to squeeze the last drop of efficiency from their ICT infrastructures, and from their adoption of 'circular server' strategies. Their high levels of efficiency mean that ICT servers have high levels of utilisation and that in turn means that less hardware is needed overall than would be required in conventional enterprise data centres or on-premise deployments of digital preservation solutions. That means less embodied carbon, but not zero carbon.
Hopefully it should be clear by now that net zero emissions from energy use by the cloud doesn't mean no carbon. There is still an embodied footprint. Mainstream digital preservation uses ICT servers and there is no getting away from this. When it comes to digital data, in the cloud or otherwise, there are still choices to be made when it comes to what we preserve, where we preserve it, how we preserve it, how long we retain that data for, and what use we make of it [20]. By working towards quantification of the embodied carbon footprint, especially from ICT in the cloud, maybe we can make small steps towards better understanding the impact of those decisions on the environment.
[1] https://cloud.google.com/sustainability/region-carbon
[2] https://azure.microsoft.com/en-gb/explore/global-infrastructure/sustainability
[3] https://sustainability.aboutamazon.co.uk/environment/the-cloud
[4] https://www.microsoft.com/en-gb/sustainability/emissions-impact-dashboard
[5] https://cloud.google.com/carbon-footprint
[6] https://aws.amazon.com/aws-cost-management/aws-customer-carbon-footprint-tool/
[7] https://carbonleadershipforum.org/embodied-carbon-101/
[8] https://www.cnbc.com/2023/05/06/how-amazon-bought-more-renewable-energy-than-any-other-company-in-2022.html
[9] https://sustainability.aboutamazon.com/2022-sustainability-report.pdf
[10] https://www.iso.org/standard/37456.html
[11] https://corporate.delltechnologies.com/content/dam/digitalassets/active/en/unauth/data-sheets/products/servers/lca_poweredge_r740.pdf
[12] https://arxiv.org/pdf/2207.10793.pdf
[13] https://www.cloudcarbonfootprint.org/docs/embodied-emissions
[14] https://www.theregister.com/2022/08/02/microsoft_server_life_extension/
[15] https://doc.api.boavizta.org/
[16] https://medium.com/teads-engineering/building-an-aws-ec2-carbon-emissions-dataset-3f0fd76c98ac
[17] https://arkivum.com/webinar-environmental-sustainability-of-digital-preservation-in-the-cloud/
[18] https://www.dpconline.org/blog/blog-matthew-addis-carbon-footprint
[19] https://www.datacenterfrontier.com/servers/article/33006138/circular-servers-an-inside-look-at-how-aws-refurbishes-its-data-center-hardware
[20] https://www.dpconline.org/blog/is-digital-preservation-bad-for-the-environment