I was honoured to give a keynote lecture at the start of the FIAT/IFTA Media Management Seminar in May 2023. The text below is a slightly adapted version of the talk, which was also recorded and will be made available in due course.
It’s wonderful to be with you here in Dublin today and to meet so many friends and colleagues again after so long: this event hasn’t happened in person since 2019 and so it’s a privilege to be invited to give the opening keynote at a face-to-face meeting. My colleagues discourage me from making jokes on Zoom because they say I am not remotely funny.
I’ve been asked to talk around the theme of ‘Sustainability’. It’s a very big topic as the word is saturated with meanings and over-used. I am going to develop three related themes around the topic of sustainability: economy, environment and people. I’ll need to set the scene a little before I get there too. It’s an important and overdue conversation.
The climate crisis is not coming. The climate crisis is here.
Our generation will likely know the success or failure of our efforts to address human impacts on the environment. Future generations will look back on what we have done: a world that we will make for them but do not yet fully know; a past they will know but can no longer change.
This keynote will discuss the prospects for digital remembering of a time before the crisis for the generations that come after it. Their perspectives will depend on the records and the archives which we create, curate and hand on to them.
I will approach the topic with which many of you are already familiar as almost an outsider. My insights will draw from emerging practice in digital preservation, which has a large though incomplete intersection with media management.
I will probably talk and sound and think like an outsider and there may be lessons for you as a result. But digital is a platform, and one that we share – so my hope is that you will have insights too, and that by sharing these we make progress together.
Standing in our way are the numerous challenges which we face in developing economically and environmentally sustainable preservation of digital materials. I will start with the context of the challenges as I see them, then share some examples of emerging practice, which will help us make better decisions. A spoiler: this is a socio-technical issue, and I will be describing solutions rather more on the social side.
Digital Preservation 101
Let’s start at the beginning. By digital preservation I mean the series of managed activities necessary to ensure continued access to digital materials for as long as necessary beyond the limits of media degradation, technical obsolescence or organizational change. There are a lot of concepts implicit in that short sentence. Let’s unpack them.
Firstly, it’s a series of activities, not a single task, and certainly not an app. You cannot buy digital preservation: it’s a commitment you make. It’s a managed series so there’s some organizational context that implies policy and reporting as well as tolerances for quality and appetites for risk. It’s also not explicit where the activities happen. Arguably, if you leave it to the archives it’s already too late. So it can encompass decisions across an entire organization or supply chain. And if we’re having to influence a whole organization or supply chain you can begin to see why it’s hard. It’s about continued access, so there are users in mind for whom access has to be meaningful: that’s much more than just backup. It adds expectations of usability, renderability and authenticity. It’s not everything and it’s not forever. But it’s not digitization – digitization creates a digital preservation need but so does born digital content. And the use case is not simply about years into the future: the time frame is defined by processes over which we have little control which could be decades long or a few weeks.
Therefore, agencies which are not memory institutions in the traditional sense have a digital preservation need for any kind of record, especially where a digital output has a lifecycle longer than the infrastructure on which it is created. You can expect, and do frequently find, agencies with all manner of long-lived products coming to digital preservation not because they want to keep content forever, but because their customers or their regulators require them to.
There’s a lot packed into that relatively simple description, and it seems daunting. But there’s also a lot of common interest with media management. So I’m calling for help and offering a hand of friendship at the same time. So what are the digital preservation challenges that we face and how do they relate to sustainability? I am guessing these are similar to the challenges you face in media management.
Economy
If you approach digital preservation as a newcomer you’d think it was a niche discussion about file formats or metadata. It’s true that these topics can spark endless debate and are well represented in the literature. But money turns out to be the biggest challenge to preserving our digital heritage.
There are lots of examples that support a wider interpretation of this statement. Here’s one you may find familiar. The UK Web Archive at the British Library is gathered under statute by an agency established in law to do so, and Trinity here in Dublin is part of the network which safeguards this collection. The archive grew from 365,000 collections in 2013 to almost 14,000,000 in 2018: a thirty-eight-fold increase which has not been met with an equivalent increase in funding. Now look me in the eye and tell me we need to do more with less.
But the economic challenge is greater and more subtle than we have imagined.
Digital preservation is a sort of response to the economic forces that propelled the digital shift in the 1980s and 90s. It’s a symptom of the accelerating cycles of innovation, adoption and disruption which characterize information technology. We are locked into short lifecycles of technology, where obsolescence is taken for granted and infrastructures are disposable.
So while we have been worrying about file formats, business has been happily creating and dismantling the digital world for us, overseeing the deletion of swathes of digital content in response to business pressures. If you really want to see the long list of data which has been lost over the years, it’s more likely to be found in a longlist of discontinued services than a shortlist of entirely unrecoverable file formats.
We might as well mention Twitter in this context. Two weeks ago we learned from Elon Musk that he intended to delete inactive Twitter accounts – defined by Twitter as anything not used in the last 30 days – so they can resell high-value usernames. That might sound like the closure of a few dormant accounts, but the consequence is the reckless endangerment of enormous swathes of evidence, all to create a synthetic market in usernames. It’s like burning the civic realm to sell personalized number plates.
This is just the latest example and probably not the worst. We’re told, seemingly as an afterthought, not to worry because the old accounts will be ‘archived’. This raises more questions than it answers. Archived by whom? And for how long? How might the archives be accessed? And how can we be sure they won’t be altered in the archiving process? What will happen if the archive is challenged to delete or correct an entry? How might legitimate users, such as corporate archives or surviving family members, access these archives and exercise legitimate control over them?
Let’s remember that the Library of Congress was offered the Twitter archive more than 10 years ago. That’s the biggest library in the world and it could not cope with the scale of the collection because it didn't have the infrastructure to make such a huge dataset useable.
What about the Twitter accounts created to support one-off events or short-duration political campaigns, which are dormant but incredibly important for what they tell us about who we are, how we live, and what we have voted for? I can imagine quite a few corporations and politicians keen for their public pronouncements to be conveniently disappeared. It raises questions for large organizations in public life, and for personal connections in private life.
It’s also naïve to think of Twitter as merely a flood of messages. It overlooks connectedness and patterns of behaviour which are often more revealing. Deleting one account creates holes and gaps right across the record – even for those accounts that remain active and vibrant. We’re told this will apply to materials that have seen no activity 'for several years'. It’s not at all clear what this means, and not at all clear therefore what’s in scope for this mass extinction.
But it’s not just one company making some ill-judged decisions: it’s structural. Remember the mismatch between share valuation and balance sheets that characterizes technology stocks. Facebook was valued at around 104bn USD when shares were first sold in May 2012, which is a lot compared to the net assets of 6.3bn USD. Almost 100bn USD in ‘intangible assets’ which might as well be data. It’s grown a lot since then, tipping over 1 trillion dollars in 2021 before losing a whopping 232bn USD in one single day (2nd February 2022).
Now I am not expecting Facebook to go anywhere soon – but that’s a quarter of a trillion dollars. It’s a massive change in value in a single moment. We’ve never had to deal with anything of this magnitude before. The value of companies is ephemeral, the flight of capital inevitable, and thereby follows loss of data.
The cloud has brought us to a place of digital extinction as a service.
Environment
This economic dysfunction is not unlinked to the climate crisis. The climate crisis is a reason for doing digital preservation in the first place, and also a reason for doing it better.
It stands to reason we would want to preserve environmental data because its value and usefulness grow through time. But research institutes which have published findings or gathered data which might be considered unwelcome by vested interests in the carbon lobby have faced sustained denial of service attacks on their servers and ultimately also their reputations. Climate science is not devoid of bad actors and misdirection. Transparency is essential and I am not sure we always get that.
Here’s a short story to illustrate. I spoke on the fringes of COP26 last year about archives and climate change. Later in the workshop a consultant posted a seemingly useful and actionable insight, that email creates carbon, that sending too many emails was putting the planet at risk and, to put a slogan on it, we should think before we thank.
That’s very eye-catching. Less email is good news!
But this seemed curious to me. The established wisdom is that the carbon footprint of any digital activity derives from the source of the energy and the carbon locked into devices rather than the nature or size of the digital objects.
In this case, the claim was requoted from an IT consultancy which in turn had derived statistical claims from research which had been commissioned by a company called OVO. OVO is one of the largest energy firms in the UK and critics have reported that it offers power generation with an above average carbon footprint.
That seems relevant. This line of thinking, quite ostentatiously, passes responsibility for carbon reduction onto the user. It makes no commitment or comment about what the energy generators, like OVO, should be doing. In the media this is called framing. We’re being framed.
Energy firms have a reputation for suggesting users change their behaviours while not addressing issues in their own infrastructure. In January 2022 OVO advised customers to reduce their energy bills by cuddling pets for warmth, “challenging the kids to a hula hoop competition”, “doing star jumps”, and “cleaning the house”. At the same time, it reportedly made a profit of £600 million and paid its Chief Executive £1.2 million.
This is what the debate is like. A nuanced discussion about the carbon costs was derailed by the third-hand repetition of an unconvincing but eye-catching claim based on research promoted by an energy firm heavily invested in carbon. A little bit of transparency goes a long way. You might even think that, if energy firms spent a bit more on infrastructure, and a bit less on shareholder dividends (not to mention misinformation) we'd all be better placed.
Climate justice needs climate honesty. We need to get this right.
What about the carbon footprint of all those bits and bytes we’re looking after?
Digital preservation has a habit of treating its processes as a binary state – preserved or not – and of treating repositories as ‘trusted’ or not. There has to be something in between and perhaps that will also be good news for carbon emissions.
Every touchpoint in a digital preservation workflow requires energy – ingest workflows, migration or access for example. So as well as reducing the data volumes we need to ask how many times a file needs to be processed. There are some high value or high-risk environments where the chain of custody really matters and you’d want to monitor the integrity of a file more or less continuously. But computing across a large data set is going to require processor time and that means energy.
Migration is another example. Should we migrate and normalize files on receipt at the repository, or do we migrate only when the need arises? There are arguments both ways depending on the specific use case: but it’s time that we included the energy cost of migrating files in the discussion.
Finally, instant access means spinning disks and global access means data cached in numerous locations around the world. Spinning disks consume far more energy than tape or offline disk storage, but both of those are a lot slower in terms of delivery. Offline storage comes with slower access, so managing the expectations of users might be an issue, but it is healthier for the planet. We need to admit more grey areas in preservation: files which are checked occasionally, formats that are not migrated unless someone really needs them, access which is slower but sustainable. And if you will permit me to mix my metaphors, allowing for some grey areas will unlock some of the green spaces.
People
All of that leads me to remember that we face a socio-technical challenge, so the solutions will be vested not only in tools but in people. This is also where we can almost certainly work more closely across our many sectors.
Any analysis of contemporary practice in archives will tell you that archivists have low confidence in their digital skills, and that digital preservation is on balance the least developed skill set of all. A survey in the UK by Jisc and TNA in 2019 spelled this out. It demonstrated that the majority of archives in the UK are not working to a dedicated digital or digital preservation strategy, most report inadequate funding (or none at all) and most report that organisational buy-in is a significant barrier to developing digital skills. Rightly or wrongly, most archivists claim that their archival qualifications have failed to prepare them with digital skills, even among those who graduated after 2010. The consequence, at least in 2019: 67% of archives were not taking steps to preserve digital documents, and 87% had no web archive program.
That’s set against massive increases in data production and consumption. Global data volumes continue on an exponential curve which is estimated to add around 50% more data every two years. But it’s not just about volume: it’s also about greater complexity. The largest increases are reported in unstructured data. Non-database sources – video streaming, voice assistants and IoT devices – have led to an explosion in data diversity, crippling traditional approaches to data warehousing.
So, skills are in short supply but instead of getting out ahead of the problem, practice seems to be falling behind. Digital memory is in trouble not because we don’t know what to do, but because we urgently need a flexible and skilled workforce that can respond to the challenge. And that’s where agencies like DPC – and FIAT/IFTA – can begin to help.
For many years now, really since we were founded, DPC has supported training. Over the years this has grown from simply publishing the Digital Preservation Handbook to a much wider and integrated program of Workforce Development – training, grants, and resources – which has reached many thousands of people. In 2020 an online training pathway called Novice to Know-How, funded by the National Archives (UK) and free to access, replaced a more analogue training roadshow. Entry-level training has been supplemented with more advanced tool demos and there’s a roadmap for deep dives into substantive content, such as email preservation modules which will be released in July.
There’s no end of demand. I wonder whether that’s something we could be doing together: first aid for AV? We’ve certainly heard from many of our members that they have an appetite for training on this topic. It’s also quite striking that everyone needs media managers and archivists now – it’s not a niche requirement. Since the pandemic and for many years before, all manner of video and audio content has been accumulating in institutions and agencies, so that, even if they don’t realise it, we’re all AV curators now.
Training has an important contribution to make to tackle some of these issues, and I have no doubt it’s impactful, but on its own it won’t make the lasting structural change we need. Bear in mind that digital preservation – really all preservation - is contingent. It’s not enough to have skills now, it’s also important to have a pathway for the renewal and upgrading of skills too, as well as continuity and succession-planning for staffing.
That in part has encouraged us to develop a competency framework and competency audit tool for digital preservation practitioners which aligns skills assessment to organizational maturity. There are multiple professional frameworks which are relevant to digital preservation, such as the DigCurV framework, or the competencies listed by the Archives and Records Association. But as noted, digital preservation is not all or only about archives: arguably it’s already too late by then.
These tools are useful for practitioners who want to benchmark their own capability and arguably over the longer term a new professional community will emerge here – we can already discern areas like web-archiving or research data management developing into distinct specialisms within digital preservation. However, by identifying skills gaps they also challenge institutions to recruit more effectively and point training providers towards more impactful curricula. They also point the finger at the right skills in the right place: preservation needs to be embedded across the whole production lifecycle and therefore to become everyone’s problem. So, while there may be beneficial outcomes for individuals from these tools, the longer-range forecast is about much more.
In the final analysis though, these capabilities are not for the sake of an emerging profession, nor even for the sake of the bits and the bytes. We invest in them because of the people and opportunity: not because we fear a digital dark age but because we want to come good on a digital promise.
Sustainability
The climate crisis reminds us that long-term thinking matters, and that long-term thinking needs longer-term skills. There’s never been a more important time to take our stand with the future.
Acknowledgements
I am grateful to Sharon McMeekin, Angela Puggioni and Michael Popham who improved an earlier draft of this presentation and to Jennifer Wilson and Vicky Plaine who invited my participation at their conference.