Research Data Published through Repositories

Vulnerable

Research data published through digital repositories or other services providers with specialist skills to manage the data and an ongoing commitment to ensure preservation.

Digital Species: Research Outputs

Trend in 2023:

Material improvement (reduced risk)

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Recognized data repositories in specialist disciplines; institutional data repositories in subject specialist centres and partnerships.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of long-term commitment; lack of user community; lack of visibility to potential depositors; lack of institutional commitment; insufficient documentation; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Certification and documented good practice; effective documentation requirements for depositors; proven financial sustainability; skilled staff including professionalising disciplinary and general data stewardship offering a clear career option; participation in the digital preservation community; research data management training by repositories and research funders offered to depositors, in particular new career researchers.

2023 Review

This entry was added in 2019 as a separate entry, but it was previously introduced in 2017 under ‘Published research outputs,’ though without explicit reference to the capacity of the repository infrastructure. The 2019 Jury split the entry into a range of contexts for research outputs, including this addition classified as Vulnerable; the preservation of research data published through a well-founded repository with the capacity and commitment to ensure preservation and capability through their own professional development activities made it a lower risk outcome for research data. The 2021 Jury agreed with this classification but commented on the improvements and initiatives towards the preservation of research data and outputs, leading to a 2021 trend towards reduced risk. The 2022 Taskforce identified a 2022 trend towards reduced risk based on material improvement over the last year that had not only offered examples of good research data management and preservation practices but also suggested a significant shift towards a culture of change and collaboration across different research communities and stakeholders. Those mentioned included (but were not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council agreed with the Vulnerable classification and noted that there was a trend towards reduced risk due to increasing research data management and engagement activity by libraries, which should result in increasing amounts of datasets being deposited. The 2023 Council also noted it would be useful to see empirical data on depositing trends to assess this.

2024 Interim Review

These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

Additional Comments

A key consideration with this entry is whether the data repository is integrated with a preservation system to facilitate long term access and usability of datasets.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards.

Adding preservation metadata to research data holdings may help render data more robust in the long term where using a preservation system is not an option. With an emphasis on environmental sustainability, some repositories hesitate to mandate additional copies of large datasets, which may be in the region of hundreds of terabytes, as this adds to both storage cost and carbon footprint, especially when capturing and preserving the research methodology would enable the dataset to be recreated.

Case Studies or Examples:

See also:

  • A recent analysis from Martin Eve of CrossRef shows scholarly content at risk. The findings, based on an assessment of around 7.5 million of the e-books and articles for which CrossRef provides a fixed identifier (a Digital Object Identifier), suggest that around a quarter of academic publications are not being preserved for the future. For c. 2 million articles in the study there was no evidence of preservation, and 4.3 million of the works studied were preserved in at least one place. See: Eve, M. P. (2024) ‘Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles’. Journal of Librarianship and Scholarly Communication 12(1). Available at: https://doi.org/10.31274/jlsc.16288

  • Strecker, D., Pampel, H., Schabinger, R. & Weisweiler, N.L. (2023) ‘Disappearing repositories -- taking an infrastructure perspective on the long-term availability of research data’. Available at: https://doi.org/10.48550/arXiv.2310.06712 

  • L’Hours, H., Kleemola, M., von Stein, I., van Horik, R., Herterich, P., Davidson, J., Rouchon, O., Mokrane, M., & Huber, R. (2021) ‘FAIR + Time: Preservation for a Designated Community (01.00)’. Available at: https://doi.org/10.5281/zenodo.4783116 

  • Science Europe. (2021) ‘Practical Guide to Sustainable Research Data: Maturity Matrices for Research Funding Organisations, Research Performing Organisations, and Research Data Infrastructures’. Available at: https://www.scienceeurope.org/media/b3odxx3s/sepractical-guide-sustainable-research-data.pdf [accessed 24 October 2023]

  • European Open Science Cloud (EOSC) (n.d.) ‘Development and outputs of the European Open Science Cloud (EOSC) Long-Term Data Preservation Task Force’. Available at: https://www.eosc.eu/advisory-groups/long-term-data-preservation [accessed 24 October 2023]

Published Research Data Appended to Journal Articles

Endangered

Closed research data sets produced and documented in accordance with good practice and appended to a journal article or transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term.

Digital Species: Research Outputs

Trend in 2023:

Material improvement (reduced risk)

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group going forward, requiring the application of proven tools and techniques.

Examples

Supplementary data sets added to formally published papers in repositories that are designed primarily for papers; electronic journals offering data sets without obvious preservation capacity; institutional repositories servicing highly complex scientific data sets with insufficient subject-matter expertise.

‘Endangered’ in the Presence of Aggravating Conditions

Complex mix of formats; deposit in repositories that lack relevant expertise or knowledge or funding; poorly designed migration or normalization processes; poorly formed ingest and quality assurance procedures; rapid churn of staff; incoherent patterns of subject matter; lack of domain knowledge; no or very small numbers of users; weak or absent collecting policy; deposit to ensure minimal compliance with funder mandate; limited or dysfunctional data management planning and documentation; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Clear data management planning and documentation; deposit by publisher in a trusted repository; deposit by author/s in appropriate repositories with digital preservation expertise and mandate; clear licensing to enable digital preservation and access; strong user base; development roadmap; ability to transfer collections or share metadata with subject repositories or portals; demonstrable re-use of data; clear collecting policy; data management planning early in the data lifecycle.

2023 Review

This 2019 entry was previously introduced in 2017 under ‘Published Research Outputs,’ though without explicit reference to the research data appended to journal articles. The 2019 Jury split the entry into a range of contexts for research outputs, including this addition and ‘Research Data Published through Repositories’. The entry draws attention to services that take upon themselves commitments to preserve research data, but which may not deliver those promises through lack of capability. The 2021 Jury agreed with the Endangered classification but commented on the improvements and initiatives towards the preservation of research data outputs, with good practice documentation and replication in this space (e.g., collaborations with publishers and repositories, LOCKSS, CLOCKSS, etc.). For these reasons, the 2021 trend was towards reduced risk.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvement over the last year that had not only offered examples of good research data management and preservation practices but also suggested a significant shift towards a culture of change and collaboration across different research communities and stakeholders. Those mentioned included (but were not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

In light of the identified 2021 and 2022 trends, the 2023 Council changed the classification from Endangered to Vulnerable. They noted that many, if not most, HEI libraries that produce research are doing more in terms of research data management, and the activities in this area are growing and scaling up. Due to increased focus on this area, it was recommended that the classification change to Vulnerable with a 2023 trend of ‘Material improvement’.

2024 Interim Review

These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

A Council member recommended that, to add further clarity, it might be worth differentiating use cases—for closed research data sets produced and documented in accordance with good practice and appended to a journal article, and for closed research data sets produced and documented in accordance with good practice and transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term.

Additional Comments

A number of aggravating conditions—those relating to poorly formed ingest and quality assurance procedures, rapid churn of staff, incoherent patterns of subject matter, lack of domain knowledge, no or very small numbers of users, weak or absent collecting policy, and deposit to ensure minimal compliance with funder mandate—are problems with some repositories, not all repositories.

Presenting different use cases can tease apart the use case for supplementary materials appended to journals (which, e.g., CLOCKSS and Portico preserve) and those in repositories that are perhaps not tailored for this use case. Cases where data is transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term are far more at risk.

Research data is complex and has specific requirements for documentation which may only be known to subject matter experts. However well intended, it is risky for institutions to attempt to replicate that level of expertise across all the domains within the institution, and it can be hard for smaller publishers to make commitments to sustain data in the long term.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resource to bring this information up to current standards.

UK funders, e.g. the UKRI-NERC Environmental Data Service, are educating researchers about data policies which mandate depositing master and raw data at the funder's disciplinary repository. These repositories have strong expertise in the research discipline, ensuring data and metadata standardization and quality assurance. Any copies of datasets published in journal articles or similar are considered secondary copies and do not comply with data policy, so an institute that attempts to use journal outputs as its funder-acknowledged datasets risks its future research funding.

The significance and impact of this entry specifically depends on whether it is the only copy of the dataset in existence, or whether there is another copy hosted in a data repository.

Case Studies or Examples:

  • Analysis from Martin Eve of CrossRef shows scholarly content at risk. The findings, based on an assessment of around 7.5 million of the e-books and articles for which CrossRef provides a fixed identifier (a Digital Object Identifier), suggest that around a quarter of academic publications are not being preserved for the future. For c. 2 million articles in the study there was no evidence of preservation, and 4.3 million of the works studied were preserved in at least one place. See: Eve, M. P. (2024) ‘Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles’. Journal of Librarianship and Scholarly Communication 12(1). Available at: https://doi.org/10.31274/jlsc.16288

  • The FAIRsharing Collaboration with DataCite and Publishers. See McQuilton, P., Sansone, S.A., Cousijn, H., Cannon, M., Chan, W.M., Carnevale, I., Cranston, I., Edmunds, S., Everitt, N. and Ganley, E., (2019) ‘FAIRsharing Collaboration with DataCite and Publishers: Data Repository Selection, Criteria That Matter’. Available at: https://doi.org/10.17605/OSF.IO/N9QJ7

  • Resources and research outputs from the Enhancing Services to Preserve New Forms of Scholarship project, which examined a variety of enhanced eBooks and identified which features can be preserved at scale using tools currently available. Of particular note are the published guidelines for preserving new forms of scholarship. See Greenberg, J., Hanson, K., & Verhoff, D. (2021) ‘Guidelines for Preserving New Forms of Scholarship’ NYU Libraries. Available at: https://doi.org/10.33682/221c-b2xj.

  • The work by the Centre pour la Communication Scientifique Directe (CCSD) of France and the Confederation of Open Access Repositories (COAR) in creating a preprint repository directory, which has been relevant to building a user community. See Centre pour la Communication Scientifique Directe (CCSD) of France and the Confederation of Open Access Repositories (COAR) (n.d.) ‘Directory of Open Access Preprint Repositories’. Available at: https://doapr.coar-repositories.org/ [accessed 24 October 2023]

Cloud Storage

Vulnerable

Materials routinely copied or backed up to an independently managed, off-site data storage facility and able to be restored under contractual terms

Digital Species: Cloud, Integrated Storage

Trend in 2023:

No change

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors around the world.

Effort to Preserve | Inevitability

Loss seems likely. By the time tools or techniques have been developed, the material will likely have been lost.

Examples

Remote network storage provided by a third-party service under contracts, such as DropBox, Amazon, Microsoft Azure, Dell EMC, Google Cloud Platform, Google Drive, IBM, Rackspace, Iron Mountain, SAP, and others.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of skills, commitment or policy from corporate owners; encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier; insufficient documentation; lack of local alternative; political or commercial instability; overly aggressive compression; poor information security; lack of transparent integrity-checking; lack of strategic investment; lack of migration plan; lack of exit strategy; unenforceable penalties; unstable pricing; unpredictable removal costs; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation licensing and planning; export functionality; resilient to hacking; version control; resilient funding; technology watch; enforceable contract; disaster planning and documentation; stable pricing; budgeted removal costs.
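
As an illustration of the 'integrity checking' item above, the sketch below shows one common approach: record a checksum for every file in a local collection, then compare it against checksums computed over a copy restored from a storage provider. This is a minimal sketch under stated assumptions, not a recommended implementation. The function names (sha256_of, build_manifest, compare_manifests) and the choice of SHA-256 are illustrative and do not come from the entry itself. It also assumes that both copies are reachable as local directories.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(reference: dict[str, str], copy: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from, changed in, or unexpected in the copy."""
    shared = reference.keys() & copy.keys()
    return {
        "missing": sorted(set(reference) - set(copy)),
        "changed": sorted(k for k in shared if reference[k] != copy[k]),
        "unexpected": sorted(set(copy) - set(reference)),
    }
```

In use, a manifest built when material is first copied off-site would be kept separately from the provider, and compare_manifests would be run against each periodic test restore. Any non-empty 'missing' or 'changed' list signals that the off-site copy can no longer be relied on.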

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. The 2021 Jury noted increased risk in light of greater reliance on the cloud and localized disruptions to cloud services over the pandemic. A 2021 trend towards greater risk was based on the wider (global) dependence on these services, especially Google Drive, for record-keeping and business workflows. The impact of loss increased with more reliance on cloud services leading to greater risk; however, this should not deter people from using cloud storage. The 2022 review agreed with this assessment but noted no significant increase in trend for 2022.

The 2023 Council moved this entry to a new higher-level Cloud species as the previous Integrated Storage species worked less well (for hardware technologies). The Council agreed with the previous Vulnerable classification, with the overall risks remaining on the same basis as before so long as there are safeguards in place (‘No change’ to the 2023 trend). However, the Council noted that these safeguards may not, in all cases, be sufficient to address existing risks. Council members noted how some governments may cut off the internet in times of unrest, having a disastrous effect on access to cloud-based resources, and raised questions about the feasibility of recovering material after a major cloud vendor fails or due to malicious acts. For these materials, the significance of loss and effort to preserve is much greater, with the potential for a trend towards greater risk with the loss of existing safeguards.

2024 Interim Review

The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

While overall risk remains on the same basis as before, some Council members pointed out how a lack of transparency in knowledge about how a cloud service is actually built and functions is worrying from a preservation perspective. Additionally, the overall political ‘threat situation’ worldwide seems to be increasing, which means that significant changes in national political regimes can affect the predictability of how the material is handled in a cloud service and, with that, the potential for increased risk.

Additional Comments

To add further clarity, Council members in the Integrated Storage species group noted that there is a distinction between ‘in-house’ physical storage and cloud storage, especially if one relies on cloud storage as the only storage provider for digital content. As they understand it, this ‘Cloud Storage’ entry focuses on material copied or backed up to a third-party cloud service. This is less threatening compared to using the cloud as the sole storage provider for content preservation.

The history of digital preservation suggests that the risk of vendors going out of business or shutting down services is the key issue here, over and above any specific technical solutions or risks.

Case Studies or Examples:

  • The example of the Microsoft outage in July 2024, in which a software update led to the cancellation of flights, healthcare disruptions and payroll issues. See: Sky News (2024) ‘Global IT outage: More than 5,000 flights cancelled; how security 'arms race' led to crash. As it happened’, 19 July 2024. Available at: https://news.sky.com/story/outages-latest-airports-business-and-broadcasters-experiencing-issues-worldwide-13180821 [accessed 06 September 2024]

  • Case of cloud storage provider who accidentally deleted a client account – including all replicas and backups. This emphasises that a single third-party provider should only really be considered a single copy regardless of the resilience the provider puts in place. Cloud introduces new single points of failure. See: Amadeo, R. (2024) ‘Google Cloud explains how it accidentally deleted a customer account’, Ars Technica. Available at: https://arstechnica.com/gadgets/2024/05/google-cloud-explains-how-it-accidentally-deleted-a-customer-account/ [accessed 17 June 2024]

  • Case of a cloud storage provider who suffered major data loss (or its clients suffered data loss) due to a fire in its data centre. Those clients suffered most who did not include geographically redundant storage in the contract with the storage provider as this was more expensive. See Rosemain, M. and Satter, R. (2021) ‘Millions of websites offline after fire at French cloud services firm’, Reuters. Available at: https://www.reuters.com/article/us-france-ovh-fire-idUSKBN2B20NU [accessed 24 October 2023]

  • Case of fired credit union employee accessing the financial institution's computer systems without authorization and destroying over 21 gigabytes of data via remote network storage. See Gatlan, S. (2021) ‘Fired NY credit union employee nukes 21 GB of data in revenge’, BleepingComputer. Available at: https://www.bleepingcomputer.com/news/security/fired-ny-credit-union-employee-nukes-21gb-of-data-in-revenge [accessed 24 October 2023]

  • The National Archives UK (2023) ‘Digital Services and carbon emissions in the heritage sector: some preliminary findings’, which noted areas relating to the cloud and cloud storage. They write “If we are looking for areas where significant carbon reductions could be made quickly, they are not to be found here. The evidence is that hosting digital services on site results in more carbon emissions than a sensibly located (i.e., in a territory with a high proportion of electricity generated from renewables) cloud host and that, where it might be felt that migrating services simply migrates emissions from scope 2 to scope 3, in practice cloud providers can offer the same storage and compute with lower emissions. Amazon in particular reports its view of the carbon ‘saved’ by using its services rather than your own, but these are estimates and should not be regarded as robust.”  See: The National Archives (UK) (2023) ‘Digital Services and carbon emissions in the heritage sector: some preliminary findings’. Available at: https://www.nationalarchives.gov.uk/archives-sector/digital-services-and-carbon-emissions-in-the-heritage-sector-some-preliminary-findings/ [accessed 24 October 2023]

Current Hard Disk Technologies

Vulnerable

Materials saved to storage devices with a variety of underlying magnetic or solid-state (flash) technologies that are hardwired into a computer still under warranty or supported: typically hard disks that are less than five years old.

Digital Species: Integrated Storage

Trend in 2023:

No change

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within five years, detailed assessment within three years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques.

Examples

Direct Attached Storage (DAS) such as magnetic or solid-state drives integrated into individual laptops or workstations and into smaller scale storage facilities.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; poor handling; poor storage; lack of consistent replication; failure of external dependencies (e.g., suppliers, security); political or commercial interference; failure of internal dependencies (e.g., power supply, disk controller); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning.

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. It was reviewed in 2021 with a noted trend towards greater risk in light of the continued shift towards reliance on cloud storage with computers increasingly reducing hard disk for solid-state storage and commercial motivations for less support, and reviewed in 2022 with no noted increase in trend towards even greater or reduced risk.

The 2023 Council agreed with the current Vulnerable classification, with overall risks remaining on the same basis as before (‘No change’ to trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review.

2024 Interim Review

The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

There were also noted areas of overlap with the Portable Media species group (See: ‘Current Portable Magnetic Media’). As people increasingly select other storage methods, such as cloud, they are less likely to maintain existing content on portable hard disks, which means the portable hard disks are more likely to be overlooked or ignored (e.g., left in drawers) rather than checked and refreshed. Questions arise concerning hard drives and SSDs packaged as portable devices, and for this reason, further cross-species review is recommended for the next 2025 review.

Additional Comments

There are also indications of increasing prevalence of soldered-in flash storage which cannot easily be accessed in the case of device failure.

Case Studies or Examples:

  • Some new technologies like shingling, HAMR/MAMR and multiple actuators have given HDD technology (and, more importantly for preservation, interfaces such as SATA and SAS) a new lease on life. Nevertheless, the writing is on the wall as flash and related technologies move to NVMe and CXL interfaces. See Mellor, C. (2023) ‘Pure: No more hard drives will be sold after 2028’, Blocks & Files. Available at: https://blocksandfiles.com/2023/05/09/pure-no-more-hard-drives-2028/ [accessed 24 October 2023]

  • For example, SSDs can be remarkably sensitive to storage conditions when unpowered. See Cox, A. (2013) ‘JEDEC SSD Specifications Explained’, JC-64.8. Available at: https://www.jedec.org/sites/default/files/Alvin_Cox%20%5bCompatibility%20Mode%5d_0.pdf [accessed 24 October 2023]

Recently Commissioned or Completed Media Art

Vulnerable

Media art currently displayed in a gallery or in the process of being displayed.

Digital Species: Media Art

Trend in 2023:

No change

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within twelve months, detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Media art recently acquired by galleries that utilize specific hardware and software in order to be accessed or exhibited.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of documentation to enable maintenance; uncertainty over IPR or the presence of orphaned works; complex interdependencies on specific hardware, software or operating systems; lack of capacity in the gallery or workshop; lack of strategic investment; complex external dependencies; lack of documentation about artist intent; lack of understanding of costs for display and preservation.

‘Lower Risk’ in the Presence of Good Practice

Strong documentation; clarity of preservation path and ensuing responsibilities; proven preservation plan; capacity of workshop to support artwork at de-installation; capacity of gallery to conserve after de-installation; capacity of gallery to re-install work; funding understood to re-install.

2023 Review

This entry was added in 2019 as a separate entry, but it was previously introduced in 2017 under ‘Media Art’ with particular reference to historical media art. It was added for greater specificity for its recommendations, to represent works acquired and commissioned in the last five years where there is a reasonable expectation that documentation has been produced or could still be obtained. While the 2020 Jury found no change in trend, the 2021 Jury discussed how prospects for long-term preservation depend entirely on whether the artwork is collected post-commission and by an organization with the resources to care for it. They agreed that the classification remains Vulnerable but with a trend towards greater risk because the imminence of action is time-sensitive, requiring working with the artist to get the documentation from them about their work and what is needed before it is too late. Furthermore, there remains a vulnerability for the smaller museums or others that do not take the preservation of media art as seriously.

The 2023 Council agreed with the Vulnerable classification, with overall risks remaining on the same basis as before (‘No change’ to trend), although it noted a change in the imminence of action from 3 years to 12 months.

2024 Interim Review

The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). However, it was important to note that the ‘Effort to Preserve | Inevitability’ can vary. Some of the works take a huge effort to preserve, and communicating this may need some middle ground between the available ratings.

Additional Comments

By the time digital art, time-based media, etc., has entered into the permanent care of a stewarding institution, many of its technologies are already end-of-life, unsupported, or the hardware components have deteriorated. Often the expertise to maintain these many interacting components sits outside the host organization, with a technical supplier to the gallery, and this is in itself vulnerable to business change. Although there are a few exceptions, there is a need for greater capacity within the museum and gallery sector to address the challenges.

There have been new initiatives for guidance and examples of institutions taking wider sectoral responsibility for standards, which have helped with the effort to preserve, such as the Matters in Media Art information resource and guidance.

Media artworks are often made with a network of knowledge that can be precarious. Documentation around production processes can be minimal, so acting quickly with known processes can gather information before the knowledge and people networks start to disperse. This can mean that production environments and associated workflows are preserved alongside the media.

Some art works specifically leverage the limitations and characteristics of the systems that they incorporate, often in unusual ways. This can be hard to migrate or emulate accurately.

Case Studies or Examples:

  • Resources and outputs from the Preserving and Sharing Born Digital and Hybrid Objects From and Across The National Collection project. See V&A Research Projects (n.d.) ‘Preserving and Sharing Born Digital and Hybrid Objects’. Available at: https://www.vam.ac.uk/research/projects/preserving-and-sharing-born-digital-and-hybrid-objects [accessed 24 October 2023].

  • This includes decision model work around acquisition of complex collections such as born digital and hybrid art. See Ensom, T, and McConnachie, S. (2022) ‘Preserving and sharing born-digital and hybrid objects from and across the National Collection’, Decision Model Report: March 2022. Available at: http://doi.org/10.5281/zenodo.7097489

  • Matters in Media Art (n.d.) ‘Guidelines for the care of media artworks’. Available at: http://mattersinmediaart.org/ [accessed 24 October 2023]

See also:

  • The DPC ‘Preserving Digital Art’ Technology Watch Guidance Note is aimed at institutions starting to collect digital art as part of a wider collecting remit. It offers basic guidance on the specificities of digital art and how it may differ from other digital content in an institution’s care. See: Falcão, P. (2024) ‘Preserving Digital Art’, DPC Technology Watch Guidance Note 24-02. Available at: http://doi.org/10.7207/twgn24-02

  • NEW MEDIA MUSEUMS: Creating Framework for Preserving and Collecting Media Arts in V4, initiated by the Olomouc Museum of Art as a joint international platform for sharing experience with building and maintaining collections of new media artworks across different types of institutions. The aim of the project is to find workable methods for heritage institutions to build and maintain collections of media arts, which are necessary for safeguarding this area for the benefit of society. See Central European Art Database (2021) ‘NEW MEDIA MUSEUMS: Creating Framework for Preserving and Collecting Media Arts in V4’. Available at: http://cead.space/Detail/projects/3797 [accessed 24 October 2023]

  • The Collaborative Infrastructure for sustainable access to digital art LIMA project, to prevent the loss of digital artworks and to commonly develop the knowledge to preserve these works in a sustainable way. The project ‘Infrastructure sustainable accessibility digital art’ invests in research, training, knowledge sharing and conservation to prevent the loss of both digital artworks and the knowledge to preserve them. See LIMA (n.d.) ‘Collaborative infrastructure for sustainable access to digital art’. Available at: https://www.li-ma.nl/lima/article/collaborative-infrastructure-sustainable-access-digital-art [accessed 24 October 2023]

PDF

   Vulnerable small

Documents presented in Portable Document Format (PDF) (ISO 32000-1 and ISO 32000-2) and other data wrapped inside them, including all variants and versions, including PDF/A.

Digital Species: Formats

Trend in 2023:

No change

Consensus Decision

Added to List: 2017

Trend in 2024:

No change

Previously: Vulnerable/Endangered

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability 

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Documents stored offline, or online in repositories or EDRMS, including reports, agendas, minutes, correspondence, contracts, essays, articles, or research papers; PDF 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0; PDF/A, PDF/X and PDF/E.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of skills, commitment or policy from corporate owners; loss of context; loss of authenticity or integrity; external dependencies; lack of understanding; significant diversity of data; poorly developed digitization specifications; lack of integrity checking; poorly developed migration or normalization specifications; lack of virus control; poor storage or replication; lack of validation at the point of creation; encryption; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Well-managed data infrastructure; preservation planning; authenticity managed; use of persistent identifiers; reduction of dependencies; application of records management standards; recognition of preservation requirements beyond formats; strategic investment in digital preservation; preservation roadmap; clear licensing to enable digital preservation and deposit in a trusted archive; participation in digital preservation community; format validation; version control.

2023 Review

A PDF entry was added in 2017 and was split into two entries, ‘PDF/A’ and ‘PDF other than PDF/A’, in 2019 to emphasize the different threats faced by different types of PDF. The 2021 Jury agreed with this decision and noted that trends for the ‘PDF other than PDF/A’ entry and the ‘PDF/A’ entry were both towards a reduced risk. The 2022 Taskforce agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

The 2023 Council recommended merging the two previously split entries of ‘PDF/A’ and ‘PDF other than PDF/A’. After reviewing the two entries separately, they found more similarities than differences between the two and, indeed, across all types of PDF (not just PDF/A). Given the number of commercial and open-source tools available to assist preservation, the risk of loss was less persistent than previously suggested. Therefore, a Vulnerable classification was assigned as the most appropriate for all PDF formats as a whole.

2024 Interim Review

The 2024 Council agreed these risks remain on the same basis as before (‘No change’ to trend).

Additional Comments

There is a lot of material produced and kept in PDF. Some of it is authoritative, in other words, the only available copy, while some of it is not. However, if it is the only copy and it is lost, it can have an impact on a lot of people.

The challenge in evaluating the significance and impact of the loss of PDFs is that they are quite often a surrogate of something else, whether a digitized record, a Word document, or another source. Whether or not that source record is retained may be a factor. We should also be considering PDF Portfolios, which are an extension of PDF 1.7. Portfolios contain embedded files and can include text documents, spreadsheets, PowerPoint presentations, emails and Computer Aided Design (CAD) drawings.

Vulnerability also depends on whether the PDF file actually conforms to the PDF/A standard it claims. The risk arises from a combination of (1) files not conforming to the standard and (2) collection managers assuming that a file is resilient simply because it purports to be PDF/A. This risk lies less with the format and more with the understanding and experience in data management. Moreover, materials embedded in or attached to PDF/A-2 and PDF/A-3 may be at risk.
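The gap between what a file purports to be and what it is can be illustrated with a minimal Python sketch. It only reports what a PDF claims about itself: the version in the file header and any PDF/A part and conformance level recorded in its XMP metadata. The function name `describe_pdf_claims` is hypothetical, and the byte scan assumes the XMP packet is stored unfiltered, which may not hold for every file. A claim found this way is not evidence of conformance; confirming that would need a dedicated PDF/A validator.

```python
import re

# The header comment "%PDF-x.y" is expected near the start of the file.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# XMP may record the PDF/A identification as an element (<pdfaid:part>2<...)
# or as an attribute (pdfaid:part="2"); \W{1,2} covers both ">" and '="'.
PART_RE = re.compile(rb"pdfaid:part\W{1,2}(\d)")
CONF_RE = re.compile(rb"pdfaid:conformance\W{1,2}([A-Za-z])")


def describe_pdf_claims(data: bytes) -> dict:
    """Report what a PDF *claims* about itself. This is not validation."""
    header = HEADER_RE.search(data[:1024])
    part = PART_RE.search(data)
    conf = CONF_RE.search(data)
    return {
        "header_version": header.group(1).decode() if header else None,
        "claimed_pdfa_part": part.group(1).decode() if part else None,
        "claimed_pdfa_conformance": conf.group(1).decode().upper() if conf else None,
    }


# Hypothetical usage on a file from a collection:
#   from pathlib import Path
#   print(describe_pdf_claims(Path("report.pdf").read_bytes()))
```

A file that reports a PDF/A part here has only declared an intention; treating that declaration as proof is exactly the assumption described above.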

Published Research Papers

   Endangered large

Completed research papers published in serials, monographs or theses which fall under specific collecting policies of research libraries or archives and are managed through dedicated repository infrastructures.

Digital Species: Research Outputs

Trend in 2023:

No change

Consensus Decision

Added to List: 2017

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Published research papers in scholarly E-Books and Electronic Journals; Electronic theses (E-theses).

‘Endangered’ in the Presence of Aggravating Conditions

Lack of skills, commitment or policy from publishers; uncertainty over IPR or the presence of orphaned works; embedded complex objects; unstable funding for repository; lack of strategic investment; complex external dependencies; lack of persistent identifiers; bespoke formats; lack of legal deposit mandate.

‘Lower Risk’ in the Presence of Good Practice

Strong documentation including intellectual property rights; clarity of preservation path and ensuing responsibilities; credible preservation plan; proven capacity of repository; legal deposit preservation copying; post-cancellation access service; persistent identifiers used consistently; non-proprietary formats used and validated; minimal or well managed external dependencies.

2023 Review

This entry was added in 2017 under ‘Published research outputs,’ though without reference to the capacity of the repository infrastructure. The 2019 Jury amended it to presume the existence of repository infrastructure and noted that the aggravating conditions (which introduce risks) and good practice enhancements (which reduce them) are most relevant to repository operations.

While the 2020 Jury found no change in trend, the 2021 Jury agreed it should remain Vulnerable and discussed improvements and initiatives towards the preservation of research data and outputs, pointing to a 2021 trend towards reduced risk. The 2022 Taskforce agreed risks were on the same basis as before (no change to the trend).

The 2023 Council agreed with the Vulnerable classification, with risks remaining on the same basis as before (‘No change’ to trend), while noting a slight decrease in imminence of action. Additionally, the 2023 Council recommended that a nomination received for a new ‘E-theses’ entry would serve better as a valuable example within this entry than as a new, standalone entry. The 2023 Council recognized that further scoping and input are needed for this entry and recommended that the next major review revisit and restructure the entry.

2024 Interim Review

These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

Additional Comments

The 2023 nomination for E-theses highlights distinct risks tied to these digital published materials. E-theses tend to be sole documents: when published by universities they may be harvested into other aggregators or resources, but in many cases the only copy (with no physical or analogue copy) sits in an institution's repository. In addition, many are deposited in PDF format (in many varieties, often with no attempt to use PDF/A), putting long-term accessibility and re-use at risk. However, the breadth of risks goes beyond the PDF variety, as e-theses often include databases, audiovisual materials, websites, and more.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards.

See also:

  • A recent analysis by Martin Eve of CrossRef shows scholarly content at risk. The findings, based on an assessment of around 7.5 million of the e-books and articles for which CrossRef provides a fixed identifier or Digital Object Identifier, suggest that around a quarter of academic publications are not being preserved for the future. For c. 2 million articles in the study there was no evidence of preservation, while 4.3 million of the works studied were preserved in at least one place. See: Eve, M. P. (2024) ‘Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles’. Journal of Librarianship and Scholarly Communication 12(1). Available at: https://doi.org/10.31274/jlsc.16288

  • Konstantelos, L., (2021) ‘Breaking down barriers in e-only thesis submission: how digital preservation contributes to the conversation at the University of Glasgow’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/wdpd2021-konstantelos [accessed 24 October 2023]

  • Klungthanaboon, W., (2021) ‘From “research output” to “research data” - a willingness to move forward?’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/research-output-to-research-data [accessed 24 October 2023]

  • Beagrie, N (2013) ‘Preservation, Trust and Continuing Access for E-Journals’, DPC Technology Watch Report 13-04. Available at: http://doi.org/10.7207/twr13-04

  • Morrissey, S, and Kirchhoff, A (2014) ‘Preserving E-Books’, DPC Technology Watch Report 14-01. Available at: http://doi.org/10.7207/twr14-01

  • Resources and recent outputs from Public Knowledge Project (PKP) Preservation Network, which developed to digitally preserve Open Journal Systems (OJS) journals. See Public Knowledge Project (n.d.) ‘PKP Preservation Network’. Available at: https://pkp.sfu.ca/pkp-pn/ [accessed 24 October 2023]

 

Local Network Storage

   Vulnerable small

Materials routinely copied or backed up to locally managed data storage facilities and able to be restored under institutional service arrangements.

Digital Species: Integrated Storage

Trend in 2023:

No change

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques.

Examples

Institutional or departmental network storage and institutional data centers based on technologies such as Network Attached Storage (NAS), Storage Area Networks (SAN), GlusterFS and related.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier, technology or technician; insufficient documentation; single point of failure; political or commercial interference; failure of dependencies (e.g., power supply, controller software); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning and documentation.
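Integrity checking, one of the practices listed above, can be sketched in a few lines of Python. This is an illustrative outline rather than a recommended tool: the function names (`sha256_of`, `build_manifest`, `compare_manifests`) are hypothetical, SHA-256 is an assumed choice of checksum, and a real service would also need to store manifests safely and schedule regular checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List files that went missing, appeared, or changed between two checks."""
    shared = set(old) & set(new)
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }
```

A manifest built at deposit time and compared against a fresh one at each check would flag silent corruption or accidental deletion on a storage volume. Keeping the manifest on different technology from the data it describes fits the backup practices listed in this entry.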

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented.

The 2023 Council agreed with the current Vulnerable classification with overall risks remaining on the same basis as before (‘No change’ to trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review.

2024 Interim Review

The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

Additional Comments

There has been a renewed interest in tape, as offline storage is the only sure protection against advanced ransomware.

Pension, Mortgage and Insurance Records

   Vulnerable small

Records of transactions for long-lived financial products and services contracted between individuals and corporations. These records typically contain or depend on significant amounts of personal information and outlast the infrastructure on which they were created.

Group: Sensitive Data

Trend in 2023:

No change

Consensus Decision

Added to List: 2017

Trend in 2024:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Applications, correspondence and ancillary records relating to pensions, mortgages and insurances and other contracts of long duration. This includes corporate databases, email, web archives and EDRMS, and may require some coordination of paper, microfiche, born-digital and digitized records. These records often include the scope and duration of the contract as well as any agreed changes during the lifetime of the product. It may also include evidence of mis-selling or other sharp practice, which only becomes apparent after the fact. This entry pertains to corporate records rather than personal records.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of corporate preservation planning; lack of preservation within the procurement of corporate systems; companies conflating backup with preservation; loss of integrity and authenticity; loss of context and connections to provide meaning; lack of preservation capability within agencies; lack of preservation voice at executive level; poor planning and roadmap for corporate infrastructure; proliferation of legacy systems; slapdash procurement or migration of new systems; mergers and acquisitions leading to confusion of corporate systems; lack of compliance, audit or accountability at operational levels; encryption; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Backup and documentation; use of open formats and open source software; considered data management planning; licensing that enables preservation; preservation capability in designated repository; resilient to hacking; selection and appraisal in place; authenticity and integrity of records managed; resilient funding and recognition at executive level; technology watch; regular preservation audits; accreditation and participation in the professional preservation community.

2023 Review

This entry was added in 2017 but was outside the competence of the judges to assess at that time. It was assessed in 2019 with additional expertise invited to the panel to support this assessment and reviewed again in 2020. The 2021 Jury agreed with the 2019 assessment and subsequent 2020 review, which classified these digital materials as Vulnerable with no trend towards greater or reduced risk.

The 2023 Council agreed with the Vulnerable classification with the overall risks remaining on the same basis as before (‘No change’ to trend).

2024 Interim Review

The 2024 Council recommends a major rescoping of the Sensitive Data species, with plans to remove it as a species and incorporate key elements and examples into relevant entries for the next Bit List in 2025. This is because it is not clear how sensitive data works as a species when many of the other species could have sensitive data concerns; the sensitivity of the data is more like an extra category of risk that potentially applies across any species.

Additional Comments

The importance of retaining documentation in any kind of legal agreement offers this kind of material more protection than most, but legal organizations may conflate backup with preservation and may not always have consistent records management systems.
