Research Data Published through Repositories
Research data published through digital repositories or other services providers with specialist skills to manage the data and an ongoing commitment to ensure preservation. |
||
Digital Species: Research Outputs |
Trend in 2023: Material improvement |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended within three years, detailed assessment within one year. |
Significance of Loss The loss of tools, data or services within this group would impact on many people and sectors. |
Effort to Preserve | Inevitability It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques. |
Examples Recognized data repositories in specialist disciplines; institutional data repositories in subject specialist centres and partnerships. |
||
‘Endangered’ in the Presence of Aggravating Conditions Lack of long-term commitment; lack of user community; lack of visibility to potential depositors; lack of institutional commitment; insufficient documentation; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Certification and documented good practice; effective documentation requirements for depositors; proven financial sustainability; skilled staff including professionalising disciplinary and general data stewardship offering a clear career option; participation in the digital preservation community; research data management training by repositories and research funders offered to depositors, in particular new career researchers. |
||
2023 Review This entry was added in 2019 as a separate entry, but it was previously introduced in 2017 under ‘Published research outputs,’ though without explicit reference to the capacity of the repository infrastructure. The 2019 Jury split the entry into a range of contexts for research outputs, including this addition classified as Vulnerable; the preservation of research data published through a well-founded repository with the capacity and commitment to ensure preservation and capability through their own professional development activities made it a lower risk outcome for research data. The 2021 Jury agreed with this classification but commented on the improvements and initiatives towards the preservation of research data and outputs, leading to a 2021 trend towards reduced risk. The 2022 Taskforce identified a 2022 trend towards reduced risk based on material improvement over the last year that had not only offered examples of good research data management and preservation practices but also suggested a significant shift towards a culture of change and collaboration across different research communities and stakeholders. Those mentioned included (but were not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs. The 2023 Council agreed with the Vulnerable classification and noted that there was a trend towards reduced risk due to increasing research data management and engagement activity by libraries, which should result in increasing amounts of datasets being deposited. The 2023 Council also noted it would be useful to see empirical data on depositing trends to assess this. |
||
2024 Interim Review These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). |
||
Additional Comments A key consideration with this entry is whether the data repository is integrated with a preservation system to facilitate long-term access to and usability of datasets. The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research. Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards. Where using a preservation system is not an option, adding preservation metadata to research data holdings may help make the data more robust in the long term. With an emphasis on environmental sustainability, some repositories hesitate to mandate additional copies of large datasets, which may be in the region of hundreds of terabytes, as this adds to both storage cost and carbon footprint, especially when capturing and preserving the research methodology would enable the dataset to be recreated. Case Studies or Examples:
See also:
|
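The comments on the entry above suggest that adding preservation metadata to research data holdings may help where a full preservation system is not available. As a rough illustration only, the sketch below records a checksum and size for every file in a dataset folder and later reports files that have gone missing or changed. The function names (`sha256_of`, `build_manifest`, `changed_files`) and the manifest layout are assumptions made for this example; they are not taken from the Bit List or from any particular repository platform.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> dict:
    """Describe every file under dataset_dir (hypothetical manifest layout)."""
    files = []
    for path in sorted(p for p in dataset_dir.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(dataset_dir).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "algorithm": "sha256",
        "files": files,
    }


def changed_files(dataset_dir: Path, manifest: dict) -> list:
    """List manifest paths that are now missing or whose checksum differs."""
    problems = []
    for entry in manifest["files"]:
        path = dataset_dir / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(entry["path"])
    return problems
```

The manifest contains only strings and integers, so it can be stored next to the dataset (for example as JSON) and re-checked on a schedule. A checksum manifest of this kind only detects change or loss; it does not replace replication or a managed preservation system.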
Published Research Data Appended to Journal Articles
Closed research data sets produced and documented in accordance with good practice and appended to a journal article or transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term. |
||
Digital Species: Research Outputs |
Trend in 2023: Material improvement |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Endangered |
Imminence of Action Action is recommended within three years, detailed assessment within one year. |
Significance of Loss The loss of tools, data or services within this group would impact on people and sectors around the world. |
Effort to Preserve | Inevitability It would require a small effort to preserve materials in this group going forward, requiring the application of proven tools and techniques. |
Examples Supplementary data sets added to formally published papers in repositories that are designed primarily for papers; electronic journals offering data sets without obvious preservation capacity; institutional repositories servicing highly complex scientific data sets with insufficient subject-matter expertise. |
||
‘Endangered’ in the Presence of Aggravating Conditions Complex mix of formats; deposit in repositories that lack relevant expertise or knowledge or funding; poorly designed migration or normalization processes; poorly formed ingest and quality assurance procedures; rapid churn of staff; incoherent patterns of subject matter; lack of domain knowledge; no or very small numbers of users; weak or absent collecting policy; deposit to ensure minimal compliance with funder mandate; limited or dysfunctional data management planning and documentation; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Clear data management planning and documentation; deposit by publisher in a trusted repository; deposit by author/s in appropriate repositories with digital preservation expertise and mandate; clear licensing to enable digital preservation and access; strong user base; development roadmap; ability to transfer collections or share metadata with subject repositories or portals; demonstrable re-use of data; clear collecting policy; data management planning early in the data lifecycle. |
||
2023 Review This 2019 entry was previously introduced in 2017 under 'Published Research Outputs,' though without explicit reference to the research data appended to journal articles. The 2019 Jury split the entry into a range of contexts for research outputs, including this addition and ‘Research Data Published through Repositories’. The entry draws attention to services that take upon themselves commitments to preserve research data, but which may not deliver those promises through lack of capability. The 2021 Jury agreed with the Endangered classification but commented on the improvements and initiatives towards the preservation of research data outputs, with good practice documentation and replication in this space (e.g., collaborations with publishers and repositories, LOCKSS, CLOCKSS, etc.). For these reasons, the 2021 trend was towards reduced risk. The 2022 Taskforce agreed on a trend towards reduced risk based on material improvement over the last year that had not only offered examples of good research data management and preservation practices but also suggested a significant shift towards a culture of change and collaboration across different research communities and stakeholders. Those mentioned included (but were not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs. In light of the identified 2021 and 2022 trends, the 2023 Council changed the classification from Endangered to Vulnerable. They noted that many, if not most, HEI libraries that produce research are doing more in terms of research data management, and the activities in this area are growing and scaling up. Due to increased focus on this area, it was recommended that the classification change to Vulnerable with a 2023 trend of ‘Material improvement’. |
||
2024 Interim Review These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). A Council member recommended that, to add further clarity, it might be worth differentiating use cases—for closed research data sets produced and documented in accordance with good practice and appended to a journal article, and for closed research data sets produced and documented in accordance with good practice and transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term. |
||
Additional Comments A number of aggravating conditions (those relating to poorly formed ingest and quality assurance procedures, rapid churn of staff, incoherent patterns of subject matter, lack of domain knowledge, no or very small numbers of users, weak or absent collecting policy, and deposit to ensure minimal compliance with funder mandate) are problems with some repositories, not all repositories. Presenting different use cases can tease apart the use case for supplementary materials appended to journals (which, e.g., CLOCKSS and Portico preserve) and the use case for materials in repositories that are perhaps not tailored for this purpose. Cases where data is transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term are far more at risk. Research data is complex and has specific requirements for documentation which may only be known to subject matter experts. However well intended, it is risky for institutions to attempt to replicate that level of expertise across all the domains within the institution, and it can be hard for smaller publishers to make commitments to sustain data in the long term. The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research. Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards. UK funders, e.g. UKRI-NERC Environmental Data Service, are educating researchers about data policies which mandate depositing master and raw data at the funder's disciplinary repository. These repositories have strong expertise in the research discipline, ensuring data and metadata standardization and quality assurance. Any copies of datasets published in journal articles or similar are considered secondary copies and do not comply with data policy; an institute that attempts to use journal outputs as its funder-acknowledged datasets therefore risks its future research funding. The significance and impact of this entry depends specifically on whether it is the only copy of the dataset in existence, or whether there is another copy hosted in a data repository. Case Studies or Examples:
|
Cloud Storage
Materials routinely copied or backed up to an independently managed, off-site data storage facility and able to be restored under contractual terms |
||
Digital Species: Cloud, Integrated Storage |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended as required, with periodic review every five years. |
Significance of Loss The loss of tools, data or services within this group would impact on many people and sectors around the world. |
Effort to Preserve | Inevitability Loss seems likely. By the time tools or techniques have been developed, the material will likely have been lost. |
Examples Remote network storage provided by a third-party service under contracts, such as DropBox, Amazon, Microsoft Azure, Dell EMC, Google Cloud Platform, Google Drive, IBM, Rackspace, Iron Mountain, SAP, and others. |
||
‘Endangered’ in the Presence of Aggravating Conditions Lack of skills, commitment or policy from corporate owners; encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier; insufficient documentation; lack of local alternative; political or commercial instability; overly aggressive compression; poor information security; lack of transparent integrity-checking; lack of strategic investment; lack of migration plan; lack of exit strategy; unenforceable penalties; unstable pricing; unpredictable removal costs; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation licensing and planning; export functionality; resilient to hacking; version control; resilient funding; technology watch; enforceable contract; disaster planning and documentation; stable pricing; budgeted removal costs. |
||
2023 Review This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. The 2021 Jury noted increased risk in light of greater reliance on the cloud and localized disruptions to cloud services over the pandemic. A 2021 trend towards greater risk was based on the wider (global) dependence on these services, especially Google Drive, for record-keeping and business workflows. The impact of loss increased with more reliance on cloud services leading to greater risk; however, this should not deter people from using cloud storage. The 2022 review agreed with this assessment but noted no significant increase in trend for 2022. The 2023 Council moved this entry to a new higher-level Cloud species as the previous Integrated Storage species worked less well (for hardware technologies). The Council agreed with the previous Vulnerable classification, with the overall risks remaining on the same basis as before so long as there are safeguards in place (‘No change’ to the 2023 trend). However, the Council noted that these safeguards may not, in all cases, be sufficient to address existing risks. Council members noted how some governments may cut off the internet in times of unrest, having a disastrous effect on access to cloud-based resources, and raised questions about the feasibility of recovering material after a major cloud vendor fails or due to malicious acts. For these materials, the significance of loss and effort to preserve is much greater, with the potential for a trend towards greater risk with the loss of existing safeguards. |
||
2024 Interim Review The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). While overall risk remains on the same basis as before, some Council members pointed out how a lack of transparency in knowledge about how a cloud service is actually built and functions is worrying from a preservation perspective. Additionally, the overall political ‘threat situation’ worldwide seems to be increasing, which means that significant changes in national political regimes can affect the predictability of how the material is handled in a cloud service and, with that, the potential for increased risk. |
||
Additional Comments To add further clarity, Council members in the Integrated Storage species group noted that there is a distinction between ‘in-house’ physical storage and cloud storage, especially if one relies on cloud storage as the only storage provider for digital content. As they understand it, this ‘Cloud Storage’ entry focuses on material copied or backed up to a third-party cloud service. This is less threatening compared to using the cloud as the sole storage provider for content preservation. The history of digital preservation suggests that the risk of vendors going out of business or shutting down services is the key issue here, over and above any specific technical solutions or risks. Case Studies or Examples:
|
Current Hard Disk Technologies
Materials saved to storage devices with a variety of underlying magnetic or solid-state (flash) technologies that are hardwired into a computer still under warranty or supported: typically hard disks that are less than five years old. |
||
Digital Species: Integrated Storage |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended within five years, detailed assessment within three years. |
Significance of Loss The loss of tools, data or services within this group would impact on many people and sectors. |
Effort to Preserve | Inevitability Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques. |
Examples Direct Attached Storage (DAS) such as magnetic or solid-state drives integrated into individual laptops or workstations and into smaller scale storage facilities. |
||
‘Endangered’ in the Presence of Aggravating Conditions Encryption; poor handling; poor storage; lack of consistent replication; failure of external dependencies (e.g., suppliers, security); political or commercial interference; failure of internal dependencies (e.g., power supply, disk controller); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning. |
||
2023 Review This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. It was reviewed in 2021 with a noted trend towards greater risk in light of the continued shift towards reliance on cloud storage, with computers increasingly replacing hard disks with solid-state storage and with commercial motivations for less support. It was reviewed again in 2022 with no noted trend towards greater or reduced risk. The 2023 Council agreed with the current Vulnerable classification, with overall risks remaining on the same basis as before (‘No change’ to trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review. |
||
2024 Interim Review The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). There were also noted areas of overlap with the Portable Media species group (see ‘Current Portable Magnetic Media’). As people increasingly select other storage methods, such as cloud, they are less likely to maintain existing content on portable hard disks, which means the portable hard disks are more likely to be overlooked or ignored (e.g., left in drawers) rather than checked and refreshed. Questions arise concerning hard drives and SSDs packaged as portable devices, and for this reason, further cross-species review is recommended for the next review in 2025. |
||
Additional Comments There are also indications of increasing prevalence of soldered-in flash storage which cannot easily be accessed in the case of device failure. Case Studies or Examples:
See also:
|
Recently Commissioned or Completed Media Art
Media art currently displayed in a gallery or in the process of being displayed. |
||
Digital Species: Media Art |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended within twelve months, detailed assessment is a priority. |
Significance of Loss The loss of tools, data or services within this group would impact on many people and sectors. |
Effort to Preserve | Inevitability It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques. |
Examples Media art recently acquired by galleries that utilize specific hardware and software in order to be accessed or exhibited. |
||
‘Endangered’ in the Presence of Aggravating Conditions Lack of documentation to enable maintenance; uncertainty over IPR or the presence of orphaned works; complex interdependencies on specific hardware, software or operating systems; lack of capacity in the gallery or workshop; lack of strategic investment; complex external dependencies; lack of documentation about artist intent; lack of understanding of costs for display and preservation. |
||
‘Lower Risk’ in the Presence of Good Practice Strong documentation; clarity of preservation path and ensuing responsibilities; proven preservation plan; capacity of workshop to support artwork at de-installation; capacity of gallery to conserve after de-installation; capacity of gallery to re-install work; funding understood to re-install. |
||
2023 Review This entry was added in 2019 as a separate entry, but it was previously introduced in 2017 under ‘Media Art’ with particular reference to historical media art. It was added for greater specificity for its recommendations, to represent works acquired and commissioned in the last five years where there is a reasonable expectation that documentation has been produced or could still be obtained. While the 2020 Jury found no change in trend, the 2021 Jury discussed how prospects for long-term preservation depend entirely on whether the artwork is collected post-commission and by an organization with the resources to care for it. They agreed that the classification remains Vulnerable but with a trend towards greater risk because the imminence of action is time-sensitive, requiring working with the artist to get the documentation from them about their work and what is needed before it is too late. Furthermore, there remains a vulnerability for the smaller museums or others that do not take the preservation of media art as seriously. The 2023 Council agreed with the Vulnerable classification with overall risks remaining on the same basis as before (‘No change’ to trend), although noted a change in the imminence of action from 3 years to 12 months. |
||
2024 Interim Review The 2024 Council agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). However, it was important to note that the ‘Effort to Preserve | Inevitability’ can vary. Some of the works take a huge effort to preserve, and perhaps this needs some middle ground in terms of communicating those aspects. |
||
Additional Comments By the time digital art, time-based media, etc., has entered into the permanent care of a stewarding institution, many of its technologies are already end-of-life or unsupported, or the hardware components have deteriorated. Often the expertise to maintain these many interacting components sits outside the host organization, with a technical supplier to the gallery, and this is in itself vulnerable to business change. Although there are a few exceptions, there is a need for greater capacity within the museum and gallery sector to address the challenges. There have been new initiatives for guidance and examples of institutions taking wider sectoral responsibility for standards, which have helped with the effort to preserve, such as the Matters in Media Art information resource and guidance. Media artworks are often made with a network of knowledge that can be precarious. Documentation around production processes can be minimal, so acting quickly with known processes can gather information before the knowledge and people networks start to disperse. This can mean that production environments and associated workflows are preserved alongside the media. Some artworks specifically leverage the limitations and characteristics of the systems that they incorporate, often in unusual ways. This can be hard to migrate or emulate accurately. Case Studies or Examples:
See also:
|
Documents presented in Portable Document Format (PDF; ISO 32000-1 and ISO 32000-2), and other data wrapped inside them, covering all variants and versions, including PDF/A. |
||
Digital Species: Formats |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2017 |
Trend in 2024: No Change |
Previously: Vulnerable/Endangered |
Imminence of Action Action is recommended as required, with periodic review every five years. |
Significance of Loss The loss of tools, data or services within this group would impact on people and sectors around the world. |
Effort to Preserve | Inevitability It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques. |
Examples Documents stored offline, or online in repositories or EDRMS, including reports, agendas, minutes, correspondence, contracts, essays, articles, or research papers; PDF 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0; PDF/A, PDF/X and PDF/E. |
||
‘Endangered’ in the Presence of Aggravating Conditions Lack of skills, commitment or policy from corporate owners; loss of context; loss of authenticity or integrity; external dependencies; lack of understanding; significant diversity of data; poorly developed digitization specifications; lack of integrity checking; poorly developed migration or normalization specifications; lack of virus control; poor storage or replication; lack of validation at the point of creation; encryption; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Well-managed data infrastructure; preservation planning; authenticity managed; use of persistent identifiers; reduction of dependencies; application of records management standards; recognition of preservation requirements beyond formats; strategic investment in digital preservation; preservation roadmap; clear licensing to enable digital preservation and deposit in a trusted archive; participation in digital preservation community; format validation; version control. |
||
2023 Review A PDF entry was added in 2017 and was split into two entries, ‘PDF/A’ and ‘PDF other than PDF/A’, in 2019 to emphasize the different threats faced by different types of PDF. The 2021 Jury agreed with this decision and noted that trends for the PDF other than PDF/A entry and the PDF/A entry were both towards a reduced risk. The 2022 Taskforce agreed these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). The 2023 Council recommended merging the two previously split entries of ‘PDF/A’ and ‘PDF other than PDF/A’. After reviewing the two entries separately, they found more similarities than differences between the two and, indeed, across all types of PDF (not just PDF/A). Due to the number of commercial and open-source tools that are available to assist preservation, the risk of loss was less persistent than previously suggested. Therefore, a Vulnerable classification was assigned as the most appropriate for all PDF formats as a whole. |
||
2024 Interim Review The 2024 Council agreed these risks remain on the same basis as before (‘No change’ to trend). |
||
Additional Comments There is a lot of material produced and kept in PDF. Some of it is authoritative, in other words, the only available copy, while some of it is not. However, if it is the only copy and it is lost, it can have an impact on a lot of people. The challenge in evaluating the significance and impact of the loss of PDFs is that they are quite often a surrogate of something else, whether a digitized record, a Word document, etc. Whether or not that record is retained may be a factor. We should also be considering PDF Portfolios, which are an extension of PDF 1.7. Portfolios contain embedded files and can include text documents, spreadsheets, PowerPoints, emails, and Computer Aided Design (CAD) drawings. Vulnerability also depends on whether or not the PDF file conforms to the specific PDF/A standard. The risk arises from a combination of 1) the file not conforming to the standard and 2) collection managers assuming that the file is resilient simply because it purports to be a PDF/A. This risk lies less with the format and more with understanding and experience in data management. Moreover, materials embedded in or attached to PDF/A-2 and PDF/A-3 may be at risk. See also:
|
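The PDF comments above warn against assuming that a file is resilient simply because it purports to be a PDF/A. The sketch below illustrates the difference between a claim and a check: it reads only what a file says about itself, namely the `%PDF-x.y` header and any `pdfaid:part` / `pdfaid:conformance` values in its XMP metadata. The function name `describe_pdf` and the returned dictionary keys are assumptions made for this example. The plain byte search also assumes the XMP packet is stored uncompressed and unencrypted, which will not hold for every file.

```python
import re

# The header version, e.g. b"%PDF-1.7", is expected near the start of the file.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# XMP may carry the PDF/A identification as elements or as attributes.
PDFA_PART_RE = re.compile(rb"pdfaid:part(?:>|\s*=\s*[\"'])\s*(\d)")
PDFA_CONF_RE = re.compile(rb"pdfaid:conformance(?:>|\s*=\s*[\"'])\s*([A-Za-z])")


def describe_pdf(data: bytes) -> dict:
    """Report what a PDF declares about itself; this is NOT validation."""
    header = HEADER_RE.search(data[:1024])
    part = PDFA_PART_RE.search(data)
    conformance = PDFA_CONF_RE.search(data)

    claim = None
    if part:
        level = conformance.group(1).decode().lower() if conformance else ""
        claim = "PDF/A-" + part.group(1).decode() + level

    return {
        "header_version": header.group(1).decode() if header else None,
        "claimed_pdfa": claim,  # a self-declared label only
    }
```

A non-empty `claimed_pdfa` value means only that the file labels itself as PDF/A. Confirming conformance needs a dedicated PDF/A validator (veraPDF is one open-source example), and files with no detectable claim may still be perfectly usable PDFs.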
Published Research Papers
Completed research papers published in serials, monographs or theses which fall under specific collecting policies of research libraries or archives and are managed through dedicated repository infrastructures. |
||
Digital Species: Research Outputs |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2017 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended within three years, detailed assessment within one year. |
Significance of Loss The loss of tools, data or services within this group would impact on people and sectors around the world. |
Effort to Preserve | Inevitability It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques. |
Examples Published research papers in scholarly E-Books and Electronic Journals; Electronic theses (E-theses). |
||
‘Endangered’ in the Presence of Aggravating Conditions Lack of skills, commitment or policy from publishers; uncertainty over IPR or the presence of orphaned work; embedded complex objects; unstable funding for repository; lack of strategic investment; complex external dependencies; lack of persistent identifiers; bespoke formats; lack of legal deposit mandate. |
||
‘Lower Risk’ in the Presence of Good Practice Strong documentation including intellectual property rights; clarity of preservation path and ensuing responsibilities; credible preservation plan; proven capacity of repository; legal deposit preservation copying; post-cancellation access service; persistent identifiers used consistently; non-proprietary formats used and validated; minimal or well managed external dependencies. |
||
2023 Review This entry was added in 2017 under ‘Published research outputs,’ though without reference to the capacity of the repository infrastructure. The 2019 Jury amended it to presume the existence of repository infrastructure and noted that the aggravating conditions (which increase risk) and good practice enhancements (which reduce it) are most relevant to repository operations. While the 2020 Jury found no change in trend, the 2021 Jury agreed it should remain Vulnerable and discussed improvements and initiatives towards the preservation of research data and outputs, pointing to a 2021 trend towards reduced risk. The 2022 Taskforce agreed that risks remained on the same basis as before (no change to the trend). The 2023 Council agreed with the Vulnerable classification and found that risks remained on the same basis as before (‘No change’ to trend), with no significant trend towards greater or reduced risk, while also noting a slight decrease in imminence of action. Additionally, the 2023 Council recommended that a nomination received for a new ‘E-theses’ entry be incorporated as a valuable example within this entry rather than added as a new, standalone entry. The 2023 Council recognized that further scoping and input are needed for this entry and recommended that the next major review revisit and restructure it. |
||
2024 Interim Review These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). |
||
Additional Comments The 2023 nomination for E-theses highlights distinct risks tied to these digital published materials. E-theses tend to be sole documents. When published by universities they may be harvested into other aggregators or resources, but in many cases the only copy (with no physical/analogue copy) sits in an institution's repository. In addition, many are deposited in PDF format (in many varieties, and many do not even attempt to use PDF/A), risking long-term accessibility and re-use. However, the breadth of risks goes beyond the PDF variety, as e-theses often include databases, audiovisual materials, websites, and more. The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research. Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards. See also:
|
Local Network Storage
Materials routinely copied or backed up to locally managed data storage facilities and able to be restored under institutional service arrangements. |
||
Digital Species: Integrated Storage |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended as required, with periodic review every five years. |
Significance of Loss The loss of tools, data or services within this group would impact on many people and sectors. |
Effort to Preserve | Inevitability Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques. |
Examples Institutional or departmental network storage and institutional data centers based on technologies such as Network Attached Storage (NAS), Storage Area Networks (SAN), GlusterFS and related. |
||
‘Endangered’ in the Presence of Aggravating Conditions Encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier, technology or technician; insufficient documentation; single point of failure; political or commercial interference; failure of dependencies (e.g., power supply, controller software); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning and documentation. |
||
2023 Review This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. The 2023 Council agreed with the current Vulnerable classification with overall risks remaining on the same basis as before (‘No change’ to trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review. |
||
2024 Interim Review The 2024 Council agreed that these risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). |
||
Additional Comments There has been a renewed interest in tape, as offline storage is the only sure protection against advanced ransomware. See also:
|
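The integrity checking listed among the good practices for local network storage can be sketched as a checksum manifest: record a digest for every file once, then re-verify on a schedule. This is a minimal illustration under stated assumptions, not a description of any particular institution's tooling. The function names are hypothetical, SHA-256 is an assumed choice of algorithm, and the manifest itself would need to be stored somewhere other than the storage it describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Any non-empty list in the report is a prompt to restore from a replica or backup, which is why integrity checking is only effective alongside the replication and backup practices listed above.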
Pension, Mortgage and Insurance Records
Records of transactions for long-lived financial products and services contracted between individuals and corporations. These records typically contain or depend on significant amounts of personal information and outlast the infrastructure on which they were created. |
||
Group: Sensitive Data |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2017 |
Trend in 2024: No Change |
Previously: Vulnerable |
Imminence of Action Action is recommended within three years, detailed assessment within one year. |
Significance of Loss The loss of tools, data or services within this group would impact on many people and sectors. |
Effort to Preserve | Inevitability It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques. |
Examples Applications, correspondence and ancillary records relating to pensions, mortgages, insurance and other contracts of long duration. This includes corporate databases, email, web archives and EDRMS, and may require some coordination of paper, microfiche, born-digital and digitized records. These records often include the scope and duration of the contract as well as any agreed changes during the lifetime of the product. They may also include evidence of mis-selling or other sharp practice, which only becomes apparent after the fact. This entry pertains to corporate records rather than personal records. |
||
‘Endangered’ in the Presence of Aggravating Conditions Lack of corporate preservation planning; lack of preservation within the procurement of corporate systems; companies conflating backup with preservation; loss of integrity and authenticity; loss of context and connections to provide meaning; lack of preservation capability within agencies; lack of preservation voice at executive level; poor planning and roadmap for corporate infrastructure; proliferation of legacy systems; slapdash procurement or migration of new systems; mergers and acquisitions leading to confusion of corporate systems; lack of compliance, audit or accountability at operational levels; encryption; uncertainty over IPR or the presence of orphaned works. |
||
‘Lower Risk’ in the Presence of Good Practice Backup and documentation; use of open formats and open source software; considered data management planning; licensing that enables preservation; preservation capability in designated repository; resilient to hacking; selection and appraisal in place; authenticity and integrity of records managed; resilient funding and recognition at executive level; technology watch; regular preservation audits; accreditation and participation in the professional preservation community. |
||
2023 Review This entry was added in 2017 but was outside the competence of the judges to assess at that time. It was assessed in 2019 with additional expertise invited to the panel to support this assessment and reviewed again in 2020. The 2021 Jury agreed with the 2019 assessment and subsequent 2020 review, which classified these digital materials as Vulnerable with no trend towards greater or reduced risk. The 2023 Council agreed with the Vulnerable classification with the overall risks remaining on the same basis as before (‘No change’ to trend). |
||
2024 Interim Review The 2024 Council considers that a major rescoping of the Sensitive Data species is necessary, with plans to remove it as a species and incorporate key elements and examples into relevant entries for the 2025 Bit List. This is because it is not clear how sensitive data works as a species when many of the other species could also have sensitive data concerns; the sensitivity of the data is more like an extra category of risk that potentially applies across any species. |
||
Additional Comments The importance of retaining documentation in any kind of legal agreement offers this kind of material more protection than most, but legal organizations may conflate backup with preservation and may not always have consistent records management systems. See also:
|