Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark
Introduction
This section provides a brief, novice-to-intermediate overview of metadata and documentation, with a focus on the PREMIS digital preservation metadata standard. It draws on the 2nd edition of the DPC Technology Watch Report on Preservation Metadata. The report itself discusses a wider range of issues and practice in greater depth, with extensive further reading and advice (Gartner and Lavoie, 2013). It is recommended to readers who need a more advanced briefing.
Metadata is data about a digital resource that is stored in a structured form suitable for machine processing. It serves many purposes in long-term preservation, providing a record of activities that have been performed upon the digital material and a basis on which future decisions on preservation activities can be made, as well as supporting discovery and use. The information contained within a metadata record often encompasses a range of topics. There is no clear line between what is preservation metadata and what is not, but ultimately the purpose of preservation metadata is to support the goals of long-term digital preservation, which are to maintain the availability, identity, persistence, renderability, understandability, and authenticity of digital objects over long periods of time.
Documentation is the information (such as software manuals, survey designs, and user guides) provided by a creator and the repository that supplements the metadata and provides enough information to enable the resource's use by others. It is often the only material providing insight into how a digital resource was created, manipulated, managed and used by its creator, and it is often the key that enables others to make informed use of the resource.
There are a number of factors which make metadata and documentation particularly critical for the continued viability of digital materials and they relate to fundamental differences between traditional and digital resources:
- Technology. Digital resources are dependent on hardware and software to render them intelligible. Technical requirements need to be recorded so that decisions on appropriate preservation and access strategies may be made.
- Change. While traditional materials may be preserved by predominantly passive preventive preservation programmes, digital materials will be subject to repeated actions, and there will be many different operators and quite possibly different institutions influencing the management of digital materials over a prolonged period of time. Recording actions taken on a resource and changes occurring as a result will provide a key to future managers and users of the resource.
- Authenticity. Metadata and documentation may be the major, if not the only, means of reliably establishing the authenticity of material following changes.
- Rights management. While traditional resources may or may not be copied as part of their preservation programme, digital resources must be copied if they are to remain accessible. Managers need to know that they have the right to copy for the purposes of preservation, what (if any) technologies have been used to control rights management and what (if any) implications there are for controlling access.
- Future re-use. It may not be possible for others to use the material without adequate documentation.
- Cost. It is expensive to create metadata manually and preservation metadata may not always be easily generated automatically. Creating additional metadata to meet digital preservation needs therefore requires a careful weighing of costs and benefits.
The PREMIS (PREservation Metadata: Implementation Strategies) Standard
PREMIS (PREservation Metadata: Implementation Strategies) is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. Developed by an international team, PREMIS is implemented in digital preservation projects around the world, and support for PREMIS is incorporated into a number of commercial and open-source digital preservation tools and systems.
The PREMIS Data Dictionary (PREMIS, 2013) is organized around a data model consisting of five entities associated with the digital preservation process:
- Intellectual Entity - a coherent set of content that is described as a unit: e.g., a book
- Object - a discrete unit of information in digital form, e.g., a PDF file
- Event - a preservation action, e.g., ingest of the PDF file into the repository
- Agent - a person, organization, or software program associated with an Event, e.g., the publisher of a PDF file
- Rights - one or more permissions pertaining to an Object, e.g., permission to make copies of the PDF file for preservation purposes
Taken together, the semantic units defined in the PREMIS Data Dictionary represent the 'core' information needed to support digital preservation activities in most repository contexts. However, the concept of 'core' in regard to PREMIS is loosely defined: not all of the semantic units are considered mandatory in all situations, and some are optional in all situations. The Data Dictionary attempts to strike a balance between recognizing that there will be a significant overlap of metadata requirements across different repository contexts and acknowledging that all contexts are different in some way, so that their respective metadata requirements will rarely be exactly the same.
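The relationships between the five entities can be sketched in a few lines of Python. This is only an illustration of the data model as listed above, not the PREMIS schema itself: the class names, field names, identifiers and sample values are all hypothetical, and a real implementation would use the semantic units defined in the Data Dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str  # e.g. "person", "organization", "software"


@dataclass
class Event:
    identifier: str
    event_type: str  # e.g. "ingestion"
    date_time: str  # ISO 8601 timestamp
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class Rights:
    identifier: str
    basis: str  # e.g. "license"
    permitted_acts: List[str] = field(default_factory=list)


@dataclass
class DigitalObject:
    identifier: str
    format_name: str
    event_ids: List[str] = field(default_factory=list)
    rights_ids: List[str] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    identifier: str
    title: str
    object_ids: List[str] = field(default_factory=list)


def agents_for_object(obj, events, agents):
    """Return the names of all Agents linked to an Object through its Events."""
    names = []
    for event_id in obj.event_ids:
        for agent_id in events[event_id].agent_ids:
            names.append(agents[agent_id].name)
    return names


# Hypothetical records mirroring the examples in the list above.
agents = {"agent-1": Agent("agent-1", "Repository ingest service", "software")}
events = {"event-1": Event("event-1", "ingestion", "2015-06-01T10:00:00Z", ["agent-1"])}
rights = {"rights-1": Rights("rights-1", "license", ["replicate"])}
pdf = DigitalObject("object-1", "PDF", ["event-1"], ["rights-1"])
book = IntellectualEntity("ie-1", "An example book", ["object-1"])

print(agents_for_object(pdf, events, agents))  # ['Repository ingest service']
```

Linking entities by identifier rather than by nesting them reflects the idea that an Agent or a Rights statement may apply to many Objects, and that Events accumulate over the life of an Object.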
Implementation
Although the PREMIS Data Dictionary is not a formal standard, in the sense of being managed by a recognized standards agency, it has achieved the status of the accepted standard for preservation metadata in the digital preservation community. A strength but also a limitation of the PREMIS Data Dictionary is that it must be tailored to meet the requirements of the specific context; it is not an off-the-shelf solution in the sense that an archive simply implements the Data Dictionary wholesale. Only a portion may be relevant in some digital preservation circumstances; alternatively, the repository may find that additional information beyond what is defined in the Dictionary is needed to support its requirements. For example, the Data Dictionary makes no provisions for documenting information about a repository's business/policy dependencies, which may be needed to support preservation decision-making.
In short, each repository will need to invest some effort to adapt preservation metadata and documentation standards to its particular circumstances and requirements.
During implementation an institution normally identifies its own minimum standard of information required for catalogued items in the collection. Each institution can also identify its preferred levels of metadata and documentation for acquisitions and may notify and encourage suppliers or depositors to supply this information. Staff review and revise supplied information to ensure it conforms to institutional guidelines and they generate catalogue records for deposited data incorporating cataloguing and documentation standards to ensure that information about those items can be made available to users through appropriate catalogues. In many cases the contextual information for resources will be crucial to their future use and this aspect of documentation should not be overlooked.
The level of cataloguing and documentation accompanying or subsequently added to an item, and any limitations these may impose, can be documented for the benefit of future users. Where data resources are managed by third parties but made available via an institution, information may be supplied by the third party in an agreed form which conforms to institution guidelines or in the supplier's native format.
Where a need for enhanced access exists, an institution may undertake to enhance documentation and cataloguing information to a higher standard to meet new requirements. Retrospective documentation or catalogue enhancement should also occur when the validation or audit of the documentation and cataloguing for a resource shows this to be below a minimum acceptable standard.
A significant number of both users and suppliers of preservation metadata have adopted PREMIS and many of the initial obstacles to implementation have been addressed by them. The process of implementing PREMIS in a working environment is made easier by a number of tools which can extract metadata from digital objects and output PREMIS XML. The PREMIS Maintenance Activity maintains a webpage listing the most important tools available for use with PREMIS. The Maintenance Activity also hosts an active email discussion list and a wiki for sharing documents. For further information see Resources and case studies below.
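To give a sense of what such tools do, the following minimal sketch extracts two pieces of technical metadata from a file (its size and a SHA-256 checksum) and writes them out as a PREMIS-style Object record. It is an illustration only, written with the Python standard library: the function name and the "local" identifier type are hypothetical, the namespace is assumed to be that of PREMIS version 3, and the output is deliberately simplified (it omits format information, for example) and has not been validated against the PREMIS XML schema.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"  # assumed PREMIS 3 namespace
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def _child(parent, name, text=None):
    """Append a child element in the PREMIS namespace, optionally with text."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        element.text = text
    return element


def premis_object_for(path, identifier):
    """Build a simplified PREMIS-style Object element for one file."""
    # Reads the whole file into memory; acceptable for a sketch only.
    with open(path, "rb") as handle:
        data = handle.read()
    digest = hashlib.sha256(data).hexdigest()

    obj = ET.Element(f"{{{PREMIS_NS}}}object", {f"{{{XSI_NS}}}type": "premis:file"})
    ident = _child(obj, "objectIdentifier")
    _child(ident, "objectIdentifierType", "local")  # hypothetical identifier type
    _child(ident, "objectIdentifierValue", identifier)
    chars = _child(obj, "objectCharacteristics")
    fixity = _child(chars, "fixity")
    _child(fixity, "messageDigestAlgorithm", "SHA-256")
    _child(fixity, "messageDigest", digest)
    _child(chars, "size", str(len(data)))
    return obj


# Usage: describe a small temporary file, then clean it up.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello")
    sample_path = tmp.name

record = premis_object_for(sample_path, "object-1")
os.remove(sample_path)
print(ET.tostring(record, encoding="unicode"))
```

In practice a repository would rely on established characterisation tools rather than hand-written code, and would validate the resulting XML against the schema before storing it.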
See also related sections of the Handbook including Acquisition and appraisal, and Preservation planning.
Resources
PREMIS Data Dictionary for Preservation Metadata, Version 3.0
http://www.loc.gov/standards/premis/v3/index.html
The PREMIS Data Dictionary and its supporting documentation are a comprehensive, practical resource for implementing preservation metadata in digital archiving systems. The Data Dictionary is built on a data model that defines five entities: Intellectual Entities, Objects, Events, Rights, and Agents. Each semantic unit defined in the Data Dictionary is a property of one of the entities in the data model. Version 3.0 was released in June 2015 (273 pages).
Preservation Metadata (2nd edition), DPC Technology Watch Report
http://dx.doi.org/10.7207/twr13-03
This report focuses on new developments in preservation metadata made possible by the emergence of PREMIS as a de facto international standard. It covers key implementation topics including revisions of the Data Dictionary; community outreach; packaging (with a focus on METS); tools; PREMIS implementations in digital preservation systems; and implementation resources. Published in 2013 (36 pages).
Tools for preservation metadata implementation
http://www.loc.gov/standards/premis/tools_for_premis.php
The PREMIS Maintenance Activity maintains a webpage listing the most important tools available for use with PREMIS. The page has entries for individual tools, along with pointers to others which may be used to generate METS (Metadata Encoding and Transmission Standard, an XML schema for packaging digital object metadata) files in conjunction with PREMIS. The majority of the tools listed are for extracting technical metadata from digital objects and converting it for encoding within the PREMIS Object entity. Others can be used for checking formats or validating files against checksums.
PREMIS website
http://www.loc.gov/standards/premis/index.html
The PREMIS Editorial Committee coordinates revisions and implementation of the PREMIS standard, which consists of the Data Dictionary, an XML schema, and supporting documentation. The PREMIS Implementers' Group forum, hosted by the PREMIS Maintenance Activity, includes an active email discussion list and a wiki for sharing documents. The wiki is a particularly useful resource for new implementers, as it includes materials from PREMIS tutorials, a collection of examples of PREMIS usage and links to information on PREMIS tools. The PREMIS Maintenance Activity maintains an active registry of PREMIS implementations.
Documenting your data
http://www.data-archive.ac.uk/create-manage/document
An excellent set of resources to assist researchers with the documentation and metadata for their research studies, drawn together by the UK Data Archive.
Archaeology Data Service Guidelines for Depositors
http://archaeologydataservice.ac.uk/advice/guidelinesForDepositors
The ADS Guidelines for Depositors provide guidance on how to correctly prepare data and compile metadata for deposition with ADS and describe the ways in which data can be deposited. There is also a series of shorter summary worksheets and checklists covering: data management; selection and retention; preferred file formats and metadata. Other resources for the use of potential depositors include a series of Guides to Good Practice, which complement the ADS Guidelines and provide more detailed information on specific data types.
Case studies
DPC case note: British Library ASR2 using METS to keep data and metadata together for preservation
http://www.dpconline.org/component/docman/doc_download/474-casenoteasr2.pdf
This Jisc-funded case study examines the 'Archival Sound Recordings 2' project from the British Library, noting that one of the challenges for long term access to digitised content is to ensure that descriptive information and digitised content are not separated from each other. The British Library has used a standard called METS to prevent this. July 2010 (4 pages).
Designing Metadata for Long-Term Data Preservation: DataONE Case Study
https://doi.org/10.1002/meet.14504701435
A short description of how PREMIS was utilized to specify the requirements for preservation metadata for DataONE (Data Observation Network for Earth) science data. 2010 (2 pages).
Preservica Case Study: Q&A with Glen McAninch, Kentucky Department for Libraries and Archives
https://preservica.com/uploads/resources/Preservica-Kentucky-QA-2014_NEW.pdf
Glen McAninch discusses the importance of provenance, context and metadata in preserving digital archival records.
PREMIS Implementations Registry
http://www.loc.gov/standards/premis/registry/index.php
The PREMIS Maintenance Activity maintains an active registry of over 40 PREMIS implementations with details of the repository and its use of PREMIS. Although not formally case studies, entries have details of practical experience e.g., Creating a digital repository at the Swedish National Archives using PREMIS.
References
Gartner, R. and Lavoie, B., 2013. Preservation Metadata (2nd edition), DPC Technology Watch Report 13-3 May 2013. Available: http://dx.doi.org/10.7207/twr13-03
PREMIS, 2013. Data Dictionary for Preservation Metadata, Version 3.0. Available: http://www.loc.gov/standards/premis/v3/index.html