The table below lists some of the metadata fields that you may wish to capture from an EDRMS or other record keeping system when migrating records of long-term value to a preservation system.
Different organisations may require different fields depending on their context and on the anticipated future users and use cases of the records. Each metadata field is therefore listed with a definition, some notes and a list of reasons why it might be important to capture in particular contexts.
Not all record keeping systems capture and store all of the metadata fields described below. Many of the fields are commonly found in an EDRMS, but perhaps not in other, less controlled systems in which records are stored.
Decisions on which metadata to capture will need to factor in the following considerations:
- Does the record keeping system store this information?
- Can this information be extracted from the record keeping system?
- Can this information be stored within the digital archive?
Record level metadata
You may wish to capture the following metadata at record level:
Metadata field |
Definition |
Notes |
Why you might need this |
File name |
The file name of the record as stored in the record keeping system |
Note that the system may allow duplicate file names, or may allow file names to include special characters that cause problems once the files are exported into a file system (e.g. \/:?*"<>|). If this is the case, files may be renamed on export, and it is important to ensure that the metadata includes the original file name of the object as stored in the system. |
You should consider capturing this information in the following circumstances:
|
File format |
The file format of the record as defined in the system |
An EDRMS or other system may record the file format of each record. This may not be as thorough or accurate as the file format identification that you would wish to carry out within a digital archive (for example it may state the file is a PDF but not which version). It seems likely that file format identification would be carried out outside of the system, either as a pre-ingest step or as a part of the ingest process as records move into the digital archive. |
You should consider capturing this information in the following circumstances:
|
Previous file format or file extension |
The previous file format or extension of a record |
In certain circumstances, a record keeping system may change the format of a file on capture or upload. An example that has been noted is the conversion of emails to a format specific to the EDRMS in which they are stored. If a conversion such as this has occurred, there may be evidence of this within the metadata. |
You should consider capturing this information in the following circumstances:
|
MIME type |
The MIME type of the record as defined in the system |
The system may store information about the MIME type of each record, but this information is also typically captured as part of pre-ingest or ingest routines within a digital archive. |
You should consider capturing this information in the following circumstances:
|
File size |
The size of the record (in KB/MB as appropriate) |
The record keeping system may store information about the file size of each record, but this information is also typically captured as part of pre-ingest or ingest routines within a digital archive. |
You should consider capturing this information in the following circumstances:
|
Number of files |
The number of files that make up a single record within the system. For example this may apply to the contents of a ZIP file, emails with attachments or the number of messages within a PST file. |
This metric will only apply to certain records. Note that metadata about the total number of files within a transfer or export is discussed under ‘Transfer level metadata’. |
You should consider capturing this information in the following circumstances:
|
Digital object specific dimensions |
This metadata would be specific to particular types of digital object and could include:
|
The system may contain metadata relating to the dimensions of digital objects and this will be specific to the types of records contained within it. |
You should consider capturing this information in the following circumstances:
|
Language |
Language of the digital object. |
|
You should consider capturing this information in the following circumstances:
|
Character encoding |
Character encoding of the digital object. For example ASCII, Unicode, UTF-8. |
|
You should consider capturing this information in the following circumstances:
|
Unique identifier (system generated) |
The unique reference of a record within the originating system (typically assigned automatically) |
Note that there may be more than one version of this identifier that can be captured. The identifier may reflect the function, context or structure of the record and how it was used. |
You should consider capturing this information in the following circumstances:
|
Agency assigned identifier |
Catalogue or local identifier of the record within the system (typically assigned by a human operator) |
May reflect the function/context of the object and how it was used. |
You should consider capturing this information in the following circumstances:
|
Previous identifier |
A previous identifier allocated to a record |
A previous identifier metadata field may be of value where records have previously been migrated from another system. The previous identifier field may be particularly important if relationships between documents are defined using these identifiers. |
You should consider capturing this information in the following circumstances:
|
Title |
Title or short description of the record |
Sometimes records may not have meaningful titles assigned, or a set of records will share a very generic title. In some cases a short description field may be present instead of a title. |
You should consider capturing this information in the following circumstances:
|
Description |
More detailed description of the digital object |
|
You should consider capturing this information in the following circumstances:
|
Export date |
Date record was exported from the system |
This date does not exist within the system but may be included as part of an export or transfer process. Can help demonstrate provenance. May also help with disaster recovery. |
You should consider capturing this information in the following circumstances:
|
Creation date |
Date record was originally created |
Note that this date may still be attached to the files as system info once the record is extracted, but system dates are vulnerable to change so extracting this date as metadata is a sensible precaution. Note that this date may reflect the date a record was originally uploaded to the system rather than the original creation date. |
You should consider capturing this information in the following circumstances:
|
Last modified date |
Date record was last modified |
Note that this date may still be attached to the files as system info once the record is extracted, but system dates are vulnerable to change so extracting this date as metadata is a sensible precaution. The system may be configured to capture a full audit trail, including dates of all edits to a record. Consider what level of detail is required for the digital archive. |
You should consider capturing this information in the following circumstances:
|
Date folder was closed |
The date a folder was closed, which may act as a trigger date for export to the digital archive. |
Depending on local practices, this action may be manually applied or automatically generated. |
You should consider capturing this information in the following circumstances:
|
Review date |
If a record is closed to the public, this is the date on which it needs to be reviewed to see if it can be opened (unless a date open is already recorded - see below). |
Note that this may be more broadly categorised as date of next action (where other proposed actions relating to a record are recorded) |
You should consider capturing this information in the following circumstances:
|
Date open to public |
The date a record can be (or was) opened for public access. |
|
You should consider capturing this information in the following circumstances:
|
Date that the file became a record |
The date that a file is marked as a record. |
This may be a feature of some EDRMS and will depend on local practices. Depending how the field is used in practice, it may not be particularly meaningful. For example sometimes a file may be marked as a record years after the record was created and/or last edited. |
You should consider capturing this information in the following circumstances:
|
Disposal date |
Date that record can be disposed of. |
This field will not be applicable to all organisations and implementations, but in some cases records transferred to archive will need to be disposed of at a later date. |
You should consider capturing this information in the following circumstances:
|
Creator |
Individual or group primarily responsible for creating the record |
There may be more than one - depending on context you may want to record more granular roles. Note that there may be issues relating to how this is configured within the system (for example just as an identifier, which would need additional information to interpret), so it is important to ensure you get the details you need. Note also that there may be inaccuracies within the metadata. Systems and local practices will vary, but ensure you understand how it was generated. Is it added manually, extracted from the embedded metadata of a document or does the system generate it based on who uploaded the record (which may be different to who created the document)? |
You should consider capturing this information in the following circumstances:
|
Creating organization |
Details of organization responsible for creating record |
As above, there may be issues relating to how this is configured in the system (for example just as an identifier, which would need additional information to interpret). It is important to ensure you get the details you need. Note also that there may be inaccuracies within the metadata. Systems and local practices will vary, but ensure you understand how it was generated. Is it added manually or does the system generate it based on who uploaded the record (which may be different to who created the document)? |
You should consider capturing this information in the following circumstances:
|
Edited by |
Information about who has edited the record since creation |
Record keeping systems may capture a full audit trail for a record, including details of any edits made. You may want to capture this information alongside edit dates (described above under ‘last modified date’) |
You should consider capturing this information in the following circumstances:
|
Classification code |
Classification code |
Also relevant is the record identifier (discussed earlier) |
You should consider capturing this information in the following circumstances:
|
Classification |
Human readable description of above code |
The classification code may consist of a series of acronyms which are hard for a user to interpret. The system may also store a more human readable description of this code |
You should consider capturing this information in the following circumstances:
|
Permissions |
Who has rights to read/copy/edit a record within the system |
May be applicable in some circumstances - depends on the context. Can cover a variety of things. |
You should consider capturing this information in the following circumstances:
|
IPR and holder |
Including copyrights |
|
You should consider capturing this information in the following circumstances:
|
Checksum |
Checksum for the record |
Alongside the checksum itself it may also be helpful to extract details of the date the checksum was generated and the algorithm used. Note that if a batch of records has been imported into an EDRMS or other system, the records may have come with a checksum. It would also be useful to capture this information about the previous checksum if it is present. |
You should consider capturing this information in the following circumstances:
|
Versioning |
The version of the record |
Multiple versions of any one record may exist within the system |
You should consider capturing this information in the following circumstances:
|
Location within folder structure or hierarchy |
Records within an EDRMS or other record keeping system may be placed in a particular structure/hierarchy or ‘tagged’. |
Where a record sits within a structure can give valuable context to a record. It may not make sense once it is moved out of this structure. The location of the record within the structure should be captured in some way; this may or may not be through the metadata export. |
You should consider capturing this information in the following circumstances:
|
Relationships with other records |
Relationships with other records (not apparent through the folder structure or hierarchy) |
Relationships between records within the system may exist beyond those described through a record hierarchy or folder structure. For example an email record may contain an attachment, or multiple files may form a single logical record (for example a GIS layer or a website) |
You should consider capturing this information in the following circumstances:
|
Other descriptive metadata |
Other descriptive metadata that exists within the system |
Local practices will dictate what additional descriptive metadata is contained within any record keeping system and this will typically be used to help current users with locating and interpreting the records. |
You should consider capturing this information in the following circumstances:
|
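Several of the record level fields above (the original file name, the file size, and the checksum with its algorithm and date) can be recorded at the point of export. The Python sketch below is one possible illustration under stated assumptions, not a prescribed method. The function names `safe_export_name` and `describe_exported_file`, the keys of the returned dictionary, the use of SHA-256 and the choice to replace problem characters with underscores are all assumptions made for this example.

```python
import hashlib
import re
from datetime import datetime, timezone
from pathlib import Path

# Characters noted in the table as problematic in a file system: \ / : ? * " < > |
UNSAFE_CHARS = re.compile(r'[\\/:?*"<>|]')


def safe_export_name(original_name: str) -> str:
    """Replace problem characters with underscores (an assumed convention)."""
    return UNSAFE_CHARS.sub("_", original_name)


def describe_exported_file(path: Path, original_name: str) -> dict:
    """Build a hypothetical record level metadata entry for one exported file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "original_file_name": original_name,  # name as stored in the system
        "exported_file_name": path.name,      # name after any renaming on export
        "file_size_bytes": path.stat().st_size,
        "checksum": digest.hexdigest(),
        "checksum_algorithm": "SHA-256",      # assumed algorithm for this sketch
        "checksum_date": datetime.now(timezone.utc).isoformat(),
    }
```

Keeping the original and exported file names side by side means a renamed file can still be traced back to the object as it was stored in the system. Duplicate file names would need separate handling, which this sketch does not attempt.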
Transfer level metadata
You may wish to capture the following metadata for the batch of records as a whole (rather than at record level):
Metadata field |
Definition |
Notes |
Why you might need this |
Total number and total size of files/records |
The number of and size of files and/or records extracted from the system |
Totals for records and files may be different (for example one record may consist of multiple files) so two different figures may need to be captured here. |
You should consider capturing this information in the following circumstances:
|
System details |
Details of the system that the records are being transferred from (for example name and version) |
This information may need to be captured manually and incorporated into the metadata for each record. Additional documentation may also be required (see below) |
You should consider capturing this information in the following circumstances:
|
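Because one record may consist of several files, the record total and the file total for a transfer can differ. The following Python sketch shows one way the two figures might be calculated separately. The `ExportedRecord` class, the `summarise_transfer` function and the key names in the result are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExportedRecord:
    """Hypothetical representation of one record and the files that make it up."""
    identifier: str
    file_sizes_bytes: list[int] = field(default_factory=list)


def summarise_transfer(records: list[ExportedRecord]) -> dict:
    """Return separate record and file totals for a batch of exported records."""
    total_files = sum(len(r.file_sizes_bytes) for r in records)
    total_bytes = sum(sum(r.file_sizes_bytes) for r in records)
    return {
        "total_records": len(records),
        "total_files": total_files,
        "total_size_bytes": total_bytes,
    }
```

For instance, an email with two attachments could be passed in as a single `ExportedRecord` with three file sizes, so it would count as one record but three files.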
Additional documentation
Some organizations will also wish to capture a full set of system documentation relating to the record keeping system and how it was configured and used. This may include a data dictionary, records management policies and procedures, a user manual and documentation relating to the configuration or set up of the system. This documentation will provide an additional level of detail about the system and provide context for the records that are being preserved.