It is important to gather information about the individual files or digital objects that make up the records. This will help inform your preservation work and will enable you to flag some of the more specific preservation challenges you face. 

Depending on the record keeping system in use, there may be no easy way to gather the necessary information. Ideally you would be able to run a tool such as DROID across the contents of the record keeping system to characterise the files and gain an understanding of the content, but depending on how the digital objects are stored, this will not always be possible. Alternatively, you may be able to find out about the digital objects from metadata held within the system itself, or by asking the system's users and administrators. Gather what information you can, but accept that you may not gain a full and clear understanding of the situation until the records have been exported and are held outside the system.

“EDRMSs do not have digital preservation built into them, even though long-term temporary records need to be kept for 75+ years. In Australia we point agencies to sustainable formats that have the best chance of remaining accessible over time, but the business needs of agencies may require, in particular cases, formats that aren’t sustainable in the long term.” 

James Doig, National Archives of Australia

 

File formats

Try to establish which file formats are included within the system. Are there any restrictions on formats that a user can create and/or upload or are users able to save records in whatever format they like? If you are unable to run file identification tools over the records stored within the system, find out if the system metadata includes any potentially useful information such as MIME type or file extension. 
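Where file identification tools cannot be run, a metadata export from the system can still give a rough picture of the formats in use. The following is a minimal sketch, assuming the system can export one row per digital object; the column names `file_name` and `mime_type` and the function `summarise_formats` are hypothetical and will need adapting to whatever the system actually provides.

```python
from collections import Counter
from pathlib import PurePath


def summarise_formats(rows):
    """Count objects per MIME type, falling back to the file extension.

    `rows` is an iterable of dicts with the hypothetical keys
    `file_name` and `mime_type` (assumed column names, not a standard).
    """
    counts = Counter()
    for row in rows:
        mime = (row.get("mime_type") or "").strip().lower()
        if mime:
            counts[mime] += 1
            continue
        # No MIME type recorded: the extension is a weaker signal,
        # so it is labelled separately rather than mixed in.
        ext = PurePath(row.get("file_name") or "").suffix.lower()
        counts[f"ext:{ext}" if ext else "unknown"] += 1
    return counts


# Illustrative rows only; real data would come from the system's export.
sample = [
    {"file_name": "report.pdf", "mime_type": "application/pdf"},
    {"file_name": "plan.PDF", "mime_type": ""},
    {"file_name": "scan", "mime_type": ""},
]

for fmt, n in summarise_formats(sample).most_common():
    print(n, fmt)
```

For a real export, the rows could be read with `csv.DictReader`. Extension-only and unknown entries are kept apart because they are the objects most in need of proper identification once the records are outside the system.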

 

Complex records

Find out whether the system includes complex records that consist of multiple digital objects, for example a webpage, an email with attachments or a geospatial record made up of several files. If it does, you will need to take particular care that your preservation methodology maintains the relationships between the files and their metadata, and be prepared to test and check that they are captured and preserved satisfactorily.
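One way to approach the testing and checking described above is to list which records have more than one component, then confirm after a trial export that every component is present. This is a sketch under stated assumptions: the column names `record_id` and `file_name` and both function names are hypothetical, and matching on file name alone is a simplification.

```python
from collections import defaultdict


def find_complex_records(rows):
    """Group objects by record and keep records with more than one object.

    `rows` is an iterable of dicts with the hypothetical keys
    `record_id` and `file_name`.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["record_id"]].append(row["file_name"])
    return {rid: names for rid, names in grouped.items() if len(names) > 1}


def missing_components(complex_records, exported_names):
    """Report, per record, the components absent from an export."""
    exported = set(exported_names)
    missing = {}
    for rid, names in complex_records.items():
        absent = [name for name in names if name not in exported]
        if absent:
            missing[rid] = absent
    return missing


# Illustrative data only: one email with an attachment, one simple record.
sample = [
    {"record_id": "R1", "file_name": "message.eml"},
    {"record_id": "R1", "file_name": "attachment.xlsx"},
    {"record_id": "R2", "file_name": "memo.docx"},
]

complex_records = find_complex_records(sample)
print(missing_components(complex_records, ["message.eml", "memo.docx"]))
```

A check like this only confirms that the component files exist. Whether the relationships between them and their metadata have survived still needs to be verified by opening and inspecting a sample of exported records.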

 

File migrations

Find out whether any file format migrations have been carried out within the system. Most record keeping systems won’t carry out file format migration, but some examples have been noted of systems creating PDF versions of documents or altering the file format of emails. If file formats have been changed within the system, find out whether the details are recorded or logged anywhere and whether this information can be extracted alongside the records.

 

Legacy formats

Find out when the digital objects were created or last modified, and whether they have been accessed recently. Again, some of this information can be gathered using characterisation tools such as DROID, though you may find that more detailed audit logs are available within the system metadata. Your investigations and conversations up to this point may already have highlighted problems in accessing records of a certain age or held in legacy file formats; this information will help you further understand the risks and challenges related to specific records.
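If last-modified dates can be obtained, whether from a characterisation tool's report or from system metadata, a simple filter can surface the oldest objects for closer inspection. The sketch below assumes rows with the hypothetical keys `file_name` and `last_modified`, with dates in ISO 8601 form and without time zone offsets; the function name and the cutoff date are illustrative choices, not recommendations.

```python
from datetime import datetime


def find_legacy_candidates(rows, cutoff):
    """Return (file_name, last_modified) pairs older than `cutoff`, oldest first.

    Rows with no date are skipped here and should be reviewed separately,
    since a missing date is itself worth investigating.
    """
    flagged = []
    for row in rows:
        raw = (row.get("last_modified") or "").strip()
        if not raw:
            continue
        modified = datetime.fromisoformat(raw)
        if modified < cutoff:
            flagged.append((row["file_name"], modified))
    return sorted(flagged, key=lambda item: item[1])


# Illustrative rows only; real values would come from the system or a report.
sample = [
    {"file_name": "budget.wk4", "last_modified": "1998-03-02T10:15:00"},
    {"file_name": "policy.docx", "last_modified": "2019-11-20T14:00:00"},
    {"file_name": "letter.wpd", "last_modified": "1995-07-11T09:00:00"},
    {"file_name": "undated.bin", "last_modified": ""},
]

for name, modified in find_legacy_candidates(sample, datetime(2005, 1, 1)):
    print(modified.date(), name)
```

Age alone does not make a format a legacy format, so the output is best treated as a list of candidates to cross-check against the format information gathered earlier.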

 

Use the information that you have gathered to better understand some of the preservation risks to the content and to gain a basic understanding of additional preservation actions you may need to take once you have these files in the digital archive.

In his case study from the National Archives of Australia, James Doig stresses the importance of gaining this understanding of the digital objects prior to transfer in order to flag up and understand some of the issues that you will encounter.

