Amanda Tomé is Preservation Coordinator for the Digital Research Alliance of Canada.
Background
2024 is the year of file formats for the Digital Research Alliance of Canada’s Federated Research Data Repository (FRDR). It’s been a year of being confused by file formats, overwhelmed by file formats, and generally trying to figure out what to do with the file formats in our repository.
A file metrics scan undertaken on the repository in March 2024 effectively confirmed our suspicions that many of the file formats in FRDR were not present in file format registries. After reviewing the results, I knew there was a lot of work to be done to get a better handle on the file formats in our repository. However, I wasn’t entirely sure how to get started or what approach to take. I also wanted this work to benefit the Canadian research data community and the digital preservation community.
It turns out I didn’t need to start from scratch or reinvent the wheel. The digital preservation community has developed many wonderful resources that are publicly available. I was able to tap into these resources, modify them as needed, and incorporate them into our digital preservation work.
I decided that the file format work would have many phases, the first being to enhance the identification of files when datasets were deposited in FRDR. This work would involve creating file format signatures for the unknown and the misidentified file formats we found.
Digital Preservation Community to the Rescue
I had never created a file format signature, and I needed some guidance on how to start. My first stop on the file format signature development tour was the wonderful PRONOM Starter Pack. The Starter Pack provided a great guide to understanding file format signatures and the tools needed to develop them. Its links to external resources, such as blog posts outlining others’ experiences with file format signature development, were also invaluable in shaping my understanding of this work.
Next up on the tour was the Registries of Good Practice project. It happened that around the time I started thinking about file formats in FRDR, the Digital Preservation Coalition and Yale University Library’s Registries of Good Practice project was about to kick off. Attending and participating in the monthly meetings has been helpful in understanding the registry landscape and the diverse tools that are available, like the format aggregator. It was particularly exciting to see the development of the DigiPres Workbench as an output of this project (thank you, Andy Jackson!).
Admittedly, even with the Starter Pack, the blog posts, and the Registries of Good Practice project, there was still a bit of apprehension as to whether I possessed the necessary skills to develop file format signatures. Luckily, PRONOM Drop-in meetings exist.
The PRONOM Drop-in sessions are a fantastic way to hear about the file formats the community is encountering, and they provide an inviting forum to collaborate and ask questions about signature development. As a lone digital preservationist, I found connecting with the PRONOM community particularly helpful, as it made me feel less isolated when conducting our file format work. The sessions reinforced what I had discovered during my research and added considerably to my knowledge of file format signatures. There were many “I didn’t know you could do that” moments throughout our journey.
Beyond the Horizon
I hope that by developing file format signatures and submitting them for inclusion in PRONOM, we are giving back to the community that has helped us get started.
But this isn’t the end of our file format tour. Phase 2 of this work will involve documenting these unknown formats so we can share the information with Canadian data repositories. It is likely we will look once again at what the digital preservation community has developed to help do this work.
Without the resources developed by the community, and without a space to ask questions about file format signature development, this work would likely not have gotten off the ground.
Thank you all for your hard work and for making these resources available!