Niamh Murphy is the Digital Preservation Librarian for the University College Dublin Library.
A few years ago, I published a series of blog posts for the DPC, where I outlined the benefits of using Brunnhilde and provided a beginner's guide to its installation and use. Since then, I’ve received feedback from members of the digital preservation community, who have incorporated this resource into their training procedures and workflows. However, significant updates have been made to Brunnhilde and its dependencies since those posts were written.
As I continue to use Brunnhilde in my day-to-day work at UCD Library - particularly during an audit of our digital holdings - I’ve begun revising my documentation to reflect these updates.
At iPRES, I casually mentioned to peers that I was considering releasing an updated version of the documentation, and to my surprise, there was great interest.
So, here we are!
First, a quick overview of Brunnhilde: created by Tessa Walsh, it builds upon Richard Lehane’s Siegfried software for file format identification and generates comprehensive reports on data such as file formats, dates, and duplicates, which assist digital preservation professionals in managing and understanding digital collections. Brunnhilde’s reports also provide clickable links to PRONOM for more detailed file format information, and the software integrates additional tools like ClamAV for virus scanning and Bulk Extractor for scanning of sensitive data.
One of the most notable recent updates is the expansion of Windows support. Brunnhilde now offers support for running ClamAV (with thanks to code contributions from Kieran O’Leary) and Bulk Extractor in Windows environments. Accordingly, I’ve added instructions for installing Brunnhilde in a Windows environment to the guide, to help more users navigate the setup process.
With these features and updates, Brunnhilde remains a hugely beneficial tool for digital preservation professionals, and with that I’m pleased to share an updated and improved Brunnhilde Installation and User Guide.
With the release of this guide on World Digital Preservation Day 2024, my wish is to celebrate the digital preservation community, highlighting our shared commitment to enhancing our practices and resources.
With that in mind, I now pass the floor to Kieran O’Leary from the National Library of Ireland and Raelene Casey from University College Cork, who will discuss their experiences with Brunnhilde and the benefits it has brought to their respective institutions.
Kieran O’Leary is the Digital Repository Services Manager for the National Library of Ireland.
Brunnhilde is an essential component of our pre-ingest workflow in the National Library of Ireland when working with born-digital archival collections. NLI staff create a Brunnhilde report for every transfer that we receive. We have a dedicated written procedure for using and interpreting the outputs of the tool. In keeping with our Total Cost of Stewardship approach to this work, the reports generated by Brunnhilde aid our decision-making in relation to planning the processing of collections, for example with cataloguing, getting a sense of the scope and complexity of a collection, or the scale of duplicates within the material.
Initially, we used Brunnhilde within the BitCurator environment, as a means of launching several tools at once without the overhead of writing a customised automated script ourselves. We included the ClamAV and Bulk Extractor flags as part of the workflow, but increasingly, we began to run all of those tools individually outside of BitCurator in a native Windows environment. This move away from BitCurator was initially driven by the need to adapt to remote working during the pandemic, rather than relying on on-site Forensic Workstations that were not connected to any network. We found that even after returning to the offices, working in Windows allowed for more flexibility and streamlined our workflows. The Siegfried report that Brunnhilde generates can be reused for many purposes, and the beautiful Brunnhilde HTML report is a goldmine as an easily digestible summary.
One of the great advantages of Brunnhilde is that it’s open source, meaning we were able to contribute code to add extra support for running ClamAV in Windows, which wasn’t possible before. For more info, see here: Add clamav Windows Support (#55) · tw4l/brunnhilde@373c167 · GitHub. Niamh Murphy’s guide from the Digital Preservation Coalition has been a helpful reference for us, especially for more detailed aspects of the workflow, like the tricky installation of ClamAV and setting up the configuration files.
Even though the NLI is exploring Forensic Toolkit (FTK 8) for some processing of collections, which could reduce our reliance on Brunnhilde and Bulk Extractor, Brunnhilde’s reports provide unique insights that will continue to be relevant to our work with born-digital material in the NLI for the foreseeable future.
Raelene Casey is the Digital Archivist in UCC Library, University College Cork.
In this first year of developing a digital archive in UCC Library, one of our primary tasks has been the creation of a Digital Asset Register (DAR). A key objective in creating the DAR has been to ensure that requirements gathering for a long-term digital preservation solution will be based on a thorough understanding of the content held by UCC Library. This also means anticipating how that content will grow and change over the next decade.
With this in mind, we wanted the information gathered on the register to be as accurate as possible.
Enter: Brunnhilde. After reading Niamh Murphy’s DPC blogs from 2022, I started using the reporting tool on Brunnhilde to help me build out a much more accurate picture of the data registered on the DAR.
I use Brunnhilde to record the earliest creation date, file count, duplicate count, formats, format versions and size in gigabytes. Brunnhilde also helps speed up searches for controlled vocabularies, as the MIME type for each identified format is printed on the HTML report.
The Siegfried function has ensured that every single file within a storage space can be identified and validated, even if it has an extension mismatch, or has no extension at all.
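As a rough illustration of how figures like these can be pulled together, the sketch below tallies file count, total size, formats and extension mismatches from a Siegfried-style CSV. It is not the UCC Library script. It assumes a CSV with `filesize`, `format`, `version`, `mime` and `warning` columns, and that mismatches are flagged with the text "extension mismatch" in the warning column; check the header row of your own output before relying on it. The function name is hypothetical.

```python
import csv
from collections import Counter


def summarise_siegfried_csv(path):
    """Tally basic register figures from a Siegfried-style CSV.

    Assumed columns (verify against your own file): filesize, format,
    version, mime, warning.
    """
    formats = Counter()
    total_bytes = 0
    file_count = 0
    mismatches = 0

    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            file_count += 1
            # Treat a missing or empty size as zero bytes.
            total_bytes += int(row.get("filesize") or 0)
            # Group by format name, version and MIME type together.
            key = (
                row.get("format") or "Unidentified",
                row.get("version") or "",
                row.get("mime") or "",
            )
            formats[key] += 1
            # Assumption: mismatches are reported in the warning column.
            if "extension mismatch" in (row.get("warning") or ""):
                mismatches += 1

    return {
        "file_count": file_count,
        "size_gb": total_bytes / 1e9,  # decimal gigabytes
        "formats": formats,
        "extension_mismatches": mismatches,
    }
```

Calling `summarise_siegfried_csv` on the CSV inside a Brunnhilde report folder returns a dictionary whose `formats` counter can be sorted with `most_common()` to list the dominant formats first.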
Abhijeet Madhusudan Rao, a Data Science Master’s student at UCC, worked with me in UCC Library’s internship programme to develop scripts to help build a workflow for receipt and pre-ingest preservation activities. Abhijeet incorporated the Brunnhilde command, using Siegfried and ClamAV instances, as an argument in the metadata extraction functions in these scripts.
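For readers who want to try something similar, here is a minimal sketch of calling Brunnhilde from a Python script. It is not the UCC Library code, which is linked in the acknowledgements. It assumes Brunnhilde is installed and available on the PATH as `brunnhilde.py`, that it takes a source and a destination as positional arguments, and that `-b` enables Bulk Extractor while `-n` skips the ClamAV scan. These details vary between versions, so confirm them with `brunnhilde.py --help`. The function names are hypothetical.

```python
import subprocess


def build_brunnhilde_command(source, destination,
                             bulk_extractor=False, skip_clamav=False):
    """Build the argument list for a Brunnhilde run.

    Assumptions: the executable is named 'brunnhilde.py', '-b' runs
    Bulk Extractor and '-n' skips ClamAV. Verify with --help.
    """
    command = ["brunnhilde.py"]
    if bulk_extractor:
        command.append("-b")
    if skip_clamav:
        command.append("-n")
    # Positional arguments: what to scan, then where the report goes.
    command += [str(source), str(destination)]
    return command


def run_brunnhilde(source, destination, **options):
    """Run Brunnhilde and raise CalledProcessError if it exits non-zero."""
    command = build_brunnhilde_command(source, destination, **options)
    return subprocess.run(command, check=True)
```

Keeping the command builder separate from the runner means the arguments can be logged or checked before anything is executed, which is useful when the call sits inside a larger pre-ingest script.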
Our workflow now wraps each collection of objects that represents an entry in the DAR into a custom but interoperable pre-ingest information package that includes fixity as well as representation, technical and creation data. The Brunnhilde outputs are stored within these information packages. The HTML report is also attached to each record on the DAR so that my colleagues can access this easy-to-read data when they use the register as a high-level searching aid.
When the time comes to gather requirements for a longer-term digital preservation solution or appraise for ingest, the data gathered by Brunnhilde sits waiting for us in our DAR and in our information packages. It will help us make decisions about the preservation and management of the digital objects we hold, and we can use it as a tool for advocacy to ensure that digital preservation activities are adequately supported and resourced.
Acknowledgements:
Thanks are due to Abhijeet Madhusudan Rao, Niamh Murphy and Paul Davidson.
Scripts used in UCC Library’s pre-ingest workflow can be found here: https://github.com/UCC-Library/UCC-Library-DigiPres.