Introduction
The PAWDOC collection was initiated in 1981 to support research into new office systems at the National Computing Centre in Manchester. The author continued to use the collection to manage all his documents throughout his subsequent IT career. The collection’s index contains six fields (Ref No, Title, Movement Status, Publication Date, Date Last Accessed, and Creation Date). Each of the 17,000 Index entries relates to one or more of some 29,000 electronic files of a wide variety of file types stored in a Document Management System. The collection provides a unique snapshot of the development of computer use in industry, and of the impact on the day-to-day information load on professionals across the period of the introduction of the internet, and therefore seems worth trying to preserve.
The Project
In 2014, I set out to find a simple digital preservation workflow that I could apply to PAWDOC. The results of that investigation are documented in the DPC Case Note of April 2016, which also included templates that are easily adapted to other personal digital collections. Those templates were updated during this application of the workflow and are free for anyone to use:
- Scoping document
- Preservation Project Plan Description
- Preservation Project Plan Chart
- Preservation Maintenance Plan
This blog post summarises my experiences in putting the workflow and templates to use on the PAWDOC collection. The work was performed in the following three phases:
Scoping phase: Jan2017 - Feb2018
This work included completing the Scoping document and performing the tasks that the Scoping document identified as needing to be done before planning could start.
Planning phase: Dec2017 - Feb2018
During this phase the Preservation Project Plan DESCRIPTION and CHART documents were produced.
Implementation Phase: Feb2018 - May2018
The tasks specified in the Preservation Project Plan DESCRIPTION and CHART documents were completed during this phase.
Lessons Learned
The work addressed ALL aspects of preserving a personal collection. Some of these aspects (such as the removal of the expensive Document Management System (DMS) and the creation of a User Guide) may not normally be addressed within a digital preservation project, but they were considered important to the long term accessibility of the Collection. Decisions about what to do with particular files were taken with respect to what was practically feasible for the private owner, and may not follow the approach that would be taken by digital preservation specialists within institutions. Such is the reality for personal digital collections.
During the Scoping Phase the following control documents were created:
- Alternative Document Management Systems
- DROID Analysis
- Files that won't open
- Physical disks
These addressed most of the preservation issues explored in the Scoping phase, and their outcomes subsequently found their way into the tasks in the Preservation Project Plan. The following key points emerged concerning work that is done during the Scoping phase:
- It's worth establishing a comprehensive record keeping tool (probably a spreadsheet) at the beginning of the Scoping phase to support the management and reporting of the work. Columns should be included for all statistical information, so that any numbers you need are automatically calculated and visible (preferably above the column headings, not at the bottom of the spreadsheet).
- The flexibility to be unstructured and to pursue unfamiliar avenues of investigation during the Scoping phase is a huge advantage. For example, the process of trying to track down organisations and people to convert specific file types can take varying lengths of time with no certainty of success.
- Full use of the internet should be made to identify advice, guidance and conversion services. For example, the approach taken to convert Lotus Notes files was found in a discussion forum.
- Knowledgeable collaborators are extremely useful to have on hand so that they can be asked for specific guidance about which file types to convert, how to convert them, and what to convert them to.
- For small numbers of relatively unusual files, by the time the conversion process has been discovered and tried out, you may have nearly completed the task. In that case, it is probably better to finish it off within the Scoping phase rather than waiting to complete the work as part of the downstream project plan. This was definitely the best course of action for two Lotus Notes files and two MS Help files, all of which were dealt with in the Scoping phase of this project.
- If a solution for particular files can't be found in the Scoping phase and you don't want to spend any more time on them, consider including those files in the Preservation Maintenance Plan so that they can be worked on at a later date. That way, the project doesn't get bogged down, and the problem areas don't get forgotten.
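As an illustration of the record keeping point above, here is a minimal Python sketch of a control record whose summary numbers are recalculated whenever they are needed. It is not the spreadsheet used in this project: the column names (`ref_no`, `file_type`, `status`), the status values and the sample rows are all hypothetical, chosen only to show the idea.

```python
from collections import Counter

# Hypothetical control records: one row per problem file found during scoping.
# Field names, status values and rows are illustrative assumptions only.
records = [
    {"ref_no": "example-1", "file_type": "Lotus Notes", "status": "converted"},
    {"ref_no": "example-2", "file_type": "Lotus Notes", "status": "converted"},
    {"ref_no": "example-3", "file_type": "MS Help", "status": "in progress"},
    {"ref_no": "example-4", "file_type": "Zip", "status": "deferred to maintenance plan"},
]


def summarise(rows):
    """Return the total number of rows plus counts by status and by file type."""
    return {
        "total": len(rows),
        "by_status": Counter(row["status"] for row in rows),
        "by_type": Counter(row["file_type"] for row in rows),
    }


summary = summarise(records)

# Print the summary first, mirroring the advice to keep totals above the detail.
print("Total files tracked:", summary["total"])
for status, count in summary["by_status"].items():
    print(f"  {status}: {count}")
```

The same effect can be had in a spreadsheet with counting formulas placed in rows above the column headings; the point is simply that the totals are derived from the detail rows rather than maintained by hand.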
During the Planning Phase a Project Plan DESCRIPTION document and associated Project Plan CHART were produced. The following observations emerged concerning the Planning phase:
- The Principles, Assumptions, Constraints and Risks (PACRs) are particularly powerful prompts for driving out solutions (for example, assumptions about the longevity of certain file formats). So, it's worth starting to think about them during the Scoping phase.
- Seeking input on the PACRs from experts in the field is well worth doing.
- If tasks that are unquantifiable and could adversely affect the project's timescales still remain after the Scoping work has finished, consider specifying them as Risks to the project, with a mitigation whereby work on the tasks concerned is stopped after a certain number of days and an entry is made in the Preservation Maintenance Plan for the tasks to be looked at in the future.
- In order to define section 6 (Project Milestones and Deliverables) of the DESCRIPTION document, it is easier to do a quick draft of the Project Plan CHART first, and then to iteratively develop the two documents (section 6 and the Chart) in parallel.
The implementation phase comprised the following main activities:
- Export files from the Document Management System (DMS)
- Adjust problem Zip files
- Adjust problem files identified by DROID
- Deal with Files That Won't Open (mainly by using the Zamzar conversion facility)
- Deal with Physical Disks (by copying them onto the laptop hard drive)
- Deal with Double Unsorted files (reordering the pages to rectify scans of front side first then reverse side)
- Revise the backup and DR arrangements
- Create Preservation Maintenance Plan
- Produce User Guide
- Close down PAWDOC DP project
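The reordering needed for the Double Unsorted files can be sketched as follows. This is only an illustration of the logic, not the tool or procedure actually used in the project. It assumes each scan holds all the front sides first, followed by all the reverse sides, and it works on a plain list of page labels rather than on real files. The function name `interleave_pages` and the `backs_reversed` option (for stacks that were flipped over before the second pass, so the reverse sides arrive last-first) are hypothetical.

```python
def interleave_pages(scanned, backs_reversed=False):
    """Reorder pages scanned as all fronts then all backs into reading order.

    scanned: list of pages in the order they appear in the scan.
    backs_reversed: set True if the reverse sides were scanned last-first.
    """
    if len(scanned) % 2 != 0:
        raise ValueError("expected an even number of pages (one back per front)")

    half = len(scanned) // 2
    fronts, backs = scanned[:half], scanned[half:]
    if backs_reversed:
        backs = backs[::-1]

    # Alternate front, back, front, back ...
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered


# Pages labelled by their true page number, in scan order.
print(interleave_pages([1, 3, 5, 2, 4, 6]))
print(interleave_pages([1, 3, 5, 6, 4, 2], backs_reversed=True))
```

Whatever tool is used to move the pages, it is worth spot-checking the first and last sheets of each reordered file, since a single missing reverse side shifts every page after it.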
The following points emerged during the Implementation phase:
- Problems were recorded as issues in the Progress Report while a solution was investigated, entered into the Baseline change log, or entered into Section 3 of the Preservation Maintenance Plan so that they could be investigated at a later date.
- The Weekly Progress Report was an effective tool for preventing issues from falling off the radar, and for providing motivation to complete chunks of work.
- Some files appeared in both the DROID and the 'Files that Won't Open' categories of investigation. This was confusing. It would have been well worth eliminating duplications in the spreadsheets in the Scoping Phase.
- Although ORIGINAL versions of files that have been UPDATED are being kept in the main Collection, it is not clear if this is the right long-term approach.
- It's better to overestimate timescales and come in early than underestimate and come in late.
- Despite putting a lot of effort into the Scoping phase, and in testing the DMS export utility, some unknowns were still encountered in implementation. The more investigation that can be done in the Scoping phase the better.
- If the Scoping phase takes longer than anticipated, things may change after plans are made. For example, it was planned to use Live Mail to open eml mail messages; but a system crash occurred after that decision was made and Live Mail was not available in the rebuilt system.
- The Zamzar service [Zamzar, 2018] was used very successfully to convert about 170 files, and only failed in 4 cases.
- Converting hundreds of files is undoubtedly a slog. Motivational aids such as spreading conversions across several tasks, or setting intermediate goals for individual tasks, are worth exploring.
- Combining the user guide and the backup documentation, with a quick guide to each on the front and back covers respectively, seems to work well.
- The ability to save UPDATED versions of MS Office files in DOCX, XLSX and PPTX format makes it easy to distinguish those files from older versions with DOC, XLS and PPT extensions.
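The duplication between the DROID and 'Files that Won't Open' lists mentioned above could be caught early with a simple comparison of the two lists. The sketch below is a hypothetical illustration, not something that was done in this project: the function name `find_overlap` and the sample file names are invented, and it assumes that a file can be matched by its name once differences in case and surrounding spaces are ignored.

```python
def find_overlap(droid_files, wont_open_files):
    """Return the names that appear in both lists, ignoring case and stray spaces."""

    def normalise(name):
        return name.strip().lower()

    droid = {normalise(name) for name in droid_files}
    wont_open = {normalise(name) for name in wont_open_files}
    return sorted(droid & wont_open)


# Invented sample names, for illustration only.
droid_list = ["Report-A.doc", "notes-b.nsf", "slides-c.ppt"]
wont_open_list = ["NOTES-B.NSF ", "help-d.hlp", "report-a.doc"]

for name in find_overlap(droid_list, wont_open_list):
    print("Listed in both spreadsheets:", name)
```

Any file flagged this way can then be tracked in just one of the spreadsheets, with a cross-reference in the other.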
Overall the project was a success - all major objectives were achieved and the Collection now has a Preservation Maintenance Plan in place. The following overall conclusions can be drawn from the work:
- The Scoping phase was integral to the project's success. The more that can be discovered, tested and verified at that stage the better.
- It is worth taking the time early in the Scoping phase to construct control spreadsheets which will support both the planning activity and the subsequent implementation actions, and which will make it easier to assess progress.
- The Templates were very useful in getting the Scoping and Planning phases off to quick starts, and in providing direction for the work.
- Although a Preservation Maintenance Plan has been constructed, the validity of that particular Template still needs testing in a follow-on preservation exercise.
- Private owners of personal collections with little knowledge of professional preservation tools and techniques can still make a success of addressing preservation issues. However, they would be well advised to seek guidance from Preservation professionals to help them in the work.
If you want to read the full account of this project, head over to Paul Wilson's website to check it out.