This section provides guidance on the methods you might use to gather the information needed to populate your DAR.
It is important to carefully plan out how you will gather information for your DAR before beginning the process. This will help you to ensure you have considered all of the key factors:
- The scope of the content you are hoping to gather information on.
- The time, resources, and tools you have available to help with the information gathering, including the people who will be undertaking the task and the skills they have.
- The sources (people, documentation, and systems) you will gather information from.
It is essential that you are realistic about the amount of information you will be able to gather and that you set aside the focused time the task requires. Without a clear and realistic plan, your DAR may become a half-finished resource that is never fit for purpose. Do remember that capturing something simple is better than nothing, and you should not get caught up in worrying about it being perfect from the outset. Identify the essential information fields and populate those first. These can be built out with more data elements over time, as your capacity allows or as they are needed.
You may also consider a federated approach to capturing information and updating the DAR. In this approach you might be responsible for setting up its structure, and for managing and monitoring its use, but responsibility for information updates sits with data owners or creators. This approach can be helpful with advocacy and engagement, or if you are capturing information for a large organization and/or with a wide scope. It does, however, also mean that you are dependent on the data owners, their willingness to complete the DAR, and the quality of data they produce. Before using such an approach, consider what level of buy-in there is for the process and whether colleagues can be relied upon to keep the DAR up to date. Including DAR updates in process documentation, putting out regular reminders, and carrying out checks can all help to ensure the necessary data is added. Some practitioners have also found environmental sustainability (not keeping unnecessary data) to be a good motivator to encourage participation. You may also need to consider what the consequences might be if colleagues do not keep the data up to date, in particular the impact it might have on collections management capabilities.
We will now look in a little more detail at some of the key data sources and methods for capturing data for your DAR in the following sub-sections:
Existing Data Sources
No matter the scope of your DAR, you will almost certainly look to gather information from the range of existing systems and documentation maintained by your organization. Data sources may include:
- Accession/acquisition records and paperwork
- Documentation provided by depositors
- Records in cataloguing systems
- Reports from Digital Asset Management or Digital Preservation Repository systems
Before beginning information gathering it can be useful to create a list of these relevant systems and documentation maintained by your organization, the types of information they may contain, and the format of that information. This can then be used to plan how to prioritize gathering information and what you will look for where. For example:
- Accession records and acquisition paperwork may be the best place to find information on ownership and intellectual property rights;
- Descriptions of the digital content can potentially be harvested from cataloguing systems; and,
- A repository report might include details of the number of files included and their total size.
Physical Storage Media
Depending on the processing status of the content held by your organization, you may need to carry out a survey of unprocessed external/legacy storage media. The extent of this survey will depend on the time and resources you have available. Some issues to consider are:
- Is there a list, or can a list be generated, of all of the accessions/digital content containing such storage media?
- Where is the media stored? Has it all been collected together and is it easily accessible, or is it stored amongst physical content and therefore potentially difficult to locate?
- Is there any documentation that lists what is on the media?
- What types of media are there? Will you require technical help or new hardware, or need to consider outsourcing data access and copying?
You will also need to consider carefully whether you will simply view the content to gather information, or whether you have the time and resources to copy the content onto more secure storage ready for processing. Copying may significantly increase the time required for the task but, overall, it will be a more efficient use of resources, be less risky than multiple uses of the media, and mean the content will likely be on more stable and secure storage.
Ideally, surveying and processing digital content held on physical media would be a priority due to the high level of risk associated with these storage formats. In reality, however, you may find that carrying out such a survey is not feasible as part of your initial information gathering, due to the time available, the complexity of the media, and how it is stored. If this is the case, it is recommended that you consider addressing physical storage media as a possible next step or follow-on project, to ensure your DAR is comprehensive and that you have mitigated the high level of risk to content stored on this type of media.
Characterization Tools
Characterization is the process of identification and description of what a file is and its defining technical characteristics, such as file format, size, and the software used to create it. There are a number of characterization tools that can be used to generate this information for a group of files held in one or more folders. These are, therefore, particularly useful for capturing summary information such as “Number of Files” and “Total Size” for your DAR, without having to manually count and calculate everything.
A characterization tool can generate this information for small amounts of digital content in seconds, although longer processing times might be required for larger amounts of digital content. Indeed, some tools may struggle with particularly large-scale digital content. In this case, you will need a computer with sufficient processing power and/or you may need to analyze the content in sections.
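As a rough illustration of the kind of summary information described above, the following minimal Python sketch walks a folder and reports the number of files, their total size, and a tally of file extensions. It is not a substitute for a characterization tool: it relies on file extensions, which can be missing or misleading, whereas dedicated tools identify formats from the content of the files themselves. The function name `summarize_folder` and the keys in its result are hypothetical names chosen for this example.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_folder(root):
    """Return the file count, total size in bytes, and extension tally for `root`."""
    file_count = 0
    total_bytes = 0
    extensions = Counter()
    # os.walk visits the folder and every sub-folder beneath it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                # Skip files that cannot be read (e.g. broken links, permissions).
                continue
            file_count += 1
            total_bytes += size
            # Extensions are only a hint at format, not a reliable identification.
            extensions[path.suffix.lower() or "(none)"] += 1
    return {
        "number_of_files": file_count,
        "total_size_bytes": total_bytes,
        "extensions": dict(extensions),
    }
```

Calling `summarize_folder` on the top-level folder of an accession would give figures that could be copied into "Number of Files" and "Total Size" style fields, assuming your DAR template includes them. For very large amounts of content, it may be more practical to run it on one sub-folder at a time.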
You can find more information on the range of available characterization tools via the COPTR tools registry, and some of the commonly used tools are included in the Useful Resources section below. Some tools are accessed via a user interface, like the software and apps you will be familiar with using, while others can only be run from the command line. The free training course Novice to Know-How also includes more detailed information on understanding digital files and characterization, as well as providing demos of a tool called DROID. DROID can be run from a user interface as well as from the command line. The DROID tool demos are also included in the Digital Preservation Handbook.
Survey
If your DAR's scope is relatively wide, for example if you are aiming to include details of digital content at your organization still in active or semi-active use, you may find it helpful to capture information for your DAR by circulating a survey. The benefits of using a survey are that you can gather information from a large group of people quickly, you may capture details of content that you were unaware of, and, if the survey is well designed, the data captured can be in easily processable formats that might translate directly to DAR fields. The downsides of using surveys are that you must rely on the goodwill of others to ensure the right people complete the survey, and the data may vary in quality.
If you do plan to use a staff survey to gather information, you should consider the following:
- Does your organization subscribe and provide access to a specialist survey tool, such as Qualtrics, SurveyMonkey, or TypeForm? These can provide useful functionality for creating, structuring, and analyzing the data from your survey. If these are not available, you may consider using a free service like Google Forms.
- Make sure to carefully consider the questions you add to your survey. Aim for clear, jargon-free questions and an emphasis on easily processable data where possible. For example, use answers with check boxes, pick lists, or data validation to ensure data is entered in a consistent way.
- Limit the number of questions that allow text-based answers. The information gathered is more likely to be of variable quality and will take significantly longer to process.
- Consider how you will communicate about the survey. Will you distribute it via email, an intranet site, and/or a newsletter? Are there particular members of staff you should send targeted messages to? How many reminders should you send? Would offering accompanying information sessions help to increase engagement? Do you need to work with managers to have them encourage staff to complete the survey?
Finally, it is important not to underestimate the amount of time you may need to spend analyzing the data captured from a survey before it can be included in your DAR. As mentioned above, text-based answers can take a significant amount of time to process, and even more straightforward quantitative data will likely require some quality checking to make sure it is fit for purpose.
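Some of this quality checking can be partly automated if your survey tool exports responses as CSV. The sketch below is one possible approach, not a prescribed method: it flags rows where a required answer is blank or where a size estimate is not a number. The column names (`department`, `content_type`, `approx_size_gb`) and the function name `check_survey_rows` are hypothetical and would need to match your own survey export.

```python
import csv
import io

# Hypothetical column names; replace with the headings from your own survey export.
REQUIRED_FIELDS = ["department", "content_type", "approx_size_gb"]
NUMERIC_FIELD = "approx_size_gb"


def check_survey_rows(csv_text):
    """Return a list of (row_number, field, problem) tuples for a CSV export."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Start at 2 so row numbers match a spreadsheet view (row 1 is the header).
    for row_number, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append((row_number, field, "missing"))
        value = (row.get(NUMERIC_FIELD) or "").strip()
        if value:
            try:
                float(value)
            except ValueError:
                problems.append((row_number, NUMERIC_FIELD, "not a number"))
    return problems
```

The list of problems can then be reviewed by hand, or used to decide which respondents to follow up with, before any data is transferred to the DAR.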
Interviews
Interviews are another potential information gathering approach that can be helpful for building your DAR. They can be particularly useful when information gathering is being carried out by an individual or small team, but the content is managed and/or created by a larger number of people. Interviews provide an opportunity for an in-depth dive into the knowledge an individual may have amassed, but perhaps not documented, about content they manage or have created. For example, they are very good at unearthing details about content hidden away in a forgotten corner of the staff shared storage that might otherwise have been missed. Interviews are, however, very labor-intensive to plan, stage, and analyze, so it is important to carefully consider whether you have the time and resources available to include them as part of your information gathering process.
If you do plan to include interviews, then it is useful to consider the following:
- What questions do you need answered? You will find that you are able to cover fewer questions than you might think in a session.
- Are your questions clear and jargon-free? Is there any specialist terminology that you need to include and will have to explain?
- What type of interview format will work better? Structured, semi-structured, or unstructured?
- What information do you need to provide to the interviewee(s) ahead of time? A participant information sheet with a description of what will happen during and after the interview is useful. This can include venue information, whether you wish to record the interview, and the list of questions.
- Is there a suitable venue available for the interview(s)? This can be in-person or virtual. It should be somewhere that both you and the interviewee feel comfortable.
- Make sure you are well prepared for the interview, with all of the documents you require and any note-taking tools or recording equipment ready.
An example set of interview questions to help with building a DAR is included in this toolkit. You can use this list as a starting point for developing a question set that is suitable for your own needs, matching the DAR template you have designed. Also, as with surveys, make sure you have adequate time available to analyze the outcomes of the interviews. Extracting the information you need and formatting it for inclusion in your DAR might take a significant amount of time.