Beacon Street Diary blog

Of Metadata and Increasing Visibility

By Zachary Bodnar, Archivist


As you are aware, the CLA’s staff has been working from home for the better part of three months now. But while the staff has been unable to handle physical collections during that time, they certainly have not been slacking. Documentation, policy development, digital processing, professional development and virtual conference attendance, exhibit planning, and fact-finding missions have all been happening while we are away from our office desks. In my case, a significant portion of my time has been spent on cleaning up the metadata of archival collections for the purpose of introducing our materials to the widest audience possible.

Metadata, or information about an object, is the bread and butter of the library and archives professions. The title of a book, name of an author, and publication date are all examples of descriptive metadata. Librarians and archivists gather and document this metadata entirely for the sake of our users. By documenting this metadata, we make an object, whether it be a published book or an unpublished volume of church meeting minutes, findable by the various systems employed in our professions, such as an online catalog. On the archival side, this purveyor of easily digestible and browsable metadata is the finding aid, though you might be surprised by how many different versions of a finding aid exist side-by-side.

For most users, the finding aid is the piece of paper they look at when deciding which boxes and folders within a collection they are interested in leafing through. But in fact, the CLA’s archivists produce four different versions of the finding aid. One version exists in a cloud-based platform which serves as the single source of knowledge for every archival collection held by the CLA. One version serves as the user’s browsable version and is indexed by Google. Another is placed within the CLA’s online catalog. And a final version is uploaded to GitHub for external data harvesting. While each version of the finding aid is distinct, each furthers our goal of increasing the visibility of our materials and ensuring the widest possible audience can find our collections.

All metadata about an archival collection is stored in the CLA’s cloud instance of ArchivesSpace. ArchivesSpace is, in effect, the standard archival management tool used by archivists in the United States today. Through the ArchivesSpace backend interface, staff can record nearly endless amounts of information about a collection. But more practically, it is the tool that allows the archival staff to describe and arrange a collection. Description refers to the process of assigning descriptive metadata to the collection, while arrangement refers to the process of assigning an intellectual order to the physical materials within a collection. By processing a collection and inputting all of our gathered data into ArchivesSpace, we create the single source of truth from which we create all the other versions of the finding aid. (Single source of truth is an Orwellian-sounding term, drawn from the information sciences fields, that simply means the single source of editable data from which all other instances of the same data are derived.)
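
To make the single source of truth idea concrete, here is a minimal sketch in Python. It is not how ArchivesSpace works internally; the `Collection` and `Component` classes, the function names, and the sample collection are all invented for illustration. The point is only that one editable record exists, and every other version is computed from it rather than edited separately.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One box- or folder-level entry in a collection (hypothetical shape)."""
    container: str
    title: str


@dataclass
class Collection:
    """The single editable record; every other version is derived from it."""
    identifier: str
    title: str
    creators: list[str]
    subjects: list[str]
    components: list[Component] = field(default_factory=list)


def to_reading_copy(c: Collection) -> str:
    """Full, human-readable listing, in the spirit of the PDF finding aid."""
    lines = [
        f"{c.title} ({c.identifier})",
        "Creator(s): " + "; ".join(c.creators),
    ]
    lines += [f"  {comp.container}: {comp.title}" for comp in c.components]
    return "\n".join(lines)


def to_catalog_summary(c: Collection) -> dict:
    """Top-level metadata only, in the spirit of a catalog record."""
    return {"title": c.title, "creators": c.creators, "subjects": c.subjects}


# Invented sample data, not a real CLA collection.
example = Collection(
    identifier="EX-0001",
    title="Example Church records",
    creators=["Example Church"],
    subjects=["Church records and registers"],
    components=[Component("Box 1, Folder 1", "Meeting minutes")],
)

print(to_reading_copy(example))
print(to_catalog_summary(example))
```

If the title is corrected in the one `Collection` record, both derived versions pick up the correction the next time they are generated, which is the practical benefit of keeping a single source.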

The next version of the finding aid is the one most recognizable by our users. It is the paper version of the finding aid that can be found at the reference desk. This is the version intended for human eyes and is therefore the easiest to read and understand. Before it is printed, though, this finding aid exists as a PDF derivative of every piece of public metadata that is input into ArchivesSpace. The PDF is uploaded to the CLA’s website and is searchable from there under the “Electronic Finding Aids” header. Uploading the PDF also allows it to be indexed by Google, which vastly improves a collection’s visibility to the wider internet world.

The next version of the finding aid that we produce is a MARC record which is ingested into the CLA’s online catalog. MARC is one of the oldest metadata standards used by librarians and the basis upon which nearly every library catalog is built. The MARC version of the finding aid is actually a stripped-down version that focuses solely on the top-level metadata associated with the whole collection, such as the title of the collection, the collection’s creator(s), and subject headings associated with the collection. Fortunately, you never see the raw MARC metadata; the catalog interprets that MARC file and displays it in a way that is familiar to all our users. We produce this version of the finding aid so that archival collections may be found alongside print materials within the catalog. This makes the online catalog the CLA’s single destination to search everything the CLA holds. This also ensures that our archival collections are automatically linked to related resources through linked metadata, such as subject headings.

The last version of the finding aid is one which you have probably never seen. This is the EAD version of the finding aid which is stored on the CLA’s GitHub. EAD is an XML-based international archival metadata standard. Like MARC, EAD is not actually intended for human eyes; EAD is intended to be read by machine systems that interpret the data stored within the XML file. The CLA stores these files in GitHub so that they may be harvested by archival aggregators such as ArchivesGrid. These aggregator sites are another way for the CLA to improve the findability of our materials by placing them within systems with much wider user bases.
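
As a rough illustration of what "read by machine systems" means, the Python sketch below pulls a few top-level fields out of an EAD3-style fragment using only the standard library. The sample XML is invented and heavily simplified (a real EAD3 file requires many more elements), the `summarize_ead` function is hypothetical, and nothing here describes how ArchivesGrid or any other aggregator actually harvests files.

```python
import xml.etree.ElementTree as ET

# Namespace used by the EAD3 schema; "ead" is just a local prefix choice.
EAD3_NS = {"ead": "http://ead3.archivists.org/schema/"}

# Invented, simplified fragment; not a complete or valid EAD3 document.
SAMPLE_EAD = """<ead xmlns="http://ead3.archivists.org/schema/">
  <control>
    <recordid>EX-0001</recordid>
  </control>
  <archdesc level="collection">
    <did>
      <unittitle>Example Church records</unittitle>
      <unitdate>1800-1900</unitdate>
    </did>
  </archdesc>
</ead>"""


def summarize_ead(xml_text: str) -> dict:
    """Extract the record id, collection title, and date from EAD3-style XML."""
    root = ET.fromstring(xml_text)

    def text_of(path: str):
        # Return the stripped text of the first match, or None if absent.
        node = root.find(path, EAD3_NS)
        if node is None or node.text is None:
            return None
        return node.text.strip()

    return {
        "recordid": text_of("ead:control/ead:recordid"),
        "title": text_of("ead:archdesc/ead:did/ead:unittitle"),
        "date": text_of("ead:archdesc/ead:did/ead:unitdate"),
    }


print(summarize_ead(SAMPLE_EAD))
```

Because every EAD file uses the same element names in the same places, a harvester can run one routine like this across thousands of finding aids from many institutions without a person reading any of them.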

Which brings me all the way back to my metadata cleanup project. In 2019 the CLA converted from producing EAD2 documents to EAD3 documents. Collections processed prior to that were therefore left out of our EAD3 offerings on GitHub, which means that harvesters such as ArchivesGrid would never see these older collections. Over time we have been able to go back and convert some of them, but prior to the pandemic, there were still more than 80 collections that needed metadata cleanup before EAD3 finding aids could be created for them. While the pandemic has halted our ability to process new archival collections, it has given me the time to shed even more light on these collections processed prior to 2019, and I can now say that the number of collections needing cleanup is in the single digits.
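
For readers curious how one might tell the two formats apart, here is a small, hypothetical Python check. It relies on one structural difference between the standards: the older EAD 2002 format (what I call EAD2 above) opens with an `<eadheader>` element, while EAD3 replaced it with `<control>`. The function names and sample files are invented, and this is not the process the CLA actually uses to track its conversion backlog.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def ead_version(xml_text: str) -> str:
    """Guess the EAD version from the root element's direct children."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "ead":
        return "unknown"
    children = {local_name(child.tag) for child in root}
    if "control" in children:
        return "EAD3"
    if "eadheader" in children:
        return "EAD 2002"
    return "unknown"


def needing_conversion(files: dict) -> list:
    """List the file names whose contents are not recognized as EAD3."""
    return sorted(
        name for name, text in files.items() if ead_version(text) != "EAD3"
    )


# Invented sample files, reduced to the bare minimum for the check.
sample_files = {
    "new.xml": '<ead xmlns="http://ead3.archivists.org/schema/"><control/></ead>',
    "old.xml": "<ead><eadheader/></ead>",
}

print(needing_conversion(sample_files))
```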

The library and archival fields are always trying to improve access to collections. Most visibly this happens when we archivists describe a collection and produce a finding aid for it. But as I hope this blog post has shown, there are significantly less visible ways in which we create access. And we are always looking towards the future for new ways to increase access and findability and ensure that everyone who might wish to look at our materials can find them.