Digitizing Union College’s Alumni Files: Scanning and Student Engagement

By India Spartz

As you can imagine, when an undergraduate liberal arts institution like Union College is more than 200 years old, it generates and receives an astonishing amount of information about its alumni: newspaper clippings, notes, letters, and ephemera related to centuries of graduates.

In an effort to maintain active relationships with its alumni, Union's College Relations department established a series of alumni files. Several decades ago, files dating from 1795 to 1929 were transferred to the Schaffer Library's Special Collections and Archives department for research purposes and safekeeping (the materials are currently stored in a secured, climate-controlled environment).

The files are a rich and valuable resource for documenting the history of Union College, and Special Collections receives frequent requests from students, faculty, and researchers who seek information pertaining to its alumni.

When I arrived as head of Special Collections & Archives in 2014, the alumni files were stored on shelves in the library's basement. This arrangement meant that student workers and library staff descended three flights of stairs to pull files. Aside from the time it took to retrieve and return the files, frequent handling put the materials at risk of damage. I immediately recognized the need to relocate the alumni files to the Special Collections & Archives department and began developing a plan to scan the contents of each folder for long-term preservation and access.

Union College is like most small undergraduate liberal arts institutions: we currently lack a sophisticated digital lab and have limited resources for outsourcing large digital scanning projects. I determined that the best way to tackle scanning hundreds of alumni files would be to rely on student workers. In 2015, through a generous bequest, the library acquired a Zeutschel OS12002 overhead scanner. The Zeutschel makes it possible to customize scanning settings, which allowed us to streamline the digitization process and to train students quickly to scan the alumni files using basic Photoshop commands.

[Image: Explanation and example of the file-naming structure employed throughout the project.]

In order to capture the necessary metadata, Special Collections & Archives collaborated with the Technical Services department to create protocols for assigning a number to each item within a folder. The metadata naming schema uses the surname and first initial of the alum (as shown in the image above).
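
To make the schema concrete, here is a minimal sketch in Python of how such file names could be generated; the delimiter, the three-digit item number, and the TIFF extension are assumptions for illustration, not Union's documented standard.

    # Hypothetical helper illustrating the surname-plus-first-initial naming pattern.
    def item_filename(surname: str, first_initial: str, item_number: int, ext: str = "tif") -> str:
        """Build a name like 'SmithJ_003.tif' from the alum's name and the item's sequence number."""
        return f"{surname.capitalize()}{first_initial.upper()}_{item_number:03d}.{ext}"

    # Items 1-3 from a hypothetical folder for an alum named J. Smith.
    for n in range(1, 4):
        print(item_filename("Smith", "J", n))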

Training
The Technical Services department trains students on how to use the Zeutschel. The Special Collections & Archives staff also train students on document handling and on writing the metadata by hand (using a #2 pencil) on each document. More experienced students have also acted as trainers, teaching their peers how to scan and code the documents. So far, the students have successfully scanned folders dating from 1795 to 1866.

Challenges

  • It turns out that the more recent folders require additional time to complete, as they contain more information. This has slowed the pace of scanning considerably.
  • As students graduate, it’s imperative that the Special Collections & Archives department hire new student workers each year and provide training in order to continue the project.
  • Folders post-1900 are likely to contain confidential information since the College Relations department began sending questionnaires to alumni. These questionnaires requested information about their personal lives (names and birthdates of children, home addresses, etc.). Therefore, it may be necessary to restrict post-1900 alumni files because of confidentiality concerns.
  • As of this writing, scanned alumni files are only accessible via an in-house server. Special Collections & Archives staff are authorized to reference these files for researchers. Patrons may receive copies of the digitized files upon request. The Schaffer Library recently launched an institutional repository called Union Digital Works. This platform will likely serve as the online access point for digitized alumni files. Once online, researchers will have full access to the pre-1900 alumni files via the internet.

 

Student engagement
While the coding and scanning process can be tedious, students have surprisingly embraced these tasks. More than once, they have commented that coding and digitizing the documents provides an opportunity to listen to music on their headphones, relax, and escape their rigorous schedules.

Anouk, a senior majoring in environmental sciences, has worked on the project for two years. I asked her what she likes about coding and scanning the documents. She responded that she felt a sense of ownership of the project and commented that the project has taught her a great deal about the history of Union College and the importance of preserving historical documents for the future.

Quality assurance (QA)
The student work is overseen and checked by an Archives Assistant. This includes retrieving the saved digital files and spot-checking every tenth one. Should a digital file need to be corrected or modified, the Archives Assistant consults with the student to fix the problem. Once QA is complete, the finished digital files are stored on a secured "Library dark archive" (the L: drive), while access copies are made available to authorized staff via the "Library preservation" (M:) drive.
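
Purely as an illustration, a short Python sketch of the "every tenth file" spot check; the drive path and file extension here are hypothetical, not the department's actual directory layout.

    from pathlib import Path

    def files_to_spot_check(folder: str, interval: int = 10) -> list[Path]:
        """Return every tenth scan (sorted by file name) for manual review."""
        scans = sorted(Path(folder).glob("*.tif"))
        return scans[interval - 1::interval]  # the 10th, 20th, 30th, ... file

    for path in files_to_spot_check("M:/alumni_files/1795-1866"):
        print("Review:", path.name)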

Preserving the originals
After the folders are scanned and quality assurance is complete, they are filed in acid-free boxes for long-term storage. At the beginning of the project, we decided to keep the original manila folders rather than re-house the documents in acid-free folders due to time and funding constraints.

Timeline and follow up
Because scanning the alumni files is an in-house project, there is no set timeline for completion. However, the department dedicates several students each year to scan the files. The post-1929 alumni files continue to be stored at College Relations. As this phase of the scanning project moves toward completion, efforts will be made to reach out and acquire the more recent alumni files for scanning and long-term storage.

Conclusion
This project not only preserves the alumni files; it also makes it possible to eventually open them to researchers worldwide online. By incorporating student workers into the process, the project has been able to move forward on the strength of their enthusiasm and dedication. The students have also embraced the opportunity to work with historical documents and learn about Union College's unique history while acquiring new technical skills that will serve them in the future.


India Spartz is the Head of Special Collections and Archives at Union College in Schenectady, NY. She holds a B.A. from the University of Alaska (her home state), an MLIS from UC Berkeley, and an M.A. in Museum Studies from the Institute of Archaeology, University College London. She's a member of the Academy of Certified Archivists and serves on SAA's College & University Archives Steering Committee.


Assessing Digital Asset Management Tools at Texas A&M University

By Greg Bailey

In January 2014 I started my position at Texas A&M University with Cushing Memorial Library and Archives, which holds the University Archives and Special Collections at A&M. Our only digital presence consisted of a Flickr account hosting items from the University Archives and some items from Special Collections that had been put into the institutional repository (OAK Trust). Eight months after I started, a new Associate Dean of Special Collections and Director of Cushing Library was hired. The new director and I began to voice our opinion that we needed to increase our presence on the web, but also to have a system to handle both digitized and born-digital materials. In time, the Dean of the Libraries organized a retreat for interested parties, and out of it a task force was formed to investigate digital asset management (DAM) tools and to come up with a recommendation for implementation.

In the fall of 2014 the task force was established with the objective of investigating and making recommendations for a solution or solutions that would enable the Texas A&M University Libraries to store, display, and preserve digitized and born-digital university records and research. In the spring of 2015, the charge expanded to include attention to broader campus needs.

After defining an assessment process and expanding our scope to include the broader campus, the task force first conducted a campus needs assessment, identified and developed use cases, and distilled core requirements. These became the basis of our testing rubrics. We ran multiple stages of assessment to identify and test systems, as well as to analyze the results of those tests. A recommendation was reached on the basis of this analysis and further inquiries.

Our analysis of twenty-six systems allowed us to confidently assert that no one digital asset management product would meet library and campus needs. Broadly, "digital asset management consists of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval, and distribution" of image, multimedia, and text files.[1] These tasks are performed by systems (DAMS) that differ in their approach to functions and in the range of associated capabilities. Given campus needs, and our experience as a leading DSpace developer (the Libraries use DSpace as our IR), the task force was attuned to the particular importance of the data models embedded in these systems, which guide and constrain other functionality.

[Image: The Digital Asset Management Ecosystem (DAME) model. Created by Jeremy Huff, Senior Software Applications Developer for the TAMU Libraries.]

We were convinced that modular solutions to discrete needs for storing, displaying, and preserving digital assets are warranted, and that these solutions are likely to require customization. We recommended building a digital asset management ecosystem (DAME) rather than attempting to meet all needs with a single DAMS.

The choice of the word ecosystem, as opposed to "system" (as in a DAMS), reflects the DAME's emphasis on a distributed service architecture. In this architecture, the discrete roles of a DAMS are handled not by one application, but by a collection of applications, each suited to the role it plays. The DAME's structure will certainly vary from institution to institution, and in fact this flexibility is perhaps the DAME's strongest quality. In general, a DAME will be divided into the following layers (a rough mapping of layers to candidate components is sketched after the list):

  • Management
  • Persistence
  • Presentation
  • Authorization
  • File service
  • Storage
  • Preservation
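
Purely as an illustration, the Python sketch below lays out these layers, the roles this post assigns to them, and the components the task force later recommends; roles not described in the post (such as the file service) are inferred from the layer names.

    from dataclasses import dataclass

    @dataclass
    class Layer:
        name: str
        role: str
        candidate_components: list[str]

    # Layers of the DAME, paired with the components discussed later in this post.
    DAME_LAYERS = [
        Layer("management", "record creation, curation, and discovery", ["custom web service", "Fedora 4"]),
        Layer("persistence", "data sources backing the repository", ["repository-supported database"]),
        Layer("presentation", "public and administrative user interfaces", ["Blacklight", "Spotlight", "custom UIs"]),
        Layer("authorization", "institutional authentication and authorization", ["campus single sign-on"]),
        Layer("file service", "delivering asset files to clients (inferred role)", ["web file server"]),
        Layer("storage", "local redundant storage of assets", ["redundant disk storage"]),
        Layer("preservation", "rarely accessed dark-archive copies", ["Archivematica", "DuraCloud", "DPN", "Amazon Glacier"]),
    ]

    for layer in DAME_LAYERS:
        print(f"{layer.name:<13} -> {', '.join(layer.candidate_components)}")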

In the DAME, the management layer is conceived of as a collection of web services that handle record creation, curation, and discovery. It does not, itself, handle the actual assets; instead it records the assets' locations and metadata, and allows for the management and retrieval of this information. The management layer should comprise at least two elements, the first being a custom web service and the second a repository with a fully featured application programming interface (API). The repository application can be one of the many popular DAMS solutions currently in use, the only requirement being that it exposes all desired functionality through an API.

It may seem that a repository with a fully featured API would be sufficient to satisfy the needs of a management layer, but there are several good reasons for including a custom web service in this layer. The first is that the web service acts as the interface for all communication with the management layer, which makes the DAME repository-agnostic. All other applications in the ecosystem are programmed against the consistent API of the custom service, and the job of interfacing with the repository's API is left solely to the custom web service. If the decision is made to switch repositories, the only thing that needs to be updated in the DAME is the custom web service; the rest of the ecosystem never notices the change. The second reason for this separation is that it allows you to employ multiple repository solutions side by side, with the web service aggregating responses. Finally, in record retrieval, the authorization and authentication of the user can be handled by the custom web service, relieving the repository of any need to be compatible with the institution's authentication and authorization strategy.
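
A minimal sketch, in Python, of what such a repository-agnostic facade could look like; the class and method names are hypothetical, and a real implementation would expose these operations as a web service rather than in-process calls.

    from abc import ABC, abstractmethod

    class RepositoryAdapter(ABC):
        """One adapter per repository product (Fedora, DSpace, etc.)."""

        @abstractmethod
        def get_record(self, record_id: str) -> dict | None: ...

        @abstractmethod
        def save_record(self, record: dict) -> str: ...

    class ManagementService:
        """The custom web service: the one interface the rest of the DAME talks to."""

        def __init__(self, repositories: list[RepositoryAdapter], authorize):
            self.repositories = repositories  # multiple repositories can sit side by side
            self.authorize = authorize        # institutional authN/authZ lives here, not in the repository

        def get_record(self, user, record_id: str) -> dict | None:
            if not self.authorize(user, record_id):
                return None
            # Aggregate responses: return the record from the first repository that has it.
            for repo in self.repositories:
                record = repo.get_record(record_id)
                if record is not None:
                    return record
            return None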

This management layer thus communicates with the persistence layer, which is not necessarily one of the more complicated portions of the DAME's architecture. It is simply the data source, or collection of data sources, needed to support the repository. Most repositories that would work well in the DAME are likely to offer varied options for persistence, making this layer one of the more flexible aspects of the DAME. In general, this layer will store each asset's URI, its metadata, and possibly application-specific information needed by the presentation layer.
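
For illustration, a hypothetical shape for such a persistence record; the field names are assumptions, not a schema used by any particular repository.

    from dataclasses import dataclass, field

    @dataclass
    class AssetRecord:
        asset_uri: str                   # where the file service can locate the bytes
        metadata: dict[str, str]         # descriptive metadata used for discovery
        presentation: dict[str, str] = field(default_factory=dict)  # hints needed by the presentation layer

    example = AssetRecord(
        asset_uri="https://files.example.edu/assets/1234",
        metadata={"title": "Example digitized record", "creator": "Unknown"},
    )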

The preservation layer, which had already been under development, would continue to be built out and integrated into the new system. A processing layer would be connected to local redundant storage, and that local storage would in turn be connected to dark-archive storage that is rarely accessed.

Every system that we tested consisted of different tools and components, bundled together as a single system. Part of the argument for a DAME over a DAMS is the ability to determine the components in these bundles locally, and to swap them out to meet evolving needs.

With that in mind, the task force recommended the deployment of modular digital asset management components to meet the complex needs of the Texas A&M University Libraries and campus. These include:

  • The deployment of a system to manage and store digital assets and metadata. Our recommended open-source system is Fedora 4, to be coupled with Blacklight and Solr for search and retrieval. Solr indexes content managed by the repository, and Blacklight enables search and retrieval across the indexed content (a rough sketch of this ingest-and-index path follows the list).
  • The development of custom user interfaces as appropriate (likely, public user interface and administrative interfaces).
  • The deployment of a triple store to enable linked data, along with Apache Camel and Fuseki as the basis of connecting Fedora to the triple store and to Solr indexing software.
  • The deployment of an exhibition system.  Our recommended open-source exhibition layer would be Spotlight, which is an extension to Blacklight and will easily integrate into our DAME.
  • The deployment of a preservation system consisting of Artefactual's Archivematica connected to localized redundant storage. That redundant storage is in turn connected to dark archives at the Digital Preservation Network (DPN) and Amazon Glacier via DuraCloud.
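
To make the first recommendation more concrete, here is a very rough Python sketch of the ingest-and-index path it implies; the host names, resource path, and Solr field name are placeholders, and in the actual DAME this wiring would run through Apache Camel rather than a client script.

    import requests

    FEDORA = "http://localhost:8080/rest"                # placeholder Fedora 4 base URL
    SOLR = "http://localhost:8983/solr/blacklight-core"  # placeholder Solr core read by Blacklight

    def ingest(asset_id: str, title: str) -> None:
        # 1. Create an RDF container in Fedora 4 holding the asset's metadata.
        requests.put(
            f"{FEDORA}/{asset_id}",
            headers={"Content-Type": "text/turtle"},
            data=f'<> <http://purl.org/dc/elements/1.1/title> "{title}" .',
        )
        # 2. Index a minimal document in Solr so Blacklight can retrieve it.
        requests.post(f"{SOLR}/update?commit=true", json=[{"id": asset_id, "title_tesim": [title]}])

    ingest("alumni/example-item", "Example digitized record")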

The development of the ecosystem has started. The Libraries' IT team has begun bringing up Fedora 4, along with the other components recommended by the task force. As mentioned above, the preservation layer had already been in development, and the final kinks in that part of the system are being worked out. The hope is that the ecosystem will be fully functional within a year.

Overall, the work of the task force was beneficial. We had input from a number of stakeholders who brought forward desired functionality that any one group of users might not have considered. There was a very strong presence on the task force from Special Collections, but also from our preservation unit, which has very similar ideas; the two groups work together regularly. The addition of subject/reference librarians and catalogers, and the expertise of the Digital Initiatives group (Library IT), brought yet other perspectives. Having some university representatives also gave us an idea of what units around Texas A&M require when dealing with digital materials. The task force sent surveys to a number of units on campus, and we were able to gather a large amount of useful information. At a minimum, I now know of some units that hold large quantities of electronic files that we will have to prepare for in the near future as we bring up the DAME and continue to develop our digital archiving process at Texas A&M.

In the end, this diverse group, with expertise in a number of areas, allowed us to test a large number of software solutions. We were able to robustly test the functionality of these solutions and to collect data on the strengths and weaknesses of the different packages. A DAME built on Fedora 4 and a number of other open-source components might not work for other institutions, as we are heavily reliant on the expertise of our IT staff to bring all of these pieces together, but the process of chartering a task force with a diverse membership (including people from outside the library) was beneficial. We now have buy-in that had not existed before from multiple units in the library, as well as interest from outside the Libraries, specifically in the area of materials related to the University Archives.


Greg Bailey is the University Archivist at Texas A&M University, a position he has held since January 2014. Prior to that, he served as the University Archivist and Records Manager at Stephen F. Austin State University. He is currently a member of the College and University Archives Section's Steering Committee.

[1] https://en.wikipedia.org/wiki/Digital_asset_management