Major progress on AV media accessioning

March 29, 2020
Hannah Frost
Chart of accessioned media measured in TB deposited over time

TL;DR: The Stanford Media Preservation Lab (SMPL) and the Stanford Digital Repository (SDR) have together reached a major milestone with over half a petabyte of preserved AV media content accessioned in the repository. This summer, SMPL expects to complete working through the backlog of digital audio and video files accumulated over the past decade. 

The Stanford Media Preservation Lab (SMPL) -- the arm of DLSS Digitization Services focused on archival reformatting of SUL’s sound and moving image collections -- has been digitizing AV media materials for over a decade, and this work has generated a great number of large digital files. To give you a sense of the volume: one hour of analog video captured digitally for preservation (10-bit uncompressed) is 100 GB, and one hour of analog audio is 4 GB (24-bit, 96 kHz uncompressed). Since 2009, we have preserved over 26,000 hours of content from physical media: 20,000+ audio items; 6,000+ video items; 156 films; and thousands of born-digital media items.
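To make the per-hour figures concrete, here is a minimal sketch of a storage estimate. It assumes only the two rates quoted above (100 GB per hour of video, 4 GB per hour of audio) and decimal units (1 TB = 1000 GB). The function names and the sample batch are hypothetical and are not part of any SMPL or SDR tooling.

```python
# Per-hour figures quoted in the post (decimal units: 1 TB = 1000 GB).
VIDEO_GB_PER_HOUR = 100  # 10-bit uncompressed video
AUDIO_GB_PER_HOUR = 4    # 24-bit, 96 kHz uncompressed audio


def estimate_storage_gb(video_hours: float, audio_hours: float) -> float:
    """Return the estimated preservation-file footprint in GB."""
    if video_hours < 0 or audio_hours < 0:
        raise ValueError("hours must be non-negative")
    return video_hours * VIDEO_GB_PER_HOUR + audio_hours * AUDIO_GB_PER_HOUR


def gb_to_tb(gigabytes: float) -> float:
    """Convert GB to TB using decimal (SI) units."""
    return gigabytes / 1000


if __name__ == "__main__":
    # Hypothetical batch: 10 hours of video and 50 hours of audio.
    total_gb = estimate_storage_gb(video_hours=10, audio_hours=50)
    print(f"{total_gb:.0f} GB = {gb_to_tb(total_gb):.2f} TB")
```

Even a small batch like this one lands above a terabyte, and video accounts for most of it.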

While the SMPL team was busy digitizing, the Stanford Digital Repository was under active, steady development. For much of this time, however, the SDR’s content accessioning capabilities and processes were not yet geared towards media content, and the preservation storage systems had limited capacity. As a result, SMPL has been accumulating an accessioning backlog.

However, with the recent development of new SDR accessioning tooling (most notably the pre-assembly app) and significant expansions in storage, this picture is changing. The chart above illustrates the progress. While in 2015 a total of 22 TB of media content was accessioned, the following year that number more than doubled (48 TB accessioned), and then in 2017 a whopping 138 TB were added to the SDR for a cumulative total of 228 TB. With 248 TB added in the following two years, the SDR contained 476 TB at the end of 2019. Another 60 TB was accessioned in the Winter 2020 quarter, bringing the cumulative total to 541 TB stored today. (Fun fact: this amount of content translates to over 860 JIRA tickets in the LEGACY queue.) Out of the 375 collections we’ve touched, 43% of this digital footprint is taken up by the following seven collections:

This progress is important because it ensures that our “preservation copies” of valuable, at-risk content are protected from corruption and loss, and that more material can be discovered and accessed online by researchers. At the current rate, SMPL expects to have completely worked through its accessioning backlog in summer 2020!


- By Hannah Frost and Geoff Willard