Stanford Digital Repository ingests 125,000 Google-scanned books and other good news

It’s Preservation Week 2011, and the Stanford Digital Repository (SDR), SULAIR’s system for preserving digital content, has plenty to celebrate.

As of Sunday, April 17, the repository system has taken in over 125,000 digital objects produced from SULAIR collections as part of the Google Book project. Google has been scanning books from Stanford’s shelves since 2006. The ingested objects all represent published works that are in the public domain.

One digital object is equivalent to one bound volume and typically includes hundreds of individual digital files: an image file for each scanned page, a text file generated from OCR processing of each page image, and metadata files. In total, the 125,000 scanned volumes add up to over 50 million files and 15.5 terabytes of digital information.
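
As a rough check on those totals, the per-volume and per-file averages can be worked out in a few lines. This is only a back-of-the-envelope sketch: it treats the "over 50 million" and "15.5 terabytes" figures as exact, and it assumes decimal units (1 terabyte = 10^12 bytes), which the article does not specify. The function name `collection_averages` is made up for this illustration.

```python
def collection_averages(volumes, files, total_bytes):
    """Return rough averages for a collection of scanned volumes."""
    return {
        "files_per_volume": files / volumes,
        "megabytes_per_volume": total_bytes / volumes / 1_000_000,
        "kilobytes_per_file": total_bytes / files / 1_000,
    }


# Figures from the article; the byte count assumes decimal terabytes.
VOLUMES = 125_000
FILES = 50_000_000
TOTAL_BYTES = 15_500_000_000_000

averages = collection_averages(VOLUMES, FILES, TOTAL_BYTES)
for name, value in averages.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the averages come to about 400 files and 124 megabytes per volume, or roughly 310 kilobytes per file, which is consistent with "hundreds of individual digital files" per bound volume.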

The SDR preserves the files by storing them in a secure, managed digital storage environment. Multiple copies of the files are created and stored in different locations to protect against “bit rot,” hardware failure, storage media failure, and other events that can block access to, or alter, the original information in a file. Unlike other content managed by the SDR, the Google-scanned files are not made available to library users by Stanford, because the files are readily available online at Google.
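
To illustrate why multiple copies help against bit rot, here is a minimal sketch of a fixity check: a checksum is recorded when a file is deposited, and each stored copy is later compared against it. This is a generic illustration, not a description of how the SDR works; the article does not say which checks or algorithms the repository uses, and the function names `sha256_of` and `find_damaged_copies` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(copy_paths, expected_digest):
    """Return the paths of copies whose contents no longer match the
    digest recorded at deposit time (hypothetical workflow)."""
    return [
        str(path)
        for path in copy_paths
        if sha256_of(path) != expected_digest
    ]
```

In a workflow like this, any copy reported as damaged could be replaced from a copy that still matches the recorded digest, which is only possible when more than one copy exists.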

In addition to reaching this major milestone, the SDR has other reasons to celebrate Preservation Week. Digital Library Systems and Services is currently undertaking two major development projects to design and deploy web-based interfaces that will facilitate the deposit of other streams of digital content into the SDR:

  • Born-digital content in Special Collections and University Archives: A system to support the stewardship of files originally stored on hard drives, floppy disks, and other hand-held media within archival collections is one of Stanford’s contributions to the AIMS collaboration.
  • Works produced by the Stanford community: Scholarly articles, data sets, academic papers, presentations, and other output of research, teaching, and learning at Stanford have enduring value to the institution and the broader academic community. The new deposit interface, like SULAIR’s Electronic Thesis and Dissertation system, will give Stanford faculty, staff, and students a means to preserve and provide access to the fruits of their work. Search and browse functionality for these works will be similar to SearchWorks.

Both of these systems are scheduled for release in 2011. By Preservation Week 2012, SULAIR will have made significant gains in its capacity to meet the demands for digital preservation services at Stanford.

